Efficient and Effective Subimage Similarity Matching for Large Image Databases by Kai Sang Leung B.Sc. (Honours), The University of British Columbia, 1995 A THESIS S U B M I T T E D IN P A R T I A L F U L F I L L M E N T O F T H E R E Q U I R E M E N T S F O R T H E D E G R E E O F Master of Science in T H E F A C U L T Y O F G R A D U A T E STUDIES (Department of Computer Science) We accept this thesis as conforming to the required standard The University of British Columbia August 1997 © K a i Sang Leung, 1997 In presenting this thesis in partial fulfillment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely avail-able for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission. Department of Computer Science The University of British Columbia 2366 Main M a l l Vancouver B C Canada V 6 T 1Z4 Date: A«S«st 30 e 1997 A b s t r a c t As network connectivity has continued, its explosive growth and storage devices have become smaller, faster, arid less expensive, the number of on-line digital images has increased rapidly] Correspondingly, efficient and effective content-based retrieval systems for handling image queries have become necessary. In addition, users are often interested in local contents within subimages. In this thesis, we develop Padding and Reduction Algorithms to support subimage queries.of arbitrary size based on local color information. The idea is to estimate the best-case lower bound to the dissimilarity measure between the query and the image. By making use of multiscale representation, this lower bound becomes tighter as the scale becomes finer. Because image contents are usually pre-extracted and stored, a key issue is how to determine the number of levels used in the representation. We address this issue analytically by estimating the required C P U and I/O costs, and experimentally by comparing the performance and the accuracy of the outcomes of various filtering schemes. Our findings suggest that a 3-level hierarchy is preferred. We also study three strategies for searching multiple scales. Our studies indicate that the hybrid strategy with horizontal filtering on the coarse level and vertical filtering on remaining levels is the best choice when using Padding and Reduction Algorithms. Using the hybrid search strategy in the multiscale representation with the determined number of levels, the best 10 desired images can be retrieved efficiently and effectively from a collection of a thousand images in about 3.5 seconds. ii Contents A b s t r a c t i i C o n t e n t s i i i L i s t o f T a b l e s v i L i s t o f F i g u r e s v i i A c k n o w l e d g e m e n t s ix D e d i c a t i o n x 1 I n t r o d u c t i o n 1 , 1.1 Image Databases . : . 1 1.2 Content-Based Retrieval . . . . . 2 1.2.1 Color and Spatial Similarity Matching . 5 1.2.2 Multiscale Matching . . . : . 7 1.2.3 . Subimage Query Matching . 10 1.3 Problem Definition and Contributions .12 1.4 Outline of Thesis 16 2 R e l a t e d W o r k s 18 iii 2.1 Indexing , 19 2.2 Query Language 20 2.3 Query Processing based on Color and Spatial Similarity 21 2.4 Query Processing with Multiscale Matching . 24 2.5 Subimage Query Processing 29 2.6 Summary 31 3 P a d d i n g a n d R e d u c t i o n A l g o r i t h m s 32 3.1 Lower Bound to Histogram Distance 33 3.2 Development of Padding and Reduction Algorithms 37 3.2.1 Generate-and-Test ' 38 3.2.2 Quadratic Programming with Integer Programming 39 3.2.3 Algorithms P A D and R E D . . . . 41 3.3 Analytical Comparison . . . . . . . 47 3.4 Experimental Comparison : 49 3.5 Another Metric for Histogram Distance . . . 58 ... 3.6 Summary . . 60 4 M u l t i s c a l e R e p r e s e n t a t i o n 62 4.1 A 4-level Multiscale Representation 63 4.2 Formulation of Distance Function . 65 4.2.1 Histogram Distance 65 4.2.2 Positional Distance 66 4.2.3 Distance Function , 69 4.3 Multiscale Pure Vertical Search 71 4.3.1 Analytical Evaluation 72 4.3.2 Experimental Evaluation 89 iv 4.4 Summary . 97 5 Search Strategies 100 5.1 Problems of Branch-and-Bound Search '. : • • 101 5.2 Pure Vertical Search . 103 5.3 Pure Horizontal Search . 103 5.3.1 Analytical Evaluation 105 5.3.2 Experimental Evaluation . .. . 116 5.4 Horizontal-and-Vertical Search. 120 5.4.1 Analytical Evaluation - 123 5.4-2 Experimental Evaluation 133' 5.5 The Best Search Strategy . . . 137 5.6 Summary 141 6 Conclusions 143 6.1 Conclusions 143 6.2 Future Work 146 Bibliography 150 List of Tables 3.1 Computation Time for Algorithms P A D and R E D (8-dimensional Histograms) 50 3.2 Computation Time for Algorithms P A D and R E D (64-dimensional His-tograms) 52 3.3 Computation Time for Algorithms P A D and R E D (512-dimensional His-tograms) . .' 54 4.1 Experimental Results for Search P V ("h"-typed Queries) . 91 4.2 Experimental Results for Search P V ("i"-typed Queries) . . . 92 4.3 Experimental Results for Search P V ("j"-typed Queries) 93 4.4 Experimental Results for Search P V ("k"-typed Queries) 95 5.1 Experimental Results for Search P H ("h"-typed Queries) . 117 5.2 Experimental Results for Search P H ("i"-typed Queries) 117 5.3 Experimental Results for Search P H ("j"-typed-Queries) .118 5.4 Experimental Results for Search P H ("k"-typed Queries) 119 5.5 Experimental Results for Search H V ("h"-typed Queries) 133 5.6 Experimental Results for Search H V ("i"-typed Queries) 134 5.7 Experimental Results for Search H V ("j"-typed Queries) 135 5.8 Experimental Results for Search H V ("k"-typed Queries) 136 5.9 Experimental Results for Three Search Strategies . .- 140 vi List of Figures 1.1 Example of "Random Effect" . . . . . 6 1.2 National Flags 10 2.1 Fuzzy Regions . . . 23 3.1 The Use of Normalized Histograms 34 3.2 Padding Approach . . . . . . . 35 3.3 Reduction Approach 36 3.4 Computation Time for Algorithms P A D and R E D (8-dimensional Histograms) 51 3.5 Computation Time for Algorithms P A D and R E D (64-dimensional His-tograms) 53 3.6 Computation Time for Algorithms P A D and R E D (512-dimensional His-tograms) . . . . . . . 55 4.1 The 4-level Multiscale Representation 64 4.2 Subimage Query Q and Image Subregion I s u p 66 4.3 The B C Legislative Buildings : . . . 67 4.4 C P U Costs for Search P V 82 4.5 I /O Costs for Search P V 83 4.5 I /O Costs for Search P V (Continued) 84 vii 4.6 Combined C P U and I /O Costs for Search P V . . . 85 4.6 Combined C P U and I /O Costs for Search P V (Continued) -86 4.7 Experimental Results for Search P V ("h"-typed Queries) 91 4.8 Experimental Results for Search P V ("i"-typed Queries) . . 92 4.9 Experimental Results for Search P V ("j"-typed Queries) 94 4.10 Experimental Results for Search P V ("k"-typed Queries) . . . . . . . . . . . 96 4.11 The Recommended Multiscale Representation 99 5.1 Analytical Results for Search P H 114 5.1 Analytical Results for Search P H (Continued) . 115 5.2 Experimental Results for Search P H ("i"-typed Queries) 118 5.3 Experimental Results for Search P H ("j"-typed Queries) 119 5.4 Experimental Results for Search P H ("k"-typed Queries) . 120 5.5 Analytical Results for Search H V . 130 5.5 Analytical Results for Search H V (Continued) 131 5.6 Experimental Results for Search H V ("i"-typed Queries) 134 5.7 Experimental Results for Search H V ("j"-typed Queries) . . . . . . . . . . . 135 5.8 Experimental Results for Search H V , ("k"-typed Queries) 136 5.9 Analytical Results for Three Search Strategies 137 5.9 Analytical Results for Three Search Strategies (Continued) 138 5.10 I /O Costs for Four Search Strategies 139 6.1 Example of a Subimage Query of Arbitrary Shape 147 6.2 Example of a "Hollow" Subimage Query 147 6.3 Example of a Subimage Query with Multiple "Known" Portions 148 viii Acknowledgements I would like to sincerely thank Dr Raymond Ng, my supervisor, for his support and guidance throughout the course of my work. His advice, comments, and suggestions concerning this research project were invaluable. I would also like to thank Dr J im Little, my second reader, for the time that he spent reading and reviewing my work. His comments and recommendations were very helpful and valuable. I am indebted to my parents for their support and love. Thanks to friends as well as faculty members, staffs, and fellow students in the Department for their help and encouragements; thanks also to the Natural Sciences and Engineering Research Council of Canada for the financial support in the form of a Postgraduate Scholarship. K A I S A N G L E U N G The University of British Columbia August 1997 ix To my parents x Chapter 1 Introduction 1.1 Image Databases Over the past few decades, database management systems (DBMSs) have been recognized as tools with practical value for handling large amounts of data. A primary purpose of a D B M S is to provide an environment for efficient and effective retrieval and storage of data. Until recently, many DBMSs were designed to handle only alphanumeric data. . In recent years, advances in technologies for scanning, networking, and C D - R O M s have lowered the prices for disk storage. These advances, coupled with acceptance of .common image compression, and file formats, enable users to acquire, store, manipulate and transmit large numbers of images. As a result, the number of digital image archives . has increased tremendously. The use of images and graphics.in World-Wide Web publi-cations has also increased at an incredible rate. In addition, images are generated at an increasing, rate by growing number of sources such as civilian and military satellites as well as commercial satellites, instruments used in petroleum and mining industries, med-ical information systems, Geographic Information Systems (GIS), and art galleries and museum management systems. For example, an estimated 281 gigabytes (GB) of image data will be produced daily by the instruments on two Earth Observing Systems plat-1 forms [Castelli et al 1997]. With all these changes, we need the D B M S s to support not only traditional alphanumeric data, but also visual-based data in the form of still images. Image databases also play important roles in the four "Grand Challenge" appli-cation domains — (1) Geographic/Environmental Information Systems, (2) Engineer-ing/Scientific Visualization Systems, (3) Medical Information Systems, and (4) Educa-tion Systems — which were identified in the 1992 US National Science Foundation work-shop on visual information management systems [Jain 1993]. The demands for image database management systems (IDBMSs) also increase in other application areas like stock-photography for remote printing, retail cataloging, art work retrievals, advertisement creation, and imaging clip arts [Sawhney and Hafner 1993, Barber et al 1994, Gudivada and Raghavan 1995, Petkovic et al 1996, Castelli et al 1997]. 1.2 Content-Based Retrieval Traditional database retrieval technology uses an "exact match" approach. To do so, traditional structured alphanumeric information is indexed by descriptive keywords or numeric descriptors. Due to the success in handling alphanumeric data, many IDBMSs also use the "exact match" approach in handling image data. In particular, images are indexed based on some identifying text such as titles, captions, creator's names, production years, or catalog numbers. However, these identifiers are usually "external" to image contents (like color, texture, and shapes). Searching these external features of the images in the collection may not necessarily yield fruitful results. In many situations, users may remember the contents, but not the associated identifying text, of the images they have seen before. Thus, retrieval methods based on "external" identifiers may not be helpful. For example, if the images in a collection are indexed by names of painters, a user may not be able to retrieve the image of Mona Lisa unless he remembers the painter's name, Leonardo da Vinci . 2 Complementary to retrieval methods based on "external" identifiers, content-based retrieval (CBR) methods have, been developed. With C B R , images are searched and indexed based on contents of the images. These contents can be classified into.two main types: 1. "syntactic" contents — such as color, texture, and shape — which are context inde-pendent, and 2. "semantic" contents — such as objects — which are context dependent: To exploit existing advantages of the "exact match" approach, some C B R systems rely on manually associating textual descriptions with the image contents. However, these textual content descriptors are subjective, domain dependent, and expensive to produce. Sometimes, the image contents of the underlying data, like texture and shape, are difficult or nearly- impossible to describe with text annotations. It is well known that "an image is worth a thousand words"; descriptors/keywords are often incomplete and not precise enough for satisfactory image retrievals. For example, an indexer, may assign keyword "sunrise" to an image which the user may perceive to be a sunset. As another example, for a picture taken at dawn, an indexer may associate the text annotation "moon" with an object which the user may perceive to be the sun. Given the rapid growth in the availability and demand for image data, the high labor cost involved with manually, associating text annotations with images, and the dif-ficulty of anticipating every user's needs when assigning keywords and descriptors, it is more efficient to represent the image contents, particularly "syntactic" contents, by au-tomatically extracted features. Since the "syntactic" contents are context independent, they can be extracted without prior knowledge of the images. In many C B R systems, these contents are extracted automatically using image processing methods. The results are feature vectors representing the contents. Examples of these vectors include: 3 • color Color can be represented by three-dimensional average color histograms [Niblack et al 1993, Sawhney and Hafner 1993, Faloutsos et al 1994, Ashley et al 1995, Flickner et al 1995], low-dimensional dominant color histograms (sets of the most frequently occurring colors) [Ashley et al 1995], or three-dimensional moment-based color dis-tributions composed of the first 3 moments (average^ variance, and skewness of each color) [Strieker and Orengo 1995, Dimai and Strieker 1996]. Since the discriminat-ing power with these low dimensional feature vectors may not be adequate in some applications, color is also represented by rc-dimensional (where n 3, say 64) color histograms in other applications [Ioka 1989, Niblack et al 1993, Sawhney and Hafner 1993, Faloutsos et al 1994, Sawhney and Hafner 1994, Ashley et al 1995, Flickner et al 1995, Tarn 1996]. • texture Retrieval is often based on modified versions of coarseness, contrast, and direction-ality [Niblack et al 1993, Ashley et al 1995]. Coarseness measures the scale of the texture — the average size of the regions of similar intensity in an image; contrast describes the vividness of the pattern — the amount of variation between the light and dark areas in an image; and, directionality describes whether the image has a favored direction or is isotropic. • shape Shape can be represented in several different ways. A common representation is 20-element shape vectors composed of a combination of heuristic shape features (for example, area, circularity, eccentricity, and major axis orientation) and sets of algebraic moment invariants [Niblack et al 1993, Ashley et al 1995]. 4 Regarding the "semantic" contents, they are context dependent and need to be extracted with prior knowledge of the images and of real world properties. Automatic extraction of the contents is not easy, and usually requires some computationally expensive scene analysis methods. 1.2.1 Color and Spatial Similarity Matching Demands for Color and Spatial Similarity In the vision and image processing community, color has been widely used for image segmentation and classification tasks. Studies have been done suggesting that color plays ah important role in human understanding of an image [Hurvich 1981]. This explains why color is one of the most popular among all image contents, and is supported by many IDBMSs like QBIC [Flickner et al 1995], Virage [Bach et al 1996, Gupta 1997], and Miyabi [Hirata et al 1993]. A simple way to represent color is to use a single global three-dimensional average color histogram or a single global n-dimensional color histogram. Similarity matching is then performed by comparing the global feature vector of the query with the global feature vector of each image in the database. Matching appears to be computationally efficient. Color histograms contain information about the statistical distribution of various colors within the image; however, they lack information about the spatial locations of different colors. As a result, C B R without spatial information of color may not be effective due to the "random effect". For example, the color histogram for an image with upper portion black and lower portion white is the same as the color histogram for black/white checkerboard (Figure 1.1). Hence, C B R with both color and spatial information provides better selectivity. 5 50% black • 50% black 50% white 50% white (a) Black/White Block , .(b) Black/White Checkerboard Figure 1.1: Example of "Random Effect" Processing of Color and Spatial Similarity Regarding the processing of spatial similarity matching, a method for handling spatial matching of objects exists [Sistla et al 1994, Sistla et al 1995]. In their system, the au-thors use meta-data to describe the spatial relationships of objects. The meta-data are generated a priori and stored. During the query processing step, new rules about the spa-tial relationships of objects can be deduced from the meta-data. Wi th deductive reasoning about spatial relationships of objects, desired images can then be retrieved. This method handles spatial matching of objects quite well, but it may not be practical in handling spatial matching of color because meta-data on color cannot be generated in the same way as the meta-data on objects. For example, given a checkerboard, it is not easy to describe the spatial relationships of the colors "black" and "white". A natural method to handle similarity matching with color and spatial information is direct spatial comparisons of the pixel colors of the query and the images. However, it can be more computationally expensive than comparison of global feature vectors because the number of pixels in an image is usually much greater than the number of dimensions in the feature vector. Moreover, it is usually assumed that the query and images are properly aligned in the direct spatial comparison; this assumption may not hold in many cases as the user may remember only an approximate location of color when specifying the query. With these problems, direct spatial comparison may not be helpful. A n alternative method to capture information .about the spatial distribution of color is to divide an image into several blocks and create a color histogram for each of the blocks. For example, the system presented by Gong et al divides the image into 9 (= 3 X 3) blocks [Gong et al 1994]. In QBIC, the image is divided into a grid of either (a) 6 vertical X 8 horizontal blocks or (b) 9 vertical X 12 horizontal blocks for partition-based search [Ashley et al 1995]. In these systems, each image block is of a certain size and of a certain scale. So, similarity, matching is applied on these uni-scale image blocks to handle color and spatial similarity. To improve the efficiency and the effectiveness of C B R , it seems appropriate to explore and analyze the possible use of multiscale matching. 1.2.2 Multiscale Matching Demands for Multiscale Matching For similarity matching with fixed grid segmentation, color histograms for all blocks in the grid need to be examined during the query processing step. The efficiency and the effectiveness of C B R systems depend on the quality of the segmentation and the relevance of the segmentation to the user's needs. In some situations, the scale at which the images are blocked may be considered too fine. Applying similarity comparisons to all these fine blocks of the query and the images may not be worth the effort. For example, comparing every block of a red-only query with the corresponding block of the non-red images seems inefficient. In other situations, the scale at which the images are divided may be considered not fine enough for discriminating the desired images from the collection. Hence, picking the best scale at which the images are blocked is not easy. To deal with the problem, multiscale matching can be applied. Poor matches can be identified and then eliminated by comparisons on feature vectors at the coarse granularity level. For example, non-red images can be filtered out by comparing global feature vectors of the red-only query and the database images. On the other hand, if the current scale does not provide satisfactory discriminating power, matching can be performed at a finer granularity level when using multiscale matching. Processing of Multiscale Matching To aim for efficient and effective C B R , a search algorithm that makes use of multiresolution wavelet decompositions of query and images has been proposed [Jacobs et al 1995]. Coeffi-cients of the decompositions are distilled through processes of truncation and quantization. During the query processing step, the algorithm simply compares the number of distilled coefficients that are common in both the query and the images. With the algorithm, desired images can be. retrieved efficiently and effectively by submitting a whole-image query. To do so, users are required to know the color distribution of the entire image. In many cases, users may not know or care about the color distribution on some portion of the image, and they may want to retrieve the desired images by posing a subimage query. Unfortunately, subimage queries cannot be handled by the algorithm. Recently, a formal framework was presented for designing search algorithms which can identify target images by spatial distribution of color [Chen et al 1997]. The frame-work is based on a multiscale representation of both the image data and the associated parameter space that must be searched. To process the whole-image query, a branch-and-bound algorithm is used. The algorithm eliminates poor matches at coarse scales with minimum computation. This insures that each image is only searched to a scale necessary to determine if it is a potential match candidate. Using the branch-and-bound algorithm, all the images in the database are checked at a coarse scale in the first iteration. Then, the algorithm proceeds to finer scales in 8 non-descending order of distance value 1 . In general, the branch-and-bound algorithm works in such a way that it always keeps track of distance values of all images contending for further consideration. Images with smallest values are "extended" one level (to the next/finer scale). Then, these most recently "extended" images are considered, along with the remaining old ones; again, images with smallest values are "extended". The process repeats until, the target images are found. Notice that at each iteration, the next set of images to be "extended" contains images with smallest values, and the set, does not necessarily include images in the current set or images at the current level. Hence, jumping back and forth among images and levels may occur frequently. For large image databases, such jumping makes it hard to optimize file organization and buffer management, and may impose a high I /O cost. To avoid jumping, developments of some other search strategies seem appropriate. Like similarity matching in the uni-scale representation, color histograms at each scale are usually extracted and stored prior to execution time. The efficiency and the effectiveness of the C B R systems are expected to be affected by the number of levels used; as such, we need to choose the number of levels carefully through analyses and experiments. However, detailed analytical.and experimental results for the determination of a suitable number of levels are seldom provided in depth. Furthermore, it is observed that these multiscale systems cannot handle subimage queries. As subimage queries are useful in many applications, investigations on how to make use of multiscale representation in developing algorithms for dealing with subimage queries are worth pursuing. 'A distance value measures the dissimilarity/distance between the query and the image. 9 1.2.3 S u b i m a g e Q u e r y M a t c h i n g Demands for Subimage Queries Many current IDBMSs support only whole-image matching, but not subimage matching. However, it is well known that human memory is weak in retaining a fine granularity of spatial information of color. In many cases, the user does not remember every detail of the image he has seen before, and he remembers only a portion of the image. And in other cases, the user is interested in local image contents. For example, a user may want to find images of national flags with the Union Jack in the upper left corner. As another example, a user may be interested in finding all images of sunrises where the sun is rising in the upper right portion of the image. Then, whole-image matching may not be helpful because the user does not know or care about the color distribution on the remaining portion of the image. For instance, each of the national flags in Figure 1.2 contains the Union Jack in the upper left corner, but the remaining portion is quite different. With whole-image matching, the user needs to come up with the color distribution on the remaining portion of the image. Hence, subimage query matching is needed. upper-left: Union Jack upper-left: Union Jack upper-left: Union Jack remaining: mainly blue remaining: mainly red remaining: mainly light blue (a) Flag of Australia (b) Flag of Bermuda (c) Flag of Fi j i Figure 1.2: National Flags 10 Processing of Subimage Queries As mentioned in Subsection 1.2.1, in order to capture information about the spatial distri-bution of color, some IDBMSs divide an image into several blocks and create a histogram for each of the blocks. It is observed that a subimage query may occupy some but not all the blocks. So, to handle subimage matching, one can compare only the histograms of the "occupied" blocks of the query and the corresponding blocks of the images. For the systems in which the user can assign weights to different blocks,, one can assign a weight of 0 to the "vacant" blocks (blocks that are not occupied by the subimage query). However, a problem with these techniques is that it is not unusual to have a query whose size is not an integral multiple of the chosen block size (for example, a subimage query — whose size is of the image — can be submitted to a collection of images with 2 x 2 segmentation, in which, each segment covers \ of the entire image). To deal with subimage queries of arbitrary size, template matching, algorithms can be applied. The idea is that the user query consisting of s x s pixels is served as a "template" and is compared with every image subregion of the same size. Notice that if an image contains 5 x 5 pixels (where S > s), there are (5 — s + l ) 2 image subregions having the same size as the "template". The smaller the query size, the greater the number of image subregions to be compared. This may lead to a huge amount of computation, and thus algorithms based on template matching are usually inefficient. In general, each image contains 5 2 subregions of size l x l pixel, (5 — l ) 2 subregions of size 2 x 2 pixels, and so on. These add up to a total of s('g" t"1M2 S+1) subregions with size ranging from l x l pixel to 5 X 5 pixels. Thus, pre-extraction becomes impractical. For these template matching algorithms, image contents are usually not precomputed or stored, and execution times are expected to be very high. Therefore, other algorithms for performing subimage matching efficiently and effectively are needed. 11 1 1.3 Problem Definition and Contributions In the previous section, the demands for similarity matching on color and spatial informa-tion have been mentioned. To process the query, some IDBMSs use fixed grid segmentation approaches. Some multiscale matching approaches have been proposed to further improve the efficiency and the effectiveness of C B R . However, detailed analytical and experimental results on the determination of the suitable number of levels for the approaches are seldom reported, and comparisons of different search strategies using multiscale representation are also rare. Moreover, subimage queries, which are in demand, cannot be handled by many IDBMSs. Techniques for handling these subimage queries of arbitrary size are needed. Therefore, in this thesis, the following questions are investigated: 1. How can we deal with subimage queries of arbitrary size ? Two efficient and effective algorithms for handling subimage queries are de-veloped, namely Padding and Reduction Algorithms. The idea is that Padding and Reduction Algorithms estimate the best possible color histograms for a subimage query Q and an image block I of size larger than Q by either: (a) enlarging the query Q into a new query Q E n l a r s e d that is of the same size as the image block I , or (b) reducing the image block I to a new block I E d u c e d that is of the same size as the query Q. More precisely. Padding and Reduction Algorithms give lower bounds to the Eu-clidean distance between the histograms of subimage query Q and image block I : Padding: min (q H ^ r a m _ 1 Histogram) 7 (Q H i s ^ r a m ~ 1 Histogram) Reduction: min (Q Histogram - I g f s t g r a m ) 7 , (Q Histogram " I Histogram) 12 For given Q and I, both algorithms provide the same best-case lower bound distance estimation. However, their efficiency may differ significantly, depending on the size differential between Q and I . If the size differential is large, the Reduction Algorithm is more efficient; otherwise, the Padding Algorithm is the winner. Depending on the location of I , similarity matching can incorporate color similarity as well as spatial similarity. For instance, the distance between Q and I is defined as a weighted sum of the form: {3 X histogram distance + (1 — (3) X positional distance where the histogram distance is computed by either the Padding Algorithm or the Reduction Algorithm. The weighting factor j3 is chosen by the user to specify a preference on the relative importance of color distribution and its spatial information: 2. How many levels do we need for multiscale representation ? With multiscale representation, an image is divided into several blocks, and an associated color histogram is created and stored for each of the blocks, Each block can be further divided into subblocks. In this thesis, images are divided into four levels of (sub)blocks: (a) A t Level H , the entire image is represented by a single color histogram. (b) A t Level I, the image is divided into four (non-overlapping) blocks 2 , and each block is represented by a color histogram covering | of the entire image. (c) At Level J , each block at Level I is further divided into four subblocks, and each subblock is represented by a color histogram covering ^ of the entire image. (d) At Level K , each, subblock at Level J is again divided into four, each of which . is represented by a color histogram covering ^ of the entire image. 2Alternatively, an image in the database can be divided into five or nine overlapping blocks. 13 Given user queries of arbitrary size, we investigate the number of levels that are required for efficient and effective retrieval of the desired images. With four levels of (sub)blocks for each image, many multi-level filtering schemes can be developed to find the desired matches. For example, the choices include a four-level H I J K scheme, a one-level J scheme, a two-level HJ scheme that skips Level I, and some other schemes. Analytically, we set up cost models to evaluate all 15 possible filtering schemes, and estimate their C P U , I /O, and combined C P U & I / O costs. The results tend to favor: • the filtering schemes which start with Level H (or Level I), and • the filtering schemes which do not skip the intermediate level. Experimentally, given subimage queries of particular size, both the efficiency (for example, the time required to find the desired images) and the effectiveness (for example, the number of desired images being retrieved) are measured. The results show that the three-level HIJ filtering scheme works well for most queries when operating together with the Padding and Reduction Algorithms mentioned earlier. Therefore, only three levels (up to 4 x 4 segmentation) are needed for the multiscale representation. What is the best strategy for searching multiple scales ? The branch-and-bound algorithm proposed by Chen et al [Chen et al 1997] is accurate in its search, but it can be inefficient for large databases, In particular, the number of images the algorithm must keep track of can be large, and jumping back and forth among images and levels may be required frequently. For large databases, such jumping makes it hard to optimize file organization and buffer management, and may generate a large number of I/Os. 14 In this thesis, we consider three strategies that try to avoid the kind of jump-ing mentioned above: (a) Search P V (Pure Vertical) — a strategy in which a vertical filter searches each image "vertically" across scales. The idea is that images are checked one after another. A t any point in time, a set of u images currently having smallest distance values are kept (where u is the number of images requested by the user). For each image, .. the algorithm keeps proceeding to finer scales until (i) the distance value at a particular scale is already so large that the image cannot be qualified as a good match, or (ii) the finest scale is reached and the image is either discarded or selected as a member of the answer set, depending on the distance value. (b) Search P H (Pure Horizontal) — a strategy in which horizontal filters search "horizontally" across the database level by level. The idea is that all the images in the database are checked by the .filter at the coarse scale in the first iteration. Poor matches at this scale are eliminated. In subsequent iterations, images that are left from the previous level/scale are checked by filters at finer scales, and poor matches are again removed. The process repeats until the finest scale is reached and the top u images are returned. Search P H is more efficient than Search P V , because the latter tends to require many comparisons at the finest scale, particularly at the beginning of the search. However, in Search P H , it is hard to determine the number of images to be carried over from the current level to the next level. For some queries, if such a number is small,-' an image I that gives a good match at a finer scale may have been eliminated before reaching the finer scale. This may happen when there are sufficiently many images which are not as good as I at 15 the finer scale but are better than I at the coarser scale. Consequently, while delivering efficiency, Search P H may suffer from a loss of effectiveness. . (c) Search H V (Horizontal-and-Vertical) — a hybrid of the above two search strate-gies in which we use a horizontal filter on the first level and vertical filters on the remaining levels. The idea is that all the images in the database are checked by a horizon-tal fijter at the coarse scale in the first iteration. Poor matches at the coarse scale are eliminated. For each of the images that are left, the algorithm keeps proceeding to finer scales until (i) the distance value at a particular scale is al-ready so large that the image cannot qualify as a good match, or (ii) the finest scale, is reached and the image is either discarded or selected as a member of the answer set, depending on the distance value. In this strategy, the detailed search with the use of vertical filters is applied not to the set of all the images, but only to its most promising subset.. The experimental results confirm that Search H V is the best strategy for re-trieval of desired images when compared with the other two strategies because Search H V keeps a good balance of performance/speed and accuracy. • • • / 1.4 Outline of Thesis In the next chapter, related works are described. We give an overview of some of the research projects in C B R , and mention the research projects for handling color and spatial similarity, multiscale matching, and subimage queries. Chapter 3 focuses on Padding and Reduction Algorithms. Concepts, implementations, and experimental results are discussed. Chapter 4 describes the multiscale representation and a strategy for searching such a representation (Search P V ) . Analytical and experimental results for each filtering 16 scheme are studied so that the second question mentioned in Section 1.3 can be answered. In Chapter 5, two more search strategies, namely Search P H and Search H V , are proposed. We compare both analytical and experimental results of the three search strategies. As a result, an answer to the third question mentioned in Section 1.3 is provided. Finally, conclusions and suggestions for future work are presented in Chapter 6. 17 Chapter 2 Related W o r k s Efficient and effective content-based retrieval systems usually require that several impor-tant objectives are satisfied, namely: • efficient indexing methods, • expressive query language, and • efficient and effective query processing and optimization. To achieve the goals, several research' projects have been carried out. 18 2.1 Indexing To handle a user query, image contents of the query are compared to the correspond-ing image contents of the images in the database to determine which images are good matches. For efficiency, image contents are usually captured by feature vectors which are precomputed and stored. For a small database, sequential scanning of these feature vectors is fast. However, as the database grows, the linear scale-up of the sequential scanning becomes prohibitively slow. One way to speed up the process is to treat each feature vector as a point in n-dimensional space, and employ multi-dimensional indexing structures. Many multi-dimensional indexing structures have been designed. Examples are KD-trees [Samet 1990], R-trees [Guttman 1984], R+-trees [Sellis et al 1987], R*-trees [Beckmann et al 1990], SS-trees [White and Jain 1996], SS +-trees [Kurniawati et al 1997], TV-trees [Lin et al 1994], VP-trees [Chiueh 1994], and X-trees [Berchtold et al 1996]. The above indexing structures help in improving the efficiency for low dimensional feature vectors. However, for high dimensions, many of these multi-dimensional index-ing structures explode exponentially with the dimensionality, and eventually reducing to sequential scanning. Given that image feature vectors are usually of high dimension-ality (for example, it is not unusual to have 64-dimensional vectors for color features), ways to deal with this problem are needed. One way to deal with this "dimensional-ity curse" problem is by using filters. A 2-level filtering approach has been proposed [Sawhney and Hafner 1993, Faloutsos et al 1994, Sawhney and Hafner 1994]. The idea is to abstract a low dimensional vector from each original high dimensional feature vector by an information preserving transformation. Based on distance values of the abstracted vec-tors, images which are far from the query are eliminated. As a result, only a small number of candidates are left. The original high dimensional vectors of these candidates are then passed to the detailed matching operation to obtain the best matches. Unfortunately, the dimensions of abstracted vectors cannot be too high; otherwise, vectors cannot take full 19 . advantage of multi-dimensional indexing structures. On the other hand, the dimensions of abstracted vectors cannot be too low; otherwise, the detailed matching at the finest level needs to be operated on a large number of high dimensional vectors. To improve the efficiency, a 3-level filtering approach has been proposed [Tam 1996, Ng and Tam 199.7]. The idea is to. add an additional intermediate level so that both the coarsest and the finest filterings can be more efficient. As the efficiency of spatial indexing structures usually deteriorates with the in-crease in dimensionality, methods to compress or reduce the dimensionality of the image vector space without losing much information are necessary. One method is Karhunen-Loeve (KL) Transformation, or Principal Component Analysis [Sedighian 1995, Ng and Sedighian 1996]. The idea is to transform the data space by removing dependent di-mensions and converging most of the information into the first few dimensions. Another method is to apply singular value decomposition and clustering techniques recursively to feature vectors until the dimensions cannot be further reduced [Thomasian et al 1997]. 2.2 Query Language With the ever-growing use of the Internet, there are more and more image database servers. To maximize the efficiency and the throughput of an image database server, it is benefi-cial to have an expressive query language so that the user can be as precise as possible in specifying his query. Such precision in query specification may lead to reductions in the number of query reformulations over the network. In the past few years, many query language systems have been developed. Examples are QBIC [Sawhney and Hafner 1993, Faloutsos et al 1994, Sawhney and Hafner 1994, Flickner et al 1995, Finn 1996], Virage [Bach etal 1996, Gupta 1997], Miyabi [Hirata et al 1993], Photobook [Pentland et al 1994], and F M R [Sakamoto et al 1994]. With these systems, querying from image databases for applications with a dense image space (for example, medical image management 20 or remote-sensing where the images are very similar) as well as a sparse image space (for. example, art gallery management where the neighboring images are different from each other) is possible. Recently, another query language, EXQUISI , has been proposed [Faulus 1996, Faulus and Ng 1996, Faujus and Ng 1997]. To incorporate imprecision and ambiguities in user queries, the user is allowed to specify a range of values for image content, and all values within the range are then treated as identical during the query processing step. Moreover, an additional reformulation function is provided so that the user can specify the parts of the returned image he wants to include or exclude. 2.3 Query Processing based on Color and Spatial Similarity . In the vision and image processing community, color has been widely used for image seg-mentation and classification tasks. Studies have been done suggesting that color plays an important ..role in human understanding of an image [Hurvich 1981]. This explains why, among all image contents, color is one of the most popular, and is supported by IDBMSs like QBIC [Flickner et.al 1995], Virage [Bach et al 1996, Gupta 1997], and Miyabi [Hirata et al 1993]. A popular way to represent color is to use color histograms. A color histogram holds information on color distribution, but it lacks spatial information on color. The problem may be overcome by dividing an image into several blocks and creating a histogram for each of the blocks. Locality information is captured as each of the histograms holds color distribution for a particular block in the image. The more blocks in the image, the more accurate is the locality information; however, more memory would be consumed in holding the histograms, and more computation time would be required for comparing histograms. It is a tradeoff between efficiency (time and space efficiency) and effectiveness. In Q B I C , the user is allowed to submit, queries on~ large image databases based on example images (query by example) as well as user-constructed sketches.and draw-21 ings (query by painting) [Sawhney and Hafner 1993, Faloutsos et al 1994, Sawhney and Hafner 1994, Ashley et al 1995, Flickner et al 1995, Finn 1996]. For query by example, similarity matching between the histogram of a query and the histogram of an image is based on the weighted Euclidean distance of the normalized histograms: (Query Histogram - Image Histogram) 7 A (Query Histogram - Image Histogram) where A is a similarity matrix with entries a tj describing similarity between color i and color j. The similarity matrix accounts for both the perceptual distance between the pairs of colors and the difference in the amounts of each color. Two types of histograms are used: 1. average Munsell color histograms for handling average color queries. This kind of histogram is useful for images that have a dominant color or a small range of hues. The 3-dimensional average color histogram for each image is formed by adding up the red, green and blue components of each pixel. 2. n-dimensional color histograms for handling histogram color queries. This kind of histogram is useful for searching images with a desired color distribution. To create a histogram, color space is usually divided into 64 ranges, and the percentages of pixels in each color range are counted. As a result, a 64-dimensional color histogram is formed. Regarding query by painting, QBIC retrieves images with similar colors in similar spa-tial arrangement. For a partition-based method [Ashley et al 1995], each image in the database is divided into a grid of either (a) 6 vertical X 8 horizontal blocks or (b) 9 ver-tical X 12 horizontal blocks. For each block, (a) an average Munsejl color histogram and (b) a partial color histogram consisting of the most frequently occurring colors and their frequencies are computed and stored. A t runtime, image contents of the query are ex-22 tracted in a similar manner. Similarity matching is based on the average of the distance values of all the blocks. Some other methods for representing color and spatial information and for com-puting similarity measures have been suggested. For example, in the system presented by Gong et al [Gong et al 1994], the image is divided into 3 x 3 subareas. In addition to a histogram for the entire image, a histogram is created for each of the 9 subareas. With the observation that colors within an image do not spread widely in the color space, the authors use the top 20 bins (in terms of pixel counts) of the color histogram. Two parame-ters, the Weighted Perimeter and the Weighted Angle, are extracted from the top 20 bins; they form hyper-polygons. Similarity matching is computed by comparing the values of the two parameters. In another system [Strieker and Orengo 1995, Dimai and Strieker 1996, Strieker and Dimai 1996] - this one based on the observation that important objects are usu-ally placed in the center of images in many applications — the image is divided into five fuzzy regions (Figure 2.1): center, top-left, top-right, bottom-left, and bottom-right re-gions. The authors use moment-based color distributions — (1) average, (2) variance, and (3) skewness of each (L*a*b*) color channel — which they claim to be more robust and more efficient than working with color histograms. Figure 2.1: Fuzzy Regions In Miyabi [Hirata et al 1993], images are divided into 8 regions using values on color and other image contents (such as texture). With an image regioning and merging technique, color information is encoded into a picture index and can be used in matching. For the VisualSEEk system [Smith and Chang 1996], Smith and Chang notice the. difficulty of picking the best scale at which the images should be blocked in fixed block segmentation. They use the color histogram back-projection method of Swain and Ballard [Swain and Ballard 1990] to segment the images instead. The encoding of color informa-tion is done by using a binary color set in which only the colors that are sufficiently present in the region are selected. With this color set back-projection method, images can be retrieved effectively. One common point in these systems is that they have not explored the use of multiscale matching for further improvements on the efficiency and the effectiveness of C B R based on color and spatial similarity. 2.4 Query Processing with Multiscale Matching The difficulty of picking the best scale at which the images should be divided in an I D B M S can be addressed by the use of multiscale representation. With the representation, comparing feature vectors at a coarse granularity level enables the identifications of poor matches from the collection of images.. On the. other hand, if satisfactory discriminating power cannot be provided by'the current scale, matching can be done at a finer granularity level which may lead to the retrieval of the desired images. In QBIC, query by painting is not only handled by a partition-based method,, but also by a region-based method [Ashley et al 1995]. Instead of relying on a fixed grid placed on the image, an approximate segmentation is used on the images and the query. The iterative metric space clustering algorithm 1 is used, which starts with each color in the 'The algorithm is based on the concept of mutual nearest neighborhood. 24 image defining a cluster. A t each iteration, a pair of clusters is collapsed into a single cluster if their mutual ranks fall below the preset threshold. Color distance and spatial distance are combined during the clustering stage. For each color, a bounding rectangle is formed for each group of connected pixels having that color. Then, the bounding rectangles for a given color are successively clustered until one rectangle remains. As a result, multi-level color trees are formed. To process a user query, the image contents of the query are matched against the image contents of images in the database. In order to do so, distance values are computed for every color region of the query based on (1) the distance between the colors and (2) the distance between the trees. The distance between the trees is computed by comparing the query root with the image root as the first step. Each child of the query node is then.iteratively compared against the closest child of the matching image node. Another image querying technique has been proposed [Jacobs et al 1995]. In this system, the user query and the database images are decomposed using standard two-dimensional Haar wavelet decomposition— which involves a one-dimensional decompo-sition on each row of the image, followed by a one-dimensional decomposition on each column of the result. With the observation that the resulting wavelet coefficients of decom-position can be truncated without losing much discriminating power, only the significant coefficients for each color channel (such as Y , I, and Q). are kept. In order to speed the search and reduce the storage, each truncated coefficient is then quantized to two levels representing the presence or the absence of the features.. As a result, for the query and the images, the overall average color as well as the indices and signs of the significant co-efficients in each color channel are stored. Having the expectation that a vast majority of database images may not match the query well at all, the similarity score is computed by counting the number of matching coefficients in query and image. Like the region-based method in Q B I C , this image querying technique cannot handle subimage queries. 25 Recently, Chen et al presented a formal framework for designing search algorithms which can identify the desired images by spatial distribution of color [Chen et al 1997]. The framework is based on a multiscale representation of both the image data and the asso-ciated parameter space that must be searched. To process the query, a branch-and-bound algorithm and a multiscale distance function are used. With the multiscale representation, both the query and the images are decomposed into a pyramid of feature vectors. Each level of the multiscale tree represents a scale, and each node at a particular scale contains a feature vector for the corresponding region of the images. A t the coarsest scale, the entire image is represented as a singlenode. For nodes farther away from the root, a more spatially localized region of the image is represented. The distance value at a particular scale is computed by adding the distances between the corresponding feature vectors of the nodes of the query and the image trees at that scale. Definition 2.1 To conduct the branch-and-bound search for retrieving the top u images from the collection: 1. Check all the images in the database at the coarse scale in the first iteration, and compute the distance value for each image: 2. Sort all images in non-descending order of distance value. 3. Repeat until all top u images have reached the finest scale: (a) For each of the u images with smallest distance values, i . if the image has not reached the finest scale, extend one level/scale; i i . update the distance value of the extended image with the sum of the dis-tances between the corresponding feature vectors of the query and the image at the extended scale. (b) Sort all images in non-descending order of distance value. • 26 Notice that at each iteration of the branch-and-bound search, the feature vectors used in the computation of distance values may be at adifferent level/scale and may be for images different from those in the previous iteration. Unless the feature vectors of all levels and for all images can be stored in memory, jumping back and forth in the data file so as to get the necessary feature vectors for computation seems unavoidable. The frequent jumping among different images and scales make it hard to optimize file organization and buffer management, and may generate a large number of I/Os. Example 2.1 In many database applications, it is hot unusual to retrieve the desired images from a collection of thousands of images. For simplicity, in this example, we try to find the top two images from a collection of five images using the branch-ahd-bound search. Let Scale 3 be the coarsest scale, and let Scale 0 be the finest scale. For the scales in between, the index decreases as the scale becomes finer.- In the first iteration of the branch-and-bound search, all five images are checked at the coarse scale (Scale 3), and a distance value is computed for each image. These images are then sorted in non-descending order of distance value, and the images with the two smallest values are the potential top two images. Image 1 Image 2 Image 3 Image 4 Image 5 Iteration 1 Scale 3 Scale 3 Scale 3 Scale 3 Scale 3 val: 25 val: 5. val: 65 val: 80 val: 10 =$> potential top two images are Images 2 and 5 27 In subsequent iterations, the potential top two images are extended one level/scale and their distance values are updated; the images are then sorted and a new set of potential top two images is obtained. Image 1 Image 2 Image 3 Image 4 Image 5 Iteration 2 Scale 2 Scale 2 val: 30 val: 15 =>• potential top two images are Images 5 and 1 Iteration 3 Scale 2 Scale 1 val: 35 val: 20 =>• potential top two images are Images 5 and 2 Iteration 4 Scale 1 Scale 0 val: 32 val: 21 =>- potential top two images are Images 5 and 2 Iteration 5 Scale 0 val: 33 top two images are Images 5 and 2 To aim for efficient retrieval, feature vectors at each level of the multiscale systems are extracted and stored prior to runtime. Since the system performance is expected to be affected by the number of levels, we need to choose the number of levels carefully through analyses and experiments. However, detailed analytical and experimental results for the determination of a suitable number of levels are seldom provided in depth. 28 2.5 Subimage Query Processing Many of the current IDBMSs support only whole-image matching, but not subimage matching. However, in many situations, users are often interested in local image contents. Due to the poor human memory capability for retaining a fine granularity of spatial information of color, users, in other situations, cannot recall all details of images they have seen before. With whole-image matching, users need to come up with the'color distribution on the remaining portion of the images which they may not know or care about. Hence, subimage matching is needed. To deal with subimage queries of arbitrary size, many template-based algorithms have been proposed. These include the use of traditional template matching techniques based on Euclidean distance for searching for the occurrence of a textured pattern inside each image in the database [Stone and L i 1995], and the use of classified template match-ing techniques in which templates are classified according to texture and edge features [Tao and Dickinson 1996]. One problem with template-based algorithms is that they re-quire a huge amount of computation due to the large number of positions to be compared. Another problem is that image contents are usually not precomputed or stored; as a re-sult, query processing becomes very slow. Hence, other algorithms for handling subimage queries efficiently and effectively are needed. The RITAS system [Tao et al 1997] is a segmentation-based approach for retrieving textured images containing a pattern similar to the template. Here, each image in the database is divided using a quadtree segmentation technique. A quadtree is built by repeated low-pass filtering and down-sampling. A t the coarsest level, feature clustering is performed. Then a boundary refinement procedure is designed to improve the boundary each time when moving down from a higher level. After the segmentation process, for each of the segments, mean and standard deviation of texture energy measures are computed and stored in n-dimensional feature vectors. A n n-dimensional spatial relationship vector 29 is also created for each segment. During the query processing step, the query is segmented in a similar manner. The distance between the query and the image is computed by adding two measures, namely Relational Distance (measuring the difference in spatial locations) and Sum of Minimum Distance. For each query segment, a weighted Euclidean distance between the query segment and its most closely matched image segment is computed. The Sum of Minimum Distance can then be calculated by summing all weighted Euclidean distances. Notice that the system is built specifically for handling textured images. The efficiency and the effectiveness of this approach in handling color and spatial similarity are unknown. White and Jain presented a framework for developing engines that support subim-age matching [White and Jain 1997]. The image representation used in their ImageGREP Engine is a set of binary images stored as bitmaps. The set of binary images is computed by running a bank of 166 color classifiers on the input image. Each color classifier maps an input image into a single binary image by determining whether an input image region corresponding to one bit in the output bitmap contains more than the threshold number of pixels quantized to that color. With this representation, each region is denoted by one of the Present or Absent states, and images can be searched using bitwise operations dur-ing the runtime. To take account of subimage translation, all allowable translations are enumerated and stored. In general, the query processing time increases with the number of allowable translations. As noticed, not-many IDBMSs can handle arbitrary-size subimage queries based on color and spatial similarity. For the systems that can deal with subimage queries of arbitrary size, multiscale matching is rarely used. 30 2.6 Summary Several research projects have been carried out to aim for efficient and effective content-based retrieval systems. To achieve the goals, one of the important objectives is to pro-vide efficient indexing methods; Many multi-dimensional indexing structures have been designed to help improving efficiency for feature vectors that capture image contents. How-ever, for high dimensional vectors, the "dimensionality curse" problem arises. To tackle the problem, multi-level filtering approaches (which perform preliminary matching on the low dimensional abstracted vectors and detailed matching on the original high dimen-sional vectors of potential candidates) have been proposed, and information preserving transformations (which compress or reduce the dimensionality of the vector space) can be applied. The second objective is to develop expressive query languages. With them, efficient querying from image databases for applications with dense and sparse image spaces can be accommodated. The support of efficient and effective query processing and optimization, marks the third important objective. Wi th the popularity of similarity matching on color and spatial information, many IDBMSs store the information in. local color histograms. To process user queries, some IDBMSs use fixed grid segmentation approaches. For further improvement on the efficiency and the effectiveness of C B R , multiscale systems have been proposed. However, detailed analytical and experimental results on the determination of the suitable number of levels for these systems are seldom reported, and comparisons of dif-ferent strategies for searching multiple scales are also rare. Moreover, in many situations, users are interested in, or can remember, only local image contents; therefore, subimage query processing is needed. Unfortunately, not many IDBMSs can handle arbitrary-size subimage queries based on color and spatial similarity. For the systems that can deal with subimage queries of arbitrary size, multiscale matching is rarely used. 31 Chapter 3 P a d d i n g and R e d u c t i o n A l g o r i t h m s Many IDBMSs support whole-image queries, which specify the contents of the whole im-ages to be retrieved. However, users may only remember or care about certain parts of the images. To answer queries of this kind, some systems segment an image into several blocks, each of which has an associated color histogram. One problem with this arrange-ment is that subimage queries may be of arbitrary size, and not necessarily an integral multiple of the chosen block size. To handle the complication of arbitrary size,' some sys-tems use template-based matching algorithms. A key problem with those algorithms is that a huge amount of computation is needed, because of the large number of positions to be compared. As such, we propose two algorithms, called Padding and Reduction, for dealing with subimage queries of arbitrary size. 32 3.1 Lower B o u n d to Histogram Distance In many IDBMSs, color is extracted automatically and, stored in n-dimensionai color histograms. Once the color histograms are created for the images in the database, there are a variety of ways to compute the similarity between the feature vectors of the whole-image query and the image. One popular way is based on the Euclidean distance between the color histograms: (Query Histogram ~ Image Histogram) 7 (Query Histogram " Image Histogram) This measure can also be used in computing similarity between the feature vectors of the subimage query and the image subregion even if the query is not of the same size as the image subregion. However, comparing these two vectors may seem unfair as one of the vectors may contain more pixels than another. Due to the quadratic nature of the above Euclidean measure, the excessive pixels in one vector may dramatically influence the difference in the amount of a given color, and hence the resulting distance. For example, given that a query and three image subregions are represented by 3-dimensional vectors Q={ 2, 4, 52 ) T , /,=( 16, 18, 66 ) T , 72=( 44, 4, 52 ) T , and 73=( 27, 29, 44 ) T respectively, and that the color distribution of the query is the same as a portion of each of the first two (but not the third) image subregions, then the third subregion is a "poorer" match than the first two. However, using the above measure, the distance between Q and I3 isT314, which lies in between 588 (the distance between Q and I\) and 1764 (the distance between Q and I2). Thus, with this measure of histogram distance, it is not easy to provide a satisfactory discriminating power. Alternatively, we can use normalized histograms, in which the percentage (instead of the pixel counts as in the standard histograms) of each color is stored; however, if the user is confident in the query, size when specifying his query, the use of normalized histograms may not be helpful. For example, given the Query and the two Images shown in 33 Figure 3.1, Image 1 is the better match. Unfortunately, using normalized histograms with the above Euclidean measure has an effect of scaling up the Query ^ As a result, Image 1 can no longer match the Query perfectly; instead, Image 2 becomes a better match when operated with normalized histograms. In other words, to maintain the accuracy, the size of a user query is restricted to the size of the image or image subregion. However, our goal is to deal with subimage queries of arbitrary size. (a) Image 1 (b) Image 2 (c) Query (d) "Scaled" Query Figure 3.1: The Use of Normalized Histograms To avoid the problem caused by the size differential between the query and the image, the histogram distance ( D H ) is computed between the color histogram Q of the subimage query and the color histogram I of the equal-sized image subregion as D H = (Q - I)T(Q - I). Given a subimage query consisting o f s x s pixels, each image of size S x S pixels (where S > s) contains S ^ 5 + 1 H 2 5 " 1 " 1 ) potential subregions with size ranging from l x l pixel to 5 X 5 pixels. Pre-extraction of color features for all these subregions becomes impractical. Despite that, feature vectors can still be precomputed and stored for some, but not all, subregions of the images in the database. As a result, given a subimage query of arbitrary size, the image subregion represented by the precomputed feature vector may not necessarily be of the same size as the query. Without loss of generality, we assume that: • the subimage query is square and consists of v pixels, and 'The scale-up occurs unless the Query is of the same size as the Images. 34 • the image subregion consists of w pixels (where v < w). Then, instead of computing the exact histogram distance (DH), we estimate its best-case lower bound (Du). The idea behind the estimation is that we "modify" either the query or the image subregion so that both the query and the image subregion contain the same number of pixels after the "modification" process. The estimated lower bound DH can then be established by computing the best-case similarity between the feature vectors of the "modified" query and the image subregion, or between the feature vectors of the query and the "modified" image subregion. Two approaches for estimating the histogram distance between the subimage query and the image subregion are proposed: 1. Padding Approach We enlarge the subimage query by padding w — v "desired" pixels to it so that the re-sulting padded query is of the same size as the image subregion. In order to minimize the distance measure, the "desired" pixels are chosen from the image subregion. Original Query Image Subregion Padded Query Image Subregion Represented r> r O' — P + O I v ' w 'w-v w by Figure 3.2: Padding Approach Definition 3.1 Let I, P, Q and Q' represent the histograms of the image subre-gion, the padded area, the original subimage query, and the resulting padded query respectively, and let the subscripts w, v and w — v indicate the number of pixels represented in the histograms. The goal of the Padding Approach is to find an ap-propriate assignment to the optimization variable P so that DH is minimized and 35 the vector inequality Pw-V < Iw is met. More precisely, we want to get the optimal > P (denoted by P*) satisfying the condition: Qw Qw min {Pw-V + Qv -Iw) (Pw-v + Qv-Iw) (3.1) — V such that P„,_„. < Iw As a result, the estimated best-case lower bound DJJ = (Pw_v +QV — Tw)T(Pw-v + Qv Iw)' ^ 2. Reduction Approach We reduce the precorriputed image subregion by choosing v "desired" pixels from it so that the resulting reduced, image subregion is of the same size as the subimage query. In other words, the w — v "not so desired" pixels are removed. o r i g ' n a i Reduced Query Image Subregion Query Image Subregion Represented O 1 Ct V by Figure 3.3: Reduction Approach Definition 3.2 Let I, I' and Q represent the histograms of the original precomputed image subregion, the resulting reduced image subregion, and the subimage query respectively, and let the subscripts w and v indicate the number of pixels represented in the histograms. The goal of the Reduction Approach is to find an appropriate assignment to the optimization variable I' so that DJJ is minimized and the vector inequality I'V < IW is met. More precisely, we want to get the optimal / ' (denoted .36 by I'*) satisfying the condition: min (Q„ - I'V)T{QV - I'v) such that J ' < Inn (3.2) As a result, the estimated best-case lower bound DJJ = (QV — I'*)T(QV — I'*) 3.2 Development of Padding and Reduction Algorithms Given that the subimage query and the image subregion are represented by n-dimensional' color histograms (?i-dimensiorial vectors Q and / ) , mathematically, the two proposed ap-proaches can be restated as: Padding Approach objective function . inequality constraint summation constraint domain constraint • Reduction Approach objective function inequality constraint summation constraint domain constraint Given ^ Qj = v and ^ Ij — w > v, j=i i=i . find an optimal vector P* to minimize (P + Q- I)T(P + Q- I) subject to 0 < Pj < Ij for 1 < j < n w — v and J^ P j . i=i and I, P, Q integer vectors Given ^ Qj — v and ^ h = w > v, j=i j=i find an optimal vector / '* to minimize (Q - I')T{Q ~ J') subject to 0 < I'- < Ij for 1 < j < n n and ^2 Ij = v and 7, T7, Q integer vectors (3.3) (3.4) 37 3.2.1 Generate-and-Test A naive method to solve for the optimal vector P* or / '* is to exhaustively find all the possible vectors in which the constraints (inequality, summation, and domain constraints) are satisfied, and then select the vector that gives the minimal value to the objective, function. More precisely, for the Padding Approach, all feasible vectors for P are system-atically generated by enumerating the value for each entry Pj in the n-dimensional vector. The generated vectors are then tested for minimality, and the one that gives the minimal Euclidean distance is returned: P A D D I N G - G E N E R A T E & T E S T 1 Du <- +00 2 for P\ <— 0 to min( / i , w — v) do 3 for P2 <— 0 to m i n ( /2 , w — v — P\) do 4 for F 3 <— 0 to min(/3, w - v — P\ - P2) do 5 : • 6 for P n _ ! <r- 0 to min(/„_ 1 , w - v - YTjZl Pj) d ° 7 . Pn «; - v - YT3Zl Pj 8 if Pn < In 9 then distance 4 - (P + Q - I)T(P + Q - I) 10 if distance < DJJ 11 then DH distance 12 P*^ ( P 1 , P 2 , F 3 , - - - , P n _ 1 , P n ) T Similarly, for the Reduction Approach, all feasible vectors for / '* are systematically generated by enumerating the value for each entry Pj in the n-dimensional vector. The generated vectors are then tested for minimality, and the one that gives the minimal 38 Euclidean distance is reported: R E D U C T I O N - G E N E R A T E & T E S T : 1 DH <- +00 2 for J{ <- 0 to min(/j,<;) do . • . 3 for <r- 0 to m i n ( / 2 , u - /{) do 4 for I'3. «- 0 to m i n ( J 3 , _ - J { do 5 • • ' • :• 6 for j - 0 to min(/„_! , y - E " = i 2 / j ) do 8 if /„< /» . , ; 9 . then distance (Q - /')'(<2 - /') 10 if distance < DH 11 then <r- distance 12 I>*<-{I'1,I'2J^-..J'n_1,I>n)T Like many generate-and-test applications, this naive method of solving for P* and V* is unpalatable in the sense that the execution time is. expected to be very long. 3.2.2 Quadratic Programming with Integer Programming With the observation that the tasks of finding the vector P* in Problem (3.3) and the vector / '* in Problem (3.4) are instances of quadratic programming (QP) problems, ex-isting mathematical software packages can be used. Examples of these software packages include M A T L A B (MATrix LABoratory) [Sigmon 1992] and L I N D O (Linear INteractive Discrete'Optimizer) [Schrage 1986]. In order to use the software packages, the minimiza-tion problems are. usually required to be converted into the forms: 39 • Padding Approach min -PT(2I)P + [2(Q - I)]TP (3.5) s.t. where I is the n X n identity matrix Reduction Approach P < min i ( / ' ) r (2 I ) ( / ' ) + [-2Q] T (/ ' ) ' - l - l - - 1 s.t. I' < (3.6) where I is the n x n identity matrix One problem with the software packages is that the computation of optimal vectors is usually done in the domain of real numbers, not the domain of integers. The domain 40 problem coupled with the roundoff error may lead to the unreliability of some output vectors. For example, given I = ( 1430, 3257, 6133 )T and Q = ( 1429, 0, 2 )T, the software package outputs ( 1430, 3257, 4702 ) T as the answering vector (with value for objective function = 4084082) for the Padding Approach, but the expected optimal vector P* is ( 1, 3257, 6131 ) T with the corresponding optimal value of 0. As another example, given I — ( 2, 7, 5 )T and Q — ( 5, 1, 1 )T, an expected optimal integer vector V* is ( 2, 3, 2 )T, but the software package returns the real vector ( 2, 2.5, 2.5 )T to represent the reduced image subregion. Hence, we want the method to handle not only QP, but also integer programming (IP). With IP, the domain constraints — I,P,Q integer vectors and I,I',Q integer vectors — can be specified. Unfortunately, the IP function is rarely supported in conjunction with Q P ; in many software packages, the IP function is not supported at all. 3.2.3 Algorithms P A D and R E D Since the above methods may sometimes be unsound or time consuming, efficient and effective methods for estimating the best-case lower bound to the histogram distance are needed. It is well known that for a vector x, xTx can be written as J2ixj)2- So, given ]Cj=i Qj = v a n d Z) j=i Ij = w ^ vi Problems (3.3) and (3.4) can be rewritten as: • n DH = mm Y,(pi-aj)- (3-7) j=i ; subject to 0 < Pj < Ij for 1 < j < n n and Pj = w — v j=i and I, P,a integer vectors where a = I — Q; and 41 Ihl = min __(/<-Q.) 2 , (3.8) subject to 0 < I'j < Jj for 1 < _ < n n and 7j = v and 7, / ' , Q integer vectors respectively. Wi th these representations, each of the two objective functions is in the form of the sum of squares of the difference terms: •; E ( * _ - « . ) 2 or Ulj-Qj)2 , : In order to minimize the sum, we need to minimize the difference terms. In which order should the terms be minimized ? Due to the quadratic nature of the squares of the difference terms, we note, on a close examination of the representations, that deducting 1 off a large difference term is more effective in minimizing the sum than deducting 1 off a small difference term: If integers a > b > 1, then (a - l ) 2 + b2 < a 2 + (6 - l ) 2 It is also clear that for two difference terms having the same value c, subtracting an integer d from each of the these difference terms is more effective in minimizing the sum than subtracting 2„.from only one of these two equal-valued terms: If integers c > 2d and d> 1; then (c - d)2 + (c - d)2 < (c - 2d)2 + c2 Hence, for the Padding Approach, we start with Pj = 0 for all j, and each difference term (Pj — aj) becomes —CXJ. These terms are then rearranged in non-ascending order of atj (in other words, non-descending order of —cxj) and result in: {(^(i) - 0 ( i ) ) > ( P ( 2 ) - « ( 2 ) ) . - " » ( ^ > ( n ) -<*(n))} wiiere a ( 1 ) > a ( 2) > • • • > <*(„) 42 After the rearrangement, we try to lower the difference in the first term (P(i) — oj(i)) by adding the value — a^) to P^ 2 so that the first term has the same value as the second difference term (P(2) — a(2))- We then try to reduce the values of these first two difference terms by increasing the values of P ( i ) and F(2) in round-robin fashion. 3 so as to make them have the same difference as the third term (P(3) — 0(3)). We keep devaluing the first k difference terms so that they have the same value as the (k + l ) - th difference term through increases of the values of P(\<j<k) 4 - This process is repeated until the summation constraint Y^j=\Pj = w — v is satisfied. A t any cycle, the value of PQJ is constrained by the inequality P^ < 1^ and will not be increased beyond its allowable maximum I^y A l g o r i t h m 3.1 ( A l g o r i t h m P A D ) 1 Mj, atj <- Ij - Qj 2 V i , V O 3 { (P(j) —.a(j)) } sort the (Pj — CHJ) terms in non-ascending order of aj 4 k <- 1 .' 5 w h i l e k < n a n d Ylj=i P(j) < w — v d o 6 l o o p for at most a^f.) — <*(fc+i) cycles 7 fo r e a c h P(i<t<k) d o 8 i f P(t) < 7 ( < ) t h e n P[t) <- P(t) + 1 9 i f Ylj=i P(j) — w ~ v t h e n r e t u r n P 10 ' k <r- k + 1. 11 i f E i = . P(j) <w-v 12 t h e n l o o p 13 f o r e a c h P(\<t<n) d o 2If P(i) reaches its allowable maximum 7^), the value of P^j will not be increased in subsequent cycles. In such a case, the difference in the first term may not be the same as the difference in the second term. 3 Again, if £ {^"(1)1 ^"(2)} reaches its allowable maximum 1^, the value of Pyj will not be increased in subsequent cycles. Similarly, if P(j) € {P(i), • • •, P(k)} reaches its allowable maximum /(j), the value of P(j) will not be increased in subsequent cycles.. ' 43 14 15 i f P ( f ) < / ( 0 t h e n P ( t ) < - P w + l if J2]=i P(j) = w — v then return P Example 3.1 Given integer vectors I = 3 V 1 0 ; and Q /2\ v 1 / ; then, a = I — Q (.10, —4, 9 ) T . We want to find an appropriate assignment to integer vector ( P i , P 2 , P 3 )T so that the objective function (Pi - 10)2 + (P2 + 4) 2 + ( P 3 - 9) 2 is minimized and P i P 2 the constraints 0 V 0 / / „ \ 7 1 2 \ < < 3 V 1 0 / and E ? = i Pj = 25 - 10 = 15 are satisfied. We start with P j = 0 for all j and rearrange all the difference terms. After the rearrangement, we try to lower the difference in the first term (P(i) — 10 = —10) by adding 1 to P(i) so that the first.two terms have same difference (= - 9 ) . Then, we try to reduce the values of these two difference terms by increasing the values of P(i) and P(2) in round-robin fashion so as to bring them closer to the value of the third difference term (= 4); we stop when the constraint ]£ j= i Pj = 15 is satisfied. *U ) -«( i ) P ( 2 ) - « ( 2 ) P ( 3 ) - o ; ( 3 ) - E P j = Pi - 0 1 = P 3 - a3 = P 2 - a2 Cycle 0 0 - 1 0 = -10 0 - 9 = - 9 0 - ( - 4 ) = 4 0 J f c = 1: Cycle 1 1 - 1 0 = - 9 1 k = 2: Cycle 2 2 - 10 = -8 1 - 9 = -8 3 Cycle 3 3- 10 = - 7 2 - 9 = - 7 5 Cycle 8 8 - 10 = - 2 7 - 9 = - 2 15 44 / o \ Algorithm P A D returns P* with - cxj)2 = 24 0 \ 7 / Similarly, for the Reduction Approach, we start with Pj = 0 for all j, and each difference term (I'j—Qj) becomes —Qj. These terms are then rearranged in non-ascending order of Qj (in other words, non-descending order of —Qj) and result in: {(/(i) - Q(i)). (J(2) - 0(2 ) ) i •' •' (7(n) - Q(n))} where Q ( 1 ) > Q ( 2 ) > • • • > Q ( n ) After the rearrangement, we try to lower the difference in the first term (P^ — <3(i)) by adding the value Q(i) — Q(2) to P^ 5 so that the first term has the same value as the second difference term (P^ — Q\2))- We then try to reduce the values of these first two difference terms by increasing the values of P^ and P^ in round-robin fashion 6 so as to make them have the same difference as the third term (J|3j — <5(3))- We keep devaluing the first k difference terms so that they have the same value as the (k + l )-th difference term through increases of the values of 7| 1 < J < f c j 7 . This process is repeated until the summation constraint $~Jj=1 Ij = v 1S satisfied. A t any cycle, the value of P^ is constrained by the inequality < 1^ and will not be increased beyond its allowable'maximum I^y Algorithm 3.2 (Algorithm R E D ) i Vi, /j <- o • " ^ { ^U)~ } s o r * (Ij ~ Qj) terms in non-ascending order of Q j 3 k <r- 1 4 while k < n and Ylj=i I[j) < v do 5If /(j) reaches its allowable maximum 7^), the value of 7^ will not be increased in subsequent cycles. In such a case, the difference in the first term may not be the same as the difference in the second term. 6Again, if 7^ 6 {^(1)1^(2)} reaches its allowable maximum Iy), the value of 7^ will, not be increased in subsequent cycles. , 7Similarly, if 1'^ £ { (^'1)1 • " 1 (^*)} reaches its allowable maximum the value of 7^ will not be increased in subsequent cycles. 45 loop for at most Q ( f c ) - Q(k+i) c y c l e s for each !' { 1< t< k ) do if I[t) < I[t) then I[t) f - I[t) + 1 8 if Ej=i '(j) = u t h e n return V 9 fc <- fc + 1 10 i f E ^ 1 / ( J ) < " 11 then loop 12 i f / ( ' t ) </( t ) then/ ( ' t ) ^/ ( ' f ) + l if 1'^ = v then return V for each /(!<t<n) do 13 14 Example 3.2 Given integer vectors I and Q we want to find V 1 J an appropriate assignment to integer vector ( /{, 1%, I'3 )T so that the objective function ( 7 j - 2 ) 2 + ( 7 2 - 7 ) 2 + ( / 3 - l ) 2 is minimized and the constraints 0 V 0 / < I'2 < 3 \ 1 0 / and E?=i Ij = 10 are satisfied. We start with 7j = 0 for all j and rearrange all the difference terms. After the rearrangement, we try to lower the difference in the first term ( 7 ^ — 7 = —7) by adding at most 5 to I'^y Since the value of 1'^ is constrained by the inequality < 3, it will not be increased beyond 3. Then, we try to reduce the difference in the second term {I'(2) — 2 = —2) by adding 1 to 1'^ so that the second term has the same value as the third difference term (— —1). We keep devaluing the second and the third difference terms in round-robin fashion until the summation constraint Ej=i Ij — 10 is satisfied. 46 7('i) - Q(i) 7(2) ~'Q(2) 7( 3 ) - Q(3) = I2-Q2 = i[-Qi = Is-Qs Cycle 0 0 - 7 = - 7 0 - 2 = -2 0 - 1 = -1 0 k = l: Cycle 1 1 - 7 = -6 1 Cycle 2 2 - . 7 = - 5 2 Cycle 3 3 - 7 = -4 3 k = 2 : . Cycle 4 1-2 =-1 4 k = 3 : Cycle 5 2 - 2 = 0 1-1 = 0 6 Cycle 6 3-2 = 1 2-1 = 1 8 Cycle 7 4-2 = 2 3-1 = 2 10 / 4 \ Algorithm R E D returns /•'* = with J2(I? ~ Qj? = 24 3 3.3 Analytical Comparison Having developed two algorithms, namely Algorithm P A D and Algorithm R E D , which one produces a better lower bound ? T h e o r e m 3.1 In the domain of integers, given E j = i Qj = v a n d E j = i Ij = w > v. • Define VHAD = (P + Q - I)T{P + Q - /) such that for 1 < j < n , 0 < Pj < Ij and E " = i Pj = w-v • Define VHED = (Q - I'fiQ - I') such that for 1 < j < n, 0 < I'- < Ij ^dEUI'j = v - ' 47 Then, there is a 1-to-l correspondence between VHAD and VHED. Proof [=>]• Let P = I - P. Then, the objective function (P + Q - I)T(P + Q - I) becomes (Q-P)T(Q-P). The inequality constraint V/,. 0 < Pj < Ij can be rewritten as V7, 0 — Pj — Pj < Ij-Pj=I><I3, The summation constraint JTJPj = w — v coupled with the equality J2 Pj = 12{Ij ~ Pj) = J2Ij - £ / < = w - £ / j implies that £ / j = v. [<=] Let P = I - I'. Then, the objective function (Q - I')T{Q - I') = (Q - I' + I -I)T{Q - I' + I - I) becomes (P + Q - I)T(P + Q - I). The inequality constraint V7, 0 < Ij < Ij can be rewritten as V j , 0 = Ij — Ij < I3-I'j = P3<h. The summation constraint J2 Pj = V coupled with the equality Pj = J2(^j ~ Pj) = Ij — J2 Pj = w ~ YI Pj implies that ]F) Pj = w — v. Therefore, given a vector P , there exists a corresponding vector I' such that VHAD = T>HED. Similarly, given a vector / ' , there exists a corresponding vector P such that VHAD •= VHED. Hence, a bijection between VHAD and VHED exists. • Corol lary 3.2 Algorithm P A D and Algorithm R E D produce the same lower bound to the histogram distance. Proof Algorithm P A D is an exact algorithm for computing the special case of VHAD — namely the minimal VHAD (denoted by mihVHAD) — which estimates the best-case lower bound to the histogram distance DJJ. Similarly, Algorithm R E D is an exact algorithm for computing the special case oiVHED — namely the.minimal VHED (denoted by min VHED) — which estimates the best-case lower bound to the histogram distance DJJ. Since there is a 1-to-l correspondence between VHAD and VHED, there exists a 1-to-l correspondence 48 between their special cases (the minima): UJJ — min L>fj min UJJ In other words, the values returned by Algorithms P A D and R E D are the same estimated best-case lower bound to the histogram distance. • Since Algorithms P A D and R E D are methods to implement the two proposed approaches, the Padding Approach and the Reduction Approach ideally produce the same bestrcase lower bound as well. -3.4 Experimental Comparison In terms of accuracy, both Algorithm P A D and Algorithm R E D produce the same best-case lower bound. The question is: Which one produces the.lower bound faster ? To answer this question, we performed experiments using color histograms of 8, 64, and 512 dimensions. We used 500 images or image subregions of each of the assorted sizes (128 X, 128, 64 X 64, 32 X 32, and 16.X 16 pixels) and 10 subimage queries for each of those sizes. The test queries consisted of 5 x 5, 15 x 15, 30 X 30, 60 x 60, and 120 x 120 pixels. The experiments were run on a Sun UltraSPARC-1 workstation. The results (the average computation time per query for each combination of the above mentioned sizes of image subregions and queries) are summarized in the tables and figures on the following pages. In the figures, the time curve for Algorithm P A D is represented by a blue solid line, and the time curve for Algorithm R E D is represented by a red dashed line. 49 • 8-dimensional color histograms Image Subregion Query Computation Time Size Size Algorithm P A D Algorithm R E D 5 X 5 6.52 ± 0.27 ms 0.04 ± 0 . 0 0 ms 1 1 x 1 1 6.50 ± 0 . 2 8 ms 0.07 ± 0 . 0 0 ms 15 x 15 6.37 ± 0 . 2 9 ms 0.13 ± 0.01ms 23 x 23 6.26 ± 0.30 ms 0.27 ± 0 . 0 1 ms 128 X 128 30 x 30 6 . 0 8 ± 0.28 ms 0.43 ± 0 . 0 1 ms 45 x 45. 5.68 ± 0 . 2 6 ms 0.87 ± 0 . 0 5 ms 60 x 60 5.07 ± 0.69 ms 1.61 ± 0 . 2 1 ms 9 1 x 9 1 3.37 ± 0 . 1 9 ms 3.40 ± 0 . 8 7 ms 120 x 120 0.82 ± 0.05 ms 6.04 ± 2 . 5 2 ms 5 x 5 . 1.62 ± 0.12 ms 0.04 ± 0.00. ms - 1 1 x 1 1 1.59 ± 0.07 ms 0.07 ± 0 . 0 1 ms 15 x 15 1.55 ± 0.07 ms 0.13 ± 0 . 0 2 ms 64 x 64 23 x 23 1.43 ± 0 . 0 7 ms 0.26 ± 0 . 0 5 ms 30 x 30 1.28 ± 0 . 0 6 ms 0.43 ± 0 . 1 2 ms 45 x 45 0.89 ± 0.08 ms 0.87 ± 0 . 4 9 ms 60 x 60 0.23 ± 0 . 0 1 ms 1.62 ± 1.08 ms 5 x 5 0.42 ± 0 . 0 3 ms 0.04 ± 0 . 0 1 ms 11 x 11 0.40 ± 0 . 0 2 ms 0.07 ± 0 . 0 2 ms 32 x 32 15 x 15 0.35 ± 0.02 ms 0.13 ± 0 . 0 6 ms 23 x 23 0.24 ± 0.07 ms 0.26 ± 0 . 1 8 ms 30 x 30 0.08 ± 0.00 ms 0.43 ± 0 . 3 1 ms ' 5 x 5 0.12 ± 0 . 0 1 ms 0.04 ± 0.01 ms 16 X 16 11 x 11 0.08 ± 0 . 0 0 ms 0.08 ± 0.04 ms 15 x 15 0.04 ± 0 . 0 1 ms 0.13 ± 0 . 1 0 ms Table 3.1: Computation Time for Algorithms P A D and R E D (8-dimensional Histograms) 50 Subregion consisting of 128x128 Pixels (n = 8) Image Subregion consisting of 64x64 Pixels (n = 8) 5000 10000 Number of Pixels in Subimage Query (a) Image Subregion Size = 128 X 128 Pixels -500 1000 1500 2000 2500 3000 3500 4000 Number of Pixels in Subimage Query (b) Image Subregion Size = 64 X 64 Pixels Image Subregion consisting of 32x32 Pixels (n = 8) Image Subregion consisting of 16x16 Pixels (n = 8) 0 100 200 . 300 400 500 600 700 800 900 Number ol Pixels in Subimage Query . . (c) Image Subregion Size = 32 x 32 Pixels 50 100 150 200 - Number of Pixels in Subimage Query (d) Image Subregion Size = 16 x 16 Pixels ^igure 3.4: Computation Time for Algorithms P A D and R E D (8-dimensional Histograms) 51 • 64-dimensional color histograms Image Subregion Query Computation Time Size Size Algorithm P A D Algorithm R E D 5 x 5 6.71 ±0.22 ms 0.32 ±0.05 ms 11 x 11 6.69 ±0.21 ms 0.36 ± 0.02 ms 15 x 15 6.65 ± 0.22 ms 0.42 ±0.03 ms 23 x 23 6.54 ± 0.69 ms 0.61 ± 0.10. ms 128 x 128 30 x 30 6.42 ±0.21 ms 0.84 ± 0.22 ms 45 x 45 5.98 ± 0.21 ms 1.28 ±0.64 ms 60 x 60 . 5.41 ±0.21 ras 2.03 ± 1.38 ms 91 x 91 3.77 ±0.16 ms 3.80 ± 2.40 ms 120 x 120 1.23 ±0.11 ms 6.28 ±4.58 ms 5x5 1.96 ±0.08 ms 0.32 ±0.01 ms 11 X 11 1.93 ± 0.07 ms 0.36 ±0.04 ms 15 x 15 1.89 ±0.07 ms 0^ 42 ±0.11 ms 64 x 64 23 x 23 1.82 ± 0.07 ms 0.61 ± 0.32.ms 30 x 30 1.67 ± 0.10 ms 0.84 ± 0.66 ms 45 X 45 1.29 ± 0.06 ms 1.28 ±0.75 ms 60 X 60 0.63 ±0.04 ms 2.05 ± 1.59 ms 5x5 0.75 ±0.04 ms 0.32 ±0.03 ms 11 X 11 0.71 ± 0.05 ms 0.36 ±0.13 ms 32 x 32 15 x 15 0.66 ± 0.04 ms 0.42 ± 0.34 ms 23 x 23 0.60 ±0.57 ms 0.61 ± 0.25 ms 30 x 30 0.43 ± 0.04 ms 0.85 ± 0.47 ms 5x5 0.42 ±0.05 ms 0.32 ±0.06 ms 16 x 16 11 X 11 0.38 ± 0.05 ms 0.37 ±0.24 ms 15 x 15 0.33 ±0.04 ms 0.42 ± 0.29 ms Table 3.2: Computation Time for Algorithms P A D and R E D (64-dimensional Histograms) 52 Image Subregion consisting of 128x128 Pixels (n =64) Image Subregion consisting ol 64x64 Pixels (n = 64) 5000 10000 Number ot Pixels in Subimage Query (a) Image Subregion Size = 128 x 128 Pixels E , RED / 0 500 1000 1500 2000 2500 3000 3500 4000 Number of Pixels in Subimage Query (b) Image Subregion Size . = 64 x 64 Pixels Image Subregion consisting of 32x32 Pixels (n = 64) Image Subregion consisting of 16x16 Pixels (n = 64) 0 100 200 300 400 500 600 700 800 900 Number of Pixels in Subimage Query (c) Image Subregion Size = 32 x 32 Pixels' 100 150' 200 Number o) Pixels in Subimage Query (d) Image Subregion Size = 16 x 16 Pixels Figure 3.5: Computation Time for Algorithms P A D and R E D (64-dimehsional His-tograms) 53 • 512-dimensional color histograms Image Subregion Query Computation Time Size Size Algorithm P A D Algorithm R E D 5x5 26.05 ±2.06 ms 12.06 ±0.16 ms 11 X 11 25.91 ±2.00 ms 12.13 ±0.64 ms 15 x 15 25.78 ± 2.09 ms 12.22±0.38 ms 23 X 23 25.58 ±1.74 ms 13.25 ±0.85 ms 128 X 128 30 x 30 25.43 ± 1.71 ms 15.42 ± 1.44 ms 45 x 45 24.98 ± 1.46 ms 16.99 ±4.56 ms 60 X 60 24.06 ± 1.37 ms 18.98 ±6.56 ms 91 x 91 21.82 ± 1.36 ms 21.82 ±8.49 ms 120 x 120 16.99 ±1.08 ms 24.98 ± 10.34 ms 5x5 19.32 ±2.00 ms 12.06 ±0.60 ms 11 x 11 19.24 ±1.84 ms 12.13 ±0.37 ms 15 x 15 19.19 ± 1.85 ms 12.23±0.66 ms 64 X 64 23 X 23 19.09 ± 1.55 ms 13.28 ± 1.71.ms 30 X 30 18.79 ± 1.59 ms 15.40 ±3.57 ms 45 x 45 17.02 ± 1.34 ms 17.02 ±7.24 ms 60 x 60 13.27 ± 1.24 ms 19.09 ± 10.82 ms 5.x 5 14.50 ± 1.57 ms 12.08 ±0.34 ms 11 x 11 14.45 ± 1.46 ms 12.14 ± 1.08 ms 32 x 32 15 x 15 14.23 ± 1.46 ms 12.24 ± 2.97 ms 23x23 13.28 ± 1.41 ms 13.30 ± 4.43 ms 30 x 30 12.14 ±1.26 ms 15.45 ± 5.51 ms 5x5 12.27 ± 1.04 ms 12.09 ±0.38 ms 16 x 16 11 x 11 12.16 ± 1.05 ms 12.16 ± 1.70 ms 15 x 15 12.09 ±1.03 ms 12.27 ±4.00 ms Table 3.3: Computation Time for Algorithms P A D arid R E D (512-dimensional H tograms) " 54.. Image Subregion consisting of 128x128 Pixels (n = 512) . Image Subregion consisting of 64x64 Pixels (n = 512) _ - -RED / ^ P A D / 5000 10000 Number of Pixels in Subimage Query (a) Image Subregion Size = 128 x 128 Pixels 500 1000 1500 2000 2500 3000 3500 4000 Number of Pixels in Subimage Query (b) Image Subregion Size = 64 x 64 Pixels Image Subregion consisting ot 32x32 Pixels (n = 512) ' Image Subregion consisting of 16x16 Pixels (n = 512) 0 100 200 300 400 500 600 700 800 900 Number of Pixels in Subimage Query (c) Image Subregion Size • = 32 x 32 Pixels . 50 100 150 200 Number of Pixels in Subimage Query (d) Image Subregion Size '.= 16 x 16 Pixels Figure 3.6: Computation Time for Algorithms P A D and R E D (512-dimensional His tograms). • 55 Observation 3.1 Computation time is affected by the dimension of the color histogram: As the dimension of the histogram grows, the time required by Algorithms P A D and R E D increases. Explanation In both algorithms, we start with 0 for each of the n entries in the n-dimensional histogram (the vector P* or /'*) and keep increasing the value of each entry. Hence, the increase in the dimension of the histogram leads to an increase in the number of entries. Due to the nested loops in the Algorithms for manipulating the n entries in the vectors, the time required increases (though not linearly). . • Observation 3.2 Computation time is also affected by the number of pixels in the query and the image subregion: Given a fixed-size image subregion, and varying the. size of the subimage query, it is observed that as the query size increases, the time required by Algorithm P A D appears to decrease linearly. Explanation Given an image subregion consisting of w pixels, Algorithm P A D pads w — v "desired" pixels to the subimage query consisting of v pixels. So, as the query increases in size, the number of pixels in the query (= v pixels) also increases; thus, decreasing the number of the pixels needed to be padded (= w — v pixels). • Observation 3.3 Given a fixed-size image subregion, and varying the size of the subim-age query, it is noticed that as the query size increases, the time taken by Algorithm R E D appears to increase linearly. Explanation Given an image subregion consisting of w pixels, Algorithm R E D reduces the subregion by choosing v "desired" pixels. So, as the query increases in size, the number of pixels in the query (= v pixels) also increases; thus, increasing the number of pixels needed to be chosen (= v pixels). • Observation 3.4 Given a fixed-size image subregion, and varying the size of the subim-age query, it can be viewed from the figures that as the query size increases, the time 56 curve for Algorithm P A D declines and the time curve for Algorithm R E D is rises. The two curves meet at the point representing medium-sized queries. For a small-sized query, the time taken by Algorithm R E D is less than that taken by Algorithm P A D ; for a large : sized query, the time required by Algorithm R E D is more than that required by Algorithm P A D . Lastly, for a medium-sized query, the times needed by the two Algorithms are almost the same as indicated by the intersection of the curves in the figures. Explanation . • Given an image subregion consisting of w pixels and a small-sized query consisting of v pixels (where 2v < w), Algorithm R E D reduces the image subregion by picking v "desired" pixels, whereas Algorithm P A D pads w — v > v "desired" pixels to the query. Thus, Algorithm R E D requires less time. • Given an image subregion consisting of w pixels and a large-sized query consisting of. v pixels (where v < w < 2v), Algorithm P A D pads w — v "desired" pixels to the query, whereas Algorithm R E D reduces the image subregion by choosing v > w — v "desired" pixels.. Hence, Algorithm P A D needs less time. • Given an image subregion consisting of w pixels and a medium-sized query consisting of v pixels (where v « |w =£• 2v « w), Algorithm R E D reduces the image subregion . by selecting v "desired" pixels and Algorithm P A D pads v "desired" pixels to the query. So, both Algorithms take almost the same amount of computation time. • Therefore, if the subimage query is of the same size as the precomputed image subregion, then the exact histogram distance of the form (Q — I)T(Q—T) can be applied. Otherwise, use Algorithm P A D when the size differential between the query and the image subregion is large, and use Algorithm R E D when the differential is small. 57 3.5 Another Metr ic for Histogram Distance Other than the metric based on the Euclidean distance between the color histograms of the subimage query and the image subregion, another popular way to compute the similarity is based on the weighted Euclidean distance: • (Q - If A (Q - I) where A is a similarity matrix accounting for both the perceptual distance between the pairs of colors and the difference in the amounts of each color. Let d{j be the Euclidean distance between colors i and j in the chosen color space (such as Luv and Munsell), then there are several choices for the entries'a^- in the matrix A [Sawhney and Hafner 1993]. For example, a{j can be defined as (a) 1 - m , (b) e~m"dn, and (c) e < TVm*"<W f where a is a positive constant. With this metric, new Padding and Reduction Approaches can be stated as:-• New Padding Approach objective function inequality constraint summation constraint domain constraint Given Qj ~ v and Ij = w > v, j=i j=i find an optimal vector P* to ' minimize (P + Q- I)TA(P + Q-I) subject to 0 < Pj < Ij for 1 < j < n n and ^ Pj = w — v and I,P,Q integer vectors (3-9) 58 New Reduction Approach G i v e n Q j = u and Ij = w > v, i=i j=i find an optimal vector /'* to objective function minimize [Q - I')TA(Q - I') (3.10) inequality constraint subject to 0 < Ij < Ij for 1 < j < n n summation constraint and ^ I'j = v i=i . domain constraint and I, I',Q integer vectors Wi th an information preserving transformation such as Singular Value Decomposition, the similarity matrix A can be factorized into BTKB or (VAB)T(y/AB) where A is a diagonal matrix. So, given E j = i Qj = v a n < l E j = i Ij = w > v, the objective function of Problem (3.9) can be rewritten as: min Xjj [(BP)j - (Ba)j)]2 (3.11) or nun^i^-^)]2 (3.12) where a — I—Q and Ajj is the 7-th diagonal entry of the matrix A . Similarly, the objective function of Problem (3.10) can be rewritten as: min ^ A ^ p / ^ - T O ] 2 (3-13) j=i or mm J2[(^BI')J - (VABQ)j)}2 (3.14) With these representations, each of the two objective functions is in the form of the sum of squares of the difference terms. However, these difference terms are no longer monotonic, they are dependent on the values of entries in matrices B and A . So, increasing 59 the value of an entry in P or in V may not reduce the value of the difference terms; some-times, it may even boost the difference in these terms. Therefore, for this metric based on the weighted Euclidean distance, it is not easy to develop efficient tailor-made algorithms (such as Algorithms P A D and R E D for the metric based on the Euclidean distance). 3.6 Summary The histogram distance (DH) between the histogram Q of the subimage query and the histogram I of the equal-sized image subregion can be computed by DH — (Q — I)T(Q — I)-However, given a subimage query of arbitrary size, the image subregion represented by the precomputed feature vector may not necessarily contain the same number of pixels as the query. For instance, it is not unusual that the precomputed vector of the image subregion contains more pixels than that of the query. As such, two algorithms — Algorithm P A D and Algorithm R E D — have been developed for estimating the best-case lower bound (DH) to the histogram distance. It has been shown that both algorithms give the same best-case lower bound. More precisely, in the domain of integers, given Q (where Y2Qj = v) and I (where Ij = w > u), the DH can be computed using Algorithm. P A D : n DH = min £ ( P , - / , + Q , ) 2 j=i • • • • • • n s.t. V j , 0 < Pj < Ij and JZ'-Pj = w — v i= i where P is the histogram of the padded area; the same DH can also be computed using Algorithm R E D : DH = min j^^-Qj)2 , j=i n s.t. V j , 0 < I- < Ij and /j = v j=i 60 :where / M s the histogram of the reduced image subregion. In terms of performance, it is more efficient to use Algorithm P A D when the size differential between the subimage query and the image subregion is large, and to use Algorithm R E D when the differential is small. , 61 Chapter 4 Multiscale Representation In some IDBMSs, images are divided into blocks of a chosen size. For some queries, the scale at which the images are blocked may be too fine, and applying similarity comparisons to all those fine blocks may be a waste of effort. However, for some other queries, the scale may be too coarse, and the desired images may not be discriminated sufficiently. Given that subimage queries can be of arbitrary size, picking one best scale for all queries is hard, if not impossible. To cope with the challenge, we propose a representation that has multiple scales for matching. We also determine analytically and experimentally the appropriate number of levels for such a representation. 4.1 A 4-level Multiscale Representation To process a user query, some IDBMSs divide each database image into blocks of a chosen size. For subimage queries of arbitrary size, picking a best scale at which the images are blocked is not easy. One way to cope with this challenge is to have multiple scales for matching. The idea is that depending on the scale or the need of a given query, a more appropriate scale can be used. With the multiscale representation, given.any subimage query of arbitrary size, there exists an image subregion whose size is not smaller-than the query. So, Padding and Reduction Algorithms can be applied in similarity matching. In. this thesis, a 4-level multiscale representation (Figure 4.1) is proposed as follows: • A t Level H , the entire image is represented by a single global color histogram. • A t Level I, the image is divided into four non-overlapping blocks, and each block is represented by a color histogram covering | of the entire image. • A t Level J , each block at Level I is further divided into four blocks, each of which is represented by a color histogram covering of the entire image. • A t Level K , each block at Level J is again divided into four, each of which is repre-sented by a color histogram covering ^ o f the entire image. 63 (a) Level H •••••••• (d) Level K Figure 4.1: The 4-level Multiscale Representation Given four levels of blocks for each image, many multi-level filtering schemes for finding u desired images (where u is the number of images requested by the user) can be developed. These include: • a complete four-level HIJK scheme; • three-level HI.], HIK, H J K , and U K schemes; • two-level HI, H J , H K , IJ, IK, and J K schemes; and • one-level H , I, J , and K schemes. Depending on the size of the subimage query and the number of levels intended to be examined, appropriate filtering schemes can be chosen. For example, given a subimage query whose size is smaller than of the entire image, the Scheme HJ can be applied in conjunction with Padding and Reduction Algorithms. With this two-level scheme, color histograms at Level I are skipped, and only the histograms at Levels H and J are considered. 64 The filtering scheme determines the levels (of color histograms representing the image subregions) which are intended to be considered, but it does not determine the order in which the histograms are to be examined. For example, it is unclear whether the histograms are to be checked on an image-by-image basis (with the use of vertical filter), on a level-by-level basis (with the use of horizontal filter), or in some other order. The search order is determined by the strategy for searching the multiscale representation. This will be examined in Chapter 5. . • • 4.2 Formulation of Distance Function A primary purpose of an I D B M S is to provide an environment for an efficient and effective retrieval of desired images. Given a large IDBMS, a key to the efficiency and the effec-tiveness of the search strategy for such retrievals relies on the formulation of the distance function. For instance, if the distance value at a coarser scale can be served as a lower bound to the distance value at a finer scale, then an efficient strategy with the use of vertical filters for searching the multiscale representation is possible. 4.2.1 Histogram Distance The histogram distance (DH) between the histogram Q of the subimage query and the histogram / of the equal-sized image subregion can be computed by DH = (Q — I)T(Q — I). However, given that the arbitrary-size subimage query may not necessarily be of the same size as the image subregion, the best-case lower bound (DH) to the histogram distance can be estimated using Algorithm P A D or Algorithm R E D . More details can be found in Subsection 3.2.3. Observation 4.1 Given that an image subregion I s u p encloses another image subre-gion I s u b , and that the subimage query Q is smaller in size than I s u b (Figure 4.2), the 65 estimated best-case histogram distance (DH) between the histograms of Q and I s u b is not smaller than the DH between the histograms of Q and I s u p : D~H(Q,IS»P) < 5S(Q,i s u b) Subimage Image Subregion I s u p sub ^ sub Figure 4.2: Subimage Query CJ and Image Subregion I s u p Explanation In addition to the pixels of I s u b , the image subregion I s u p also contains the pixels found in I s u b . So, for Algorithms P A D and R E D , the selection of pixels in the minimization of DH(Q, I s u p ) is at least the selection of pixels in the minimization of DH(Q, I s u b ) . The. more the pixels selected, the smaller the DH- • • In other words, the DH between Q and I s u p serves as a lower bound to the DH between Q and I s u b . 4.2.2 Positional Distance The estimated best-case lower bound to the histogram distance measures the difference between the statistical distributions of various colors of the subimage query and the image subregion. However, a metric to measure the difference between the spatial locations of the subimage query and the image subregion is lacking. For instance, given a user query containing the central tower of the B C Legislative Buildings (Figure 4.3), the best-case lower bound to the histogram distance between the histograms of the Query and Image 1 is the same as that between the histograms of the Query and Image 2. However, in terms 66 of color and spatial location, Image 2 is clearly a better match. Therefore, an additional metric for measuring the positional difference between the subimage query and the image subregion is needed for better selectivity. (a) Query (b) Image 1 (c) Image 2 Figure 4.3: The B C Legislative Buildings There are a variety of ways to compute the positional distance between the query and the image subregion. Among them, some are more precise and complex than the others. It is well known that human memory is weak in retaining a fine granularity of spatial information of color; more often, the user may remember only an approximate location of the subimage when specifying his query. Moreover, when using Algorithm P A D or R E D for handling subimage queries of arbitrary size, the best-case histogram distance is estimated, but the exact location of the padded query or the reduced image subregion is unknown. So, instead of computing the exact positional distance (Dp), we estimate its lower bound (Dp). The idea is based on the square of the shortest Euclidean distance between the original subimage query and the original non-overlapping image subregion. If the original query and the original image subregion overlap, the Dp is defined to be 0. 67 Definition 4.1 Let q and i denote points in the subimage query and the image subregion respectively. Then, the estimated positional distance (Dp) is computed by: DP 0 when query and image subregion overlap min||g — i\\2 otherwise Let (q™ax, g™aa7) and (g£" n ,g™ n ) denote the maximum and the minimum (x,y)-coordinates of the subimage query, and let (i™ax, irynax) and '(i™ m , ?™m) denote the maxi-mum and the minimum (x, y)-coordinates of the image subregion. With these coordinates, the ar-directional distance (dx) as well as the y-directional distance (dy) can be calculated, and the positional distance can then be estimated. Algorithm 4.1 (Estimated Positional Distance) 1 dx <- 0 2 dy <r- 0 3 if q™n > i™ax 4 then dx «- q™in - i™ax. 5 else if i™in > q™ax 6 then dx <- i™in - q™ax 8 then dy <- q™in - %^ax 9 else if i7ynin > q™ax 10 then dy <- i™ 7 1 -11 return (dx)2 + (dy)2 Observation 4.2 Given that an image subregion I s u p encloses another image subre-gion I s u b , and that the subimage query Q is smaller in size than I s u b , the estimated 68 positional distance (Dp) between Q and I s u b is not shorter than the Dp between Q a n d l s u p : ." /37(Q,I s u p) < /37(Q,Tsub) . Explanation The area of the image subregion I s u p is the sum of the areas of image subregions I s u b and I s u b . So, the (x, y)-directional distances dx and dy between Q and I s u p is not longer than those between Q ad I s u b . • In other words, the Dp between Q and I s u p serves as a lower bound to the Dp between Q and I s u b . .4 .2 .3 D i s t a n c e F u n c t i o n To incorporate both the histogram distance Djj and the positional distance Dp, the distance between query Q and image subregion I is defined as a weighted sum of the form: D = [3DH + (1-/3)DP (4.2) where the parameter /3 is the user preference (with value ranging from 0 to 1) so that the user is allowed to specify the relative importance of the histogram distance and the positional distance. Given that the subimage query can be of arbitrary size, the Djf and the Dp can be estimated using the appropriate Algorithms described earlier, and the resulting estimated distance value (£)) can be computed as the weighted sum of the estimated histogram distance and the estimated positional distance:-D = J3DH + (l-P)Dp , (4.3) Theorem 4.1 Given that an image subregion I s u p encloses another image subregion I s u b , the estimated distance value (D) of the Q - I s u b pair (where the subimage query Q is smaller in size than l s u b ) is greater than or equal to the TJ of the Q - I s u p pair: D(Q, I s u p ) < D(Q, I s u b ) .69 Proof It is observable that DH(Q, ISUP) < DH(Q, I s u b ) and that 7J P(q, ISUP) < DP(Q, I s u b ) . So, for 0 < /? < 1, we can deduce: /3(q,iS UP) = /? D ^ ( q , IS UP) + ( l -/3) TJ ^ q , IS UP) < 73/5^(Q, l s u b ) + ( l - / 3 ) / 3 P ( q , i s u b ) = /3(q, i s u b ) In other words, the estimated distance value increases as the scale becomes finer. • Let I H , I I , I J , a n d I K denote an image subregion at each of Levels H , I, J , and K such that I K C I J C I 1 C I H . According to Theorem 4.1, for a subimage query q of size smaller than I 1 , the relationship 7J(Q,IH) < D(Q, I 1 ) holds. In terms of size, if such q is smaller than I J , then D(Q, I 1) < D(Q,I3). Similarly, D(Q, I J ) < D(Q, IK) provided that such q is smaller than I K . With the property that "the estimated distance keeps increasing (to its exact value) as the scale becomes finer", an efficient strategy with the use of vertical filters for searching multiscale representation is possible. The idea is that the distance value at the coarser scale serves as a lower bound to the distance value at the finer scale. So, poor matches having "large" distance values at the coarser scale (greater than the finest-scale distance values of the top-u x) can be eliminated without further computations at finer scales. For example, if the estimated Level-K distance for each of the top u images is below 200, a "poor match" with an estimated distance of 250 at Level I can be eliminated. The reason is that the estimated Level-K distance for this "poor match" is at least 250 (worse than those of the top-u). Without the property mentioned above, the search strategy needs to estimate the distance values at Levels J and K for this "poor match". 1 As mentioned earlier in this chapter, u is the number of images requested by the user. 70 4.3 Multiscale Pure Vertical Search Having formulated the distance function for the proposed 4-level representation, we can explore efficient and effective search strategies for retrieving the top u images from a database of M images. One of the search strategies is Search P V (Pure Vertical). In it, a vertical filter searches each image "vertically" across scales. The idea is that all M images are checked one after another. A t any point in time, the current u smallest distances (at the finest scale of the u images, where u <C M) are kept. When an image is tested, if its distance value at the current scale is already greater than some, distance value of the current u smallest, the tested image can be eliminated. Otherwise, a finer scale is used. The process repeats until the image is (1) eliminated or (2) added to become one of the current u smallest (when it reaches the finest scale). Definition 4.2 (Search P V ) To conduct the Pure Vertical Search for retrieving the top u images from a collection of M images: 1. The first u images are checked at all levels/scales, and a distance value is computed for each image at the finest scale. These u images become the current top-w. 2. For each of the remaining M — u images: (a) Let the coarsest scale be the current level/scale. (b) Compute the distance value. If the value is less than some value of the top-w, extend one level/scale (provided that we have not reached the finest scale) and repeat Step 2(b) on the extended scale. • Example 4.1 In many database applications, it is not unusual to retrieve the desired images from a collection of thousands of images. For simplicity, in this example, we try to find the top two images from a collection of seven images using Search P V (with Scheme HIJ). In the following trace, the number at the slot representing an image at a 71 particular level is the estimated distance value, and the superscript on its left denotes the search order. To find the top two images using Search P V Img 1 Img 2 Img 3 Img 4 Img 5 Img 6 Img 7 Level H l s < 25 4th 5 7 t h 65 9th 8 0 Wth io i 3 t / i jo Level I 2nd 35 5th 30 sth 200 M* f c 15 Level J 3rd g 7 eth 32 i2<fc 20 Top two Img 1 Imgs Imgs Imgs Imgs Imgs Imgs images 2 and 1 2 and 1 2 and 1 5 and 2 5 and 2 5 and 2 Since the histograms are examined image by image, the best way to organize the pre-computed feature vectors in the data file of the 4-level representation is to arrange them on an image-by-image basis. More precisely, the histograms associated with one image are followed by the histograms associated with another image. For each image, the 4 Level-I histograms are preceded by the Level-H histogram, succeeded by the 16 Level-J histograms, and the 64 Level-K histograms follow. Therefore, the file organization is of the form: HPSI'S 7Ks for Image M I" Ih I3s / K s 7" 7*8 J J s 7 K s • N v ' v ' for Image 1 for Image 2 4.3.1 Analytical Evaluation Cost Models To measure the efficiency of Search P V , we set up cost models to estimate the C P U and the I/O Costs. The C P U cost depends mainly on the time required to apply Padding and Reduction Algorithms to the data (color histograms), and the computation time for each 72 histogram (denoted by Tc) can be estimated using the experimental results in Section 3.4. The I /O cost depends mainly on the time required to sequentially or randomly access the pages containing the data (color histograms), and this access time can be affected by the size of data as well as the number of buffers. Given a minimum buffer (a buffer size of one page) and an intelligent buffer management scheme, the number of page accesses can be estimated with the use of statistical formulations. ' Assuming that the value (pixel count) in each dimension of a color histogram requires 4 bytes, a total of An bytes are needed for one ra-dimensional histogram. Hence, the total number of pages occupied by one histogram is pages where P is the page size. In the cost models, color histograms at each level for each image are treated as a "record" 2 , and the four "records" associated with each image can be grouped to form a "mega-record". So, for a database containing M images, there exists a total of AM "records" or M "mega-records".for the 4-level representation. Depending on the level at which the histograms are represented, the size of "record" may vary. Sometimes a "record" may occupy a small portion of one page. For example, with a page size of 1 kilobyte (KB) , the "record" of one Level-H 8-dimensional color histogram takes less than 4% of a page.. With the file organization for Search P V , after the page containing this Level-H histogram is loaded, no extra page access is needed for reading its neighboring "record" (of the four Level-I histograms of the same image). However, sometimes a "record" may occupy more than one page. For- example, with the above page size, the "record" of sixteen Level-J 512-dirhensional color histograms takes more than 30 pages. The cost of accessing all the histograms in this "record" is the sum of the time required to randomly access the first page and the time required to sequentially access the remaining pages. For a vast majority of subimage queries, in the case that the image subregions I s u b and I s u p are the best matches (which give the smallest Ds) at their corresponding, levels, 2For example, the four Level-I histograms for Image 1 form a "record", and the sixteen Level-J his-tograms for Image 3 form another "record". 73 I s u b is one of the subregions enclosed in I s u p . When aiming for a balance of efficiency and effectiveness, checking all the histograms in a "record" may seem unnecessary, and only a portion of the "record" (some of the histograms) may need to be examined. For instance, with Scheme IJ, the most promising image subregion I 1 can be found after checking histograms of the four subregions at Level I. Then, rather than considering all sixteen Level-J histograms, we consider the four Level-J histograms which represent the subregions covered by I 1 . As a result, both the C P U and the I /O costs can be reduced. E x a m p l e 4.2 For each image in Scheme HIJ , we examine the Level-H histogram, and then consider (if necessary 3 ) the 4 Level-I histograms and (if necessary 4) the 4 Level-J histograms which represent the subregions enclosed in the promising block of Level I. Hence, using a 64-dimensional color histogram with page size P = 1 K B , the cost model for Scheme HIJ can be described as follows: • The C P U cost is the product of Tc and the total number of histograms used in the computation. In Scheme HIJ , Level-H histograms are examined for all M images, 4 Level-I histograms are checked for each of the £ 4 images, and 4 Level-J histograms are examined for each of the £5 images (where M > £ 4 > £5 > u, and both £ 4 and £5 are determined dynamically at runtime). Hence, CPU^YJ = ( M + 4& + A^)Tc-• The I/O cost is the sum of total seek times and total data transfer times. Using . the above file organization for Search P V , an average seek time (denoted by TA) is charged for moving the read head to the first "record". Since all M images are searched one after another and their "records" are stored contiguously on an image-by-image basis, only a minimum seek time (denoted by TM) is charged for each jump between the data (for example, jumping from the fourth Level-I histogram to the 3 We consider the 4 Level-I histograms of an image if the D at Level H is less than any D of the current top-u. 4 W e consider the 4 Level-J histograms of an image if the best D at Level I is less than any D of the current top-u. •74 third Level-J histogram of the same image, and jumping from the Level-J "record". of an image to the Level-H "record",of the next image). Moreover, when a page of data (color histograms) is loaded, a data transfer time (denoted by TD) is charged. More precisely, for Scheme HIJ , TA is needed to get to the Level-H histogram of the first image, and Try is charged for loading the page containing the histogram. Since only 4 histograms can fit into a page,.if the 4 Level-I histograms happen to be examined, then we load the next page. There is a probability of | that the selected 4 Level-J histograms are stored contiguously after the loaded Level-I histograms. Hence, I/0])vu = TA + (A/ - 1 + ^ ) 7 A / + ( M + £, + & ) 7 « . The cost models for other filtering schemes, color histograms of other dimensions, or other page sizes can be formulated in a similar manner. , • The cost models for the 15 filtering schemes (with page size P = 1 K B ) are shown below: 1. Scheme H . . . For each image, we examine the Level-H histogram. -• CPUJ(V = MTC • I / O h v - TA + [M - l)TM + MTD for 8-dimensional (8-D) histograms I/0HV = TA + (M - 1)TM + MTD for 64-dimensional (64-D) histograms I/0HV = TA + {Mr— l)TM + 2MTD for 512-dimensional (512-D) histograms 2. Scheme I For each image, we examine the 4 Level-I histograms. • CPUfv = 4MTC • / / O f v = TA + ( M - \)TM + MTD for 8-D histograms I/Ofv = TA + {M - l)TM + MTb for 64-D histograms I/Ofv = TA + {M- l)TM + 8MTD for 512-D histograms 3. Scheme HI For each image, we examine the Level-H histogram, and if necessary 5 , consider the 4 Level-I histograms. • CPUffi = 'M.+ 4£L)TC • J/O&Y = TA + {M - \)TM + MTD for 8-D histograms J/O&V = TA + {M - 1 ) T M + ( M + 6 ) T D for 64-D histograms I/OHVJ =TA + (M - 1)TM +• (2Af + 8 6 ) T D for 512-D histograms where M > £ 1 > w. 4. Scheme J For each image, we examine the 16 Level-J histograms. • CPVp' = 16MTC • I/0*}V - TA +• ( M - 1)TM + MTD for.8-D histograms I/0PV = TA + (M - 1 ) T W + 4 M T D for 64-D histograms I/0$V = TA + ( M - 1 ) T M + 3 2 M T D for 512-D histograms 5. Scheme HJ For each image, we examine the Level-H histogram, and if necessary, consider the 16 Level-J histograms. • CPU%Vj = ( M + 16£ 2 )T C • I/0HVj = TA + ( M -\)TM + MTD for 8-D histograms I/0HVj = TA + (M - 1 + 6 ) T M 4- ( M + A^2)TD for 64-D histograms . 11 Off, = TA + (M - 1 + £2)TM + (2Af + 32£ 2)7D for 512-D histograms where M > £2 > u. 5 We consider the 4 Level-I histograms of an image if the D at Level H is less than any D of the current top-u. Similar conditions apply to the filtering schemes involving more than one level. 76 6. Scheme IJ For each image, we examine the 4 Level-I histograms, and if necessary, consider the 4 Level-J histograms which represent the subregions enclosed in the promising block of Level I. . CPUff = {4M + 4£3)Tc' • I/OfJ = TA + (M- l)TM + MTD for 8-D histograms I/Of/ = TA + (M-1 + ^)TM + (M + £3)Tp for 64-D histograms • I/OfY = TA + ( M - 1 + ^)TM + ( 8 M + 8601b for 512-D histograms where M > £ 3 > u. 7. Scheme HIJ For each image, we examine the Level-H histogram, then consider (if necessary) the 4 Level-I histograms and (if necessary) the 4 Level-J histograms which represent the subregions enclosed in the promising block of Level I. • CPUffJj = (M + 4£ 4 + 4£ 5 )Tc • II°HIJ = TA + (M - l)TM + MTD for 8-D histograms Il°mJ = TA + (M - 1 + ^)TM•+ ( M + U + & ) 7 b for 64-D histograms t/Onu = TA + ( M - 1 + 2 & )T M + (2Af + 8£ 4 + 8&)Ib for 512-D histograms where A/ > £ 4 > > u. 8. Scheme K For each image, we examine the 64 Level-K histograms. • CPUgv = 6 4 M T C • I/Ofy = TA + ( M - 1 ) T M + 2 M l b for 8-D histograms I/Ofy = TA + {M- 1)TM + WMTD for 64-D histograms I/Of? = TA + (M- 1)TM + 1 2 8 M T D for 512-D histograms 77 9. Scheme H K For each image, we examine.the Level-H histogram, and if necessary, consider the 64 Level-K histograms. • CPUHl=(M + 64Ze)TC • J/OHK = TA.+ (M -1)TM+(M + 2&>)TD for 8-D histograms J/OHK = TA + {M-1 + £ 6 ) T M + ( M + 16C6)2b for 64-D histograms 'T/°HK = TA + {M - 1 + £G)TM + (2Af +.128£6)T£> for 512-D histograms where M > > u. 10. Scheme IK For each image, we examine the 4 Level-I histograms; and if necessary, consider the 16 Level-K histograms which represent the subregions enclosed in the promising block of Level I. • CPU',% = ( 1 M + 1 6 c » 7 b . I/QPV = TA + (M - 1 + &-)TM + ( M + $7)TD for 8-D histograms I/Of% = TA + ( M - 1 + £ 7 ) T M + ( M + 4 £ 7 ) T D for 64-D histograms I/Of%_ = TA + ( M - 1 + £ 7 ) T M + ( 8 M + 32£ 7 )TD for 512-D histograms where M > £ 7 > u. 11. Scheme J K For each image, we examine the 16 Level-J histograms, and if necessary, consider the 4 Level-K histograms which represent the subregions enclosed in the promising block of Level J . • CPUf% = ' ( 16Af+4&)Ib 78 • J / 0 $ = TA + (M - 1 + ^)TM +: ( M + ^)TD for 8-D histograms 1/0*% = TA + ( M - 1 + i f ^ T M + ( 4 M + £ 8 ) 7 b for 64-D histograms, . 1/0$% = TA + (M — 1 +' 1H8-)rM + (32M + 8£ 8 )TD for 512-D histograms where M > £s > 12. Scheme HIK For each image, we examine the Level-H histogram, then consider (if necessary) the 4,Level-I histograms and (if necessary) the 16 Level-K histograms which represent the subregions enclosed in the promising block of Level I. • CPU^YK — (M + 4^9 + 16£io)Tc • I/0HIK = TA + (M - 1 + ^)TM + ( M + £10)TD for 8-D histograms J/OmK = TA + (M-i+ 6 o ) T M + (M + & + 4£10)TD for 64-D histograms J/OHIK = TA + (M-1 + 6 O ) T m + ( 2 M + 8£ 9 + 3 2 £ 1 0 ) T D for 512-D histograms • where M > £io > £9 >,u. 13. Scheme H J K For each image, we examine the Level-H histogram, then consider (if necessary) the . 16 Level-J histograms and (if necessary) the 4 Level-K histograms which represent the subregions enclosed in the promising block of Level J . • CPUPJK = (M+ I 6 6 1 + 4£ 1 2 )Tc • I/OHVJK = TA + ( M - .1 + ^ ) T ¥ + (M•+ ^ ) T D for 8-D histograms I/OHVJK = ^ + (M-l+eii+%)TM+(M+46i+62)r j D for 64-D histograms I/OHJK = ^ +'"(M - .1 + £11 + ^)TM + (2M + 32^ 11 + 8£ 1 2 )Tb for 512-D histograms where M > £n > £12 > u. 79 14. Scheme U K For each image, we examine the 4 Level-I histograms, then consider (if necessary) the 4 Level-J histograms which represent the subregions enclosed in the promising block of Level I, and consider (if necessary) the 4 Level-K histograms which represent the subregions enclosed in the promising block of Level J . • CPUfjVK = (4M + 4Z13 + 4t;14)Tc • I/O^fx = TA + {M-1 + ^ ) T M +{M+ ^ ) T D for 8-D histograms I/0?yK = TA + (M-i + ^+Z14JTM + (M+t13+Z14)TD for 64-D histograms I/6jYK = TA + {M-l + ^ + £14)TM + (8M + 86s + &ti)TD for 512-D histograms where M > £ 1 3 > £ 1 4 > u. 15. Scheme H U K For each image, we examine the Level-H histogram, then consider (if necessary) the 4 Level-I histograms, (if necessary) the 4 Level-J histograms which represent the subregions enclosed in the promising block of Level I, and (if necessary) the 4 Level-K histograms which represent the subregions enclosed in the promising block of Level J . " • CPU%YJK = (M•+ 4 & S 4 4 £ 1 6 + 4 £ i 7 ) r c • 1 / O H V U K = TA + (M-1 + ^ ) T M + ( M + for 8-D histograms 1/OHVIJK = TA + ( M - 1 + ^ + £ir)TM + (M + 65 + 6e + Sir)TD for 64-D histograms IIQmiK = TA + (M-l + %f + Z17)TM + ( 2 M + 865 + 8 £ 1 6 + ^17)TD for 512-D histograms where M >.^15 > > ^17 > u . 80 Notice from the cost models of the filtering schemes which involve more than one level, the value of each £j is determined dynamically. For instance, in the best case of Scheme HIJ , the first u images are the top-w, and each Level-H distance for the remaining M — u images is greater than all Level-J. distances of the top u images. As such, the histograms at Levels I and J for the M — u images are not examined; thus, £ 4 = £5 = u. Conversely, in the worst case, at any point in time the Level-H and Level-I distances of each image are less than some Level-J distance of the current top-u. As such, all the histograms at all levels for the M images are examined; thus, £ 4 = £5 = M. Analytical Results The above cost models provide a good foundation for analyzing the 15 filtering schemes, and for discovering general trends. The computation time for Padding and Reduction Algorithms (Tc) depends on the sizes of the subimage query and the image subregion. For the analysis, we let Tc be 3 ms for the 8-dimensional and the 64-dimensional histograms, and 20 ms for the 512-dimensional histograms. We assume that each minimum seek (TM) takes 5 ms, each average seek (TA) requires 15 ms, and the transfer of data in each page (TD) with page size P = 1 K B needs 0.3 ms. The analytical results for finding u = 10 images from a database of M=1000 images are summarized in the figures on the next few pages. In the figures, for each filtering scheme, the minimum cost is shown by the solid iine, and the additional cost (if any) is indicated by the dotted line 81 CPU Costs for Search PV (n = 8, 64) 0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2 Time (in ms) x (a) 8-, 64-dimensional Color Histograms CPU Costs for Seach PV (n = 512) (b) 512-dimensional Color Histograms Figure 4.4: C P U Costs for Search P V 82 Costs I/O Costs for Search PV (n = 8) HIJK UK HJK HIK JK IK HK -K -J -HI 0 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000 Time (in ms) (a) 8-dimensional Color Histogram I/O Costs (or Search PV (n = 64) HIJK UK HJK HIK JK IK K HIJ U HJ HI I 0 2000 4000 6000 8000 10000 12000 14000 16000 Time (in ms) (b) 64-dimensional Color Histogram Figure 4.5: I /O Costs for Search P V 83 I/O Costs lor Search PV (n = 512) HI H HUK UK HJK HIK JK IK HIJ IJ HK K 0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5 Time (in ms) (c) 512-dimensional Color Histogram Figure 4.5: I/O Costs for Search P V (Continued) 84 • Combined C P U and I/O Costs CPU * I/O Costs tor Search PV (n = 8) 1 1.5 Time (in ms) (a) 8-dimensiona.l Color Histogram CPU + I/O Costs for Search PV (n = 64) 1 1.5 Time (in ms) (b) 64-dimensional Color Histogram Figure 4.6: Combined C P U and I/O Costs for Search P V 85 CPU + I/O Costs lor Search PV (n = 512) g> 10 6 HIJK UK HJK HIK JK 1 1 1 IK HK HIJ IJ HJ J HI K -H i i i i 0 2 4 Time (in ms) »0i 12 14 x 10s (c) 512-dimensional Color Histogram Figure 4.6: Combined C P U and I/O Costs for Search P V (Continued) Observat ion 4.3 The C P U cost is affected by the dimension of the color histogram: As the dimension of the histogram grows, the time required for C P U operations increases. Explanation The C P U cost is proportional to the computation time Tc for Algorithms P A D and R E D . Recall from Observation 3.1 that Tc increases as the dimension of the histogram grows. • Observat ion 4.4 For low dimensional (for example, 8-dimensional) color histograms, the I/O costs for many filtering schemes appear to be the same. For example, I/OHV — I/O% = I/O%J. Explanation Each of the 8-dimensional histograms occupies less than 4% of a page. So, the Level-H, Level-I, and Level-J "records" associated with an image can fit into one page. As a result, after the Level-H "record" of histogram is loaded, no extra page access is needed for histograms contained in the Level-I and Level-J "records". • NO Observation 4.5 The I /O cost is also affected by the dimension of the color histogram: As the dimension of the histogram grows, the time required for I /O operations increases. Explanation The I/O cost depends on the number of pages occupied by the n-dimensiqnal color histograms. As the dimension of the histograms grows, the number of pages occupied by one histogram (= ^ pages where P is the page size) increases, and thus the total number of pages occupied increases as well. Hence, the I /O cost increases. . . . For the same reason, as the page size P shrinks, the number of pages occupied by one histogram (= ^ pages) increases, and thus the time required for I /O operations also increases. • Observation 4.6 The combined C P U and I/O cost is affected by the dimension of the color histogram: As the dimension of the histogram grows, the time required for both C P U and I /O operations increases. Explanation Note from Observation 4.3 that the C P U cost increases as the dimension of the histogram grows, and from Observation 4.5 that the I /O cost also increases as the dimension of the histogram grows. Thus, the combined C P U and I /O cost follows the same trend. • Observation 4.7 Filtering schemes which start with filter J or K often take more C P U and I /O time. For example, the combined C P U and I /O times demanded by Schemes J , J K , and K are often longer than the times.taken by the worst cases of many filtering schemes such as Schemes HIJ and HI JK . Explanation For the scheme that starts with filter J , at least 1 6 M histograms are required to be examined; for the scheme that starts with filter K , at least 6 4 M histograms are required to be examined. These numbers (of histograms to be examined) are large when compared with at most 9 M (= M + AM + AM) histograms for Scheme HIJ and at most 1 3 M (= M + 4 M + 4 M + AM) histograms for Scheme H I J K . • 87 Observation 4.8 Filtering schemes which skip intermediate levels often incur greater C P U and I /O cost. Explanation For the filtering scheme that does not skip any level, at the next (finer) scale, we consider the 4 histograms which represent the subregions enclosed in the most promising block of the current scale. If a filtering scheme skips one level, then we bypass the 4 histograms at the level skipped; however, at the next available scale, we need to consider 16 histograms, each of which covers a subregion enclosed in the promising block of the current scale. Similarly, if a filtering scheme skips two consecutive levels, then we bypass the 8 (= 4 + 4) histograms at the levels skipped. Unfortunately, at the next available scale, we need to consider 64 histograms, each of which covers a subregion enclosed in the promising block of the current scale. • Observation 4.9 The combined C P U and I /O cost for the additional intermediate levels becomes relatively less expensive as the dimension of the histogram grows. For example, as the dimension of the histogram increases, the combined cost for Scheme HJ is usually more expensive than that for Scheme HIJ . Similarly, the combined cost for Scheme H K is usually more expensive than that for Scheme HIK or H J K , and either of these two costs is more expensive than that for Scheme HI JK . Explanation Following from the previous observation, if a filtering scheme skips one level, we need to consider an extra 12 (= 16 — 4) histograms, and if the filtering scheme skips two consecutive levels, we need to consider an extra 56 (= 64 — 8) histograms. As the dimension of the color histogram grows, fewer histograms fit into a page. Thus, the total number of pages occupied by the histograms increases. It is relatively more expensive to bypass the intermediate levels. • In general, the above analytical results tend to favor the filtering schemes which start with Level H (or Level I), and those that do not skip any intermediate level. More precisely, Schemes H , I, HI, IJ, HIJ , U K , and HIJK are favored. 88 4.3.2 Experimental Evaluation To find the appropriate filtering schemes for Search P V , several experiments have been performed using color histograms of 8, 64, and 512 dimensions. These color histograms are created for a database consisting of 1000 real images which are collected from various sources and stored in Vista formats [Pope and Lowe 1994]. Examples of these images are pictures of tourist attractions in British Columbia, photos of a water project in California, and scenic shots taken at different cities throughout the world. As observed in Section 3.4, the computation time for Padding and Reduction A l -gorithms is affected by the sizes of the subimage query and the image subregion. In the experiments, we used 48 subimage queries of different sizes. These queries were classified into four main types, and we studied the performance/speed and the accuracy of each query type. D e f i n i t i o n 4.3 Given that subimage queries can be of arbitrary size, we classified them into 4 main types: 1. "h"-typed queries — whose sizes are larger than image subregions at Level I, 2. "i"-typed queries — whose sizes are smaller than image subregions at Level I but larger than image subregions at Level J , 3. "j"-typed queries — whose sizes are smaller than image subregions at Level J but larger than image subregions at Level K , and 4. "k"-typed queries -— whose sizes are smaller than image subregions at Level K . • The execution time (denoted by TE, and measuring the combined C P U & I / O time per query on a database of 1000 images) can be used in assessing the efficiency of Search P V . Several measures can be applied in assessing the effectiveness of the strategy. Examples are (1) fallout and (2) standard recall and precision [Salton and M c G i l l 1983].'Fallout shows 89 the portion of non-relevant images being retrieved. Standard recall indicates the portion of the relevant images being retrieved, and standard precision indicates the portion of the retrieved images that are relevant. One problem with this set of measures is that images are categorized into relevant and non-relevant; however, the degree of relevancy cannot be expressed. Another example of a measure for assessing the effectiveness of the search strategy is a variant of normalized recall [Faloutsos et al 1994]. It calculates the ratio of the average rank of all retrieved relevant images to the ideal average rank. However, . images may sometimes be so similar that they are almost indistinguishable; in such cases, it may not be easy to assign exact ranks to the images. Hence, for each subimage query, we categorize the database images into five main classes such that the best 2 images fall into Class A , the next 3 into Class B, and the next 5'into Class C . Then, 15 images ranked from 11-th to 25-th are categorized into Class D , while the remaining M — 25 images are categorized into Class E . A l l images in the same class have the same rank. D e f i n i t i o n 4.4 Wi th the categorization of images into five classes, the effectiveness of the search strategy can be assessed by counting the number of retrieved images which fall into each class. A dissimilarity score (DS) for the retrieval of the best 10 images can be computed as a sum of the weighted difference: DS = 7(2 - rjA) + 5(3 - riB) + (5 - TJC) + 1 5 ^ + 40nE . (4.4) where rjj indicates the number of images falling into Class In the ideal situation, the /AS' is 0. • The experiments were run on a Sun UltraSPARC-1 workstation using a page size of 1 K B and user preference (3 of 0.5 6 . The results (the average DS and TE values for the retrievals of the best 10 images from a collection of 1000 images) are summarized in 6 W i t h the user preference /? set to 0.5, both the color distribution and spatial information are of the same level of importance. 90 the tables and figures on the next few pages. The TE measures the efficiency and the DS assesses the effectiveness. • "h"-typed queries Filtering Scheme Dimension of Color Histograms 8-dimensional at T)B vc m 64-dimensional VA VB VC 'ID 512-dimensional VA f]B VC VD H 0.7 0.6 1.5 7.2 => DS = 132.6 1.9 2.1 1.7 4.3 =* DS = 73.0 1.9 2.4 1.7 4.0 =* DS = 67.0 TE = 2.4 ± 1.1 s TE = 4.5 ± 3.8 s TE = 43.4 ± 22.5 s Table 4.1: Experimental Results for Search P V ("h"-typed Queries) Results for Search PV ("h"-typed Queries) 90 100 110 Dissimilarity Score Figure 4.7: Experimental Results for Search P V ("h"-typed Queries) Among the three dimensions of color histograms, the 8-dimensional case appears to require the shortest execution time, but it has the largest dissimilarity value. By contrast, the 512-dimensional case has the smallest dissimilarity value, but it requires the longest execution time. So, in keeping a balance of efficiency and effectiveness, the best filtering scheme for "h"-typed queries, when operated with Padding and Reduction Algorithms, is Scheme H with 64-dimensional color histograms. 91 • " i " - t y p e d queries F i l te r ing Scheme-Dimension of Color Histograms 8-dimensional VA VB W 1)D 64-dimensional VA TIB Tjc TID 512-dimensional VA VB VC VD H 0.3 0.9 1.3 7.5 =* DS = 138.6 1.5 2.2 1.8 4.5 DS = 78.2 1.9 2.4 1.4 4.3 => 7J5 = 71.8 TE = 2.5 ± 0.6 s TE = 5.5 ± 1.5 s TE = 42.1 ± 27.2 s I 0.5 0.8 1.3 7.4 => DS = 136.2 1.8 2.1 1.8 4.3 D 5 = 73.6 1.9 2.4 1.5 4.2 DS = 70.2 TE = 4.1 ± 2.1 s TJE; = 9.2 ± 8.5 s TE = 127.4 ± 87.9 s HI 0.5 0.8 1.3 7.4 => O S =136 .2 1.8 2.1 1.8 4.3 => DS = 73.6 1.9 2.4 1.5 4.2 => 7J5 = 70.2 7^ = 2.7 ± 0.6 s T E = 7.1 ± 0.8 s I k = 60.4 ± 27.8 s Table 4.2: Exper imental Results for Search P V (" i " - typed Queries) Results for Search PV ("i"-typed Queries) X512I X 5 1 2 H I H 5 1 2 H _ i i i i , i , o n i ~ t p n i 70 80 90 100 110 120 130 140 Dissimilarity Score Figure 4.8: Exper imenta l Results for Search P V (" i " - typed Queries) 120 100 40 20 92 To aim for a balance of performance and accuracy, the best filtering scheme for " i " -typed queries, as observed from the above experimental results, is Scheme HI with 64-dimensional color histograms. "j"-typed queries Filtering Scheme Dimension of Color Histograms 8-dimensional VA VB VC VD . 64-dimensional 7)A VB VC VD 512-dimensional VA VB VC VD H 0.0 0.8 1.2 8.0 =>-DS = 148.8 0.9 2.1 2.5 4.5 => DS = 82.2 1.5 2.5 1.2 4.8 => DS = 81.8 TE = 1.5 ± 0.1 s . TE = 2.6 ± 0.2 s TE = 23.7 ± 9.3 s I 0.0 1.0 1.3 7.7 . DS = 143.2 1.5 2.2 1.8 4.5 ^DS = 78.2 1.7 2.3 1.5 4.5 •.••'=> DS = 76.6 TE = 3.0 ± 0.6 s TE = 8.5 ± 1.6 s TE = 168.3 ± 25.6 s HI 0.0 1.0 1.3 7.7 => DS = 143.2 1.5 2.2 1.8 4,5 . => DS = 78.2 1,7 2.3 1.5 4.5 => DS = 76.6 TE = 1.5 ± 0.1 s TE = 3.4 ± 0.7 s TE = 36.5 ± 16.7 s IJ 0.5 0.8 1.3 7.4 DS = 136.2 1.8 2.0 1.9 4.3 ' =>DS = 74.0' 1.7 2.3 1.7 4.3 DS — 73.4 TE = 3.4 ± 0.5 s TE = 9.2 ± 1.6 s TE = 188.0 ± 37.4 s HIJ 0.5 0.8 1.3 7.4 => DS = 136.2 1.8 2.0 1.9 4.3 => DS = 74.0 1.7 2.3 1.7 4.3 => DS = 73.4 TE = 2.8 ± 0.3 s TE = 5.8 ± 1.7 s TE = 68.5 ± -31.7 s Table 4.3: Experimental Results for Search P V ("j"-typed Queries) 93 Results lor Search P V ("j"-typed Queries) 200 1 1 1 1 1 1 1 1 1 180 *512IJ 160 X512I -140 -(in sec; o • Execution Time O) CD O O O O X512HI J -40 X512HI • 20 0 X 5 1 2 H R i l l 6 4 1 64IJ o o 64HIJ 0 o 0 64H 64H| 8IJ 81 + * +8H -, 8HIJ , 8HI 70 80 90 100 110 120 Dissimilarity Sco re 130 140 150 Figure 4.9: Exper imenta l Results for Search P V (" j " - typed Queries) To aim for a balance of efficiency and effectiveness, Scheme HIJ wi th 64-dimensional color histograms is the best filtering scheme for " j " - typed queries, when operated with Padding and Reduction Algor i thms . 94 k"-typed queries Filtering Dimension of Color Histograms Scheme 8-dimensional 64-dimensional 512-dimensional VA VB VC VD VA VB VC VD VA VB VC VD 0.0 0.0 1.5 8.5 0.5 2.0 2.7 4.8 1.0 2.1 1.9 5.0 H =• DS = 160.0 =>'DS = 89.8 . =*• DS = 89.6 TE = 1.2 ± 0.0 s TE = 1.8 ± 0.1 s l b = 18.9 ± 6.6 s 0.0 0.5 1.5 8.0 0.8 2.3 2.2 4.7 1.6 2.1 1.5 4.8 I =>• DS = 150.0 => DS = 85.2 =>DS = 82.8 TE = 1.6 ± 0.2 s l b = 4.2 ± 0.5 s TE = 124.1 ± 11.1 s 0.0 0.5 1.5 8.0 0.8 2.3 2.2 4.7 1.6 2.1 1.5 4.8 HI => DS = 150.0 => DS = 85.2 => DS = 82.8 l b = 1.2 ± 0.0 s l b = 2.0 ± 0.1 s l b = 25.7 ± 11.8 s 0.4 0.3 1.3 8.0 1.3 2.0 2.1 4.6 1.6 2.2 1.5 4.7 IJ DS = 148.4 => DS = 81.8 =• DS = 80.8 TE = 1.6 ± 0.2 s l b = 4.4 ± 0.7 s TE = 133.3 ± 18.9 s 0.4 0.3 1.3 8.0 1.3 2.0 2.1 4.6 1.6 2.2 1.5 4.7 HIJ DS = 148.4 => DS = 81.8 ' => DS = 80.8 TE = 1.2 ± 0.1 s TE = 2.8 ± 0.7 s TE = 52.8 ± 34.3 s 0.4 0.9 0.7 8.1 1.3 2.1 2.0 4.6 1.6 2.2 1.5 4.7 U K => DS = 147.5 '=> £>5. = 81.4 ^ DS = 80.8 . TE = 1.8 ± 0.4 s T £ . - 5.6 ± 1.3 s l b = 167.7 ± 52.6 s 0.4 0.9 0.7 8.1 1.3 2.1 2.0 4.6 1.6 2.2 1.5 4.7 HI JK => 7J5 = 147.5 => DS = 81.4 DS = 80.8 l b = 1.6 ± 0.5 s l b = 4.4 ± 1.5 s l b = 72.1 ± 48.4 s Table 4.4: Experimental Results for Search P V ("k"-typed Queries) 95 Results for Search PV ("k'-typed Queries) 180r 1 1 1 '512IJK --S12IJ . y 5 1 2 l , X512HIJK K512HIJ 512HI • X512H Zoom i • • i 8IJK 8IJ 81 8H 8HIJK8HU8HI 160 140 - 1 2 0 CD (0 =•100 <D E I-c 80 o s S 60 tu 40 20 0 110 120 130 140 150 160 Dissimilarity Score Zoom ^ 4 o 64IJK 64HIJKO o 64IJ 64I !64HI .^64H 0I 1 : 1 1 1 . 1 : 1 80 82 84 86 88 90 92 Dissimilarity Score Figure 4.10: Experimental Results for Search P V ("k"-typed Queries) 96 To aim for a balance of performance and accuracy, the best filtering scheme for "k"-typed queries, as observed from the above experimental results, is Scheme HIJ with 64-dimensional color histograms. 4.4 Summary Given that subimage queries can be of arbitrary size, picking one best scale at which the images are blocked is not easy. To cope with this challenge, a multiscale representation is proposed. In order to incorporate both color similarity and spatial similarity, the multi-scale distance function is defined as a weighted- sum of the histogram distance (DH) and the positional distance (Dp): . D = (3DH + (l-P)Dp . where the weighting factor (f3) is a user preference for specifying the relative importance of the two distances above. The DH can be estimated using Padding and Reduction Algorithms; the Dp can be estimated as the square of the shortest Euclidean distance between the query and the image subregion. , To aim for efficient and effective retrievals of desired images, the above multiscale distance function has also been formulated in such a way that given a query and an im-age subregion, the distance value,at the coarser scale can serve as the distance value at the finer scale. With this formulation of the distance function, an efficient multiscale search strategy with the use of vertical filters (such as Search PV) is possible. Using Search P V , we analytically and experimentally investigated the suitable.number of levels for the representation. In the analyses, we set up cost models to evaluate the 15 possible filtering schemes for the 4-level representation, and estimated their C P U , I /O, and com-bined C P U & J / O costs. The analytical results tend to favor the filtering schemes which start with Level H (or Level I), and those that do not skip intermediate levels. In the ex-• .97' ' . ' periments, subimage queries of arbitrary size were classified into four main types, and we measured the execution time (to evaluate the performance) and the dissimilarity score (to evaluate the degree of accuracy) for each filtering scheme that is favored by the analytical results. The experimental results show that when operated with Padding and Reduction Algorithms, Scheme H with 64-dimensional histograms is best for the "h"-typed queries, Scheme HI is best for the "i"-typed queries, and Scheme HIJ is best for both.the "j"-typed and the "k"-typed queries: Moreover, among the three dimensions of histograms, while delivering better performance, the 8-dimensional cases suffer from a loss of accuracy. Con-versely,, while delivering better accuracy, the 512-dimensional cases suffer from a loss of performance/speed. Based on these analytical and experimental results, we conclude that desired images can be retrieved efficiently and effectively using only the first three levels of the proposed representation. Hence, the recommended multiscale representation (Figure 4.11) can be described as follows: • A t Level H , the entire image is represented by a single global 64-dimensional color histogram. • A t Level I, the image is divided into.four non-overlapping blocks, and each block is represented by a 64-dimensional color histogram covering | of the entire image. • A t Level J , each block at Level I is further divided into four blocks, each of which is represented by a 64-dimensional color histogram covering ^ of the entire image. 98 (a) Level H (b) Level I (c) Level J Figure 4.11: The Recommended Multiscale Representation 99 Chapter 5 Search Strategies To search a multiscale representation, some search strategies may require frequent jumping back and forth in the data file so as to get the necessary feature vectors representing the subregions of the images. However, such jumping makes it hard to optimize file organization and buffer management, and may generate a large number of I/Os. To avoid such jumping, a search strategy — Pure Vertical — was proposed and studied in Chapter 4. In addition to the Pure Vertical Search, we consider two other strategies, namely Pure Horizontal and Horizontal-and-Vertical. We investigate analytically and experimentally these strategies so as to find the best one which avoids frequent jumping and at the same time maintains a good balance of efficiency and effectiveness. 100 5.1 Problems of Branch-and-Bound Search To use the branch-and-bound strategy for searching multiple scales [Chen et al 1997], all the database images are first checked at the coarsest scale, and the search proceeds to finer scales in non-descending order of distance value. In general, the branch-and-bound search works in such a way that it always, keeps track of the distance values of all images contending for further consideration. Images with smallest values are "extended" to a finer scale. Then, these most recently "extended" images are considered along with the remaining ones. Again, images with smallest values are "extended". The process repeats until the target images are found. D e f i n i t i o n 5.1 To conduct the branch-and-bound search for retrieving the top u 1 images from the collection of M images: 1. Check all M images in the database at the coarse scale in the first iteration, and compute the distance value for each image. 2. Sort all images in non-descending order of distance value. Images with the current u smallest values become the current top-u. 3. Repeat until all the top u images have reached the finest scale: (a) For each of the u images, i . if the image has not reached the finest scale, extend one level/scale; i i . update the distance value of the extended image with the distance at the . extended scale. (b) Sort all images in non-descending order of distance value, and obtain top u images for the next iteration. • 'As mentioned in Chapter 4, u is the number of images requested by the user. 101 Due to the effectiveness of the brand-and-bound strategy for handling whole-image queries, this strategy can be adapted to handle subimage queries. However, at each iteration of the branch-and-bound search, the feature vectors used in the computation of distance values may be at a different level/scale and may be for images different from those in the previous iteration. As a result, it can be inefficient for large databases. In particular, the number of images the strategy must keep track,of can be large. Jumping back and forth in the data file to get the necessary feature vectors for computation seems unavoidable. Example 5.1 In many database applications, it is not unusual to retrieve the desired images from a collection of thousands of images. For simplicity, in this example, we try to find the top two images from a collection of seven images using the branch-and-bound search (with Scheme HIJ). In the first iteration of the branch-and-bound search, all seven images are checked at the coarse scale (Level H) , and a distance value is computed for each image. These images are then sorted in non-descending order of distance value, and the images with the two smallest values are the potential top two images. In each subsequent iteration, the potential top two images are extended one level/scale, and their distance values are updated; images are then sorted and a new set of the potential top two images is obtained. In the following trace, the number at the slot representing an image at a particular level is the estimated distance value, and the superscript on its left denotes the search order (with the iteration number in brackets). r b find the top two images using the branch-and-bound search Img 1 Img 2 Img 3 Img 4 Img 5 Img 6 Img 7 Level H Level I Level J l 5 t ( l ) 2 5 2nd(l) 5 . 3rd(l)g 5 4th(l)8Q 5th(l)1Q 6th(l)7Q 7<fc(l)g8 11^(3)35 8t/i(2)3Q 9<M2)i5 12t/i(4)32 10tfc(3)2() 102 Observe that jumping back and forth among images and levels may frequently be required, which makes it hard to optimize file organization and buffer management, and may impose a high I /O cost. Hence, we consider three search strategies — Search P V (Pure Vertical), Search P H (Pure Horizontal), and Search H V (Horizontal-and-Vertical) — which avoid such jumping. 5.2 Pure Vertical Search We have studied Search P V (Pure Vertical) in the last chapter; here, we summarize the main points. Like the branch-and-bound search, Search P V is also accurate in its search. However, instead of jumping back and forth to get the necessary data during the search, Search P V checks the images one after another. A t any point in time, a set of u images currently having smallest distance values are kept. For each image, the algorithm keeps proceeding to finer scales until (1) the distance value at a particular scale is already so large that the image cannot qualify as a good match, or (2) the finest scale is reached and the image is either discarded or selected as a member of the answer set, depending on the distance value. -To investigate the efficiency and the effectiveness of Search P V , analytical and experimental evaluations have been carried out. The results show that when operated with Padding and Reduction Algorithms, Scheme H with 64-dimensional histograms is best for the "h"-typed queries, Scheme HI is best for the "i"-typed queries, and Scheme HIJ is best for both the "j"-typed and the "k"-typed queries. 5.3 Pure Horizontal Search Note that Search P V tends to require many comparisons at the finest scale, particularly at the beginning of the search. Thus, we consider another search strategy — Search P H 103 (Pure Horizontal) — that may require fewer comparisons at the finest scale. In Search P H , horizontal filters search "horizontally" across the database level by level, and the number of comparisons are predetermined. The idea is that all M images in the database are checked by a horizontal filter at the coarse scale in the first iteration. The best fi matches (where u < .fi <C M, and / i is a predetermined parameter that controls the number of comparisons) are carried over from the current level to the next level (finer scale), while poor matches at the coarse scale are eliminated. In subsequent iterations, images left from the previous level are checked by horizontal filters at finer scales, and poor matches are again removed. The process repeats until the finest scale is reached and the top u images are returned. D e f i n i t i o n 5.2 (Search P H ) To conduct the Pure Horizontal Search for retrieving the top u images from a collection of M images: 1. Check all M images at the coarsest scale/level, and compute the distance value for each image. 2. Sort the images in non-descending order of distance value. A subset of. these images (the fi images with current smallest values) are carried over to the next level. 3. For each of the remaining levels: (a) Check the images which are carried over from the previous level, and compute the distance value for each image. (b) Sort the images in non-descending order of distance value; a subset of these images are again carried over to the next level. (c) Repeat Steps 3 (a) and (b) for the next level. • 104 E x a m p l e 5.2 Let us find the top two images from a collection of seven images using Search P H (with Scheme HIJ) 2 . In the following trace, the number at the slot representing an image at a particular level is the estimated distance value, and the superscript on its left denotes the search order. To find the top two images using Search P H Img 1 Img 2 Img 3 Img 4 Img 5 Img 6 Img 7 Level H 1s t 25 2 n d tj ' 3rd gg 4th gg 5th j^ Q 6 t / i yg 7th gg . - . = > • top five images are Imgs 1, 2, 3, 5, and 7 Level I sth 3 5 9^-30. ioth 200 n t h 15 1 2 t h 78 =>• top three images are Imgs 1, 2, and 5 Level J 13th g j 14th 32 15th 20 top two images are Imgs 2 and 5 Since the histograms are examined level by level, the best way to organize the precomputed feature vectors in the data file is to arrange them on a level-by-level basis. More precisely, the histograms of one level are followed by the histograms of another level (finer scale). Therefore, the file organization is of the form: /H /H . . . / H j l 7 I g . . . 7 I g 7 J g 7 J S . . . 7 J S 7K g 7K g ; . . j t S V ' V V < N V ^ N ^ ' for M images for M images for M images for M images 5.3.1 Analytical Evaluation Cost M o d e l s To measure the efficiency of Search P H , we set up cost models to estimate the C P U and the I /O costs. The C P U cost depends mainly on the time required to apply Padding and 2 In many database applications, it is not unusual to retrieve the desired images from a collection of thousands of images. For simplicity, in the example, we try to retrieve two desired images from a collection of seven images. 105 Reduction Algorithms to the data (color histograms), and the computation time for each histogram (denoted by Tc) can be estimated using the experimental results in Section 3.4. The I /O cost depends mainly on the time required to sequentially or randomly access the pages containing the data (color histograms), and this access time can be affected by the size of data as well as the number of buffers. With the file organization described above, histograms at the first chosen level can be accessed sequentially. Given a minimum buffer size (one page) and an intelligent buffer management scheme, the number of random page accesses at the remaining levels can be estimated with the use of Cardenas' Formula [Cardenas 1975] or Yao's Formula [Yao 1977]. • Definition 5.3 (Cardenas' Formula) Given that R "records" divided into m pages and that r "records" satisfying the query are distributed uniformly among the m pages, the expression 1 — ^ gives the probability that a particular page does not contain a particular "record". If r "records" are selected independently, then the probability that a particular page not being hit is given by (1 — ^-) r , and 1 — (1 — ^ ) r gives the probability that a particular page is hit. Therefore, the number of page accesses for a given query can be estimated using C(m, r) = m i - n - I m (5.1) Definition 5.4 (Yao's Formula) Given R "records" grouped into m pages (where 1 < fn < R), each contains ^ "records". If r "records" (where r < R — ^ ) are randomly selected from the R "records", the expected number of page accesses (pages with at least one "record" selected) is given by > ( i - ^ ) - i + i " i - n Y(R,m,r) = m If r > R — ^ or m = 1, then all m pages are expected to be accessed 106 (5.2) In the cost models, color histograms at each level for each image are treated as a "record". Depending on the level at which the histograms are represented, the size of "record" may vary. Sometimes a "record" may occupy a small portion.of one page. For example, with a page size of 1 kilobyte (KB) , the "record" of one Level-H 64-dimensional color histogram takes less than 30% of a page. With the file organization for Search P H , after the page containing a "record" of the Level-H histograms is loaded, no extra page access is needed for reading the Level-H "record" of the next image. However, sometimes a "record" may occupy more than one page. For example, with the above page size, the "record" of sixteen Level-J 64-dimensional color histograms occupies more than 15 pages. The cost of accessing all the histograms in this "record" is the sum of the time required to randomly access the first page and the time required to sequentially access the remaining pages. Like Search P V , for the vast majority of subimage queries, in the case that the image subregions I s u b and I s u p are the best matches (which give the smallest TJs) at their corresponding levels, I s u b is one of the subregions enclosed in I s u p . When aiming for a balance of efficiency and effectiveness, checking all the histograms in a "record" may seem unnecessary, and only a portion of the "record" (some of the histograms) may need to be examined. As a result, both the C P U and the I/O costs can be reduced. E x a m p l e 5.3 In Scheme HIJ , we examine all M. histograms at Level H , and carry the best [iff images over to Level I where the 4 Level-I histograms for each of these images are examined. Then, at Level J , for each of the best \x\ images (where fijf > H.i) carried over from Level I, we examine the 4 histograms representing the subregions enclosed in the promising block of Level I. Hence, using a 64-dimensional color histogram with page size P — 1 K B , the cost model for Scheme HIJ can be described as follows: • The C P U cost is the product of Tc and the total number of histograms used in the computation. In Scheme HIJ , Level-H histograms are examined for all M images, 107 4 Level-I histograms are checked for each of the HH images, and 4 Level-J histograms are examined for each of the fir images (where M >^ fiH > > u, and both fiH and ni are predefined). Hence, CPU^fj = (M + 4ftH + 4fii)Tc. • The I /O cost is the sum of total seek times and total data transfer times. Using the above file organization for Search P H , an average seek time (denoted by TA) is charged for moving the read head to the first "record" at Level H , and for each random page access of "records" at the remaining levels. When a page of data (color histograms) is loaded, a data transfer time (denoted by'To) is charged. . More precisely, for Scheme HIJ , TA is needed to get to the Level-H his-togram of the first image, and Try is charged for loading the page containing the histogram. Since all Level-H histograms are stored contiguously, no seek is charged for accessing subsequent Level-H histograms, and only Try is charged for each se-quential page access of data. A t Levels I and J , the number of random page ac-cesses can be estimated by Cardenas' Formula, and each random page access re-quires one TA and one TD. Hence, I/Offij = [1 + C*(M, fiH) + C(4M, /*/)] TA + f + C(M, m ) + C(4M, HI)] TD. The cost models for other filtering schemes, color histograms of other dimensions, other page sizes, or for Yao's Formula can be formulated in a similar manner. • The cost models for the 15 filtering schemes (with page size P •= 1 K B , using 64-dimensional color histograms and Cardenas' Formula) are shown below: 1. Scheme H We examine all M histograms at Level H . • CPU'/f11 = M'I'c • l/OpHH = TA + frD 108 2. Scheme I We examine all AM histograms at Level I. • CPU1/" = AMTC • I/O','" = TA + MTD 3. Scheme HI We examine all M histograms at Level H , and carry the best \m images over to Level I where the 4 histograms for each of these images are examined. ... CPUgf = (M + AfiH)Tc ' • 110m = [1 + C ( M ' TA+[f + C ( M , HH)\ TD where fifj is a predefined parameter that controls the number of images to be carried over from Level H.to the next level (Level I) 3 , and its value lies between u and M. A. Scheme J We examine all 1 6 M histograms at Level J . • CPU'j" - WMTC • I/Qij" = TA+ AMTD 5. Scheme H.I . We examine all M histograms at Level H , and carry the best \iu images over to Level J where the.16 histograms for each of these images are examined. • CPUf/j = ( M + 16////)7b . • I/Offi = [1 + C ( 4 M , nH)] TA + f + 4 C ( 4 M , M ) I TD M 3Similarly, fij and in the following cost models control the number of images to be carried over to the next available level from Level I and Level J respectively. 109 6. Scheme IJ '. We examine all AM histograms at Level I, and carry the best fii images over to Level J where for each of these images, the 4 histograms representing the subregions enclosed in the promising block of Level I are examined. . CPUff = (4M + 4fn)Tc • I/Off = [1 + C ( 4 M , m)] TA + [M + C(AM, ^)]TD 7. Scheme HIJ We examine all M histograms at Level H , and carry the best fifj images over to Level I where the 4 Level-I histograms for each of these images are examined. Then, at Level J , for each of the best fi[ images (where > fir) carried over from Level I, " we examine the 4 histograms representing the subregions enclosed in the promising block of Level I. • CPUffij = ( M + 4HH + 4fn)Tc • I/Ogfj = [1 + C(M, HH) + C(4M, m)]TA+ [ f + C ( M , nH) + C ( 4 M , ^ ) ] TD 8. Scheme K We examine all 6 4 M histograms at Level K . • CPUfc" = UMTc • I/Off — TA + lQMTp 9. Scheme H K We examine all M histograms at Level H , and carry the best flu images over to the Level K where the 64 histograms for each of these images are examined. • CPU™ = (M + 6A[iH)Tc • IIOHK = [1 + C(WM,pH)] TA + [ f + 16C(16M, m ) ] TD 110 10. Scheme IK We examine all AM histograms at Level I, and carry the best HI images over to. Level K where for each of these images, the 16 histograms representing the subregions enclosed in the promising block of Level I are examined. • CPUfg = ( 4 M + i6/*/)r c • / / O f * ? = [1 + C ( 1 6 M , A /^)] TA + [M + 4 C ( 1 6 M , HI)] TD 11. Scheme J K We examine all 1 6 M histograms at Level J , and carry the best fiJ images over to Level K where for each of these images, the 4 histograms representing the subregions enclosed in the promising block of Level J are examined. • CPUfg = {16M + A(ij)TC • I/0^ = [l + C(16M,iij)]TA.+ [4M + C(16M,fiJ)]TD . 12. Scheme HIK We examine all M histograms at Level H , and carry-the best HH images over to Level I where the 4 Level-I histograms for each of these images are examined. Then, at Level K , for each of the best HI images (where HH > Hi) carried over from Level I, we examine the 16 histograms representing the subregions enclosed in the promising block of Level I. • CPUffiK = (M + AHH + 16 W )7b ' •. • r/0^K = [i+C(M,HH) + C(16M,fzi)]TA f + C ( M , IIH) •+ 4C(16M, HI)} TD 13. Scheme H J K We examine all M histograms at Level H , and carry the best HH images over to Level J where the 16 Level-J histograms for each of these images are examined. I l l Then, at Level K , for each of the best HJ images (where fijj > /i j) carried over from Level J , we examine the 4 histograms representing the subregions enclosed in the promising block of Level J . • CPU$K = (M+16pH + 4iAj)Tc • I / 0 ^ K = [1 + C(4M, fiH) + C ( 1 6 M , pj)] TA + + 4 C ( 4 M , HH) + C ( 1 6 M , fij)] TD 14. Scheme U K We examine all 4M histograms at Level I, and carry the best fir images.over to Level J where for each of these images, the 4 histograms representing the subregions enclosed in the promising block of Level I are examined. Then, at Level K , for each of the best \ij images (where fii > fij) carried over from Level J , we examine the 4 histograms representing the subregions enclosed in the promising block of Level J . • CPUfJlc = (4M + 4m + 4nj)TC • J/OWK = [l + C(4M,nI)+C(lQM,fij)]TA + [M + C (4M, HI) + C ( 1 6 M , TD 15. Scheme H I J K We examine all M histograms at Level H , and carry the best JJLJJ images over to Level I where the 4 Level-I histograms for each of these images are examined. Then, at Level J , for each of the best ^/images (where HH > m) carried over from Level I, we examine the 4 histograms representing the subregions enclosed in the promising block of Level I, and carry the best fij images (where fir > fij) over to Level K where the 4 Level-K histograms for each of these images are examined.. • CPUffijK = ( M + 4fiH + 4iii + 4fij)TC 112 I/OffjK = [1 + C(M, H H ) + C ( 4 M , HI) + C(16M, HJ)] TA , + [f + C(M,HH) + C(4M,Hi)+C(16M,Hj)]TD Analytical Results The above cost models provide a good foundation for analyzing the 15 filtering schemes, and for discovering general trends. The computation time for Padding and Reduction Algorithms (Tc) depends on the sizes of the subimage query and the image subregion. For the analysis, we let Tc be 3 ms for the 8-dimensional and the 64-dimensional his-tograms, and 20 ms for the 512-dimensional histograms. We assume that each minimum seek (TM) takes 5 ms, each average seek (TA) requires 15 ms, and the transfer of data in each page (TD) with page size P — 1 K B needs 0.3 ms, and that HH, Hi anc^ HJ a r e 160, 80, and 40 respectively. Analyses on finding u = 10 images from a database of M=1000 im-ages were then conducted. As observed from the experimental results of Search P V , while delivering better performance, the filtering schemes with the 8-dimensional histogram suf-fer from a loss of effectiveness. By contrast, while delivering better accuracy, the schemes with the 512-dimensional histogram suffer from a loss of efficiency. So, we summarize only the analytical results for the 64-dimensional color histograms in the figures on the next few pages. In the figures, for each filtering scheme, the C P U , I /O, or combined C P U & I / O cost is shown by the solid line. 113 C P U Costs lor Seach PH (n = 64) (a) C P U Costs for Search P H I/O Costs for Search PH (n - 64) 0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000 Time (in ms) (b) I/O Costs for Search P H Figure 5.1: Analytical Results for Search P H (64-dimensional Color Histograms) 114 CPU + I/O Costs for Seach PH (n = 64) (c) Combined C P U and I / O Costs for Search P H Figure 5.1: A n a l y t i c a l Results for Search P H (Continued) Observat ion 5.1 The C P U and the I / O costs for every filtering scheme are deterministic. The best-case analyses are the same as the worst-case analyses. Explanation Since the number of images to be carried over from the current level to the next level is controlled by the predefined parameter fij, we know statically the number of images to be considered and to be retained at a particular level. • A s for Search P V , the fol lowing trends are observed: 1. The C P U cost is affected by the dimension of the color histogram: A s the dimension of the histogram grows, the time required for C P U operations increases. 2. The I / O cost is affected by the dimension of the color histogram: A s the dimension of the histogram grows, the time required for I / O operations increases. 3. The combined C P U and I / O cost is affected by the dimension of the color histogram: A s the dimension of the histogram grows, the time required for both C P U and I / O operations increases. 115 4. Filtering schemes which start with filter J or K take more C P U and I /O time. For example, the combined C P U and I /O times demanded by Schemes J , J K , and K are longer than the times taken by the worst cases of many filtering schemes such as Schemes HIJ and HIJK. 5. Filtering schemes which skip intermediate levels incur greater C P U and I /O cost. 6. The combined C P U and I/O cost for the additional intermediate levels becomes relatively less expensive as the dimension of the histogram grows. For example, as the dimension of the histogram increases, the combined cost for Scheme H J is more expensive than that for Scheme HIJ . Similarly, the combined cost for Scheme H K is more expensive than that for Scheme HIK or H J K , and either of these two costs is more expensive than that for Scheme HIJK. ' In general, the above analytical results tend to favor the filtering schemes which start with Level H (or Level I), and those that dp not skip any intermediate level. More precisely, Schemes H , I, HI, IJ, HIJ , U K , and H I J K are favored. 5.3.2 Experimental Evaluation To find the appropriate filtering schemes for Search P H , several experirhents have been performed using the same set of data (color histograms) as for the experimental evaluation of Search P V . Again, for each subimage query, we categorize the database images into five main classes, and the effectiveness of the search strategy is assessed by counting the number of retrieved images which fall into each class. The dissimilarity score (DS) for the retrieval of the best 10 images is computed as: DS = 7(2 - nA).+ 5(3 - nB) + (5 - r,c) + 15TJd + 40nE where rjj indicates the number of images falling into Class j. The efficiency of Search P H can be assessed using the execution time (TE) which measures the combined C P U & I / O 116 cost per query on a database of 1000 images. Let [in-, and fij be 160, 80, and 40 respectively. The experiments were run on a Sun UltraSPARC-1 workstation using a page size of 1 K B and user preference /? of 0.5 4 . The results (the average DS and TE values for the retrievals of the best 10 images from a collection of 1000 images) are summarized in the tables and figures on the next few pages. The T E measures the efficiency and the DS assesses the effectiveness. • "h"-typed queries. Filtering Scheme 64-dimensional Color Histograms VA VB VC VD => DS H 1.9 2.1 1.7 4.3 73.0 3.4 ± 1.9 s Table 5.1: Experimental Results for Search P H ("h"-typed Queries) • "i"-typed queries Filtering 64-dimensional Color Histograms Scheme VA VB VC VD '=> DS TE H 1.5 2.2 1.8 4.5 78.2 4.1 ± 1.6 s i : 1.7 2.1 1.8 4.4 75.8 5.9 ± 4.3. s HI 1.7 2.1 1.8 4.4 75.8 5.3 ± 1.1 s Table 5.2: Experimental Results for Search P H ("i"-typed.Queries) 4 With the user preference (3 set to 0.5, both the color distribution and spatial information are of the same level of importance. 117 Results for Search PH ("i"-typed Queries) D 5.8 o 641 i i 5.6 -_5 .4 CD <S> c 5.2 0 64HI --:ion Time i -3 4.8 cu X LU 4.6 4.4 -4.2 o 64H A 41 1 1 1 i i I 75.5 76 76.5 77 77.5 78 78.5 Dissimilarity Score Figure 5.2: Exper imental Results for Search P H (" i " - typed Queries) • " j"~typed queries F i l ter ing Scheme 64-dimensional Color Histograms VA ?}B ']c rjD DS TE H 0.9 2.1 2.5 4.5 82.2 1.4 ± 0.2 s I 1.5 2.2 1.8 4.5 78.2 7.4 ± 1.6 s HI 1.5 2.2 1.8 4.5 78.2 2.2 ± 0.3 s IJ 1.6 2.0 2.1 4.3 75.2 7.7 i 1.6 s HI J 1.6 2.0 2.1 4.3 75.2 2.5 ± 0.3 s Table 5.3: Exper imenta l Results for Search P H (" j " - typed Queries) 118 Results for Search P H ("j"-typed Queries) 7 6 2 1 Q64IJ T 1 1 1 -o64 l -0 64HIJ _ -0 64HI -i i i Q64H 1 I 1 1 1 1 I I I I 75 76 77 78 79 80 81 82 83 Dissimilarity Score Figure 5.3: Exper imenta l Results for Search P H (" j " - typed Queries) - typed queries F i l ter ing Scheme 64-dimensional Color Histograms VA VB VC VD => DS TE H 0.5 2.0 2.7 4.8 89.8 0.7 ± 0.1 s I 0.8 2.3 2.2 4.7 85.2 3.1 ± 0.5 s HI 0.8 2.3 2.2 4.7 85.2 1.3 ± 0.1 s IJ 1.1 2.1 2.2 4.6 82.6 3.3 I 0.5 s HIJ 1.1 2.1 2.2 4.6 82.6 1.4 ± 0.2 s U K 1.2 2.0 2.2 4.6 82.4 3.4 ± 0.5 s H I J K 1.2 2.0 2.2 4.6 82.4 1.6 ± 0.2 s Table 5.4: Exper imenta l Results for Search P H ( "k" - typed Queries) 119 Results for Search PH ("k"-typed Queries) 3.5 o 64IJK o 64 IJ 1 1 1 I I I -o 641 0 64HIJK -0 64HIJ I 1 o 64HI o 64H ( i i 51 1 1 1 1 1 1 1 1 1 82 83 84 85 86 87 88 89 90 91 Dissimilarity Score Figure 5.4: Experimental Results for Search P H ("k"-typed Queries) Like Search P V , when operated with Padding and Reduction Algorithms, Scheme H is best for"h"-typed queries, Scheme HI is best for "i"-typed queries, and Scheme HIJ is best for both "j"-typed and "k"-typed queries. Unlike Search P V , the D S values for Search P H appear to be higher than those for Search P V . However, the TE values for Search P H appear to be smaller than those for Search P V . In other words, while delivering efficiency, Search P H suffers from a loss of effectiveness. 5.4 Horizontal-and-Vertical Search Note that in Search P H , horizontal filters are used not only at one level, but at all the levels. Hence, the number of images to be carried over from one level to another needs to be chosen carefully. If this set of numbers {HH, Hh a n d HJ) is not determined carefully (say, the numbers are too small), then for some queries, an image I that gives a good match at a finer scale could have been eliminated before reaching this finer scale. This 120 may happen when there are sufficiently many images which are not as good as I at the finer scale but which are better than I at the coarser scale. Consequently, while delivering efficiency, Search. P H may suffer from a loss of effectiveness. So, we consider another strategy — Search H V (Horizontal-and-Vertical),— which is a hybrid of Search P V and Search P H . With it, a horizontal filter is applied only to the coarsest level 5 , and vertical filters are then applied to the remaining levels. The idea is that all M images in. the database are checked by a horizontal filter at the coarse scale in the first iteration. The best fi matches (where u < /J, <C M) are carried over from the current level to the next level (finer scale), while poor matches at the coarse scale are eliminated. Then, all these best fi images are checked one after another using vertical filters. A t any point in time, the current u smallest distances (at the finest scale of the u images, where u < fi) are kept. When an image is tested, if its distance value at the current scale is already greater than some distance value of the current u smallest, the tested image can be eliminated. Otherwise, a finer scale is used. The process repeats until the image is (1) eliminated or (2) added to become one of the current u smallest (when it reaches the finest scale). By so doing, the detailed search with the use of vertical filters is applied not to the set of all the images, but only to its most: promising subset. D e f i n i t i o n 5.5 (Search H V ) To conduct the Horizontal-and-Vertical Search for retriev-ing the top u images from a collection of M images: 1. Check all M images at the coarsest scale/level, and compute the distance value for each image. 2. Sort the images in non-descending order of distance value. A subset of these images (the fi images with current smallest values) are carried over to the next level. 5 With one level of horizontal level, we only need to carefully assign a value to one (instead of three) predefined parameter /J that controls the images to be carried over. 121 3. The first u images in this subset are checked at all remaining levels, and the distance , value is computed for each image at the finest scale. These u images become the current top-u. 4. For each of the remaining fi — u images: (a) Let the second coarsest scale be the current level/scale. (b) Compute the distance value. If the value is less than some value of the top-w, extend one level/scale (provided that we have not reached the finest level)' and repeat Step 4 (b) for the extended scale. • E x a m p l e 5.4 Let us find the top two images from a collection of seven images using Search H V (with Scheme HIJ). In the following trace, the number at the slot representing an image at a particular level is the estimated distance value, and the superscript on its left denotes the search order. To ind the top two images using Search 1 IV Img 1 Img 2 Img 3 Img 4 Img 5 Img 6 Img 7 Level H 1 s t 25 2 n d 5 3 r d g5 4th g Q 5th 70 7^h g g => top five images are Images 1, 2, 3, 5, and 7 Level I Level J Top two images sth 35 9th 6 ? A Img 1 wth 3 0 nth 32 Imgs 2 and 1 \2th 200 Imgs 2 and 1 13th -^ 5 14th 20 Imgs 5 and 2 15th y g 1| . Imgs . 5 and 2 In Search H V , the histograms are examined image by image at the coarsest scale, and level by level at the remaining scales. So, the way in which the precomputed feature vectors 122 in the data file are organized depends on the filtering scheme. More precisely, for the schemes which started at Level H , the best file organization is of the form: J H J r s 7 Js 7Ks 7 I s 7 J s 7 K s 7 I s / J s / K s , V v ' V >, — ' "> v ' S for M images for Image 1 for Image 2 for Image M . Similarly, for the schemes which started at Level I, the best file organization is of the form: i111 • • • i\ ihjH HxjS •••• HijS , for M images for Image 1 for Image 2 for Image M and for the schemes which started at Levels J and K , the best file organization is of the form: 7 Js 7 Js ••• 7 Js 7Ks 7Ks • •• 7Ks• for M images for M images In general, the histograms of the coarsest scale are arranged in such a way that the histograms of one image are followed by those of another image. By so doing, an efficient sequential read of these histograms at the coarsest scale is possible. For the remaining scales, histograms are arranged in such a way that for each image, the histograms at the second coarsest scale are followed by the histograms at finer scales. 5.4.1 Analytical Evaluation Cost Models To measure the efficiency of Search H V , we set up cost models to estimate the C P U and- the I /O costs. The C P U cost depends mainly on the. number of histograms to be examined and the computation time Tc', the I /O cost depends mainly on the time required to sequentially or randomly access the pages containing the data (color histograms). With the file organization described above, histograms at the first chosen level can be accessed sequentially. Given a minimum buffer size (one page) and an intelligent buffer management scheme, the number of page accesses at the remaining.levels can be estimated with the use of statistical formulations. 123 E x a m p l e 5.5 In Scheme HIJ , we examine all M histograms at Level. H . Then, for each of the best ////images carried over from Level H , we examine the 4 Level-I histograms, and if necessary 6 , we consider the 4 Level-J histograms which represent the subregions enclosed in the promising block of Level I. Hence, using a 64-dimensional color histogram with page size P = 1 K B , the cost model for Scheme HIJ can be described as follows: • The G P U cost is the product of Tc and the total number of histograms used in the computation. In Scheme HIJ , Level-H histograms are examined for all M images, 4 Level-I histograms are checked for each of the fifj images, and 4 Level-J histograms are examined for each of the £ 5 images (where M ^> fifj > £ 5 > u, and fin is predefined but £ 5 is determined dynamically at runtime). Hence, CPUfJj = (M + •Mm + 4£ 5 )Tc . • The I/O cost is the sum of total seek times and total data transfer times. Using the appropriate file organization for Search H V mentioned above, an average seek time (denoted by TA) is charged for moving the read head to the first "record" at Level H , and for each random page access of "records" at the second level (Level I). Since all Level-I and Level-J "records" for a particular image are stored contiguously, only a minimum seek time (denoted by TM) is charged for each jump between the data (for example, jumping from the fourth Level-I histogram to the third Level-J histogram of the same image). Moreover, when a page of data (color histograms) is loaded, a data transfer time (denoted by Tp) is charged. More precisely, for Scheme HIJ , TA is needed to get to the Level-H histogram of the first image, and Try is charged for loading the page containing the histogram. Since all Level-H histograms are stored contiguously, no seek is charged for accessing subsequent Level-H histograms, and only Trj is charged for each sequential page 6 W e consider the 4 Level-J histograms of an image if the D at Level I is less than any D of the current top-u. 124 access of data. Then, TA is needed to get to the Level-I histograms of each image carried over from Level H , and TD is charged for loading the page containing the histograms. Since only 4 histograms can fit into a page, if the 4 Level-J histograms happen to be examined, then we load the next page. There is a probability of | that the selected 4 Level-J histograms are stored contiguously after the loaded Level-I histograms. Hence, I/0%YJ = (1 + HH)TA + ^TM + ( f + HH + &)TD. The cost models for other filtering schemes, color histograms of other dimensions, or other page sizes can be formulated in a similar manner. • The cost models for the 15 filtering schemes (with page size P = 1 K B and using 64-dimensional color histograms) are shown below: 1. Scheme H We examine all M histograms at Level H . • CPUgv = MTC • I/O1// = TA + f 7b 2. Scheme I We examine all 4 M histograms at Level I. • CPU?V = AMTC • I/0\LV = TA + MTD . 3. Scheme HI We examine all M histograms at Level H . For each of the best fiH images carried over from Level H , we examine the 4 Level-I histograms. • CPUJJY = (\I+ 4(111)10 125 where [in is a predefined parameter that controls the number of.images to be carried over from Level H to the next level (Level I) 7 , and its value lies between u and M. 4. Scheme J '• We examine all 1 6 M histograms at Level J . • CPUyv = lGMTc [/0'jv = TA+4MTD 5. Scheme II.) , . We examine all M histograms at Level H . For each of the best fifj images carried over from Level H , we examine the 16 Level-J histograms. • CPUJU = ( A / + l6////)7c • ; • l/OW'j = (1 + L<H)TA + + 4tin)TD ; -6. Scheme IJ We examine all 4 M histograms at Level I. For each of the best /xi images carried over from Level I, we examine the 4 Level-J histograms which represent the subregions enclosed in the promising block of Level I. • CPU}1/ = (.\M+\,i,)TC .. I/O'// = (1 + m)TA + (M + /i/)7b 7. Scheme HIJ We examine all M histograms at Level H . For each of the best jijj images carried over from Level H , we examine the 4 Level-I histograms, and if necessary 8 , consider 7 Similar ly , fij and / i j in the following cost models control the number of images to be carried over to the next available level from Level I and Level J respectively. 8 W e consider the 4 Level-J histograms of an image if the D at Level I is less than any D of the current top-w. Similar conditions apply to the filtering schemes involving more than two levels. 126 -the 4 Level-J histograms which represent the subregions enclosed in the promising block of Level I. • CPU™ = (M + 4 m + 4 & ) r c • I/0%YJ = (1 + I*H)TA + ^TM + ( f + HH + &)Tp where fin > £5 > u. 8. Scheme K We examine all 6 4 M histograms at Level K . • CPUJP' = CvlMTc • i/o?v = TA + I6MTD. 9. Scheme H K We examine all M histograms at Level H . For each of the best fifj images carried over from Level H , we examine the 64 Level-K histograms. • CPUiJ% = (M + 64fiH)TC • I/OHVK = (1 + HH)TA.+ ( f + 1^H)TD 10. Scheme IK We examine all 4M histograms at Level I. For each of the best HI images carried over from Level I, we examine the 16 Level-K histograms. :• CPUJtf = (4M + 16fi,)TC • I/0?£= (1 + Hi)TA + ( M + Am)TD 11. Scheme J K We examine all 1 6 M histograms at Level J . For each of the best fij images carried over from Level J , we examine the 4 Level-K histograms. 127 • CPU'jy = (UiM+ 4lij)Tc • I/0H% = (1 + fij)TA + {AM + HJ)TD 12. Scheme H I K We examine all M histograms at Level H . For each of the best \in images carried over from Level H , we examine the 4 Level-I histograms, and if necessary, consider the 16 Level-K histograms which represent the subregions enclosed in the promising block of Level I. • CPUfJYK = (M + AfiH + 16fio)Tc • • J/Off/A- = (1 + HH)TA + ZIQTM + ( f + m + 4£io)TD where p,H > ^io > u. 13. Scheme H J K ,' We examine all M histograms at Level H . For each of the best HH images carried over from Level H , we examine the 16 Level-J histograms, and if necessary, consider the 4 Level-K histograms which represent the subregions enclosed in the promising block of Level J . . CPUJjJK = ( M + 16m + ^12)TC • I/0%VJK = (1 + PH)TA + ^TM + ( f + Am + Zu)TD where > £12 > u. 14. Scheme U K We examine all AM histograms at Level I. For each of the best fii images carried over from Level I, we examine the 4 Level-J histograms which represent the subregions enclosed in the promising block of Level I, and if necessary, consider the 4 Level-K histograms which represent the subregions enclosed in the promising block of Level J . 128 • CPUjYK = ( 4 M + 4 /x / + 4 6 4 ) r c • • i/Ol/fK = (1 + m)TA + ^ T M + (M + $u)TD whore fij > £14 > ti. 15. Scheme H I J K We examine all M histograms at Level H . For each of the best \xu images carried over from Level H , we examine the 4 Level-I histograms, then consider (if necessary) the 4 Level-J histograms which represent the subregions enclosed in the promising block of Level I, and consider (if necessary) the 4 Level-K histograms which represent • the subregions enclosed in the promising block of Level J . • CPUfiJjK = {M + 4fiH + 4^16 + 4$n)Tc • I/OWIJK = (1 + M)TA + ( ^ T + ^ ) T M + ( f + M + 6e + in)TD where [iH > 6 e > in > u. Note from the cost models for filtering schemes involving more than two levels, the value of each £j is determined dynamically. For instance, in the best case of Scheme HIJ , the first u images carried over from Level H are the top-u, and each Level-I distance for the remaining \in — u images is greater than all Level-J distances of the top u images. As such, the histograms at Level J for the fifj — u images are not examined; thus, £ 5 = u. Conversely, in the worst case, at any point in time the Level-I distance of each image is less than some Level-J distance of the current top-u. As such, all histograms at all levels for the images are examined; thus, £ 5 = fijj: Analyt ica l Results The above cost models provide a good foundation for analyzing the 15 filtering schemes, and for discovering general trends. The computation time for Padding and Reduction 129 Algor i thms (Tc) depends on the sizes of the subimage query and the image subregion. For the analysis, we let Tc be 3 ms for the 8-dimensional and the 64-dimensional histograms, and 20 ms for the 512-dimensional histograms. We assume that each min imum seek ( T A / ) takes 5 ms, each average seek ( T A ) requires 15 ms, and the transfer of data in each page (To) with page size P — 1 K B needs 0.3 ms, and that /*//, /.ii and fij are 160, 80, and 40 respectively. Analyses on finding u = 10 images from a database of A / = 1000 images were then conducted. A s in Search P H , we summarize only the analytical results for the 64-dimensional color histograms in the figures on the next few pages. In the figures, for each filtering scheme, the minimum cost is shown by the solid line, and the addit ional cost (if any) is indicated by the dotted line. CPU Costs lor Search HV (n = 64) 1 i 1 i 1 1 > 1 i 1 - HIJK UK HJK - HIK JK HK K HIJ HJ J -HI m "o 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2 Time (in ms) X ^ Q& (a) C P U Costs for Search H V Figure 5.5: A n a l y t i c a l Results for Search H V (64-dimensional Color Histograms) 130 I/O Costs for Search HV (n = 64) 0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000 Time (in ms) (b) I/O Costs for Search H V CPU + I/O Costs for Seach HV (n > 64) 0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2 Time (in ms) x 1 Q s (c) Combined C P U and I/O Costs for Search H V jure 5.5: Analytical Results for Search H V (Continued) 131 As for Searches P V and P H , the following trends are observed: 1. The C P U cost is affected by the dimension of the color histogram: As the dimension of the histogram grows, the time required for C P U operations increases. 2. The I/O cost is affected by the dimension of the color histogram: As the dimension . of the histogram grows, the time required for I /O operations increases. 3. The combined C P U and I /O cost is affected by the dimension of the color histogram: As the dimension of the histogram grows, the time required for both C P U and I /O operations increases. 4. Filtering schemes which start with filter J or K often take more C P U and I /O time. For example, the combined C P U and I /O times demanded by Schemes J , J K , and K are often longer than the times taken by the worst cases of many filtering schemes such as Schemes HIJ and HI JK . 5. Filtering schemes which skip intermediate levels often incur greater C P U and I /O cost. 6. The combined C P U and I /O cost for the additional intermediate levels becomes relatively less expensive as the dimension of the histogram grows. For example, as the dimension of the histogram increases, the combined cost for Scheme H J is usually more expensive than that for Scheme HIJ . Similarly, the combined cost for Scheme H K is usually more expensive than that for Scheme H I K or H J K , and either of these two costs is more expensive than that for Scheme HI JK . In general, the above analytical results tend to favor the filtering schemes which start with Level H (or Level I), and those that do not skip any intermediate level. More precisely, Schemes H , I, HI, IJ, HIJ , U K , and H I J K are favored. 132 5.4.2 Experimental Evaluation To find the appropriate filtering schemes for Search H V , several experiments have been performed using the same set of data (color histograms) as for the experimental evaluation of Search P V . Again, for each subimage query, we categorize the database images into five main classes, and the effectiveness of the search strategy is assessed by counting the number of retrieved images which fall into each class. The dissimilarity score (DS) for the retrieval of the best 10 images, is computed as: DS = 7(2 - rjA) + 5(3 - nB) + (5 - rjc) + 15f/D + 40T)E where rjj indicates the number of images falling into Class j. The efficiency of Search H V can be assessed using the execution time (TE) which measures the combined C P U & I / O cost per query on a database of 1000 images. Let /xu, [ii and fiJ be 160, 80, and 40 respectively. The experiments were run on a Sun UltraSPARC-1 workstation using a page size of 1 K B and user preference j3 of 0.5. The results (the average DS and TE values for the retrievals of the best 10 images from a collection of 1000 images) are summarized in the tables and figures on the next few pages. The TE measures the efficiency and the DS assesses the effectiveness. • "h"-typed queries Filtering Scheme . 64-dimensional Color Histograms VA VB VC VD => DS TE H 1.9 2.1 1.7 4.3 73.0 3.6 ± 1 . 9 s Table 5.5: Experimental Results for Search H V ("h"-typed Queries) 133 • " i " - t y p e d queries F i l ter ing Scheme 64-dimensional Color Histograms VA VB VC VD DS TE H 1.5 2.2 1.8 4.5 78.2 4.4 ± 1.6 s I 1.7 2.1 1.8 4.4 75.8 8.4 ± 4.2 s HI 1.7 2.1 1.8 4.4 75.8 6.3 ± 1.3 s Table 5.6: Experimental Results for Search H V (" i " - typed Queries) Results for Search HV ("i"-typed Queries) o 641 ' Q64HI o 6 4 H 75.5 76 76.5 77 77.5 78 78.5 Dissimilarity Score Figure 5.6: Exper imental Results for Search H V ( " i " - typed Queries) o o 8 7.5 <D 6.5 E UJ 5.5 4.5 134 • " j " - typed queries F i l ter ing Scheme 64-dimensional Color Histograms VA VB VC VD => DS TE H 0.9 2.1 2.5 4.5 82.2 1.7 ± 0.2 s I 1.5 2.2 1.8 4.5 78.2 7.7 ± 1.6 s HI 1.5 2.2 1.8 4.5 78.2 2.5 ± 0.4 s IJ 1.8 2.0 1.9 4.3 74.0 8.0 ± 1.6 s HIJ 1.8 2.0 1.9 4.3 74.0 2.8 ± 0.3 s Table 5.7: Experimental Results for Search H V (" j" - typed Queries) Results for Search HV ("j"-typed Queries) T 1 1 1 1 1 1 1 r o 64IJ o 641 o 64HIJ o 64HI o 64H p 1 1 1 1 1 1 i i i i 73 74 75 76 77 78 79 80 81 82 83 Dissimilarity Score Figure 5.7: Exper imental Results for Search H V (" j " - typed Queries) 7 w 6 <D 4 2 135 • ' 'k" - typed queries Fi l ter ing Scheme 64-dimensional Color Histograms VA VB Vc VD => DS TE H 0.5 2.0 2.7 4.8 89.8 0.9 ± 0.1 s I 0.8 2.3 2.2 4.7 85.2 3.4 ± 0.5 s HI 0.8 2.3 2.2 4.7 85.2 1.7 ± 0.2 s IJ 1.2 2.1 2.1 4.6 82.0 3.6 ± 0.5 s HI J 1.2 2.1 2.1 4.6 82.0 1.5 ± 0.2 s U K 1.3 2.1 2.0 4.6 81.4 3.8 ± 0.5 s H I J K 1.3 2.1 2.0 4.6 81.4 2.0 ± 0.2 s Table 5.8: Exper imental Results for Search H V ( "k" - typed Queries) Results for Search HV ("k"-typed Queries) o 64IJK o 64IJ o 641 o 64HIJK o 64HI o 64HIJ o 64H — i , , , , , , , 1 81 82 83 84 85 86 87 88 89 90 91 Dissimilarity Score Figure 5.8: Exper imental Results for Search H V ( "k" - typed Queries) Like Searches P V and P H , when operated with Padding and Reduction Algor i thms , Scheme H is best f o r " h " - t y p e d queries, Scheme HI is best for " i " - t y p e d queries, and Scheme H I J is best for both " j " - typed and " k " - t y p e d queries. Unlike Search P H , the DS values for Search H V appear to be lower than those for Search P H ; unlike Search P V , the TE values for Search H V appear to be smaller than those for Search P V . In other words, 136 3.5 — 2.5 M 2 1.5 Search H V keeps a good balance of efficiency and effectiveness. 5.5 The Best Search Strategy Three search strategies — namely, Search P V , Search P H , and Search H V — have been suggested to avoid the kind of frequent jumping incurred in the branch-and-bound strategy. To find the best strategy among the three, analyses and experiments have been carried out. Analyt i ca l ly , we set up cost models to evaluate the filtering schemes and estimate their C P U , I / O , and combined C P U & I / O costs. A l l three strategies share a common trend: F i l ter ing schemes start ing with Level H (or Level I) and those not skipping intermediate levels are considered to be favorable. CPU Costs lor Three Search Strategies (n = 64) 1 HIJK UK HIJ IJ HI P V PH . u , , , , , , 1 1 1 0 0.5 1 1.5 2 2.5 3 3.5 4 Time (in ms) X ^ Q< (a) C P U Costs for Three Search Strategies Figure 5.9: A n a l y t i c a l Results for Three Search Strategies (64-dimensional Color His-tograms) a> 10 E 3> c a 8 0) 137 I/O Costs for Three Search Strategies (n = 64) HIJK U K HIJ - P V - P H HV 0 2000 4000 6000 6000 10000 12000 14000 16000 Time (in ms) (b) I/O Costs for Three Search Strategies CPU + I/O Costs for Three Search Strategies (n = 64) (c) Combined C P U and I/O Costs for Three Search Strategies Figure 5.9: Analytical Results for Three Search Strategies (Continued) As observed from Figure 5.9, the C P U , I/O, and combined C P U & I / O costs for Search P V are generally higher than those for the other two strategies. The costs for Searches P H and H V are usually almost the same. 138 I/O Costs for Four Search Strategies (n = 64) Figure 5.10: I/O Costs for Four Search Strategies (64-dimensional Color Histograms) As observed from the above figure, the I/O costs for the branch-and-bound strategy (denoted by B & B and indicated by purple lines) are potentially high. This explains why we need to consider the three search strategies which reduce the I/O costs. Experimentally, subimage queries of arbitrary size are classified into four main types, and we measure the execution time (showing the performance) and the dissimilarity score (showing the degree of accuracy) for each of these favorite schemes. Again, all three strategies agree on the results obtained when operated with Padding and Reduction Algorithms. Scheme H with 64-dimensional histograms is the best filtering scheme for "h"-typed queries, Scheme HI is the best one for "i"-typed queries, and Scheme HIJ is the best for both "j"-typed and "k"-typed queries. 139 Query. Type " , Search Strategies Search P V VA VB VC VD, Search P H VA VB VC VD Search H V rjA VB. VC VD " h " Scheme H 1,9 2.1 1.7 4.3 => DS = 73.0 TE = 4.45 ± 3.80 s • 1.9 2.1 1.7 4.3 DS = 73.0 TE = 3.39 ± 1.90 s 1.9 .2.1 1.7 4.3 = /^J>S = 73.0 TE .= 3.56 ± 1.91 s Scheme HI 1.8 2.1 1.8 4.3 ^ DS = 73.6 TE = 7.15 ± 0.76 s 1.7 2.1 1.8 4.4 =>DS = 75.8 TE = 5.33 ± 1.11 s 1.7 2.1 1.8 4.4 =• 7J5 = 75.8 = 6.32 ± 1.33-s " j " Scheme HIJ ,1.8 2.0 1.9 4.3 => DS = 74.0 TE = 5.80 ± 1.66 s 1.6 2.0 2.1 4.3 ^ DS = 75.2 TE = 2.47 ± 0.28 s 1.8 2.0 1.9 4.3 =>'DS = 74.0 T E = 2.76 ± 0.33 s "k" Scheme HIJ 1.3 2.0 2.1 4,6 •=> DS = 81.8 ' TE = 2.76 ± 0.73 s 1.1 2.1 2.2 4.6 =• DS = 82.6 TE = 1.45 ± 0.16 s 1.2 2.1 2.1 4.6 =• DS = 82.0 . /'/.; = 1.54 ± 0.21 s Table 5.9: Experimental Results for Three Search Strategies. Comparing the runtimes and the accuracies of the best configuration for each strat-egy, we found that Search H V is more efficient than Search P V , because the use of vertical filters in the latter is.applied to the whole set of all M images. In Search H V , the detailed search with the use of vertical filters is applied not to the set of all the images, but only to its most promising subset. , Search H V is more effective than Search P H , because the latter uses horizontal filters at all levels. If the number of images to be carried over from the, current level to the next level is not determined carefully (say, the number is too small), then for some queries, an image I that gives a good match at a finer scale could have been eliminated before reaching this finer scale. This may happen when there are sufficiently many.images which are not as good as I at the finer scale, but better than I at the coarser scale. Consequently, while delivering efficiency, Search P H may suffer from a loss of effectiveness. In Search H V , a horizontal filter is applied not to all the levels, but to the coarsest level. So, we only need to carefully assign a value to one (instead of three) predefined parameter \x that 140 controls the number of images to be carried over. As such, the chance of having the above mentioned I being eliminated before reaching the finer scale is reduced. Moreover, the experimental results on Searches P V , P H , and H V suggest that Search H V is the best strategy for the retrieval of desired images when operated with Padding and Reduction Algorithms. The reason is that.in addition to avoiding the kind of frequent jumping observed in the branch-and-bound strategy, Search H V also keeps a balance of efficiency and effectiveness. 5.6 Summary Although the branch-and-bound algorithm is accurate in its search, it can be inefficient for large databases. In particular, the number of images the algorithm must keep track of can be large, and jumping back and forth among images and scales may frequently be required. To avoid such jumping, we studied three strategies, namely Search P V , Search P H , and Search H V . In Search P V , the images are checked one after another using vertical filters. A t any point in time, a set of ii images currently having smallest distance values are kept. For each image, the algorithm keeps proceeding to finer scales until (1) the distance value, at a particular scale is already so large that the image cannot be qualified as a good match, or (2) the finest scale is reached and the image is either discarded or selected as a member, of the answer set, depending on the distance value. In Search P H , all the images in the database are checked by the horizontal filter at the coarsest scale in the first iteration. Poor matches at this scale are then eliminated. In subsequent iterations, images that are left from the previous scale/level are checked by filters at finer scales, and poor matches are again removed. The process repeats until the finest scale is reached and the top u images are returned. Search H V is a hybrid of the above two in which we use a horizontal filter on the 141 -first level and vertical filters on, the remaining levels. The results of the analytical and experimental investigations show that Search.HV is the best strategy for avoiding the frequent jumping and at the same time keeping a good balance of performance/speed and accuracy. When operated with Padding and Reduction Algorithms, the best 10 desired images can be retrieved efficiently and effectively from a collection of a thousand images in about 3.5 seconds. 142 Chapter 6 Conclusions 6.1 Conclusions As network connectivity has continued its explosive growth and storage devices have be-come smaller, faster, and less expensive, the number of on-line digital images has increased rapidly. We can no longer rely solely on traditional database retrieval technology based on manually associating textual descriptions with image contents. The development of efficient and effective content-based retrieval systems based on automated extracted con-tents (such as color) are necessary. In order to achieve this goal, several research projects have been carried out. Over the past few years, multi-dimensional indexing structures have been designed, multi-level filtering approaches have been proposed, and information preserving transformations have been suggested for providing efficient indexing. Expres-sive query language systems have also been developed to accommodate efficient querying from image databases for applications with dense and sparse image spaces. Other research projects for supporting efficient and effective query processing and optimization have also been carried out. Wi th the popularity of similarity matching on color and spatial information, many image database management systems store the information in local color histograms. To. 143 process user queries, some systems use fixed grid segmentation approaches. For further improvement on the efficiency and the effectiveness of content-based retrieval, multiscale matching approaches have been proposed. However, detailed analytical and experimental results on the determination of the suitable' number of levels for these approaches are seldom reported, and comparisons of different strategies for searching multiple scales are also rare. Moreover, in many situations, users are interested in, or can remember, only local image contents; therefore, subimage query processing is needed. Unfortunately, not many image database management systems can handle arbitrary-size subiriiage queries based on color and spatial similarity. For the systems that can deal, with subimage queries of arbitrary size, multiscale matching is rarely used. In this thesis, our first issue of investigation was to find a method for dealing with arbitrary-size subimage queries. To answer queries of this kind, some systems segment an image into several blocks, each of which has an associated color histogram. One problem with this arrangement is that subimage queries may be of arbitrary size that not neces-sarily an integral multiple of the chosen block size. Other systems use template-based matching algorithms. A key problem with those algorithms is that a lot of computation is needed, because of the large number of positions to be compared. Correspondingly, we proposed two algorithms, called Padding and Reduction, for dealing with subimage queries of arbitrary size. Knowing the image subregion I represented by the precomputed feature vector may not necessarily contain the same number of pixels as the query Q, we used Padding and Reduction Algorithms to estimate the best possible color histograms for Q and I . Here, we either (1) enlarged Q into a new query Q' that is of the same size as I or (2) reduced I to a new image subregion I ' that is of the same size as Q. In terms of effectiveness, both algorithms give the same best-case lower bound to the histogram distance. In terms of efficiency, the Padding Algorithm outperforms the Reduction Algo-rithm when the size differential between the subimage query and the image subregion is 144 large, and vice versa when the differential is small. Given subimage queries of arbitrary size, multiscale representation may improve the efficiency and the effectiveness of content-based retrievals. The idea is that depending on the scale or the need of a given query, a more appropriate scale can be used. So, a 4-level representation was proposed such that the entire image is.divided into four subregions, and each subregion is recursively divided into four .subregions, and so on. Since image contents are usually pre-extracted and stored, our second issue of investigation was to determine the suitable number of levels for such a representation. To do so, we analytically •and experimentally studied the performance and the accuracy.of the multi-level filtering schemes available for the representation. In the analyses, we used cost models to estimate the C P U , I /O, and combined C P U and I/O costs for each scheme. The results favor the schemes which start with the coarsest level (or the second coarsest level) of the 4-level representation and those that do not skip intermediate levels. In the experiments, given a subimage query, we measured the execution time (showing the performance) and computed the dissimilarity score (showing the accuracy). The results suggest that when using Padding and Reduction Algorithms, the desired images can be retrieved efficiently and effectively using only the top three levels. Hence, a 3-level hierarchy (up to the 4 x 4 segmentation) is preferred. Several strategies for searching multiple scales can be applied to the retrieval of desired images from a collection of database images which are stored in the multiscale representation. Branch-and-bound is one of the strategies; it is accurate in its search, but it can be inefficient for large databases. In particular, the number of images to be kept track of can be quite large, and frequent jumping back and forth among the data may be required. So, our third issue of investigation was to find an efficient and effective strategy for searching multiple scales. In this thesis, we studied three search strategies, namely Pure Vertical, Pure Horizontal, and Horizontal-and-Vertical. In the Pure Vertical 145 Search, images are checked "vertically" by vertical filters one after another, and a set of potentially desired images is kept. After a hew image is checked, it is either discarded or selected as a member of the answer set. In the Pure Horizontal Search, images are checked "horizontally" level by level using horizontal filters. After images are checked at a particular level, poor matches at that level are eliminated, and all remaining images are carried over to the next level where they are checked by another horizontal filter. The hybrid strategy — Horizontal-and-Vertical — uses a horizontal filter on the first level and vertical filters on the remaining levels for the most promising subset of the database images. To find the best strategy among the three, analytical and experimental evaluations were performed. The results indicate that the Horizontal-and-Vertical Search is the best when operated with Padding and Reduction Algorithms, riot only because the Search avoids the kind of frequent jumping as in the branch-and-bound, but because it keeps a good balance of performance/speed and accuracy. With it, the best 10 desired images can be retrieved efficiently and effectively from a collection of a thousand images in about 3.5 seconds. 6.2 Future Work Although the results of the thesis are very promising, there are some aspects to consider for further improvement. One aspect is to investigate methods to extend our Padding and Reduction Algorithms for dealing with arbitrary-size subimage queries of arbitrary shapes. For instance, a user may want to find images with a ship in the bottom right corner. It is well known that human memory is weak in retaining a fine granularity of spatial information of color. The user may not remember (or he may not be interested in) any other.portion of the images (not even a region next to the ship). Iri such a case, the subimage query he wants to submit may be of irregular shape (as shown in Figure 6.1, for example). Our Padding and Reduction Algorithms may be able to estimate the histogram distance. However, the algorithm for estimating the positional distance between the query 146 and the image subregion may need to be modified; computational geometry techniques may be required for handling queries of complex irregular shapes. Figure 6.1: Example of a Subimage Query of Arbitrary Shape Again, in some situations, due to the poor human memory capability for retaining a fine granularity of spatial information of color, the user may only be able to recall the information on the boundary of the region he is interested in, but not the center. In such cases, the query is no longer "solid" but "hollow". Examples of these queries are a ring-like query and the one in Figure 6.2. If the "hole" is small, Padding and Reduction Algorithms are expected to provide a reasonable estimation to the histogram distance. However, if the "hole" turns out to be large, care needs to be taken in handling this "hollow" subimage query of arbitrary size. Moreover, in some other situations, the user may not care about the color information on the "hole", and it can match any color. To deal with these kind of queries containing "don't care" parts, approaches similar to those for handling "hollow" queries can be applied. Figure 6.2: Example of a "Hollow" Subimage Query Since some users have poor memories, they may demand support for "hollow" 147 queries and queries containing "don't care" parts. However, other users may remember more than one portion of the images they have seen before. Hence, methods for handling subimage queries with these multiple "known" portions (for example, Figure 6.3) are also needed. A naive approach is to apply Padding and Reduction Algorithms to each portion independently. An intersection is then applied to the candidate sets of images returned by the Algorithms, and the resulting images are ranked thereafter. One problem with this naive approach is that for a subimage query with multiple "known" portions, performing Padding and Reduction for each portion may be time-consuming. Hence, more efficient approaches are necessary for handling arbitrary-size subimage queries of arbitrary shapes, "hollow" queries, queries containing "don't care" parts, and queries with multiple "known" portions. Figure 6.3: Example of a Subimage Query with Multiple "Known" Portions Enhancements to multiscale search strategy should also be explored. In the pro-cess of determining the appropriate number of levels in the multiscale representation and finding the best search strategy, the histograms which are examined during a search are of the same dimension. For example, with the best scheme for handling "j"-typed queries, all the histograms checked at Levels H , I, and J are of 64 dimensions. For possible en-hancement, we may explore the use of histograms of mixed dimensions. For example, in the Horizontal-and-Vertical Search, we can study the possible improvement (or degrada-tion) using 512-dimensional color histograms for horizontal filtering on the first level and 8-dimensional color histograms for vertical filtering on the remaining levels. 148 Moreover, in the experiments for determining the appropriate number of levels in the multiscale representation and for finding the best search strategy, color histograms of 8, 64, and 512 dimensions are used. These color histograms were created for real images collected from various sources and covering wide application domains. However, the possibility of bias in particular domains may exist. Correspondingly, we can generate realistic random color histograms [Strieker 1994], and use the resulting histograms to strengthen our findings. 149 Bibliography [Ashley et al 1995] Jonathan Ashley, Ron Barber, Myron Flickner, James Hafner, Denis Lee, Wayne Niblack, and Dragutin Petkovic. Automatic and Semi-automatic Meth-ods for Image Annotation and Retrieval in QBIC. Research Report R J 9951, I B M Almaden Research Center, San Jose C A , Apri l 1995. [Bach et al 1996] Jeffrey R Bach, Charles Fuller, Amarnath Gupta, Arun Hampapur, Bradley Horowitz, Rich Humphrey, Ramesh Jain, and Chiao-fe Shu. "The Virage Image Search Engine: A n Open Framework for Image Management". Proceedings of SPIE Conference on Storage and Retrieval for Still Image and Video Databases IV (Vol 2670): 76-87. San Jose C A , January-February 1996. [Barber et al 1994] R Barber, M Flickner, J Hafner, D Lee, W Niblack, D Petkovic, J Ashley, T McConnell, J Ho, J Jang, D Berkowitz, P Yanker, M Vo, D Haas, D Lassig, S Tate, A Chang, P van Houten, J Chang, T Petersen, D Lutrell, M Snedden, P Faust, C Matteucci, M Raynor, R Peters, W Beck, and J Witsett. "Ultimedia Manager: Query By Image Content and its Applications". Digest of Papers of the Spring COMPCON '94:. 424-429. San Francisco C A , February-March 1994. [Beckmann et al 1990] N Beckmann, H - P Kriegel, R Schneider, and B Seeger. "The R*-tree: A n Efficient and Robust Access Method for Points and Rectangles". Proceedings of ACM SIGMOD Conference on Management of Data: 322-331. Atlantic City N J , May 1990. [Berchtold et al 1996] S Berchtold, D A Keim, and H-P Kriegel. "The X-tree: A n Index Structure for High-dimensional Data" . Proceedings of the 22th VLDB Conference. Bombay, India, September 1996. [Cardenas 1975] Alfonso F Cardenas. "Analysis and Performance of Inverted Data Base Structures". Communications of the ACM 18(5): 253-263. May 1975. [Castelli et al 1997] Vittorio Castelli, Chung-Sheng L i , and Lawrence D Bergman. Search-ing Image Databases at Multiple Levels of Abstraction. Research Report R C 20702, I B M T J Watson Research Center, Yorktown Heights N Y , January 1997. 150 [Chen et al 1997] Jau-Yuen Chen, Charles A Bouman, and Jan P Allebach.. "Multiscale Branch and Bound Image Database Search". Proceedings of SPIE Conference on Storage and Retrieval for Image and Video Databases V (Vol 3022): 133-144. San Jose C A , February 1997. [Chiueh 1994] Tzi-cker Chiueh. "Content Based Image Indexing". Proceedings of the 20th VLDB Conference: 582-593. Santiago, Chile, September 1994. [Dimai and Strieker 1996] Alexander Dimai and Markus Strieker. Spectral Covariance and Fuzzy Regions for Image Indexing, Technical Report BIWI-TR-173, Swiss Federal Institute of Technology, Zurich, Switzerland, Apri l 1996. [Faloutsos et al 1994] C Faloutsos, W Equitz, M Flickner, W Niblack, D Petkovic, and R barber. "Efficient and Effective Querying by Image Content". Journal of Intelligent Information Systems 3(3-4): 231-262. 1994. [Faulus 1996] Dwifiandika S Faulus. Design and Implementation of an Expressive Query Interface/Language for Image Database. MSc Thesis, The University of British Columbia, August 1996. [Faulus and Ng 1996] Dwifiandika S Faulus and Raymond T Ng. "EXQUISI : A n Expres-sive Query Interface for Similar Images". Proceedings of SPIE Conference on Storage and Retrieval for Still Image and Video Databases IV (Vol 2670): 215-226. San Jose C A , January-February 1996. , [Faulus and Ng 1997] Dwifiandika S Faulus and Raymond T Ng. " A n Expressive Lan-guage and Interface for Image Querying". Machine. Vision and Applications 10(2): 74-85. 1997. [Finn 1996] Robert Finn. "Querying By Image Content". IBM Research 3: 22-25. 1996. [Flickner et al 1995] Myron Flickner, Harpreet Sawhney, Wayne Niblack, Jonathan Ash-ley, Qian Huang, Byron Dom, Monika Gorkani, Jim Hafner, Denis Lee, Dragutin Petkovic, David Steele, and Peter Yanker. "Query by Image and Video Content: The Q B I C System". IEEE. Computer 28(9): 23-31. September 1995. [Gong et al 1994] Yihong Gong, Hongjiang Zhang, and Chua Hock Chuan. " A n Image Database System with Fast Image Indexing Capability based on Color Histograms". Proceedings of 1994 IEEE Region 10 Conference on Frontiers of Computer Technol-ogy: 407-411. Singapore, August 1994. [Gudivada and Raghavan 1995] Venkat N Gudivada and Vijay V Raghavan. "Content-based Image Retrieval Systems". IEEE Computer 28(9): 18-22. September 1995. 151 [Gupta 1997] Amarnath Gupta. Visual Information Retrieval Technology: A Virage Per-spective. Virage Inc, San Mateo C A , February 1997. [Guttman 1984] Antonin Guttman. "R-trees: A Dynamic Index Structure for Spatial Searching". Proceedings of ACM SIGMOD Conference on Management of Data: 47-57. Boston M A , June 1984. [Hirata et al 1993] Kyoj i Hirata, Yoshinori Hara, Naoki Shibata, and Fusako Hirabayashi. "Media-based Navigation for Hypermedia Systems". Proceedings of ACM Conference on Hypertext: 159-173. Seattle W A , November 1993. [Hurvich 1981] Leo Maurice Hurvich. Color Vision. Sinauer Associates, 1981. [Ioka 1989] Mikihiro Ioka. A Method of Defining the Similarity of Images on the Basis of Color Information. Research Report RT 0030, I B M Tokyo Research Laboratory, Tokyo, Japan, December 1989. [Jacobs et al 1995] Charles E Jacobs, Adam Finkelstein, and David H Salesin. "Fast M u l -tiresolution Image Querying". Proceedings ofACMSIG'GRAPH Conference on Com- • puter Graphics & Interactive Techniques: 277-286. Los Angeles C A , August 1995. [Jain 1993] R Jain (editor). "NSF Workshop Report on Visual Information Management Systems". SIGMOD Record 22(3): 57-75. September 1993. [Kurniawati et al 1997] R Kurniawati, J S J in, and J A Shepherd. "The SS +-tree: A n Improved Index Structure for Similarity Searches in a High-dimensional Feature Space". Proceedings of SPIE Conference on Storage and Retrieval for Image and Video Databases V (Vol 3022): 110-120. San Jose C A , February 1997. [Li et al 1995] C L i , J Turek, and E Feig. "Progressive Template Matching for Content-based Retrieval in Earth Observing Satellite Image Database". Proceedings of SPIE Conference on Digital Image Storage and Archiving Systems (Vol 2606). Philadelphia PA. October 1995. [Lin et al 1994] King-Ip L in , H V Jagadish, and Christos Faloutsos. "The TV-tree — A n Index Structure for High-dimensional Data" . VLDB Journal 3(4): 517-549. October 1994. [Luenberger 1984] David G Luenberger. Linear and Nonlinear Programming. Addison-Wesley, 1984. [Niblack et al 1993] Wayne Niblack, Ron Barber, W i l l Equitz, Myron Flickner, Eduardo Glasman, Dragutin Petkovic, Peter Yanker, Christos Faloutsos, and Gabriel Taubin. The QBIC Project: Querying Images by Content using Color, Texture, and Shape. Research Report R J 9203, I B M Almaden Research Center, San Jose C A , February 1993. 152 [Ng and Sedighian 1996] Raymond T Ng and Andishe Sedighian. "Evaluating M u l t i -dimensional Indexing Structures for Images Transformed by Principal Component Analysis". Proceedings of SPIE Conference on Storage and Retrieval for Still Image and Video Databases IV (Vol 2670): 50-61. San Jose C A , January-February 1996. [Ng and Tam 1997] Raymond T Ng and Dominic Tarn. " A n Analysis of Multi-level Color Histograms". Proceedings of SPIE Conference on Storage and Retrieval for Image and Video Databases V (Vol.3022): 22-34. San Jose C A , February 1997. [Pentland et al 1994] A Pentland, R W Picard, and S Sclaroff. "Photobook: Tools for Content-based Manipulation of Image Databases". Proceedings of SPIE Conference on Storage and Retrieval for Image and Video Databases-II (Vol 2185): 34-47. San Jose C A , February 1994. [Petkovic et al 1996] Dragutin Petkovic, Wayne Niblack, Myron Flickner, David Steele, Denis Lee, John Y i n , James Hafner, Frank Tung, Harold Treat, Richard Dow, May Gee, M i m i Vo, Peter Vo, Bonnie Holt, Janet Hethorn, Kenneth Weiss, Peter Elliott, and Colin Bird. Recent Applications of IBM's Query By Image Content (QBIC). Re-search Report R J 10006, I B M Almaden Research Center^ San Jose C A , January 1996. [Pope and Lowe 1994] Arthur R Pope and David G Lowe. "Vista: A Software Environ-ment for Computer Vision Research". Proceedings of IEEE Conference on Computer Vision and Pattern Recognition: 275-280. Seattle W A , June l994. [Sakamoto et al 1994] Hiroaki Sakamoto, Hideharu Suzuki, and Akira Uemori. "Flexible Montage Retrieval for Image Data" . Proceedings of SPIE Conference on Storage and Retrieval for Image and Video Databases II (Vol 2185): 25-33. San Jose C A , February •1994. • '• [Salton and M c G i l l 1983] Gerard Salton and Michael J M c G i l l . Introduction to Modern Information Retrieval. McGraw-Hil l , 1983.-[Samet 1990] Hanan Samet. The Design and Analysis of Spatial Data Structures. Addison-Wesley, 1990. [Sawhney and Hafner 1993] Harpreet S Sawhney and James L Hafner. Efficient Color His-togram Indexing for Quadratic Form Distance Functions. Research Report R J 9572, I B M Almaden Research Center, San Jose C A , October 1993. [Sawhney and Hafner 1994] Harpreet S Sawhney and James L Hafner. "Efficient Color Histogram Indexing". Proceedings of IEEE International Conference on Image Pro-cessing. 1994. • , [Schrage 1986] Linus E Schrage. Linear, Integer, and Quadratic Programming with LINDO (Third Edition). Scientific Press, 1986. 153 [Sedighian 1995] Andishe S Sedighian. A Comparative Analysis of Multi-dimensional In-dexing Structures for 'Eigenimages'. MSc Thesis, The University of British Columbia, December 1995. [Sellis et al 1987] Timos Sellis, Nick Roussopoulos, and Christos Faloutsos. "The R + - tree : -A Dynamic Index for Multi-dimensional Objects". Proceedings of the 13th VLDB Conference: 507-518. Brighton, England, September 1987. [Sigmon 1992] Kermit Sigmon. MATLAB Primer (Second Edition). Department of Math-ematics, University of Florida, 1992. [Sistla et al 1994] A Prasad Sistla, Clement Yu, and R Haddad, "Reasoning about Spatial Relationships in Picture Retrieval Systems". Proceedings of the 20th VLDB Confer-ence. Santiago, Chile, September 1994. [Sistla et al 1995] A Prasad Sistla, Clement Yu, Chengwen L iu , and King Liu . "Similarity Based Retrieval of Pictures using Indices on Spatial Relationships". Proceedings of the 21st VLDB Conference: 619-629. Zurich, Switzerland, September 1995. [Smith and Chang 1996] John R Smith and Shih-Fu Chang. "Tools and Techniques for Color Image Retrieval". Proceedings of SPIE Conference on Storage and Retrieval for Still Image and Video Databases IV (Vol 2670): 426-437. San Jose C A , January-February 1996. [Stone and L i 1995] Harold S Stone and Chung-Sheng L i . Image Matching by means of Intensity and Texture Matching in the Fourier Domain. Technical Report 95-6, N E C Research Institute, Princeton N J , July 1995. [Strieker 1994] Markus A Strieker. "Bounds for the Discrimination Power of Color Index-ing Techniques". Proceedings of SPIE Conference on Storage and Retrieval for Image and Video Databases II (Vol 2185): 15-24. San Jose C A , February 1994. [Strieker and Dimai 1996] Markus Strieker and Alexander Dimai. "Color Indexing with Weak Spatial Constraints". Proceedings of SPIE Conference on Storage and Retrieval for Still Image and Video Databases IV (Vol 2670): 29-40. San Jose C A , January-February 1996. [Strieker and Orengo 1995] Markus Strieker and Markus Orengo. "Similarity of Color Im-ages". Proceedings of SPIE Conference on Storage and Retrieval for Image and Video Databases III (Vol 2420): 381-392. San Jose C A , February 1995. [Swain and Ballard 1990] Michael J Swain and Dana H Ballard. "Indexing via Color His-tograms" . Proceedings of IEEE Third International Conference on Computer Vision: 390-393. Osaka, Japan, December 1990. 154 [Tam 1996] Dominic Pok Man Tam. An Analysis of Multi-level Filtering for High-dimensional Image Data. MSc Thesis, The University of British Columbia, Apri l 1996. [Tao and Dickinson 1996] B Tao and B Dickinson. "Classified Template Matching for Satellite Image Retrieval". Proceedings of 30th Annual Conference on Information Sciences and Systems. Princeton N J , 1996. [Tao et al 1997] Bo Tao, Kevin Robbins, and Bradley Dickinson. "Image Retrieval with Templates of Arbitrary Size". Proceedings of SPIE Conference on Storage and Re-trieval for Image and Video Databases V (Vol 3022): 2-11. San Jose C A , February . 1997. • , . [Thomasian et al 1997] Alex Thomasian, Vittorio Castelli, and Chung-Sheng L i . RCSVD: Recursive Clustering with Singular Value Decomposition for Dimension Reduction in Content-based Retrieval of Large Image/Video Databases. Research Report R C 20704, I B M T J Watson Research Center, Yorktown Heights N Y , January 1997. [White and Jain 1996] David A White and Ramesh Jain. "Similarity Indexing with the SS-tree". Proceedings of IEEE International Conference on Data Engineering. New Oreleans L A , February ,1996. [White and Jain 1997] David A White and Ramesh Jain. "ImageGREP: Fast Visual Pat-tern Matching in Image Databases". Proceedings of SPIE Conference on Storage and Retrieval for Image and Video Databases V (Vol 3022): 96-107. San Jose C A , Febru-ary 1997. [Yao 1977] S B Yao. "Approximating Block Accesses in Database Organizations". Com-munications of the ACM 20(4): 260-261. Apr i l 1977. 155
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Efficient and effective subimage similarity matching...
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Efficient and effective subimage similarity matching for large image databases Leung, Kai Sang 1997
pdf
Page Metadata
Item Metadata
Title | Efficient and effective subimage similarity matching for large image databases |
Creator |
Leung, Kai Sang |
Date Issued | 1997 |
Description | As network connectivity has continued, its explosive growth and storage devices have become smaller, faster, arid less expensive, the number of on-line digital images has increased rapidly] Correspondingly, efficient and effective content-based retrieval systems for handling image queries have become necessary. In addition, users are often interested in local contents within subimages. In this thesis, we develop Padding and Reduction Algorithms to support subimage queries of arbitrary size based on local color information. The idea is to estimate the best-case lower bound to the dissimilarity measure between the query and the image. By making use of multiscale representation, this lower bound becomes tighter as the scale becomes finer. Because image contents are usually pre-extracted and stored, a key issue is how to determine the number of levels used in the representation. We address this issue analytically by estimating the required CPU and I/O costs, and experimentally by comparing the performance and the accuracy of the outcomes of various filtering schemes. Our findings suggest that a 3-level hierarchy is preferred. We also study three strategies for searching multiple scales. Our studies indicate that the hybrid strategy with horizontal filtering on the coarse level and vertical filtering on remaining levels is the best choice when using Padding and Reduction Algorithms. Using the hybrid search strategy in the multiscale representation with the determined number of levels, the best 10 desired images can be retrieved efficiently and effectively from a collection of a thousand images in about 3.5 seconds. |
Extent | 13165704 bytes |
Genre |
Thesis/Dissertation |
Type |
Text |
FileFormat | application/pdf |
Language | eng |
Date Available | 2009-03-25 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
IsShownAt | 10.14288/1.0051030 |
URI | http://hdl.handle.net/2429/6537 |
Degree |
Master of Science - MSc |
Program |
Computer Science |
Affiliation |
Science, Faculty of Computer Science, Department of |
Degree Grantor | University of British Columbia |
GraduationDate | 1997-11 |
Campus |
UBCV |
Scholarly Level | Graduate |
AggregatedSourceRepository | DSpace |
Download
- Media
- 831-ubc_1997-0563.pdf [ 12.56MB ]
- Metadata
- JSON: 831-1.0051030.json
- JSON-LD: 831-1.0051030-ld.json
- RDF/XML (Pretty): 831-1.0051030-rdf.xml
- RDF/JSON: 831-1.0051030-rdf.json
- Turtle: 831-1.0051030-turtle.txt
- N-Triples: 831-1.0051030-rdf-ntriples.txt
- Original Record: 831-1.0051030-source.json
- Full Text
- 831-1.0051030-fulltext.txt
- Citation
- 831-1.0051030.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.831.1-0051030/manifest