UBC Theses and Dissertations
Design and implementation of an expressive query interface/language for image database Faulus, Dwifiandika S. 1996

Full Text

Design and Implementation of An Expressive Query Interface/Language for Image Database

by

Dwifiandika S. Faulus

B.Sc. Honours, Queen's University at Kingston, 1994

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF Master of Science in THE FACULTY OF GRADUATE STUDIES (Department of Computer Science)

We accept this thesis as conforming to the required standard

The University of British Columbia
August 1996
© Dwifiandika S. Faulus, 1996

In presenting this thesis in partial fulfillment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Computer Science
The University of British Columbia
2366 Main Mall
Vancouver, BC
Canada V6T 1Z4

Abstract

One key component in providing effective image data management support is an expressive query language/interface. The query languages and interfaces of existing image database systems are quite adequate for image databases where the images are different. However, for many applications, such as remote sensing and medicine, where the images are similar, most of the existing languages are restrictive, since tools are not provided for the users to express the subtle differences that may exist between the images that the users want and hundreds of images that are similar. The goal of this thesis is to provide a more expressive query language/interface that can accommodate querying image databases for such applications. Toward this goal, we have proposed the concept of range specifications for queries.
A range specification allows the user's ambiguities in specifying the various visual image features of the query to be captured and incorporated into the similarity matching process. The result of a query is typically n images I1, ..., In that are the closest matches to the query. The reformulation language supported by existing systems is typically in the form of a "like-this" query (e.g., Ii) that would return the n closest matches to Ii. We have proposed a "like-this-in-what" reformulation language which allows the user to express not only "like-this" queries, but also the specific aspects that the user wants to include or exclude. More formally, the "like-this-in-what" reformulation language allows not only queries of the form Ii, but also queries of the more general form (Ii - Si + βi) + (Ij - Sj + βj) + ... Last but not least, we have designed and implemented a prototype query language/interface called EXQUISI (EXpressive QUery Interface and language for SImilar Images) which incorporates the above concepts. In Exquisi, we give examples of how range specifications can be incorporated into the image features by introducing query features which include color range, color gradient, texture set, and location box. We also show how "like-this-in-what" reformulation can be incorporated in Exquisi so as to minimize the cost of reformulating the query and to maximize the expressiveness of the query reformulation.

Contents

Abstract
Contents
List of Figures
Acknowledgements
Dedication
1 Introduction
  1.1 Content-Based Querying for Image Databases
  1.2 Manual vs Automatic Extraction of Content-Based Image Querying
  1.3 Overview of existing IDBMS querying capabilities
  1.4 Deficiencies of Existing Query Languages/Interfaces
  1.5 Motivations and Contributions
  1.6 Thesis's Outline
2 Related Work
  2.1 Image Features and Representations
    2.1.1 Color
    2.1.2 Texture
    2.1.3 Shape
    2.1.4 Other Content Features
  2.2 Querying Tools and Supported Queries
  2.3 Query Reformulation
  2.4 Summary
3 Range Queries and The Input Module of Exquisi
  3.1 Color Range Queries
  3.2 Texture Interface and Range Queries
  3.3 Objects in Exquisi
  3.4 Location Box Queries
  3.5 Color Gradient Queries
  3.6 Summary
4 The Output and Query Reformulation Module of Exquisi
  4.1 Output Module of Exquisi
  4.2 Query Reformulation
    4.2.1 Principles of "Like-This-In-What" Queries
    4.2.2 Facilities Supported by the Query Reformulation Module
    4.2.3 Examples of Reformulated Queries
  4.3 Summary
5 Semantics of Queries With Respect To Similarity Matching
  5.1 Similarity Matching with Color Ranges
  5.2 Similarity Matching with Color Gradient and Texture Set Queries
  5.3 Similarity Matching with Location Box Queries
  5.4 Incorporating User Preferences
  5.5 Summary
6 Implementation of Exquisi
  6.1 Implementation Goals
  6.2 Object Representation and Graphics Functions
  6.3 Image Display and Processing
  6.4 Color
    6.4.1 Color Space
  6.5 Texture
    6.5.1 Texture Model
    6.5.2 Interface for texture querying
  6.6 Query Representation
7 Conclusions
  7.1 Future Work
    7.1.1 Interface/language issues
    7.1.2 Query Processing and Optimization Challenges
Bibliography
Appendix A. Color Space and CIE XYZ - HSV - RGB Conversion Algorithms

List of Figures

2.1 Color Picker of the QBIC system
2.2 Tool to perform shape querying in the QBIC system
2.3 The Top Query Window of the QBIC system describing a multi-* query
3.1 Color Editor Window for Specifying Color Ranges
3.2 Color Holder Window and Color Range Window
3.3 Query Composition Window - Workspace for Composing Queries
3.4 Initial Image to be Retrieved
3.5 Color Gradient Window
4.1 Query Output Window
4.2 Query Reformulation Window
4.3 A Reformulated Query
5.1 Changing the Similarity Matrix for a Color Range Query
6.1 CRT color gamut
6.2 Example of Exquisi's Query Representation
A.1 Chromaticity diagram

Acknowledgements

I would like to sincerely thank my supervisor, Dr. Raymond Ng, for his support and guidance throughout the course of my work. His suggestions, advice, and encouragement during the various stages of my work have provided me with invaluable knowledge and experience far beyond the scope of my thesis. I would also like to thank my second reader, Dr. Jack Snoeyink, for the time that he spent reading and reviewing my work. His comments and suggestions were very helpful and valuable. I'm indebted to my parents for their constant, unfailing love, prayers, and support during this endeavour. Without them, I will not be what I am. A special thanks also goes to Dhe-Dhe' for her endless love, patience, and understanding. Thanks also to friends, both inside and outside of the University, who have helped me in many ways. They have made it possible for me to have a joyful life during my stay at school and my time away from home. Foremost, what I have achieved is only possible by the blessings of Allah S.W.T., the Most Beneficent and the Most Merciful of all.

DWIFIANDIKA S. FAULUS

In the name of Allah S.W.T., the Most Merciful and the Most Beneficent

Chapter 1

Introduction

The main function of a database management system, or DBMS, is to efficiently and effectively manage data. Since first introduced 30 years ago, the design of DBMSs has been focused on the management of alphanumeric data. As a result, DBMSs have been very successful and have proliferated in many areas, notably in business and industry.
Advances in image storage and processing technology over the last decade have significantly increased the generation of non-alphanumeric data, such as image, video, and audio. Today images are being generated at an ever-increasing rate by sources such as satellites, photographic devices, biomedical imaging, and scientific experiments. For example, an estimated 1 terabyte of image data will be generated by NASA's earth observing system [GR95]. There are already many application areas in which the image is the primary form of data, such as art gallery and museum management, Geographical Information Systems (GIS), remote sensing and management of earth resources, picture archiving, and retailing. The NSF's workshop on visual information management systems (VIMS) in particular has identified four Grand Challenge application domains in which image databases will play an important role. These include education, medicine, engineering and scientific image and visualization systems, and geographic/environmental information systems [Jai93]. With the growing importance of images in many applications, it has become necessary for a database management system to provide support for images. Among other things, it should support adequate facilities for the retrieval of images from the database.

1.1 Content-Based Querying for Image Databases

Given a database, a user typically wants to retrieve specific information which satisfies certain criteria set by the user. For example, in a real-estate database, the user may want to find those houses that are located at a certain location and within a given price range. The query interface provided by the DBMS allows these criteria to be formally specified and then translated into a more precise semantic description understood by the query processing engine of the system. Since queries depend on the particular needs of the user, they are typically not known a priori by the system.
Thus, the query interface should be able to accommodate a range of possible queries. As an interface between the user and the database, it is also important for the query interface to be sufficiently expressive. It is certainly of little use if only a small subset of the information in the database can be captured. Furthermore, if the system is designed to work with databases from a wide range of applications, the query facility should be able to organize data transparently from the domain of the data. In traditional DBMSs, this has typically been done by abstracting data from the application using some sort of data model. Associated with a data model are operations to manipulate data. In the relational model, for example, data is organized as tables of values, and each row in a table represents a collection of related data values. These values can be interpreted as a fact describing an entity or a relationship instance. A query can be specified by means of a relational query language such as SQL [EN89], which is based on the relational algebra formalism. Other data models that have been proposed include the network, hierarchical, and object-oriented data models [EN89]. Among these models, the relational model is the most popular, due to the more formal mathematical foundation of its query language and its greater flexibility. Since the data stored in such a database is alphanumeric, a query can be specified precisely, meaning that there is no ambiguity in interpreting the query. This is due to the fact that humans are precise in remembering alphanumeric information. As a result, it is possible to perform an exact matching between a query and the database. Thus, precise answers to the query can also be obtained. The kind of information that can be extracted from an image is called image features. As opposed to alphanumeric data, image features may be semantic-bearing or non-semantic-bearing. A semantic-bearing image feature involves contextual interpretation of the image.
This information typically requires some previous knowledge or experience about the context of the image being examined. For example, some knowledge about cars is necessary before one can recognize that an image contains a sedan of a specific type and make. On the other hand, a non-semantic-bearing image feature refers to a visual property or visual content of an image, such as color, texture, and shape. One characteristic of this kind of image feature is that its extraction is context-independent. Color, texture, or shapes may be extracted from an image without prior knowledge of the image. For example, one may extract a car-shaped object from an image without having to recognize that the object is interpreted as a car, or extract a region with a specific texture without recognizing that it is the texture of a brick. In content-based image querying, a query is specified using image features that are non-semantic-bearing. For instance, given an image database, one may be interested in finding those images with a particular mixture of colors or of a particular shape and texture pattern. One key characteristic of content-based querying is the imprecision and ambiguity of a user in specifying the various image features of the query. Most people find it difficult to accurately recall the exact color, shape, or location of the objects that they have seen in a given image. It is also difficult to discriminate between instances of an image feature when they are sufficiently similar to each other. An example is differentiating between shades of the color green. Studies in the area of psychology show that there is also an involvement of perception when a person interprets and understands an image [Bad90]. There may be as many interpretations of an image as there are people. A person may also expect different regions to be recognized as similar when his or her goal(s) change.
In addition, perception is affected by what one's language and culture emphasize [PM94]. All of these factors support the argument that different models and techniques for organizing, indexing, and querying image data are necessary, rather than relying on the existing alphanumeric-based database techniques, in order to support content-based image querying [Jai93]. In this thesis, we focus our discussion on some issues related to supporting content-based image querying. Specifically, we investigate the issues in providing an expressive query language/interface for content-based image querying. This will be explained in more detail later.

1.2 Manual vs Automatic Extraction of Content-Based Image Querying

Since images are often described and distinguished by the image features they contain, it is necessary for an image database management system (or IDBMS for short) to support content-based image querying. Due to the nature of image features, supporting content-based querying requires different mechanisms than those for textual data. It also requires the ability of the system to extract image features from an image and to support image querying based on these features. In terms of defining a query, there are two approaches to querying, depending on whether the image features are extracted manually or automatically. The first approach relies on the use of a traditional DBMS to facilitate database operations, using textual annotations which are attached to each image [OS95, RS92a, RS92b, ST92]. The images themselves are not actually part of the database; instead, they are referenced by these annotations. There are usually no image understanding capabilities incorporated into the system. Image features are usually modelled as a set of textual attributes, extracted manually and managed within the framework of the traditional DBMS. Querying an image in this case is done indirectly by having the user specify these attributes textually. As an example, to specify a color the user can choose from among a set of textual descriptions of the colors, such as "blue", "navy blue", "dark blue", etc. This, in effect, is similar to keyword-based querying. As these are part of the traditional DBMS, searching techniques and query operations are the same as in the case of an alphanumeric database, i.e., an exact matching is possible. There are several major drawbacks with this approach. First, it is quite clear that this approach is labor intensive, since the task of extracting the attributes of the images in the database typically has to be done manually. For a database of large size, thousands or even hundreds of thousands of images, this approach becomes infeasible. Second, the wealth of information contained in an image cannot be sufficiently described by merely using keywords. It is also difficult to translate visual information into a precise textual description. For example, the database may contain images with the same fish objects, but with a different shape formation in each image. It will be difficult to capture this in the textual description, let alone to discriminate among the images. This results in querying capabilities limited to only the fixed set of keywords that are made available, and in the lack of the system's ability to sufficiently discriminate among images. In addition, the needs of users may not be well represented by these keywords. Even if they are, different users may have a different perception of an image, and it may not be the same as that of the person who assigns the keywords. The second approach to querying images automatically extracts the visual information contained in an image. Image processing methods are used to extract different image features, such as colors, textures, edges, and shapes.
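As a toy illustration of this automatic extraction step (using a hypothetical 2-bit-per-channel color quantization, not the representation of any particular system), a color histogram can be computed directly from pixel values:

```python
def color_histogram(pixels, bits=2):
    """Map an image (a list of (r, g, b) tuples, values 0-255) to a
    normalized color histogram with (2**bits)**3 bins."""
    levels = 1 << bits
    hist = [0.0] * (levels ** 3)
    for r, g, b in pixels:
        # Quantize each 8-bit channel down to `bits` bits.
        ri, gi, bi = r >> (8 - bits), g >> (8 - bits), b >> (8 - bits)
        hist[(ri * levels + gi) * levels + bi] += 1
    total = len(pixels)
    return [h / total for h in hist]

# A tiny half-red, half-green "image" splits its mass over two bins.
vec = color_histogram([(255, 0, 0)] * 4 + [(0, 255, 0)] * 4)
print(len(vec), sum(vec))  # a 64-dimensional point whose mass sums to 1
```

The resulting vector is exactly the kind of feature representation the text goes on to describe.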
Each of these features is usually represented as either a numeric value or a vector. For example, a color may be defined by a 3-tuple corresponding to its location in the RGB color space. Images are then represented by these image features. Thus, one can view an image as a point in a multidimensional image space. It is easy to see that the dimensionality of this space is high, typically more than 20 dimensions. In terms of database indexing and searching, this high dimensionality poses some problems. In addition to the extra space required to store the features, computation time is increased when searching for an image in the database. Several approaches have been taken to accommodate this problem, such as trying to reduce the number of dimensions while at the same time minimizing the loss of information [Sed95]. Since this is not the main goal of this thesis, this issue will not be discussed further. Queries in this approach are typically specified visually or graphically. To define a color, for example, the user can choose from a given set of colors, or define its components, such as its RGB values. Shape specifications can be accommodated, for example, by means of a drawing facility. It is also possible to incorporate domain-specific information about images by attaching a semantic querying capability, typically by means of keyword-based querying. This allows queries to be specified using both semantic and visual information. Typically, keyword searching is done first to prune the search space by removing the subset of the database that does not correspond to the current query context. Then content-based searching is done on the images left in the search space. One key difference between the two approaches is in the way searching is done. Retrieval in the manual approach is based on exact matching, since queries are done by matching alphanumeric keywords. On the other hand, retrieval in the automatic approach is based on similarity matching.
Similarity matching states that, given a query represented as a point Q in the image space, find the images I1, ..., In that are within a certain distance threshold ε of Q, or find the n nearest matches. There are two reasons why similarity matching is necessary in image database searching. First, the image processing techniques that are used to extract image features inherently introduce noise and distortion [GM92]. Second, as discussed previously, queries are an estimation of the image the user is looking for. For example, the user often uses fuzzy identifiers, which are not exact, to describe a feature, such as "reddish ball", "top corner of the image", and "sand-like texture".

1.3 Overview of existing IDBMS querying capabilities

As we have discussed before, there are some important differences between image and alphanumeric data. These differences are fundamental enough that new techniques for the organization, indexing, and query processing of image data are necessary. One of the important issues in developing an IDBMS is the provision of its query interface/language. This is the main issue that will be addressed in this thesis. In general, the input module of the query interface accepts a query Q, part of which may be graphical. The query may be of the form "Find images that contain color Cx or texture Ty" or "Find images that contain 'this' shape at 'this' location," or complex combinations of such querying conditions. Typically, as answers to Q, the output module of a system returns the top-n matches, which are the images I1, ..., In stored in the database that are closest to Q. (Hereafter, we use the notation Ans(Q, n) to denote the answer set I1, ..., In.) The system's notion of "closest", however, may be quite different from the user's notion, and the user's imprecisions might mean that the user is not absolutely clear on what he/she is looking for. As a result, the answers may not be what the user wants.
This creates the necessity to refine or reformulate the query before the desired images can be obtained. While most of the systems do not explicitly support reformulation, some systems [N93+, LF94+, Kato92] provide a query mechanism to allow query reformulation. (The term "query reformulation" has been used in different contexts to mean different operations [LHQ91, LS90, AKS96, GC93]. We use the term to mean "refinement of the initial query to obtain better answers".) Typically this is done by allowing the user to pick one of the returned images I1, ..., In as the refined query for further retrieval. This is called a "like-this" query, or query-by-example.

1.4 Deficiencies of Existing Query Languages/Interfaces

It has been reported that existing systems with the kinds of input, output, and reformulation modules described above are effective for applications such as art gallery, museum, and fashion image databases [F95+]. However, we found that for other applications, such as air-borne forest cover images and many types of medical images, the expressiveness of the query languages/interfaces of existing systems is inadequate. The main characteristic that differentiates these kinds of images from the former is that such images are typically similar to each other in the database. This notion of similar images can best be explained by regarding the distribution of each image in the appropriate image space. If the image space is sparse, e.g., art gallery images, neighbouring images are sufficiently different from each other. Even if the query point, which represents the query in the image space, is not accurate, the top-n matches are still accurate, i.e., Ans(Q + δ, n) ≈ Ans(Q, n).
We found that when the images are sufficiently similar, which corresponds to a dense image space (e.g., forest cover images), existing query interfaces/languages are restrictive, since tools are not provided for users to express the subtle differences among neighboring images. In this case, if the query point is not accurate because of restrictions from the interface, the top-n matches may be vastly inaccurate. One may argue that one way to solve this problem is to increase the value of n to a much larger value m, so that Ans(Q + δ, m) ≈ Ans(Q, n). This is clearly not a good solution, because the larger the value of m, the longer query processing takes for the retrieval of images, and the longer it takes the user to sort through the returned images. More specifically, we identify three main problems or inadequacies with existing query interfaces/languages:

• Lack of incorporation of imprecision and ambiguities in specifying queries. Querying tools provided by these interfaces often do not consider the inherent ambiguities and imprecisions of a user's query. The user, in this case, is forced to formulate an overly exact query. For example, the user has to specify an exact color Cx or an exact location for a query. This may result in an inaccurate query when the image space is dense and may lead to wrong answers.

• Lack of support for query reformulation. With a "like-this" query, the user can perform a query reformulation in the sense that the reformulated query is one of I1, ..., In. In other words, the "like-this" query is the most minimal reformulation language, in that it only allows image-level reformulation. However, the language does not allow the user to utilize more specific visual information from the returned images to refine the query. For example, the user cannot select only part of a returned image or incorporate the contents of several returned images to refine his/her query. Therefore, for many applications, especially those with a dense image space, the language is inadequate.
What is needed is a more expressive reformulation language which can be used to specify the more subtle differences among very similar images.

• Lack of provision of adequate querying support. The common approach taken by many existing systems is that they tend to put a lot of emphasis on what can be done right now in query processing. The result is querying support that depends on the query processing capabilities of the system. The user may then find that the query interface is insufficient to support the kinds of queries that are needed. We believe that, as far as research goes, it makes sense to first define a target language that is needed, and then work on how to process queries expressed in the language.

1.5 Motivations and Contributions

The limitations and deficiencies of existing systems as described in the previous section give the major motivations as to why we aim to develop a more expressive query interface/language for image databases. Another motivation is that, with the ever-growing use of the internet, more and more image database servers are being provided on the internet. To maximize the efficiency and the throughput of an image database server, it is beneficial to provide more expressive querying facilities so that each individual client/user can be as precise as possible in specifying his/her query. This is important to reduce the number of times a query needs to be reformulated, particularly when the queries are transmitted across a network. The work of this thesis is a result of addressing the above problems faced by existing query interfaces/languages. As a preview, the key contributions of this thesis can be summarized as follows:

• Range specifications to queries. To better capture the ambiguity when the user composes a query, we introduce the concept of range specifications for the image features. A range specification allows an image feature to be defined by specifying a range of values for that image feature.
All values in a range are to be treated identically in the similarity matching of the query. We give examples of how range specification can be incorporated into color, texture, and location features, by introducing the color range, texture set, and location box image features. This allows the user to specify, for example, a range of colors that are perceived to be similar.

• "Like-this-in-what" queries. We propose a reformulation language that supports "like-this-in-what" query reformulation. This allows a user to express not only "like-this" queries, but also the specific aspects, possibly at the sub-image level, that the user wants to include and exclude in the reformulated query. More specifically, Exquisi allows reformulation not only of the form Ii, but also of the more general form (Ii - Si + βi) + (Ij - Sj + βj) + ...

• We have addressed several issues that arise in supporting the querying facilities mentioned above. In particular, we have addressed how the semantics of the different image features should be defined. We have also addressed how range specifications and the "like-this-in-what" reformulation mechanism can be incorporated into the similarity matching functions.

• Development of the EXQUISI query interface/language. Last but not least, the above key ideas have been implemented through the development of the EXQUISI system (EXpressive QUery Interface for SImilar Images). The expressiveness of the Exquisi query language allows querying of an image database where the images are similar. Its input module provides facilities that incorporate range specifications for different image features. Currently, Exquisi provides facilities to support color range, texture set, and location box specifications. In addition, Exquisi also allows a color gradient to be specified. The reformulation module of Exquisi provides facilities to support "like-this-in-what" query reformulation.
We have also implemented the reformulation module so that it not only allows a finer granularity of query reformulation, but also the utilization of the input module, thus achieving the same expressive power as the input module.

1.6 Thesis's Outline

The remaining chapters of this thesis are organized as follows. Chapter 2 describes related work. It reviews the different content features and representations that are used by existing image database systems. It also discusses the query languages/interfaces of these systems. Chapter 3 explains the concept of range specifications and discusses the range queries and the input module of Exquisi. Chapter 4 describes the "like-this-in-what" reformulation mechanism and how it is incorporated into Exquisi. It also describes the output module of Exquisi. Chapter 5 explains how the different types of range queries, namely color range, texture set, color gradient, and location box, can be incorporated into the similarity matching process. Chapter 6 describes some of the issues related to the implementation of Exquisi. In particular, it describes the implementation of the object representation, graphics functions, image display and processing, and the query representation of Exquisi. It also discusses the issues related to the choice of appropriate models and representations for color and texture. Finally, Chapter 7 presents the conclusions and suggestions for future work. The query processing and optimization challenges are also discussed.

Chapter 2

Related Work

In the past several years, research in the image database field has resulted in the development of various prototype systems, which can be grouped into two categories. Systems that are developed for specific applications typically employ query languages and query processing techniques that are suited to the intended application [BPJ92, GM90, PPS94, RS92b, SHK93, Sri95].
Thei r applicabil i ty to image databases to different domains are usually l imi ted. O n the other hand, the query languages of general-purpose systems are characterized by the use of image features such as colours and textures that do not bear application-specific informa-t ion. Examples of such systems are Q B I C [F94+, F 9 5 + , L B 9 4 + , L F 9 4 + , N93+], M i y a b i [Kato92], F M R [SSU94], Q V E [HK92, K K 9 3 ] , Wavelet [JFS95], and the system developed by Gong et al . [G94+]. Our goal in this thesis is to provide a general-purpose query language. Thus, we wi l l not consider systems of the first category. This chapter wi l l examine several of the current systems to give a better understanding of them. In particular, we look at how they address the issue of providing the query language and interface, which is the central issue of this thesis. 15 2.1 Image Features and Representations In this section, we describe the various schemes used by current systems to represent image features. 2.1.1 Co lo r In vision and image processing community, colour has been widely used for i m -age segmentation and classification tasks. Studies have been done suggesting that colours play an important role in human understanding of an image [Hur81]. Th is can be taken as a strong preposition for an image querying system to support colour querying. A l l systems that we review support some form of colour querying. Several mechanisms have been proposed to extract and encode colour information from an image. One popular scheme is color histograms, which is used by several systems such as Q B I C , M i y a b i , F M R , Wavelet, and the system by G o n g et a l . [G94+]. In Q B I C and F M R , s imilar i ty matching between two histograms is based on the weighted Eucl id ian distance of the normalized histograms. The system by G o n g et al . 
views the largest 20 bins of the colour histogram as forming a hyper-polygon in the color space, from which two parameters, Weighted Perimeter and Weighted Angle, can be extracted. Similarity matching is calculated by comparing these values. Also worth mentioning is the histogram intersection technique developed by Swain and Ballard [SB91], which they claim to be robust for the image identification problem. In this technique, the result of the intersection of two histograms is the number of pixels in one histogram that have corresponding pixels of the same colour in the other histogram. Several other methods have been suggested to represent colour information from an image and to compute its similarity. For example, Stricker et al. [SD96] use the first three moments of the color distribution. They claim that this method results in a more robust representation than color histograms. In the QVE system, colour similarity of two images is computed by dividing the image into small windows (3 x 3 pixels) and comparing the first-order autocorrelation of corresponding windows of the two images. Small window displacements are allowed to take into account query imprecision. In Miyabi, an image's color information is encoded, by means of an image regioning and merging technique, into a color picture index. The VisualSeek system [SC96] also uses colour information to segment the image into regions by using the color histogram backprojection method of Swain and Ballard [SB91]. The encoding of color information is then done using the "binary color set" technique, in which only those colors which are sufficiently present in a region are chosen. The Wavelet system developed by Jacobs et al. [JFS95] applies a two-dimensional wavelet decomposition to the color information of the image. The decomposition process produces wavelet coefficients.
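Stepping back to the histogram-based measures above (QBIC/FMR's weighted Euclidean distance and Swain and Ballard's intersection), a minimal sketch may help make them concrete. The bin counts, weights, and normalization below are illustrative assumptions, not the actual parameters used by any of these systems.

```python
import math

def weighted_euclidean(h1, h2, w):
    # Distance between two normalized color histograms, in the spirit of
    # QBIC/FMR color matching; w assigns a weight to each color bin.
    return math.sqrt(sum(wi * (a - b) ** 2 for wi, a, b in zip(w, h1, h2)))

def histogram_intersection(h1, h2):
    # Swain & Ballard's histogram intersection: the overlap between the
    # two histograms (pixels of the same color present in both),
    # normalized here so that identical histograms score 1.0.
    return sum(min(a, b) for a, b in zip(h1, h2)) / sum(h2)

query = [0.5, 0.3, 0.2]        # toy 3-bin histograms (fractions of pixels)
candidate = [0.4, 0.4, 0.2]
d = weighted_euclidean(query, candidate, [1.0, 1.0, 1.0])  # small distance
s = histogram_intersection(query, candidate)               # high overlap
```

Note that the Euclidean distance penalizes every bin-wise difference, while the intersection rewards only the shared mass, which is why Swain and Ballard argue it is robust to background clutter.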
Image similarity matching is done by comparing the significant coefficients of the query image and each image in the database.

2.1.2 Texture

Texture is another important visual cue of an image. While it is relatively simple for people to visually identify and discriminate texture information in an image, it is not yet clear how the same can be done by computer. Ongoing research into providing a computational framework for texture discrimination has resulted in various texture models. Examples are the Simultaneous Autoregressive (SAR) texture model [MJ92], and a texture model based on the 2D Wold decomposition [LP95]. Most models, however, are limited to only a small subset of textures or synthetic textures. As a result, efforts to support texture querying are also constrained by currently available texture models. Only a few of the systems reviewed support texture querying. Among them are QBIC, FMR, and Photobook [PPS94]. QBIC uses modified coarseness, contrast, and directionality features of [TMY78] to represent texture. A weighted Euclidean distance is used to compute the similarity. FMR uses a coarseness feature, where the coarseness of a region is measured by dividing the region into small blocks and then calculating the block averages of the four-direction entropy of the co-occurrence array in each block of the region [HSD73]. Photobook develops a texture discrimination mechanism that combines several texture models, such as the SAR and 2D Wold decomposition texture models, to take advantage of the capabilities of the individual models.

2.1.3 Shape

For the human eye, identifying a particular object in an image is a relatively simple task. Even a child is capable of identifying all dogs that are present in a collection of images. This seemingly simple task is, unfortunately, not currently feasible for computers. The complexity stems from the problem of how to group image features into meaningful objects and attach semantic descriptions to objects (i.e.,
to understand that "this object is a dog") via model matching, at which a human is much better than a computer [F95+]. As a result, research has concentrated on identification of shape, which is non-semantic. Efforts in object querying, correspondingly, are limited to querying of the form "find objects whose shape is similar to the shape of this 'dog' object". Even so, object recognition of this kind is still difficult, due to noise and distortions that are inherently present in images. In the systems reviewed, shape querying can be characterized into two kinds. In the first kind, shape is explicitly given measurable properties, such as area, circularity, eccentricity, and moments. Similarity matching is generally based on calculating the weighted Euclidean distance of these properties. Systems that support this kind of shape querying include QBIC and the system by Gong et al. [G94+]. In the second kind of shape querying, shape is used merely as a spatial and area identifier for other image features such as colour and texture. The layout of the shape itself is not significant. It is used mainly to facilitate the definition of other image features on different regions in the image at the query composition stage. Examples are Miyabi, the Wavelet system [JFS95], and FMR. While matching in the first two systems is done at the image level, FMR uses the shape to perform sub-image level matching. In this case, similarity on individual features is computed for each shape defined. The overall similarity is then the combination of the similarities of the individual shapes.

2.1.4 Other Content Features

Edges are another commonly used image feature. Significant edges in the image are extracted using edge detection and thinning techniques. Template matching is then performed to measure the similarity between the edges of two images.
Typically, a query using this image feature is of the form "query-by-sketch", where a user draws a sketch of the dominant lines or edges of the image he/she is looking for. Systems that use this feature include [GR95, HK92, Kato92, KK93, N93+]. The approach of [HK92] reduces the image to a 64 by 64 resolution image and partitions it into an 8 by 8 set of blocks. Matching is done by correlating each block of the query image with the corresponding block in an image from the database. Because each block is spatially correlated separately, the method allows for some spatial shifting between the sketch and database images.

2.2 Querying Tools and Supported Queries

For a user to be able to formulate a query, querying tools should be provided. These tools give the user the ability to define the different content features for his or her query. This also implies that the capability of the querying tools determines the kinds of queries that can be supported and the expressiveness of the query language. In this section we describe the querying tools of existing systems and examine their expressiveness. In QBIC, a query specification can include colour, texture, shape, and location. The facility for color features allows two kinds of queries: 1) average color queries, and 2) color percentage queries. An average color query allows the user to find images or objects that are similar to a selected color or to the color of an object. A color percentage query allows two or more colors to be specified as percentages. The goal is to find images or objects that contain the specified colors and distributions. In terms of the user interface, a color is specified by means of a set of (R, G, B) sliders. A color patch shows the currently selected color.

Figure 2.1: Color Picker of the QBIC system

Figure 2.1 shows the color picker, QBIC's user interface for defining color features.
As seen from the figure, the user is given a set of color samples (256 colors) to pick from. The picker allows up to 5 colors to be picked and associated with a query. The user can also define a percentage distribution for each color defined. In the example of Figure 2.1, two colors are defined, with 15% and 13% distributions respectively. Queries on texture are specified by selecting from a set of sample textures. Choosing a texture for the query is done by browsing the sample texture list. Furthermore, QBIC also allows querying based on shape and sketch. To define a shape, a drawing editor, as shown in Figure 2.2, is provided for the user to draw an outline of the shape. Polylines, ellipses, rectangles, and lines can be used to draw different shapes. A colour can also be specified for a shape. The images that will be returned are those that contain the specified shape and color(s). The same editor is used to define a sketch query. In this case, the user draws a set of dominant lines and edges that may appear in the images of interest. Query-by-edge-sketch is the main querying mechanism supported by the TRADEMARK and ART MUSEUM systems [Kato92].

Figure 2.2: Tool to perform shape querying in the QBIC system

QBIC also allows multi-* queries, in which the query features above can be combined to define a single query. Iconic representations are used to describe which features are defined for the query. Figure 2.3 shows the query window, which holds information about the query. This window also functions as a location editor that allows the user to specify an (x, y) location of an object in a query (the other way is by providing a blank area where the user can pinpoint the location of an object). For instance, Figure 2.3 illustrates a multi-* query which asks for images that contain both a red round object and a green textured object.
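Answering a multi-* query implies combining the per-feature distances into a single score per candidate image. A common approach is a normalized weighted sum that reflects the relative importance the user assigns to each feature; the sketch below is generic and illustrative, not QBIC's actual formula, and the feature names are hypothetical.

```python
def combined_distance(per_feature, weights):
    # per_feature: distance of a candidate image to the query on each
    # feature, e.g. {"color": 0.2, "shape": 0.6} (hypothetical names).
    # weights: user-set relative importance of each feature.
    # Normalizing by the total weight keeps scores comparable across
    # queries that use different feature subsets.
    total = sum(weights[f] for f in per_feature)
    return sum(weights[f] * d for f, d in per_feature.items()) / total

# A candidate matching well on color but poorly on shape, with color
# weighted twice as heavily as shape:
score = combined_distance({"color": 0.2, "shape": 0.6},
                          {"color": 2.0, "shape": 1.0})
```

Candidates would then be ranked by ascending combined distance, with the top-n returned as the answer.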
Other querying features provided by QBIC include a facility for the user to set the relative importance of the features during querying. The querying tools of Miyabi support two types of queries, namely sketch queries and queries using example images. In sketch querying, instead of drawing lines and edges, as in the QBIC case, the user sketches the image by painting. This is to allow the user to define shapes and associate colour and spatial information with the shapes. The difference from QBIC's shape querying is that the similarity of this query is based on template matching between the query image and database images. The same querying mechanism is used by the Wavelet system developed by Jacobs et al. [JFS95], except that the similarity is based on the wavelets of the images. In the second type of querying, an image is used as a query. Extra querying capability in this aspect is added by allowing modification of the example image, e.g. by changing the background color of the image.

Figure 2.3: The Top Query Window of the QBIC system describing a multi-* query

The querying support of the FMR system allows the user to pick parts of an image and use them in a query. Parts from different images may be used to compose a query. This mechanism also accommodates location querying by allowing the image parts to be spatially placed in the query. This mechanism is different from query-by-example in that it allows finer granularity when querying. For example,
In addit ion, it restricts the composit ion of the query using image parts of existing images by not allowing the user to define image parts taken from the user's memory. The querying mechanism of the system developed by Strieker et a l . [SD96] uses a modification of query-by-example mechanism. Using an example image, the mechanism allows the user to incorporate spatial information by fuzzily segmenting the query image into 5 regions. Weights can be set to these 5 regions which describes user preference. This allows the user to specify a finer granularity, sub-image queries to a certain level. For instance, a query of the form " F i n d images like this image emphasizing the fish in the center and the rocks on the bot tom" can be translated by setting a higher weights to the regions of interest in the example image (i.e center and bot tom of the image). 2.3 Query Reformulation Given a query Q, the result is a set of returned images 7 i , • • - , / „ . These images are usually displayed in thumbnail format for browsing [Kato92, K K 9 3 , L F 9 4 + ] . Since searching is based on similarity, these images are typical ly the n most similar images to Q, and ranked from the best match to the n th best match. Image viewing functions such as zooming, full size image viewer, and file I / O are also provided. 24 Addi t iona l facilities may also be provided to the output module. M i y a b i supports feedback on the results by allowing retrieval of images based on the result of specific content features. For example, images can be ranked based on the result of the colour feature of the query. A s discussed before, one of the consequences of matching based on similar i ty is that the results of the query may not be what the user wants. Thus , it may be neccessary to reformulate the ini t ia l query Q. The ini t ia l result is often useful to be used as cues for the reformulation. 
For instance, statistical feedback on the results can provide information about what aspect of the query needs to be reformulated. In addition, the user may see that an object in returned image Ii and a region that contains a particular texture from another returned image Ij can be added to Q. The user may also want to modify the initial definition of the features of Q. In this case, an input facility is also needed during query reformulation. Among the systems examined, few support query reformulation. Query reformulation is typically not considered part of the querying process. As a result, the querying facilities supported by these systems generally do not allow query reformulation. If supported, it is usually limited to queries of the form "like-this" or "query-by-example". That is, a returned image is used as the refined query. Examples are QBIC and Miyabi. The querying facility of FMR as described in the previous section supports a more elaborate query reformulation, since regions of a (returned) image can be picked. The facility is, however, restricted to the degree that the user cannot pick more specific image features from the image, such as colour and texture.

2.4 Summary

From the above review, we can see that most existing prototypes allow only limited querying capabilities. This seems to be caused by the fact that the query language is not the main focus. Instead, it is used mainly as a testing facility. The consequence of this is the inadequacy of current query languages to capture subtle differences that differentiate among similar images. Given two x-ray images, querying mechanisms such as query-by-sketch, query-by-example, or other image-level querying such as QBIC's colour and texture querying fail to capture the fact that they differ, for example, only in the location of a white object that appears in both images. Most of the systems also seem to ignore the inherent differences between image and alphanumeric querying.
They also seem to overlook the fact that an image querying process involves iterative refinements of the query, in which the initial query may need to be reformulated several times before the answer is what the user has in mind. Existing query languages/interfaces rely on the user giving an exact description of the feature. For example, QBIC assumes a colour picked by a user to be correct, although the user may not be certain of the color. As discussed in the previous chapter, this may result in the generation of 'wrong' queries. While it may not be possible to guarantee the correctness of a query (since perception is also involved), it is possible, however, to maximize it. In this case, a query language/interface for image querying should provide tools that can accommodate the user's imprecision and ambiguity and are able to capture subtle image features. It is also necessary to support the need to iterate (query refinements) during the querying process, which can be best accommodated by providing a mechanism to perform query reformulation. In the remaining chapters, we discuss how we address these issues.

Chapter 3

Range Queries and The Input Module of Exquisi

Given the inherent ambiguities and imprecisions of the user in specifying image features, the query language/interface should provide querying tools that can accommodate such ambiguities. In this chapter, we explain how the concept of range specification is incorporated into the color range, texture set, location box, and color gradient queries supported by Exquisi. We also clarify the notion of objects in Exquisi.

3.1 Color Range Queries

Humans are known to have poor color memory. Most people find it difficult to match colors that they have seen innumerable times. It is also common for a person to make a mistake when trying to match a paint sample from memory alone.
Human perception of a color is also easily affected by the frame of reference, or the visual context in which the color is perceived. The same color of red, for example, seems brighter, lighter, darker, or changed in hue by changing the surrounding color in which the red color is seen. The limitations above may result in the user being unable to choose one particular color among several that seem equally acceptable. For instance, green may be what the user remembers from a certain image. However, there may be five different shades of green provided in the quantized color space. From the user's point of view, every one of the five shades is as good as the others. Forcing the user to choose one of them as the color may not capture precisely what the user knows or wants. Consequently, the top-n images returned as answers may be well off the mark, particularly for a dense image space. More specifically, let Qc1, ..., Qc5 represent respectively the queries that pick only one of the five shades of green as the color. The top-n matches of these five queries may be quite different, particularly for a dense image space, i.e., Ans(Qc1, n) ≠ ... ≠ Ans(Qc5, n), and inaccurate, i.e., the union of the answers Ans(Qci, n) need not equal Ans(Qc1,...,c5, n), where Qc1,...,c5 denotes the query that allows all five colors to be specified and treated indistinguishably. Instead of forcing the user to select a particular color, Exquisi allows the user, whenever necessary, to select several colors perceived to be indistinguishable for this query. This is called a color range query. To simplify the language, Exquisi imposes the restriction that the multiple colors chosen by the user must form a range in exactly one of the three dimensions of the HSV color model, which is the color model supported in our prototype. To be more specific, consider the color editor window shown in Figure 3.1.
To define a color range, the user first defines a reference color, as shown by the small square at the top right of the window, by using the "Hue", "Saturation", and "Brightness" slide bars at the top of the window. The user can then select one of the three dimensions to finalize the choice of color range. In the example shown in Figure 3.1, the saturation dimension is selected. The small color cells below the word "Saturation" show the full spectrum of the colors with varying saturation values, but the same hue and brightness values as set in the top part of the window. By adjusting the slide bars labelled "Endpoint 1" and "Endpoint 2", the user picks colors C1 and C2 respectively, effectively defining a color range consisting of all the colors between C1 and C2 inclusive. Since the hue color element is circular, the user can define "Endpoint 1" to be at the right of "Endpoint 2" to specify that the color range includes those colors to the right of C1 and to the left of C2. (The slide bar labelled "Point Btw" illustrated in Figure 3.1 is used when defining a color gradient, explained in Section 3.5.) Once the user has decided on the color range, there are two options the user can choose. The first one is to link this color range to a certain "object" by clicking the button "Object Link". While Chapter 4 will discuss the notion of objects in Exquisi in more detail, it suffices to say here that this option specifies that the color of the object under consideration is in the current color range. Apart from assigning a color range to an object, the user can also store the color range for future use. This can be done by clicking the button "To Holder". The effect is that the current color range will appear in the color holder window shown in Figure 3.2a. As the name implies, the color holder window holds and displays all the color ranges stored so far. This is crucial in supporting "like-this-in-what" queries. While Section 4.2.2 will provide more details, it is sufficient to point out
Whi l e Section 4.2.2 wi l l provide more details, it is sufficient to point out 31 storedColorO I stored colon •ti_. reiiC olorZ storec!Color3 Color Ranqs (a) holder window for storing colors \ Distribution: i Close Set (b) range window for setting distributions Figure 3 .2 : Color Holder Window and Color Range Window •52 here that the user can initialize the color editor window by picking a color range from the color holder window. To do this, the user first clicks the button "From Holder" in the color editor window, causing the color holder window to pop up. The user can then browse through the stored color ranges by selecting one range at a c t ime, and finalize his/her selection by clicking the button "Use". Last but not least, Exquis i allows multiple color ranges to be specified for an object. Similar to Q B I C , the user can give the percentage dis tr ibut ion of each of the color ranges. The color range window shown in Figure 3.2b is specifically for this purpose. For the currently selected object, the color range window displays the color range(s) linked to the object. The color cells at the top right hand corner of the window display the selected color range, and the "Dis t r ibu t ion" slide bar gives the percentage of the selected range which is 52% in the example shown in Figure 3.2b. If the user wants to modify the selected color range or to add a color range to the currently selected object, the user can click the buttons " E d i t " or "New" to cause the color editor window to pop up. The difference between the two options is that the former causes the color editor window to be init ialized wi th the selected color range. In the VisualSeek system developed by Smi th and Chang [SC96], there is the notion of "color set", which is proposed as "an alternative to color histograms for representation of the color content of images and regions". 
This alternative representation aims to solve some of the computational problems of color histogram techniques, such as high-dimensional feature vectors, indexing, and distance computation. As such, a "color set" is very different from our notion of a "color range". The former focuses on query processing whereas the latter is concerned with the expressiveness of queries, namely, as a way to capture the user's ambiguity. In Chapter 5, we show how our notion of color range can be incorporated into systems that use color histograms for processing. However, efficiency issues are beyond the scope of this thesis.

3.2 Texture Interface and Range Queries

Even though many texture models have been proposed, such as the ones discussed in [LP95, MJ92], there does not yet exist a texture metric which allows texture to be specified conveniently. For this reason, the kind of slide bar interface used for colors in the color editor window (Figure 3.1) is not applicable. Thus, existing systems typically provide a browser for the user to pick one texture among a collection of sample textures for a query. This interface may not be user-friendly and effective when a large number of sample textures are provided, because a linear search may need to be carried out for the user to find the intended texture. Exquisi provides a mechanism for easier texture browsing by grouping all sample textures into clusters of similar textures. Based on the SAR texture model [MJ92], the similarity between two sample textures is defined and computed, and for each cluster, a representative texture, defined to be the most centrally located texture in the cluster, is chosen. In this way, the entire set of sample textures can be organized as a 2-layer structure (more layers if necessary). To select a texture, the user first picks a cluster from among the representative textures. From within the selected cluster, the user picks the texture that is intended.
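Assuming a pairwise texture distance derived from the SAR model is available, the 2-layer organization can be sketched as follows. This is a minimal illustration: the function names and the medoid-style choice of representative are one way of reading "most centrally located", not Exquisi's actual code.

```python
def representative(cluster, dist):
    # The most centrally located texture in a cluster: the member whose
    # total distance to all other members is smallest (a medoid).
    return min(cluster, key=lambda t: sum(dist(t, u) for u in cluster))

def first_layer(clusters, dist):
    # First layer of the 2-layer browsing structure: one representative
    # per cluster. The user picks a cluster via its representative, then
    # browses only the textures within that cluster.
    return [representative(c, dist) for c in clusters]

# Toy example: "textures" are numbers and the distance is the absolute
# difference between them.
clusters = [[1, 2, 3], [10, 20, 30]]
reps = first_layer(clusters, lambda a, b: abs(a - b))  # [2, 20]
```

With k clusters of roughly equal size, browsing cost drops from a linear scan of all samples to roughly k representatives plus one cluster's members.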
Like color ranges, Exquisi allows the user to select a set of similar textures from the cluster. This is called a texture set or texture range. The intention, again, is to allow the user to declare that for this query, all the textures in the set are to be considered indistinguishable.

3.3 Objects in Exquisi

So far, we have introduced range queries of color and texture. Exquisi also supports range queries of location. But before range queries of location can be introduced and discussed, we need to describe the notion of "objects" in Exquisi. Objects in Exquisi are syntactic and geometric, but not semantic, in nature. In its simplest form, an object is simply a two-dimensional geometric shape with its outer boundary defined. Figure 3.3 shows the query composition window of Exquisi, where the user begins his/her specification of an initial query or, as we shall see, a reformulated "like-this-in-what" query. By using the buttons along the bottom left margin of the window, the user can choose to define a polyline, a circle, or an ellipse object. Once the object is placed in the query composition workspace area, the user can invoke standard operations such as zoom, move, resize, and rotate. Content features for an object can be defined by clicking the appropriate buttons along the left margin of the window. For example, to define the color range(s) of the object, the user can click the "Color Range" button to cause the color range window (Figure 3.2b) to pop up. Suppose the image shown in Figure 3.4 is the initial image the user wants to retrieve. As shown in Figure 3.3, the user defines the sea as a polyline object. The user then defines a blue color range (e.g., Figure 3.1) as the color of the sea. Furthermore, as shown in Figure 3.3, Exquisi also allows the user to define and place
Furthermore, as shown in Figure 3.3, Exquis i also allows the user to define and place 35 Figure 3.3: Query Composition Window - Workspace for Composing Queries 36 Figure 3.4: Initial Image to be Retrieved multiple objects in the query composition workspace area. This raises the question of how Exquisi handles overlapping objects. Since all objects are two dimensional in nature (so are images), when the user puts a certain object A onto another object B that already exists in the workspace, A causes the part of B that overlaps with A to be clipped. The user can use the above notion of overlapping objects to define complex objects that have inner boundaries. Consider an example where the user wants to define a ring-like object. This can be done by first defining an object A, and then putting on top of A a smaller object B. Intuitively, A — B represents the ring, and B denotes the hollow part of the ring. Actually, we need to be very careful in defining the meaning of the minus sign in A — B above. Suppose the user wants to specify a red ring object. First, the user can specify that A is red in color. But then 37 what color should B be? Strictly speaking, the color and everything of B should be the background covered by the ring, whatever that may be. However, if we were to support this notion of "hollowness" to the fullest extent, query processing could be very difficult, and may not be worthwhile. Thus, we approximate "hollowness" by specifying B to be a "don't care" object, which matches anything. More specifically, an image that has a red A and a blue B, an image that has a red A and a yellow B, and an image that has a red A and a red B all match the query equally well. "Don't care" objects have another role, perhaps even more important than the one discussed above, to play in Exquisi. 
As will be discussed in detail in Chapter 4, a "like-this-in-what" query is of the general form: (Ii - δi + βi) + (Ij - δj + βj) + ... If the "don't want" parts δi, δj can be defined as objects, then they can be specified simply as "don't care" objects. As shown in Figure 3.3, this can be done by clicking the "Don't Care" button at the top left corner of the query composition window.

3.4 Location Box Queries

Apart from poor color memory, human memory is also weak in retaining fine-granularity spatial information. In many cases, the user may remember only an approximate location of some objects in an image. This is reflected by the frequent use of fuzzy quantifiers when describing spatial information. For example, given an image, the user may remember at coarse granularity that a tree object is located at the left side of a house object, and that both are located at the lower left corner of the image. Query mechanisms which provide spatial querying, such as query-by-sketch [H93+, HK92] or query-by-shape [F95+], are limited in that they do not allow such imprecisions to be included in the query. Exquisi allows these imprecisions to be explicitly specified by introducing the concept of a location box of an object. It is defined as a rectangle which encloses the object, where the edges of the rectangle are parallel to either the x- or the y-axis. The location box states that anywhere inside the box is considered equally acceptable to the user as the location of the object. Thus, analogous to what color range queries do for colors, a location box defines an area of indistinguishable locations of the object. For example, in Figure 3.3, the area enclosed by the dotted line defines the location box of the sea object. It is interesting to observe that location boxes can be used to define some spatial relationships between objects.
For instance, if object A is to be at the NW corner of object B, then the user can define a location box of A that is at the NW corner of the location box of B. However, location boxes cannot be used to define all possible spatial relationships; this is not the primary function of location boxes to start with. Consider, for instance, the previous example of A at the NW of B. Even if the location box of A is at the NW of the location box of B, once the location boxes overlap, there is no guarantee that A will end up at the NW corner of B. One way to overcome this problem is to allow a location box to be defined not only for one object, but also for multiple objects. But as will be discussed in Section 5.3, such a generalization from one object to multiple objects would vastly complicate the similarity semantics of location boxes. Hence, for now, a location box can contain one object only, and its generalization to contain multiple objects will only be studied in future work.

3.5 Color Gradient Queries

Gradients commonly occur in an image and are used as a depth cue. A gradient involves an apparent gradual change in an object's attributes (e.g., size, sharpness, and detail) or image properties (e.g., pattern density, light and shadow, and colors). With color, the perception of space and distance in an image, for example, can be simulated by a smooth change in the brightness or saturation of colors. Here the spatial location of these colors is essential in order to define the gradual change of colors. Since the color query tools described so far (i.e., color range and distribution) do not capture the spatial element of the colors, additional tools are needed to capture this information. Depth perception using color gradients results from gradual variation of colors. As objects recede into the distance, for example, their colors generally appear less intense, paler, more grayish or bluish.
Bright objects are perceived as nearer to the observer than less bright objects. The change in brightness or lightness also gives a perception of volume, solidity, and form to the objects. In the computer graphics field, light-to-dark gradients or shadows are typically utilized to simulate 'real-looking' images. Many real images, such as landscape and natural scene images, also contain color gradients. An easy way to define a color gradient between two endpoints X1 and X2 is to specify the color C1 at X1 and the color C2 at X2, where C1 and C2 form a color range in the sense introduced in Section 3.1. Then if there are n quantized colors between C1 and C2, the color gradient can be approximated as consisting of n equal-width segments in between X1 and X2, with the colors of the n segments gradually changing from C1 to C2.

Figure 3.5: Color Gradient Window

Color gradient queries supported by Exquisi extend the above simple notion of gradient to allow a non-uniform distribution of colors to be specified. This is achieved by using a third point X_btw which is in between X1 and X2, but not necessarily the mid point of X1 and X2. Here X_btw linearly interpolates the two endpoints. Similarly, a color C_btw in between C1 and C2 is specified for X_btw. Extending the simpler situation discussed above, this color gradient consists of n segments between X1 and X_btw, and m segments between X_btw and X2, where n and m are the numbers of quantized colors between C1 and C_btw and between C_btw and C2 respectively. Figure 3.5 shows the color gradient window of Exquisi. The main part of the window displays the object for which the color gradient is to be defined. Since the color gradient segments need not be parallel to the x- and y-axes, the "Gradient Angle" area of the window allows the user to set the appropriate angle, which would cause the object to be rotated accordingly.
The endpoints X1 and X2, by default, are set to be the left-most and right-most points of the rotated object. The user can choose the colors of X1 and X2 by clicking the "Set Color" button, which causes the color editor window to pop up (Figure 3.1). If a non-uniform color distribution is required, the user clicks on the "Set Mid Point" button to set the point X_btw, whose color can again be set by invoking the color editor window. This is the purpose of the "Point Btw" slide bars in the color editor window. To verify the color gradient, the user can click on the "Compute Grad" button, which causes the appropriate segments to fill up the rotated object. If this is acceptable, the user can click the "OK" button to confirm.

3.6 Summary

In this chapter we have described several key facilities of Exquisi for defining various image features for a query. These facilities incorporate range specifications that allow the user to express his/her ambiguities in specifying various image features for the query. The facilities supported include color ranges, color gradient, texture set, and location box. In the next chapter (Chapter 4), we will describe the concept of "like-this-in-what" query reformulation and how it is incorporated into Exquisi. We will also describe the output module of Exquisi.

Chapter 4
The Output and Query Reformulation Module of Exquisi

After the query is composed using the facilities described in Chapter 3, it is submitted and processed by the query processing engine. Relevant information for the matching process is extracted, and matching is done against the images in the database based on similarity. As such, the answer to the query is the n images I_1, …, I_n computed to be the closest to the query. However, due to the inherent involvement of human perception and ambiguities in the query, it may be the case that none of the returned images is really what the user is looking for.
Thus, the user needs to refine the initial query; the refinement is often related to the returned images as well as the initial query. In this chapter, we describe the "like-this-in-what" query reformulation that allows the user to refine his query. We also discuss the output facility provided by Exquisi.

4.1 Output Module of Exquisi

The output module of Exquisi displays the n returned images in a thumbnail format. These images are ranked according to their overall similarity scores. Figure 4.1 shows the output window of Exquisi. The two arrow buttons are provided for the user to browse the images when there are more images than can be displayed simultaneously by the output window. The area in the bottom left corner of the window gives information about a chosen image, such as its name, location, and any text annotation associated with the image. The user can also pick a returned image to be used to reformulate the initial query. This can be done by clicking the button labelled "Reformulate" to cause the reformulation window to pop up. The reformulation process will be explained later in this chapter. The output module of Exquisi returns not only the n images, but also statistical information gf_1, …, gf_n that describes the "goodness of fit" of the returned images to the various conditions specified in the initial query Q. The user, for example, can click the "Detail Stat" button to view the similarity scores of individual image features or other relevant information for the returned images. This kind of feedback is meant to provide valuable aggregate information (in relevance to Q) about the images that are stored and the matching process. Finally, to further assist the user in examining the query results, the output module also provides the user with the option of viewing the images ranked based on the similarity scores of certain image features.
Figure 4.1: Query Output Window

4.2 Query Reformulation

In this section, we introduce "like-this-in-what" queries and the facilities of the reformulation module of Exquisi.

4.2.1 Principles of "Like-This-In-What" Queries

While there may be a lot of useful information conveyed to the user by the returned images, a "like-this" query restricts the use of this information to the choice of a single image. One way to extend the expressive power of the reformulation language is to allow multiple images to be used in reformulation. More importantly, to maximize expressiveness, the granularity of information reuse must be extended from the image level to the sub-image level of objects and features. It is the essence of "like-this-in-what" queries to allow the user to specify which part(s) of a returned image the user wants to include in and exclude from the new query. Thus, the general form of a "like-this-in-what" query is: (I_i − δ_i + β_i) + (I_j − δ_j + β_j) + …, where δ_i, δ_j are parts of I_i, I_j that are no longer needed, and β_i, β_j are new features that are to be included in the new query. While "like-this-in-what" queries allow sub-image level features to be reused, there still remains the specific question of exactly what features can be specified, and the general question of how expressive reformulated queries can be. As shown in Chapter 3, we have spent a lot of effort in designing the input language/module of Exquisi to make it expressive enough for many applications. Our goal then is to allow the reformulated queries to be as expressive as the input queries. This maximizes the expressiveness of the reformulated queries - but without requiring any additional query processing capabilities.

Figure 4.2: Query Reformulation Window
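As a toy illustration of this general form, the following sketch models a query as a set of named sub-image features (an assumption made purely for this example; it is not Exquisi's actual internal representation, and the feature names are invented):

```python
def reformulate(parts):
    """Build a new query from (image, deltas, betas) triples, following
    the general form (I_i - delta_i + beta_i) + (I_j - delta_j + beta_j) + ..."""
    query = set()
    for image, deltas, betas in parts:
        # drop the "don't want" parts, then add the new features
        query |= (image - deltas) | betas
    return query

I1 = {"sea", "sky", "circle_A"}          # features reused from image I1
I2 = {"sail_boat", "beach"}              # features from another image I2
new_query = reformulate([
    (I1, {"circle_A"}, {"surf_board"}),  # drop circle A, add a surf board
    (I2, {"beach"}, set()),              # reuse only the sail boat
])
print(sorted(new_query))  # → ['sail_boat', 'sea', 'sky', 'surf_board']
```

The point of the sketch is only the set algebra: each returned image contributes what remains after its δ parts are removed, plus any new β features.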
To achieve our goal, we make query reformulation transparent to query processing by means of cut-and-paste operations. In particular, sub-image level features can be selected and cut from multiple returned images, and can then be pasted into the same query composition workspace (cf. Figure 3.3) in the input module, where input queries are composed initially. Once the features are pasted, all the operations invokable from the workspace are available. For instance, if the color range of a pasted object needs to be defined or modified, the user can simply invoke the color range window and the color editor window. Similarly, a location box for a pasted object can be defined in exactly the same way as described in Chapter 3.

4.2.2 Facilities Supported by the Query Reformulation Module

Given a set of returned images displayed in the output window, the user can pick one that is of interest to be used to reformulate the query. By clicking the button "Reformulate" of Figure 4.1, the query reformulation window will pop up. The main function of the query reformulation module of Exquisi is to provide tools for the user to cut and paste objects or features from the returned images. As shown in Figure 4.2, the query reformulation window provides tools to select objects, colors and textures from the returned images. At the centre of the window is a full size image viewer that allows the thumbnail of a returned image to be zoomed, one image after another, if necessary. When the user wants to select an object from the image being shown in the viewer, the user clicks the button "Object". The user can then select an object by outlining the boundary or contour of the object. In the example shown in Figure 4.2, the surfing board is selected.

Figure 4.3: A Reformulated Query

After the object is pasted to the query composition workspace (cf. Figure 4.3), along with the object, all the contents (e.g., color ranges, etc.)
of the object are retained and used to initialize the object in the new query being reformulated. If needed, the user can modify the contents of the pasted object; in this case, only the shape of the object is retained. All the tools supported by the input module can be used. Optionally, the user may also want to retain some of the features of the pasted object or of the returned images for future use. Instead of having to re-define every feature, the user can use the color and texture selection tools provided in the reformulation window. By clicking the button "Color", the user can point to the chosen color in the image being shown in the viewer. Sometimes this may not be easy to do, particularly due to the high resolution of the image and the effect of neighboring colors. Zooming is provided to assist in the selection process. When the user finally clicks the button "Set", the selected color is stored in the color holder window (Figure 3.2a). As explained in Section 3.1, the user can later use the stored color to initialize the color range of objects in a reformulated or new query. Similarly, the user can pick a texture from the image being shown in the viewer, and store the texture in a texture holder. The user picks a texture by defining the boundary of the area in the image that contains the intended texture. The selected texture is compared with all the sample textures to find the closest match.

4.2.3 Examples of Reformulated Queries

Figure 4.3 shows an example of a reformulated query. As compared with the initial query shown in Figure 3.3, the reformulated query includes a surf board from one returned image and a sail boat from another returned image (cf. Figure 4.1). To illustrate the use of "don't care" objects in "like-this-in-what" queries, consider the following example.
Suppose what the user wants in the reformulated query are (a) the entire returned image I_1 except the circle A at the centre, and (b) a smaller object B from another image I_2 to replace A. To compose this query, the user first brings I_1 into the full size image viewer of the reformulation window. The user then picks the entire image as an object, and pastes it to the query composition workspace. To remove the circle A from the image, the user first creates a circle object A' on top of A by clicking the circle icon along the left margin of the query composition window. Then, as discussed in Section 3.3, the user can effectively remove A by declaring A' a "don't care" object. This completes part (a) above. Then to achieve part (b), the user brings I_2 into the full size image viewer of the reformulation window. The user picks B as the selected object, and pastes it to the query composition workspace on top of the "don't care" object A'.

4.3 Summary

In this chapter, we have described the output module and the query reformulation mechanism of Exquisi, which we call "like-this-in-what" reformulation. By allowing the user to perform query reformulation at a finer granularity, Exquisi provides a more expressive way of reformulation than the common "like-this" reformulation approach. In addition, we have designed the mechanism to be transparent with respect to the input module; therefore, the reformulation module can be as expressive as the input module. Finally, by incorporating cut-and-paste operations, no additional processing is needed to carry out the reformulation. While this chapter and the previous chapter describe the querying facilities offered by Exquisi, it is still not clear how these can be incorporated into the similarity matching process. In the next chapter, we will explain how the latter can be done.
Chapter 5
Semantics of Queries With Respect To Similarity Matching

In Chapter 3, we introduced the unique querying facilities of the input module of Exquisi. These facilities include color range, location box, color gradient and texture set. In Chapter 4, we also described the "like-this-in-what" query reformulation mechanism. So far the discussion has focused on why and how these facilities could be used. In this chapter, we aim to define more formally the meanings of these facilities with respect to similarity matching. This serves the purpose of defining precisely what the correct answers to Exquisi queries are.

5.1 Similarity Matching with Color Ranges

Given the nature of image queries, the notion of the correct answers to an image query Q is typically defined to be: (i) all the images I that are within a specified distance D from Q, i.e., {I | dist(Q, I) < D}, or (ii) the k images that have the shortest distance from Q. (See [WJ96] for a more detailed discussion.) While Exquisi uses the same notion of correct answers, it is important to spell out how the distance function dist() should be set up to capture the meanings of the unique facilities of Exquisi. As with most existing systems, such as QBIC, Exquisi utilizes color histograms to set up a distance function for color queries. More specifically, if Q and I denote the n-bin color histograms of the query and an image respectively, the distance between Q and I can be defined by:

dist_A(Q, I) = sqrt((Q − I)^t A (Q − I))    (5.1)

where the symbol t denotes the transpose of a vector, and A is a symmetric n × n similarity matrix whose entries A(i, j) give the distance in the color space between the colors C_i and C_j that represent the i-th and j-th bins of the histograms. There are several cases of color queries depending on the number of colors defined in a color range and on whether there are other color ranges defined for the same object or not (cf. Section 3.1).
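Equation 5.1 can be sketched in a few lines of NumPy. One practical subtlety not spelled out above: for the square root to be real, A must be positive semi-definite, and with raw distance entries on a zero diagonal the quadratic form can go negative; systems such as QBIC therefore populate A with similarity-style entries (e.g., 1 − d_ij/d_max). The toy matrix below follows that convention, and all values are illustrative:

```python
import numpy as np

def hist_distance(Q, I, A):
    """Quadratic-form histogram distance: sqrt((Q - I)^t A (Q - I))."""
    d = np.asarray(Q, dtype=float) - np.asarray(I, dtype=float)
    return float(np.sqrt(d @ A @ d))

# Toy 3-bin similarity matrix: entries 1 - d_ij/d_max, so bins 1 and 2
# (entry 0.8) are much closer in color space than bins 1 and 3 (entry 0.2).
A = np.array([[1.0, 0.8, 0.2],
              [0.8, 1.0, 0.0],
              [0.2, 0.0, 1.0]])
Q = [100.0, 0.0, 0.0]                     # query: 100% of bin 1
print(hist_distance(Q, [0, 100, 0], A))   # ≈ 63.25 (similar color)
print(hist_distance(Q, [0, 0, 100], A))   # ≈ 126.49 (dissimilar color)
```

As desired, shifting the query's mass to a perceptually similar bin yields a smaller distance than shifting it to a dissimilar one.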
Suppose all the colors defined in the color range cr can be represented by certain m (m ≤ n) bins of the histograms. The simplest case is when the color query is "100% of cr" and cr consists of only a single color C, that is, m = 1. The query histogram consists of entries described by the following equation:

Q_cr(i) = 100% if bin i corresponds to C, and 0 otherwise    (5.2)

Since the color range consists of only a single color, the query does not change the size of the original histogram. Therefore, Equation 5.1 can be readily applied. Equation 5.1, however, must be modified to handle color range queries when there is more than one color included in a color range. Let us first consider the simpler case where the color range query is "100% of cr." Recall from Section 3.1 that the intended meaning of cr is that, from the standpoint of the user, all colors in the range are indistinguishable. This implies that the m bins corresponding to cr must be collapsed into one bin. Without loss of generality, assume that the m bins are the first m bins. This can always be achieved by swapping the columns in the histogram vectors and the rows and columns of the similarity matrix. Thus, the query histogram is a vector of size (n − m + 1), whose entries are:

Q_cr(i) = 100% if i = 1, and 0 otherwise    (5.3)

The first entry gives the distribution of cr. Similarly, the original n-bin histogram I of an image object must be collapsed to give the (n − m + 1)-bin histogram, whose entries are:

I_cr(i) = I(1) + … + I(m) if i = 1, and I(i + m − 1) otherwise    (5.4)

In other words, the first m entries of I are summed together to give the first entry of I_cr, but all the remaining entries of I_cr have values identical to the corresponding entries of I.
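The bin-collapsing of Equations 5.3 and 5.4 amounts to summing the first m entries of the histogram (a small sketch; the histogram values below are invented):

```python
import numpy as np

def collapse_hist(I, m):
    """Collapse the first m bins of an n-bin histogram into one bin
    (Equation 5.4): the result has n - m + 1 bins."""
    I = np.asarray(I, dtype=float)
    return np.concatenate(([I[:m].sum()], I[m:]))

# A 4-bin image histogram whose first two bins belong to the color range cr.
I = [30.0, 20.0, 40.0, 10.0]
print(collapse_hist(I, 2))   # the two cr bins merge: [50. 40. 10.]
```

The same function applies unchanged to the query histogram of Equation 5.3, since its first m entries sum to the single cr entry.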
Not only are the histograms required to be changed, but the similarity matrix must also be adjusted to reflect that all the m bins corresponding to cr should be collapsed into one.

Figure 5.1: Changing the Similarity Matrix for a Color Range Query

As shown in Figure 5.1, the original similarity matrix A can be divided into four sub-matrices: (i) B1, which is m × m; (ii) B2, which is m × (n − m); (iii) B3, which is (n − m) × m; and (iv) B4, which is (n − m) × (n − m). First of all, each entry in B1 gives the original distance between any two of the m bins corresponding to cr. To capture the meaning that all colors in the color range are indistinguishable, the entire B1 is collapsed to a single entry of value zero. As shown in Figure 5.1, this is represented by the sub-matrix E1, which is 1 × 1, of the new similarity matrix A_cr. Sub-matrix B4 of A is not in any way related to the color range. Thus, all the distances recorded in B4 must be kept intact. This is represented by the sub-matrix E4 of A_cr, where E4 = B4. Here the least straightforward case corresponds to B3. (Given the symmetric nature of similarity matrices, B2 is the transpose of B3.) Each row in B3 gives m distances, each of which sets the distance between one of the m bins corresponding to cr and a particular bin among the (n − m) remaining ones. Since the m bins are being collapsed into one, the m distances must also be combined into one. While there are many possible ways to do so, we take the average value of the m distances. Thus, the sub-matrix E3 of A_cr has (n − m) rows, each of which consists of a single entry recording the average value of the appropriate m distances in A. Again, it is the case that E2 is the transpose of E3.
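The submatrix construction just described can be sketched as follows (the function name is mine; the 4-bin matrix is a small example with the first two bins forming the color range):

```python
import numpy as np

def collapse_matrix(A, m):
    """Collapse the first m bins of similarity matrix A into one:
    E1 = 0, E4 = B4 kept intact, E3 = row-averages of B3, E2 = E3^t."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    Acr = np.zeros((n - m + 1, n - m + 1))
    Acr[1:, 1:] = A[m:, m:]               # E4 = B4, kept intact
    Acr[1:, 0] = A[m:, :m].mean(axis=1)   # E3: average the m distances
    Acr[0, 1:] = Acr[1:, 0]               # E2 is the transpose of E3
    return Acr                            # Acr[0, 0] = 0 is E1

A = [[0, 1, 4, 2],
     [1, 0, 5, 3],
     [4, 5, 0, 6],
     [2, 3, 6, 0]]
print(collapse_matrix(A, 2))
```

On this 4-color example with cr covering the first two bins, the result is the 3 × 3 matrix with first row (0, 4.5, 2.5): for instance, 4.5 is the average of the distances 4 and 5 from bin 3 to the two cr bins.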
To summarize, the entries of the new similarity matrix corresponding to cr are:

A_cr(i, j) =
  0                                   if i = j = 1 (i.e., E1)
  A(i + m − 1, j + m − 1)             if i > 1 and j > 1 (i.e., E4)
  (1/m) Σ_{k=1}^{m} A(i + m − 1, k)   if i > 1 and j = 1 (i.e., E3)
  A_cr(j, i)                          otherwise (i.e., E2)    (5.5)

For example, consider a situation in which the color space is quantized into 4 colors, with the similarity matrix A:

    [ 0 1 4 2 ]
    [ 1 0 5 3 ]
    [ 4 5 0 6 ]
    [ 2 3 6 0 ]

Suppose that the color range cr corresponds to the first two bins. Then the new similarity matrix A_cr is:

    [ 0   4.5 2.5 ]
    [ 4.5 0   6   ]
    [ 2.5 6   0   ]

To complete our discussion here, for the color range cr, the distance between the query object and an image object is given by dist_{A_cr}(Q_cr, I_cr). So far, we have only considered the color range query of the form "100% of cr". As discussed in Chapter 4, Exquisi also allows multiple color ranges to be specified for an object. In this case, each color range cr_i corresponds to a certain percentage distribution q_i%. The simpler case of this is when each color range consists of only a single color and each color corresponds to a different bin. Given that color C_i is the color in a color range cr_i and there are r color ranges defined, the query histogram consists of entries which can be described by the following equation:

Q_{cr_1, …, cr_r}(i) = q_i% if bin i corresponds to C_i, and 0 otherwise    (5.6)

Again, similar to the first case, this query does not change the similarity matrix nor the histogram of the images in the database, since each color range consists of a single color. Thus, Equation 5.1 can be applied directly. The most general case is when the color range query specifies a certain distribution of multiple color ranges, in which each color range cr_i in the query corresponds to a distribution of q_i% and cr_i may correspond to more than one bin. In this case, appropriate adjustments need to be done to the similarity matrix A, the image histogram, and the query histogram. In particular, suppose that m_i is the number of bins corresponding to cr_i.
To reflect that all colors in cr_i are indistinguishable, the m_i bins must be collapsed into one bin, and this should be done for each color range defined. Thus, the size of the new histograms after adjustment is n − Σ_{k=1}^{r} m_k + r. The new similarity matrix A is computed in a similar way as in the case of a single color range with multiple colors. In this case, for each color range, the corresponding entries are collapsed into one bin. The "collapsing" method used in the case of a single color range with multiple colors can be applied iteratively (in arbitrary order) for each of the color ranges defined. For this to work, we need to assume that there is no overlap among color bins from different color ranges. In other words, if bins(cr_i) denotes the set of color bins associated with cr_i, then the assumption is that bins(cr_1) ∩ bins(cr_2) ∩ … ∩ bins(cr_r) = ∅.

5.2 Similarity Matching with Color Gradient and Texture Set Queries

The above framework for color range queries applies equally well to color gradient queries. Recall from Section 3.5 that a color gradient query of an object, Q_cg, can be viewed as consisting of u segments seg_1, …, seg_u of specific colors. As defined in Section 3.5, we assume that each segment corresponds to one color bin. Therefore, there are u = n + m bins defined for the color gradient. Thus, the distance between the color gradient query object and an image object can be defined as a sum of the distances for all segments, weighted with the widths of the segments.
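The weighted sum just described can be sketched as follows, reusing the quadratic-form distance of Equation 5.1 on a per-segment basis (the segment histograms, weights and matrix are illustrative; the matrix is assumed positive semi-definite so the square root is real):

```python
import numpy as np

def hist_distance(Q, I, A):
    """Quadratic-form histogram distance of Equation 5.1."""
    d = np.asarray(Q, dtype=float) - np.asarray(I, dtype=float)
    return float(np.sqrt(d @ A @ d))

def gradient_distance(q_segs, i_segs, weights, A):
    """Sum of per-segment histogram distances, each weighted by the
    fraction of the object's area the segment occupies."""
    return sum(w * hist_distance(q, i, A)
               for q, i, w in zip(q_segs, i_segs, weights))

A = np.eye(2)                    # toy 2-bin similarity matrix
q_segs = [[100, 0], [0, 100]]    # query gradient: two segments
i_segs = [[100, 0], [100, 0]]    # image matches segment 1 but not 2
print(gradient_distance(q_segs, i_segs, [0.5, 0.5], A))  # ≈ 70.71
```

Only the mismatching segment contributes, scaled by the half of the object's area that it covers.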
More specifically, if Q_{seg_j} and I_{seg_j} denote the color histograms of the j-th segment of the query object and an image object, and w_j is the percentage of the area of the image object occupied by the segment, the weighted distance for each segment seg_j is given by:

dist_{seg_j}(Q_{seg_j}, I_{seg_j}) = w_j · dist_A(Q_{seg_j}, I_{seg_j})    (5.7)

The distance, dist_cg(Q_cg, I_cg), between the gradient query object Q_cg and the image object I_cg is:

dist_cg(Q_cg, I_cg) = Σ_{j=1}^{u} dist_{seg_j}(Q_{seg_j}, I_{seg_j})    (5.8)

Strictly speaking, texture is not a property of an individual image pixel, but a property of a block of neighbouring pixels. However, a natural way to assign a texture value to a pixel p is to create a window of pixels (of the appropriate size) centered at p. The texture value of p, T_p, is then defined to be the texture value of the window. More specifically, T_p is the texture in the sample texture set (cf. Section 3.2) that has the smallest distance from the texture obtained by a k × k window centered at p. The size of the window may be determined as in [MJ92]. One advantage of this approach of defining texture on a per pixel basis is that it allows texture to be handled, and texture range queries to be defined, in exactly the same way as colors. In particular, a histogram can be used to record the statistical distribution of texture in an image (or image object), where each bin of the histogram records the percentage of pixels of a certain texture. Let n denote the size of the sample texture set and T_i a texture in the set. Let also Q_tex and I_tex denote the n-bin texture histograms of the query and an image respectively. Similarly, a texture similarity matrix A_tex can be set up, where A_tex is a symmetric n × n matrix whose entries A_tex(i, j) give the distance between textures T_i and T_j in the sample texture set. Thus, Equation (5.1) can be used to give a texture distance between a query object and an image object.
That is,

dist_{A_tex}(Q_tex, I_tex) = sqrt((Q_tex − I_tex)^t A_tex (Q_tex − I_tex))    (5.9)

5.3 Similarity Matching with Location Box Queries

Recall from Section 3.4 that a location box B of a query object defines an area of indistinguishable locations of the object. In other words, if there is an image object that is of the same shape as the query object, the location distance between the query object and the image object is set to zero if the image object is located in box B. To be more specific, there are at least two possibilities in defining what "located in box B" actually means. The first one is "completely contained in B" and the other one is "partially contained in B" or "intersecting B." We decide to choose the latter meaning for the reason outlined below. Consider an image object that is completely outside the box B. The question is what the location distance between the image object and B should be. While again there are many possible answers to this question, a natural answer is given by the separation distance between two objects. A concept commonly used in computational geometry [PS85], the separation distance between two objects is the minimum distance between the boundaries of the two objects. This is our choice for defining the location distance between an image object and box B when they do not overlap. One may choose to define a location box in a strict "cut-off" sense, namely that if the object falls out of the box, the location distance is infinity. The problem with this strict approach is that if there is an image I containing an object that is outside the box but is otherwise a perfect match to the query, I is not considered a match. In contrast, our semantics relieves the pressure on the user by allowing the location distance to change gradually. The fact that I contains a perfect match object may more than make up for the fact that the object is not in the box, and I may still be ranked as a good match.

Recall from Section 3.4 that a location box is defined on a per object basis.
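For axis-aligned rectangles — the location box, and, as a simplifying assumption for this sketch, the image object's bounding rectangle — the separation distance reduces to a few lines, with a zero result covering the "intersecting B" case:

```python
import math

def separation_distance(box, obj):
    """Minimum distance between two axis-aligned rectangles, each given
    as (xmin, ymin, xmax, ymax); 0 if they intersect."""
    dx = max(box[0] - obj[2], obj[0] - box[2], 0)  # horizontal gap, if any
    dy = max(box[1] - obj[3], obj[1] - box[3], 0)  # vertical gap, if any
    return math.hypot(dx, dy)

print(separation_distance((0, 0, 2, 2), (1, 1, 3, 3)))  # overlapping: 0.0
print(separation_distance((0, 0, 2, 2), (5, 6, 7, 8)))  # disjoint: 5.0
```

In the disjoint case the gaps are 3 horizontally and 4 vertically, giving a separation of 5; arbitrary object contours would need a general minimum-boundary-distance computation instead.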
As a side-effect, location boxes can be used to enforce certain spatial relationships between objects. However, as shown in Section 3.4, there are also spatial relationships that cannot be maintained by location boxes. One way to overcome this problem is to allow a location box to be defined for multiple objects. But if this were allowed in Exquisi, then apart from the location distance, an additional distance function would have to be defined on the spatial relationship itself. For instance, suppose in the query the spatial relationship between objects A and B is described as: (i) A is 45 degrees North East of B, and (ii) the separation distance between A and B is d. Then if there is an image where (i) A is 75 degrees North East of B, and (ii) the separation distance is e, the immediate question is how well the image matches the query. If there are only two objects, this question would still be quite tractable. But when there are n objects in one location box, there would be up to n^2 pairwise separation distances and n^2 relative orientation constraints. Defining a distance function for this most general case would not be simple. Thus, in the current version of Exquisi, a location box can only contain one object. Dealing with the general case is an item for future work.

5.4 Incorporating User Preferences

So far in this chapter, we have introduced the distance functions that capture the meanings of the unique facilities in the input module of Exquisi, namely color range, color gradient, texture set and location box queries. These distance functions can be combined to compute the distance between a query object and an image object. In particular, the distance between the two objects is a weighted, normalized sum of the individual distances. And if there are multiple query objects, the total distance between a query and an image is the sum of all distances between corresponding query and image objects.
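The weighted, normalized sum just described might look like the following sketch (the feature names and weight values are invented for illustration; weights are normalized so they sum to one):

```python
def combined_distance(feature_dists, weights):
    """Weighted, normalized sum of per-feature distances.
    A higher weight means the user cares more about matching that feature."""
    total = sum(weights[f] for f in feature_dists)
    return sum(weights[f] * d for f, d in feature_dists.items()) / total

# The user trusts the color-range specification more than the location box.
dists = {"color_range": 0.2, "location_box": 0.8}
weights = {"color_range": 3.0, "location_box": 1.0}
print(combined_distance(dists, weights))  # (3*0.2 + 1*0.8) / 4 = 0.35
```

With equal weights the same distances would average to 0.5; the preference for the color range pulls the combined distance down toward the better-matching feature.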
While the weighted sum is a rather standard approach used by many existing systems, such as [H93+, RS92b], it is not clear what the values of the weights should be. In fact, we do not believe that there should be only one set of weights applicable to all queries. We believe that the weights should be adjustable by the user. This allows the user to express varying degrees of preference/confidence in the querying conditions. For instance, while the user specifies a location box and a color range for a particular querying object, the user may prefer images that match the color range more than the location box, possibly because the user feels his/her specification of the color range is more accurate. In this case, to reflect the user's choice, the weight associated with the color range should be higher than that with the location box. Similarly, if the user prefers images that match object A more than object B, the weights associated with the querying conditions defined for A should be higher than those with the conditions for B. In the current version of Exquisi, the input module, as described in the previous section, has not yet been augmented to allow user preferences to be specified. This is largely because we have not fully worked out all issues related to the user interface. But the point here is that once the interface allows preferences to be attached to individual querying conditions, the framework presented in this chapter can easily be extended to incorporate user preferences.

5.5 Summary

In this chapter, we have discussed the semantics of Exquisi's query facilities with respect to similarity matching. We have shown how the range specification concept can be incorporated into the similarity matching function, and how it can be applied to different query features such as color, texture, and location (i.e., color range, texture set, and location box). We have also discussed some of the issues that are raised and how they can be handled.
As discussed in Chapter 4, one key feature of the "like-this-in-what" reformulation mechanism is its transparency with respect to the input module. With respect to similarity matching, this means that the mechanism does not introduce new query processing needs.

Chapter 6
Implementation of Exquisi

A prototype of Exquisi has been implemented using C++ and the X Window/Motif GUI toolkits, and is currently running on a Silicon Graphics workstation. Among the different modules implemented, the user interface and graphics routines make up the major part of the implementation, comprising about 70% of the coding. To avoid reinventing the wheel, we also utilize several software libraries that are available and useful for our purpose. This chapter summarizes the major components of the implementation.

6.1 Implementation Goals

Content-based image querying depends on the use of various computer vision methods to extract features from images. As research in that discipline progresses, new methods may emerge that correct or enhance existing methods. It is also possible that methods will be created to extract image features whose extraction is currently infeasible. Therefore, one emphasis of the implementation of Exquisi is that it should
Another emphasis is on the choice of computational models for the image features supported. As discussed in Chapter 1, user perception plays an important role in the querying process. A query interface, therefore, should be able to capture the user's perception in a query. It is important that the system have the same understanding of the query as the user intends. This translates to the use of computational models of image features that conform with the notion of human perception. Although many models have been proposed, it is not yet clear how human perception can be computationally represented. Given the available models, it may be necessary to pick the one that conforms most closely with perception.

6.2 Object Representation and Graphics Functions

As described in Chapter 3, query specification in Exquisi is based on some notion of object. That is, a query in Exquisi is composed of a collection of geometric objects, with some content features defined for each object, spatially arranged in the query image. Ipe's object data structure serves as an appropriate representation of Exquisi's objects. Ipe, developed by Schwarzkopf [Sch94, Sch95], is a drawing application intended for generating drawings for computational geometry. As such, the data structure of Ipe is designed to support operations pertaining to computational geometry. For this purpose, it utilizes the Plageo planar geometry library, developed by Giezeman [Gie94], which provides a data structure for defining more primitive geometric objects and various geometric operations on these objects. Exquisi currently needs only a subset of these capabilities, but the computational geometry features may be useful for future development of Exquisi. One key feature of Ipe is a mechanism that allows user-defined functions or macros to operate on objects.
This feature supports extensibility in Exquisi.

6.3 Image Display and Processing

Another aspect of the Exquisi implementation is the need for image-based operations, which include operations to display and manipulate images. This is particularly needed by the output and reformulation modules (cf. Figure 4.1 and Figure 4.2). The Vista library is the main tool used for image-related operations in Exquisi. Developed by Pope and Lowe [PL94], Vista is a software library primarily intended for computational vision tasks. It provides functions and data formats for image processing, manipulation, and other vision tasks. Another feature of Vista that makes it a preferable choice for Exquisi is its extensibility. Like Ipe, it also provides a mechanism to allow additions of new image processing algorithms.

6.4 Color

6.4.1 Color Space

We use the CIE LUV color space to represent colors in Exquisi. The CIE LUV color space is based on another color space known as CIE XYZ [F90+, Gla95]. As explained in more detail in Appendix A, CIE LUV has the desirable property over CIE XYZ that it is perceptually uniform. This means that two pairs of colors that are equally distant in CIE LUV space are perceived as equally distant by the user. For Exquisi, this is important to ensure the correctness of the similarity matching of colors with respect to the perception of the user.

With reference to a color space, the capability of hardware devices in displaying colors is typically limited to a certain subset of the color space. For example, as shown in Figure 6.1, a typical CRT color monitor corresponds to a certain color gamut in the CIE XYZ color space. To allow a convenient specification of colors within the color gamut, a color model is used. Monitor devices typically support the RGB color model. However, this model is not easy to use because it does not relate directly to the intuitive notions of hue, saturation, and brightness.
It is difficult to find a particular color, and once located, it is difficult to adjust that color. An example is the task of finding a blue and then making a lighter shade of that blue. In terms of providing expressiveness in the query interface, such difficulties may not be acceptable. It is necessary to provide the user with an easy-to-use and intuitive way of specifying color.

Figure 6.1: CRT color gamut

We use the HSV color model for defining colors [F90+]. It has the desirable property of being based on the intuitive appeal of the artist's tint, shade, and tone. With this model, a color can be defined using the three dimensions hue, saturation, and brightness (cf. Figure 3.1). To provide a coherent interpretation from both the user's and the system's point of view, a color defined using this model needs to be transformed into its representation in the CIE LUV color space and vice versa. This involves three steps. First, the HSV color is converted to its RGB equivalent. Then the RGB version of the color is used to compute its representation in the CIE XYZ space. Finally, the corresponding CIE LUV representation is computed (see Appendix A for the CIE XYZ - CIE LUV conversion). The opposite conversion, from CIE LUV to HSV, is also necessary. More detailed conversion procedures are provided in Appendix A.

6.5 Texture

6.5.1 Texture Model

As discussed before, two criteria are of interest for supporting texture querying. First is the need to use a computational model that corresponds to human perception of similarity. Second is the need for a sufficiently expressive way of specifying texture in a query. Over the last decade, about twenty basic texture models have been presented in the literature. However, their performance is data-dependent [PM94].
Although some of the models have been shown to achieve around a 90% success rate on texture classification, they are usually tested on small and limited data sets with homogeneous textures. Since natural scenes typically contain non-homogeneous textured regions, it is not currently feasible to rely on a particular texture model to support a wide range of textures. In addition, few conclusions exist on the relative strengths of texture models for different classes of data. While this issue is not the focus of this research, it is sufficient for us to use one of the available models. In terms of coding, as with the implementation of other image features, it is necessary to design the system to be flexible enough to incorporate other, more perceptual texture model(s).

Based on the availability of source code and algorithms, the current implementation utilizes the multiscale version of the Simultaneous Autoregressive model, referred to as the Multiscale Simultaneous Autoregressive (MSAR) model [MJ92]. The MSAR model computes the parameters of a second-order simultaneous autoregressive model, an instance of a Markov random field model, over three scales, for a total of 15 features for the texture. The Mahalanobis distance is then applied to these features to compute the similarity between two textures. This model has been shown to work better than eigenvectors, which are commonly used for a variety of pattern recognition problems [PM94], for retrieval of images in the standard Brodatz texture database [Bro76].

As mentioned in Chapter 5, texture is defined as a property of a pixel. Given a pixel p and a window centered at p, the MSAR features of the window are computed and then compared with the texture samples. The closest texture sample is assigned to be the texture of the window, which is then regarded as the texture of pixel p.
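The Mahalanobis distance used to compare feature vectors can be sketched as follows; the three-dimensional vectors and the identity inverse covariance below are toy values (the actual model uses 15 MSAR features and a covariance estimated from the texture data).

```python
def mahalanobis(x, y, cov_inv):
    """sqrt((x - y)^T * C_inv * (x - y)) for a given inverse covariance."""
    d = [a - b for a, b in zip(x, y)]
    # Quadratic form (x - y)^T C_inv (x - y), written with explicit loops.
    s = sum(d[i] * cov_inv[i][j] * d[j]
            for i in range(len(d)) for j in range(len(d)))
    return s ** 0.5

# With the identity as the inverse covariance, this reduces to the
# Euclidean distance between the two feature vectors:
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(mahalanobis([1, 2, 2], [1, 0, 0], identity))  # sqrt(0 + 4 + 4)
```

A non-identity inverse covariance rescales the feature axes, so features that vary widely across the sample set count for less than tightly clustered ones.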
It is then possible to construct the texture histogram for the image or a region in the image. The choice of window size is empirically determined. The optimum window size and an analysis of the effectiveness of the texture histogram will be investigated in future research.

6.5.2 Interface for texture querying

As mentioned in Chapter 3, there is not yet a texture metric that would allow a convenient texture specification by means of some intuitive texture parameters, as in the case of color. For this reason, a texture retrieval system typically uses sample textures for specifying a texture. As mentioned, Exquisi extends this mechanism by organizing the sample textures into clusters of similar textures. In addition to providing a more expressive way of browsing textures than simple browsing, this also gives the user some information on how texture similarity is perceived by the system.

Currently we use 400 texture samples taken from the Vision Texture Database from the MIT Media Lab, which is publicly available on the internet. The CLARANS clustering algorithm [NH94] is applied to these texture samples to group them into 20 clusters. The representative texture of each cluster is its medoid, defined to be the most centrally located object of the cluster. The Mahalanobis distance is used for discrimination.

6.6 Query Representation

A query is represented internally in the Exquisi format, an adaptation of the format used by Ipe. It provides the interface with the query processing engine and can also be used to store a query.
The structure of the Exquisi format is as follows:

    TopLevelQuery {
        <Global_Query_Attribute_List>
        <Object 1>
        <Object 2>
        ...
        <Object n>
    }

where an object is a two-dimensional geometric shape and is defined with the following attributes:

    <Object_Type> {
        <Drawing_Attribute_List>
        <Vertices_List>
        <Image_Feature_Flags>
        <Color_Range_Attributes>
        <Texture_Attributes>
        <Location_Box_Attributes>
    }

This format is based on a simple two-level tree structure. The root corresponds to the top-level node, in which global query attributes are stored. The children of this node are the definitions of the query objects with their attributes, which include graphical attributes, query features, and the location box of the object. This format can be easily extended to allow new query features and objects to be included. Figure 6.2 shows a sample query representation of a polygon with a color range feature and a location box defined.

Figure 6.2: Example of Exquisi's Query Representation

Chapter 7

Conclusions

The growing importance of image data in many applications has created the need for database management systems to support image data. Since image data is fundamentally different from traditional alphanumeric data, it is not sufficient to rely only on existing alphanumeric-based techniques.
In fact, the realization of an image database system requires not only database techniques, but also synergies of techniques from other disciplines.

As a result of research efforts in supporting image databases, many image database systems have been proposed. Our studies of the query language and interface support of these systems, particularly systems that support content-based image querying, have identified several common important deficiencies or inadequacies. First, existing systems lack the querying support necessary for users to adequately define a query. In addition, the querying facilities of these systems do not accommodate the imprecision and ambiguities that are inherent in a user's definition of the various image features. Second, most systems also lack support for query reformulation. We have argued that query reformulation is a requisite of the query language since the querying process is based on the notion of similarity.

In addressing the above issues, we have proposed several key ideas that should be accommodated by a query language or interface for an image database system. First, to better capture user imprecision and ambiguities in specifying a query, we introduce the concept of range specification, which allows image features, such as color and texture, to be specified in ranges. This gives the user a way of expressing his/her ambiguities in defining an image feature. For example, the user may define a color range where all colors in the range are perceived to be similar. As such, from the query processing's point of view, all values in a range are treated identically. We have shown how the range specification concept can be incorporated into the similarity matching functions. Furthermore, we have also shown how it can be applied to different image features, in this case color, texture, and location, by formally defining the semantics of these features.
Second, we have proposed the "like-this-in-what" query reformulation mechanism. One of its key features is that it allows a finer granularity of query reformulation than the "like-this" mechanism currently utilized by most existing systems that support query reformulation. In particular, "like-this-in-what" query reformulation allows specific aspects of a query to be included or excluded. The user can, for example, pick a specific image feature, such as a shape or color, from a returned image to be included in the reformulated query. We have shown how the "like-this-in-what" reformulation can be integrated, by means of a cut-and-paste operation, into the query language without adding extra query processing needs. Furthermore, by allowing the query facilities from the input module to be utilized in the query reformulation process, the expressiveness of the reformulation can be maximized.

Finally, we have designed and implemented a prototype query language/interface, called Exquisi, that incorporates the above ideas. The querying facilities currently supported by Exquisi include color range, texture set, location box, and color gradient. It also supports the "like-this-in-what" reformulation of a query. Currently, it is implemented on the UNIX operating system in an SGI environment.

7.1 Future Work

7.1.1 Interface/language issues

We have shown that the range specification and "like-this-in-what" reformulation mechanisms are able to accommodate the need for a more expressive query language than those provided by existing image database systems. Using the Exquisi query interface/language, we have argued how these concepts can accommodate ambiguities in specifying a query and capture subtle differences among similar images. However, since our work involves some subjective measures, it is necessary to verify our claims in a more objective manner.
For this, experiments need to be carried out with human subjects to evaluate the effectiveness of the various querying facilities of Exquisi and to statistically compare Exquisi with other systems.

Our work in this thesis has been focused on the database issue, i.e., providing an expressive query language for a content-based image querying system. However, as discussed in Chapter 1 and shown in the discussions on Exquisi, the realization of an IDBMS involves work not only from the database field, but also from other research fields such as vision, image understanding, and knowledge-based systems. This is the main reason for emphasizing extensibility in the implementation of Exquisi. Work is still needed to investigate other image features that can be incorporated into the language and the use of image feature models that conform with perception. As a generic query language, it will also be necessary to investigate the applicability of the language to different application domains. It is also useful to see how to incorporate domain-specific capabilities into the language depending on the applications using the system. Finally, work needs to be done to integrate Exquisi with other elements of the IDBMS, such as the query processing engine and the indexing module, to create a fully working system.

7.1.2 Query Processing and Optimization Challenges

Exquisi is designed to satisfy what we believe the user wants. This approach is different from that of many existing systems, which tend to put more emphasis on what can currently be done in query processing. Here we handle the problem by first defining a target language that is needed, and then working on how to process queries expressed in the language. Given the expressive power of Exquisi, providing efficient processing for some of the Exquisi querying facilities is rather challenging. In the following, we discuss some of these challenges. Let us begin with color range queries.
As discussed in Chapter 5, color histograms are often used to provide color indexing for images [F94+, SS94, SB91]. To avoid computing Equation (5.1) at run-time, which involves matrix multiplication with the color similarity matrix A, preprocessing of a certain kind is often done. In the presence of a query involving a color range cr, the similarity matrix A must be reset to the matrix Acr as defined in Equation (5.5). Since the color range cr is specified by the user only at the point of querying, Acr can only be reset dynamically at run-time, which seems to lead to the conclusion that preprocessing is not possible. Fortunately, the latter conclusion is not completely true. While the current way of preprocessing is rendered infeasible in the presence of a color range, we believe that there are other ways of preprocessing that could accommodate color ranges. In particular, in ongoing work, we are investigating how preprocessed color histograms based on A can be used as a filter for the processing based on Acr. The key idea is that run-time matrix computation is acceptable, so long as it is restricted to a very small subset of images. As discussed in Section 5.2, whatever optimization techniques are developed for color range queries would be of immediate importance to the optimization of color gradient and texture set queries.

A naive evaluation of a location box query presents two major sources of inefficiency. First, recall from Section 5.3 that separation distances are used to define distance functions for location box queries. For arbitrary shapes, such as non-convex ones, finding the separation distance sd can be computationally expensive, particularly if there are numerous such distances to be computed. One way to speed up the processing is to under-estimate the true separation distance sd of two shapes with the separation distance sdB of the two bounding boxes of the shapes.
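A minimal sketch of this bounding-box under-estimate, assuming axis-aligned boxes stored as (x1, y1, x2, y2) tuples; the representation and function name are ours, not part of Exquisi.

```python
def box_separation(b1, b2):
    """Separation distance of two axis-aligned boxes (x1, y1, x2, y2);
    0 if they overlap. This never exceeds the separation distance of the
    shapes inside the boxes, so it is a safe filter for pruning."""
    dx = max(b1[0] - b2[2], b2[0] - b1[2], 0)  # horizontal gap, 0 if overlapping
    dy = max(b1[1] - b2[3], b2[1] - b1[3], 0)  # vertical gap, 0 if overlapping
    return (dx * dx + dy * dy) ** 0.5

# Overlapping boxes have separation 0; disjoint ones a positive distance.
print(box_separation((0, 0, 2, 2), (1, 1, 3, 3)))  # 0.0
print(box_separation((0, 0, 1, 1), (4, 5, 6, 7)))  # sqrt(3^2 + 4^2) = 5.0
```

Only images whose box separation is small enough need the expensive exact computation on the shapes themselves.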
Finding the separation distance between two bounding boxes takes only constant time. If extreme accuracy is required, the latter processing can be followed by the computation of the true separation distance - but this time restricted to those images whose sdB distances are sufficiently small.

The second source of inefficiency in evaluating a location box query in a straightforward manner stems from the fact that an extensive search must be made to find the object in the image that best matches the object within the location box. To some extent, this is one form of object segmentation. Fortunately, in our case, objects are syntactic, not semantic, in nature. Thus, there exists a naive evaluation strategy that tries all possible positions in the image to identify the best matching object. Actually, the matching problem we are dealing with is reminiscent of string matching. Taking inspiration from well-known string matching algorithms, we seek to develop techniques that, based on the comparison at the current position, can skip multiple positions that are deemed impossible to produce better results and can jump to the next potential position. In this way, the search space would be pruned and the execution would be optimized. Another possible optimization worth studying is how to perform the matching in a multiresolution manner. This would take on a different flavor than the multiresolution image querying algorithm developed in [JFS95]. However, the use of wavelets may prove to be equally applicable to our problem at hand.

Bibliography

[AKS96] Yigal Arens, Craig A. Knoblock, and Wei-Min Shen. "Query Reformulation for Dynamic Information Integration," in Journal of Intelligent Information Systems, pp. 1-38, 1996.

[Bad90] Alan D. Baddeley. Human Memory: Theory and Practice. Lawrence Erlbaum Associates, 1990.

[BPJ92] J. Bach, S. Paul, and R. Jain.
" A Visua l Information Management System for the Interactive Retrieval of Faces," in IEEE Transactions on Knowledge and Data Engineering, V o l . 5, No . 4, pp. 619-628, 1992. [Bro76] P h i l Broda tz . Land, Sea, and Sky: A Photographic Album for Artists and Designers. Dover Publicat ions, New York , 1976. [EN89] Ramez Elmasr i and Shamkant B . Navathe. Fundamentals of Database Systems. Ben jamin /Cummings Publ ishing, 1989. [F94+] C . Faloutsos, W . Equi tz , M . Fl ickner , W . Niblack, D . Petkovic, and R . Barber . "Efficient and Effective Querying by Image Content ," in Journal of Intelligent Information Systems, 3, 3/4, pp. 231-262, 1994. [F90+] James D . Foley, Andries van D a m , Steven K . Feiner, John F . Hughes. Computer Graphics: Principles and Practise 2nd Edition. Add i son-Wesley Publ ishing, pp. 563-603, 1990. [F95+] M . Fl ickner , H . Sawhney, W . Niblack, J . Ashley, Q . Huang, B . D o m , M . Gorkani , J . Hafner, D . Lee, D . Petkovic, D . Steele, and P. Yanker . "Query by Image and Video Content: The Q B I C System," in IEEE Computer, V o l . 28, N o . 9, 1995. [FN96a] D . S. Faulus and Raymond T . N g . " E X Q U I S I : A n Expressive Query Interface for Similar Images," in Proceedings of SPIE: Storage and 81 Retrieval for Still Image and Video Databases IV, V o l . 2670, pp. 215-226,1996. [FN96] D . S. Faulus and Raymond T . N g . An Expressive Language and Inter-face for Image Querying. Technical Report No . ??, U B C , 1996. [G94+] Y i h o n g Gong , Hongjiang Zhang, H . C . Chuan , and M . Sakauchi. " A n Image Database System wi th Content Cap tur ing and Fast Image In-dexing Abi l i t i es , " in Proceedings: IEEE International Conference on Multimedia, pp. 121-130, 1994. [GC93] G . Golovchinsky and M . H . Chignel l . "Queries-R-Links: Graphica l M a r k u p for Text Naviga t ion" , in Proceedings of INTERCHIPS, pp. 454-460, 1993 [Gie94] Geert-Jan Giezeman. PlaGeo: A Library for Planar Geometry. 1994. [Gla95] Andrew S. Glassner. 
Principles of Digital Image Synthesis, Vol. 1. Morgan Kaufmann Publishers, pp. 5-112, 1995.

[GM90] W. Grosky and R. Mehrotra. "Index-Based Object Recognition in Pictorial Data Management," in Computer Vision, Graphics, and Image Processing, Vol. 52, No. 3, pp. 416-436, 1990.

[GM92] William I. Grosky and Rajiv Mehrotra. "Image Database Management," in Advances in Computers, Vol. 35, pp. 237-291, 1992.

[GR95] Venkat N. Gudivada and Vijay V. Raghavan. "Content-Based Image Retrieval Systems," in IEEE Computer, pp. 18-22, 1995.

[Gry95] Robert S. Gray. Content-Based Image Retrieval: Color and Edges. Technical Report PCS-TR95-252, Dartmouth College, 1995.

[GWJ92] A. Gupta, T. Weymouth, and R. Jain. "Semantic Queries in Image Database," in Visual Database Systems II, E. Knuth and L. M. Wegner (Eds.), Elsevier Science Publishers, pp. 201-215, 1992.

[H93+] Kyoji Hirata, Yoshinori Hara, Naoki Shibata, and Fusako Hirabayashi. "Media-based Navigation for Hypermedia Systems," in Proceedings: ACM Hypertext '93, 1993.

[HB95] David J. Heeger and James R. Bergen. "Pyramid-Based Texture Analysis/Synthesis," in Proceedings: ACM SIGGRAPH '95, pp. 229-234, 1995.

[HK92] Kyoji Hirata and Toshikazu Kato. "Query by Visual Example," in Proceedings: International Conference on Extending Database Technology (EDBT '92), pp. 56-71, 1992.

[HSD73] R. M. Haralick, K. Shanmugam, and I. Dinstein. "Textural Features for Image Classification," in IEEE Transactions on Systems, Man, and Cybernetics, Vol. SMC-3, No. 6, pp. 610-621, 1973.

[Hur81] Leo M. Hurvich. Color Vision. Sinauer Associates, 1981.

[Jai93] Ramesh Jain. "Workshop Report: NSF Workshop on Visual Information Management Systems," in SIGMOD Record, Vol. 22, No. 3, pp. 57-75, 1993.

[JFS95] Charles E. Jacobs, Adam Finkelstein, and David H. Salesin.
"Fast M u l -tiresolution Image Querying," in Proceedings: ACM SIGGRAPH'95, 1995. [Kato92] Toshikazu K a t o . "Database Archi tecture for Content-Based Image Re-tr ieval," in Proceedings of SPIE: Image Storage and Retrieval Systems, vol . 1662, , pp. 112-123, 1992. [KK93] Takio K u r i t a and Toshikazu K a t o . "Learning of Personal V i sua l Im-pression for Image Database Systems," in Proceedings: International Conference of Document Analysis, 1993. [LB94+] Denis Lee, Ron Barber , Wayne Niblack, M y r o n Fl ickner , J i m Hafner, and Dragut in Petkovic. Complex queries for a query-by-content image database. I B M Research Report R J 9919 (87299), 1994. [LHQ91] Sang-goo Lee, Lawrence J . Henschen, and Ghassan Z . Qadah. "Seman-tic Query Reformulation in Deductive Databases", Proceedings IEEE: 7th International Conference on Data Engineering, pp. 232-239, 1991. [LF94+] Denis Lee, M y r o n Fl ickner , R o n Barber , J i m Hafner, Wayne Niblack, Dragut in Petkovic. "Query B y Image Content Using Mul t ip l e Objects and Mul t ip l e Features: User Interface Issues", in Proceedings: Inter-national Conference of Image Processing (ICIP-94), Vol .2 , pp. 76-80, 1994. [LP95] F . L i u and R . W . P ica rd . Periodicity, Directionality, and Randomness: Wold Features for Image Modeling and Retrieval. Technical Report T R # 3 2 0 , M I T M e d i a Lab , 1995. 83 [LS90] Sang-goo Lee and Donghoon Shin. "Semantic and Structura l Query Reformulation for Efficient Manipu la t ion of Very Large Knowledge Bases", Proceedings: Computer Software and Application Conference (COMPSAC), pp. 369-374, 1990. [MJ92] Jianchang M a o and A n i l K . Ja in . "Texture Classification and Segmen-tat ion Using Mul t i resolut ion Simultaneous Autoregressive Mode l s" , in Pattern Recognition, V o l . 25, pp. 173-188, 1992. [Min96] Thomas M i n k a . An Image Database Browser that Learns From User Interaction, M . E n g Thesis, M I T , 1996 [MY88] M . M i y a h a r a and Y . Yoshida . 
"Mathemat ica l Transform of ( R , G , B ) Color D a t a to Munse l l ( H , V , C ) Color Da t a , " in Proceedings of SPIE: Visual Communications and Image Processing, V o l . 1001, pp. 650-657, 1988. [N93+] W . Niblack, R . Barber , W . Equi tz , M . Fl ickner , E . Glasman , D . Petkovic, P . Yanker, C . Faloutsos, G . Taubin . "The Q B I C Project : Querying Images by Content Using Color , Texture, and Shape," in Proceedings of SPIE: Storage and Retrieval for Image and Video Databases, V o l . 1908, pp. 173-187, 1993. [NH94] Raymond T . N g and Jiawei Han . "Efficient and Effective Cluster ing Methods for Spatial D a t a M i n i n g , " in Proceedings: The 20th Interna-tional Conference on Very Large Data Bases, pp. 144-155, 1994. [NS92] Ca ro l L . Novak and Steven A . Shafer. "Ana tomy of A Color His-togram," in Proceedings of IEEE: Computer Vision and Pattern Recog-nition, pp. 599-605, 1992. [OS95] V i r g i n i a E . Ogle and Michae l Stonebraker. "Chabot : Retrieval from a Relat ional Databaase of Images," in IEEE Computer, V o l . 28, N o . 9, pp. 40-48, 1995. [PL94] A . Pope and D . Lowe. " V i s t a : A Software Environment for Computer Vis ion Research," in Proceedings 1994 IEEE Computer Society Con-ference, Seattle, pp. 768-772, 1994. [PM94] R . W . P icard and T . P. M i n k a . "Vis ion Texture for Anno ta t ion , " in Journal of Multimedia Systems, A C M / S p r i n g e r - V e r l a g , V o l . 3, pp. 3-14, 1995. 84 [PPS94] A . Pent land, R . W . P ica rd , S. Sclaroff. "Photobook: Tools for Content-Based Manipu la t ion of Image Databases," in Proceeding of SPIE: Stor-age and Retrieval for Image and Video Databases II, V o l . 2185, pp. 34-47, 1994. [PS85] F . Prepara ta and M . Shamo. Computational Geometry, Springer-Verlag, New York , 1985. [RS92a] F . Rab i t t i and P . Savino. "Query Processing on Image Database," in Visual Database Systems II, E . K n u t h and L . M . Wegner (Eds.) , Elsevier Science Publishers, pp. 169-183, 1992. 
[RS92b] F. Rabitti and P. Savino. "An Information Retrieval Approach for Image Databases," in Proceedings: The 18th International Conference on Very Large Data Bases, pp. 574-584, 1992.

[SB91] Michael J. Swain and Dana H. Ballard. "Color Indexing," in International Journal of Computer Vision, Vol. 7, No. 1, pp. 11-32, 1991.

[SC96] John R. Smith and Shih-Fu Chang. "Tools and Techniques for Color Image Retrieval," in Proceedings of SPIE: Storage and Retrieval for Still Image and Video Databases IV, Vol. 2670, pp. 426-437, 1996.

[Sch94] Otfried Schwarzkopf. A Manual of Ipe. 1994.

[Sch95] Otfried Schwarzkopf. "The Extensible Drawing Editor Ipe," in Proceedings of ACM: 11th Annual Symposium on Computational Geometry, 1995.

[SD96] Markus Stricker and Alexander Dimai. "Color Indexing with Weak Spatial Constraints," in Proceedings of SPIE: Storage and Retrieval for Still Image and Video Databases IV, Vol. 2670, pp. 29-40, 1996.

[Sed95] Andishe S. Sedighian. A Comparative Analysis of Multi-Dimensional Indexing Structures for "Eigenimages". M.Sc. Thesis, UBC, 1995.

[SHK93] R. Samadani, C. Han, and C. Katragadda. "Content Based Event Selection from Satellite Images of the Aurora," in Proceedings of SPIE: Storage and Retrieval for Image and Video Databases, Vol. 1908, pp. 50-59, 1993.

[Sri95] Rohini K. Srihari. "Automatic Indexing and Content-Based Retrieval of Captioned Images," in IEEE Computer, Vol. 28, No. 9, 1995.

[SS94] Markus Stricker and Michael Swain. "The Capacity of Color Histogram Indexing," in Proceedings of IEEE: Computer Vision and Pattern Recognition, pp. 704-708, 1994.

[SSU94] Hiroaki Sakamoto, Hideharu Suzuki, and Akira Uemori. "Flexible Montage Retrieval for Image Data," in Proceedings of SPIE: Storage and Retrieval for Image and Video Databases II, Vol. 2185, pp. 25-33, 1994.

[ST92] M. Schneider and C. Trepied.
"Extensions for the Graphica l Query Language C A N D I D , " in Visual Database Systems II, E . K n u t h and L . M . Wegner (Eds.) , Elsevier Science Publishers, pp. 185-199, 1992. [ T M Y 7 8 ] H . Tamura , S. M o r i , and T . Yamawak i . "Textural Features Corre-sponding to V i sua l Perception," in IEEE Transactions on Systems, Man, and Cybernetics, V o l . S M C - 8 , N o . 6, pp. 460-473, 1978. [WJ96] D . W h i t e and R . Ja in . "Similar i ty Indexing wi th the SS-Tree," in Pro-ceedings of ICDT, 1996. 86 Appendix A. Color Space and CIE XYZ - HSV - RGB Conversion Algorithms A . l . Color Space Various approaches have been used to specify colors. Artists often specify color as different tints, shades, and tones of strongly saturated, or pure pigments. A shade, for example, can be created by adding a black pigment to a pure pigment, thereby decreasing lightness. The MunselPs color system measures colors by three intuitive quantities [MY88]. These are hue, which distinguishes colors such as red, green, and purple, saturation, which refers to how far color is from a gray of equal intensity, and lightness, which refers to the perceived intensity of a reflecting object. Such methods of measuring color are, however, subjective. They largely de-pend on the judgement of human observer, the lighting, the surrounding color, and the overall lightness of the environment. A more objective measure of colors corre-sponds loosely to the notion that colors can be specified by positively weighted sums of red, green, and blue (the so-called primary colors). The CIE (Commision Inter-nationale de PEclairage) has defined three computable standard primaries, called X, Y, and Z (corresponds to red, green, and blue) along with the corresponding color-matching functions x,y,z [F90+, Gla95]. The primaries can be used to match all the colors that can bee seen by human. Figure .1 shows the CIE Chromatic-ity diagram which defines all perceivable colors. 
One use of the CIE chromaticity diagram is to define a color gamut, or range, that shows the effect of adding colors together. Any two colors, say Ci and Cj, can be added to produce any color along their connecting line by varying the relative amounts of the two colors being added.

Figure A.1: Chromaticity diagram

However, the CIE XYZ space is not a very intuitive color space. It is difficult to interpret the meanings of the values X, Y, and Z. Another problem is that distances between colors in this space do not generally correspond to perceived differences. That is, the distance from color C1 to C1 + ΔC1 and the distance from color C2 to C2 + ΔC2, although both equal to ΔC, will generally not be perceived as equal. For color querying, a perceptually uniform color space is necessary since the system's notion of similarity should correspond to the user's notion of similarity.

Based on the CIE XYZ space, another color space, called CIE LUV, is designed to be perceptually uniform. Given (Xn, Yn, Zn) as the coordinates of the reference white color, the space is defined by the following functions, which convert a color in CIE XYZ space to CIE LUV space:

    L* = 116 (Y/Yn)^(1/3) - 16,        for Y/Yn > 0.01
    u* = 13 L* (u' - u'n)
    v* = 13 L* (v' - v'n)

where

    u'  = 4X / (X + 15Y + 3Z)          v'  = 9Y / (X + 15Y + 3Z)
    u'n = 4Xn / (Xn + 15Yn + 3Zn)      v'n = 9Yn / (Xn + 15Yn + 3Zn)

A.2. RGB - CIE XYZ conversion

The available colors on a monitor device are defined by the chromaticity of the device's phosphors. Two monitors with different phosphors may have different RGB color gamuts. A monitor's specifications typically provide the chromaticity weights for its RGB phosphors. The chromaticity weights can be represented as a matrix M with entries as follows:

        | Xr  Xg  Xb |
    M = | Yr  Yg  Yb |
        | Zr  Zg  Zb |

In this matrix, the triplet (Xr, Xg, Xb) gives the weights applied to the monitor's RGB components to find the X component of the color in CIE XYZ space.
Similarly, the other two triplets, (Yr, Yg, Yb) and (Zr, Zg, Zb), are used to find the Y and Z components, respectively. An RGB color can then be transformed to the corresponding CIE XYZ color by the following equation:

    | X |       | R |
    | Y | = M * | G |
    | Z |       | B |

A.3. Algorithm for converting from RGB to HSV color space [F90+]

#include <math.h>

#define UNDEFINED (-1.0f)   /* sentinel: hue is undefined for achromatic colors */

void RGB_To_HSV(float r, float g, float b, float *h, float *s, float *v)
{
    /* Given: r, g, b, each in [0,1].
       Returned: h in [0,360), s and v in [0,1].
       If s == 0, then h is UNDEFINED. */
    float max = fmaxf(r, fmaxf(g, b));
    float min = fminf(r, fminf(g, b));

    *v = max;                          /* this is the value v */

    /* next calculate the saturation s */
    if (max != 0)
        *s = (max - min) / max;        /* s is the saturation */
    else
        *s = 0;                        /* saturation is 0 if red, green, and blue are all 0 */

    if (*s == 0) {
        *h = UNDEFINED;
    } else {
        float delta = max - min;
        if (r == max)
            *h = (g - b) / delta;      /* resulting color is between yellow and magenta */
        else if (g == max)
            *h = 2 + (b - r) / delta;  /* resulting color is between cyan and yellow */
        else /* b == max */
            *h = 4 + (r - g) / delta;  /* resulting color is between magenta and cyan */
        *h = *h * 60;                  /* convert hue to degrees */
        if (*h < 0)
            *h = *h + 360;             /* make sure hue is non-negative */
    }
} /* end RGB_To_HSV() */

A.4.
Algorithm for converting from HSV to RGB color space [F90+]

void HSV_To_RGB(float *r, float *g, float *b, float h, float s, float v)
{
    /* Given: h in [0,360] or UNDEFINED, s and v in [0,1].
       Returned: r, g, b, each in [0,1]. */
    if (s == 0) {
        /* the color is on the black-white center line: there is no hue,
           so this is the achromatic case (it is an error if s == 0 and
           h has a defined value) */
        *r = v; *g = v; *b = v;
    } else {
        /* there is a hue, so this is the chromatic case */
        float f, p, q, t;
        int i;
        if (h == 360)                  /* 360 degrees is equivalent to 0 degrees */
            h = 0;
        h = h / 60;                    /* h is now in [0,6) */
        i = (int)floorf(h);            /* floor returns the largest integer <= h */
        f = h - i;                     /* f is the fractional part of h */
        p = v * (1 - s);
        q = v * (1 - (s * f));
        t = v * (1 - (s * (1 - f)));
        switch (i) {
        case 0: *r = v; *g = t; *b = p; break;
        case 1: *r = q; *g = v; *b = p; break;
        case 2: *r = p; *g = v; *b = t; break;
        case 3: *r = p; *g = q; *b = v; break;
        case 4: *r = t; *g = p; *b = v; break;
        case 5: *r = v; *g = p; *b = q; break;
        }
    }
} /* end HSV_To_RGB() */

