UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

Video database retrieval based on trajectory analysis Gu, Zhe 1999

Your browser doesn't seem to have a PDF viewer, please download the PDF to view this item.

Item Metadata


831-ubc_1999-0517.pdf [ 9.64MB ]
JSON: 831-1.0051304.json
JSON-LD: 831-1.0051304-ld.json
RDF/XML (Pretty): 831-1.0051304-rdf.xml
RDF/JSON: 831-1.0051304-rdf.json
Turtle: 831-1.0051304-turtle.txt
N-Triples: 831-1.0051304-rdf-ntriples.txt
Original Record: 831-1.0051304-source.json
Full Text

Full Text

Video Database Retrieval Based on Trajectory Analysis by Zhe Gu B.Sc.(Computer Science) Fudan University, 1989 A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF M A S T E R OF SCIENCE in THE FACULTY OF GRADUATE STUDIES (Department of Computer Science) We accept this thesis as confirming to the required standard T H E UNIVERSITY O F BRIT ISH C O L U M B I A October 1999 © Zhe Gu, 1999 In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission. Department of C 0 '"put£-r S a . a . / . c ^ 1 The University of British Columbia Vancouver, Canada Date Octo'b^'- ' f ' - H DE-6 (2/88) Abstract The increasing volume of video motivates research on video management. Related topics include video representation, organization, query formulation and search. Much work has been done based on shot segmentation of video, however, the temporal nature of video has not been fully investigated. Our goal is to analyze the motion of objects in video, so that video can be retrieved based on objects' motion it records. We emphasize the importance of motion information in video. A two-level abstraction based on the movement of objects is proposed. The first layer describes the movement of objects with trajectory and speed curves, and the second layer abstracts the curves with geometric features. Points with maximum curvature are extracted, then, based on these points, feature values are calculated that are insensitive to planar translation, rotation and scale. Similarity of trajectories is measured by the similarity of the feature values. A similarity measurement metric with point warping is given. We implemented the R-tree as the indexing structure and specialized it to our application. The trajectory and speed curves are analyzed and the abstract feature val-i ues are organized within the R-tree structure. A QBE interface is provided, and hierar-chical searching is proposed and implemented. Experiments show that video content retrieval based on motion is useful and feasible. Table of Content Abstract • ii List of Figure vi Acknowledgement vii Chapter 1 Introduction 1 1.1 Background and challenge 1 1.2 Overview of existing video indexing approach 3 1.3 Motivation and contribution ; 4 1.4 Thesis outline 5 Chapter 2 Related Work Review 7 2.1 Introduction 7 2.2Previous systems 8 2.2.1 Camera operations 8 2.2.2 Motion of objects 9 2.3 Track moving object from image sequence 12 2.4 Feature extraction 14 2.4.1 Feature catalog 14 2.4.2 Representation of curves 15 2.5 Multidimensional indexing 16 2.6 Conclusion 18 Chapter 3 Similarity of Motion 19 3.1 Feature extraction 19 3.1.1 Feature properties • 20 3.1.2 Description of motion .- 21 3.2 Feature extraction based on curvature 23 3.2.1 Algorithm 23 3.2.2 Deciding the size of curves • 27 3.2.3 Properties of the feature values 31 3.2.4 Modification to exclude rotation insensitivity 32 3.2.5 Experiments •• 33 3.3 Metric for similarity measurement : 38 3.3.1 Property of the similarity metric •••39 3.3.2 Algorithm 40 3.3.3 Experiment of warping , 41 3.3.4 Experiment of similarity metric 42 3.4 Conclusion 44 Chaper 4 Feature Organization and Retrieval 46 4.1 Multidimensional indexing structure 47 4.1.1 Quadtrees 48 4.1.2 K-d trees 49 4.2 Review of standard R-Trees 51 4.2.1Properties ofR-trees 52 4.2.2 Data operation 53 4.3 Apply R-trees to feature values of curves 53 4.4 Query based on features 54 4.4.1 Hierarchical Searching Strategy 55 4.4.2 Logical AND of Query '• 56 4.4.3 Experiments • 57 4.5 Conclusion 67 Chapter 5 Summary and Future Work 68 5.1 Summary 68 5.2 Future Work 68 5.2.1 Efficiency Improvement 68 5.2.2 Group of Curves 69 5.2.3 Integrate with Motion Analysis 69 5.2.4 Integrate with Video Database 70 iv Bibliography List of Figures Figure 1.1 System architecture Figure 3.1 Trajectory and speed curve of a falling apple Figure 3.2 Point with maximum curvature Figure 3.3 Centroid representation Figure 3.4 Preserving local information Figure 3.5 Area size changes with rotation of curves Figure 3.6 Convex hull and MBR of a curve Figure 3.7 Distance from point to line : Figure 3.8 Distance from point to point along given direction Figure 3.9 Speed representation Figure 3.10 Feature Extraction Figure 3.11 Experiment: Features of a curve with changed size.... Figure 3.12 Convex hull and size measurement of a curve Figure 3.13 Experiment: Feature values before and after rotation .. Figure 3.14 Locale feature preserving Figure 3.15 Experiment of warping Figure 3.16 Trajectory similarity Figure 3.17 Similarity of speed curve Figure4.1 Quad tree illustration(Boxes are null subtrees) Figure4.2 2-d tree illustration Eigure4.3 Data(dark circles) organized in an R-tree with fanout=3 Figure4.4 The structure for the R-tree in Figure 4.3 Figure4.5 Hierarchical searching Figure4.6 Query execution Figure4.7 Graphics user interface Figure4.8 Test on tennis video (I) Figure4.9 Test on tennis video (II) Figure4.10 Experiment on serch strategy vi Acknowledgements I would like to express my sincere appreciation to my supervisor, Dr. Jim Little for his support and guidance of this thesis. Thanks Dr. David Lowe to be my second reader and thanks for his valu-able comments. Thanks Deborah and April, they helped me many times to find the materials that I need. Thanks all my friends for the discussing of my thesis and sharing the happy student life with me. Thanks my parents and my husband, they always encourage me through frustrations. Thanks my little baby, she accompanies me every minute at the late stage of my thesis writing and fulfills me with hope and joy. I vii Chapter 1 Introduction 1.1 Background and challenge Video has long been an essential part of the broadcast and entertainment industries, and today, video is also used in corporate, advertising, educational and government envi-ronments, for applications ranging from training and video conferences to web publishing and the documentation of medical, legal and other procedures. Its popularity brings chal-lenges including storage of video on computer systems, real-time synchronized delivery of video, and content-based retrieval [Hampapur 98]. This work addressed the content-based retrieval task. In a traditional database, text and annotations are used to search for the information. However, this approach provides limited capacity for retrieving multimedia information. For example, the associated meta-data cannot fully describe the contents of the material, and different persons may have different understandings of or interpretations of the same object. To enhance this approach, content-based query techniques use visual features of the images and videos in the search process. In image databases, users construct queries using graphical interface tools, or pro-vide examples of images, then the desired images are retrieved based on the features such 1 CHAPTER 1: Introduction as color, texture and shape. Substantial progress has been achieved in image database sys-tems, for example, QBIC [Flickner 95], Virage [Hamrapur 97] and VisualSEEK [Smith 96]. Techniques used in image databases can also be utilized in video databases. A fre-quently used approach towards video is to treat it as an extension of images, that is, to extract key frames from video. Most video information management systems follow a sim-ilar operation: The input stream is parsed to obtain individual shots; key frames from each shot are then extracted and subsequently merged into coherent groups based on certain cri-teria. Such a hierarchical model of film, where the shot, scene, and sequence represent sub-sequent levels, provides a practical framework for automatic content analysis. However, video is not just a collection of images. It is an evolution of temporal relationships, and in the above approach, the temporal nature of video is neglected. This work attempts to capture the useful and important information in video, that is the motion of objects. We use the trajectory curve and speed curve to describe the motion of objects, extract abstract features from them and test the feasibility of indexing and retrieving video based on our feature vectors. Our approach is useful when a user wants to extract a segment of video containing a certain pattern of motion. For example, a TV editor may search the video database for vid-eos containing swinging, end up with a swinging pendulum, child on a swing and a conduc-tor's hand, then group them into a visual appealing program. 2 CHAPTER 1: Introduction 1.2 Overview of existing video indexing approach One stream of research attempts to integrate all available media such as audio, video and captions [Hampapur 97]. Hampapur presents a video engine for content-based video retrieval. They have primitives for all media, for example, the key frame primitive extracts frames from the image sequence; the motion primitive extracts image flow which is a representation of the pixel motion between consecutive images; the audio tracking index primitive produces the audio component and segments the audio information; last, the decoder outputs the ASCII text, and the ASCII text becomes the input to the Virage caption primitive. The difficulty is that the representation and analysis of each of these dif-ferent media remains to be done. Another stream of effort concentrates on visual media alone. Many techniques apply the methods used in still image databases to the key frames of video sequences [Chang 95], [Shi 97], [Kreyb 1997]. VideoQ [Chang 95] includes the major visual features used in image database; as well they have a novel interface for QBE. The video is indexed individually by color, texture, shape and motion. Once the user sends the query, the features in the query are matched against the features of the objects in the database and the candi-date lists are then merged as result. The ultimate feature of video is its temporal aspect. While many systems neglect it, some systems treat it superficially. Several systems tried to use motion as a feature of indexing video [Ioka 92], [Lee 93], [Sahouria 97]. Basically, video sequences are processed 3 CHAPTER 1: Introduction to extract features which describe their semantics. The extracted features are then repre-sented, organized, and stored in the database. In the retrieval process, the system analyzes a query, extracts the appropriate feature vector and then performs a search process. The search process is carried out by computing the similarity between the feature vector of the query and those of the candidate stored in the database. The retrieved video sequences are presented in user interface. 1.3 Motivation and contribution Video database access is a very active research area. Many aspects of video data-bases are investigated. While motion is a very important attribute of video, it is not fully used in indexing and retrieving. In the mean time, research on computational geometry and pattern recognition provides us many useful solutions to the problems that arise in video databases. We try to apply the research results of computer vision and computational geom-etry to this field and improve video indexing. We propose the trajectory curve and speed curve as the representation of object motion. We investigate the problem of feature extraction and propose a feature description so that the search will be based only on geometric features. The feature has two advantages: it is affine transformation insensitive and local feature preserving. We adopt a warping method in our feature matching algorithm so that it is robust to the variation in feature vec-tors. We also test and prove that the R-tree can be used to index the multidimensional fea-tures we get so that search will be efficient and it is scalable to a large database. We 4 CHAPTER 1: Introduction implement a framework to demonstrate the feasibility of the idea. The system gets the trajectory and speed curve of an image sequence first, then extracts the features and inserts them into the database. During retrieval, the system reads the query sketch and gets the features, then searches the indexing tree to retrieve the similar ones. The system architecture is shown as follows: image sequence feature list Figure 1.1 System architecture 1.4 Thesis outline The remaining chapters are organized as follows: Chapter 2 reviews much of the related work that has been done with regards to content-based retrieval. It emphasizes pre-vious work in indexing and retrieving based on motion. Chapter 3 discusses feature extrac-tion and examines the measurement of motion similarity. Chapter 4 discusses our method CHAPTER 1: Introduction of applying the R-tree method in feature organizing. Chapter 5 presents some implementa-tion details of our framework. Chapter 6 concludes the thesis and discusses future work. 6 Chapter 2 Related Work Review 2.1 Introduction Multimedia is an irreversible trend in computing today. Because multimedia not only provides the most expressive way of representing and browsing, but also enhances the overall quality and quantity of information, most areas of information technology are embracing multimedia. As the result of these developments, multimedia, and especially video, will bring dramatic improvements to human-computer interaction. Video is by far the most powerful and expressive multimedia because it is a stream-ing media with high resolution and multiple channels with real multimedia including audio, visual and text information. These properties of video make it a popular medium for captur-ing and presenting information. At the same time as video has multiple channels to convey different media, and it requires massive storage space, it also presents technical challenges for the improvement of data management. Research work involved in video database is mainly in three domains: video stor-age, video delivery and video indexing and retrieval. Our work belongs to the video index-ing domain. In research in this field, most organize video with a hierarchical model based on shots. Several systems are similar to our work, indexing video based on motion informa-7 CHAPTER 2: Related Work Review tion. The approaches are basically as follows: first track moving object in the video sequence, then extract features and index in the feature space. In this chapter we first review works similar to ours, then discuss the research areas that relate to each part of the system; namely, tracking, feature extraction and multidimensional indexing. 2.2 Previous systems In this section we review previous work on video indexing based on motion infor-mation. The apparent motion in a video sequence can be attributed to camera motion or object motion. So we group the systems based on the different motion information that they analyze. 2.2.1 Camera operations Regular camera operations include panning, zooming, tracking and dollying, together with many combinations of these operations. The seven basic camera operations are fixed, panning (camera horizontal rotation), tracking (horizontal transverse movement), tilting (vertical camera rotation), booming (vertical transverse movement), zooming (vary-ing the focusing distance) and dollying (horizontal lateral movement). Different camera operations lead to different optical flow (motions vectors pattern). Akutsu [Akutsu 92] has used motion vectors to automatically extract the camera operation. They first use block matching to extract the motion vectors, then determine the spatial/temporal characteristics of the vectors, that is, the divergence/convergence point and magnitude of the motion vec-tor. At last the camera operation is estimated. 8 CHAPTER 2: Related Work Review An alternate approach in detecting camera operations is to examine what are known as the X-ray images [Akutsu 94]. Edge detection is first performed on all the frames within a shot. A horizontal X-ray image is then obtained by taking a weighted integral of the edge frames in the horizontal direction. Similarly, a vertical X-ray image is obtained by taking a weighted integral of the edge frames in the vertical direction. Camera operations are obtained by approximating the spatial distribution of the edge angles of the horizontal and vertical X-ray images. 2.2.2 Motion of objects Here, image sequences are indexed based on the motion properties of objects within the sequence. The systems we will discuss addressed the same problem: how to index and retrieve video sequences (or image sequences, as only visual information is used) based on motion similarity of the objects they record. However, the systems used different approaches. Ioka, Dimitrova, Lee and Sahouria index video based on motion of objects mainly, while VideoQ and Virage integrate motion of objects as one of the attributes in the system. Ioka and Kurokawa [Ioka 92] proposed a method for retrieving image sequences by using motion information. To start with, each frame is partitioned into rectangular blocks. Motion vectors are derived from the image sequences using block matching. These vectors are mapped into spatiotemporal space and the motion of each block is then represented as a single vector in the feature space. The representative trajectories are stored in the database. Queries can be specified using an interactive query specification mechanism which allows 9 C H A P T E R 2: Related Work Review the user to enter a motion trajectory. The specified trajectory is matched with the trajecto-ries of the sequences in the database using a distance measure and the sequences with the smallest distance are retrieved. The distance measure is the sum of the Euclidean distance between the query vector and the stored vector after each time interval. Dimitrova and Golshani [Dimitrova 94] have proposed a technique based on the motion compensation component of the M P E G video encoder. The trajectory of a macrob-lock is computed from the forward and backward motion vectors that belong to the macrob-lock. The position of a macroblock in a P-frame is computed using block coordinates and forward motion vectors. The position of a macroblock in a B-frame is computed by averag-ing the positions obtained from 1) the next predicted block coordinates and the backward motion vector and 2) the previous block coordinates and forward motion vector. Each tra-jectory can be thought of as an n-tuple of motion vectors. The macroblock trajectories are the feature vectors used for indexing. Lee and Kao [Lee 93] presented a qualitative description which enable subsequence query, Object motion is represented using a combination of the following 12 primitive motion types: 1. Translation: North, northeast, east, southeast, south, southwest, west, northwest. 2..Translation in depth: close to the camera, away from the camera. 3. Rotation: clockwise, counterclockwise. 4. Rotation in depth: rotate to left, rotate to right, rotate upward, rotate downward. Sahouria [Sahouria 97] analyzed surveillance videos based on the motions of 10 CHAPTER 2: Related Work Review objects in the scene. After tracking, trajectories are represented by two vectors: X-[xl,x2,...xn] and Y=[yl,y2,...ynJ. The Haar wavelet transform is computed separately for the two vectors. Then the first eight coefficients of the projection transform are stored as search key. The index structure is a list of keys, and simple linear scanning is used to search the list. VideoQ [Chang 97] emphasizes motion as an attribute. They use vectors to present the interframe motion, and compare the trails without compressing it. Virage also include motion as an indexing feature. They have four dimensions for motion description: motion content measures the total amount of motion within a given video. Motion uniformity measures the smoothness of the motion as a function of time. Motion panning captures the panning motion and motion tilting measures the vertical motion component of the motion within a video sequence. ^ The following problems exist in the previous work: 1. Appropriate features are needed that will preserve most of the trajectory informa-tion and in the mean time enable quick searching. Ioka's features can not be easily com-pared, as the original trajectory is not compact, Lee's features can not preserve the trajectory and the velocity. Sahouria's features do not support subsequence retrieval. 2. The indexing method needs improvement. Both Ioka and Lee's work put no effort on indexing, and Sahouria only uses linked lists as an index structure, which is not scalable to large database. Without testing the features with an indexing structure, the feasi-11 CHAPTER 2: Related Work Review bility of the features is unknown. 3. As similarity measurement happens on-line, not like feature extraction, which is conducted when a new item is inserted into the database, their comparison methods involve too much computation. We are going to review each step of feature extraction, distance metric and multidi-mensional indexing in order to get an improved system. First, the following is a short review on tracking moving objects to compute their trajectories. 2.3 Track moving object from image sequence Object tracking has been an active field of research in computer vision for more than a decade. Existing methods for object tracking can be broadly classified as feature-point tracking and optical flow. Feature-point tracking concerns the matching of character-istic tokens through time, while optical flow consists of the computation of the displace-ment of each pixel between frames [Shah 92]. Feature-point tracking deals with extracting interesting points, characteristic fea-tures in an image that can be tracked in time. This correspondence for multiple frames results in what is called a motion trajectory, i.e., a sequence of locations (jt,-,y,-), for i -l,...,n, where n is the number of frames in the sequence. A motion trajectory can thus be considered as a vector-valued function, that is, at each time we have.two values x and y. Verestroy [Verestroy 98] presented an systematic performance evaluation study of feature point tracking algorithms. Their work provides guidelines to selection of tracking technique 12 CHAPTER 2: Related Work Review by considering the algorithms' scope, error rate, processing time, point density and speed. Optical flow can be computed from a sequence by considering two consecutive frames each time. From the result of optical flow analysis, more elaborate motion represen-tations can be built. In principle, the flow vectors in successive frames can be linked across a sequence of frames into motion trajectories. Barron [Barron 92] has implemented nine existing optical flow techniques and has presented a very good performance evaluation of the techniques. A good object tracking method suitable for an interactive video application should satisfy [Toklu 99]: 1. Produce reasonable adjustment of object markers in response to motions occurred in the scene. 2. Work on a variety of videos and big number of video frames. 3. Independent of the encoders and different encoding algorithms, when applied to compressed video. Toklu proposed a method for tracking a video object in an ordered sequence of two-dimensional images, where the outcome is the trajectory of the video object throughout the time sequence of images. As our work addresses a general problem, this is not included in our framework. Where a specific domain is concerned, an appropriate method can then be chosen and inte-13 CHAPTER 2: Related Work Review grated into our f ramework . Current l y , we get trajectory a nd speed curves manua l ly. S p e ed cu r ve ind icates the speed i n each t ime interva l . A s we a lready k n ow the tra-j e c t o ry o f an object, the d i rect i on o f mo v emen t is known. S p e e d w i l l add more i n f o rmat i o n on the mo v emen t o f object. T h e trajectory is f o rm e d by each pos i t i on o f object, so speed cu r ve c a n be got by obta in the t ime interva l and the d i s p l a cement o f an object between two success i ve frames. 2.4 Feature extraction A f t e r the trajectory and speed curves are c ompu ted , we can c ompa re them direct ly. Howe ve r , the c ompa r i s o n is l im i ted, because the representat ion of the curves is not abstract, so extens i ve c omputa t i o n is i n v o l v e d and it is a f fec ted by the abso lute pos i t i on a nd p lanar rotat ion. W e want to abstract the representat ion o f the cu r ves so that the c ompa r i s o n o f the curves w i l l not be computat i o na l l y c omp l ex , and be m o r e robust. In this sess ion, we d iscuss work on feature extract ion a nd the methods that app ly to our demands. 2.4.1 Feature catalog Features are used when data are o rgan i zed into a database or searched f r o m the database. A s they must be extracted and stored at the t ime o f data entry, they shou l d be determ i ned carefu l ly. T h e r e are two k i nds o f features: 1. M e t a features: the s ize o f the image, photogeography, data taken, reso lut ion, etc. M a n y o f these features can be r ead by the system either f r o m the header, f i l ename, or f r o m other s im i l ar sources. 14 C H A P T E R 2: Related Work Review 2. Derived features: these features are derived directly from the image data when data are inserted into the database. Values are automatically calculated for these features using automatic or semiautomatic functions. They are commonly required for answering queries. We consider the trajectory of an object as a 2D curve on a plane over time. So the feature that we extract should represent the characteristic of the curve. Meta features can not satisfy this requirement. We should use derived features of the curve. These features should represent the curves quite well, enable comparison and not be sensitive to planar transformation as we expected. 2.4.2 Representation of curves We consider the trajectory of an object as a 2D curve on a plane over time. When the problem of representing motion is converted to the problem of representing curves, it falls into the pattern recognition area in which shape representation has been investigated for some time. Various curve representations have been proposed in the computational vision liter-ature. The two most popular schemes for curve approximation are polygonal and spline approximations. Polygonal approximations are used to approximate the shape boundary using the polygonal line. These methods are based on the use of the minimal error, the minimal poly-gon perimeter, the maximal internal polygon area, or the minimal external polygon area as 15 C H A P T E R 2: Related Work Review approximation criteria. One of the most popular methods in this group is the split and merge algorithm. In this approach, a curve is split into segments until some acceptable error is obtained. At the same time, split polygonal segments are merged if the resulting segment approximates the curve within some maximum error [Pavlidis 77]. Splines have been very popular for the interpolation of functions and the approxi-mation of curves. Especially, B-splines are constructed so that the local function value change does not spread to the rest of the intervals. B-splines can be used for the interpola-tion of plane curves given by parametric equations. In this case, each parametric equation is interpolated independently. Cohen et al. [Cohen 95] proposed a technique for curve repre-sentation and matching using B-splines. Both approaches agree on the significance of high curvature points for visual per-ception. In the field of pattern recognition, extraction of critical points with high curvature has been investigated [Ansari 91] [Chang 91] [Freeman 78] [Mokhtarian 86]. To be suit-able for indexing, the features should be independent of translation, rotation and scaling of objects; the number of features used should be as small as possible and the computational time should be short and storage space requirement should also be small. We adopt a maxi-mum curvature approximation in our work to derive feature points [Chen 86]. 2.5 Mult idimensional indexing In order to access the data quickly, features are organized into an indexing struc-ture. Over the last two decades a number of multidimensional search structures have been 16 C H A P T E R 2: Related Work Review analyzed and implemented. These include point-quad trees [Finkel 74], k-d trees [Bentley 75], R-trees [Guttman 84] and their variants. A k-d tree is a multidimensional binary search tree. Each level of the tree discrimi-nates for one attribute. With, k-d trees, the number of levels can become quite large. The search may not so efficient. Point-Quad trees are analogous to B-trees with multiple branches. The approach is to partition comparisons at each level of the tree. Each node of a k-dimensional quad tree partitions the object space into k quadrants. It is good for searching multidimensional points. R-trees are also the extension of B-trees for multidimensional objects. They can be viewed as higher-dimensional generalizations of B-trees. Closely connected to an R-tree definition is the notion (in a two-dimensional space) of a rectangle that contains points or segments in a spatial database. Given a query such as, find all rectangles that contain rect-angle Q, the R-tree traversal will start from the root and traverse to the leaf node containing the rectangle and select the next node at each level that contains the rectangle. R-trees are good for range searching. In our case, as we will have a vector of 2D points describing curve features, and our typical query is about nearest neighbors, so the R-tree is the one that fits our situation quite well. 1 7 2.6 Conclusion W e r e v i e w e d the p r e v i o u s r e s e a r c h w o r k r e l a t e d to o u r t o p i c . T h e r e h a v e b e e n l i m i t e d attempts o f i n d e x i n g v i d e o b a s e d o n m o t i o n i n f o r m a t i o n . T h e s e attempts h a v e s o m e l i m i t a t i o n s . In the m e a n w h i l e , p r o g r e s s has b e e n a c h i e v e d i n s h a p e r e p r e s e n t a t i o n a n d m u l t i d i m e n s i o n a l i n d e x i n g w h i c h w e c o u l d u t i l i z e . O u r w o r k uses g e o m e t r i c a n a l y s i s a n d s p a t i a l data i n d e x i n g to s o l v e the f o l l o w i n g p r o b l e m s : The representation of motion: W e use t ra jector ies a n d s p e e d c u r v e s to r e p r e s e n t the m o t i o n o f objects . T h e n c o m p r e s s the c u r v e s i n t o a f f i n e t r a n s f o r m a t i o n i n s e n s i t i v e features . The measure of similarity: W e a d o p t the w a r p i n g m e t h o d i n the s i m i l a r i t y m e a s u r e m e n t m e t -r i c . Indexing structure: W e a d o p t R-trees to o r g a n i z e the features a n d d e s i g n a h i e r a r c h i c a l s e a r c h -i n g strategy. 18 Chapter 3 Similarity of Motion In this chapter, we will discuss how to measure the similarity of two objects' motion, and present our feature extraction method and similarity measurement metric. 3.1 Feature extraction An object's movement can be described by its trajectory and speed. When a rabbit jumping in front of us, we can notice its up and down, and know if it is running quicker and quicker. If a user wants to retrieve video that contains motion similar to a rabbit's jumping, he can draw a trajectory and a speed curve to describe it. In this case, the absolute value of speed may not mean as much as the trend, such as constant versus changing, increasing ver-sus decreasing. Our system records the position of an object in each frame of a video sequence to describe its trajectory, and uses the speed curve to record its speed trend. The speed curve is a 2D curve, with x-coordinate indicating time, and the y-coordinate indicat-ing speed. Given a video sequence, the trajectory and speed curves of the moving objects in it can be obtained by motion analysis. The trajectories and speed curves are 2D planar curves, which can be represented by linked lists of 2D points. They can be compared directly, but the comparison is computationally complex and not invariant to affine transformations. By 19 C H A P T E R 3: Similarity of Motion feature extraction we expect to make representation of the curves simple so that similarity matching will be efficient. In the following section we discuss the properties of these fea-tures. 3.1.1 Feature properties Feature extraction is used widely in multimedia database application to reduce the volume of information used to index. Features are extracted from image sequences and stored in the database. As is well known, different applications may require different fea-tures. Since features are calculated and stored at the time of data entry, we must carefully decide the features that will be used in our system. As mentioned in Chapter 2, we can con-sider meta-features and derived features. In our case, the features that we use must describe the motion of objects, so meta-features are not our choice. We will try to find the derived feature as indexing values. We also need to notice the properties that a good feature should have. Features can be an image or a number, information preserving or non-information-preserving. A set of criteria for measuring features of motion is: Accessibility: describes how easy it is to compute a feature. We do not want an algorithm that is computationally very complex. Stability and sensitivity: describe how sensitive the feature is to "small" changes. We want the feature values to represent the main trend of the motion while neglecting triv-ial details. 20 C H A P T E R 3: Similarity of Motion Rich local support: requires that a representation is information preserving (rich) and can be computed locally. 3.1.2 Description of motion There are many ways to describe the motion of an object. Is it slow or fast? Is it continuous or discontinuous? We choose to use the speed curve and trajectory to describe motion, because this is the most common way human beings describe motion. When we say the ball drops from a tower, we know that the trajectory of the ball is a straight line, its velocity is increasing by 9.8m/sec . If it is thrown out, then it goes along a parabola, its velocity is decreasing along the x-axis, while increasing along the y-axis. Given an image sequence, we can get the moving object's position in each frame by tracking, and then construct the trajectory by assembling the positions into a curve. We can also compute the speed if the time interval between two frames is known. So we will have a pair of curves, one standing for the trajectory of the object, another standing for its speed. For example, to describe an apple falling down from the tree, we have the trajectory and speed curve in Figure 3.1. As we know, the apple follows a straight line down and its speed increases by 9.8m/second 21 CHAPTER 3: Similarity of Motion speed timd Trajectory Speed curve Figure 3.1 Trajectory and speed curve of a falling apple From an image sequence to curves, the size of the material has been reduced a lot; however, the trajectory and speed information in the form of curves is not condensed enough, and they are sensitive to translation, rotation and scaling. When a user is trying to retrieve videos according to motion information, e.g., find a one man dash in basketball game video, he doesn't mind either the position of an object in the frame, or the distance from the camera to the object, or the camera's rotation. This may not apply to all the situa-tions, however, we work based on this assumption, and assume invariance to translation, rotation and scaling is an objective of our feature extraction method. In this section, we try to find an abstract representation for them. This representa-tion should be insensitive to affine transformation, and easy to compare. Thus the problem of feature extraction of motion turns into feature extraction of curves, and we are going to discuss it in Section 3.2. With the feature values of motion, the motion similarity can be evaluated by the feature values' similarity. 22 C H A P T E R 3: Similarity of Motion 3.2 Feature extraction based on curvature 3.2.1 Algorithm The image sequences are processed to get the trajectory and speed curves. In this part, we focus on how to get the computed feature from the curves. The similarity measure is based on models of human perception. As a result of research on human visual perception, corners and high curvature points are important infor-mation that helps us to identify objects. Here we first get the points with high curvature, then work out the universal features from that. Curvature computation: On the real Euclidean plane, the curvature of a curve is defined as the rate of change of slope as a function of arc length. For a curve y - f(x), it can be expressed as: d2y/dx2 r fdx\2i3/2 HI)] Chang [Chang 91] proposed a maximum curvature approximation as follows: Let the trajectory points be denoted by p,= (x,-,y,), I <i<n , where pj is the first point and pn is the last; let pt be a trajectory point which has been chosen as a feature point, and let pj be a boundary point starting from pi+2. We find the straight line L that connects the point pi and py Let the Euclidean distance from the points p^, i<k<j, to L be denoted by d(i,j,k) as shown in Figure 3.2 23 CHAPTER 3: Similarity of Motion d(i,j,k) Figure 3.2 Point with maximum curvature Here, d(i,j,k) can be seen as a distance from a point to a chord L. Then d(i,j,k) can be computed as follows. The point pk with d(i,j,k)> threshold will be regarded as a feature point. See Section 3.2.5, experiment 1. One problem of this method is that the threshold is the same regardless of the curve size, which does not satisfy the demand of invariance. We improve this by setting the threshold as a percentage of the curve size, so that if the shape of the curve is the same, they will get the same number of feature points. How to decide the size of curves is discussed in Section 3.2.2. Pseudo code of the algorithm: -d(i,j, k) = (yj - ydxk - (xj - xdyk + (y,-** - *,-y/) ((Xj-X.)2 + iyry.)2y 24 C H A P T E R 3: Similarity of Motion 1. threshold = (width+height)*percentage 2. Start from pi to extract a l l the feature points. 3. Pi i s a feature point with coordinate (x^, y^) and Pj i s a boundary point s t a r t i n g from Pi+2 • Find the maximum distance D from the t r a j e c t o r y points to the l i n e from p i to pj 4. If D>threshold, then the point p k. with the coordinate (x k. , yk,) i s extracted as a feature point and a new search f or a feature point i s st a r t e d again with p k. instead of P i . 5. If D < = threshold, then advance Pj to the next neighbor point Pj. and compute the d ( i , j ' , k ) f or a l l k where i<k<j' 6. A f t e r the feature points are extracted, they are stored i n an ordered l i n k e d l i s t . The feature points are much fewer in number than the points in the original curve; however, their geometric positions are decided by the size, position and orientation of the curve. So we need to abstract them to get the values that will preserve the feature of the curve yet be invariant to the size, translation and rotation. The first representation we consider is using the centroid of the feature points [Free-man 1978]: The centroid of the feature points is defined as follows: n n xc = \ X xi and yc = \ X y« where n is the number of the feature points in a curve. The centroid is used as the central reference point, and the distance between each feature point and the centroid is computed. These distances are all divided by the average of 25 CHAPTER 3: Similarity of Motion the distance to make them scale invariant, and the relative distances and angles between them are stored in a list (Figure 3.3). p 2 ^ ? 4 P l len, ' " ' P 3 Figure 3.3 Centroid representation For the curve above, the index list will be: ((Ij,0]),(l2,62),(l3,63)) where 0- is as illustrated, = len^avgLen, and avgLen = ^ leni i = 1 The drawback of this method is that, although it preserves the curve under affine transformation, as the centroid depends on all feature points, and each pair of index values depends on the centroid, so the result is that we can use the index list to compare the simi-larity of whole curve, but it is not possible to compare and retrieve a segment of the curve. We propose a method that depends only on local parameters as follows. Preserving local information metric . The first step is the same as centroid based met-ric. We get the feature points as those with maximum curvature, and then we produce an indexing list as follows: C H A P T E R 3: Similarity of Motion ((r],6j),(r2,62),...,(rn,6n)), where r,= /j/Z,_/, and 0,- as illustrated in Figure 3.4. v l - v 2 01 = acos v l | • |v2| v 3 v l ^ ' • V 2 Figure 3.4 Preserving local information As each pair of index values in the linked list only depends on three feature points, it preserves the local information of the curve, so this method makes it possible to compare only a segment of the trajectory. 3.2.2 Deciding the size of curves In the algorithm we discussed above, in order to make sure that scaling doesn't affect the result, we define the threshold according to the size of a curve. We can use the minimum boundary of the curve to measure the size of a curve; however, when the curve rotates, the area is changed (Figure 3.5), because the threshold is decided by the size of the area, so the threshold is changed. This causes the problem that the algorithm yields differ-ent results before and after a curve rotates. 2 7 C H A P T E R 3: Similarity of Motion J Figure 3.5 Area size changes with rotation of curves To solve this problem, we get the convex hull to wrap the curve first, then compute the area of the rectangle aligned with each edge of the convex hull and use the minimum one to measure the area the curve covers. We use Graham Scan [Graham 72] to calculate the convex hull. The algorithm is as follows: p u b l i c v o i d convex() { i f (curve.size() >0) { //get point Pi with max Y coordinate; i n t maxY = 0; i n t p i = 0; for ( i n t i = 0; i < trackcopy.size(); i++) { Point p = (Point)trackcopy.elementAt(i); i f (p.y > maxY) { maxY = p.y; p i = 1 ; 28 C H A P T E R 3: Similarity of Motion //get angles between l i n e Pi to each point and X axis Point p i = (Point) curve. elementAt (pi) ,-Point pO = new Point(pi.x-2, p l . y ) ; for (int i = 0,- i < curve. s i z e () ; i + +) { i f ( i != pi) { Point p =(Point) curve.elementAt(i); f l o a t angle = Tool.anglel(pO, p i , p) ; p.angle = angle; } } // s o r t i n g points by angles Vector sorted = Tool.sort(curve); // get convex h u l l points hull.add((Point)sorted.elementAt(0) ) ; hull.add((Point)sorted.elementAt(1) ) ; fo r ( i n t i = 2; i < s o r t e d . s i z e ( j ; i++) { hull.add((Point)sorted.elementAt(i)); boolean end = f a l s e ; while (hull.size()>2 && lend) { Point rightp = (P o i n t ) h u l l . e l e m e n t A t ( h u l l . s i z e ( ) - 1 ) ; Point centerp =(Point) hull.elementAt(hull.size()-2) ; Point l e f t p = (Point)hull.elementAt(hull.size()-3) ; f l o a t a = T o o l . a n g l e l ( l e f t p , centerp, r i g h t p ) ; i f ( a >= 0 ) hull.removeElementAt(hull.size()-2); else end = true; } } . } After we get the convex hull of a curve, for each edge of the convex hull, we get the minimum rectangle that encloses the convex hull and which has one side overlapped with the edge, and then calculate its area. The minimum area among the areas that we get is used as a measure of the curve size. 29 CHAPTER 3: Similarity of Motion Figure 3.6 Convex hull and M B R of a curve a. Original curves b. Convex hull of the curve c.The minimum boundary rectangles that align with each side of the convex hull and cover the convex hull We use points set (Pj, P2, PnJ to represent a convex hull. For each edge PiPt+j, we calculate the distance from every convex hull point to it and use the maximum one as the height of the bounding rectangle. We then calculate the maximum distance between the points along the direction that is parallel to PiPi+j. P(Px>Py) \ \ ^ / B ( b x , b y ) A(a x,a y) Figure 3.7 Distance from point to line 30 CHAPTER 3: Similarity of Motion The distance from point P(px, py) to line A B is: (fry ~ ay)Px - (bx - ax)Py + (ayPX ~ axPy) ((bx-ax)2 + (by-ay)iy<i Figure 3.8 Distance from point to point along given direction The distance from point P(Px,py) to line AA1( A A 1 LAB )is: d (bx - ax)Px + (fry - ay)Py + (<*y ~ b y ) a y + (aX ~ b X ) a x J ( b X - a x ) 2 + (by-ay) and: * = (bx - ax)PX + (fry - ay)Py + ( « y ~ b y ) a y + K ~ bx)ax c = (ay-by)ay + (ax-bx)ax If (t*c) < 0 then d = -d. 3.2.3 Properties of the feature values The feature values we get from local preserving metric have the following proper-31 CHAPTER 3: Similarity of Motion ties: 1. Insensitive to transformation, rotation and scaling. The algorithm represents only the relative position of the feature points, so the method is invariant to affine transformation, such as translation, rotation in a plane and scaling. 2. Preserving local attributes As the representation only uses three nearby feature points, it can preserve the local information quite well. If two curves are similar, and same threshold is applied, the local information that is preserved will be similar. The properties satisfy our requirement of choosing feature values, so we use them as our indexing values of database. Each image sequence is represented by a linked list; we call this linked list the indexing value of the image sequence. 3.2.4 Modification to exclude rotation insensitivity For the speed curve, we use 2D curves to show the trend of speed. In Figure 3.9, CI represents the speed of an object that keeps increasing until time tj, then keeps decreasing. C2 describes an object moving with a constant speed until time tj, then changing to a lower speed and moving with that speed. 32 C H A P T E R 3: Similarity of Motion speed tj speed ti C l time C2 time Figure 3.9 Speed representation Because the y-coordinate of the speed curve indicates the relative speed of an object, a line y=k means that the object maintains the same speed k, while a line y-ax+b means an object moving with a increasing (a>0) or decreasing (a<0) speed at a constant rate. So rotation of the curve matters when we compare the similarity of speed curves. We modify the algorithm described above so that the angle parameter of each feature value stands for the angle between the segment and the horizontal axis. With the modification, when a curve rotated, the indexing value that we get will be different from before rotating. This is reasonable, because the curve is indicating the speed of an object, so a rotated curve has a different meaning for changing speed. .3.2.5 Experiments These experiments demonstrate the invariance of our curve representation to scal-ing, translation and rotation. Experiment 1. Feature extraction 33 CHAPTER 3: Similarity of Motion We randomly draw a curve (left), then choose threshold 5% to get the feature points (right). Figure 3.10 Feature Extraction Experiment 2. The feature points extraction works with scaling. We use the same parameters to compute and get the feature points of a curve, and then reduce its size by scaling. The feature points are shown as following and the resulting feature values are plotted. (a)The original curve and scaled 34 CHAPTER 3 : Similarity of Motion (i) Relative length of segments in the three curves.(ii) angle between the segments in the three curves Figure 3.11 Experiment: Features of a curve with changed size We have an original size curve as shown, then we scale it by 0.75 and then by 0.375. The smallest one is shown above in Figure 3.11. The curves and features points are shown in figure(a). The same approximation is used for all three curves, and we get the indexing values based on the feature points. We draw graphs to illustrate the values. The result shows that the indexing values are very close when the curve size is changed. Experiment 3. Measuring the size of a curve We have a curve as shown, and compute its convex hull and the area of its mini-mum bounding rectangle. We then rotate it and get the convex hull and area size of the rotated curve. The result is as illustrated. 3 5 CHAPTER 3: Similarity of Motion Figure 3.12 Convex hull and size measurement of a curve The area size before rotation is 11610, and after rotation is 14000. Their square roots are 117.7 and 118.3, which is less than one pixel's difference, caused by arithmetic calculation. The result is very convincing that with the help of convex hull to measure the size of the curves, our algorithm can be quite stable under rotation. Experiment 4. The feature points extraction works with rotation. We first derive the feature points and indexing values of an original curve, then rotate the curve. We extract the feature points and indexing values using the same thresh-old. An example result is shown in Figure 3.13. 36 CHAPTER 3: Similarity of Motion Curves: / ^ "... * AJ]/ Indexing Values: 0.979, -110 0.966,-110 1.248, 108 1.299, 106 1.405, -157 1.388,-155 1.429, 163 1.430, 163 (i) Original (ii) Rotated Figure 3.13 Experiment: Feature values before and after rotation Experiment 5. Preserving local feature We draw a curve C], then draw a longer curve which has segments similar to the previous one. We find their feature points and then calculate their feature values. Results are shown as follows: . A ^Al A Curve C l Curve C2 37 CHAPTER 3: Similarity of Motion The feature values of C j (i) The length of feature values (ii) The Angle of feature values The feature values of C2 (i)The length of feature values (ii) The angle of feature values Figure 3.14 Locale feature preserving From Figure 3.14 we can see that a certain pattern appearing in Cj also appears in C2 twice, and the corresponding feature values of C2 reflect the repeated pattern. 3.3 Metric for similarity measurement After we extract the points with maximum curvature from curves and represent 38 C H A P T E R 3: Similarity of Motion them with a linked list, the linked list has much less data than the original trajectory. We call them feature values and will store them in the database as indexing values. The similar-ity of the videos is measured based on them. In this section, we define a similarity metric for the curves based on the feature values. 3.3.1 Property of the similarity metric We have two trajectories A and B. If A and B are similar in shape, our system should report a match and return a measure of how good that match is. The property that a measure should have: a. It should be invariant under translation, rotation and change of scale. b. It should be reasonably easy to compute. c. It should match our intuitive notions of similarity. As our feature values are representation of the curves, which is insensitive to trans-lation, rotation and change of scale, we only need to concern about its simplicity which should match our intuitive perception. One problem we need to solve is that, when we extract the feature points, the fea-ture points of the curve may change due to rotation of the curve. The length of the feature values (a linked list) will change with it. Because the curves are similar, our metric should report the correct result. 39 C H A P T E R 3: Similarity of Motion 3.3.2 Algorithm Euclidean distance is a good measurement for our purpose. However, as feature points are decided by approximating maximum curvature, two curves with similar shape from our human judgement may have a different number of feature points. Therefore, we use warping [Yi 1997] when we measure the distance of the two linked lists. When we compare two sequences, sometimes points at different places of the sequence may be closer to each other than the points at the same place. The idea of warping is to repeat and shift the elements so that the two sequences can get their closest match. Given a sequence V with N points in feature space, e.g., (v],v2,...,vn), let Head(v) denote vj and rest(v) denote (v2,...,vn). We want to allow the stuttering transformation on it. (stuttering: repeats v(- and shifts the elements to the right.) For any non-null sequence v and w, the warping distance is defined as follows: Dwarp(<>.<>) = 0; Dwarp(K<» = Dwarp(W,<>)= oo Dwarp(v>w) = Dbase(Head(v), Head(w)) + D w a r p ( V > R e S t ( W ) ) min- Dwarp(Rest(v),w) Dwarp(Rest(v),Rest{w)) where <> denotes a null sequence. D b a s e is the Euclidean distance. This definition does not require two sequences having the same length. 40 CHAPTER 3: Similarity of Motion The pseudo code is as follows: /* i n i t i a l i z a t i o n * / 1 m = l e n ( v ) ; n= len(w); M [ 0 , 0 ] = 0; M[l...m,0]= i n f i n i t e ; < M[0,l...n] = i n f i n i t e ; /* compute p a r t i a l r e s u l t s * / for ( i = 1; i<m; i++) for (j = 1; j< n; j++) { M[i,j]=Dbase(vi,wj)+min(M[i-1,j],M[i,j-1],M[i-1,j-1]) ; } return M[m.n] r 3.3.3 Experiment of warping While in our experiment the number of feature points and indexing values are the same after operation of rotation and scaling, similar curves may have different number of feature points. When this happens, it is not very good to use pair to pair comparison. We use warping to solve the problem. Here we test our algorithm with two angle vectors: 41 C H A P T E R 3: Similarity of Motion VI = [-42,-85,105,-47,-54,-61], V2 = [-54,-49,-40,122,-54,-88] Figure 3.15 Experiment of warping After warping, we can see that the points with arrows have been repeated. The two curves now match better, which is closer to human perception. 3.3.4 Experiment of similarity metric Experimentl: Trajectory comparison. Table 3.1 shows the result of comparing a group of curves. Less distance means more similar. We compare all the curves with the curve Cl; the number besides each curve indicates the distance to Cl. 42 CHAPTER 3: Similarity of Motion i C I 7) c \ \ V \ \ 1 I C2: 148 C3: 303 C4: 194 C5: 347 U' ' 1 D > C6: 306 C7: 147 C8: 82 C9: 91 Figure 3.16 Trajectory similarity The above result resembles our perception of the curves quite well. C8 and C9 look like CI most, and our similarity metric ranks them best. 43 C H A P T E R 3: Similarity of Motion Experiment 2: Speed curve C l /W ) C2: 84 C3: 133 C4: 296 C5: 37 C6: 237 C7: 198 C8: 210 C9: 196 Figure 3.17 Similarity of speed curve When we compare the speed curve, only those curves with the same shape without rotation should be considered similar. Among the curves above, C2 and C5 are both similar to Cl, but C5 is closer to Cl than C2, so C5 has less, distance than C2, and this is what we want. 3.4 Conclusion In this chapter we discussed the similarity measurement of motion. We first extract 44 C H A P T E R 3: Similarity of Motion the feature points as those with maximum curvature, then transform them into an affine transformation insensitive linked list. We measure the distance of the linked lists to decide the similarity of the motions. In the procedure of feature extraction, we used the convex hull to get a relative con-sistent area size of a curve so that the algorithm of feature extraction is very stable under rotation. When comparing two curves' indexing values, as similar curves may have different number of feature values, we use point warping to calculate the distance. Our experiments show that the result of the comparison algorithm with warping fits human perception quite well. 45 Chapter 4 Feature Organization and Retrieval In last chapter we described feature extraction. Given a video sequence, we can derive feature values of trajectories and speed curves. They are linked lists of 2D points with varying length. To retrieve videos with similar object motion, we now only need to compare their feature values. In order to accelerate comparison and search, two aspects can be considered. One is to reduce the dimension of feature values, and another is to use an efficient indexing technique. If we consider each point in the list as a dimension, as the length of the feature vec-tors can be very long, the whole feature vectors are high dimensional. The most direct way to accelerate search and comparison is dimensionality reduction. Techniques for dimen-sionality reduction for multidimensional data include singular value decomposition (SVD), discrete cosine transform (DCT) and principal component analysis (PCA). S V D and P C A are practical only when a large sample of the data set is known (for example, human faces). One shortcoming of D C T is that it neglects local information. In our case, we do not make any assumptions about the video that we process, and also do not have restrictions on what kind of motion an object has, so S V D and P C A do not apply here. As details are important to us, D C T does not work either. Therefore, we index our feature vectors directly without 46 C H A P T E R 4: Feature Organization and Retrieval dimensionality reduction. In this chapter, we first review multidimensional indexing structures, then present our implementation of R-trees to index feature values of curves and nearest neighbor search strategy. 4.1 Multidimensional indexing structure When we have a large collection of data, we need management tools and search tools to enable query. One of the typical queries.is the range query: Given an image sequence, retrieve all the image sequences that are similar to it. In order to perform the query efficiently, an indexing method is needed to organize the feature values of image •re-sequences efficiently. As the feature space is multi-dimensional, classical one-dimensional indexing structures such as B-trees do not work. And because we require range searching, structures based on exact matching of values, such as hash tables, are not useful either. Several spatial access methods have been proposed in the database literature. These methods fall into the following broad classes: methods that transform rectangles into points in a higher dimensionality space [Finkel 74]; methods that use linear quadtrees; and meth-ods based on trees (R-tree, k-d-tree, k-d-B-tree). As the feature value of curves has varying length, it is not practical to transform them into points in a higher dimensional space. So we only discuss quadtrees and k-d-trees, 47 CHAPTER 4: Feature Organization and Retrieval as they are possible candidates solution to our problem. 4.1.1 Quadtrees Quadtrees have been used as multi-dimensional indexing structures for large data-bases. The quadtree is a multidimensional generalization of a binary tree, which utilizes the grid file mechanism. The approach of quad-tree is to partition the object space into various quadrants and perform multi-key comparisons at each level of the tree. Each node of a k-dimensional quad-tree partitions the object space into k quadrants. A F CD • C : ' m D GOD 0 • • • - F -. . j . . , . : E d f - c-Figure 4.1 Quad tree illustration (Boxes are null subtrees) In Figure 4.1, point A separates the space into 4 subspaces, corresponding to the 4 branches under A in the tree. As the southwest part contains no data, there is a null point. Point B, D and E fall into same part, so a sublevel of quadrants is partitioned, so on so forth. Quadtrees have number of limitations. First, there are number of null links that use 48 C H A P T E R 4: Feature Organization and Retrieval space resources; second, at each node all the keys have to be tested during data operation; and third, the node sizes tend to be large-for a k dimensional space, each node has 2k chil-dren. 4.1.2 K - d trees The k-d tree is a multi-dimensional binary search tree. In a basic k-d structure, each node of the tree consists of a "record" and two pointers. The pointers are either null or point to another node in the k-d tree. Nodes have levels and each level of the tree discriminates for one attribute. The partitioning of the space with respect to various attributes alternates between the various attributes of the n-dimensional search space. Al l the nodes at the same level have the same discriminator. A discriminator is one of the attributes along which the object space is partitioned. Given a node P of the k-d tree and assuming the discriminator at P's level is attribute A and the A-attribute value of P is KA(P), then all the descendant nodes of P can be partitioned into two subspaces. Al l the records in one subspace will have A-attribute values that are less than KA(P), where as the other subspaces will have records whose A-attribute values will be greater than KA(P) 49 CHAPTER 4: Feature Organization and Retrieval G(10,60) B(10,70) E(40,85) D(25,20) A(50,50) F(70,85) Discriminated 0 0 C(80,85) Figure 4.2 2-d tree illustration The number of levels in a K-d tree can become quite large. With binary structures, the search tree can also become very high and search becomes expensive. Both quadtrees and k-d trees are designed to store points. Geometric shapes must be converted to points using some transformation technique if we want to use quadtree or k-d tree. For example, by using the upper and lower bounds in each dimension, a k-dimen-sional rectangle can be identified by a point in a 2k-dimensional space. ,R-trees store the Minimum Bounding Rectangle (MBR), that is the smallest rectangle (can be multi-dimen-sional) that covers the geometric shapes. We choose to use R-Trees because they provide considerable performance advantages for spatial queries. In the following sections, we give our implementation of R-trees and specialize it to better fit our problem. 50 C H A P T E R 4: Feature Organization and Retrieval 4.2 Review of standard R-Trees R-trees were proposed as a natural extension of B-trees in higher than one dimen-sions. They have the nice features of B-trees and quadtrees. Like B-trees, they remain bal-anced, and like quadtrees, they have the flexibility of dynamically adjusting their grouping to deal with either deadspace or dense areas. The decomposition used in R-trees is dynamic, driven by the spatial data objects. With appropriate split algorithms, if a region of an n-dimensional space includes dead-space, no entry in the R-trees will be introduced. Each node of an R-tree can have entries no greater than some M nor less than some m<=M/2. Leaves contain the objects or pointers to the objects and MBR of the objects. Non-leaf nodes contain the MBR, the Minimum Bounding Rectangle that covers all the MBRs of their children and pointers to their children. R l R2 R4 R6 R3 Figure 4.3 Data (dark circles) organized in an R-tree with fanout=3 In Figure 4.3, the dark circles stand for objects. They are grouped under R5; R4,R5 and R6 are grouped under R3 , and so on and so forth, to get a tree as Figure 4.4 51 CHAPTER 4: Feature Organization and Retrieval I R11 R2 I R~l O IR4 R5I R6I I I I I Figure 4.4 The structure for the R-tree in Figure 4.3 4.2.1 Properties of R-trees Let M be the maximum number of entries that will fit in one node and let m<=M/2 be a parameter specifying the minimum number of entries in a node. An R-tree satisfies the following properties: 1) Every leaf node contains between m and M index records unless it is the root. 2) For each index record (tuple-identifier, I) in a leaf node, I is the smallest rectan-gle that spatially contains the n-dimensional data object represented by the indicated tuple. 3) Every non-leaf node has between m and M children unless it is the root. 4) For each entry (I, child-pointer) in a non-leaf node, I is the smallest rectangle that spatially contains the rectangle that spatially contains the rectangles in the child node. 5) The root node has at least two children unless it is a leaf. 6) . Al l leaves appear on the same level. 52 CHAPTER 4: Feature Organization and Retrieval 4.2.2 Data operation Insert: Inserting index records for new data tuples is similar to insertion in a B-tree. First the appropriate position for the records is located, then new index record is added to the node. If the children number of the node overflows, the node is split and the size of its rect-angle is adjusted. Splitting and adjusting propagate up the tree to the root. Delete: The index records are deleted from the tree, and size of the rectangle of its parent node is adjusted, if the number of a node's children is less than minimum m, the node is removed from the tree and its children are reinserted. The adjustment propagates up the tree. If only one node is left at level n-1, the tree is downgraded. 4.3 Apply R-trees to feature values of curves Our indexing structure organizes a collection of tuples representing video sequences. Each tuple has a unique integer identifier that can be used to retrieve it. Leaf nodes contain index record entries of the form: (tuple-identifier, I, object-pointer), where tuple-identifier refers to a tuple in the database and / is a 2 dimensional rectangle which is the MBR of the feature vector indexed. Suppose feature value of a trajectory is: [(L],A,), (L2,A2), .... (LwAn)] I ~ (Lmin, Lmax, Amm, Amax) Lmin = minimum/LJ, i — 1, 2, n Lmax = maximum/LJ, i — 1, 2, n 53 C H A P T E R 4: Feature Organization and Retrieval Amin = minimumfAi}, i = 1, 2, n Amax — maximum{AJ, i — 1, 2, n where L(- indicates the relative length parameter of feature value and A,- indicates the angle parameter of i feature value, n is the length of the feature value linked list. Non-leaf nodes contain entries of the form: (tuple-identifier, I, child-pointers) where child-pointers are the addresses of lower level nodes in the R-tree and / is the MBR that covers all rectangles in the lower level nodes's entries. We use two R-trees to index the two attributes of motion: trajectory and speed curve separately. A video sequence is processed by motion tracking, extraction of feature points, calculation of feature values, and then produces two linked lists standing for the motion aspect of the video. Each attribute is inserted into one R-tree. In the following part, we discuss how to retrieve them when we need. 4.4 Query based on features We provide user with a QBE (Query By Example) graphics interface. The query result is: R = {r\(re RdD(r,q)<T)} where D(r,q) is the distance between an object r and query example q following the simi-larity measurement metric D. T is the maximum distance from the query example to any object in the query result. There is another parameter, a positive integer k, which specifies 54 CHAPTER 4: Feature Organization and Retrieval that maximum number of nearest neighbors returned in the query result. If the query retrieves more than k objects, the nearest k objects are reported. Query processing includes extracting the feature value of query example q, search-ing the indexing trees and presenting of the results. Feature extraction is the same as we described in Chapter 3, and presentation of the result is designed into user interface. In this part, we present the design of searching strategy. To minimize the searching time, we use hierarchical searching to eliminate the objects in the database first, and only use the expen-sive similarity measurement metric when necessary. 4.4.1 Hierarchical Searching Strategy When we need to decide the similarity of two feature value sets, the similarity mea-surement metric can be used. However, the similarity measurement metric is computation-ally expensive. We propose a hierarchical searching strategy, using the inexpensive method to reduce as much objects as possible, and so using the expensive distance computation only as necessary. First we search through the R-tree. With the R-tree indexing structure, we can com-pare the MBR of the objects. If two objects are similar, their MBRs overlap in most cases. So conversely, if the MBRs of two objects do not overlap, they don't resemble each other. Those objects whose MBRs do not overlap with the query example are eliminated after the first round scan. Candidates are listed for examination at the next level. At the second level, we check the angle attribute of feature values, because the 55 CHAPTER 4: Feature Organization and Retrieval angle attribute is integer, so computing it is simple compared with relative length. We extract the angle dimension of each feature value, both from the query example and the data in the database, and form vectors from them. Then we apply on them the one-dimen-sional distance measurement metric with wrap, which we described in Chapter 3. Only those candidates that have a distance less than a certain threshold are listed for next round scan. After that, by checking the length we can get the third level candidates, which are really similar with the query sample. The method we use is exactly the same as above, only difference is that this time we extract the length parameter instead of angle. Figure 4.5 Hierarchical Searching By hierarchical searching, most objects in the database are quickly eliminated from consideration by first inexpensive computations. Only the remaining objects are filtered progressively by more expensive representations. 4.4.2 Logical A N D of Query The query can be either based on trajectory or on speed or on the combination of 56 CHAPTER 4: Feature Organization and Retrieval them. When both of them are selected, a logical A N D is performed to get the final result. We provide the query choice in the user interface so that the user can decide which criterion is used to perform the search. If the query is based on trajectory, we use the trajec-tory indexing tree to start the first round scan, then get the candidate and perform the sec-ond and third round scan according to the algorithm described above. The same procedure is used when the query is based on speed curve. If the query is based on both speed curve and trajectory, we execute the query pro-cedure separately to get two result sets, then get their intersection as the result of our final query. User Query By Trajectory By Speed R l R2 Intersection Result 4.4.3 Experiments Figure 4.6 Query execution We implemented a video storage and retrieve system using Java. The system inte-grates the curve feature extraction, similarity measurement and R-tree indexing. The 57 CHAPTER 4: Feature Organization and Retrieval graphic user interface is as follows: Video Database File Operation Play Stop i Next Frame search Figure 4.7 Graphics user interface The main part of the interface is 9 tabs. The trajectory and speed tabs are designed for the user to draw trajectory and speed curve freely. The PlayByFrame tab can load video and play it. The user can play the video frame by frame to get the position of an object in each frame so as to create a trajectory. The speed curve can be automatically calculated and 58 CHAPTER 4: Feature Organization and Retrieval drawn in the speed tab. The three video tabs and three curve tabs are used to show the retrieved result. In the trajectory and speed tab, the user can right click and activate a popup menu. The menu enables the user to rotate, scale, smooth the curve, set the threshold and get the feature points. It is mainly for experimental purposes.: Videos are stored as files while their trajectory and speed curve are stored in sepa-rate files. When each video is loaded, its trajectory and speed curve are analyzed and fea-ture information is inserted into the database. When processing a searching request, the system searches the index tree, gets the result and displays the video and trajectory at video tab and curve tab. As this is only a test bed for our algorithms, only 3 results at most will be displayed. Following is an experiment on real video: We digitized a video shot of 100 frames, played it by frame and found the tennis ball's'position in each frame. The following four frames show the trajectory that we get by manually marking, the reduced one by connecting the feature points, the speed curve that we get by calculation, and the reduced speed curve by connecting the feature points of it. For each video that we are going to put into the database, we can repeat the proce-dure, then insert it into database. When the user wants to search the database, he can draw a curve manually and 59 CHAPTER 4: Feature Organization and Retrieval search, or load a piece of video, manually enter the track and use it as a sample to search. : (a) Trajectory of the tennis ball 6 0 CHAPTER 4: Feature Organization and Retrieval Play Stop Next Frame :!•; Trajectory 'y\ Speed search (b) Connected feature points of the trajectory 6 1 CHAPTER 4: Feature Organization and Retrieval P l a y Stop Next F rame k* Trajectory • S p e e d s e a r c h (c) Calculated speed curve of the tennis ball 6 2 CHAPTER 4: Feature Organization and Retrieval Play Stop Next Frame s e a r c h (d) The connected feature points of speed curve Figure 4.8 Test on tennis video(I) 6 3 CHAPTER 4: Feature Organization and Retrieval Same test on another tennis video sequence: ~~~ —n-Uli  in * ^ * . 411 I '< if k* / / / / / / ,/ / / A 4 / (a) Manually pointed trajectory (b) Computed speed curve and feature points Figure 4.9 Test on tennis video(II) 64 CHAPTER 4: Feature Organization and Retrieval We also test our search strategy on a curve set. We manually entered following pairs of curves standing for trajectories and speed curves, as well as their corresponding video sequence. Their feature values are calculated and inserted into the indexing tree with the file information of curves and video sequences. HHHI tal ta2 ta3 tbl tb2 tb3 tcl (a) A small set of trajectory database 65 Chapter 4 Feature Organization and Retrieval tal HHHHHII ta2 • n ta3 / \ HHHBHHBHHBBIIIII • tbl tb2 tb3 tcl tc2 tc3 (b) The corresponding speed curves: (c) A query example Figure 4.10 Experiment on search strategy We set the threshold of similarity to 300, the query based on trajectory retrieves the first row, that is {tal, ta2, ta3}; and the query based on speed curve retrieves the middle col-umn, that is {ta2, tc2, tb2}. The final result reports the intersection of them, which is ta2, the 66 Chapter, 4 Feature Organization and Retrieval corresponding trajectory and video sequence are loaded and shown. 4 . 5 Conclusion In this chapter, we discussed multi-dimensional indexing methods. The abstract fea-ture values of trajectory and speed curves are organized within the R-tree structure. A QBE interface is provided, and hierarchical searching strategy is proposed and implemented. The experiments show that the indexing structure works well with our feature vectors. 67 Chapter 5 Summary and Future Work 5.1 Summary This thesis investigates indexing video by motion information. We identified several problems with previous work: poor features, inadequate indexing, and expensive comparison techniques. We proposed to use trajectory and speed curves to describe motion. Feature extraction on these curves is examined, and an affine transformation insensitive algorithm is proposed. To examine whether the feature values can be indexed efficiently, we implemented an R-tree indexing structure, and specialized it to fit our case, we also proposed hierarchical search strategy to reduce computation of comparison. We implemented a framework as a testbed for the algorithm above. Test results have been discussed. 5.2 Future Work The design and implementation of the framework associated with this work concen-trated more on testing than on the production of a usable system. This section discusses sev-eral modifications that would be required to put the work in use. 5.2.1 Efficiency Improvement R-tree has several improved variants, for example, R*-tree and R+-tree. R*-tree i 68 CHAPTER 5: Conclusion and Future Work [Beckmann 90] enhanced the original R-tree in two major ways. First, it added the perimeter of the bounding regions as an important factor to the heuristics for node splitting. Second, it introduced the notion of forced reinsert to make the shape of the tree less dependent on the order of the insertion. R+-tree [Selllis 87] imposes the constraint that no two bounding regions of a non-leaf node can overlap. Thus, except for the boundary surfaces, there will be only one path to every leaf region, which can reduce search and join costs. 5.2.2 Group of Curves In some video, we only care about the motion of one object. Under some conditions we may care about two objects moving in the same video shot. Our tests were based on the assumption that there is only one object moving in the video sequence. A metric for similarity measurement of a group of curves can be developed from our result. In the case of two objects moving, we may care more about one object than another; depending on the demands of the user, we can adjust the weight of the two curves. Let object A have speed curve sa, trajectory curve ta, and object B have speed curve si,, trajectory curve fy,, and the weights for them are waand wb. So the similarity of the move-ment of A and B s(a,b) is as follows: S(a, b) = wv x d(sa, sb) + wt x d(ta, tb) where 0 < Wj < 1 , 0 < w, < 1 and ws+wt = 1 5.2.3 Integrate with Motion Analysis In our work, trajectory curves are drawn through an interface for the purpose of test-ing. They can be generated by motion analysis. Motion analysis is an active field in computer 69 C H A P T E R 5: Conclusion and Future Work vision and much progress has been achieved. For example, Akutsu [Akutsu 94] proposed a tracking method that is suitable for video analysis and indexing. 5.2.4 Integrate with Video Database This work only investigates how to utilize motion to index video. In fact, video is a multi-channel multimedia sequences containing sound, images and captions. Both spatial and temporal information is important parts. So the analysis of video should embrace all aspects of video in order to be complete. There are attempts to process as much information as possible. Our work can be a part of systems such as Virage video engine [Hamrapur 97] and make it more versatile. 70 Bibliography [Akutsu 92] A.Akutsu, Y.Tonomura, H.Hashimoto and Y.Ohba, "Video indexing using motion vectors", Proc. SPIE: Visual Communication and Image Process-ing'92 1818, ppl522-1530, 1992 [Akutsu 94] A.Akutsu and Y.Tonomura, "Video tomography: An efficient method for camera work extraction and motion analysis", ACM Multimedia 94, pp349-356 [Ansari 91] Nirwan Ansari and Edward J. Delp, "On Detecting Dominant Points", Pat-tern Recognition, Vol.24, No.5, pp441-451, 1991 [Barron 92] BarronJ.L., Fleet, D.J., Beauchemin, S.S. and Buikitt, T.A., "Performance of Optical Flow Techniques", Proceeding of Computer Vision and Pattern Recognition. pp236-242, 1992 [Bentley 75] Jon Louis Bentley, "Multidimensional Binary Search Trees Used for Asso-ciative Searching", Communications of the ACM, Vol.18, No.9, Sep. 1975, pp509-517 [Beckmann 90] Norbert Beckmann et al., "The R*-tree: An efficient and robust access method for points and rectangles", Proceedings of the 1990 ACM SIGMOD International Conference on Management of Data, pp322-331 [Chang 1991] C.C.Chang, S.M.Hwang and D.J.Buehrer, "A Shape Recognition Scheme Based on Relative Distances of Feature Points from the Centroid", Pattern Recognition, Vol.24, No . l l , ppl053-1063, 1991 [Chang 95] S.F.Chang and J.R.Smith, "Extracting multi-dimensional signal features 71 for content-based visual query", SPIE Symposium on Visual Communica-tions and Signal Processing, 1995 [Chang 97] Shih-Fu Chang, William Chen, Horace Meng, Hari Sundaram, and Di Zhong, "An Automated Content-Based Video Search System Using Visual Cues", Columbia University/CTR Technical Report, CTR-TR #478-97-12. [Chen 86] H.H.Chen and J.S.Su, "A Syntactic Approach to Shape Recognition", Pro-ceeding of International Computer Symponium, Taiwan, ppl03-122, 1986 [Cohen 95] F.S.Cohen, Z.Huang and Z.Yang, "Invariant matching and identification of curves using B-splines curve representation", IEEE Trans. Image Process-ing, Vol.4, ppl-10, 1995 [Dimitrova 94] N.Dimitrova and F.Golshani, "Rx for semantic video database retrieval", ACM Multimedia 94, pp 219-226 [Finkel 74] R. A. Finkel and J. L. Bentley, "Quad Trees: A Data Structure for Retrieval on Composite Keys", Acta Informatica, Vol.4, ppl-9, 1974 [Flickner 95] M.Flickner, H.Sawhney, W.Niblack, J.Ashley, Q.Huang, B.Dom, M.Gor-kani, J.Hafner, D.Lee, D.Petkovic, D.Steele, PYanker, "Query by Image and Video Content: The QBIC System", IEEE Computer Magazine, Vol.28, No.9, pp23-32, Sep. 1995 [Freeman 78] Herbert Freeman, "Shape Description via the Use of Critical Points", Pat-tern Recognition, Vol.10, ppl59-166, 1978 [Graham 72] R.L. Graham, "An efficient algorithm for determining the conves hull of a finite palar set", Information Processing Letters 1, ppl32-133, 1972 [Guttman 84] Antonin Guttman, R-trees: "A dynamic index structure for spatial search-72 ing", P r o d e e d i n g s of t h e 1 9 8 4 A C M S I G M O D I n t e r n a t i o n a l C o n f e r e n c e o n M a n a g e m e n t of D a t a , pp47-57, 1984 [Hamrapur 97] A. Hamprapur, A.Gupta, B.Horowitz, C.F.Shu, C.Fuller, J.Bach, M.Gor-kani, R.Jain et al, "Virage Video Engine" SPIE P r o c e e d i n g s o n S T o r a g e a n d R e t r i e v a l f o r I m a g e a n d V i d e o D a t a b a s e s V, pp 188-197, San Jose, Feb. 97 [Hampapur 98] Arun Hampapur and Ramesh Jain, "Video Data Management Systems: Metadata and Architecture", M u l t i m e d i a D a t a M a n a g e m e n t , McGraw-Hill Company, NY, pp245-286, 1998 [Idris 97] F. Idris and S. Panchanathan, "Review of Image and Video Indexing Tech-niques", J o u r n a l of Visual C o m m u n i c a t i o n a n d I m a g e R e p r e s e n t a t i o n , Vol.8, No.2, June, ppl46-166, 1997 [Ioka 92] ' M.Ioka and M.Kurokawa "A method for retrieving sequences of image son the basis of motion analysis", SPIE I m a g e S t o r a g e a n d R e t r i e v a l S y s t e m s , vol.1662, pp35-46, 1992 [Kreyb 1997] J. Kreyb, M.Roper, .P.Alshuth, Th. Hermes and O.Herzog, "Video Retrieval by Still Image Analysis with ImageMiner", S t o r a g e a n d R e t r i e v a l f o r I m a g e a n d V i d e o D a t a b a s e s , Vol.3022, pp36-44, 1997 [Lee 93] S.YLee and H. M. Kao "Video Indexing-An Approach Based on Moving Object and Track", SPIE S t o r a g e a n d R e t r i e v a l f o r I m a g e a n d V i d e o D a t a -b a s e s Vol.1908, pp25-36, 1993 [Lienhart 98] Rainer Lienhart, Wolfgang Effelsberg and Ramesh Jain, "VisualGREP: a systematic method to compare and retrieve video sequences", S t o r a g e a n d R e t r i e v a l f o r I m a g e a n d Video D a t a b a s e VI, ,pp271-282, 1998 73 [Marr 1978] D.Marr and H.Nishihara, "Representation and recognition of the spatial organization of three-dimensional shapes", Proc. Roy. Soc. London, B200, pp269-294,1978 [Mokhtarian 86] Farzin Mokhtarian and Alan Mackworth, "Scale-Based Description and Recognition of Planar Curves and Two-Dimensional Shapes", IEEE Trart. on Pattern Analysis and Machine Intelligence,Vo\. PAMI8, No. l , January 1986 [Pavlidis 77] T.Pavlidis, "Polygonal approximation by Newton's method", IEEE Trans. Computer, Vol.26, pp800-807, 1977 *[Pentland 96] A.Pentland, R.W.Picard, and S.Sclaroff, "Photobook: Content-Based Manipulation of Image Databases", International Journal of Computer Vision, Vol.18, No.3, pp233-254, 1996 John R. Smith, "Integrated Spatial and Freature Image Systems: Retrieval, Analysis and Compression", Ph.D. thesis, Columbia University, 1997 Emile Sahouria, "Motion Indexed Video", Master Thesis, Department of EECS, University of California, Berkeley, 1997 http://www-video.eecs.berkeley.edu/~emile/publications/ms/ms.html Timos Sellis, Nick Roussopoulos, and Christos Faloutsos, "The R-r-tree: A dynamic index for multidimensional objects", Proceedings of the 13th International Conference on Very Large Databases, pp507-518, 1987 Y. Shi, "Design and Implementation of a Content-based Video Retrieval System", Master Thesis, Department of Computer Science, University of British Columbia, 1997 J.R.Smith, S.F.Chang, "VisualSEEk: A Fully Automated Content-Based [Smith 97] [Sahouria 97] [Sellis 87] [Shi 97] [Smith 96] 74 Image Query System", ACM Multimedia Conference, Boston, pp87-98, Nov. 1996 http://www.ctr.columbia.edu/VisualSEEk [Shah 92] Mubarak Shah, Krishnan Rangarajan and Ping-Sing Tsai, "Motion Trajec-tories" [Toklu 99] Candemir Toklu and Shih-ping Liou, "Semi-Automatic Dynamic Video Object Marker Creation", Proc. SPIE Storage and Retrieval for image and Video Databases Vol 3656, ppl78-185,1999 [Verestoy 98] D.Chetverikov and J.Verestoy, "Tracking feature points: A new alg-orighm", Proc. International Conference on Pattern Recognition, ppl436-1438,1998 [Yi 1998] Byoung-Kee Yi , H.V.Jagadish, Christos Faloutsos, "Efficient Retrieval of Similar Time Sequences Under Time Warping", Proceedings of the Four-teenth International Conference on Data Engineering, pp20T-208, 1998 [Zhang 99] Tong Zhang and C C Jay Kuo, "Audio Guided Audiovisual Data Segmen-tation, Indexing and Retrieval", Proc. SPIE Storage and Retrieval for image and Video Databases Vol 3656, pp316-327, 1999 75 


Citation Scheme:


Citations by CSL (citeproc-js)

Usage Statistics



Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            async >
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:


Related Items