Template-based Sketch Recognition using Hidden Markov Models

by

Roey Flor

Hon. B.Sc., The University of Toronto, 2003

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in The Faculty of Graduate Studies (Computer Science)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

November 2010

© Roey Flor 2010

Abstract

Sketch recognition is the process by which the objects in a hand-drawn diagram can be recognized and identified. We provide a method to recognize objects in sketches by casting the problem in terms of searching for known 2D template shapes in the sketch. The template is defined as an ordered polyline and the recognition requires searching for a similarly-shaped sequential path through the line segments that comprise the sketch. The search for the best-matching path can be modeled using a Hidden Markov Model (HMM). We use an efficient dynamic programming method to evaluate the HMM, with further optimizations based on the use of hand-drawn sketches. The technique we developed can cope with several issues that are common to sketches, such as small gaps and branching. We allow for objects with either open or closed boundaries by allowing backtracking over the templates. We demonstrate the algorithm for a variety of templates and scanned drawings. We show that a likelihood score produced by the results can provide a meaningful measure of similarity to a template. An example-based method is presented for setting a meaningful recognition threshold, which can allow further refinement of results when that template is used again. Limitations of the algorithm and directions for future work are discussed.

Table of Contents

Abstract
Table of Contents
List of Figures
Acknowledgements
1 Introduction
2 Related Work
3 Preprocessing Steps
  3.1 Stroke Generation
  3.2 Polyline Approximation of Strokes
  3.3 Connectivity Graph
4 Matching
  4.1 Representing the Problem as an HMM
  4.2 Constraints
    4.2.1 Scale and Angle Constraints
  4.3 Correspondence Map
5 Results
  5.1 Sketches
    5.1.1 Multiple Objects in Complex Scenes
    5.1.2 Match Ranking Within a Class
6 Conclusions and Future Work
  6.1 Conclusions
  6.2 Limitations and Future Work
Bibliography

List of Figures

1.1 Examples at each step in the process. Clockwise starting from the top-left: the input sketch, the processed straight-line segments of the input sketch, a mug template where the segments are ordered and labeled and a blow-up of the match found in the scene labeled with correspondences to the template.
1.2 System pipeline overview. The input can be in the form of a sketch or directly input using a mouse or tablet.
2.1 Example of the input to the method presented in [21]. The sketch is labeled with the order in which a set of strokes were drawn. Reprinted with permission from Elsevier.
2.2 Example of the recognition results produced by [27]. (a) and (b) are an example of the input to the method, (c) is the template used and (d) shows the computed correspondences. Reprinted with permission from author.
3.1 Examples of input images. On the left is a scene with several recognizable objects. On the right is a series of random lines which we can use to test how likely a template can be matched.
3.2 An example of a possible segmentation of a stroke into a set of straight-line segments.
3.3 Segments are connected in the graph based on a distance measure from a segment's endpoint. In this case, segment A will be connected to segments B and C as they are within the outer radius. The edge AB will have cost 1 whereas the edge AC will have a cost c ∈ [0, 1] based on the actual distance.
4.1 The HMM trellis. A matching can be represented as the optimal path through the trellis consisting of hidden states (the image segments) and observations (the template).
5.1 The three closed templates. The plane template has 38 segments labeled 0-37, the car template has 46 segments labeled 0-45 and the mug has 24 segments labeled 0-23. All three templates also have a closed-loop constraint.
5.2 The three open templates. The letter A template has 6 segments labeled 0-5, the car bottom template has 25 segments labeled 0-24 and the stick figure has 12 segments labeled 0-11. Notice that the stick figure draws over itself and thus allows us to find more varied objects by actually making them closed boundary templates.
5.3 A scene with multiple objects.
5.4 Labeled output for matching of the car template.
5.5 Labeled output for matching of the plane template.
5.6 Labeled output for matching of the mug template.
5.7 Labeled output for matching of the letter A template.
5.8 Labeled output for matching of the car bottom half template.
5.9 Labeled output for matching of the stick figure template.
5.10 The segments for a sketch consisting of a randomly drawn set of lines.
5.11 Labeled output for matching of the car template to the random scene. The second image shows the outline of the output to make it clearer.
5.12 Labeled output for matching of the plane template to the random scene. The second image shows the outline of the output to make it clearer.
5.13 A regular grid.
5.14 Labeled output for matching of the car template to the grid. The second image shows the outline of the output to make it clearer.
5.15 Labeled output for matching of the plane template to the grid. The second image shows the outline of the output to make it clearer.
5.16 The 8 cars in descending order of the probability of matching to the car template. This provides a ranking for future car sketch objects. The values in the table are normalized so that the sketch that returned the highest probability has a value of 1 and the worst match gets 0.
5.17 A scene containing 7 potential car objects. When using a threshold of 0.4, as per the scale computed earlier, only 4 objects are considered cars.

Acknowledgements

I owe my deepest gratitude to my supervisor, Michiel van de Panne, who gave me the guidance and support to finish my thesis. I would like to thank David Lowe for being my second reader and for useful comments and suggestions. I am grateful to all of those who supported me during the completion of the thesis. Many thanks go to my parents and my siblings Lior and Shirey for their continual support and trust.

Chapter 1

Introduction

The ability of computers to recognize objects in an image is a challenging goal of computer vision. We want to be able to interpret an image by finding and identifying the objects in it. Similarly, we want to be able to recognize objects in a drawing, whether it be a sketch or a simple diagram. Drawings and sketches tend to be much sparser in the information they provide than full images: they consist purely of lines and have little or no texture and shading. Structured diagrams often have a standard language of symbols, limiting what can be drawn.

There are many possible applications for sketch recognition. Sketching can provide improved interfaces for applications by adding features such as the ability to draw a small diagram of a process, to draw a quick picture or scene to express an idea, to use small symbols as shortcuts to run applications, or to perform specific modeling tasks such as drawing a rough sketch of a road or terrain and having it interpreted to create a full 3D model.

Identifying an object in a sketch or image is difficult because we do not know what the object may look like in the image. In order to identify a certain class of objects, we must find some characteristic of that type of object which we can use for comparison. In this work, we focus on using the shape of an object as the main feature by which we classify it. Specifically, we use the object's boundaries, whether they be the external silhouette separating the object from its surroundings or the boundaries around details of the object itself. Many classes of objects have a unique shape and can therefore be recognized by shape alone. It is not always necessary to know the context of the scene or the colour and texture of a car seen from the side in order to know that it is a car. This is especially true of hand-drawn sketches, as they often lack this type of additional information. We therefore model object recognition in a scene as finding a shape in the scene that conforms to a known template shape we have for a given type of object.
Our template-based approach consists of several steps: (a) the creation of template shapes; (b) developing a shape-based distance measure given a set of correspondences between a template and a candidate shape; and (c) exploring the space of correspondences to find the best shape. Both the template and the drawn input are modeled as polylines, formed from either the points in a scanned image or from direct input. The sequence of segments in the template model is then used to perform a search through all the possible shapes found in the input. The method we use consists of finding a correspondence between the boundary segments of the template and the segments of the input sketch by matching the connectivity and orientation of the segments. We implement the method using Hidden Markov Models, which allows us to use an efficient dynamic programming approach to perform the search. Therefore, given a template of an object, we can find an un-occluded instance of that object at any scale in the input sketch, with computation times of less than 1 second on average.

Figure 1.1 shows examples of this process at different steps. The input given may be a scene containing multiple objects. For any object that the user wishes to recognize in the input, a template must be given, either created by the user or chosen from a set of pre-existing templates. In Figure 1.1 we show an example of a mug template which was then used to recognize the mug that was present in the input scene. The labeled result of the recognition is found in the bottom-left corner of the figure. In order to recognize multiple objects, individual templates for each object must be provided and the process may be applied to each template. Figure 1.2 gives an overview of the entire process.

The remainder of this thesis is structured as follows. Chapter 2 discusses relevant related work on object recognition and sketch interpretation. Chapter 3 details the method we use to process the template and user input to convert it into the final form we will use for the matching. Chapter 4 details the HMM model we use in order to perform the matching itself, as well as modifications we have made to improve the efficiency and accuracy of the results. Chapter 5 describes experiments we conducted on different forms of input and their results. Chapter 6 provides conclusions and a discussion of future possibilities for our work.

Figure 1.1: Examples at each step in the process. Clockwise starting from the top-left: the input sketch, the processed straight-line segments of the input sketch, a mug template where the segments are ordered and labeled and a blow-up of the match found in the scene labeled with correspondences to the template.

Figure 1.2: System pipeline overview. The input can be in the form of a sketch or directly input using a mouse or tablet.

Chapter 2

Related Work

Sketch recognition is the process by which a series of strokes given as input by a user are labeled with what they are meant to represent, as defined by a model or template. Sketch recognition is used in various applications such as modeling diagrams [1, 11, 15] and handwriting recognition [18, 25]. The methods used are similar to those used in image-based approaches to object recognition, but sketches provide some advantages and disadvantages over standard images.

Being made up of drawn lines, sketches do not have any colour or texture information which can be used to aid in matching or labeling. A hand-drawn sketch can also be "messy" in that the user's strokes may not be continuous. For example, the user may have lifted the pen while drawing, or the user may have failed to line up parts of an object that should be solid, such as not closing circles or mis-aligning joints or intersections of multiple strokes. Users may also add unnecessary strokes or include extraneous detail. These issues must be addressed in order to correctly recognize a sketch [17].

If a sketch is input using a tablet-based device, it is possible to obtain temporal information about the sketch. If the time at which each stroke in a sketch was drawn is known, this can be helpful in recognizing objects as well as in learning how a particular object is likely to be drawn. The way people draw a sequence of strokes for many objects is often consistent, thus allowing recognition to be done based partly upon the order and timing of the strokes [20, 26, 21].

The domain in which the objects we are attempting to recognize appear can be used as well. The objects that are drawn in a UML diagram or an engineering sketch tend to have a specific meaning in that context. Alvarado et al. [2] used this context information combined with a bottom-up, feature-based language for describing objects to recognize sketches. These methods can produce accurate results in domains that have a specific structure and grammar that they must follow (like diagrams or charts), but they require the creation of a language for describing the different objects they can recognize. An example of this type of sketch is shown in Figure 2.1. Alvarado et al. [3] and Yu et al. [28] attempt to recognize sketches by taking each individual stroke given by the user and determining the most likely object it belongs to, incrementally updating the interpretation. These methods still require the recognition to be done as the user is providing input, as opposed to being given a completed sketch as input.

Given a hand-drawn sketch, we have no information on the ordering of the strokes, their direction, nor the time it took to draw them. In these cases, the objects in the sketch can be treated as shapes which must be matched to a known shape (template). In general, the goal of shape matching is to take a shape and, given some measure of similarity, compare it to another shape and determine how closely related they are. In order to determine how well two shapes match, it is necessary to specify what properties of the shapes we will measure and exactly how they will be measured. These issues have been approached in a number of different ways.

When defining the shape of an object we may use any number of properties it has, including its points, segments, curvature, orientation and size. Better results can be achieved by enhancing some of these properties. Belongie et al. [5] introduced Shape Contexts, a point-based property which encodes information about the position and orientation of the points in a shape relative to each other in a histogram. Given these properties, a similarity measure can be computed to match two shapes. One such distance measure computes the minimum sum of distances over all the possible one-to-one correspondences between points on the two shapes.

In the case where the shapes have differing numbers of points, due to the way they were defined or some other form of noise, a more involved distance measure can be used, such as the Hausdorff or Fréchet distance. Different measures can be computed by adding more properties of the shapes. Including the orientation of points or segments allows for the measurement of the relative angles and curvature between the shapes. Certain features of objects can be more recognizable to humans, for example parts that do not change between different views of an object. These can be used to determine better features with which to perform the matching [12].

The contour of a shape can be used to define a model of an object. A detailed descriptor of a shape can be made by distinguishing features based on curvature and relative lengths of segments. A search can then be performed through the shapes in an input to match against the object we want to recognize [6]. Yang et al. [27] search through sets of point correspondences based on a curve-matching metric. An example of the results can be seen in Figure 2.2. These search-based methods work on a wide variety of objects, but the cost of the search tends to be fairly high for more complex templates and sketches. In our work, we implement an efficient search method, including optimizations based on the use of sketches.

Figure 2.1: Example of the input to the method presented in [21]. The sketch is labeled with the order in which a set of strokes were drawn. Reprinted with permission from Elsevier.

It is also possible to build feature graphs which can be matched against the corresponding graphs of known objects [23]. Ferrari et al. [9] construct a graph based on contour segments, which they use to perform a greedy search to determine a matching between paths through the graphs of an image and an object template. Our work takes a similar approach by matching line segments along the contour of an object to calculate the cost of a match, but we develop a method using HMMs rather than performing a graph search. The greedy approach is not guaranteed to find the globally optimal solution, while our method uses a dynamic programming algorithm whose solution is optimal.

Figure 2.2: Example of the recognition results produced by [27]. (a) and (b) are an example of the input to the method, (c) is the template used and (d) shows the computed correspondences. Reprinted with permission from author.

Contours can also be directly matched to each other. By applying transformations to the template of a known object, it is possible to deform the template to more closely match the input [13, 16, 4]. Our method also makes use of the entire contour of the object, but work such as [10, 14, 19] matches objects using local features extracted from the contour and then performs a search for the minimum cost between the objects. Ferrari et al. [8, 7] find points along the contour of a shape and combine adjacent points into series of line segments, which can then be used as local features to define a model for object recognition.

Chapter 3

Preprocessing Steps

In this chapter, we present the preprocessing steps required for the input sketch and template in order to be able to perform matching. The final output is a set of straight-line segments and their connectivity. If the input is an image, we extract a set of polyline strokes.
These polyline strokes can then be further reduced to the set of straight-line segments we need for the final matching process. If the sketch was input using a tablet-type device, we can directly begin the preprocessing using the strokes it provides. This chapter is structured as follows: each section deals with one step of the segmentation process, beginning with generating a set of strokes from the input, then segmenting the strokes into polylines, and finally building a connectivity graph over those polyline segments.

3.1 Stroke Generation

Given the large number of points in a sketch, the next step is to generate lines or curves which fit these points and produce a simplified representation. By doing this, we can assign a common direction to groups of consecutive points, as well as resolve ambiguities at intersections of curves to encourage continuity. This is accomplished through the following sequence of steps.

Figure 3.1: Examples of input images. On the left is a scene with several recognizable objects. On the right is a series of random lines which we can use to test how likely a template can be matched.

Once we have a set of points, we want a reasonable sampling so that we can assign each point to a stroke. We reduce the initial set of points through a process of morphological erosion and pruning [24]. Each of the remaining points is then assigned an orientation, which is used to maintain continuity along the strokes. The orientation is computed by modeling a spatial Gaussian distribution over the neighboring points. Principal component analysis (PCA) is then applied to the covariance matrix of the distribution, and the direction of the principal eigenvector is used as the orientation.

Given this set of points, we begin to build strokes. For every point that does not already belong to a stroke, we grow a stroke in both directions by finding and adding points that meet three conditions: (1) the point is less than d pixels away; (2) it is located in the right direction (within α degrees of the expected direction for finding further points, as based upon the orientation of the last point added to the stroke); and (3) it has an associated orientation that is compatible with that of the last point, as measured by the difference in their orientations. The end result is that we are left with a set of strokes S, where each stroke S_i = P_1 ... P_n is defined as an ordered set of 2D points.

Figure 3.2: An example of a possible segmentation of a stroke into a set of straight-line segments.

3.2 Polyline Approximation of Strokes

The next preprocessing step takes possibly long strokes and approximates them with a sequence of straight-line segments. Each straight segment will correspond to a part of a stroke where the orientation is roughly constant. This is done by starting at the first point on a stroke and calculating the orientation of each pair of consecutive points. Once the orientation of a pair of points deviates by more than β degrees from the orientations of the previous points, a new segment is created from the starting point and the current point. We use β = 15° in our implementation. This is continued over the entire length of the stroke, for all strokes. The entire image is thus converted into a series of straight-line segments I_i = (P_1, P_2) given by their two endpoints.
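To make the splitting rule above concrete, the following is a minimal C++ sketch of one way the orientation-based segmentation could be implemented. The Point and Segment types, the helper names, and the use of the first point pair of the current segment as the reference orientation are illustrative assumptions on our part; the thesis compares against the orientations of the previous points, which could equally be a running average. This is a sketch of the idea, not the thesis implementation.

#include <cmath>
#include <cstddef>
#include <vector>

struct Point   { double x, y; };
struct Segment { Point p1, p2; };   // a straight-line segment I_i = (P_1, P_2)

// Orientation, in degrees, of the line from a to b relative to the x-axis.
double orientationDeg(const Point& a, const Point& b) {
    const double kPi = 3.14159265358979323846;
    return std::atan2(b.y - a.y, b.x - a.x) * 180.0 / kPi;
}

// Smallest absolute difference between two orientations, in degrees.
double angleDiff(double a, double b) {
    double d = std::fmod(std::fabs(a - b), 360.0);
    return d > 180.0 ? 360.0 - d : d;
}

// Split one stroke (an ordered list of 2D points) into straight-line segments.
// A new segment is started whenever the orientation of the current point pair
// deviates by more than betaDeg from the reference orientation of the segment.
std::vector<Segment> approximateStroke(const std::vector<Point>& stroke,
                                       double betaDeg = 15.0) {
    std::vector<Segment> segments;
    if (stroke.size() < 2) return segments;

    std::size_t start = 0;                                    // start point of current segment
    double ref = orientationDeg(stroke[0], stroke[1]);        // reference orientation

    for (std::size_t i = 1; i + 1 < stroke.size(); ++i) {
        double pairOrient = orientationDeg(stroke[i], stroke[i + 1]);
        if (angleDiff(pairOrient, ref) > betaDeg) {
            segments.push_back({stroke[start], stroke[i]});   // close segment at point i
            start = i;                                        // and begin a new one there
            ref = pairOrient;
        }
    }
    segments.push_back({stroke[start], stroke.back()});       // final segment of the stroke
    return segments;
}

The duplicate, reversed copy of each segment described next is omitted from this sketch for brevity.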
The orientation of a segment is computed by finding the angle between the line made by the two endpoints and the x-axis. The orientation will be different depending on which direction the segment is drawn (whether from P_1 to P_2 or vice versa). This may cause problems, as the orientations of adjacent segments may not align, since the user is free to draw strokes in any direction they choose. In order to fix this issue, for every segment we find, we create a duplicate segment which has the same endpoints but the opposite orientation. One consequence of doing this is that it becomes possible to backtrack over the same segment, since the same segment in an image has two separate representations, one for each direction.

3.3 Connectivity Graph

Small breaks may have occurred in what should be a fairly continuous and smooth outline of an object in the image. In order to recover any lost connectivity and to preserve stroke continuity, we build a connectivity graph over the segments of the image. Doing this also provides a simple way to define contours in an image as paths through the graph with associated costs.

The graph G = (V, E) is constructed so that the nodes V correspond to the segments I_i of the image. An edge exists between two segments if one of their endpoints is within some radius R_outer of an endpoint of the other. Each edge has an associated cost in [0, 1]: the cost is 1 if the endpoints are within R_inner (where R_inner < R_outer), and otherwise the cost is given by (R_outer − d) / (R_outer − R_inner), where d is the distance between the endpoints (see Figure 3.3). This cost is used in the matching process, described in more detail in the next chapter.

Figure 3.3: Segments are connected in the graph based on a distance measure from a segment's endpoint. In this case, segment A will be connected to segments B and C as they are within the outer radius. The edge AB will have cost 1 whereas the edge AC will have a cost c ∈ [0, 1] based on the actual distance.

The graph can then be used as a representation for the input image or sketch, or, equivalently, the template. Any path through the graph corresponds to following a curve (possibly jumping over small gaps) through the lines of an image. Thus, comparing the costs of two paths from the graphs constructed from separate images corresponds to a form of contour matching.

Chapter 4

Matching

In this chapter, we detail our representation of the template matching problem. We begin with a description of the core method developed to perform the matching. We then describe several improvements and modifications to the basic HMM model that we use to obtain better results.

Given two graphs, one for a known template and one constructed from a user sketch, the goal is to find the path through the image graph that best matches the template graph. We cast the problem in terms of an HMM, wherein the image graph nodes are hidden states and the template nodes are observations. The problem is then converted into that of finding the most likely sequence of hidden states that could produce the given observations (see Figure 4.1). This can be solved using the Viterbi algorithm, a form of dynamic programming that finds exactly this sequence in an HMM.
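Before turning to the HMM details, it may help to see the edge-cost rule of Section 3.3 written out, since the transition probabilities defined in the next section are taken directly from these costs. The following C++ sketch uses assumed names (Point, Segment, buildConnectivityGraph), and the choice to take the minimum distance over all endpoint pairs is our reading of the description rather than a detail stated in the thesis.

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct Point   { double x, y; };
struct Segment { Point p1, p2; };

double dist(const Point& a, const Point& b) {
    return std::hypot(a.x - b.x, a.y - b.y);
}

// Edge cost between two segments, following Section 3.3: cost 1 when the
// closest endpoints are within rInner, no edge beyond rOuter, and a linear
// ramp (rOuter - d) / (rOuter - rInner) in between.
double edgeCost(const Segment& a, const Segment& b, double rInner, double rOuter) {
    double d = std::min({dist(a.p1, b.p1), dist(a.p1, b.p2),
                         dist(a.p2, b.p1), dist(a.p2, b.p2)});
    if (d <= rInner) return 1.0;
    if (d >= rOuter) return 0.0;                 // no edge in the graph
    return (rOuter - d) / (rOuter - rInner);
}

// Dense matrix of edge costs over all segments; an entry of 0 means "no edge".
// Given the low connectivity discussed in Section 4.1, a sparse adjacency
// list would be the natural structure in a real implementation.
std::vector<std::vector<double>> buildConnectivityGraph(
        const std::vector<Segment>& segs, double rInner, double rOuter) {
    std::size_t n = segs.size();
    std::vector<std::vector<double>> cost(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            if (i != j) cost[i][j] = edgeCost(segs[i], segs[j], rInner, rOuter);
    return cost;
}

These costs become the transition probabilities of the HMM described next, with a zero entry marking an impossible transition.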
4.1 Representing the Problem as an HMM

An HMM consists of emission probabilities, which give the likelihood that a hidden state will produce a given output, and transition probabilities, which give the probability of going from one state to another.

Figure 4.1: The HMM trellis. A matching can be represented as the optimal path through the trellis consisting of hidden states (the image segments) and observations (the template).

The transition probabilities in our model are directly proportional to the edge costs in the sketch graph, i.e., the probability of a transition from one state to another is given by the cost of the edge between those two states in the graph, and is 0 if the edge does not exist. In our case, a transition from a state to its direct neighbors has the same probability as a transition from that state to itself. This is done because the density of segments in the template can be higher than that in the sketch, and it may be necessary to match multiple template segments to a single sketch segment. By allowing transitions from a state to itself we may continue along the template while remaining in the same state on the sketch.

We expect the matrix of transition probabilities to be fairly sparse, since the only transitions we allow are those between neighboring nodes in the graph. This sparsity allows for a significant increase in the efficiency of the matching, as we can safely ignore segments that are not connected as being impossible transitions.

The emission probabilities are computed as a Gaussian distribution based on the orientation of the segments:

    P(T_j | I_i) ∝ exp(−((θ_j − θ_i) / σ)²)        (4.1)

where T_j is template segment j, I_i is sketch segment i, θ_i and θ_j are the segment orientations, and σ² is the variance.

The cost of a transition between two segments is then the product of the transition and emission probabilities, as shown in the algorithm below. The final cost of a matching between the paths in the sketch and the template is thus:

    C(P, T) = ∏_{i,j} trans(P_i, P_{i+1}) · emis(P_{i+1}, T_j)        (4.2)

where the P_i are the sketch segments in the final path and the T_j are the template segments. We make the assumption that the probabilities are all independent.

Once we have constructed the trellis, it is possible to use the Viterbi algorithm to perform a basic matching. The complexity of the general Viterbi algorithm is O(MN²), where M is the number of observations (template segments) and N is the number of hidden states (sketch segments). The number of segments in the template is usually much lower than the number of segments in the sketch; our biggest template has fewer than 80 segments. Given the limited connectivity of our trellis, our method has a complexity of O(MkN), where k is the average connectivity of the graph G. The connectivity k can be bounded, and so we can achieve a complexity of O(MN).

In some cases, it may be necessary to include more information about the relationship between segments beyond just their connectivity. The next several sections go into more detail about the modifications we introduce in order to improve on the model presented thus far.
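As a concrete illustration of Equations 4.1 and 4.2, the following C++ sketch computes an (unnormalized) emission probability and accumulates the product cost along a candidate path. The wrapping of the angle difference and the separate trans/emis vectors are illustrative assumptions; in the thesis's formulation the same product is accumulated inside the Viterbi recursion of Algorithm 1 rather than after the fact.

#include <cmath>
#include <cstddef>
#include <vector>

// Equation 4.1: an unnormalized Gaussian over the difference between a
// template segment's orientation and a sketch segment's orientation
// (both in degrees); sigma is assumed to be given in degrees as well.
double emissionProb(double thetaTemplate, double thetaSketch, double sigma) {
    double d = thetaTemplate - thetaSketch;
    while (d >= 180.0) d -= 360.0;   // wrap into [-180, 180) so that, e.g.,
    while (d < -180.0) d += 360.0;   // 350 degrees vs 10 degrees counts as 20
    double z = d / sigma;
    return std::exp(-z * z);
}

// Equation 4.2: the cost of a matching is the product of the transition and
// emission probabilities along the path. trans[i] is the transition
// probability from path step i to step i+1, and emis[i] is the emission
// probability of the template segment matched at step i+1.
double pathCost(const std::vector<double>& trans, const std::vector<double>& emis) {
    double cost = 1.0;
    for (std::size_t i = 0; i < trans.size() && i < emis.size(); ++i)
        cost *= trans[i] * emis[i];
    return cost;
}

Because a product of many probabilities underflows quickly for long templates, a practical implementation might sum log-probabilities instead; the thesis does not discuss this, so the sketch above stays literal to Equation 4.2.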
Algorithm 1: The Viterbi algorithm

  Given a template T, an image I, a transition matrix a[i, j] and an emission matrix b[i, t], build a path-probability matrix v[i, t].
  Initialize the probabilities
  for t = 0 to size(T) do
    for s = 0 to size(I) do
      for each transition i from s do
        cost = v[s, t] * a[s, i] * b[i, t]
        if cost > v[i, t + 1] then
          v[i, t + 1] = cost
          backtrack[i, t + 1] = s
        end if
      end for
    end for
  end for
  To find the best path, take the highest-probability segment in the last column of v[] and backtrack using backtrack[]. The cost of the path is the product of the costs along the path.
  If we are using a closed-path template, we can add the constraint that the final segment must equal the first segment of the optimal path.

4.2 Constraints

Several simple constraints may be added to the definition of the template for an object. If the template corresponds to a closed-loop outline of an object, we can add the constraint that the optimal path should be closed as well. This is done by making the transition from the final matched segment to the first segment in the path the only possible one (by setting all other probabilities to 0). We can also ensure that specific template-image segment pairs must be in the final match by setting the likelihood of any other segment to 0.

Although the basic HMM we have described thus far will produce the optimal path, there are still several problems which appear in some cases. As a consequence of having both directions of every segment represented in the image graph, it is possible to suddenly reverse direction at any point while computing the optimal path. The ability to backtrack allows for flexibility in matching an image. Template segments are ordered and continuously connected, therefore the template of an object which cannot be drawn in one continuous stroke may require backtracking. Any approximately parallel paths in an object can be a potential problem, though, since the cost of backtracking over entire parts of an object may be lower than continuing along the path as it should.

4.2.1 Scale and Angle Constraints

As we traverse along the template building the optimal path of segments, at each point we can keep track of the partial optimal path computed up to that point. We can therefore add a condition on any segment relative to any segments that come before it on the current path. While adding these conditions no longer guarantees that the globally optimal solution will be found, using several strategically placed constraints can minimize the adverse effects on the results.

The model that we have described so far only takes into account the connectivity between segments and their orientations. There is no knowledge of the length, scale or positioning of segments relative to the rest of the segments in a path. This is done because we want to allow for non-uniform scaling of the sketch object relative to the template. For example, if a car is drawn with a very tall roof, it will still be matched correctly by mapping multiple image segments to one template segment. This allows us more flexibility in the objects a template can match, but there are several issues we need to address. Problems can occur due to internal or background detail that is similar to parts of the actual object but is at a different scale than wanted. Another type of problem that the lack of scale or distance matching can cause is that the method has no sense of spacing. The same segment in the image can be reused to match different segments of the template.

Adding more complex constraints can help resolve some of these issues. By constraining the matches for certain template segments to be a certain distance apart, or within some relative angle of each other, we can circumvent the above-mentioned issues. Note also that by adding distance and angle constraints we automatically prevent backtracking along the partial paths where the constraints are defined.

4.3 Correspondence Map

Once we have computed the optimal path of image segments which correspond to the template segments, we are left with a mapping from the template segments to the sketch segments, which we can use to recognize and label parts of the object in the image. The segment correspondence can also be used to determine the strokes that the segments were initially part of, and thus to recover the original points in the image which represent our matched outline.

Chapter 5

Results

We implemented our method in C++, creating a simple mouse and keyboard interface for inputting templates and scenes. The code was optimized to take advantage of the low connectivity of the graphs involved (see Section 4.1). Each of the templates we used in testing was drawn by us, using a mouse to specify each of the points of the segments. The scenes the templates were matched to were all hand-drawn on paper, then scanned into JPEG images to be used as input to our application. The computation times for all results were on average less than 1 second, varying with the complexity of the scenes.

5.1 Sketches

For the closed-boundary objects we used three templates in our testing, each corresponding to a different object: a car, a plane and a mug (see Figure 5.1). We also included three open templates: the letter A, the bottom half of a car and a stick figure (see Figure 5.2).

5.1.1 Multiple Objects in Complex Scenes

Our primary multi-object test scene includes one full instance of each of the template objects, as well as several other objects scattered throughout the scene and detail on the objects (windows on the car, wheels, etc.) which are not present in the bare template we use to perform the matching. The scene was hand-drawn on a sheet of paper and was scanned to convert it into an image. The templates were created using a simple mouse interface to directly create the segments by placing a sequence of points on the screen.

Each template was matched against the same scene, given in Figure 5.3. The results for the matches with the car, plane and mug templates are given in Figures 5.4, 5.5 and 5.6. Similarly, the three open templates were tested on the scene and the results are shown in Figures 5.7, 5.8 and 5.9. In all of the cases we tested, there was a clear correspondence from the template segments to individual segments, or sequences of segments, found in the scene. Looking at the labeled output, we see that the correct objects have been identified in the scene. Furthermore, each template segment correctly matches to a set of segments in the scene object, giving us detailed information on where each of the parts of the objects is located, i.e., mug handles match to mug handles, car roofs match to car roofs, etc.
In order to test the ability of the algorithm to find the best possible match in a given sketch, regardless of whether the object "exists" in the sketch, we used sketches which consisted of a set of randomly drawn lines (Figure 5.10) and a regular grid of straight lines (Figure 5.13). We then attempted to match the car and plane templates to these sketches to determine what the best matches returned by the algorithm were. The results for the random drawing can be seen in Figures 5.11 and 5.12, while the results for the grid are in Figures 5.14 and 5.15. This behaviour may not be desirable in cases where these matches have high probability scores. We can reduce the instances of false-positive matches by considering an upper bound on the connectivity of sketch segments.

5.1.2 Match Ranking Within a Class

A side effect of using the Viterbi algorithm to perform the matching is that each match also returns a probability corresponding to how well the sequence of segments in the sketch matched the segments in the template. By finding the relative ranking of several sketches of the same object using a single template and assigning a scale, we can then rank later sketches of the same object based on the probability returned by our algorithm. This can let us discard poor matches that occur when the object is coincidentally found in a series of randomly connected segments (for example, the cases in Figures 5.11 and 5.12).

We used 8 sketches of a car, making them all different from each other so as to provide a large range of possible car styles. In order to provide a reasonable starting point, one of the sketches was made to look as close as possible to the template we use for matching. The probabilities assigned to the 8 cars are provided in the table in Figure 5.16. Based on these results, we can set a threshold probability such that when this template is used on future matches, we can discard those that score below the threshold. We then matched the car template to a scene containing 7 car-like objects; using a threshold of 0.4, only 4 were considered to be adequate matches (Figure 5.17).

Figure 5.1: The three closed templates. The plane template has 38 segments labeled 0-37, the car template has 46 segments labeled 0-45 and the mug has 24 segments labeled 0-23. All three templates also have a closed-loop constraint.

Figure 5.2: The three open templates. The letter A template has 6 segments labeled 0-5, the car bottom template has 25 segments labeled 0-24 and the stick figure has 12 segments labeled 0-11. Notice that the stick figure draws over itself and thus allows us to find more varied objects by actually making them closed boundary templates.

Figure 5.3: A scene with multiple objects.

Figure 5.4: Labeled output for matching of the car template.

Figure 5.5: Labeled output for matching of the plane template.

Figure 5.6: Labeled output for matching of the mug template.

Figure 5.7: Labeled output for matching of the letter A template.

Figure 5.8: Labeled output for matching of the car bottom half template.

Figure 5.9: Labeled output for matching of the stick figure template.

Figure 5.10: The segments for a sketch consisting of a randomly drawn set of lines.

Figure 5.11: Labeled output for matching of the car template to the random scene. The second image shows the outline of the output to make it clearer.

Figure 5.12: Labeled output for matching of the plane template to the random scene. The second image shows the outline of the output to make it clearer.

Figure 5.13: A regular grid.

Figure 5.14: Labeled output for matching of the car template to the grid. The second image shows the outline of the output to make it clearer.

Figure 5.15: Labeled output for matching of the plane template to the grid. The second image shows the outline of the output to make it clearer.

Figure 5.16: The 8 cars in descending order of the probability of matching to the car template. This provides a ranking for future car sketch objects. The values in the table are normalized so that the sketch that returned the highest probability has a value of 1 and the worst match gets 0.

Figure 5.17: A scene containing 7 potential car objects. When using a threshold of 0.4, as per the scale computed earlier, only 4 objects are considered cars.

Chapter 6

Conclusions and Future Work

6.1 Conclusions

Object recognition is a long-standing and challenging problem. An enormous number of approaches have been developed for recognizing objects in images and sketches. These methods most often use a variety of statistical methods, machine learning models and template-based approaches.

In this thesis we have developed a flexible shape-based template model which can be used to recognize objects in sketches. Given a template for a target object to find, the algorithm returns a match to the most similar object. The result is a one-to-many correspondence of template segments to sketch segments, which allows us not only to find the object in the scene, but also to know where each individual part of the object is.

The main contributions of this work include developing a suitable HMM-based problem representation. This allows us to make use of the Viterbi algorithm, an efficient dynamic programming algorithm, which we further optimized for use on sketches. The method we develop allows multiple instances of an object in an image to be ranked by similarity to the template, thus giving a confidence score to a match. The method has been demonstrated to work on directly input strokes and hand-drawn images, and supports both closed and open templates.

6.2 Limitations and Future Work

There are several ways we can improve upon the method we have presented, both in terms of increasing the number of objects and object types we can successfully match, and in terms of increasing the accuracy of the matches themselves.

The method we have described will work only when the object template we are searching for consists of a connected boundary (either closed or open). It is currently not possible to match templates with multiple disconnected parts. An example is that of a face, including the head, eyes, nose, mouth and ears. It is possible to add this functionality to our method with the use of "virtual edges". These edges could be added into the graph of the template and would allow the method to jump over to other parts of the model, e.g., from the head to the eyes. This would allow a larger variety of objects to be matched, as well as increasing the information available to perform a better matching. By having several disconnected parts on an object, we could utilize a form of constellation model to increase the accuracy of the matching [22].
Once the method allows for additional parts to be matched on a template, it also becomes possible to have optional parts on an object, to increase the variability of a class of objects we can successfully match. For example, a mug may have zero, one or two handles.

Our method does not handle large occlusions, due to the fact that we assume the boundary of the object is connected in a single path, allowing only for small gaps. One possible way to handle occlusions with the method we provide is to compute a composite matching using several partial paths. If we take a connected template, then any partial open path of that template can be used to perform its own matching. If there are any large occlusions in the given sketch, one or more of these partial templates may match, giving us a match to the original template. For cases where much of the object is occluded, this approach could introduce the possibility of many false-positive matches due to using only small parts of the template.

A limitation of our method is that matching against a single template is not rotationally invariant. The template and the object we are matching in the sketch must have the same orientation for the match to work, as we use the absolute orientations of the segments in the template and sketch to calculate the cost of matching. In order to find a rotated object in the input, we must create multiple templates of the object, each with a different rotation. It may be possible to account for rotation by computing features based on the relative orientation of segments, but this would involve a substantial change in the way the costs are calculated.

In our work, we used the same distribution over the orientations of the segments when calculating the emission probabilities for our HMM. Certain parts of an object may have more or less variability in the orientations they can have (for example, the bottom of a mug would tend to be flat, while its sides could have arbitrary shapes). A model could be used to learn these tendencies and provide more accurate results.

Our method does not make use of the timing information for strokes when the sketch is directly input. It may be possible to improve on the method in these cases by assigning higher probabilities to transitions between stroke segments that were drawn near each other in time.

Bibliography

[1] C. Alvarado. Sketch recognition and user interfaces: Guidelines for design and development. In AAAI Fall Symposium on Pen-Based Interaction, 2004.

[2] C. Alvarado, M. Oltmans, and R. Davis. A framework for multi-domain sketch recognition. In AAAI Spring Symposium on Sketch Understanding, pages 1-8. AAAI Press, 2002.

[3] C. Alvarado and R. Davis. SketchREAD: a multi-domain sketch recognition engine. In UIST '04: Proceedings of the 17th Annual ACM Symposium on User Interface Software and Technology, pages 23-32, New York, NY, USA, 2004. ACM.

[4] I.A. Bachelder and S. Ullman. Contour matching using local affine transformations. Master's thesis, Massachusetts Institute of Technology, Cambridge, MA, 1991.

[5] S. Belongie, J. Malik, and J. Puzicha. Shape matching and object recognition using shape contexts. IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 509-522, 2002.

[6] C. Calhoun, T.F. Stahovich, T. Kurtoglu, et al. Recognizing multi-stroke symbols. In AAAI Spring Symposium on Sketch Understanding, 2002.

[7] V. Ferrari, L. Fevrier, F. Jurie, and C. Schmid. Groups of adjacent contour segments for object detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 36-51, 2007.

[8] V. Ferrari, F. Jurie, and C. Schmid. Accurate object detection with deformable shape models learnt from images. In 2007 IEEE Conference on Computer Vision and Pattern Recognition, pages 1-8. IEEE, 2007.

[9] V. Ferrari, T. Tuytelaars, and L. Van Gool. Object detection by contour segment networks. In Computer Vision - ECCV 2006, pages 14-28, 2006.

[10] K. Grauman and T. Darrell. Fast contour matching using approximate earth mover's distance. 2004.

[11] T. Hammond and R. Davis. Tahuti: a geometrical sketch recognition system for UML class diagrams. In SIGGRAPH '06: ACM SIGGRAPH 2006 Courses, page 25, New York, NY, USA, 2006. ACM.

[12] J.E. Hummel and I. Biederman. Dynamic binding in a neural network for shape recognition. Psychological Review, 99(3):480-517, 1992.

[13] A.K. Jain, Y. Zhong, and S. Lakshmanan. Object matching using deformable templates. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(3):267-278, 1996.

[14] L.B. Kara and T.F. Stahovich. Hierarchical parsing and recognition of hand-sketched diagrams. In Proceedings of the 17th Annual ACM Symposium on User Interface Software and Technology, pages 13-22. ACM, 2004.

[15] L.B. Kara and T.F. Stahovich. Hierarchical parsing and recognition of hand-sketched diagrams. In UIST '04: Proceedings of the 17th Annual ACM Symposium on User Interface Software and Technology, pages 13-22, New York, NY, USA, 2004. ACM.

[16] H.C. Liu and M.D. Srinath. Partial shape classification using contour matching in distance transformation. IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 1072-1079, 1990.

[17] J.V. Mahoney and M.P.J. Fromherz. Three main concerns in sketch recognition and an approach to addressing them. In AAAI Spring Symposium on Sketch Understanding, pages 105-112, 2002.

[18] R. Plamondon and S.N. Srihari. On-line and off-line handwriting recognition: A comprehensive survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 63-84, 2000.

[19] K. Schindler and D. Suter. Object detection by global contour shape. Pattern Recognition, 41(12):3736-3748, 2008.

[20] T.M. Sezgin and R. Davis. HMM-based efficient sketch recognition. In Proceedings of the 10th International Conference on Intelligent User Interfaces, page 283. ACM, 2005.

[21] T.M. Sezgin and R. Davis. Sketch recognition in interspersed drawings using time-based graphical models. Computers & Graphics, 32(5):500-510, 2008.

[22] D. Sharon and M. van de Panne. Constellation models for sketch recognition. In Sketch-Based Interfaces and Modeling 2006, page 19, 2006.

[23] K. Siddiqi, A. Shokoufandeh, S.J. Dickinson, and S.W. Zucker. Shock graphs and shape matching. International Journal of Computer Vision, 35(1):13-32, 1999.

[24] M. Sonka, V. Hlavac, and R. Boyle. Image Processing, Analysis, and Machine Vision, Second Edition. International Thomson, 1999.

[25] C.C. Tappert, C.Y. Suen, and T. Wakahara. The state of the art in online handwriting recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 787-808, 1990.

[26] O. Veselova and R. Davis. Perceptually based learning of shape descriptions for sketch recognition. In SIGGRAPH '06: ACM SIGGRAPH 2006 Courses, page 28, New York, NY, USA, 2006. ACM.

[27] C. Yang, D. Sharon, and M. van de Panne. Sketch-based modeling of parameterized objects. In Sketch-Based Interfaces and Modeling 2005, 2005.

[28] B. Yu and S. Cai. A domain-independent system for sketch recognition. In GRAPHITE '03: Proceedings of the 1st International Conference on Computer Graphics and Interactive Techniques in Australasia and South East Asia, pages 141-146, New York, NY, USA, 2003. ACM.