UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

Sketch-based instancing of parameterized 3D models 2004

You don't seem to have a PDF reader installed, try download the pdf

Item Metadata


ubc_2004-0697.pdf [ 3.23MB ]
JSON: 1.0051323.json
JSON-LD: 1.0051323+ld.json
RDF/XML (Pretty): 1.0051323.xml
RDF/JSON: 1.0051323+rdf.json
Turtle: 1.0051323+rdf-turtle.txt
N-Triples: 1.0051323+rdf-ntriples.txt

Full Text

Sketch-based Instancing of Parameterized 3D Models by Dan Xiao M.Sc, Zhejiang University, 2002 A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF M a s t e r s o f S c i e n c e in THE FACULTY OF GRADUATE STUDIES (Department of Computer Science) We accept this thesis as conforming to the required standard The University of British Columbia September 2004 © Dan Xiao, 2004 F A C U L T Y O F G R A D U A T E S T U D I E S j, T H E U N I V E R S I T Y O F B R I T I S H C O L U M B I A Library Authorization In presenting this thesis in partial fulfillment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission. DM X I A O oiirl lo l7^ Name of Author (please print) Date (dd/mm/yyyy) Title of Thesis: tyxAcJ* - 6MK>1 JAIM-QUU* ttfj. &^ PcvrweXerf-i&J. Degree: MflVtrg/T Y e a r : JZ Department of Ctfn^bjCt'-t/r ^~yiA Q/lUy€s The University of British Columbia Vancouver, BC Canada grad.ubc.ca/forms/?formlD=THS page 1 of 1 last updated: 20-Jul-04 A b s t r a c t We present a system for constructing 3D models from simple hand-drawn sketches. The system exploits a priori knowledge about the type of object being sketched. In particular, we assume that the class of object being drawn is known, (e.g., a rocket ship) and that the object wi l l be drawn using a fixed number of known components. B y using these assumptions, we show that we can bui ld a sketch-based modeling system that, while object specific, is capable of quickly creating geometric models that are beyond the scope of current sketch-based modeling approaches. K e y to our approach is the use of a K-means classifier for labeling each drawn stroke as being a particular component in the generic model. We demonstrate our approach by applying this classifier to face and rocket sketches. C o n t e n t s Abstract ii Contents i i i List of Tables v List of Figures vi Acknowledgements viii 1 Introduction 1 1.1 Motivat ion 1 1.2 System Overview 3 1.3 Appl icat ion of Sketch Recognition 3 1.4 Thesis Outline 4 2 Related Work 5 2.1 Attributes of a Sketch Recognition System 5 2.2 Algorithms for Object Recognition 6 2.3 2D Sketch Recognition 7 2.4 3D Sketch-based Modeling 9 2.5 Summary : 11 i n 3 Single-stroke Classification 12 3.1 Problem 12 3.2 The Features 13 3.3 Methods for Stroke Classification 15 3.3.1 Least Squares Method 15 3.3.2 K-means Method 16 3.3.3 Expectation-Maximisation Method 18 3.4 . Classifier Comparison 19 3.5 Summary 22 4 Sketch Recognition and 3D Model Construction 23 4.1 Sketch Recognition: Training 23 4.1.1 Shapes 23 4.1.2 Choice of Features 24 4.1.3 Sketch Recognition: Matching . 28 4.2 3D Mode l Construction 28 5 Results 30 5.1 Face Sketching 30 5.2 Rocket Sketching 31 5.3 Discussion / 31 6 Conclusions and Future Work 35 Bibliography 36 Appendix A User Interface 39 iv L i s t o f T a b l e s 1.1 Correspondences between strokes and components of a rocket 2 4.1 Description of components for face 26 4.2 Description of components for rocket 26 5.1 Comparisons of drawing and recognized face components wi th incor- rect recognition 31 v L i s t o f F i g u r e s 1.1 A n example of sketch recognition and modeling 2 1.2 System Overview 2 2.1 Design process [13] 6 2.2 Example of user interface: Teddy 10 2.3 Another example of user interface: Chateau 10 3.1 Features extracted from a gesture for identification 14 3.2 Examples from the gesture set used for evaluation 20 3.3 Recognition rate vs. training size for least square method 20 3.4 Recognition rate vs. training size for K-means method 21 3.5 Recognition rate vs. training size for E M method 21 4.1 Shapes of basic symbols: arc, ellipse, triangle, square, open rectangle. 24 4.2 Features of head used for classification 26 4.3 Illustration of face components 27 4.4 Illustration of rocket components 27 4.5 Model ing parameters for tapered cylinder and ellipsoid. . . . . . . . . . 29 5.1 Face modeling wi th correct recognition - 32 5.2 Rocket modeling wi th correct recognition 33 5.3 Face modeling wi th incorrect recognition 34 v i The interface of modeling system v A c k n o w l e d g e m e n t s I would like to give my great gratitude to my supervisor, Dr . Michie l van de Panne, for his invaluble advice on "my thesis work. Without h im, this would never have been completed. I am also very grateful to Dr . Nando de Preitas, who is giving the interesting machine learning course and stimulate the idea in this thesis. To Jason Harrison, who contribute useful books and papers. Also to Peng Zhao, Yizhen C a i and Xiaoj ing W u for providing several discussions. D A N X I A O The University- of British Columbia September 2004 v i i i Chapter 1 I n t r o d u c t i o n 1.1 Mot ivat ion " A picture speaks a thousand words". People often rely on sketches when they try to convey ideas to others. We all have the experience of drawing something. Of all drawers, children are among the most fanatic. They like to convert what they see and think into pictures. Sometimes their pictures are simply drawn but they are at the same time capable of conveying much information. Creating and animating 3D models is currently not possible, however, without significant training and expertise. New technologies are thriving in our modern society. One common charac- teristic of successful technologies is their ease of use, so that people can grasp those technologies in a short period of time and use them in an easy way. W i t h devices such as P D A s and Tablet P C s , there exists hardware that largely replicates the feel and function of a pencil or pen or paper. Wha t now remains is for new software applications to exploit these new capabilities. Our system can interpret a l imited class of 2D sketeches as 3D objects, and thus represents such a new application. 1 Figure 1.1: A n example of sketch recognition and modeling. Stroke 3D Model Classification Construct! on Chapter 3 Chapter 4 •K 3D Model Figure 1.2: System Overview. Stroke Resulting component 1 cone 2 body 3 tail 4 left wing 5 right wing Table 1.1: Correspondences between strokes and components of a rocket. 2 1.2 System Overview Figure 1.1 shows an example drawn and recognized by our system. Each pen stroke in the graph is recognized as a component of the object and then constructs a component of the 3D model. In the above graph, a rocket is drawn wi th 5 pen strokes and later, a 3D rocket model is displayed wi th 5 components. Each pen stroke represents one component and their correspondences are illustrated in Table 1.1. The overall system structure is given in Figure 1.2. 1.3 Application of Sketch Recognition The application domains of Freehand Sketch Recognition System (FSRS) include: • 3D modeling based on sketching interface, such as computer aided design ( C A D ) . • Query by sketch : image database retrieval. A long-standing dream in computer-aided design is the possibility of using freehand sketching as the language for interactive design. The ability to sketch a 3D object, predict its performance, and re-design it interactively based on physics-based feedback would bring the power of state-of-the-art C A D tools into the critical, early design phase. The enormous potential of sketch-based interfaces is widely recog- nized, and has been broadly pursued. However, the practical use of such attempts has remained limited because these interfaces have been primari ly 2D, losing much of the benefit of mainstream 3D C A D . In order to become truly 3D, a sketch inter- face must automatically be able to reconstruct the spatial geometry from a single 2D sketch in near real-time, much like a human observer does implicit ly. 3 1.4 Thesis Outline The rest of this thesis is organized as follows. In Chapter 2, we discuss work related to this project. Chapter 3 explores how to label individual strokes using classi- fication and compares different classifiers. Chapter 4 describes our implemented system. Chapter 5 presents results. We make conclusions and discuss future work in Chapter 6. 4 Chapter 2 Related Work 2.1 Attributes of a Sketch Recognition System Freehand sketching can play an important role in the realm of user-interfaces for graphics systems. Sketching appears to be a natural communication language, en- abling fast conveyance of qualitative information while not burdening the creativity of the user or disrupting the flow of ideas. As illustrated in Figure 2.1, sketching is a key element in modeling, developing and defining a design result. Dur ing the whole design process, sketching often provides the input for conceptual design. [17] suggests that a sketch recognition technology must meet three main requirements: it must deal reliably wi th the pervasive variability of hand sketches, provide interactive performance, and be easily extensible to new configurations. User interfaces based on sketching can generally be divided into three cat- egories, according to the level of information they intend to gather from the input sketch [22]: 1. Drawing Pads. Basic sketching is allowed for general purpose drawings. Input strokes are smoothed and many other graphic tools are provided. 2. 2D sketch applications. These applications smooth sketch strokes and classify them into 2D primitives, such as lines, arcs, and splines. Some geometric 5 Figure 2.1: Design process [13]. constraints and relationships among the entities are utilized to further refine the sketch. 3. 3D sketchers. Sketches are analyzed as representing rough projections of 3D scenes. Sketch strokes are identified as basic geometrical shapes, such as lines, arcs and corners. However, some of the sketch strokes do not necessarily represent what they appear to be since the analyzed sketch represents a rough projection of a three dimensional scene [11, 14]. 2.2 Algorithms for Object Recognition The sketch recognition problem is strongly related to the object recognition problem in computer vision. A broad class of such object recognition problems have bene- fited from statistical learning machinery. The use of Principle Component Analysis ( P C A ) has produced good results [25] in the area of face recognition. O n more gen- eral object recognition tasks, several other learning methods such as Bayes classifier [21] and decision tree learning [28] have been applied. The technique of boosting is proven to be a viable method of feature selection in [26]. Support vector ma- chine methods have demonstrated success in template matching problems such as 6 recognizing pedestrians in [16]. [2] and [31] present a novel approach to measure similarity between 2D shapes and exploit it for object recognition. In order to solve the correspondences problem, they attach a descriptor, the shape context, to each point. The shape context at a reference point captures the distribution of the remaining points relative to it, thus offering a globally discriminative characterization. In this way, they treat recognition in a nearest-neighbor classification framework as the problem of finding the stored prototype shape that is maximally similar to that in the image. 2.3 2D Sketch Recognition Research on sketch recognition and interpretation has a long history. A variety of systems have incorporated gesture recognition into their user interface. The development of a Freehand Sketch Recognition System involves three stages: 1. Stroke-level recognition which interprets the pixels and produces low-level ge- ometric primitives such as lines and curves. 2. 3D modeling based on 2D sketches. 3. Multi-stroke understanding. Each sketch of an object consists of multiple strokes. The first step in in- tepreting a sketch is to classify individual pen strokes by processing them. Usually there are two assumptions in this area, one is that each pen stroke symbolizes a single shape, such as a single curve segment or triangle segment, whichever fits the stroke best. M u c h of the previous work has relied either on using stroke methods in which an entire symbol must be drawn as single stroke [20, 10, 4], or single primitive methods in which each stroke must be a single line, arc, or curve [18, 6, 27]. Another one used in [3] is that multiple primitives can be drawn in the same pen stroke. In this way, a square can be drawn as four individual pen strokes or a single pen stroke 7 with three 90° bends. The key challenge is determing which bumps and bends are drawn intentionally and which are unintentionally. Arvo and Novins described a kind of stroke-level recognition algorithm [1]. Different from traditional ways of recognizing sketches, their method continuously morphs raw input strokes into ideal geometric shapes. Another stroke-level recognition algorithm is a domain-independent system for sketch recognition [29]. Users are allowed to draw sketches as naturally as how they do on paper. The system then recognizes the drawing through imprecise stroke approximation which is implemented in a unified and incremental procedure. This method can handle smooth curves and hybrid shapes as gracefully as it does to polylines. W i t h a feature-area verification mechanism and the intelligent adjustment in a post-process, the system can produce the results intended by the user. Rubine presented a trainable gesture recognizer which used a "linear ma- chine" classifier to discriminate gestures [20]. Each gesture class is associated wi th a linear evaluation function using 13 features. In order for training, appropriate weights are learned for each attribute in the linear function. The attributes consider aggregate properties of a pen stroke, and it is possible that two different gestures would have the same aggregate properties. Recent studies show us some interesting gesture-based interfaces in which the user specifies commands by simple drawings. G R A N D M A , a tool for building gesture-based applications introduced in [19], supports the integration of gestures and direct manipulation. It allows views that respond to gestures and views that respond to clicks and drags to coexist in the same interface. Our work is closely related to this in that we also a set of stroke features and then apply a classifier to discriminate different gestures. Landay and Myers advanced an interactive sketching tool called S I L K in which designers can quickly sketch out a user interface and transform it into a fully operational system [12]. As the designer sketches, the recognizer of S I L K matches 8 the pen strokes to symbols representing various user interface components, and returns the most likely interpretation. Their recognizer is l imited to single-stroke shapes drawn in certain preferred orientations. [23, 24] have developed a program called SketchIT that can transform a sketch of a mechanical device into working designs. Electronic Cocktai l Napkin [5] employs a trainable recognizer that works for multi-stroke shapes. The recognition process consists of glyph (low-level) and configuration (high-level) recognition. A glyph is described by a state transition model of the pen path, the aspect ratio and size of the bounding box, and the number of corner points. Configuration recognition takes the spatial relationships between the glyphs into consideration. This method is sensitive to changes in orientation, and the 3x3 grid may be inadequate for symbols containing small features. 2.4 3D Sketch-based Modeling Zeleznik et al. have extended the use of gesture recognition for 3D modeling[30]. The S K E T C H system they proposed attempts to combine 2D image wi th 3D computer modeling systems to create an environment for rapidly conceptualizing and editing approximate 3D scenes. A l l interaction wi th S K E T C H is v ia a three-button mouse, occasional use of one modifier key on the keyboard, and a single orthographic window onto the 3D scene. The Teddy system [8] in Figure 2.2 is aimed especially at the modeling of simple free form objects. The system is able to innate 2D closed curves into 3D objects, which can then be edited using simple gestures. The interface shown in Figure 2.3 is Chateau [7], a prototype systemin which the user gives the hints to the system by highlighting related lines and the system suggests possible operations based on the hints, shown in the thumbnails at bottom. 9 Figure 2.2: Example of user interface: Teddy. 2.5 Summary In this chapter, we presented a review of sketch recognition systems, 2D sketch recognition methods, along wi th some 3D sketch-based modeling examples. In the next chapter, we wi l l present some machine learning methods used for gesture recog- nition. 11 Chapter 3 Single-stroke Classification One of the goals of our work is to obtain an efRcent classifer as well as a rich and meaningful feature vectors. In this chapter, we compare several machine learning methods that classify strokes into one of several categories based upon the stroke fea- ture vectors [20]. The selection of useful features and a good classifier wi l l determine the effectiveness of our gesture recognizer. 3.1 Problem Each gesture is denned as an array g of coordinate values and time. 9P = {xp,yP,tp) 0<p<P The gesture recognition problem is stated as follows. There is a gesture set wi th C classes. Given a gesture, we need to determine to which class i t belongs. This is done by first extracting a limited set of features. Features are extracted from the gesture g and are used as the input for classification. The feature vector / = [ / i . . . fp] is taken as the training data for classification. 12 3.2 The Features As an example, a stroke can be characterized by a set of 11 geometric and 2 dynamic features [20]. This is the set of features we shall use for testing several classification algorithms. Note that many other feasible feature sets could also have been used, wi th possible varying results. Figure 3.1 shows how to extract these specific features from a stroke. The discription of features are: • Feature 1 (A) : the cosine of the ini t ia l angle of the gesture. • Feature 2 ( A ) : the sine value of the ini t ia l angle of the gesture. • Feature 3 (A) : the length of the bounding box diagonal. • Feature 4 ( A ) : the angle of the bounding box diagonal. • Feature 5 ( A ) : the distance between the first and the last point. • Feature 6 ( A ) : the cosine of the angle between the first and last point. • Feature 7 ( A ) : the sine of the angle between the first and last point. • Feature 8 ( A ) : the total gesture length. • Feature 9 ( A ) : the total angle traversed. • Feature 10 (fio) : the sum of the absolute value of the angle at each mouse point. • Feature 11 ( / n ) : the sum of the squared value of those angles. • Feature 12 ( / 1 2 ) : the maximum speed (squared) of the gesture. • Feature 13 ( / 1 3 ) : the duration of the gesture. fi = cos a = x2 - x0/ J(X2 - x0)2 + (2/2 - Vo)2 13 Figure 3.1: Features extracted from a gesture for identification. / 2 = sin a = y2 - yo/y [x2 - x0)2 + (y2 - yo)2 fz = \J(%max %min)2 "I" (^moi Zmin)2 Vmax y-min fi — arctan - Lmax •''mm h = \ / ( z p - i - xo)2 + {yp-i - yo)2 fe = cos/3 = (xp-i - x0)/f5 f7 = sin/3 = ( y P _ i - 2/0V/5 Let Axp = x p + i - x p , A y p = j / p + i - yp f8 = PJ2jAxl + Ay* Let 0 P = arctan ^ - ' " ^ ^ P - 2 /9 = P̂ p = l 14 P-2 fw P = i P-2 hi P = i Let Atp — £ p + i — P—2 (• /12 = max - p=0 •max •max / l 3 = tp+1 - to 3.3 Methods for Stroke Classification We now introduce several possible classification methods wi th the purpose of de- termining which one works best. Three machine learning classification schemes are presented in this section: least squares, k-means, expectation-maximisation. The least squares method is a supervised learning technique while the latter two use unsupervised learning. 3.3.1 Least Squares Method [20] uses a linear machine algorithm for strokes classification, which is a k ind of least square approach. Given C as the number of total classes and F is the number of features, each gesture class c has parameters tuc; for 0 < i < F . The evaluations, vc, are calculated as follows: The class of a gesture g is the c which maximizes vc. During the training process, we need to determine the weights uici from the example gestures. Let fcei be the feature of the example of gesture class c, 0 < e < F 0<c<C 15 Ec, where Ec is the number of training examples of class c. The sample estimate of the mean feature vector per class fc is defined as: 1 E c ~ x fci = T T A e i ^ c e=0 The sample estimate of the covariance matrix of class c, JZcij, is computed as: Ec-l _ Etij = ^ (fcei ~ fci)(fcei ~ fci) e=0 The _ cr/ are averaged to yield _ i 7 , a r L estimate of the common covariance matrix. E . . _ _ _ _ _ _ _ We can invert the common covariance matrix and use it to get the weights u>ci as follows [20]: F _ uCJ = J2^~%fa 1<3<F i = i F 2 = i 3.3.2 K-means Method We use K-means clustering introduced in [15] for classification. In K-means cluster- ing, K , needs to be determined at the onset. The goal is to divide the objects into K clusters such that some metric relative to the centroids of the clusters is minimized. Various metrics related to the centroids can be minimized, including: • The maximum distance to its centroid for any object. • The sum of the average distance to the centroids over all clusters. • The sum of the variance over all clusters. • The total distance between all objects and their centroids. 16 The metric to minimize and the choice of a distance measure w i l l determine the shape of the optimium clusters. Two different algorithms are available to search for the optimum set of clus- ters. In the first procedure, the objects are randomly assigned to one of the K clusters. Once this is done, the position of the K centroids are determined, as is the value of the metric to minimize. A global optimization method is then used to re- assign some of the objects to different clusters. New centroids are determined, as is the metric to minimize. This procedure is continued unti l the optimum assignment of objects to clusters is found. In the second procedure for K-means clustering, placement of the K centroids can be done by the following procedure. Place K points into the space represented by the objects that are being clustered. These points represent ini t ial group centroids. 1. Assign each object to the group that has the closest centroid. 2. W h e n all objects have been assigned, recalculate the positions of the K cen- troids. 3. Repeat Steps 2 and 3 unti l the centroids no longer move. This produces a separation of the objects into groups from which the metric to be minimized can be calculated. A global optimization method is then used to move the position of one or more of the centroids. The above procedure is repeated and new metric value is determined. The object is to move the centroids into a position such that an optimum separation of objects into groups occurrs. For K-Means clustering, it is necessary to calculate a "distance" between r either two objects (one of which may be a cluster seed point) or an object and a group centroid. In this discussion, it is assumed that each object is described by an array of real-valued metrics. 17 Since we need to minimize the largest or average distance, or the variance, during clustering, it is important that each metric contribute equally to the total distance. In other words, if one metric spans the range [0.0,0.5] and another spans [0.0,100.0], the maximum deviation in the first would have little effect on the total distance, while even a modest separation in the second would have a much larger effect. To remove this dependency on the range spanned by each metric, it is impor- tant to first standardize the values. This means that each metric, when compared over the full set of objects should have a mean of 0.0 and a variance (or standard deviation) of 1.0. For each metric, the following steps should be taken to standardize each metric describing the objects. 1. Sum the values of the metric over all objects and divide the sum by the number of objects. 2. Subtract this average value from the metric in al l objects. 3. Sum the square of these new values over all objects, divide the sum by the total number of objects, and take its square-root. This is the standard deviation of the new values. 4. Divide the metric by the standard deviation in each object. 3.3.3 Expectat ion-Maximisat ion M e t h o d This method supposes that the stroke features have normal Gaussian distributions. The E M method can be applied to build a gesture classifier. The detailed steps are given by Jordan [9]: 1. Initialise. 2. E Step: A t iteration t, compute the expectation of the indicators for each i and c: = p(c)N(xi\fxc,T,c) U E * p ( c ' ) A ^ | / v , £ c ' ) 18 and normalize it. 3. M Step: Update the parameters P(c), fj,c, S c . E c E i = l Cic 1 p(c) = - _ > The E M method is used to find the parameters for the normal distribution of each class and then apply this distribution to classify new, unseen feature vectors. The distribution wi th the highest probability is defined to be the class to which the gesture belongs. 3.4 Classifier Comparison The gestures used for evaluation of the three classification algorithms are shown in Figure 3.2. In this figure, four different gesture styles of each class are given. Performance is evaluated on 3 gesture classes. Figure 3.3 shows the results using the algorithm presented by [20]. The plot demonstrates the recognition rate as a function of the number of training examples per class for evaluation gestures. One line is for 2 classes and the other one is for 3 classes. Figure 3.4 illustrates the results wi th k-means. The results when applying E M are shown in Figure 3.5. From the above three figures, we can see that the K-means method gives the best performance. For the K-means method, in the cases where 3 gesture classes are recognized by a classifier trained wi th 10 or more examples per class, at least 96% of the test gestures are classified correctly. When there are only 2 gesture classes, the classifier trained wi th 10 or more examples per class yields 100% correct classification. 19 A A Figure 3.2: Examples from the gesture set used for evaluation. 100 90 80 70 60 50 40 30 20 10 0S| Classification with Least Squares — B n H _ ta i il I B 2 classes 3 classes 0 10 20 - 30 40 training examples per classes 50 60 Figure 3.3: Recognition rate vs. training size for least square method. 20 Classification with K-means 100 90 80 70 60 50 40 30 20 10 OS. •B 2 classes -+- 3 classes 20 30 40 training examples per classes 50 60 Figure 3.4: Recognition rate vs. training size for K-means method. Classification with EM 100 30 B0 70 60 50 40 30 20 10 od 7 / / tf if / ~ B 2 classes - * - 3 classes 20 30 40 training examples per classes 60 Figure 3.5: Recognition rate vs. training size for E M method. 21 The linear machine classifier used by Rubine that is trained wi th 10 or more examples can recognize 3 gesture classes with correctness > 60% , and for 2 gesture classes, the adequacy is > 85%. This method gives better performance when there are 40 training examples per class, but the correctness falls after increasing the number of examples. The recognition adequacy of E M method is low when the number of training examples is less than 30 per class, however, it increases when more training exam- ples added. The more training examples per class, the higher the correctness at recognizing gestures. The classifier trained wi th 30 or more examples can recognize 3 gesture classes at above 85% correctness, and for 2 gesture classes above 80% correctness. It is surprising to note that it w i l l give better performance for 3 gesture classes than 2 gesture classes for large number of training examples. 3.5 Summary The generation of a meaningful, extensible and effective feature set is essential. How many features are enough to describe a gesture without redundancy and what kind of features should be extracted is an important problem to be tackled. In this chapter, testing strokes comprise congruent lines and arcs, and we can see that thirteen features are quite efficient for classification. Given different gesture sets, some features may be useless and a small number of feature set may be applied to generate good results as well. In the next chapter, we want to classify strokes of different shapes from the ones we used in this chapter. We choose K-means method to use for our sketch recognition system. They are generally simpler and thus we wi l l use a feature set wi th a smaller number of attributes. r 22 C h a p t e r 4 S k e t c h R e c o g n i t i o n a n d 3 D M o d e l C o n s t r u c t i o n This chapter w i l l discuss the main contributions of our work: a system for sketch recognition and 3D model construction. Sketch recognition consists of assigning feature labels to individual pen strokes, i.e., identifying the object features they corresponded to. It includes two phases: training and learning. We have chosen K-means method for stroke classification and we w i l l show how we select features and apply this classifier to sketch recognition. Lastly, we wi l l go through the details of 3D model construction with a generic example. 4.1 Sketch Recognition: Training 4.1.1 Shapes In our system, each pen stroke symbolizes a single shape, such as a single curve or triangle. The advantage is that it facilitates shape recognition. The disadvantage is that the system requires the user to sketch features in a fairly specific fashion. We define a shape set for typical symbols, including curve, ellipse, triangle, square, open rectangle, as shown in Figure 4.1. We regard a line as being a kind of curve. 23 Figure 4.1: Shapes of basic symbols: arc, ellipse, triangle, square, open rectangle. We use x and y values of the stroke points to determine its shape style, which includes two steps. For curve and open rectangle, the distance between the begin point and the end point is quite faraway; whereas for ellipse, triangle and square, the begin point is close to the end point. In this way, we can categorize the five shape types into two classes: the one with open stroke and the one wi th closed stroke. The next step is to employ geometrical properties of different shape type to distinguish them. Every triangle, right-angled or not, w i l l have at least two acute angles. A square contains four right angles and an open rectangle takes two angles, whether acute or obtuse. O n a stroke, the lines between a point and its two neighbouring points form an angle. This three points consist of a triangle, length of each edge can be calculated and then the cosine value of this angle. Since sketch is drawn freehand, it may incur a lot of minor noises, for instance, a line may contain some sharp angles. A good way to avoid those noises is to include more neighbouring points for calculation. W i t h this method, angle values on a stroke can be computed one point by one point. These values change smoothly for curve and ellipse because they don't have such obvious angle traits as triangle, square and open rectangle. 4.1.2 Choice of Features Four features as the intrinsic properties of a stroke are denned to interpret one single stroke, including: • Feature 1 ( / i ) : Normalized location in x direction. 24 • Feature 2 (f2) : Normalized location in y direction. • Feature 3 (fs) : Drawing time. • Feature 4 (f±) : Drawing shape. Once an object sketch is drawn, an axis-aligned bounding box for the whole sketch is computed. As shown in Figure 4.2, the sketch bounding box is illustrated in dotted lines. The normalized x value (feature / i ) is computed as a f\ = x'/xb0, where x' is.the distance from the center of stroke bounding box to the left side of the sketch bounding box, and x0b is the bounding box width. In the same manner, h = V'/Vbb- The drawing time (feature fs) is measured as the total time between first and last sample point. The last feature used for stroke classification is drawing shape (feature fi) which is categorized according to our shape set denned in Figure 4.1 and calculated from the method we discribed in 4.1.1. We assign a unique integer value to each shape type: 0 to curve, 1 to ellipse, 2 to triangle, 3 to square and 4 to open rectangle. In this way, al l the features can be parameterized. Figures 4.3 and 4.4 show the sketched components of face and rocket that wi l l be recognized by our system. A face consists of 9 strokes drawn in any order: face, left eye, right eye, left eyebrow, right eyebrow, nose, mouth, left ear and right ear. A rocket is described by 5 strokes that can be drawn in any order: head, body, tail , left wing and right wing. Numbers in the figures correspond to the drawing order for each stroke for the example database, which provides a labelled set of training data. Detailed information is listed in Table 4.1 for face and Table 4.2 for rocket. In these tables, we also show possible drawing shapes for each component and the required shapes for some component during recognition. We set required shapes to facilitate recognition. After we aquire all the features for each stroke, we use a K-mean classifer to distinguish them. 25 Figure 4.2: .Features of head used for classification. Order Components Possible Shapes Required Shapes 1 face ellipse ellipse 2 left eye ellipse ellipse 3 right eye ellipse ellipse 4 left eyebrow curve curve 5 right eyebrow curve curve 6 nose curve, triangle, ellipse 7 mouth curve, ellipse 8 left ear curve, ellipse 9 right ear curve, ellipse Table 4.1: Description of components for face. Order Components Required Shapes 1 head triangle 2 body open rectangle 3 tai l open rectangle 4 left wing open rectangle 5 right wing open rectangle Table 4.2: Description of components for rocket. 26 4 •* *• 3 *• 9 *• 6 7 Figure 4.3: Illustration of face components. 4.1.3 Sketch Recognition: Matching The gesture recognition can be regarded as classification problem for which we can apply K-means scheme as described in Chapter 3 . There are C classes for classification, C = 9 for face and C = 5 for rocket. Each gestures in one class should have some similarity and can be clustered together. 4.2 3D Model Construction In the modeling phase, the recognized strokes are used to construct 3 D models. Once classified, a stroke plays a key role in determining model's geometrical construction. For our current system, curves and ellipses are rendered into ellipsoids, trianglea become cones and rectangles into cylinders. Figure 4 . 5 illustrates parameters used for modeling cylinder and ellipse sketches into 3 D models. There are 4 parameters for modeling a tapered cylinder: rl, r2, h and nslices. h represents cylinder's height, rl and rl are the radius of two ends of tampered cylinder, nslices is the the number of polygons used to represent the cylinder, which can be considered to be open on either end. If the shape of a sketch is triangle, r l is zero. The parameters for representing ellipsoid are: r, h and nslices. The cross section of an ellipsoid along z axis is an ellipse wi th its minor axis value to be r and major axis value to be h. A l l the radius and height values are taken from axis-aligned bounding box around a stroke. After sketch recognition in the previous stages, we have determined each component and its shape for an object. Because we have a priori knowledge about what k ind of object is being modeled, once we identity each stroke, we can locate their position related to other components, for instance, nose and eyes are on the face while ears are at the two sides of a face. The following is an example description of how we model a face from raw sketches. Figure 4 . 3 shows 9 strokes of a face. Stroke 1 is the head outline and 28 / h n s l i c e s x h x Z Figure 4.5: Model ing parameters for tapered cylinder and ellipsoid. the resulting 3D head is modeled using an ellipsoid. Stroke 2 is the left eye and is modeled using an ellipsoid. Stroke 4 is left eyebrow modeled using a 3D curve. We assume eyes, eyebrows, nose and mouth are all located on the surface of the head and that the ears are located on the two sides of the head. Even i f the ear strokes are not drawn in this way, we st i l l model them as a normal face. Since they are related to head model, the z value of these models should be calculated from the model of the head outline. For the head, the cross section along the z and x axis of an ellipsoid is an ellipse and the cross section along y axis is a circle. The position values (x, y, z) of any point on an ellipsoid can be computed from ellipse and circle mathematical equations. In the next chapter, we wi l l show some concrete examples of human faces and rockets that go from raw sketches to 3D models. 29 Chapter 5 Results We test our methods wi th sketch-based modeling of simple faces and rockets. Results from our system are demonstrated in this chapter. We show examples of both successful and unsuccessful sketch-based modeling. We assume that a face always takes 9 strokes and a rocket wi th 5 strokes, otherwise, the system w i l l not work properly. 5.1 Face Sketching Figure 5.1 shows three examples of faces drawn and recognized by our system. We show examples of both successful and unsuccessful sketch-based modeling. From left to right are the sketches, front view of the 3D models and a rotated view of the 3D models. Figure 5.3 shows two faces drawn in our system wi th incorrect recognition. Table 5.1 list the drawing components and recognized ones for these two examples. In example one, "mouth" is mistaken as a "left ear" because these two strokes are drawn in the same shape and other features data are similar and therefore confuse the classifier. In example two, "left eye" is wrongly recognized as "left eyebrow" because they are drawn too close to each other. One possible solution is to make each component unique during recognition. Each component can be marked and if 30 E x a m p l e 1 E x a m p l e 2 Drawing Drawing Recognized Drawing Recognized Order Components Components Components Components 1 face face left eyebrow left eyebrow 2 left ear left ear right eyebrow right eyebrow 3 right ear right ear left eye left eye 4 left eyebrow left eyebrow right eye right eyebrow 5 left eye left eye nose nose 6 right eyebrow right eyebrow mouth mouth 7 right eye right eye face face 8 nose nose left ear left ear 9 mouth left ear right ear right ear Table 5.1: Comparisons of drawing and recognized face components wi th incorrect recognition. there is another component also marked as "left eyebrow", we can choose the label wi th the least error according to that classifier. 5.2 Rocket Sketching Three examples of rockets are shown in Figure 5.2. From left to right are the raw sketches, front image of rendered rockets and rocket images after rotation. 5.3 Discussion We set up a training library for sketches including face and rocket, the size of database wi l l affect recognition correctness. From Chapter 3, we know that the more examples the database contains, the more accurately the sketches w i l l be recognized. Our training data consists of 40 face skteches and 30 rocket sketches. Classifiers are obtained after applying K-means clustering to this data. 31 32 3:3 Figure 5.3: Face modeling with incorrect recognition. 34 Chapter 6 C o n c l u s i o n s a n d F u t u r e W o r k The system has been tested on simple drawings such as human faces and rockets. In order to make the system work for a new class of objects, we have to specify how many components this object has and from some training examples, collect the shapes that can best symbolize this component. W i t h this information, the system can create 3D models from simple sketches. Our current algorithms and implementation are efficient enough for exper- imental use. However, they can fail or generate unintuitive results when the user draws unexpected strokes. We must devise more robust and flexible algorithms to handle a variety of user inputs. In particular, we plan to enhance the algorithm to allow multi-stroke input. Currently, we assume that each stroke represent one shape. If we would allow pen strokes to represent any number of shape primitives connected together, the system could potentially cope deal wi th more complicated sketches. Another important research direction is to develop additional features to support a wider variety of shapes wi th arbitrary topology, and to allow more precise control of the shape. Second, in training phase, we would like to test wi th more classifiers to construct a more accurate classifier for matching strokes. 35 B i b l i o g r a p h y [1] James Arvo and K e v i n Novins. F l u i d sketches: Continuous recognition and morphing of simple hand-drawn shapes. In Proceedings of the 13th Annual ACM Symposium on User Interface Software and Technlogy, November 2000. [2] Berge Belongie, Ji tendra Mal ik , and Jan Puzicha. Shape matching and object recognition using shape contexts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(24) :509-522, 2002. [3] Chris Calhoun, Thomas F . Stahovich, Tolga Kurtoglu , and Levent Burak Kara . Recognizing multi-stroke symbols. AAAI Spring Symposium on Sketch Under- standing, pages 15-23, 2002. [4] F Cohen, Z Huang, and Z Yang. Invariant matching and identification of curves using b-splines curve representation. IEEE Transactions on Image Processing, 4(1):1-10, 1995. [5] M . Gross and E Do. Ambiguous intentions: a paper-like interface for creative design. In Proceedings of UIST 96, pages 183-192, 1996. [6] T Igarashi, S Matsuoka, S Kawachiya, and H Tanaka. Interactive beautification: A technique for rapid geometric design. In UIST'97, pages 105-114, 1997. [7] Takeo Igarashi and John F . Hughes. A suggestive interface for 3d drawing. ACM UIST Symposium on User Interface Software and Technology, pages 173-181, 2001. [8] Takeo Igarashi, Satoshi Matsuoka, and Hidehiko Tanaka. Teddy: A sketching interface for 3d freefrom design. SIGGRAPH 99, Computer Graphics Proceed- ings, pages 409-416, 1999. [9] Michael I. Jordan. An Introduction to Probabilistic Graphical Models. To be published, 2004. 36 [10] T . D . K imura , A Apte, and S. Sengupta. A graphic diagram editor for pen computers. Software Concepts and Tools, pages 82-95, 1994. [11] D . Lamb and A Bandopadhay. Interpreting a 3d object from a rough 2d line drawing. In Proc First IEEE Conf on Visualization, 90, pages 50-66, 1990. [12] James A . Landay and Brad A . Myers. Sketching interfaces: Toward more human interface design. IEEE Computer, 34(3):56-64, 2001. [13] H a n L i . Freehand sketch recognition system, http://science.unitn.it/ tomasi/ think/pdf/pha.ppt. [14] H o d Lipson and Moshe Shpitalni. Correlation-based reconstruction of a 3d object from a single freehand sketch. 2002 AAAI Spring Symposium on Sketch Understanding, pages 99-104, 2002. [15] Br i an T . Luke. K-means clustering, http://fconyx.ncifcrf.gov/ lukeb/kmeans.html. [16] P. Sinha E . Osuna M . Oren, C . Papageorgiou and T . Poggio. Pedestrian de- tection using wavelet templatess. In Proc. IEEE Conf. Comput. Vision and Pattern Recognition, pages 193-199, June 1997. [17] James V . Mahoney and Markus P. J . Frommerz. Three main concerns in sketch recognition and an approach to addressing the. Sketch Understanding, Papers from the 2002 AAAI Spring Symposium, pages 105-112, 2002. [18] Zhao R . Incremental recognition in gesture-based and syntax directed diagram editor. In Proceedings of InterCHI'93, pages 95-100, 1993. [19] Dean Rubine. Integrating gesture recognition and direct manipulation. In Proceedings of the Summer'91 USENIX Technical Conference, pages 95-100, 1991. [20] Dean Rubine. Specifying gestures by example. SIGGRAPH 91, Computer Graphics Proceedings, 25:329-337, 1991. [21] H . Schneiderman and T . Kanade. A statistical method for 3 dimensional object detection applied to faces and cars. In Proc. IEEE Conf. Comput. Vision and Pattern Recognition, 2000. [22] Moshe Shpitalni and Hod Lipson. Classification of sketch strokes and corner detection using conic sections and adaptive clustering. ASME Journal of Me- chanical Design, 119(2):131-135, 1997. 37 [23] T . F Stanovich. Sketchit: a sketch interpretation tool for conceptual mechanism design. Technical report, MIT Al Laboratory, 1996. [24] T . F . Stahovich, R. Davis, and H Shrobe. Generating multiple new designs from a sketch. Artificial Intelligence, 104(1-2) :211-264, 1998. [25] M . Turk and A . Pentland. Eigenfaces for recognition. J. Cognitive Neuro science, 3(l):71-96, 1991. [26] P. V i o l a and M . Jones. Robust real-time object detection. Second international workshop on statistical and computational theories of vision, 2001. [27] L . Weisman. A foundatoin for intelligent mult imodal drawing and sketching programs. 1999. [28] D . Geman Y . A m i t and K . . Wilder . Joint induction of shape features and tree classifiers. IEEE Trans. Pattern Analysis and Machine Intelligence, 19(11):1300-130596, November 1997. [29] Bo Y u and Shijie Ca i . A domain-independent system for sketch recognition. 2003. [30] Robert C . Zeleznik, Kenneth P. Herndon, and John F . Hughes. Sketch: A n interface for sketching 3d scenes. SIGGRAPH 96, Computer Graphics Proceed- ings, pages 163-170, 1996. [31] Hao Zhang and Jitendra Mal ik . Learning a discriminative classifier using shape context distances. IEEE Computer Vision and Pattern Recognition, pages 242- 247, 2003. 38 Appendix A U s e r I n t e r f a c e The physical user interface of this sketch system is implemented wi th F L T K , a platform independent graphics toolkit. The interface consists of a main display window on the left side and a control panel on the right. Most control actions are focused on drawing gestures and displaying object, we use a two-button mouse for drawing and control. The top panel called "Object Display" contain a number of buttons, the function of which is given below. • sketch: ready for new sketching the object or add more sketches to an rendered object. • solid shading: show object wi th solid shading. • render: render the whole sketches after finished drawing an object. • show normal: display normal on each point of object meshes. • reset: clean the display window and ready to draw other objects. • redisplay: show original sketches. In the middle, there is a animation panel designed for showing some simple animations. Different training and drawing choice buttons are listed at the buttom 39 1 Sketching and Rendering Syste: Object Display | sketch | solid_shading • j render |show_normal j reset | redisplay Animation r — element: face O redisplay O 3D Animation Draw What Training face face racket rocket finish snip detect Open tile '' •~ j Save as /-" Exit Figure A . l : The interface of modeling system. of the control panel. In addition to gestures, we also use a few button widgets for auxiliary operations, such as save and load. 40


Citation Scheme:


Usage Statistics

Country Views Downloads
United States 3 0
Japan 2 0
China 1 5
City Views Downloads
Tokyo 2 0
Denver 2 0
Beijing 1 0
Ashburn 1 0

{[{ mDataHeader[type] }]} {[{ month[type] }]} {[{ tData[type] }]}


Share to:


Related Items