PolyFit: Perception-Aligned Vectorization of Raster Clip-Art viaIntermediate Polygonal FittingbyEdoardo Alberto DominiciB.Sc., Computer Science, University of Pisa, 2017A THESIS SUBMITTED IN PARTIAL FULFILLMENTOF THE REQUIREMENTS FOR THE DEGREE OFMaster of ScienceinTHE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES(Computer Science)The University of British Columbia(Vancouver)April 2020c© Edoardo Alberto Dominici, 2020The following individuals certify that they have read, and recommend to the Faculty of Graduateand Postdoctoral Studies for acceptance, the thesis entitled:PolyFit: Perception-Aligned Vectorization of Raster Clip-Art via Intermediate Polyg-onal Fittingsubmitted by Edoardo Alberto Dominici in partial fulfillment of the requirements for the degreeof Master of Science in Computer Science.Examining Committee:Alla Sheffer, Professor, Computer Science, UBCSupervisorHelge Rhodin, Associate Professor, Computer Science, UBCExamining CommitteeiiAbstractRaster clip-art images, which consist of distinctly colored regions separated by sharp boundariestypically allow for a clear mental vector interpretation. Converting these images into vector formatcan facilitate compact lossless storage and enable numerous processing operations. Despite recentprogress, existing vectorization methods that target such data frequently produce vectorizations thatfail to meet viewer expectations. We present PolyFit, a new clip-art vectorization method that pro-duces vectorizations well aligned with human preferences. Since segmentation of such inputs intoregions had been addressed successfully, we specifically focus on fitting piecewise smooth vec-tor curves to the raster input region boundaries, a task prior methods are particularly prone to failon. While perceptual studies suggest the criteria humans are likely to use during mental boundaryvectorization, they provides no guidance as to the exact interaction between them; learning theseinteractions directly is problematic due to the large size of the solution space. To obtain the desiredsolution, we first approximate the raster region boundaries with coarse intermediate polygons lever-aging a combination of perceptual cues with observations from studies of human preferences. Wethen use these intermediate polygons as auxiliary inputs for computing piecewise smooth vector-izations of raster inputs. We define a finite set of potential polygon to curve primitive maps, andlearn the mapping from the polygons to their best fitting primitive configurations from human anno-tations, arriving at a compact set of local raster and polygon properties whose combinations reliablypredict human-expected primitive choices. We use these primitives to obtain a final globally con-sistent spline vectorization. Extensive comparative user studies show that our method outperformsstate-of-the-art approaches on a wide range of data, where our results are preferred three times asoften as those of the closest competitor across multiple types of inputs with various resolutions.iiiLay SummaryClip-art images, characterized by visually distinct colored regions, are ubiquitous in digital me-dia and typically allow for an unambiguous mental interpretation. Converting them into a vectorrepresentation would allow for a more compact representation, effortless resizing and a wide ar-ray of image processing operations. Unfortunately, existing raster to vector algorithms, often failat consistently producing outputs matching human expectations. In this thesis we present a novelvectorization method which first approximates the input raster with a polygon and then uses thispolygon as auxiliary input in computing the final vectorization. We validate its effectiveness in pro-ducing outputs well aligned with human expectation by running comparative studies against existingalternative methods.ivPrefaceAll the work presented in this thesis is a product of a working relationship between Edoardo AlbertoDominici, Nico Schertler, Jonathan Griffin, Shayan Hoshyari, Alla Sheffer and Leonid Sigal.The implementation of sections 3, 5 and 6 has been done by Edoardo A. Dominici, the imple-mentation of Section 4 was done by Nico Schertler and the extension of the method for anti-aliasedinputs has been done by Jonathan Griffin. Shayan Hoshyari provided fundamental software com-ponents and help with the overall organization. Edoardo A. Dominici was provided with invaluableguidance by Alla Sheffer, Nico Schertler and Leonid Sigal.The user study in section 3 was conducted with the approval of the University of British Columbia(UBC BREB Number H16-02320)vTable of ContentsAbstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiiLay Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ivPreface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vTable of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viList of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viiiAcknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 Algorithm Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83 Intermediate Polygonal Approximation . . . . . . . . . . . . . . . . . . . . . . . . . 113.1 Graph Construction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113.2 Optimal Polygon Computation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124 Spline Fitting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164.1 Primitive Configuration Candidates . . . . . . . . . . . . . . . . . . . . . . . . . . 174.2 Primitive Configuration Classification . . . . . . . . . . . . . . . . . . . . . . . . 184.3 Spline Shape Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215 Regularization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246 Multicolor Inputs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277 Results and Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297.1 Comparison to Prior Work. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 307.2 Comparisons to Manual Vectorization. . . . . . . . . . . . . . . . . . . . . . . . . 31vi7.3 Non-Uniform Shading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317.4 Ablation Studies. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327.5 Performance. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 337.6 Limitations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 348 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38A Perception Studies of Polygon Approximation . . . . . . . . . . . . . . . . . . . . . . 41B Spline Fitting Initialization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43C Quantitative Assessment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44D Algorithm overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46viiList of FiguresFigure 1.1 Given a raster input (a), human observers prefer the vectorizations in (e) createdby PolyFit over those in (b-d), created using the methods of Kopf [23], Potrace[35], and [18] by factors of 10:0, 10:0 and 7:3, respectively, for the examplein the first row, and 9:1, 10:0, 10:0 for the example in the second row. Thispreference persists despite the fact that rasterizing all three vector images in thebottom row would reproduce the input raster exactly. . . . . . . . . . . . . . . 2Figure 1.2 Vector representation: (a) raster input; (b) primitive set (lines in green, Be´ziercurves in blue, C0 corners in red). (c) output vector image. . . . . . . . . . . . 3Figure 1.3 Potrace [35] polygons (b) tend to smooth out details in the input (as highlightedin the overlays shown in the insets), which are not recoverable by the subsequentspline fit. Our method (c-d) preserves details at the polygon and final fit levels. 5Figure 1.4 Examples of straight and non-straight paths in Potrace. The vertices of the pathare shown as dots, and their 1/2-neighborhoods are shown as squares. (a), (b),and (d) are straight, whereas (c) and (e) are not. Courtesy of Potrace [35] . . . 6Figure 1.5 Subset of the vectorization pipeline used by Hoshyari et al. [18] . . . . . . . . 6Figure 2.1 PolyFit overview: given an input image (a), our first polygonal fitting stage(b-c) generates a graph of all possible edges that connect input vertices withina given accuracy threshold (b) and computes the shortest cycle on this graphwith respect to a perception-motivated cost function to obtain a polygonal fitwell aligned with viewer expectations (c). Our spline fitting step (d-e) usesa learned classifier to fit best-suited primitive combinations to each polygoncorner (singe-curve, curve-line, line-line) (d) and computes a best-fitting spline(e) using the obtained set of primitives. We obtain more regular results byregularizing both the polygon and its corresponding spline (f) to obtain the finalvector output (g). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8Figure 2.2 Perceptual cues impacting mental vectorization: accuracy, regularity, simplic-ity, continuity. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9Figure 4.1 The three primitive configuration types used for a polygon section around acorner. Control points shown as disks. . . . . . . . . . . . . . . . . . . . . . . 17viiiFigure 4.2 (a,b) raster input and fitted polygon; (c) corner classification (single curve - or-ange, curve-line - grey, line-line red), (d) final fitted spline with curves colorizedbased on type, (e) vector output. . . . . . . . . . . . . . . . . . . . . . . . . . 17Figure 4.3 Representative fits used to assess the suitability of the single-curve (top row)and curve-line (bottom row) primitive configurations. From left to right: basicboundary constraints, fixing left endpoint, fixing right endpoint. . . . . . . . . 18Figure 4.4 Example of a model with per-corner classification primitives (Curve) fits andassociated binary label. Gray corners are excluded from the training set becausepaired by a regularity. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19Figure 4.5 Left to right: raster input, polygon (with accidental edges highlighted), fittedspline, simplified polygon, and final fit. . . . . . . . . . . . . . . . . . . . . . 22Figure 4.6 We first identify raster regularities (a) and use them to modify the unregularizedpolygon (b) producing a more regular polygon (e). We combine detected rasterregularities with regularities detected on the polygon (f) to regularize the outputspline (g). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23Figure 6.1 Multicolor input processing: (a) raster input, (b) result of processing each regionboundary independently, (c) our result with consistent junction resolution. . . . 27Figure 7.1 Comparison of our method to across inputs of various resolutions (please zoomin to see details). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29Figure 7.2 The percentages of user preference in our study. Each row shows the results ofcomparing our results to those of the respective alternative method. . . . . . . 30Figure 7.3 Comparison against artist-produced vectorizations (left) and the distribution ofvotes from our user study (right). While viewers preferred our results on the toptwo inputs, they preferred the artist result on the raster axe (inset numbers). . . 31Figure 7.4 Vectorizations (right) of raster inputs (left) with fuzzy (anti-aliased) boundaries. 31Figure 7.5 Polygon ablation study: Raster input (left), polygons and vectorizations ob-tained without using accuracy, smoothness, or simplicity cues (middle), andour result using the full energy. . . . . . . . . . . . . . . . . . . . . . . . . . . 33Figure 7.6 Fitting ablation study: Raster input (a), polygon (b), and vectorizations obtainedwithout using one of the primitive configurations (c-d) and our result using allconfigurations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33Figure 7.7 Examples of pairs of our and alternative outputs where our results were consis-tently judged as inferior. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34Figure 7.8 Additional multi-color results across multiple resolutions. . . . . . . . . . . . 34ixFigure A.1 Examples of participant drawn perception study results (b-f) and our outputs (g)on same inputs (a). Top: participants were allowed to use all pixel corners andedge midpoints as polygon corners. Bottom: participants were allowed to usethe same raster boundary corners/midpoints as our method. . . . . . . . . . . . 41Figure C.1 The filter question for the study. All other questions had a similar layout . . . . 45xAcknowledgmentsFirst I would like to express my sincere gratitude to my supervisor, Alla Sheffer, for the continuedguidance and encouragement through the past years. She has always been available to patiently helpme understand the various different ways in which a problem can be seen. I feel fortunate to havehad the possibility to work under her supervision.Next, a unique thanks to Nico for his collaboration and counsel, which was fundamental in get-ting the project to the finish line.Thanks to Jonahatan for his invaluable help during the SIGGRAPH submission deadline.I would like to thank Shayan for the relentless help at the various stages of the project.A heartfelt thank you goes to all my friends in the lab, in particular Chrystiano and Silver, whichmade for a humanly enjoyable work environment.I thank my parents Luigi and Marina for their unconditional love.xiChapter 1Introduction1.1 Motivation(a) Raster Input (b) [Selinger 2003] (c) [Hoshyari et al. 2018] (d) Intermediate Polygon (e) OursArtist-drawn clip-art images consist of distinctly colored regions delineated by piecewise smoothvisually pronounced boundaries, and are ubiquitously used in digital media. Clip-art images can becompactly and losslessly represented in vector form; yet, for a variety of reasons, large numbers ofclip-art images are still created and stored in raster format, e.g. the Adobe Stock Image databasecontains over nine million raster images labeled as clip-art or icons [18]. Vectorizing these im-ages would enable resolution-free reuse of artwork created for legacy displays and facilitate a rangeof operations such as resizing or editing, which are easier to perform on vector rather than rasterdata. Vectorizing this data in a manner consistent with viewer expectations poses unique chal-lenges, motivating the development of algorithms specifically designed for clip-art vectorization(e.g. [18, 23, 35]). Notably, while prior methods [23] successfully segment raster clip-art into re-gions consistent with viewer expectations, vectorizing the raster boundaries between these regionswhile meeting observer expectations remains challenging; existing algorithms still often fail to pro-duce vector boundaries consistent with human preferences (Figures 1.1b-c, 1.1b-d).We present a new approach for vectorization of region boundaries in raster clip-art imagesthat significantly outperforms these earlier approaches, producing results much better aligned withviewer expectations (Figures 1.1e, 1.1e).Given a raster segmentation obtained using off-the-shelf methods [23], we aim to convert the1(a) Raster Input (b) [Kopf and Lischinski 2011] (c) [Selinger 2003] (d) [Hoshyari et al. 2018] (e) Our ResultFigure 1.1: Given a raster input (a), human observers prefer the vectorizations in (e) createdby PolyFit over those in (b-d), created using the methods of Kopf [23], Potrace [35], and[18] by factors of 10:0, 10:0 and 7:3, respectively, for the example in the first row, and9:1, 10:0, 10:0 for the example in the second row. This preference persists despite thefact that rasterizing all three vector images in the bottom row would reproduce the inputraster exactly.raster boundaries between the obtained regions into vector form. This conversion requires comput-ing a set of geometric primitives that jointly capture the input boundary shape as perceived by humanobservers (Figure 1.2). Computing these desirable vectorizations requires determining the numberand type of primitives in the fitted sequence, the locations and types of the transitions between them(C0 corners vs. smooth G1 or G2 transitions), and their individual parameters. Perception studiesand computer graphics literature (Section 2) identify a number of core principles likely to impacthuman perception and affect the set of piecewise-smooth primitives human observers are likely tomentally fit to the given raster data. However, converting these principles into an actionable algo-rithm requires learning the interaction between them and quantifying their combined impact on themental fitting process. Learning this relationship directly from pairs of raw raster images and man-ually vectorized counterparts is impractical [18]: vectorizing a single region requires 30-45 minutesof a computer artist’s time, making construction of a large-scale training set problematic. Further-more, such pairs cannot be created by simply rasterizing vector boundaries; since rasterization islossy, the vectorization output humans expect given a raster image is often not the same as suchoriginating vector data (see Figure 1.1 (bottom) where all vector outputs once rasterized reproducethe input). Additionally, directly searching for the desired vectorization requires operating on ahuge mixed discrete (e. g., number and type of primitives) continuous (e. g., primitive parameters)solution space, which is computationally challenging.We overcome both challenges and successfully compute the desired vectorizations by solving asimplified variant of the vectorization problem first, and then using this intermediate output to guidethe rest of the computation (Section 2). Specifically, instead of directly solving for a piecewisesmooth approximation of the raster input, we first approximate this input using a coarse polygonal,2(a) Raster Input (c) Output(b) Primitive SetFigure 1.2: Vector representation: (a) raster input; (b) primitive set (lines in green, Be´ziercurves in blue, C0 corners in red). (c) output vector image.or piecewise linear, approximation designed to match human expectations (Fig. 1.1d, Section 3).This choice is motivated by the observation that the approximating polylines human observers prefer(Appendix A) are strongly correlated with their preferred piecewise smooth vectorizations in termsof primitive counts, as well as corner and other transition locations. We formulate polygonal fittingas a computation of a shortest cycle over a graph of possible polyline edges, and follow perceptionprinciples and observations gleaned from human-traced polygons (Appendix A) to define the costsof traversing different edges in this graph.We use these intermediate polygons as auxiliary input to compute piecewise smooth vector-izations of the raster inputs (Section 4). We use a learned classifier, trained on a compact set ofmanually annotated polygons, to identify the polygon corners that correspond to final output cor-ners and to identify the piecewise smooth set of primitives best suited for fitting the boundary nextto each polygon corner. We optimize the shape of the selected primitives to obtain a final glob-ally consistent spline. Perception studies point to a strong viewer preference for more regular inputinterpretations [22, 41], stemming from an expectation that observed data conforms to a regular,axis-aligned, and symmetric pattern. We thus enforce the fitted polygons and splines to conformto regular patterns detected on the raster data and regularize near-regular polygon and spline struc-tures (Section 5). The combined method efficiently computes desirable solutions on a wide rangeof tested inputs.We validate our approach by applying it to more than a hundred inputs and comparing theresults to algorithmic alternatives both visually and via an extensive user study (Section 7). Studyparticipants preferred our results by a margin of 3:1 compared to the closest-ranking prior methodacross all tested input resolutions, with the margin growing to 16:1 for images with region sizesbelow 16× 16. We also improve on the closest-ranking prior method [18] in terms of runtimeand generality: our method is an order of magnitude faster, taking about a second to vectorizea typical 64× 64 input compared to over a minute. While the prior method requires separatelytrained models for different input resolutions, we use a single model trained using a compact set of3annotated examples for all resolutions.To summarize, our core contribution is a new, resolution independent method for clip-art vec-torization that significantly outperforms prior art and is particularly well suited for low-resolutioninputs, where existing frameworks perform especially poorly (e.g. Figure 1.1,top). We achieve thisgoal by leveraging the connections between perception-aligned polygonal and piecewise smoothraster boundary approximations.1.2 Related WorkWe build on research across a number of domains outlined below.Approximating Sample Points. Vectorization shares some commonalities with the classical prob-lems of fitting curves or polygons to sample points. Traditional fitting methods [9, 14, 26] approx-imate unordered noisy point samples with smooth spline curves by balancing fit accuracy againstcurve fairness. Computational statistics and operations methods approximate dense input pointsusing coarse polylines, balancing accuracy against edge count [3, 5, 15]. The inputs and goals ofour boundary fitting method are distinctly different from those employed in both settings. Insteadof dense samples, we process raster boundaries with axis-aligned edges; instead of denoising, in-terpolating, or approximating this input within a given accuracy threshold, we seek to recover it’spiecewise smooth interpretation consistent with human perception.Stroke Beautification. Our goals are related to those of methods that seek to beautify, or regularize,artist strokes on free-hand sketches [4, 29, 30]. However, the raster polyline inputs we processstrongly deviate from the assumptions these methods make about the input stroke geometry. Inparticular, the approaches for tangent and curvature estimation these methods employ are ill-suitedfor raster data. As a result, applying these frameworks to raster boundaries leads to inadequateresults [18].Natural Image Vectorization. Techniques for natural image vectorization address two distinct chal-lenges: segmenting the input images into a set of compact, color-coherent raster regions, and fittingthe raster region boundaries using sequences of curve primitives. Most natural image vectorizationmethods focus on the image segmentation stage [11, 24, 33, 38, 42, 45]. Due to the inherent com-plexity of such images, these methods tend to produce large sets of segments with highly irregularboundaries. Consequently, in the absence of interactive human adjustments [20], the boundary fit-ting stage of these methods focuses on simplifying and smoothing these boundaries [1, 7, 13, 21].This lossy smoothing approach is well suited for natural images but fails to preserve the visuallydistinct, piecewise smooth segment boundaries of artist-designed clip-art [18, 47]. Super-resolutionmethods that target natural images [43] perform equally poorly on artist data [18]. Our focus is onvectorization of artist-generated content, which on the one hand is simpler to segment, but on theother hand requires a much more careful boundary analysis and fitting.4(a) Raster Input (b) Potrace Polygon (c) Our Polygon (d) Our VectorizationFigure 1.3: Potrace [35] polygons (b) tend to smooth out details in the input (as highlightedin the overlays shown in the insets), which are not recoverable by the subsequent splinefit. Our method (c-d) preserves details at the polygon and final fit levels.Vectorization of Artist-Generated Imagery. Several prior works focus specifically on vectoriza-tion of artist-generated imagery, including cartoons, binary imagery, pixel art, and clip-art. Zhanget al. [48] and Sykora et al. [39] vectorize cartoon images and animations. Both methods fit bound-aries between pairs of regions using G1 or G2 continuous curves and introduce discontinuities onlyat corners where multiple regions meet. Fatemi et al. [10] upsample binary images using similarcontinuity assumptions. These assumptions do not hold for clip-art images, which often containvisually pronounced C0 corners and curvature discontinuities along individual region boundaries(Figures 1.1, 1.2). Yang et al. [47] employ user-provided discontinuity and corner locations to fit apiecewise Be´zier spline to raster boundaries; providing such locations as input is a time-consumingtask. Our method successfully locates viewer-expected discontinuities algorithmically with no userinput.Dedicated methods target upscaling [8, 28, 37] and vectorization [23] of low-resolution pixel-art images. Kopf and Lischinski [23] focus on resolving topological ambiguities during imagesegmentation. Like earlier methods, they employ G2 curve fitting to vectorize individual boundariesbetween raster regions. This strategy is inadequate on inputs that contain visibly distinct C0 corners(e. g., Figure 1.1).Several commercial packages (e. g. [34, 35, 40, 44]) specifically address clip-art vectorization.As demonstrated by Hoshyari et al. [18], Potrace [35] and VectorMagic [40] often exhibit undesir-able artifacts on typical clip-art inputs; others, e. g. [44] are no longer maintained. ScanFont [34] isdesigned specifically for fonts and uses character recognition priors.Our approach of fitting an intermediate polygon is inspired by Potrace [35], who first computea polygonal approximation of the raster and then replace it with a smooth spline. The polygonalextraction is done by first obtaining a shortest cycle favoring simplicity over accuracy and then byoptimizing the polygon vertices to minimize the rasterization error. The set of possible polygonedges corresponds to all the line segments connecting pixel corners which are deemed straight.A line segment is straight if all the pixel centers contained in the boundary interval bounded by5Figure 1.4: Examples of straight and non-straight paths in Potrace. The vertices of the pathare shown as dots, and their 1/2-neighborhoods are shown as squares. (a), (b), and (d)are straight, whereas (c) and (e) are not. Courtesy of Potrace [35]the endpoints have a distance less or equal than 0.5 from the pixel centers (Figure 1.4). Addition-ally, to account for the path turning onto itself they disallow segments if all 4 turns occurr in theboundary interval (Figure 1.4e). The criteria for straightness allows straight lines to approximateinflections which result in the loss of pixel-sized details (Figure 1.3b), not recoverable in the suc-cessive spline fitting. The accuracy metric used by potrace does not distinguish between inside andoutside distances, while our criteria for determining valid polygon edges better preserves inflectionsand details, which are fundamental to vectorize low-resolution inputs (Figure 1.3c).Hoshyari et al. [18] observe that continuity plays a major role in the mental vectorization processand that the location of C0 corners is key to obtain a vectorization matching human expectations.Unfortunately, as the precision of existing corner detection methods is not high enough for thetask ∼ 90%, they integrate the classifier within a complex perception-motivated pipeline with thepurpose of determining the final corner classifications through a trial and error process (Figure 1.5c).Each attempt removes a subset of C0 corners which paired by regularities and computes a newboundary vectorization which is compared against the older one and discarded if the total energy(simplicity and accuracy) is higher. The impact of removing a C0 corner is not generally localwhich makes the processes very expensive, requiring the curve primitives to be refit globally. Ourmethod performs this operation only once, after determining the discontinuities with a much greaterFigure 1.5: Subset of the vectorization pipeline used by Hoshyari et al. [18]6accuracy ∼ 99.3%.The initial corner prediction uses a classifier trained on manually annotated raster boundarieswhere each vertex is associated with a feature vector encoding the turns in a variable-sized neigh-borhood of the vertex. The size of the neighborhood is dependent on the input resolution, requiringpredictions to interpolate probabilities between the nearest resolutions. Due to the size of the neigh-borhood, the users were able to view the shapes in their entirety, introducing bias due to recognition.This made it more difficult to increase the precision of the corner predictor by simply adding moredata.Our two-step vectorization process is able to more accurately pinpoint viewer-expected cornerlocations and achieves better fitting quality (Figs. 1.1, 1.1), at a fraction of the compute time (Sec-tion 7). Our validation study participants preferred PolyFit results over those of Hoshyari et al.75.5% of the time (Section 7).7Chapter 2Algorithm OverviewPolygonal Fitting(a)Spline Fitting Regularization Vector OutputInput Raster Image(b) Graph (c) Polygon (d) Primitive Fitting (e) Fitted Splines Regularized Polygon (f) Regularized Splines (g)Figure 2.1: PolyFit overview: given an input image (a), our first polygonal fitting stage (b-c)generates a graph of all possible edges that connect input vertices within a given accuracythreshold (b) and computes the shortest cycle on this graph with respect to a perception-motivated cost function to obtain a polygonal fit well aligned with viewer expectations(c). Our spline fitting step (d-e) uses a learned classifier to fit best-suited primitive com-binations to each polygon corner (singe-curve, curve-line, line-line) (d) and computes abest-fitting spline (e) using the obtained set of primitives. We obtain more regular resultsby regularizing both the polygon and its corresponding spline (f) to obtain the final vectoroutput (g).The input to our method is raster clip-art images, which are expected to consist of a number ofdistinctly colored regions with clearly identifiable inter-region boundaries. To vectorize this inputwe first segment it into single-color regions using the framework of [23]. Our core focus is tosubsequently convert the raster, or piecewise axis-aligned, boundaries between the resulting regionsinto piecewise smooth curves consistent with viewer perception.Output Properties. Before describing our algorithm, we discuss the main criteria that prior per-ception and computer graphics research [18, 22, 41], identifies as likely to impact the human mentalvectorization process: accuracy, simplicity, regularity, and continuity (Figure 2.2). We subsequentlyemploy these criteria in our vectorization framework.Accuracy predicts that the vectorizations humans envision are geometrically close to the inputraster boundaries. Promoting accuracy encourages vector outputs that are maximally close to theraw raster input, and in particular ones that, when rasterized, reproduce (or nearly reproduce) thisinput [18]. The simplicity principle of Gestalt psychology indicates that human observers prefer sim-pler, more compact explanations of observed phenomena [41]. In the context of vectorization, sim-8Accuracy Regularity ContinuitySimplicityFigure 2.2: Perceptual cues impacting mental vectorization: accuracy, regularity, simplicity,continuity.plicity argues for an output that consists of a minimal number of geometric primitives, for prioritiz-ing straight lines over curves, and for preferring curves with smaller curvature variation [4, 18, 29].Prior research emphasizes that human observers are highly sensitive to inflections, or curvature signchanges, that are present in the output and are not self-evident in the input [4, 29]. The simplicityprinciple also indicates that observers prefer to group data into regular patterns. Research indicatesthat viewers expect observed exact or approximate regularities such as parallelism, symmetries, oraxis-alignment present in the raster input to be preserved in the vector output. Finally, and critically,studies indicate that observers are likely to mentally convert raster region boundaries into piecewisecontinuous curves, with a small number of C0 corners or abrupt curvature changes [4, 41].Given a raster input, we expect observers to balance these, frequently conflicting, properties tomentally conjure a piecewise smooth output.Boundary Vectorization. In the general case, our method seeks to approximate an axis-alignedpolyline raster boundary network using a connected network of spline curves, each consisting of asequence of curve primitives. For simplicity, the formulation presented here and in Sections 3 and 4addresses vectorization of a single closed raster boundary. Section 6 describes the extension of thismethod to the case of multiple regions separated by a network of boundaries. The main differencebetween this setup and the single boundary one is the need to resolve fitting inconsistencies thatmay arise near network junctions formed by three or more boundary segments (Figure 6.1).Given a raster boundary defined as a sequence of vertices located at pixel corners, we seek tovectorize, or fit, it using a piecewise smooth spline curve, defined using a sequence of primitives.Each spline primitive is expected to approximate a segment of the input boundary. Neither thenumber of primitives, nor the segments they each correspond to, are known a priori, and both thelengths of the segments approximated by each primitive and the primitives’ geometry can varysignificantly (see Figure 1.2). Consequently, searching directly for the desired vectorization fora given raster boundary requires operating on a very large mixed discrete and continuous solution9space which includes the number of output spline primitives, their types, their geometric parameters,and the locations of their endpoints. More importantly, while prior research identifies the coreprinciples behind the vectorizations we desire, it does not quantify the specific balance requiredamong them.We sidestep the challenges of directly learning this balance from raster-vector pairs by using atwo-step vectorization process (Figure 2.1). Instead of computing a piecewise smooth vectorizationdirectly, we first compute a polygonal approximation of the raster input using the same perceptualcriteria. Since this setting is much simpler, we can empirically define a fitting cost function and acorresponding algorithm (Section 3) that jointly lead to polygonal approximations consistent withobserver expectations (Figure 2.1c).We conjecture that our fitted polylines provide a rough approximation of the piecewise smoothvectorizations we seek. In particular, we note that the viewer-perceived corners on the piecewisesmooth vectorization are typically a subset of the corners on these polygonal approximations, andthat the slopes of the polyline edges are strongly correlated with their adjacent spline tangents (Fig-ure 2.1c and g). We use these observations to define plausible spline primitives connecting consec-utive polygon edge midpoints, and to learn the perceptual mapping from such corners to their bestfitting primitives (C0,G1 or G2) based on the properties of these fitted primitives. We perform thisclassification using a random forest classifier learned from a compact set of annotated combinationsof consecutive polygon edges, their corresponding raster segments, and locally fitted primitives.We then optimize the geometry of the computed primitive sequence to best fit the raster input.The combined algorithm (Section 4) reliably produces the desired piecewise smooth vectorizations(Fig. 2.1g).10Chapter 3Intermediate Polygonal ApproximationWe formulate the extraction of an intermediate polygonal approximation as the computation of anoptimal energy-minimizing cycle in a directed weighted graph G = (V,E), where the vertices Vrepresent candidate corners and the edges E represent candidate polygon segments (see Fig. 2.1b).Computing an optimal cycle in a directed weighted graph is known to have cubic complexity in thesize of the processed graph [36]; to facilitate efficient computation, we keep the size of the graphswe operate on small by only including vertices and edges which are reasonably likely to be part ofthe final polygon (Sec. 3.1). We then compute an optimal cycle on this graph that balances accuracy,simplicity, and continuity (Sec. 3.2, Fig. 2.1c) via a graph construction that reduces the problem offinding a cycle that minimizes an energy function defined on pairs and triplets of edges to a classicalshortest-cycle problem.3.1 Graph ConstructionOur perceptual study, in which users were asked to draw approximating poly-gons for given raster inputs (Appendix A), indicates that viewers expect thevast majority of corners of the approximating polygon to be a subset of thecorners of the input raster boundary;we define a boundary vertex as a cornerif the raster edges emanating from it are orthogonal to one another. The onlynotable exception some study participants made was to place corners at thecenters of one- or two- pixel long raster segments surrounded by symmetricalcorners (see inset). We speculate that this choice is motivated by the desirefor more symmetric outputs. Following these observations, and to keep the graph compact we de-fine the set of graph vertices V to include the raster boundary corners as well as the midpoints oflength-one and -two raster segments.Our study indicates that viewers expect polygon edges to closely follow the raster shape, butallow for edges whose rasterization deviates from the boundary as long as the Manhattan (axis-aligned) distance from all pixel corners to the edge is below one and all missclassified pixels (high-lighed in the inset) are to one side of the edge; we view a pixel as missclassified if it is inside (to the11left of) the raster boundary but its center is outside (to the right) with respect the edge, or vice versa(using the counter-clockwise orientation convention).We consequently connect each pair of vertices (vm,vn) with an edge em,n ∈ Eif and only if the straight line lm,n = {vm,vn} between them satisfies thesecriteria. We discard all edges that violate the Manhattan distance criterionand require all misclassified pixels to lie to one side of the line, discardingedges that do not satisfy this property. We speculate that observers relaxthe strict raster reproduction constraint and allow edges that lead to pixelmissclassification, to accommodate inputs containing rasterizations of veryshallow lines which are characterized by long pixel runs in a single direction,followed by a single step in the orthogonal direction (as in the inset). In allsuch cases the misclassified pixels lie to one side of the line. We set the directions of the edges to becounter-clockwise with respect to the region they bound. We compact the graph by merging edgeswith a 180◦ angle between them as they are perceived as a single line. The existence of such edgesalso affects the correct energy computation as described in the next chapter, which could also affectthe correct placement of polygon vertices.3.2 Optimal Polygon ComputationIn assessing the optimality of a polygon P = ei j,e jk, . . . ,emi, we evaluate it with respect to thethree perceptual criteria identified in Sec. 2, accuracy, simplicity and continuity. When assessingsimplicity we evaluate both edge count and curvature variation. Since we operate on polygons,we use discrete metrics to measure both continuity and curvature variation, expressing those interms of polygon angles. We note that both metrics can only be meaningfully compared for anglesthat are sufficiently flat (in the case of continuity) or sufficiently similar (in the case of curvaturevariation), motivating us to use saturation limits for both (lang = 135◦ for continuity, and lcont = 60◦for curvature variation).We measure the accuracy of each edge with respect to the raster boundary using the previouslycomputed set Pi j of misclassified pixels. For each such pixel we measure the degree to which apixel violates the in/out partition criterion using the distance d(p) from the centers of the wronglypartitioned pixels p to the line. We define the overall per edge accuracy error as:facc(ei j) =0 P = /02 ·maxp∈Pi j d(p) otherwise. (3.1)Note that facc ∈ [0,1] for all edges, approaching 1 when the Manhattan distance is violated. Edgesthat conform perfectly to the raster have zero energy.We use the angles at the polygon corners Ai jk = ei je jk as a discrete proxy for continuity. Notethat Ai jk is the smaller angle out of interior and exterior one at the corner. We prefer flatter angles12and thus more continuous transitions by setting:fcont(ei j,e jk) = min(1,180◦−Ai jklang), (3.2)We express simplicity as a preference for minimal change in curvature and forsmaller polygon edge counts. In our discrete case, we measure curvature change asthe similarity between consecutive angles. We differentiate between cases where thecurvature sign is the same (measured angles are on the same side of the polyline) andinflection scenarios, where the curvature sign changes (measured angles are on opposite sides), (seeinset). We distinguish between these two scenarios since as noted in Sec. 2 viewers are particularlysensitive to inflections.fcv(ei j,e jk,ekl) =min1,|Ai jk−A jkl|lcontif same sidebinfl+(1−binfl)180◦−min(Ai jk,A jkl)180◦−linfl if inflection(3.3)To minimize the appearance of inflections (independent of magnitude) we add an absolute inflectionpenalty of binfl = 0.1 for each inflection edge and use the smaller of the two angles (i. e., the moreacute one) to assess the inflection’s magnitude. We empirically set the saturation limit on thismagnitude to linfl = 90◦. This formulation avoids inflections when possible and prioritizes moregradual side changes over more abrupt ones if necessary.Simplicity also argues for reducing the number of polygon edges. To this end, we assign asmall penalty ε = 1e−3 to all graph edges, and further penalize redundant pixel-long edges. Weconsider such edges as redundant when they contain previously inserted mid-points. As noted byprior research [18] and confirmed by our study (see Appendix), viewers prioritize approximationthat incorporate such midpoints as corners, and rarely utilize redundant short edges.fsimp(ei j) =0.5 if ei j is redundantε otherwise (3.4)The combined polygon cost is the sum of these terms over all edges, and pairs and triplets ofconsecutive edges, where we weight curvature variation less than all other terms (ωcv = 0.25, unitweight otherwise):C(P) = ∑i jfacc(ei j) + ∑i jkfcont(ei j,e jk) + ωcv∑i jklfcv(ei j,e jk,ekl) + ∑i jfsimp(ei j) (3.5)13Optimization. Classical shortest cycle computations account for weights of edges, but do not allowfor the evaluation of costs of pairs or triplets of edges in a cycle. We therefore reformulate theshortest cycle computation on G as the problem of finding the shortest cycle on a graph G˜ = (N˜, E˜)whose nodes n˜i jk correspond to triplets of vertices vi,v j,vk in the original graph G. We introduce anode n˜i jk into this graph if the original graph G contains the edges ei j = (vi,v j) and e jk = (v j,vk).We then connect each node pair n˜i jk and n˜ jkl with an edge e˜i jkl . Note that, by construction, theseedges correspond to quadruplets of vertices in the original graph that are connected by sequences ofthree edges. We define the cost of each such edge asC(e˜i jkl) = facc(ei j)+ fcont(ei j,e jk) (3.6)+ωcv fcv(ei j,e jk,ekl)+ fsimp(ei j)The shortest cycle on this graph defines a cycle on the original graph that minimizes the polygoncost C(P).Since the best known method for computing the shortest cycle in a graph has a time complexityof O(|N˜||E˜|+ |N˜|2log||N˜|), we efficiently compute a sufficiently close approximation by repeatedlyemploying the shortest path algorithm for a fixed start node. The initial path is obtained by findingthe shortest cycle passing through the vertex with minimum valence, in order to increase the likeli-hood of the node being part of the shortest path. We then attempt to find a better path from the nodethat is most distant (in terms of number of edges) from the previous start node. If the proceduredoes not improve the path, we choose a random new start node for the next attempt to exit possiblelocal minima. This procedure is repeated for a total of 10 times. While this empirical approachdoes not guarantee that the shortest cycle is found, it effectively reduces the time complexity toO(|E˜|+ |N˜|log||N˜|). We compare the results obtained by this approach to the real shortest cycle forone of our binary dataset (144 inputs) and confirm that it always finds the correct solution.Optimal Parameters The set of parameters used for the polygon extraction has been obtained em-pirically by observing the impact of their variation on a reference binary dataset consisting of 144models, which included the training set used by Hoshyary et al. [18]. We conducted a parametersensitivity study, observing that varying the parameters by a 10% has a noticeable impact on around20% of the inputs. Although we didn’t assess the following statement quantitatively, we observedthat the visual quality didn’t necessarily get worse with roughly the same number of models im-proving as the ones decreasing in quality. This, together with the observation that manual tracesthemselves differ between a few participants (Appendix A), supports the decision in deciding em-pirically on the parameter values. For the sake of completeness, if such a ground truth existed, thatis a collection of boundaries BT and target polygons PT , the optimal set of weights and parametersW could have been obtained by minimizing the following energy functional:argminW|T |−1∑i=0|C(BT i,S(BT i))−C(BT i,PT i)| (3.7)14Where S(BT i) is the shortest path computed on the graph with the variable parameters and C isthe sum of the cost for each polygon edge as described in Equation 3.6. The optimization could becarried on once by an exhaustive search in the parameter space in a reasonable time.Perfect lines Even with the procedures described in the Paragraph 3.0.1 it is still possible that thegraph reaches an unnecessarily large complexity for inputs of higher resolution. We observe thatthe highest density in the graph is present in boundary subpaths which correspond to rasterizationsof perfect lines. Once such a subpath has been identified, it is possible to include in the graph onlythe first and the last two vertices, excluding all the intermediate ones. This yields a constant numberof 4 possible candidate edges within that subpath rather than n! for a subpath of length n oriented at45◦. The procedure can be applied for perfectly rasterized lines at any slope.15Chapter 4Spline FittingThe second stage of our algorithm fits a closed, piecewise smooth spline S to the extracted polygonsuch that the resulting outline is perceptually consistent with the input raster boundary R. Comput-ing this fit requires solving for several sets of variables: the sequence of primitives (defined by theirtype) that jointly define the spline, the mapping between the primitives and the raster boundary seg-ments they fit to (specifically the correspondences between primitive endpoint and raster boundarylocations), and finally the shape parameters of these primitives (control point locations). Methodssuch as [18] iteratively update the different sets of variables through trial and error until a satisfac-tory solution is found, a highly time-consuming process. We uncouple the computation of the threesets of variables and enable a fitting method that is both effective and fast by utilizing the geometricinformation provided by the intermediate polygon.We recall that both the polygon and the spline are expected to accurately ap-proximate the input raster and that the tangents of the intermediate polygon edgesare expected to strongly correlate with the tangents of the final spline. In partic-ular, we expect the spline to pass close to polygon-edge midpoints and for splinetangents in the vicinity of these midpoints to be similar to the edge tangents. Thisobservation suggests using edge midpoints as natural G2 transition points betweenprimitives and constraining primitive tangents at these midpoints to align with polygon edge tan-gents. In the following, we refer to each polygon section between consecutive midpoints as a poly-gon corner (see inset green) and denote their corresponding raster segments and spline sectionsas corner segments (purple) and corner sections (orange), respectively. We further note that bothpolygonal and spline fits are expected to provide a similar balance between continuity and simplic-ity. Thus, we expect corner spline sections to be at least as continuous as their matching polygoncorners (namely to have at most one C0 discontinuity) and expect them to respect the upper boundon the number of primitives imposed by the polygon corner (i. e., two).Following these observations, we identify three corner-spanning primitive configurations thatreflect all different balance choices between simplicity and continuity and that satisfy the criteriaabove (Sec. 4.1, Fig. 4.1). We obtain the sequence of primitives that best approximates the inputraster boundary by searching for the primitive configurations at all polygon corners that are best16(a) Single-Curve (b) Curve-Line (c) Line-LineFigure 4.1: The three primitive configuration types used for a polygon section around a corner.Control points shown as disks.(a) Raster Input (d) Fitted Spline (e) Vector Output(c) Corner Classification(b) PolygonFigure 4.2: (a,b) raster input and fitted polygon; (c) corner classification (single curve - orange,curve-line - grey, line-line red), (d) final fitted spline with curves colorized based on type,(e) vector output.aligned with human expectations. We note that assessing all possible primitive configurations overthe entire spline at once is intractable as the runtime would grow exponentially with the numberof polygon corners. Instead, we successfully compute the desired primitive sequence by assign-ing a primitive configuration to each spline corner section independently using a learned classifier(Sec. 4.2, Figs. 2.1d, 4.2b). To perform the classification we fit the three different primitive configu-rations to each polygon corner and measure the compatibility of each configuration to this polygoncorner and its underlying raster segment across a number of criteria. These measurements are thenused as input for the classifier that determines the best configuration. The local spline fits we use toobtain the measurements balance tangent and positional accuracy terms subject to different bound-ary conditions as discussed in Sec. 4.3. Given the finalized sequence of primitives, we compute aglobally consistent spline that balances accuracy, tangent alignment, and fairness (minimal curva-ture variation) by optimizing the control points of all curve primitives (Sec. 4.3, Fig. 4.2d).4.1 Primitive Configuration CandidatesWe expect the primitive combinations fit to each corner to be at least as continuous and as simpleas the polygon corners themselves. Thus we enforce the spline corner sections to have at mostone discontinuity and to consist of at most two primitives. We require the spline shape to mimicthat of the corresponding polygon corner: spline tangents at section endpoints should be similarto the corresponding edge tangents and the spline tangent near the polygon corner should be closeto the linear average of these two tangents.These expectations lead us to use the following localfitting primitive combinations for each spline corner section (Figure 1.2): (1) a single cubic Be´ziercurve whose tangents approximately linearly interpolate the end-point and corner tangents (the mostcontinuous and simple but potentially least accurate solution); (2) a pair of straight lines aligned17Unrestricted Fixing right endpoint Fixing left endpointUnrestrictedInput Raster ImageVector OutputFixing right endpoint Fixing left endpointsingle-curve single-curve single-curvecurve-line curve-line curve-lineFigure 4.3: Representative fits used to assess the suitability of the single-curve (top row) andcurve-line (bottom row) primitive configurations. From left to right: basic boundaryconstraints, fixing left endpoint, fixing right endpoint.with the endpoint tangents (simple, sufficiently accurate, but discontinuous solution); and (3) acurve-line combination, where one Be´zier curve end point is placed next to the midpoint closest tothe corner and the other is placed symmetrically along the other edge, the line’s tangent matches themidpoint tangent and the Be´zier curve smoothly interpolates between the midpoint tangents. Thelast alternative has the same primitive count as the polygon corner, is G1 continuous, and is likelyto be somewhere between the other two primitive combinations in terms of accuracy.4.2 Primitive Configuration ClassificationOur overall goal is to produce a spline that best balances smoothness, simplicity, and accuracy withrespect to the input raster. To find such a spline, during classification we consider the configurationtypes using an ordering that prioritizes continuity over simplicity. Whenever possible, we use thesingle-curve configuration and downgrade to line-curve, and subsequently to line-line only whenthe first attempted fits are deemed inadequate.When assessing adequacy locally, we lack the global context (position and tangent boundaryconditions at section endpoints ) that comes into play when fitting an entire spline. To account for theglobal nature of the fitting problem when determining the adequacy of each primitive configuration,instead of only assessing one possible fit, we examine three representative fits – each with differentboundary conditions; we then base our decision on the measurements taken across all three fits.In the first representative fit, we allow both section endpoints to move orthogonally to theircorresponding polygon edges and fix the end tangent directions to those of the polygon. In theother two fits, in addition to these constraints, we fix either the left or right endpoint position tothe respective polygon edge midpoint. Jointly, these three options account for plausible additionalconstraints that are imposed by neighboring corner sections during the global fit. Figure 4.3 showsthese representative fits for an example model.18Unrestricted Fixing right endpoint Fixing left endpoint LabelsFigure 4.4: Example of a model with per-corner classification primitives (Curve) fits and as-sociated binary label. Gray corners are excluded from the training set because paired bya regularity.We use these representative fits to obtain a list of features, which we then utilize to categorizethe configurations for each corner section into adequate and inadequate using a random forest clas-sifier [6]. We use the Random Forest implementation provided by Andres et al. [2]. The classifierreceives features from all representative fits at the same time and produces a single binary label. Inselecting the features, we seek to account for cues human observers are likely to use during mentalvectorization.(1) For each smooth fit we include its accuracy (computed as described next) as well as the dif-ference between the accuracy of the smooth and underlying polygonal fits (6 values). We speculatethat both the absolute and relative accuracy play a role – viewers would be more likely to discard afit if it is objectively inaccurate, or if it is far less accurate than the less continuous alternative.(2) We also include the maximal curvature along each fitted spline (3 values). We expect viewers toprefer less continuous but lower curvature solutions when the curvature of the continuous solutionsgrows.(3) Lastly, we provide the polygon corner angle itself (1 value), as we expect it to strongly correlatewith viewer’s mental balance of continuity and simplicity.This choice gives a total of ten features, from which the classifier computes its categorization.Notably, we do not provide the classifier with internal details of the fit such as the specific primitiveconfiguration or raster resolution. This makes the categorization independent of the input resolution,or specific fitting strategy, and thus more general and robust.In measuring accuracy, we perform a more fine-grained analysis than in the polygon fitting stage.We measure both the degree to which the vector output violates the in/out pixel-center classificationimposed by the raster boundary and the actual distance between the raster and the vectorization orpolygon fits. This fine-grained measurement is motivated by the WYSIWIG principle, which indi-cates that humans prefer interpretations closely aligned with the data observed [41]. More specifi-cally, for each pixel edge on the corresponding raster boundary,19we use the axis-aligned distance of its midpoint to the vectorization or poly-gon edge as the pixel’s accuracy measure if the pixel center is correctly clas-sified with respect to the line or spline (see inset top pixel). If the pixel hasthe wrong in/out center classification, we consider the Euclidean distance be-tween the pixel center and the vectorization or polygon edge (Eq. 3.1) (seeinset bottom pixel). More formally, given the axis-aligned distance from thepixel midpoint d⊥ and the Euclidean distance from the pixel center d↗:d =d⊥ if center classification is correct0.5+d↗ otherwise. (4.1)Note that this accuracy metric allows a simple test to determine if the pixel center classification iscorrect (by comparing with 0.5). Given this per-pixel accuracy metric, we define the vectorization’sor polygon edge’s accuracy as the maximum over all corresponding raster pixels.We use the classifier to determine if each of the curve and curve-line configuration is adequate(we view the line-line as the fallback, always adequate option). If the curve configuration is deemedadequate we use it, otherwise we fall-back to the curve-line if it is deemed adequate, and fallbackto line-line otherwise. In cases where the classifier deems the curve configuration as adequate butwith a low confidence (probability < 75%), we select the curve-line configuration if it is deemedadequate with a high confidence (probability ≥ 75%). We found that using confidence in additionto the binary decision leads to more robust classifications.After finding the primitive types for each section, we perform a global spline fit (Section 4.3).We perform a final adequacy check on this global fit by feeding the actual fit measurements to ourclassifier (as three equal representative fits). If the classifier deems any of the corner sections inad-equate, which can happen occasionally due to the more accurate boundary conditions, we replacethe corresponding section configuration with the next one in our preference order.Training and Validation We trained a forest with 100 trees on manually annotated polygon cornersacross 23 input images at different resolutions, exhibiting a wide variety of geometric configura-tions. Figure 4.4 shows an example of an annotated model used for training. Thanks to the genericdesign of our feature space, this small training dataset is sufficient to train a robust classifier, in con-trast to Hoshyari et al. [18], who require more than 100 annotated input images and train separateclassifiers for different resolutions.Our training does not limit the depth of trees in the random forest; however, in practice itproduces naturally compact trees with an average of 40 leaves per tree, which is further evidence thatthe description of corner fit properties using the proposed features generalizes well. This observationis further supported by the validation test we performed on the classifier. From all the trainingsamples we annotated, we use 50% for training and 50% for testing. Our trained classifier achievesan accuracy of 99.3% on the test data set.204.3 Spline Shape OptimizationOur framework performs two types of spline fitting: individual per-corner section fits using differentprimitive combination are computed during the corner classification stage, and a global fit of aknown (closed) sequence of primitives is performed at the end to produce the final result. We firstexplain how we model the individual classification fits and then describe the modifications for thefinal fit.When computing the per-corner fits, we seek spline degrees of freedom (i. e., control points)that minimize the following objective function:argminSFs(S) = ωsacc · fsacc(S)+ωtang · ftang(S)s.t. G1 continuity between consecutive primitives,(4.2)where fsacc promotes proximity to the input raster and ftang promotes alignment with polygontangents. We solve this problem using the Levenberg-Marquardt algorithm[32] while enforcingconstraints by appropriate elimination of variables.Initialization The iterative Levenberg-Marquardt algorithm requires an initial guess. Furthermore,as we outline below, we define the constituent energies based on a set of reference primitive curves.For both purposes, we require a set of initial curves that is sufficiently close to the optimal solution.We find these initial curves using an analytic construction described in Appendix B.Accuracy Energy To assess a spline’s accuracy, we sample the raster boundary corresponding tothe spline at pixel midpoints M, and measure the distances from these samples to the spline. For anefficient optimization, we project all midpoints onto the initial curves, resulting in pairs of pointsand corresponding curve parameter values (p, t). We then linearize the objective, settingfsacc(S) =1|M| ∑(p,t)∈M(S(t)− p)2 (4.3)This energy promotes closeness to the raster boundary. Since the raster is only a very coarse ap-proximation of the vectorized shape, we assign a relatively small weight of ωsacc = 0.01.Tangent Energy We employ a similar sampling-based approach to promote tangent alignment.We define a tangent objective where we prescribe tangents at polygon edge midpoints, corners,and between them. To this end, we first calculate an equidistant sampling T of the initial curvesand assign target tangents that we derive from linearly interpolating the nearest midpoint and cornerangles using arc length. We gather these tangents T consisting of tangent directions d and parametervalues t and setftang(S) =1|T | ∑(d,t)∈T(S˙(t)∥∥S˙(t)∥∥ −d)2(4.4)21(a) Raster Input (c) Fitted Spline (d) Recomputed Polygon(e) Final Fitted Spline(f) VectorOutput(b) OriginalPolygonFigure 4.5: Left to right: raster input, polygon (with accidental edges highlighted), fittedspline, simplified polygon, and final fit.Since these tangents are much more indicative of the expected vectorization than the raster samples,we assign tangent alignment a significantly higher weight of ωtang = 1. To generate the resultsshown in the thesis a sampling density of 8 tangents per pixel has been used, but we have observedsatisfactory results for values as low as 2. Reducing the sampling density of the tangents, andpotentially midpoints, is fundamental in order to vectorize higher resolution rasters.Final Fit The goal of the final fit is to produce a fitted spline consisting of a known set of primitivesthat best balances accuracy, tangent alignment, and simplicity (minimal curvature variation). Toaccount for simplicity across all primitives we augment the optimized energy Fs (Eq. 4.2) with anadditional shape energy term fshape which minimizes curvature variation across all curve primitivesand enforces soft G2 constraints between them:fshape(S) = ffair(S)+ fG2(S) (4.5)We measure curvature variation using the discretization presented by Lu et al. [27] (E˜CV E , Equation(8)) and sum over all Be`zier curves ffair(S) = ∑b∈S E˜CV E(b). To target G2 continuity, we find allspline parameter values TG2 at which two Be`zier curves meet and penalize curvature differences:fG2(S) = ∑t∈TG2(κS(t−)−κS(t+)|κS(t−)|+ |κS(t+)|+ ε)2, (4.6)where κS(t−) and κS(t+) are the left- and right-sided curvatures at t, respectively, and ε = 10−3is a small number to avoid division by zero. Note that the summands of fG2 are approximately1 whenever the two curvatures have opposite signs. This reflects the desire to penalize curvaturedifferences for curves with the same sign as reducing those improves curve simplicity. In inflectionscenarios, however, the change in sign by itself is already disruptive and the degree of disruptivenessdoes not significantly change as the curvature magnitudes change. For the shape energy, we set aweight ωshape = 0.1 that is between ωacc and ωtang.Corner Feedback Loop. Our polygon computation aims to maximize continuity and minimizecurvature change across all polygon corners. Thus, once our classifier determines the C0 corners we22RasterInput(a) (e) (f)(d)(c) RasterRegularitiesPolygonRegularitiesSymmetryParallel OrthogonalContinuationUnregularizedSpline(g) Final Regularized OutputRegularizedPolygon(b)UnregularizedPolygonFigure 4.6: We first identify raster regularities (a) and use them to modify the unregularizedpolygon (b) producing a more regular polygon (e). We combine detected raster regulari-ties with regularities detected on the polygon (f) to regularize the output spline (g).could theoretically rerun the polygon computation to obtain a simpler solution by not optimizingeither criterion at such discontinuous corners. However, such iterative processing is both time-consuming and, in general, unnecessary as it rarely impacts the cycle choices. In our experimentsthe only instances where such iterated processing meaningfully impacted the results was when theC0 corner was adjacent to short (two pixel or less) edges. Even in these cases the impact on thecycle was limited to the immediate vicinity of the corners. Consequently, to avoid expensive andredundant recomputation once corners are detected, we only adjust the cycle locally and only in thepresence of accidental edges (ones of length two or less). In this local path computation, we fixall polygon edges except a small neighborhood of three edges around the accidental edge. We thenrun a shortest path extraction on the graph G between these bounding edges with a modified cost(Eq. 3.5) that does not include continuity and curvature variation costs across the C0 corner andrecompute the spline fit (Figure 4.5).High-resolution vectors As mentioned in the description of the objectives, for accuracy as well astangens we use a sampling based approach which makes the size of linear systems to be solved ineach Levenberg-Marquardt iteration scale linearly with the input resolution. Even using the under-lying sparse matrix representation available in Eigen 3.3 [16] the systems can become prohibitivelylarge, for example resulting in a matrices with tens of thousand of rows for inputs of ∼ 500px.Even if the advantages of our work are predominantly for lower-resolution inputs, our graph con-struction as well as corner classification can be more effective for higher-resolution inputs than thoseof Potrace [35].A more production-oriented implementation of a scalable vectorization algorithm could benefit fromhaving the density of the samples scale with the number of polygon vertices by having only 1 tan-gent sample at polygon edge midpoints and corners. Alternatively, the smoother sections of the finalspline could be substituted with G1 analytic constructions [35] or mostly G2 [46].23Chapter 5RegularizationPerceptual and graphics research suggest that human observers recognize regularities in input data,and expect those to be preserved during fitting or vectorization [18, 25, 31]. We therefore aim topreserve input regularities in our final output. Similar to prior work [18, 25, 31], we focus onsymmetries, parallelism, continuations, and axis-alignment. To differentiate between accidentaland non-accidental raster and polygon level regularities, we only consider axis-aligned edges in theregularity computation if they are at least two pixels long. When regularity conflicts with accuracyor simplicity we prioritize regularity over the other cues, unless stated otherwise. Since regularityenforcement needs to be exact, regularities cannot be addressed by adding soft terms to our polygonor spline fitting energies. Instead, we enforce regularities on the polygon level using a combinationof graph pruning and polygon modification. Though not impossible to enforce symmmetries at thepolygon extraction stage, it would increase the complexity of the polygon extraction to be exponen-tial. On the spline level, we enforce regularities in two ways: we modify the primitive configurationclassifications to reflect detected regularities, and add hard constraints to the final fit optimizationwhen necessary. To keep our classifier training robust and general, we remove all samples from thetraining and test data set that are affected by regularities that impact classification.Orthogonal and Axis-Aligned Edges We orthogonalize polygon corners where one edge is non-accidental and already axis-aligned, if this operation does not decrease accuracy, or simplicity (mea-sured via inflection count). During fitting, we make axis aligned edges whose lengths are at least50% of the extent of the respective bounding box side more prominent by disallowing the use of(single) curve primitive configurations for the two incident polygon corners (thus forcing the methodto use the curve-line or line-line configuration along these edges).Parallel Edges We detect and enforce prominent axis-aligned parallel edges present in the rasterinput (we consider the edges as prominent if the distance between them is no longer than theiroverlap length), if the parallelism constraint is unenforceable at the fitting level otherwise (i.e. theirslopes are not a priri in the same quadrant).24In the fitting stage, we classify edges as parallel using the distance criterion above, and anangle threshold of 20◦ [17] and constrain the tangents of line segments corresponding to thesepolygon edges to the average polygon edge direction. Furthermore, to account for distinctly visibletranslational symmetries, if a pair of parallel edges is connected by exactly one other polygon edge,we make the primitive configurations of this edge’s corners equal (choosing the lower continuityone if they differ).Symmetries Raster data can only capture exact symmetries whose reflection planes have an ori-entation of integer multiples of 45◦. We detect and attempt to enforce all such symmetries at thepolygon level. If two raster symmetries conflict, we prioritize symmetries present a priori in theunregularized polygon, and prioritize longer symmetric boundary paths over shorter ones. We pri-oritize simplicity over symmetry, since raster symmetries are inherently noisy. In the fitting stage,we equalize section classifications for polygon corners that are associated to each other via a rastersymmetry by downgrading the one with higher priority. Further handling in the final fitting stage isnot necessary as symmetric fits emerge naturally from the fitting formulation.Continuations Continuations [41] are parts of a boundary that are interrupted by aprotrusion or other part that does not logically belong to the base shape (see inset). Inall such cases, viewers expect the interrupted curve to be continuous at the expense ofmaking the vector outline discontinuous at protrusion end-points. We detect and enforce continua-tions both at the raster level (axis-aligned) and at the polygon level (arbitrary orientation).We preserve axis-aligned continuations on the raster level, when the edges incidentto the participating corners are non-accidental by forcing all cycles to pass through thecontinuation vertices.We detect continuations of arbitrary orientation using the empty-circle and same-line test from[18]. Instead of only considering pairs of closest concave corners, we follow perception litera-ture [12] and allow continuations between all pairs of concave corners if the continuation does notexceed a total angle of 60◦.We describe a test which we found to be equivalent in practice to the inscribed circle test butdoesn’t require testing intersections between a variable number of edges.We consider all halfedge pairs where the two cornersincident to the continuation are concave and their con-necting line segment stays inside the shape as candidates.We then filter these candidates according to the followingempirically determined criteria. All relevant measurescan be found in the inset figure. For brevity, we use thenotation x′ = pi− x to denote the complement of x. Notethat the angles α,β ,γ exist on both sides of the continuation with potentially different magnitudes.We only denote one of them in the below formulas.25• The distance of the projections (orthogonal to p1 p2) of the midpoint m = 12 (p1+ p2) ontothe lines defined by the participating polygon edges, is d ≤ 1 px · s with a data-dependentscaling factor s = b32 px , where b is the extent of the raster’s bounding box along the longestdimension. This distance measures the degree of alignment of the polygon edges.• Both polygon vertices must not be flat, i. e., α ′ ≥ 30◦, at least one of them must be ≥ 45◦.• Both continuation angles must be relatively flat, i. e., β ≤ 45◦.• The continuation must be distinctly different than the polygon on both sides, i. e., γ ≥ 45◦.Furthermore, the continuation must be straighter than the next polygon edge, i. e., α ′−β ≥−5◦. Note that α ′+β = γ only if the continuation forms a convex angle.• The angle between the participating polygon edges must be flat, i. e., ζ ≥ 110◦ to avoid turnsof the continuation.All halfedge pairs that fulfill these criteria form a valid continuation candidate.Since only one continuation is possible for each halfedge, we greedily choose at most one,prioritizing more apparent continuations. We assign an error φ to each continuation that measureshow much the continuation edges deviate from being perfectly aligned,φ = 10 ·φang+φdistφang = β 21 +β22 +0 continuation is no inflection2 · (β1+β2) otherwiseφdist = d2(5.1)and return the continuation with the smallest error.While locating continuations, we also consider those that would result from moving polygoncorners by one pixel in either direction along the raster boundary. We greedily choose the contin-uations that result in flatter continuation angles and relocate the polygon vertex if a continuationwith an offset was chosen, effectively aligning the vertex pair connected by the continuation to eachother. In the final fitting stage, since the protrusion is effectively separated from the base shape bythe continuation, we assign the line-line primitive configuration to the participating polygon corners.26Chapter 6Multicolor InputsFor simplicity, our description so far addressed the baseline scenario of a drawing consisting oftwo uniformly colored regions separated by a single closed boundary. Real-life images howevertypically contain multiple colored regions.When extending our framework to handle such data, we need to ensure fitting consistency alongshared boundaries, and in particular resolve junctions where more than two regions meet. To enforcethese constraints at the polygon level, we first constrain the polygons to go through the junctions. Weapply this constraint for each region independently by adding junction vertices to the graph vertexset V , even if these are flat, and removing all graph edges that bypass them. We then unify thepolygon subpaths between junctions shared by pairs of regions, by selecting the subpath candidatewith lower polygon cost (Eq. 3.5) and using it for both regions. This process results in a consistentpolygonal network that approximates the input raster. In the immediate neighborhood of a junctionit can happen that flat boundary vertices are seen as midpoints from one region and not the other. Inorder to have a consistent boundary network we use the same decision making as the most regularof the two boundaries (i.e. the one which has midpoints).We compute the spline primitive classifications for each region separately and then enforcethe following consistency rules. We observe that at both valence three and valence four junctions(the only two possible on raster data) we can have at most two continuous spline pairs. Thus if aregion classifies the junction as a smooth primitive configuration type (either single-curve or curve-(a) Raster Input (b) Separate Region Vectorizations (c) With Junction ResolutionFigure 6.1: Multicolor input processing: (a) raster input, (b) result of processing each regionboundary independently, (c) our result with consistent junction resolution.27line), at least one neighboring region needs to classify it as a line-line (C0 corner) configuration.We further note that adding flat vertices to the polygon edge graph, can generate false positives,making the polygon appear smoother than it is. We therefore allow two neighboring regions toboth classify the junction as a smooth configuration only if vectorizing the regions separately wouldgenerate a polygon that uses the junction as a vertex with a smooth configuration type. If this isnot the case, we view the smooth configuration as likely false positive and only allow at most oneof the two neighboring regions to be classified as smooth. In practice, we found that this criterioncan be approximated to a sufficient degree of accuracy by testing if the polygon edge incident toboth regions is axis-aligned. If any of the above criteria are not fulfilled, we adjust the sectionclassification of the most acute polygon section around the junction to the line-line configurationuntil all criteria are met. For all other valence-2 sections shared between two regions, we unify theirclassifications by picking the less continuous one if they differ (this can happen due to differentregularities being present in the two regions).Finally, we perform a global fit for all primitives across all regions. We constrainprimitives along the same polygon subpaths away from the junctions to have thesame geometry for both regions by variable elimination. To produce a water-tightoutput, we replace line primitives of a line-line configuration of a junction sectionwith the primitives of a neighboring smooth fit (see inset, black lines denote polygonedges, black dots denote edge midpoints). This junction resolution still adheres to thecontinuity choices made locally and guarantees gap- and overlap-free vectorizations.To determine the continuity at the junction we use the local classification for eachcolored region, which we observed to be a reliable indicator of the correct continuity,though not perfect (see inset). To find the correct location of the junction at the curvelevel, which could be anywhere within a 1 pixel square centered at polygon vertexwith valence greater than 2, we split the opposite Bezier at the intersection of the linewith the curve. This preserves axis-aligned regularities in the case of two neighboringdiscontinuous regions,28Chapter 7Results and ValidationWe have tested our method on 143 diverse inputs across a range of resolutions, 43 of these are shownin this thesis. We assessed its performance on inputs used by the closest prior method of [18], aswell as on 86 new inputs collected from numerous repositories (e. g. Figures 7.1, 7.8). Our testset contains 52 images consisting of two regions separated by a single closed boundary as well as91 multi-color examples. For images with large single color regions, simply enforcing accuracy isoften sufficient to generate adequate vectorizations [18]. Thus, the more sophisticated mechanismproposed in our work is most necessary for vectorizing low- to medium-resolution data, where[Hoshyari et al. 2018]Ours[Selinger 2003]Raster Input[Hoshyari et al. 2018] Ours[Selinger 2003]Raster Input[Hoshyari et al. 2018]Ours[Selinger 2003]Raster InputFigure 7.1: Comparison of our method to across inputs of various resolutions (please zoom into see details).290% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%[Hoshyari et al. 2018]Potrace [Selinger 2003][Kopf and Lischinski 2011]ours bothother neitherVote PercentageFigure 7.2: The percentages of user preference in our study. Each row shows the results ofcomparing our results to those of the respective alternative method.accuracy alone is not a sufficient criterion. Our test-set is consequently dominated by inputs withregion resolutions below 100×100, with many inputs at low resolutions of 32×32 and less. Visualinspection confirms that our results are consistently well aligned with viewer expectations.7.1 Comparison to Prior Work.We quantitatively asses our results by comparing them to algorithmic alternatives via a comparativeperceptual study. Study participants were shown input images, together with our result and analternative or ground truth result using the following layout. The input was shown at the top andmarked as ‘A’, and the two vectorizations were placed at the bottom and marked as ‘B’ and ‘C’.Participants were then asked to select the result that resembles the input the most: “Which of thetwo images on the bottom ‘B’ or ‘C’ better represents image ‘A’? If both are equally good thenselect ‘Both’, and if neither represent ‘A’ then select ‘Neither’ ”. The answer options were “B”,“C”, “Both”, and “Neither”. We collected answers for each query from ten different participantsusing the protocol described in Appendix A. The results of the study are detailed next.In the study, we compared our results to those produced by the two most highly ranked methodsin the recent study of Hoshyari et al. [18] across 81 inputs (Potrace [35], and [18]). For complete-ness, we also included 28 comparisons to the popular method of [23] on low-resolution color data,the target domain of their method, plus the simple examples in Figure 1.1. The test dataset consistedof 37 inputs with a single region boundary, 28 low-resolution multi-color inputs, and 16 multi-colorinputs of medium to high resolution. We used default parameters for all methods. Representativeexamples of all methods across different resolutions are shown in Figures 1.1, 1.1, and 7.1. Ourcomparisons focus on inputs with region resolutions of up to 100 since, as noted earlier, for higherresolutions, fitting accuracy dominates all other cues, with all methods producing fairly similar re-sults. At lower resolutions, however, especially 16× 16 and below, our method clearly dominatesthe alternatives and was preferred in 94% of the cases over the results of [18].30Input Raster Artist Ours 0%25%50%75%100%Preference Percentage over all Inputsoursbothartistneither109044202710Figure 7.3: Comparison against artist-produced vectorizations (left) and the distribution ofvotes from our user study (right). While viewers preferred our results on the top twoinputs, they preferred the artist result on the raster axe (inset numbers).7.2 Comparisons to Manual Vectorization.Manually vectorizing multi-color inputs is extremely time consuming (as a single region takes over30 minutes to fit accurately [18]). We consequently only test our method against 15 vectorizedbinary images provided by the authors of [18]. Figure 7.3 shows three side-by-side comparisons ofartist-created and our results. Overall, participants ranked our results as on par or better than thosecreated by the artist 55% of the time. When analyzing the inputs on which artist results were rankedas superior (e. g. Figure 7.3c) we conjecture that this preference is due to recognition - as the artistoutputs reflect the author’s knowledge of the depicted shapes.7.3 Non-Uniform Shading(a) Raster Inputs (b) VectorizationsFigure 7.4: Vectorizations (right) of raster inputs (left) with fuzzy (anti-aliased) boundaries.Our focus is on inputs with distinct, uniquely defined boundaries, thus our examples are domi-nated by inputs where each region has a uniform color. However, many clip art images have varyingshading near inter-region boundaries, e. g., due to anti-aliased rasterization (Figure 7.4a). In this31case identifying the optimal location for the boundary itself, becomes a challenge. To extend ourmethod to handle such data we modify the graph construction and accuracy terms used for polygonextraction and smooth fitting.We compute region boundaries to use as a starting point for our algorithm, as follows. Weobserve that the region boundary we use for binary data is an isoline at the average color of twoneighboring regions. This definition directly generalizes to non-uniformly shaded data. We lo-cate all intersections of the isoline with the pixel grid, similar to the first stage of the MarchingSquares algorithm. Since the rest of our algorithm expects axis-aligned boundaries and in orderto preserve sharp features, we connect these points with axis-aligned edges to form each regionboundary. Where the isoline crosses two sides of a pixel rectangle, there are two possible orthog-onal ways to connect the opposite points. Since our graph construction implementation expects asingle polyline, we used a local rule to reduce the number of accidental corners. To decide which ofthe two connections to take we consider the next two possible connections and choose the one thatdoesn’t introduce a corner. While not being the optimal choice globally, it is more consistent alongstraight lines than the random choice. To further improve the quality of the boundary extraction andsubsequent polygon, it would be ideal to collapse tiny steps in long pixel runs.Our accuracy term (see Section 3.1) is based on the in-out partitioning of pixelsimposed by a graph edge and intuitively measures the distance that a pixel has to bemoved to lie in the correct partition. In non-uniformly shaded inputs, this partitioningis not binary but continuous: the line imposes a linear color gradient between the regioncolors onto the pixel space (see inset). Using this imposed gradient, we can equivalently define thepixel accuracy as the distance to the closest location whose imposed line gradient color equals thepixel color. Since the gradient bandwidth b depends on the rasterization method, we use separateper-pixel estimates that we derive from the pixel color and its distance d to the isoline: b = d ·12∆c , where ∆c ∈ [0,0.5] is the pixel color difference to the isoline color, normalized by the regioncolor difference. Doing so, we achieve zero accuracy error for lines that coincide with the isoline.Equivalently to the uniform shading case, the accuracy error of a graph edge is the maximum pixelaccuracy error of all pixels around the isoline whose color is not one of the two region colors.For the accuracy term in spline fitting (4.3), equivalently to the uniform shading case, we samplemidpoints of region boundary edges and use those points in the respective accuracy formulations.7.4 Ablation Studies.Figures 7.5 and 7.6 show the impact of the different energy terms in our optimized functions.Figure 7.5 shows the impact of dropping one of our three perceptual cues (accuracy, smoothness,simplicity) on the intermediate polygons. Figure 7.6 shows the impact of reducing the numberof primitive configurations. As demonstrated, the outputs produced with all features in place aresignificantly better than the alternatives.32Input RasterResult withoutOur resultInput RasterResult withoutOur resultInput RasterResult withoutOur result,Figure 7.5: Polygon ablation study: Raster input (left), polygons and vectorizations obtainedwithout using accuracy, smoothness, or simplicity cues (middle), and our result using thefull energy.(a) Input Raster (b) Polygon (c) No Single-Curve (d) No Curve-Line (e) Our ResultFigure 7.6: Fitting ablation study: Raster input (a), polygon (b), and vectorizations obtainedwithout using one of the primitive configurations (c-d) and our result using all configu-rations7.5 Performance.Our method takes 0.5 and 1.2 seconds to vectorize a single region at resolutions of 32× 32 and64×64, respectively (medians across our binary dataset). The majority of the runtime is distributedapproximately equally between the initial spline fitting and the subsequent corner feedback loop.The polygon extraction stage takes in average between 10% and 20% of the total runtime andincreases as more regularities are present in the input, which require more regularization work onthe polygon. The times are measured on an 8th Generation Intel Core i7 CPU at 3.7 GHz. Thesetimes are dramatically faster than those of [18] who take in average between 30 and 50 times as33Input Raster [Hoshyari et al. 2018]Preferred 7/10 timesOursPreferred 2/10 timesInput Raster [Selinger 2003]Preferred 6/10 timesOursPreferred 3/10 timesFigure 7.7: Examples of pairs of our and alternative outputs where our results were consis-tently judged as inferior.long.Figure 7.8: Additional multi-color results across multiple resolutions.7.6 Limitations.Our evaluation identified 11 inputs on which our results were judged as inferior to those producedby other methods (see Figure 7.7). It is hard to pinpoint exactly the reason of failure for these inputs,some differences can be attributed to regularization strategies, polygon vertex placement or cornersection classification. One other factor which our work does not address directly is recognition –our framework is content-agnostic, while viewers often recognize the depicted shapes and use thisinformation as a vectorization prior. Using such priors can be an interesting future research topic.We designed the graph construction to account for all pixel-level details by carefully constrain-ing polygon edges to include them. If the input is noisy, our method will likely retain unwanteddetails in the final vector output.We are not able to enforce regularities between multiple boundaries in the current framework, lead-ing to suboptimal results in those inputs with clear black outlines (Figure 7.1, 7.8). It would beinteresting to extend the current implementation to handle this type of inputs, but orthogonal to the34vectorization problem we are trying to solve.To handle anti-aliased inputs more robustly, detection of approximate regularities could be exploredas well.35Chapter 8ConclusionWe presented PolyFit a new method for clip-art vectorization, and demonstrated it to produce resultsbetter aligned with user expectations, compared to the existing alternatives. Our method performsparticularly well on lower resolution data, where individual regions have dimensions of 32×32 orless. Contrary to prior work it does not require separate training for different resolutions and issignificantly faster than the closest alternative. Key to the success of our method is the two-stepprocessing, where we compute a polygonal fit using a perception-motivated energy function, andthen use the output of this fit as input to a learned classifier, that facilitates our final vectorization.Our work opens the door for several interesting followups.• Human interpretation of raster clip-art is likely affected by object recognition; combining ourmethod with recognition techniques can further improve vectorization outputs. Similarly, un-derstanding how the shapes are mentally decomposed by the observer into their hierarchicalsubparts is the question we faced when trying to design robust code for detecting continua-tions. While we found our local rules reliable in understanding whether two opposite half-edges are a plausible continuation, we required a global understanding to determine if anotherpart of the shape was preventing us from really seeing it as a continuation.• Our method uses simple criteria for detecting and enforcing regularities; viewer-perceivedregularity detection on raster data is an important stand-alone problem that merits furtherresearch.• Even with a non-sophisticated implementation, our method runs at usable speeds but there isa lot of room for improvement. The shortest cycle computation as described in Section 3.0.2could be made faster by finding the boundary subpath which contains the least number ofedges and find the shortest path for each edge, since the shortest cycle has to go through oneof them, choosing the one with the least cost guarantees that it is indeed the shortest cycle.With a more thorough understanding of how viewers perceive regularities, the regularizationstep could also be made more efficient than a linear trial and error process. Similarly, asmentioned in Section 4, in order to vectorize high-resolution inputs, our corner dection stage36could be combined with more efficient ways to construct G1 or ideally G2 splines for theglobal fitting stage.37Bibliography[1] Adobe. Adobe Illustrator 2017: Image Trace. http://www.adobe.com/, 2017. → page 4[2] B. Andres, S. Kirchhoff, and E. Levinkov. A Fast C++ Implementation of Random Forests,Mar. 2016. URL https://github.com/bjoern-andres/random-forest. → page 19[3] B. Aronov, T. Asano, N. Katoh, K. Mehlhorn, and T. Tokuyama. Polyline fitting of planarpoints under min-sum criteria. In International Symposium on Algorithms and Computation,2004. → page 4[4] I. Baran, J. Lehtinen, and J. Popovic´. Sketching clothoid splines using shortest paths. InCGF, volume 29,2, pages 655–664, 2010. → pages 4, 9[5] R. Bellman and R. Roth. Curve fitting by segmented straight lines. Journal of the AmericanStatistical Association, 64(327):1079–1084, 1969. → page 4[6] L. Breiman. Random forests. Machine learning, 45(1):5–32, 2001. → page 19[7] V. Caselles, R. Kimmel, and G. Sapiro. Geodesic active contours. IJCV, 22(1):61–79, 1997.→ page 4[8] Eagle. Eagle. http://everything2.com/index.pl?node id=1859453, 1997. → page 5[9] G. Farin. Curves and Surfaces for CAGD: A Practical Guide. Morgan Kaufmann PublishersInc., 2002. → page 4[10] M. Fatemi, A. Amini, L. Baboulaz, and M. Vetterli. Shapes from pixels. IEEE TIP, 25(3):1193–1206, 2016. → page 5[11] J.-D. Favreau, F. Lafarge, and A. Bousseau. Fidelity vs. simplicity: a global approach to linedrawing vectorization. ACM SIGGRAPH, 2016. → page 4[12] D. J. Field, A. Hayes, and R. F. Hess. Contour integration by the human visual system:Evidence for a local “association field”. Vision Research, 33(2):173 – 193, 1993. ISSN0042-6989. doi: https://doi.org/10.1016/0042-6989(93)90156-Q. URLhttp://www.sciencedirect.com/science/article/pii/004269899390156Q. → page 25[13] M. A. T. Figueiredo, J. Leita˜o, and A. K. Jain. Unsupervised contour representation andestimation using b-splines and a minimum description length criterion. IEEE TIP, 9(6):1075–187, 2000. → page 4[14] S. Fleishman, D. Cohen-Or, and C. T. Silva. Robust moving least-squares fitting with sharpfeatures. ACM TOG, 24(3):544–552, 2005. → page 438[15] N. Goldberg, Y. Kim, S. Leyffer, and T. Veselka. Adaptively refining dynamic program forlinear spline regression. Computational Optimization and Applications, 58(3):523–541,2014. → page 4[16] G. Guennebaud, B. Jacob, et al. Eigen v3. http://eigen.tuxfamily.org, 2010. → page 23[17] R. Hess and D. Field. Integration of contours: new insights. Trends in cognitive sciences, 3(12):480–486, 1999. → page 25[18] S. Hoshyari, E. Alberto Dominici, A. Sheffer, N. Carr, D. Ceylan, Z. Wang, and I.-C. Shen.Perception-driven semi-structured boundary vectorization. ACM Transaction on Graphics, 37(4), 2018. doi: 10.1145/3197517.3201312. → pagesviii, 1, 2, 3, 4, 5, 6, 8, 9, 13, 14, 16, 20, 24, 25, 29, 30, 31, 33, 44[19] G. Jaklicˇ and E. Zˇagar. Curvature variation minimizing cubic hermite interpolants. AppliedMathematics and Computation, 218(7):3918–3924, 2011. → page 43[20] X. Jun, W. Holger, L. Wilmot, and S. Stephen. Interactive vectorization. In Proceedings ofSIGCHI 2017. ACM, 2017. → page 4[21] M. Kass, A. Witkin, and D. Terzopoulos. Snakes: Active contour models. IJCV, 1(4):321–331, 1988. → page 4[22] K. Koffka. Principles of Gestalt Psychology. International library of psychology, philosophy,and scientific method. Routledge & K. Paul, 1955. → pages 3, 8[23] J. Kopf and D. Lischinski. Depixelizing pixel art. ACM TOG, 30(4):99:1–99:8, 2011. →pages viii, 1, 2, 5, 8, 30[24] G. Lecot and B. Levy. Ardeco: Automatic region detection and conversion. In EGSR, pages349–360, 2006. → page 4[25] Y. Li, X. Wu, Y. Chrysanthou, A. Sharf, D. Cohen-Or, and N. J. Mitra. Globfit: Consistentlyfitting primitives by discovering global relations. ACM TOG, 30(4):52:1–52:12, 2011. →page 24[26] Y. Liu and W. Wang. A revisit to least squares orthogonal distance fitting of parametriccurves and surfaces. In Proc. Advances in Geometric Modeling and Processing, pages384–397, 2008. → page 4[27] L. Lu, C. Jiang, and Q. Hu. Planar cubic g1 and quintic g2 hermite interpolations viacurvature variation minimization. Computers & Graphics, 70:92–98, 2018. → page 22[28] A. Mazzoleni. Scale2x. http://www.scale2x.it/, 2001. → page 5[29] J. McCrae and K. Singh. Sketching Piecewise Clothoid Curves. In Proc. Sketch-BasedInterfaces and Modeling, 2008. → pages 4, 9[30] J. McCrae and K. Singh. Neatening sketched strokes using piecewise french curves. In Proc.EG Symposium on Sketch-Based Interfaces and Modeling, pages 141–148, 2011. → page 4[31] R. Mehra, Q. Zhou, J. Long, A. Sheffer, A. Gooch, and N. J. Mitra. Abstraction of man-madeshapes. ACM TOG, 28(5):137:1–137:10, 2009. → page 2439[32] J. Nocedal and S. Wright. Numerical Optimization. Springer Series in Operations Researchand Financial Engineering. Springer New York, 2006. ISBN 9780387400655. URLhttps://books.google.ca/books?id=VbHYoSyelFcC. → page 21[33] A. Orzan, A. Bousseau, H. Winnemo¨ller, P. Barla, J. Thollot, and D. Salesin. Diffusioncurves: A vector representation for smooth-shaded images. ACM TOG, 27(3), 2008. → page4[34] ScanFont. Font Lab, http://old.fontlab.com/font-converter/scanfont//, 2017. → page 5[35] P. Selinger. Potrace: a polygon-based tracing algorithm. In http://potrace.sourceforge.net,2003. → pages viii, 1, 2, 5, 6, 23, 30[36] S. S. Skiena. The Algorithm Design Manual. Springer Publishing Company, Incorporated,2nd edition, 2008. ISBN 1848000693. → page 11[37] M. Stepin. Hqx. http://web.archive.org/web/20070717064839/www.hiend3d.com/hq4x.html,2003. → page 5[38] J. Sun, L. Liang, F. Wen, and H.-Y. Shum. Image vectorization using optimized gradientmeshes. In ACM SIGGRAPH, 2007. → page 4[39] D. Sy´kora, J. Buria´nek, and J. Zˇa´ra. Sketching cartoons by example. In Proc. Sketch-BasedInterfaces and Modeling, pages 27–34, 2005. → page 5[40] Vector Magic. Cedar Lake Ventures http://vectormagic.com/, 2017. → page 5[41] J. Wagemans, J. H. Elder, M. Kubovy, S. E. Palmer, M. A. Peterson, M. Singh, and R. von derHeydt. A century of gestalt psychology in visual perception i. perceptual grouping andfigure-ground organization. Psychological Bulletin, 138(6):1172–1217, 2012. → pages3, 8, 9, 19, 25[42] C. Wang, J. Zhu, Y. Guo, and W. Wang. Video vectorization via tetrahedral remeshing. IEEETIP, 26(4):1833–1844, April 2017. → page 4[43] Z. Wang, D. Liu, J. Yang, W. Han, and T. Huang. Deep networks for image super-resolutionwith sparse prior. In IEEE ICCV, pages 370–378, 2015. → page 4[44] M. Weber and B. Herzog. Autotrace. http://autotrace.sourceforge.net, 2004. → page 5[45] T. Xia, B. Liao, and Y. Yu. Patch-based image vectorization with automatic curvilinearfeature alignment. ACM TOG, 28(5), 2009. → page 4[46] Z. Yan, S. Schiller, G. Wilensky, N. Carr, and S. Schaefer. K-curves: Interpolation at localmaximum curvature. ACM Trans. Graph., 36(4), July 2017. ISSN 0730-0301. doi:10.1145/3072959.3073692. URL https://doi.org/10.1145/3072959.3073692. → page 23[47] M. Yang, H. Chao, C. Zhang, J. Guo, L. Yuan, and J. Sun. Effective clipart imagevectorization through direct optimization of bezigons. IEEE TVCG, 22(2):1063–1075, 2016.→ pages 4, 5[48] S.-H. Zhang, T. Chen, Y.-F. Zhang, S.-M. Hu, and R. R. Martin. Vectorizing cartoonanimations. IEEE TVCG, 15(4):618–629, 2009. → page 540Appendix APerception Studies of PolygonApproximationOur polygon approximation algorithm is guided by observations from two small-scale perceptionstudies.Our first study was designed to address the question of where humans place corners when ap-proximating raster inputs using polygons. Participants were shown an editing interface for polygondrawing and asked to draw polygons best approximating the raster. The interface allowed them toplace polygon corners at all pixel corners and pixel edge midpoints. The study included 5 partic-ipants and 7 representative, abstract, raster inputs. An example of the polygons participants drewis shown in Figure A.1,top. Across all inputs the participants chose raster corners (Section 3) aspolygon corners 64% of the time, and chose edge-midpoint immediately adjacent to those 19% ofthe time. They used midpoints of short symmetric two and one pixel edges 10% of the time. Only7% of the corners they used were placed in other locations. Analysis of the cases where partici-pants chose mid-points next to corners,shows that these were largely chosen in pairs as a proxy forsmoothing the corners, a task handled by our subsequent smooth fitting step. These observationsRaster Input User s1 User s2 User s5 Polygon Output VectorsRaster Input User b2 User b4 User b5 Polygon Output VectorsFigure A.1: Examples of participant drawn perception study results (b-f) and our outputs (g)on same inputs (a). Top: participants were allowed to use all pixel corners and edgemidpoints as polygon corners. Bottom: participants were allowed to use the same rasterboundary corners/midpoints as our method.41motivated us to use only raster corners and short edge mid-points as corners for the graph computa-tion (Section 3), as including more vertices would increase computation cost and bring few benefitsin terms of vectorization outcome.The goal of our second study was to analyze the type of edges humans choose and in particularto observe how they balance accuracy against other cues. The study used a similar interface tothe first one but only allowed participants to place corners at the locations used by our algorithm.The study included 5 participants and 14 representative, abstract, raster inputs. An example of thepolygons participants drew is shown in Figure A.1,bottom. 96% of the edges chosen by viewerswere at Manhattan distance of less than 1 from pixel corners. From all edges, 27% produced pixelcenter classifications different from those induced by the raster boundaries where all misclassifiedpixels were on the same side, only 1% of edges had misclassified pixels on both sides of the line.These criteria motivated our choice of edges to include in the graph and our accuracy term.As both studies show, while there some variance between participant results, overall the shapesthey draw are very similar.Visual comparison of study results to our outputs further reinforces ouralgorithmic choices, as our polygons are very similar to the manually created ones (Figure A.1,right).42Appendix BSpline Fitting InitializationWhen fitting single-curves to corners, we initialize the endpoints of the Be´zier curve to lie on themidpoints of the corresponding polygon edges. We place the interior control points along the linesegments and derive their distance from the endpoints using the curvature variation-minimizingscheme of Jaklicˇ et al. [19]. For curve-line primitive configurations, we compute a circular arc thatis tangential to the polygon edges and touches the shorter of the two edges at its midpoint. We theninitialize a Be´zier curve to approximate this arc and add a line segment along the longer edge. Weinitialize line-line primitive configurations by placing the line end points at the respective midpointsand corners.43Appendix CQuantitative AssessmentThe study was conducted using the Mechanical Turk interface. Participants were shown two exam-ple questions, the one shown in Figure C.1, and a second showing a raster square as input and asquare and rounded square as answer options, and the correct answers to those (circle and square).Each participant was then shown twenty randomly selected queries with each query shown twicewith ”B” and ”C” switched. To filter out inconsistent answer, we also asked participants the circleexample question, the answer to which they were explicitly shown (Figure C.1), and discarded allanswers from participants who answered this question incorrectly. For consistency with [18], wediscarded inconsistent answers where a user chose different answers to the same duplicated queryand all answers from participants who answered inconsistently over 60% of the queries.44Figure C.1: The filter question for the study. All other questions had a similar layout45Appendix DAlgorithm overviewHere we give an overview of the algorithm for a single boundary.Algorithm 1 VectorizeImage(I)→ SB← SegmentImageKopfLischinski(I)G← ConstructBoundaryGraphWithMidpoints(B)G← DetectRasterContinuations(G)P← FindShortestCycle(G)Raa, P, G← ApplyRegularities(FindAA(G, P), G, P)Rparallel , P, G← ApplyRegularities(FindParallel(G, P), G, P)R90◦ , P, G← ApplyRegularities(FindOrthogonal(G, P), G, P)Rsym, P, G← ApplyRegularities(FindSymmetries(G, P), G, P)Rcont , P, G← ApplyRegularities(FindContinuations(G, P), G, P)R← Raa,Rparallel,R90◦ ,Rsym,RcontS f ixstart , S f ixend , Sno f ix ←ConstructExemplaryCurves(B, P)CP← ClassifyPolygonSections(B, P, R, S f ixstart , S f ixend , Sno f ix)S← FitWithShapeOptimization(B, CP)46
You may notice some images loading slow across the Open Collections website. Thank you for your patience as we rebuild the cache to make images load faster.
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- PolyFit : perception-aligned vectorization of raster...
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
PolyFit : perception-aligned vectorization of raster clip-art via intermediate polygonal fitting Dominici, Edoardo Alberto 2020
pdf
Notice for Google Chrome users:
If you are having trouble viewing or searching the PDF with Google Chrome, please download it here instead.
If you are having trouble viewing or searching the PDF with Google Chrome, please download it here instead.
Page Metadata
Item Metadata
Title | PolyFit : perception-aligned vectorization of raster clip-art via intermediate polygonal fitting |
Creator |
Dominici, Edoardo Alberto |
Publisher | University of British Columbia |
Date Issued | 2020 |
Description | Raster clip-art images, which consist of distinctly colored regions separated by sharp boundaries typically allow for a clear mental vector interpretation. Vectorizing these images can facilitate compact lossless storage and enable numerous processing operations. Despite recent progress, existing vectorization methods that target this data frequently produce vectorizations that fail to meet viewer expectations. We present PolyFit, a new clip-art vectorization method that produces outputs well aligned with human preferences. Since segmentation of such inputs into regions had been addressed successfully, we focus on fitting piecewise smooth vector curves to the raster input region boundaries, a task prior methods are particularly prone to fail on. While perceptual studies suggest the criteria humans are likely to use during mental boundary vectorization, they provide no guidance as to the exact interaction between them; learning these interactions directly is problematic due to the large size of the solution space. To obtain the desired solution, we first approximate the raster region boundaries with coarse intermediate polygons leveraging a combination of perceptual cues with observations from studies of human preferences. We then use these intermediate polygons as auxiliary inputs for computing piecewise smooth vectorizations. We define a finite set of potential polygon to curve primitive maps, and learn the mapping from the polygons to their best fitting primitive configurations from human annotations, arriving at a compact set of local raster and polygon properties whose combinations reliably predict human-expected primitive choices. We obtain the final vectorization by fitting the computed primitive sequence to the raster data. Comparative user studies show that our method outperforms state-of-the-art approaches on a wide range of data; our results are preferred three times as often as those of the closest competitor across inputs with various resolutions. On low-resolution color data, this preference grows to a ratio of more than 15:1. |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2020-04-06 |
Provider | Vancouver : University of British Columbia Library |
Rights | Attribution-NonCommercial-NoDerivatives 4.0 International |
DOI | 10.14288/1.0389752 |
URI | http://hdl.handle.net/2429/73924 |
Degree |
Master of Science - MSc |
Program |
Computer Science |
Affiliation |
Science, Faculty of Computer Science, Department of |
Degree Grantor | University of British Columbia |
GraduationDate | 2020-05 |
Campus |
UBCV |
Scholarly Level | Graduate |
Rights URI | http://creativecommons.org/licenses/by-nc-nd/4.0/ |
AggregatedSourceRepository | DSpace |
Download
- Media
- 24-ubc_2020_may_dominici_edoardo_alberto.pdf [ 7.73MB ]
- Metadata
- JSON: 24-1.0389752.json
- JSON-LD: 24-1.0389752-ld.json
- RDF/XML (Pretty): 24-1.0389752-rdf.xml
- RDF/JSON: 24-1.0389752-rdf.json
- Turtle: 24-1.0389752-turtle.txt
- N-Triples: 24-1.0389752-rdf-ntriples.txt
- Original Record: 24-1.0389752-source.json
- Full Text
- 24-1.0389752-fulltext.txt
- Citation
- 24-1.0389752.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
data-media="{[{embed.selectedMedia}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
https://iiif.library.ubc.ca/presentation/dsp.24.1-0389752/manifest