Low Dimensional Search for Efficient Texture Synthesis by Fred Kimberley B . S c , The University of British Columbia, 2002 A THESIS S U B M I T T E D IN P A R T I A L F U L F I L M E N T O F THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in The Faculty of Graduate Studies (Department of Computer Science) We accept this tfiSsis as conforming to the required standard T H E U N I V E R S I T Y O F BRITISH C O L U M B I A October, 2004 © Fred Kimberley, 2004 FACULTY OF GRADUATE STUDIES THE UNIVERSITY OF BRITISH COLUMBIA Library Authorization In presenting this thesis in partial fulfillment o f the r e q u i r e m e n t s for a n a d v a n c e d d e g r e e at t h e University o f British C o l u m b i a , I a g r e e that the Library shall m a k e it freely available for r e f e r e n c e a n d study. I further a g r e e t h a t p e r m i s s i o n for e x t e n s i v e c o p y i n g o f this thesis for scholarly p u r p o s e s m a y b e g r a n t e d by t h e h e a d o f m y d e p a r t m e n t or by his or her r e p r e s e n t a t i v e s . It is u n d e r s t o o d that c o p y i n g or publication o f this thesis for financial g a i n shall not be a l l o w e d w i t h o u t m y written p e r m i s s i o n . N a m e of A u t h o r Title of T h e s i s : Degree: Department of (please print) Lpw Pir"tV>siflri<7 ' Date (dd/mm/yyyy) ^arc~/) fflf Year: hik. CoMOVifyf EfficiW 2 Te*tWf 0 ^Vt/AffK OH ^CltY\f€- -rt :A. . o_rn_:j.:_i_ *~>o^l ui..I T h e University f British yC m bi_:_ ia Vancouver, BC Canada g rad. u be. ca/form s/?form I D = T H S p a g e 1 of 1 last updated: 20-Jul-04 Abstract ii Abstract Current texture synthesis algorithms rely upon high dimensional approximate nearest neighbour (ANN) searches to determine the best pixel to use at the current position. The feature vectors used in the A N N search are typically between 100 and 300 dimensions. A large amount of research has examined how to reduce the number of feature vectors that need to be searched but very little has been done to speed up the actual comparisons. We present two new texture synthesis algorithms that use an order of magnitude fewer dimensions during the A N N search. In addition, we construct and make use of an error texture that further reduces the time spent comparing two feature vectors. Contents iii Contents Abstract ii Contents iii L i s t of Tables vi L i s t of Figures vii Acknowledgements 1 2 ix Introduction 1 1.1 Motivation 1 1.2 Overview 2 Previous W o r k 4 2.1 Pixel-Based Methods 4 2.1.1 Cluster-Based Probabilistic Model 5 2.1.2 Steerable Pyramids 7 2.1.3 Parzen Estimation 7 2.1.4 Multiresolution Sampling 8 2.1.5 Non-Parametric Sampling 9 2.1.6 Fixed Neighbourhood Search 9 2.1.7 Coherence Based 11 2.1.8 Non-Hierarchical Synthesis 11 2.1.9 Variable Neighbourhood Search 11 2.1.10 Image Analogies 12 Contents 2.2 3 2.1.12 Progressively Variant Textures 13 Patch-Based Methods 14 2.2.1 Chaos Mosaic . . . 15 2.2.2 Image Quilting 15 2.2.3 Patch based Sampling 16 2.2.4 Hybrid Texture Synthesis 17 2.2.5 Non-Periodic Tilings 17 2.2.6 Graphcuts 19 2.2.7 Feature Matching and Deformation 20 23 3.1 Obtaining the Texton Mask 23 3.2 Texton Mask Synthesis 24 3.2.1 Patch Selection 24 3.2.2 Patch Warping 25 Colour Synthesis 28 3.3.1 Low Dimensional Feature Vector 28 3.3.2 Distance Function 28 3.3.3 Search Techniques 30 E n t r o p y M i n i m i z i n g Synthesis A l g o r i t h m 33 4.1 Discretizing the Texture 33 4.1.1 Vector Quantization 33 4.1.2 Feature Vectors 34 4.2 4.3 5 13 L o c a l a n d G l o b a l Feature Synthesis A l g o r i t h m 3.3 4 2.1.11 Jump Maps Choosing the Dimensions to Keep 34 4.2.1 35 Entropy and Information Gain Synthesizing the Texture 40 Results 41 5.1 Local and Global Synthesis 41 5.1.1 41 Texton Mask Synthesis Contents 5.1.2 5.2 6 v Colour Synthesis 42 Entropy Minimizing Synthesis 47 Conclusion 58 6.1 Future Work 59 6.1.1 Texton Masks 59 6.1.2 Entropy Minimizing Synthesis 60 A Effects of High Dimensionality on Nearest Neighbour Search A.l Sparsity of Data A.2 Volumes of Hyperspheres 61 61 62 B Synthesis Times 64 Bibliography 68 List of Tables vi List of Tables 4.1 Conditional entropies 38 B.l Texton mask synthesis times 64 B.2 Colour synthesis times for hash-based search 65 B-3 Colour synthesis times for fc-coherence search 65 B.4 Candidate neighbourhood selection times 66 B.5 Synthesis times using information gain 67 List of Figures vii List of Figures 2.1 Popat and Picard's causal neighbourhood 2.2 L-shaped causal neighbourhood 10 2.3 Pixel orders for variable neighbourhood search 12 2.4 Minimum error boundary cut 16 2.5 Wang tile creation 18 2.6 Wang tile packing 20 2.7 Boundary edge selection using Graphcuts 21 3.1 Texton mask feature vector 26 3.2 Patch warping 27 4.1 Naive neighbourhood ordering using information gain 37 4.2 Avoiding blocks of repeated pixels 39 5.1 Automated texton selection 43 5.2 Texton mask synthesis 44 5.3 Texton mask synthesis 45 5.4 Effect of number of generations in order independent search . . . 47 5.5 Robustness of the distance-based hashing 48 5.6 Colour synthesis failures 49 5.7 Colour synthesis with fc-coherence search 50 5.8 Colour synthesis results with distance-based hashing 51 5.9 Effect of number of textons on synthesis results 53 5.10 Effect of number of textons on low frequency textures 6 54 List of Figures viii 5.11 Synthesis of natural textures 55 5.12 Synthesis of low frequency textures 56 5.13 Synthesis of highly structured textures 57 Acknowledgements ix Acknowledgements I would like to thank my supervisor for encouraging me to work on this topic. His insights have been invaluable during this work. I would also like to thank Jim Little for his many helpful comments and suggestions. Thanks to my proofreader Ciaran Llachlan Leavitt for taking the time to correct my many mistakes. Thanks to Michael Cohen, Alexei Efros and Vivek Kwatra for their permission to reproduce images from their publications. I would like to thank everyone around the lab who has made the last two years such an enjoyable experience. Last, but definitely not least, I would like to thank my family for their continuous support throughout my education. I couldn't have done it without them. Chapter 1. 1 Introduction Chapter 1 Introduction Begin at the beginning and go on till you come to the end; then stop. —Charles Dodgson. 1.1 Motivation The quest for realistic computer generated images has led to the development of many techniques. One of these techniques that is used in almost every modern graphics application is texture mapping [8]. Texture mapping provides an efficient way to include detail on the surfaces of objects in the scene. The effectiveness of texture mapping is limited by its reliance on the image to be mapped. If an image is mapped onto an object in the scene that is bigger (in texture coordinates) than the original texture then the texture must be tiled. In this case care must be taken that the original texture is continuous across its boundaries. If it is not then a seam will appear on the rendered object. Even when the original texture can be tiled tiling often leads to a noticeable, regular repetition of features. One possible solution is to use larger textures. Using larger textures presents other disadvantages. Textures can be captured from nature, created by hand, or procedurally defined [32]. Textures captured from real objects can vary widely depending on lighting conditions and the camera angle. Visual artifacts can be seen when the lighting in the scene does not match the lighting conditions from when the texture was captured. Having an artist create a texture is a time Chapter 1. Introduction 2 consuming process. Also, any textures that are stored take up space in memory. Even on modern systems there is a limit to how much memory can be used to store textures. Procedural textures can be used to create textures of any size without having to store the entire texture. Most procedural textures are limited to a specific class of textures such as animal spotting patterns [39]. It can also be difficult to select the proper parameters to generate the texture you want. Texture synthesis was developed to address these problems. Most modern texture synthesis algorithms can synthesize tileable textures even when they are given a nontileable source texture. Also, because texture synthesis algorithms create arbitrarily large textures, regular feature repetition does not cause a problem. Texture synthesis can be applied to many different types of textures and the results will be similar to the source texture. The parameters used in texture synthesis algorithms usually provide an intuitive speed/quality trade-off. Most modern texture synthesis algorithms rely on an approximate nearest neighbour (ANN) search to find the best patch to synthesize at the current location. Calculating nearest neighbours in high dimensions is computationally expensive. The usefulness of finding the nearest neighbour (NN) also becomes questionable as the dimensionality increases [6] [19]. Some synthesis algorithms have dealt with this by removing N N search entirely from the runtime portion of the algorithm [46] [21]. Although this works for many textures it results in noticeable artifacts for some classes of textures. Principal components analysis (PCA) is a relatively fast method to reduce the dimensionality but often retains the instability found in high dimensional searches [1]. To avoid some of these problems we reduce the dimensionality of the A N N search while still comparing enough information. 1.2 Overview In this thesis we explore two new algorithms for texture synthesis. The goal of both algorithms is to use as few dimensions as possible when performing an A N N search. The main contributions of our approaches are: Chapter 1. Introduction 3 • Separation of local a n d , g l o b a l features: First a texton mask [47] is synthesized (Section 3.2). Global information is taken from the. texton mask and uses only 5-9 dimensions (Section 3.3.1). The local information is then supplied using a small causal neighbourhood. A n error metric for using the combined local and global information is developed (Section 3.3.2). Search structures that take advantage of this separation are then developed (Section 3.3.2). • A m e t h o d for identifying a n d r e m o v i n g the least i m p o r t a n t d i mensions: The image is clustered into n unique pixel values (Section 4.1). Feature vectors are constructed using this clustered representation. The dimensions in the feature vectors are compared to determine which provide the most information gain (Section 4.2). This process is repeated until the desired number of dimensions is reached. These dimensions are the only ones used in the A N N search. Chapter 2 provides an overview of related work. Chapters 3 and 4 provide a detailed explanation of the two algorithms. Chapter 5 shows the results that are obtained from them, and Chapter 6 discusses possible improvements and extensions to the algorithms. Chapter 2. Previous Work 4 Chapter 2 Previous Work You can know the name of a bird in all the languages of the world, but when you're finished, you'll know absolutely nothing whatever about the bird... So let's look at the bird and see what it's doing that's what counts. —Richard Feynman. Several approaches to texture synthesis have been developed in the past few years. Most recent texture synthesis methods consist of finding the best matching patch from the original texture and placing it in the current position in the synthesized texture. After placing the patch, some form of correction is often used to fix areas that don't match up properly with the existing patches. Pixel-based methods can be regarded as a special case where the patch size is l x l . The main differences between these methods is how they select the patch to be placed, synthesis order, and how they correct the patch boundaries. In this section, I will concentrate on methods which follow this general framework. This will be followed by a brief discussion of methods that do not follow this framework. 2.1 Pixel-Based Methods Pixel-based texture synthesis methods have had a lot of recent successes with certain types of textures. In general, pixel based methods model texture as a Markov random field (MRF) [11]. M R F s have two important properties for texture synthesis: stationarity and locality. Locality means that the probability 5 Chapter 2. Previous Work of the current pixel having a given value is dependent on only a finite number of surrounding pixels. Mathematically, given an image / , a pixel p e i* and a neighbourhood ui(p) C I the value of p is independent of I \ u>{p). Stationarity implies that the probability distribution function (PDF) is the same for all p £ I. Consequently the size and shape of the neighbourhood to(p) is the same for all p£l. Pixel based methods all create some type of feature vector to describe the neighbourhood. This feature vector is then used to determine the value of the current pixel either deterministically or nondeterministically. Unless otherwise stated, the feature vectors used for all of the following pixel based methods are composed of a concatenation of R G B triples for all of the pixels in the neighbourhood. 2.1.1 Cluster-Based Probabilistic Model Popat and Picard [33] use a semi-parametric model to synthesize textures. They note that it is infeasible to calculate the probability density function (PDF) of a high-dimensional vector unless it is broken down into a product of conditional P D F s . However, this fails to capture the interdependence of the vector elements. As a result, they use a probability mass function ( P M F ) p(x) and restrict it to be of a form where the chain rule can be applied. p(x) = p(xi,...,x ) N = p{xi)p(x \x )p(x \x ,x ) 2 l 3 1 2 •• -p{x \xi,...,XN—I) N Initially the data is clustered into M clusters using the Linde-Buzo-Gray (LBG) algorithm [25]. The P M F , p(x), is composed of a weighted sum of M component P M F s , p (x). m Each component P M F is centred on a different clus- ter. M Chapter 2. Previous Work where w m > 0 and ]£m=l (i = To ensure q(x) can be factored by the chain rule the component P M F s are restricted to the form Qm(^0 where the f ,nS m — J^J / m , n ( * E n ) > are discretized Gaussians. A causal neighbourhood is used to predict the value of the current pixel according to the restricted P M F . A causal neighbourhood is a neighbourhood where all of the pixels in the neighbourhood have already been synthesized. A noncausal neighbourhood contains pixels that have not been synthesized. The shape of causal neighbourhoods often depends on the synthesis order used. To reduce the dimensionality, a hierarchical method is proposed. Once the previous level has been synthesized, the pixel values of every fourth pixel are given by the pixel values of the previous level. This pixels are not synthesized with the current level but they are added to the causal neighbourhood as in Figure 2.1. • • • • • • • •••••••• ••• ••••• • • ••••• X • • X X • • • • ••• Figure 2.1: The causal neighbourhood combined with pixels synthesized at the previous level in Popat and Picard's hierarchical texture synthesis. Pixels with dots are in the neighbourhood, pixels with an X are the pixel to be synthesized and grey pixels were synthesized at the previous level. Chapter 2. Previous Work 2.1.2 7 Steerable Pyramids Heeger and Bergen [17] used knowledge of the human visual system to create their synthesis algorithm. Textures are considered similar if they produce a similar set of responses in a bank of orientation and spatial-frequency selective linear filters. They take advantage of this fact by forcing the synthesized texture to have matching histograms of filter outputs with the original texture. The algorithm starts by initializing the synthesized image to random noise. The algorithm then iteratively alternates between forcing the histogram of the synthesized image match the histogram of the original image and matching the histograms of each of the pyramid subbands. Although there is no proof of convergence, in practice the algorithm converges within a few iterations. Allowing the algorithm to continue for excessive iterations introduces artifacts. 2.1.3 Parzen Estimation Paget and Longstaff [29] were able to obtain accurate texture synthesis by using a nonparametric M R F model. Given a noncausal neighbourhood for the current pixel, the local conditional probability density function ( L C P D F ) is computed to determine the probabilities of the current pixel value. The L C P D F is obtained by using the Parzen-window density estimator [31] to perform density estimation on the multi-dimensional histogram created by the neighbourhood data. The texture is then synthesized using relaxation. The Parzen-window density estimator is a nonparametric density estimation technique that gathers information from neighbours within a fixed distance of the current point. Relaxation techniques modify the image one pixel at a time until it converges with a probability given by the P D F of the M R F . Two well known stochastic relaxation algorithms are the Gibbs sampler and the Metropolis algorithm [12][15]. The relaxation is performed using a deterministic algorithm called the iterative conditional modes (ICM) algorithm [7]. A multiscale approach similar to the one used by Popat and Picard [33] is used to reduce the neighbourhood size. 1 Paget and Longstaff later adapted their algorithm to make use of the strong 8 Chapter 2. Previous Work M R F property [30]. A strong M R F is a M R F where every sub-neighbourhood of the defined neighbourhood is Markovian. The L C P D F for a strong M R F can be written as p{x\xi,...,x )= n Y[ P(x\x ,r€C), r CCw(p),CgC'o'(p) where C C u>(p) is a clique. Cliques are subsets of the neighbourhood where every pixel in the clique is a neighbour to every other pixel in the clique. This new formulation for the L C P D F alleviates some of the problems with having sparse data to calculate high dimensional PDFs. 2.1.4 Multiresolution Sampling DeBonet [9] starts with the assumption that, below some resolution, textures contain regions that differ by less than some discrimination threshold. Below this threshold, similar regions can be permuted without altering the perceived characteristics of the texture. Initially an analysis pyramid is constructed. The analysis pyramid consists of a Laplacian pyramid and the responses to a filter bank of oriented first and second order Gaussians derivatives. Synthesis is done using a multiresolution approach, beginning at the lowest resolution level. The synthesis is initialized with the top level of the synthesized pyramid consisting of a tiled copy of the top level of the analysis image (a single pixel). Every level of the synthesis pyramid is constructed using information from the equivalent level of the analysis pyramid. For a given pixel location at the current level a feature vector composed of the filter responses for its parent pixels at all of the higher levels is constructed. A candidate set is then composed of all of the pixels in the analysis pyramid that are similar enough to the current pixel. To measure similarity, a vector of thresholds is used. If the distance between any features in the analysis pixel feature vector and the synthesized feature vector exceeds the corresponding threshold then the pixel is not added to the candidate set. A separate vector of thresholds is used for every level of the pyramid. Once the candidate set has 9 Chapter 2. Previous Work been calculated uniform random sampling within the candidate set is used to determine the value of the current pixel to be synthesized. 2.1.5 Non-Parametric Sampling Efros and Leung [14] model the input texture as a Markov Random Field ( M R F ) . Given an image / and a pixel p € hynth they let uj(p) C I s y n th be a square patch of the synthesized image with width w and centred at p. Because M R F s are local, the value of p is independent of / \ ui(p) given match &(p) uibest = {w C The closest = argmin ci(a;(p), ui) C / for the current pixel is found. A set w I : d(u>{p),ui) < (1 + e)d(ui(p),<jJbest)} is then created. The values of the centre pixels in O(p) are used to create a histogram. This histogram is then sampled uniformly to find the value of the synthesized pixel. The synthesis is initialized with a 3 x 3 patch taken from the original image / . The distance function is the L2 norm multiplied by a two-dimensional Gaussian. In general, not all of the pixels contained in u>(p) are known when pixel p is being synthesized. In this case, only the known values of u>(p) are used and the error is normalized by the number of known pixels. To synthesize the whole image, pixels are synthesized in a spiral order starting at the 3 x 3 seed patch. 2.1.6 Fixed Neighbourhood Search Wei and Levoy [42] developed a technique that is very similar to Efros and Leung's [14]. While Efros and Leung only used the locality property of M R F s , Wei and Levoy also use the fact that M R F s are stationary. This means that the neighbourhood does not change from pixel to pixel. Wei and Levoy use an L shaped causal neighbourhood uj(p) as shown in Figure 2.2. The pixels contained in this neighbourhood can all be assumed to have been synthesized because the synthesis progresses in scanline order. The synthesized image is initialized to random values. The pixel values in the neighbourhoods are concatenated to form a high-dimensional feature vector. To make the synthesis more efficient the feature vectors are preprocessed using tree-structured vector quantization 10 Chapter 2, Previous Work (TSVQ). This process reduces the number of pixels to be searched to approximately 10% of the original pixels. It also provides an efficient search structure so that the nearest neighbour is found in 0(log(n)) time where n is the number of pixels remaining after T S V Q . The nearest neighbour is determined using the 1,2 norm and is used as the value of the synthesized pixel. 0 1 2 3 4 12 13 14 5 6 7 8 9 15 16 17 10 11 18 19 20 Figure 2.2: A 5 x 5,2 causal neighbourhood used by Wei and Levoy. At the current level the neighbourhood is L-shaped so that all pixels in the neighbourhood have been synthesized when synthesis is performed in scanline order. The lower resolution level uses a square neighbourhood because the entire lower resolution image has already been synthesized. To further speed up their algorithm, Wei and Levoy incorporated a multiresolution pyramid. Synthesis is performed one level at a time starting with the level with the lowest resolution. Except for the top level, each level of the pyramid uses pixel values from the current level and from the level above it. At the current level the feature vector is composed of pixels from the L-shaped neighbourhood of the current level and a square shaped neighbourhood from the previous level. This allows smaller neighbourhoods to be used because largescale features are synthesized at lower levels. These differences make Wei and Levoy's algorithm about two orders of magnitude faster than Efros and Leung's. 11 Chapter 2. Previous Work 2.1.7 Coherence Based Ashikhmin [5] noted that the nearest neighbour to a pixel in the synthesized image often came from a location in the source image that was near the source locations of pixels in its neighbourhood. Along with the synthesized image, a coherence map that stores the source locations of the pixels in the synthesized image is maintained. Each pixel in the L-shaped neighbourhood w(p) adds an appropriately shifted pixel to the candidate set. This results in a candidate set with only 0(w) pixels to be checked. This implies that the number of pixels checked is constant with respect to the source image size. This algorithm works especially well for images with high frequencies that tend to be blurred by Wei and Levoy's algorithm [42]. 2.1.8 Non-Hierarchical Synthesis The typical scanline order does not work well for all textures. A n image of a plant with branches coming off of a stem will not be synthesized well. If the image is synthesized from the bottom up the branchings are more likely to be correct. Harrison [16] noted this fact and developed an algorithm that determines an optimal synthesis order for each texture. The locations and values of previously synthesized pixels constrain surrounding pixels. The most constrained pixel can be synthesized most accurately. The entropy for every pixel in the image is calculated and stored. The entropies are modified with a small amount of noise to avoid having pixels with the same entropy. The pixel with the lowest entropy in the image is synthesized. After the pixel is synthesized the entropies of pixels in its neighbourhood are updated. 2.1.9 Variable Neighbourhood Search A N N search tends to lead to features with blurred edges in the synthesized image. Tran and Datta [37] suggest that the blurring is caused by using large neighbourhoods in areas where they are unnecessary. They reduce the blur- ring by using a variable neighbourhood search. The neighbourhood begins as 12 Chapter 2. Previous Work a 5 x 5 L-shaped noncausal neighbourhood. The average R G B value of the neighbourhood is calculated. If the difference between the R G B pixel values and the avarage value exceeds a user defined threshold, A, the A N N is found using the current neighbourhood. If the threshold is not exceeded the next pixel is added to the neighbourhood. The pixel ordering is shown in Figure 2.3. The neighbourhood has a maximum size of 21 x 21. To accelerate the A N N search a kd-tree is used. i 1 2 3 1 0 j 4 . ____ , . 15 16 17 18 19 20 21 14' 6 7 8 9 10 22 13 5 1 2 3 11 23 12 4 0 0 Figure 2.3: The pixel ordering used in variable neighbourhood search. The 3 x 3 , 5 x 5 and 7 x 7 neighbourhoods and their orders are shown. 2.1.10 Image Analogies Hertzmann et al. [18] combined Ashikhmin's work with Wei and Levoy's to create a higher quality synthesis algorithm. To synthesize a pixel the nearest neighbours using an approximate nearest neighbour (ANN) search and coherence based search are both computed. To encourage verbatim copying, a coherence parameter K is selected by the user. The synthesized pixel is given the value of the coherence search result unless the distance to the A N N result is less than K times the distance to the coherence based result. Higher values of K result in more coherent images. In practice n is typically in the range 0.5 < K < 5. Although this algorithm takes significantly longer than Ashikmin's algorithm it does a better job at hiding the edges between coherent patches. Chapter 2. Previous Work 2.1.11 13 Jump Maps Jump maps [46] reduce run-time processing to an absolute minimum. Most of the run-time computation spent during texture synthesis is spent on A N N calculations. In the jump-map, no A N N computation is done at run-time. Instead, up to three ANNs are computed as a preprocess. Each of these A N N s is assigned a probability based on its similarity to the current pixel and then the probabilities are normalized. These normalized probabilities are placed into the jump map. At run-time one of the previously synthesized adjacent pixels is randomly selected with equal probability. A random number in the range [0, n] is selected, n is a user defined parameter that controls the patch-size. Higher values of n result in decreased jumps and more verbatim copying. If the random number is less than 1, the jump map is accessed and one of the ANNs is chosen based on their probabilities. To avoid artifacts a small Gaussian patch (3x3 or 5 x 5) is blended into the synthesized image instead of a single pixel. Jump maps provided the first real-time pixel based texture synthesis algorithm. However, it produces noticeable artifacts in textures with low frequency content. It is also very dependent on the synthesis order. In practice, a Hilbert curve is used. 2.1.12 Progressively Variant Textures The progressively variant textures algorithm [47] was developed to synthesize nonhomogenous textures. Nonhomogenous textures are those where the size and shape of features varies as a function of spatial location throughout the texture. The key to the algorithm is a user specified texton mask. Textons are the fundamental elements in preattentive texture perception [48] [22]. The exact definition of what a texton is is still unclear. Julesz proposed that they can be elongated blobs, terminators, and crossings. Shum suggests that 2D textons may be image patches that can be obtained using simple clustering techniques [35]. The texton mask is a coarse segmentation of the the image based on this definition of a texton. The texton mask corresponds to features in Chapter 2. Previous Work 14 the source texture and usually consists of only 2 or 3 colours. In addition to the texton mask the user also specifies an orientation field and transition function for both the original texture and for the synthesized texture. To synthesize a target pixel p, first the neighbourhood must be constructed. This neighbourhood is oriented according to the orientation field vector at the current point and then scaled according to the magnitude of the vector. The already synthesized pixels are then resampled to determine the neighbourhood. The texton mask at the current pixel is then synthesized using a nearest neighbour search in the L norm. Next the pixel value is determined by finding the 2 pixel that minimizes dist(N (y), c N {p)) + dist(N (v), c m N (p)), m where N (p) is the neighbourhood around pixel p in the colour image, c N (p) m is the neighbourhood around pixel p in the texton mask, p is the target pixel and v is the current candidate pixel. Because of the rotation, synthesis quality is affected by synthesis order. The synthesis order used is the same as in [38]. For synthesis of images a scanline order is used. When synthesizing the texture directly onto a geometric model an orientation field is specified and is used to provide local coordinates. The synthesis proceeds to the next vertex in scanline order using this local coordinate frame. 2.2 Patch-Based Methods Patch-based texture synthesis techniques enforce local statistics by directly copying whole blocks, B, of the original texture. Due to the reduced number of calculations required to synthesize an image patch-based methods are typically significantly faster than pixel-based methods. In fact, several real-time patch based algorithms have been developed. The major differences between most patch-based algorithms is how they deal with patch boundaries. Features in the texture are not guaranteed to match across a patch boundary. Figure 2.4 15 Chapter 2. Previous Work demonstrates that even when the best matching block is used discontinuities are present at the boundaries. 2.2.1 Chaos Mosaic Chaos mosaics [45] permute the patches of the synthesized image in a deterministic, yet visually stochastic, manner. The key to the algorithm is the use of the cat map to permute the patches. In the cat map, the pixel at location [x ,y ) l is mapped to (x ,y ) l+1 l+1 l by the equations x l+l = (x + y )mod n y i+1 = [x + 2i/)mod n l l l where the source texture is n x n. Although the cat map results in visually stochastic, tileable images, it does not sufficiently preserve local statistics for texture synthesis. To preserve local features the cat map is applied to blocks of texture instead of individual pixels. The algorithm starts with a simple tiling of the source texture to obtain the desired output texture size. For each tile in the output texture a random block is selected. These random blocks not only have random location, but random width and height as well. Several iterations of the cat map are applied to all of the random blocks. Edge mismatches are dealt with by blending the edges of the blocks with the tiled image. 2.2.2 Image Quilting The image quilting algorithm [13] attempts to place patches in an intelligent manner. Synthesis is performed in scanline order one patch at a time. Each patch to be synthesized has a small overlap with the previously synthesized patches. The best patch is selected as the patch which minimizes the Li norm for the pixels in the overlap region. Once the best patch has been selected, dynamic programming is used to find the minimum error boundary cut ( M E B C ) through the overlap region. The M E B C is the cut through the overlap region Chapter 2. Previous Work Hi Figure 2.4: Given the source texture (a) patch based methods construct output textures (b), (c), and (d) by placing patches and then correcting edge mismatches, (b) is synthesized by randomly placing patches and the patch edges are clearly visible, (c) finds the best match and places it but edge discontinuities are still visible, (d) finds the best match and then uses the M E B C to correct artifacts at patch boundaries. The image is from [13] and used here with permission from the authors. 2.2.3 Patch based Sampling Liang et al. [24] developed a patch based algorithm that was very similar to image quilting. The two main differences between there algorithm are the boundary fixing and the patch selection algorithm. The overlap region is corrected by blending with the previously synthesized patches as in [45]. The current patch is selected by using a novel approximate nearest neighbour (ANN) algorithm. The A N N search is sped up by combining three methods. First, Principal Component Analysis (PCA) is applied to the data to reduce the dimensionality Chapter 2. Previous Work 17 of the problem. Second, a kd-tree is used to find the high-dimensional A N N after P C A . Finally, a quad-tree pyramid (QTP) is used to find a small set of initial candidates. The Q T P is similar to a Gaussian pyramid except that at every level, 4 higher level images are computed. After these techniques are applied to speed up the A N N search the algorithm is able to achieve real-time results. 2.2.4 Hybrid Texture Synthesis Hybrid texture synthesis [ 2 8 ] was developed Nealan and Alexa to improve the quality of the tile overlap. After the current tile has been found and added to the synthesized image the quality of the overlap region is examined. Pixels that differ beyond a threshold T are removed. If the entire overlap region exceeds a threshold then it is resynthesized with patches one quarter the size of the current patch. This procedure is performed recursively until an acceptable patch is found or individual pixels are used. The removed pixels in the boundary region are then replaced using individual pixel synthesis. Because the search neighbourhood cannot be precomputed, A N N search structures cannot be used to speed up the search. A full brute force search over the input texture would be very time consuming. Instead the best match is found in the Fourier domain using an 0(n log n) time algorithm. Additionally, the colour channels are weighted with the vector W J J , G , B = { 0 . 2 9 9 , 0 . 5 8 7 , 0 . 1 1 4 } . This weighting corresponds to the luminance component of the Y I Q colour space. 2.2.5 Non-Periodic Tilings Cohen et al. [ 2 1 ] used Wang tiles to construct a nonperiodic tiling of texture patches. A Wang tile is a square tile where each edge is assigned a colour. A valid tiling requires that all shared edges between two tiles have matching colours. There exist sets of tiles that can be shown to create strictly aperiodic tilings of any size. However, these tilings may require large amounts of backtracking Chapter 2. Previous Work 18 and often do not result in visually stochastic tilings. The authors use a set of tiles that does not guarantee aperiodicity. Tiles are placed in scanline order. As a result only the top and left sides of the new tile are required to match. In the tile sets that the authors construct, there are two matching tiles for any combination of top and left sides. One of the two matching tiles is randomly selected as the current tile. Tiles are constructed so that no blending is required at runtime. A diamond shaped block is randomly selected for each colour in the tiling. The blocks are placed in each Wang tile so that the outer edges of the tile have the appropriate colours. The block edges within the Wang tiles are blended using M E B C . If the error for the set of Wang tiles exceeds a user specified threshold, then the coloured blocks are discarded and reselected. Because of how the blocks are constructed, edges with the same colour are guaranteed to match without any blending. (a) (b) Figure 2.5: Wang tiles are created by choosing four diamond shape regions from the original texture. They are then placed together to form a larger diamond and M E B C is used on the overlap regions. A square shaped tile is taken from the large diamond to form a single Wang tile, (b) shows the arrangements used to make an 8 tile set. The image is from [21] and used here with permission from the authors. A graphics hardware implementation using Wang tiles has also been developed [41]. Because graphics hardware does not provide any guarantees about the order in which the tiles will be synthesized the algorithm should provide a method for determining the four edge colours without knowing the four adjacent 19 Chapter 2. Previous Work tiles. A hash table based on the Cat-Map [45] is used to determine the four edge colours. The construction of the hash table guarantees that the edges of the patch will match adjacent patches. If the final rendered texture i s n x n tiles, the hash table should contain at least n entries to prevent repetition. The hardware implementation of Wang tiles uses all possible edge combinations. To store the tiles a 1-dimensional packing is developed that uses all possible edge combinations and ensures that all tiles match adjacent tiles. This packing is then applied in both dimensions to construct the final 2-dimensional packing. This packing is important to avoid boundary artifacts caused by mipmap filtering. 2.2.6 Graphcuts Kwatra et al. [23] represent the texture to be synthesized as a graph and reformulate the problem of finding the best seam into a min-cut problem. The min-cut problem has been well studied and several algorithms exist for solving it. Given two texture patches A and B a graph is constructed in which every pixel in the overlap region is a node. Additional nodes are added for A and B and edges between them and some of the pixels are given an infinite cost. This constrains these nodes to come from one of the two original patches. A cost function is defined as {s,t,A,B)= M \\«.)-Bisn \\«t)-B G (s) 11 + 11 G\(t) H + ll G%(s) H + ll G%{t) ||' : + m d A where s and t are adjacent pixels in the overlap region, A(s) and B(s) are the colours of the pixel at location s in the old and new patches, G^(s) is the gradient in patch A along direction d, and || • || is an appropriate norm. The seam is chosen using a solution to the min-gut problem as in Figure 2.7.' When a new patch is added near an old seam, the old seam is taken intq^account and may be altered. Three separate algorithms are proposed for selecting a patch and placing it in the output image. The important feature in all of these algorithms is that Chapter 2. Previous Work Figure 2.6: Wang tile packing. 20 The left image shows an example of a 1- dimensional packing using two edge colours. The right image shows a 2-dimensional packing using 2 horizontal and 3 vertical edge colours. the patch can be placed anywhere, even on top of a previously synthesized area. This can be desirable for covering bad seams. 2.2.7 Feature Matching and Deformation Wu and Y u [44] make use of a feature map to help reduce discontinuities across patch boundaries. A feature map is a binary image that contains the set of curvilinear features in the image. The feature map is obtained using a bilateral sharpening filter followed by a two pass edge detection algorithm. The first pass 21 Chapter 2. Previous Work Figure 2.7: The left hand image shows the overlap region between two patches. The right side shows the graph formulation and the minimum cost cut. The image is from [23] and used here with permission from the authors. finds strong edges and the second pass finds connections between the edges. Finally, a thinning algorithm is applied to ensure the features have only one pixel width. A n error metric between two feature map patches is required for the synthesis. The distance function should account for both the distance between the features and the orientation of the features. Two features that cross but are perpendicular to each other are not a good match. The distance between two features is fdist{h,f )=\\h- h\\ +T\\v -V \\ , 2 2 where v\ and v are the tangents at f\ and f l 2 2 2 2 and r is a weight indicating the importance of the tangent. The function W(f ) returns the feature, synth patchy i n t h e c u r r e n t patch that i i i e m n m Z fdist{f ,f ). synth S In general, patch the overlapping region of a patch to be synthesized will contain more than one feature. The function Bf(ff ) returns a count of the number of features, atch j.jjat jsynth ^jsynth^ a r e jsynth^ mapped to ^ jsynth^ compared, (jff**, / J o t c fP Given the set of already synthesized features, a t c h a n ( j the set of features in the patch currently being \ ..., fi }, atch dist(f, ) = i £ fdist(fr , th 9 i the distance is w ur )) th f + \ fU B p atch 3 - j where j3 is a weight between the location/orientation and bijectivity terms. 22 Chapter 2. Previous Work Bijectivity is then enforced by removing extraneous features from the overlap region. The feature map and the final image are synthesized at the same time using a scanline order. The weighted sum of the feature map error and the Z, norm of 2 the colour information is used to select the best patch. Once the best matching patch has been selected it is warped so that the features are consistent across patch boundaries. The warping function moves the points in the overlap region so that the features in the newly synthesized feature map patch match the already synthesized features. At the same time it is desirable that the features stay fixed along the bottom and right hand edges of the synthesized patch. The warp is obtained using scattered data interpolation. First a thin-plate spline [10] warping is attempted. Thin-plate splines satisfy the warp constraints with the minimum bending energy. Unfortunately, they require a matrix inversion to calculate. The matrix is singular for many sets of constraints. If the matrix is singular Shepard's method [20] is used instead of thin=plate splines. Shepard's method is an iterative algorithm for scattered data interpolation. Once the warping function has been determined it is applied to both the feature map and the colour information. Warping takes a considerable amount of time and the algorithm is only able to synthesize a 256 x 256 pixel output texture in about 2 minutes on a 2 GHz Pentium IV. This is considerably longer than many previous patch based methods. Chapter 3. Local and Global Feature Synthesis Algorithm 23 Chapter 3 Local and Global Feature Synthesis Algorithm Make everything as simple as possible, but not simpler. —Albert Einstein. The texture synthesis algorithm proposed here can be broken down into three main stages; obtaining a texton mask, synthesizing the texton mask, and synthesizing the final texture using the texton mask. These stages are described in detail here. 3.1 Obtaining the Texton Mask Before synthesis can begin, a texton mask must be created for the input texture. The texton mask can be created manually [47]. However, manual construction of the texton mask is a time consuming process and, in general, synthesis results are not overly sensitive to the texton mask used [47]. Because of this, a fast technique that creates reasonable texton masks without user intervention is desirable. The k-means algorithm is used to separate the image into k textons where k is specified by the user. For most applications, values of 2 or 3 are sufficient for reasonable synthesis results. The A;-means algorithm is a simple iterative clustering algorithm. The algorithm is initialized by choosing k random pixels as the initial centroids. Every pixel in the image is then assigned to the cluster Chapter 3. Local and Global Feature Synthesis Algorithm 24 with the closest centroid. Once all of the pixels have been assigned the centroids are recalculated and the process continues until convergence. If the algorithm does not converge within a set number of iterations then the current centroids at that point in time are used. In practice the algorithm almost always converges. Clustering is performed on the 3-dimensional R G B triples of the pixel values. Clustering can also be performed in other colour spaces such as H S V however this does not significantly affect the results in most cases. 3.2 Texton Mask Synthesis The texton mask is synthesized using a patch-based technique. A technique based on image warping is used to ensure that features are aligned at patch boundaries. A special feature vector is used for both selection and warping. Because the texton maps have a sparse set of possible values the feature vectors only encode edges. Each entry in the feature vector is a triple consisting of the current texton, the horizontal or vertical offset within the patch, and the normal direction. To construct a feature vector for a candidate patch, start on a pixel above the upper left corner of the patch. Continue one pixel at a time to the right until an edge is encountered. Add an entry to the feature vector for the pixels on the left and right sides of the edge. Continue with this until the right hand edge of the patch is reached. A similar process is used for constructing feature vectors for the left hand side of the candidate patch. Because feature edges are sparse for many textures, the feature vectors have a relatively low dimensionality. 3.2.1 Patch Selection Because we are using an image warping technique to align patch boundaries, it is important that the features located at patch boundaries be bijective. That is, the feature vectors for the previously synthesized patches and the patch to be synthesized must both have the same textons in the same order. If this is Chapter 3. Local and Global Feature Synthesis Algorithm 25 not the case then there is no warping that can ensure that the features match at the edges. Each candidate patch has two feature vectors. One for the left side and one for the top. To ensure bijectivity patches are rejected immediately if the sequence of textons in any feature vector does not match the sequence of textons in the already synthesized patches. The distance between two feature vectors is calculated using equation 3.1. i Here v new and v old are the feature vectors, o, is the i th offset and rij is the i th normal and r is a weight usually set to 0.1. Patches are synthesized in scanline order. The first patch is randomly selected. For patches along the top and left sides both feature vectors are taken from the only adjacent synthesized patch. If a point is reached during the synthesis where a bijective match cannot be found the process is restarted using a different random starting point. Patch synthesis is initialized by randomly choosing a patch. 3.2.2 Patch Warping The patch warping should meet several criteria. The positions of the pixels along the bottom and right edges of the patch should not change. The positions of the texton edges immediately above the patch and to the left of the patch should match the features of the adjacent previously synthesized patches. The first method that was examined for the warping was thin-plate splines (TPS). T P S simulate the bending of a thin sheet of metal that is bent to fit the constraints. T P S have the minimum bending energy of any warping function that canfitthe constraints. To calculate the interpolation from T P S a linear system must be solved. For this problem the linear system was frequently illconditioned. In fact, it is not uncommon for the matrix to be singular. Chapter 3. Local and Global Feature Synthesis Algorithm 26 Left Side Feature Vector: {1,5, 160}, {0, 6, 340}, {0, 11, 160}, {1, 12, 340} Figure 3.1: A feature vector is constructed for both the top and left edges of the texton mask patch to be synthesized. The feature vectors are composed of triples containing the texton value, the offset between the current texton boundary and the edge of the patch and the tangent of the texton boundary. To maintain a relatively high speed we use a bilinear warping function. For every pixel in the patch, the horizontal and vertical offsets from the warping are computed independently and then combined to determine the source pixel in the original patch. To calculate the horizontal offset, the patch is divided vertically by a series of lines going from the point (o" , —1) to the point o° , P) for a etu ld patch of size P x P. The closest line on each side of the current pixel is found. Here the left and right edges of the patch are considered vertical lines. The y-coordinate of the source pixel is _ Vsource (ydest +mi+ — -m r where (xdest, mi,m ,bi, r bi)(m . r . , +W- *Xdest . mi * x t + b + m - b + mi * x r r des . L r . d e s t , - h) , > . \^-^J + bi are the coordinates of the current pixel within the patch, ydest) and b are calculated using the equation for the line for the lines r immediately left and right of the current pixel. Equation 3.2 can also be used to calculate x dinates. s o u r c e by appropriately switching the x and y values in the coor- Chapter 3. Local and Global Feature Synthesis Algorithm 27 Figure 3.2: The texton mask patches are warped during synthesis. The x coordinate of the pixel in the unwarped patch that gets mapped to the current position, {xd t,ydest), es is calculated by Equation 3.2. Chapter 3. Local and Global Feature Synthesis Algorithm 3.3 28 Colour Synthesis Once we have the texton mask the final texture can be synthesized. A lowdimensional feature vector containing information from the texton mask and from the synthesized colour values is used to synthesize the final texture. As a result of how the information is stored, a variety of search structures can be used to speed up the search. 3.3.1 Low Dimensional Feature Vector The feature vector used to synthesize the final texture consists of two parts. The first portion of the feature vector uses data taken from the synthesized texton mask. The first element in the feature vector contains the current texton. The next four values contain the distance from the current pixel to the nearest texton edge directly above, below, left and right of the current pixel. For efficiency, these four values are computed and stored in a distance map before colour synthesis begins. The next four dimensions contain the texton values on the other side of the four edges from the previous four dimensions. These last four dimensions may be omitted when the texton mask contains only two texton values. At most, the texton mask contributes 9 dimensions to the feature vector. Because the texton mask contains the largest features in the texture the colour information only local colour information is needed. In many cases a 3 x 3 causal neighbourhood is sufficient to produce reasonable results. If a scanline order is used then this results in a 21 dimensional feature vector. If a random order is used then there is a 33 dimensional feature vector. This is much lower than traditional techniques. A typical 9 x 9 x 2 causal neighbourhood as used by Wei and Levoy [42] has a 195 dimension feature vector. 3.3.2 Distance Function Because the feature vector contains several different types of information an Lk norm would not be useful for calculating the distance between two feature Chapter 3. Local and Global Feature Synthesis Algorithm 29 vectors. The distance values and the colour values can be easily compared using weighted Lk norms. However, it does not make sense to compare texton values using an Lk norm because there is no ordering information between textons. Instead a delta function is used. The full distance equation is 6 dist(u,v) = ! ) + A y ] \\UJ - Vj\\^ + r(1 - S Ul :Vl t=2 io d - t=7 Ui,Vi) S ( )\\ i W + i U ~ i\\k > (-) v 3 2 3 i=ll where r, A, and K are constant weights and w(i) is a weight function such that the combined weight of a single pixels colour channels sums to one. w(i) may be a constant or it may weight the colour terms unevenly as in [28]. The correct choice for the constants r, A and K is important for the algorithm to work effectively. Because it is very important that the final colour value belongs to the same texton as the current texton r is set extremely high. Often r is set to infinity. This guarantees that the final pixel will come from the same texton and can simplify search. A and K are also set fairly high relative to the individual colour values. By keeping these values high, the large scale global information is preserved. The choice of which Lk norms to choose is also important. It has been shown that in high dimensions the choice of k can affect the meaningfulness of the result of a nearest neighbour search [2]. Higher values of k provide poor contrast between the nearest and farthest neighbours as the dimensionality increases. For high dimensions, the relative contrast given by an Lk norm with a smaller value for k is more likely to dominate the contrast of an Lk norm with a higher k value. Although we have reduced the dimensionality significantly, Chapter 3. Local and Global Feature Synthesis Algorithm this behaviour can be empirically shown to some degree for dimensionalities as low as 2. For dimensionalities of around 10-20, this behaviour applies over 95% of the time when the L% and L2 norms are compared [2]. To account for this behaviour we usually use the L\ norm in our distance calculations. The behaviour of nearest neighbour search in high dimensions is discussed in detail in Appendix A . 3.3.3 Search Techniques The unique structure of the feature vector lends itself to specialized search structures. For example, if r = 00 then we can create one small search structure for every possible texton value instead of having one large search structure. For all of the search structures described here it can be assumed that the search space has already been split into sets with the same texton values. Sequential Scan At first glance a full systematic search seems like a hopelessly naive algorithm. However, it has been empirically shown that many exact nearest neighbour search structures are outperformed by a sequential scan for dimensionalities as low as 10 [40]. Due to the low dimensionality of our method compared to other modern methods we can perform a full search in a relatively short amount of time. Additionally, when r = 00 then we only need to perform a sequential scan over approximately l/n th of the search space where there are n unique texton values. Distance-Based Hashing Since the value of A is relatively high compared to the coloured values it makes sense to split the search space based on the distances to the nearest texton edge. It is very likely that the nearest neighbour will match at least one of the edge distances exactly. This likelihood can be increased by using the L2 norm for the distances since it penalizes large differences. A separate hash table is 30 31 Chapter 3. Local and Global Feature Synthesis Algorithm constructed for each of the four distance dimensions. Each hash table contains one hash bucket for each possible distance value. In practice, there are usually significantly less possible distance values than the width or height of the source texture. Each hash bucket is capable of holding enough pixels to store the entire original source image. During search, for each distance dimension, the appropriate hash bucket is chosen. Within the buckets a systematic search is performed. Although, this results in some repeated distance calculations for pixels that match more than one distance the extra calculations is reasonably small. The space complexity of distance-based hashing is 0(nm) number of pixels in the source image and m = where n.is the max(height ,width source As a result the space complexity can vary between 0(ny/n) ). source and 0(n ). 2 The worst case time complexity is 0(n) per synthesized pixel. \ fc-coherence Search fc-coherence search is based on Ashikhmin's synthesis method for natural textures [5]. Ashikhmin's texture synthesis method [5] can be viewed as the special case where k = 1. It was first used by Tong et al. [36] to synthesize bidirectional texture functions (BTFs). Every pixel po in the synthesized image corresponds to some pixel p\ in the source image. When synthesizing a pixel p we look at s all pixels po in the neighbourhood of p . s For each po £ N(Ps) there is some translation ip such that ip(loc(po)) = loc(p ). We then apply tp to all of the s pixels pi corresponding to the pixels po € N(p ) to obtain the set of candidate s pixels p i . For every pixel p i in the source image we build a coherence set C(p\) of the k — 1 pixels that are most similar. The complete candidate set C(p ) s for possible pixels consists of all of the pixels p i and all of the pixels in their coherence sets C(pi). Mathematically this can be described as C(p ) = {Vp|p € p i U C ( p i ) , p i = rl>(pi),4>{loc(p )) = loc(p )\/p s 0 s 0 € N{p ).} s Because the noncausal neighbourhoods used in the the feature vector are very small (typically 3 x 3) a larger neighbourhood A (p ) is used to determine r s Chapter 3. Local and Global Feature Synthesis Algorithm 32 the candidate set Cp - Typically a 5 x 5 causal neighbourhood is sufficient to s produce reasonable results. Properly selecting which pixels pi belong in the coherence set C(p\) is important for this search method to work effectively. Selection of the coherence sets is a preprocess so we do not need to use the low-dimensional feature vector to find the k—1 nearest neighbours. We use a feature vector composed of R G B triples from a square neighbourhood around the current point. The size of this neighbourhood varies depending on the characteristics of the source texture. Typically a 9 x 9 neighbourhood is sufficient. This results in a 243 dimensional feature vector. Because this is a high dimensionality we use the L\ norm to compute distances. It is typical for pixels located near each other in the source texture to appear in the same coherence set. We want to prevent this to encourage diversity in the synthesized texture. As in [46] we prevent pixels that are within a certain radius r of pixels in the coherence set from being added to the coherence set. r is typically set to 5. fc-coherence search has a space complexity of 0(kn) and a time complexity of 0(km) where the input and output images contain n and m pixels respectively. For fixed values of k, the time complexity is constant and the space complexity increases linearly with the number of input pixels n. These are both desirable properties for a texture synthesis algorithm. Chapter 4. Entropy Minimizing Synthesis Algorithm 33 Chapter 4 Entropy Minimizing Synthesis Algorithm Only entropy comes easy. —Anton Chekhov. The second texture synthesis algorithm uses a more traditional approach. The main contributions of this algorithm are: • Automatically selecting a causal neighbourhood for each texture. • Doing most of the computation for the distance calculations as a preprocess. 4.1 Discretizing the Texture For the later steps of the algorithm to work properly the texture must be discretized into a small number of values. For an n x m texture we use vector quantization (VQ) to find at most MIN(TO, n) vectors. There are many choices of which V Q algorithm and pixel representation to use. 4.1.1 Vector Quantization We use the fc-means algorithm for V Q . The /c-means algorithm begins by randomly selecting m pixels and setting the initial cluster centroids to their feature vectors. Then for every pixel in the image the distance between its feature vector and every cluster centroid is calculated. The pixel is assigned to the cluster Chapter 4. Entropy Minimizing Synthesis Algorithm 34 with the closest centroid. Once all pixels have been assigned to a cluster the centroids are recalculated. The new centroid is the mean of all the feature vectors for pixels assigned to the cluster. The process repeats until either no pixels change clusters or a maximum number of iterations is exceeded. At some point in the algorithm, a cluster may have no pixels assigned to it. When this happens, it is assigned its previous value although the cluster is empty. The cluster will often become nonempty again in future iterations. Depending on the texture and the choice of feature vectors it may not be possible to assign m unique centroids. In this case, we divide the number of clusters in half. We repeat this until we can assign all of the clusters a unique initial starting value. 4.1.2 Feature Vectors We can use any data in the image to construct the feature vectors as long as there is a good error metric for them. The easiest feature vector to use is the R G B triples for the pixels and an Lk norm for the distance metric. Because this is being done as a preprocess it makes sense to perform a little extra computation and use a perceptually linear colour space such as La*b*. Simpler feature vectors such as only using the luminance values per pixel can also be used. Additional information such as the horizontal and vertical gradients per colour channel can also be included. Finally, values from surrounding pixels can be included in the feature vector and combined with complicated error metrics that include masking effects [26]. 4.2 Choosing the Dimensions to Keep Using the assumption that textures can be represented as M R F s we know that every pixel value can be computed by examining a finite set of each pixel's neighbours. Wei and Levoy [42] took advantage of this by using an L-shaped causal neighbourhood of pixels around the current pixel. This neighbourhood must be Chapter 4. Entropy Minimizing Synthesis Algorithm large enough to contain information about the feature being synthesized and the relationship between features. For example, if a wire mesh is being synthesized the neighbourhood must be at least as large as the space between wires. Otherwise the mesh may become irregular with many wires close together in some areas and large gaps in others. This requirement leads to large neighbourhoods and consequently high dimensional feature vectors. If we are using an n dimensional feature vector to synthesize a texture, can we achieve a result with similar quality using an n — 1 dimensional feature vector? In many common cases the answer is yes. The colour channels in an R G B triple do not vary independently for most textures. Adjacent pixels often have similar R G B values. This provides a large amount of redundancy that can be exploited. From this point on, instead of rejecting dimensions from the neighbourhood, we will call the standard causal neighbourhood the candidate neighbourhood and add candidate dimensions to the actual neighbourhood. 4.2.1 Entropy and Information Gain To decide which dimensions to add to the neighbourhood we need a metric to determine how much information about the current pixel each dimension contributes. Information gain provides a simple way to do this. Storing a data point, X, requires H(X) bits. H(X) is the entropy of X and is calculated by m (4.1) where m is the number of possible values X can have and pj is the probability that X is the j th value. In most applications data points don't exist by themselves. Previous data points are often related to the current data point in some fashion. Given the value of a previous data point, Y, we can compute the conditional entropy 35 Chapter 4. Entropy Minimizing Synthesis Algorithm w ) = E E p ^ = ^ y = ^ { x = v \ l Y = V ] ) ri m = P g = v) £ y j 36 « P(X = \Y VI = V J ) io g p { x = v A Y = v j ) m = ^2P{Y = v )H(X\Y J = v ), j , (4.2) J=I where VJ is the j " 1 possible value for Y and P ( Y = Vj) is the probability that Y = Vj. The difference between H(X) and i7(A"|y) is the information gain IG(X\Y) = H(X) - H(X\Y) (4.3) Applying this to texture synthesis, we want to keep the dimensions that provide the largest information gain. To simplify the calculations we use the clusters from the V Q instead of using actual pixel values. This also means that instead of choosing an individual colour channel within a pixel to add we are choosing the whole pixel. A naive implementation would simply calculate the information gain for each pixel in the candidate neighbourhood and add them in order of decreasing information gain until the number of dimensions in the neighbourhood reaches a threshold. In practice this results in pixels that are close to the target pixel being included and further pixels are left out. Figure 4 . 1 shows the order that pixels are added to the neighbourhood for some textures using this approach. This means that we lose the information about interfeature distances that the larger neighbourhood provided and we are not fully exploiting the redundancy between pixels. To build the neighbourhood we start by finding the pixel, Yi that results in the lowest conditional entropy H(X\Y{) and add it to the neighbourhood. Then we find and add the pixel Y2 that minimizes the conditional entropy H(X\Yi,Y2). H(X\Yi, We repeat this until the conditional entropy is equal to zero Y 2 , . . . , Y ) = 0. Now we restart the process using the next best conn ditional entropy H(X\Y \) n+ > H(X\Y\). If the best pixel has already been Chapter 4. Entropy Minimizing Synthesis Algorithm 23 20 14 16 17 21 24 23 20 16 11 15 19 24 22 12 8 5 7 13 21 17 9 4 1 3 10 18 14 6 2 F ! 15 16 12 19 17 | 21 18 i 10 11 14 20 13 1 4 8 ! 3 6 2 5 7 9 | ! 22 i i : 18 ! L 37 1 5 8 12 5 9 11 19 4 10 1 6 2 11 15 17 12 16 13 10 14 3 7 13 I 21 18 20 | 19 | ! j I I I L r : t 6 5 1 7 2 3 8 I 9 ! i 4 ! I Figure 4.1: The top images are two example textures. The middle images show a neighbourhood for each texture with the neighbours ordered by decreasing information gain. The bottom images show multiresolution neighbourhoods ordered by decreasing information gain. Chapter 4. Entropy Minimizing Synthesis Algorithm Dim # H(X\Y ) Dim # H(X\Y ) 0 4.86652 12 4.5736 1 4.8833 13 4.58375 2 4.90284 14 4.5859 3 4.93099 15 4.57048 4 4.92979 16 4.52477 5 4.8721 17 4.59278 6 4.88349 18 4.49937 7 4.87238 19 4.54705 8 4.92225 20 4.56063 9 4.92542 10 4.86394 11 4.87569 n 38 n Table 4.1: The conditional entropies from the green scales texture using a {5 x 5,2} causal neighbourhood. The dimensions on the right are from taken from the lower resolution level of a Gaussian pyramid while those on the left are from the current level. The dimension number is the same as used in Figure 2.2. added to the neighbourhood it is still used to calculate joint conditional entropies but is not added again to the neighbourhood. The process continues until the threshold for the number of dimensions is reached. For a given pixel, the conditional entropies obtained using pixels from higher levels in the multi-resolution pyramid are often significantly lower than the conditional entropies using other pixels at the current level. Table 4.1 shows the conditional entropies H(X\Y) for every pixel in a {5 x 5, 2} causal neighbourhood. When the number of pixels being compared is less than the number of pixels in the neighbourhood that come from the previous level then it is possible that all of the pixels being compared come from the previous level. When this occurs the algorithm is unable to distinguish between the pixels at the current Chapter 4. Entropy Minimizing Synthesis Algorithm 39 level that contribute to a single pixel at the lower resolution level. This leads to 2 x 2 blocks of pixels that all have the same source pixel in the original image. To correct this, a check is made to ensure that at least two of the pixels in the candidate neighbourhood came from the current level. If there are not at least two pixels from the current level the candidate dimension with the highest conditional entropy is removed from the candidate neighbourhood and replaced with the dimension from the current level with the lowest conditional entropy. This process is repeated until there are two candidate dimensions from the current level. Figure 4.2: The right hand side shows the 2x2 blocks of repeated pixels that occur when only dimensions from the lower resolution neighbourhood are used in the synthesis. The left hand image shows the result when the algorithm is forced to choose some dimensions from the current level. The top of the image shows the pixels contributing to the feature vectors. Chapter 4. Entropy Minimizing Synthesis Algorithm 4.3 Synthesizing the Texture A A;-coherence search is used to provide a constant time search for approximate nearest neighbours. To further speed up the A N N search an approximate error is used instead of the actual error. The approximate error is computed using the cluster centroids that were obtained by V Q . For every pair of centroids the distance between them is computed and stored in an error texture. The number of errors that need to be stored is 0(n) where n is the number of pixels in the source image. The approximate error is calculated by accumulating the approximate errors for each pixel in the final neighbourhood. Although this results in some inaccuracy in the search it allows arbitrarily computationally expensive error metrics to be used with a constant cost at runtime. To minimize the number of dimensions needed to accurately synthesize textures a multiresolution Gaussian pyramid is used. 40 41 Chapter 5. Results Chapter 5 Results Results! Why, man I have gotten a lot of results. I know several thousand things that won't work. —Thomas Edison A l l of the results shown here were obtained using an 1.8 G H z Pentium IV C P U with 512 M B of R A M . 5.1 5.1.1 Local and Global Synthesis Texton Mask Synthesis It takes between 0.5 and 9 seconds to determine the initial texton mask. The amount of time required depends on the size of the source image and the desired number of textons. The time spent synthesizing the texton mask varies widely depending on the initial texton mask. When 32 x 32 patches are used the feature vectors typically have between 6 and 12 triples for two texton values. Textures with larger feature vectors take longer to synthesize because they need to restart more often to maintain bijectivity. As the output size grows it becomes more likely that the synthesis process will need to restart. This means that the time required to synthesize a texton mask grows rapidly with the size of the output. Table B . l shows the synthesis times for several texton masks. The enforcement of bijectivity means that most candidate patches are rejected very early in the comparison process. This helps keep the time from growing quickly with the size of the input. Chapter 5. Results 42 Texton mask synthesis has a few weaknesses. The algorithm for determining the textons does not always divide the images in ways that are appropriate for the human visual system. The segmentation is usually acceptable when only two texton values are used but becomes more noticeable when the number of texton values used is increased as shown in Figure 5.1. In many cases, even when the texton mask does not divide the image well, the final synthesis result is still acceptable. Highly structured texton masks are often poorly synthesized. The best matching patch at a given step in the procedure might require significant warping. This affects the structure of the final image and is visually quite noticeable. Additionally, straight edges in texton masks can become bent across patch boundaries or even curved within a single patch (Figure 5.3). Horizontal and vertical edges in texton masks are often warped to noticeably different angles. Texton masks containing more than two texton values or with large numbers of transitions between textons typically take longer to synthesize. The requirement for bijectivity across patch boundaries results in an increased number of restarts in these cases. When synthesizing larger textures the randomized restart process prevents some texton masks from being synthesized. 5.1.2 Colour Synthesis The results obtained using the exhaustive search are the same as those obtained from the distance-based hash search. The hash method examines far less pixels and achieves its results more quickly than does the full search. As a result no exhaustive search results are included here. The distance-based search method is robust when it is given poorly synthesized texton masks. In general the maximum distance between textons in the texton mask does not grow with the size of the input texture. As a result the time taken by this method grows linearly with the input size. fc-coherence search is faster than the distance-based search because it compares fewer potential matches per synthesized pixel. The difference increases Chapter 5. Results 43 (a) (b) (c) (d) (e) (f) Figure 5.1: The results of automated texton selection. The overall structure is captured well and the major features are visible for all of the images. In (c) the highlights from the red section of the image are clustered with the green texton. Smooth transitions like those in (d) and (e) are not captured well by the texton mask. The addition of a third texton value to (d), (e) and (f) does not add much information about the features. In (f), the third texton does not differentiate between the green patches and the yellow patches. Chapter 5. Results Figure 5.2: Results of texton mask synthesis, (a) contains some visible patch boundaries, (b) shows a poor distortion in the lower left corner. Chapter 5. Results 45 (a) (b) (c) Figure 5.3: Results of texton mask synthesis. The horizontal and vertical edges in (b) and (c) are distorted by the patch warping. Although the original texton mask for (c) only had straight edges the synthesized mask has some curved edges as a result of patch warping. Chapter 5. Results 46 with larger source textures because the synthesis time forfc-coherencesearch is unaffected by the size of the source image. The results using fc-coherence search typically have numerous obvious artifacts that can be removed by using multiple passes of the synthesis algorithm. This makesfc-coherencesearch ideal for order independent search [43]. The texton mask provides global information about the texture making multiresolution pyramids unnecessary, even when using order independent search. The time spent using order independent search is usually comparable to that obtained by Wei and Levoy [43]. Our algorithm performs far worse when the texton mask synthesis requires several restarts. The results using texton masks withfc-coherencesearch have different artifacts from Wei and Levoy's when not enough generations are used. When only one generation is used Wei and Levoy's results have errors similar to Bonet's [9]. Our results show the global structure after the first generation but are blocky, similar to what would be expected if a lower resolution level of a Gaussian pyramid were scaled to the current size. Figure 5.4 shows the importance of using enough generations in the synthesis. Thefc-coherencesearch is not as robust as distance-based hashing when a poorly synthesized texton mask is used. This is a result of the larger number of pixels searched by the distance-based hashing. Figure 5.5 shows a comparison of the two methods when a poorly synthesized texton mask is used. Some failures of the colour synthesis algorithms are shown in Figure 5.6. Textures that contain smooth transitions of a single hue such as water, a sky with light clouds, or flames are synthesized poorly. The texton mask can not break up the image in a fine enough manner and, as a result, sharp edges appear in the synthesized image instead of smooth transitions. Another common problem is a 'speckled' effect. The distance portion of the error calculation overrides the local colour information leading to a lack of coherence in the reproduced colours. This leads to high frequency noise in the final image. Chapter 5. Results 2 iterations 47 3 iterations Figure 5.4: The original texture and its texton mask are shown at the left. The synthesized texton mask is shown next to them. The results of A>coherence search using order independent texture synthesis are shown on the right. 5.2 Entropy Minimizing Synthesis Using information gain to reduce the dimensionality results in very efficient synthesis. The results are generally very high quality. Selecting the dimensions for the candidate neighbourhood typically takes between 30 seconds and 10 minutes. This depends on the number of pixels required in the candidate neighbourhood, the size of the source image, and the number of textons the image is divided into. The complexity for calculating the conditional entropy grows exponentially with the number of pixels already known. Textures that require large numbers of pixels to be known before the conditional entropy becomes Chapter 5. Results 48 Figure 5.5: The original texture and its texton mask are shown at the left. The synthesized texton mask is shown next to them. The results of kcoherence search and distance based hashing are shown on the right. The fc-coherence search shows extensive problems with colour mismatches in areas where the synthesized texton mask is very different from the original texton mask. zero spend more time calculating the pixels in the conditional neighbourhood. Table B.4 shows some examples of the time required to calculate the candidate neighbourhood. Table B.5 shows that the number of textons used does not affect the final synthesis time. The number of textons used affects the time required to choose the dimensions that are kept and results in different candidate neighbourhoods being selected. The number of textons used also affects the error calculation. The more textons are used, the more accurate the error calculation will be. Figures 5.9 and 5.10 show the effect of the number of textons on the final synthesis result. Unless otherwise specified all of the results in this section use the Z/2 norm in the R G B colour space. The reduced neighbourhood search is able to accurately synthesize a wide range of textures When artifacts are present they are similar to those caused by fixed neighbourhood search [43]. When a very small number of dimensions (~ 4 — 6) or a very small number of textons (~ 8) is used then artifacts appear that are similar to those obtained using to small a neighbourhood with other synthesis techniques. The reduced neighbourhood search has poor results on some low frequency Chapter 5. Results 49 Figure 5.6: Examples of colour synthesis failures. The textures that contain various shades of a single colour are synthesized poorly. The smooth transitions are lost in the caustics image. The green scales image lacks the checkerboard pattern from the original image. The rock and brick textures have the correct overall structure but the colours are poorly reproduced. A l l of the images were synthesized using fc-coherence search with 3 generations and k = 2. Similar results are obtained using distance-based hashing for these examples. The texton masks are shown in Figures 5.2 and 5.3. Chapter 5. Results 50 Figure 5.7: Examples of colour synthesis using A>coherence search. Although the synthesized texton masks are visibly quite different from the originals the details and colour are accurately reproduced. A l l of the images were synthesized usingfc-coherencesearch with 3 generations and k = 2. Chapter 5. Results 51 Figure 5.8: The left two columns show the source textures and their texton masks. The right columns show the synthesized texton masks and the results of using distance-based hashing. Even when the texton mask is distorted the colour synthesis results are acceptable. Chapter 5. Results 52 textures and some highly structured textures. In some low frequency examples the artifacts are similar to those caused by jump maps [46]. The noticeable patch discontinuities are not as frequent as they are when jump maps are used. Additionally, changing some of the parameters often removes the jump map type artifacts. The artifacts resulting from using information gain to reduce the neighbourhood size are more like the artifacts caused by jump maps [46] than artifacts caused by using to small a neighbourhood. Unlike [46] our method does not create obvious seams when synthesizing low frequency textures. Artifacts caused by using a small neighbourhood are not present when the candidate neighbourhood is chosen from a sufficiently large neighbourhood unless a very small number of dimensions is used (~ 4 — 6). Highly structured textures are synthesized poorly. These are textures, with regularly spaced features. In cases where the features are highly detailed the synthesis algorithm often does not have enough dimensions in the search to accurately calculate the best visual match. As a result the synthesized image contains high frequency noise in some of these areas. 53 Chapter 5. Results 32 64 Figure 5.9: The number of textons used in the synthesis has a large impact on the quality of the results. When only 8 textons are used the image is very blurry. When 16 textons are used the image is still slightly blurred. A t 32 textons the image is clear but there is an uneven distribution of black and red berries. The image is clear and the distribution of black and red berries matches the input image when 64 textons are used. 32 64 Figure 5.10: The number of textons used in the synthesis also affects low frequency textures. When too few textons are used the interfeature distances are not maintained and the features are not accurately reproduced. When too many textons are used the images show slight discontinuities at patch boundaries. Chapter 5. Results 55 Figure 5.11: Natural textures are synthesized using fc-coherence search with k = 1. The quality of synthesis is similar to Ashikhmin's results. The images are all synthesized using 32 textons except for the rocks which use 64 textons. The circuit texture used 8 dimensions, the rocks used 16 dimensions, the grass used 12 dimensions and the flowers used 24 dimensions. Chapter 5. Results 56 Figure 5.12: Low frequency texture synthesis. The artifacts that are present in the images are consistent with fixed-neighbourhood search results. The wavy pink lines have some obvious patch edges that are similar to what would be seen using jump maps. The green scales texture was synthesized using a {5 x 5,2} neighbourhood at every level except the top which used a {5 x 5,1} neighbourhood. 57 Chapter 5. Results S ? describing the response r>f that neuro ht.-.! i function of posUion—ii perhap functional description of that neuron. seek « single conceptual and mathem. scribe the wealth of simple-cell recej id neurophy»iologically and inferred especially if such a framework has the it helps us to understand the funetio leeper way. Whereas no generic mo« ussians (DOG), difference of offset C ri v i tive of a Gaussian, hifher derivati - function, and so on—can he expect! inip!e-ceU receptive field, we noneth UJ 122 Ms* »*V«**'^*< S S S L f t * ' ^ * ^ * ' " « « nruroph3»ll<>os»lp#«»« L 9 •V , i.llyif.uch..„ . * » • h;bir lieip* m tn underitiinoc. ' l fue *er woy. W l m u n , G « I •onuroph* (DOG), difference a r' Y- H i l l y * of • Giuuien, hither it' llle-t^ ua to urmd so on—e»n be exi • n V.»»»h, rhereetive field, we nor t < M ; w K y J pn.tf.l'""™ I 1 «aur«r„ „ " < « « tstortP " * * < « • * N < « i r » ; """•1» 't « - * thru. t-""" "> ; tK ,! r *«*?S*!J n, * " * " . . . ,|^:ititJ<:tio>.-«'i!:wene»jtf< on poldeamraH'oler,* °™ - funwt field, we nt^her ef th«t r\, ,«lllo*' « » ™ n r - 1 j>a7». ihtrro]oallnh : B m^raMnilWi^ii i mil Figure 5.13: Texture synthesis on highly structured examples. The images in the left column are the original textures. The middle column contains synthesis results with k = 1. The right column contains synthesis results with k = 2. The artifacts when k = 1 are similar to the artifacts generated by Ashikhmin's method. The artifacts when k = 2 are similar to those generated by fixed neighbourhood search except the checkerboard pattern. The artifacts in the checkerboard are similar to jump map artifacts. 58 Chapter 6. Conclusion Chapter 6 Conclusion A conclusion is simply the place where someone got tired of thinking. —Arthur Block. Splitting the synthesis into local and global features works well on textures where there are a small number of well defined features. In these cases the texton mask is able to segment the image in a reasonable manner. Texton masks are useful if a particular pattern is desired but the storage costs for the full texture are too large. A texton mask containing the pattern can be stored using only 1-2 bits per pixel. A small colour image corresponding to a patch in the texton mask would provide the colour information and the remaining colour would be synthesized. Another possible application is a sketching interface. A n artist could start with a small image showing some sky with clouds, quickly sketch out some cloud boundaries in an image and allow the program to fill in the details of the image. Many of the dimensions in a standard search neighbourhood can be removed without having much impact on the quality of the results obtained. Using approximate distances does not have a large impact on the results obtained by A N N search when the approximation is good enough. These two modifications result in a significant speed up over fixed-neighbourhood search. Chapter 6. Conclusion 6.1 Future Work 6.1.1 Texton Masks 59 The texton mask synthesis currently restarts whenever it cannot find a bijective match at the current position. Consequently, a lot of work is redone, especially for larger synthesized textures where this problem is more common. Instead of restarting the search from a random starting point the last synthesized patch could be undone and resynthesized using a slightly worse fit. This process can be used to go back several steps if necessary. Using this method only some of the previous work is redone. A method based on hybrid texture synthesis [28] could also be used for synthesizing the texton mask. The resynthesis step would ensure that textons match properly across patch boundaries. However, hybrid texture synthesis was designed to maximize image quality without concern for speed. Using it to synthesize the texton mask would result in a rendering time measured in minutes instead of seconds. Other patch-based synthesis methods will not be appropriate for texton mask synthesis. The texton mask consists of a very small number of distinct values. Any algorithms that blend adjacent patch edges together would be unusable for this application because they would create values that don't exist in the original texton mask. M E B C would restrict the synthesized texton mask to the same set of textons but it cannot adequately handle patch boundaries where feature edges don't line up well. The current V Q algorithm for generating the initial texton mask could be replaced by a better image segmentation algorithm. There are many image segmentation algorithms from computer vision that would work [4]. Interactive image segmentation [34] may also be used to achieve higher quality results. Chapter 6. Conclusion 6.1.2 60 Entropy Minimizing Synthesis A more sophisticated V Q algorithm may improve the synthesis results. There are many V Q algorithms used for image coding that could be applied to this problem [27]. Additionally, implementing a more sophisticated error metric may improve the quality of the results. Other methods for deciding which dimensions to use in the A N N search may work better than V Q followed by information gain. Using the covariance matrix is one possibility. The covariance matrix works directly on the data and would not require V Q . Information gain is used in machine learning to construct decision trees. Using our pixel clustering a constant number of two level decision trees could be constructed using 0(n) space where n is the number of pixels in the image. Individually these trees would not be able to reduce the entropy sufficiently to produce an accurate result. The results from the trees could be combined using weighted averaging or a 'voting' system to select the value for the synthesized pixel. There are several obvious extensions for the entropy minimizing synthesis algorithm. A third dimension can be added to the neighbourhood to allow for video texture synthesis. It is likely that the number of dimensions kept would have to increase for this application. A hardware implementation using order independent neighbourhoods [43] is also possible. Animation synthesis is another possible extension, m frames of animation can be thought of as an rh x 1 texture. Each pixel would be an n-tuple containing all of the joint angles (or other appropriate representation) for the entire frame. The error texture makes the synthesis algorithm independent of the number of dimensions per pixel. Special care would be required to select the clusters Appendix A. Effects of High Dimensionality on Nearest Neighbour Search 61 Appendix A Effects of High Dimensionality on Nearest Neighbour Search Real-World problems are often 'high-dimensional', that is, are described by large numbers of dependent variables. Algorithms must be specifically designed to function well in such high-dimensional spaces. —David Rogers. High dimensional spaces have very different properties from low dimensional spaces. As a result our intuition about how objects behave does not scale well to higher dimensions. Common distance metrics and algorithms were developed for low-dimensional spaces and do not always generalize well to high dimensionalities. We will examine some of the properties of high dimensional spaces and their effects on nearest neighbour search. A.l Sparsity of Data To sample a d dimensional space uniformly with n samples per dimension requires n samples. For texture synthesis we begin with a constant number of d samples (the number of pixels in the source image) and cannot increase this to suit the dimensionality of the search space. Given a constant number of data Appendix A. Effects of High Dimensionality on Nearest Neighbour Search 62 points m there are only tfm data points per dimension. For a 64 x 64 texture in a 100 dimensional space this corresponds to less than 2 samples per dimension. This implies that the space is poorly sampled and the nearest neighbour may be quite far away from a randomly chosen query point. A.2 Volumes of Hyperspheres The volume of a d dimensional hypersphere of radius r is given by SR d S r -Hr d d d -f Jo where S is the surface area of a unit hypersphere. The surface area of a unit d hypersphere is given by . 27T / D 2 ~ T(d/2)' d The gamma function is /•oo r(m) = 2 / Jo e \ - dr. r 2m l Using special forms of T(d/2) when d is an integer allows S to be rewritten as d d is odd T3=2JT! s d K d is even. 2TT" (d/2-l)! The double factorial n\\ is computed as n-(n-2)...5-3-1 n > 0 odd n- (n-2)...6-4-2 n > 0 even 1 n = ' - l , 0. S increases as d increases until d = 7. After this point S d d decreases as d increases. The d that maximizes S can be computed numerically as approxid mately d = 7.257. Beyer et al. [6] proved the following theorem. 11**11 If I^E[\\X \ d = 0 Appendix A. Effects of High Dimensionality on Nearest Neighbour Search 63 then for every e > 0 lim P[Dmaxd < (1 + e ) D m i n J = 1. d—>oo This result imply that in high dimensions all of the data points will be roughly equidistant from any query point for many distributions. Hinneburg et al. [19] later proved the following: Let the distance function 11 • 11 be an Lk metric. Given an arbitrary distribution of two points hmd ooE[ Dmax* — D dJ- [_ /_ d(1 fc) (1/2) ] = C, d k where Ck is a constant depending on k This shows that the difference between the farthest and closest points to the query point increase at a rate dependent on the Lk norm and independent of the data distribution. For Lk norms where k > 2 Dmax — Dm,in converges to k k d d 0 with increasing dimensionality. It has been shown empirically that Lk norms with lower values of k are better able to discriminate between the nearest and farthest neighbours [2]. The inability to discriminate well between nearest and farthest neighbours has several implications. Branch and bound techniques become less useful as the dimensionality increases. Arya et al. [3] showed that the optimal A N N search for any Minkowski metric and fixed dimensionality can be found in 0(d\l + 6d/e] logn) time. As dimensionality increases all tree based A N N d search techniques can be beaten by a simple linear scan of the data. The nearest neighbour also becomes less meaningful as dimensionality increases. If a small perturbation of the query point can make the nearest neighbour into the farthest neighbour then how useful is finding the nearest neighbour? 64 Appendix B. Synthesis Times Appendix B Synthesis Times Absolutely nothing should be concluded from these figures except that no conclusion can be drawn from them. —Joseph L. Brothers. A l l of the results shown here were obtained using a 1.8 GHz Petium IV C P U with 512 M B of R A M . source texture input size # of textons output size time (s) red and green 64 x 64 2 192 x 192 0.88 green scales 64 x 64 2 192 x 192 0.16 green scales 64 x 64 2 256 x 256 0.25 green scales 64 x 64 3 128 x 128 0.69 green scales 64 x 64 3 192 x 192 23.63 green scales 64 x 64 3 (manual) 128 x 128 0.14 green scales 64 x 64 3 (manual) 192 x 192 — water 128 x 128 2 192 x 192 0.49 water 128 x 128 3 192 x 192 31.27 kpt4 258 x 258 2 192 x 192 1.10 kpt4 258 x 258 2 256 x 256 1.82 Table B . l : The time taken to synthesize the texton mask grows slowly with increasing input size. The time grows rapidly with increasing output size due to the bijectivity constraint. The synthesis algorithm was unable to synthesize all of the textures at the desired sizes. 65 Appendix B. Synthesis Times source texture input size # of textons output size time (s) red and green 64 x 64 2 • 192 x 192 1.33 green scales 64 x 64 2 256 x 256 3.10 green scales 64 x 64 3 (manual) 128 x 128 0.33 green scales 64 x 64 3 128 x 128 0.66 green scales 64 x 64 3 192 x 192 1.44 water 128 x 128 2 192 x 192 9.88 water 128 x 128 3 192 x 192 5.51 Table B.2: The times required to synthesize the colour components of a texture using hash-based search. The synthesis time increases dramatically when the size of the input texture is increased. source texture # of textons output size # of iterations k time (s) red and green 2 192 x 192 3 1 1.04 red and green 2 192 x 192 3 2 1.14 red and green 2 192 x 192 3 3 green scales 2 256 x 256 1 2 0.52 green scales 2 256 x 256 2 2 1.13 green scales 2 256 x 256 3 2 1.93 green scales 3 128 x 128 3 2 0.38 green scales 3 192 x 192 3 2 0.87 . 1.24 Table B.3: The times required to synthesize the colour components of a texture using /c-coherence order independent search. The time required to synthesize an image is proportional to the number of pixels, k, and the number of iterations used. The time decreases with increasing numbers of texton values. 66 Appendix B. Synthesis Times #of #of size of neighbourhood pixels means input calculation time (s) green scales 12 64 64 x 64 28.17 green scales 16 64 64 x 64 35.26 green scales 20 64 64 x 64 58.30 green scales 24 64 64 x 64 124.90 green scales 24 64 64 x 64 895.88 green scales 20 32 64 x 64 30.91 green scales 20 16 64 x 64 14.85 red and green 12 64 64 x 64 53.26 red and green 16 64 64 x 64 82.85 red and green 20 64 64 x 64 109.05 S16_m 20 64 128 x 128 213.14 source texture Table B.4: Time required to select the candidate neighbourhood. All of the results were obtained using four level Gaussian pyramids with neighbourhood sizes of {3 x 3,1}, {5 x 5,2}, {7 x 7,2}, {9 x 9,2}. 67 Appendix B. Synthesis Times k synthesis #of #of output pixels means size red and green 12 64 192 x 192 2 0.17 red and green 16 64 192 x 192 2 0.16 red and green 20 64 192 x 192 2 0.17 green scales 12 64 192 x 192 2 0.16 green scales 16 64 192 x 192 2 0.17 green scales 20 64 192 x 192 2 0.18 green scales 24 64 192 x 192 2 0.19 green scales 65 64 192 x 192 2 0.27 green scales 16 32 192 x 192 2 0.17 green scales 20 32 192 x 192 2 0.18 green scales 20 64 256 x 256 2 0.32 S16_m 16 64 256 x 256 2 0.38 S16_m 16 64 256 x 256 1 0.20 S16_m 20 64 256 x 256 1 0.21 pebbles 16 32 192 x 192 1 0.12 flowers 24 32 256 x 256 1 0.28 grass 12 32 256 x 256 1 0.20 kpt4 20 128 512 x 512 1 0.36 source texture time (s) Table B.5: Synthesis times using the candidate neighbourhoods determined by maximizing the information gain. All of the results were obtained using four level Gaussian pyramids with neighbourhood sizes of {3 x 3,1}, {5x5,2}, {7x7,2}, {9x9,2}. 68 Bibliography Bibliography [1] Charu C. Aggarwal. On the effects of dimensionality reduction on high dimensional similarity search. SIGMOD-SIGACT-SIGART In Proceedings of the twentieth ACM symposium on Principles of database systems, pages 256-266. ACM Press, 2001. [2] Charu C. Aggarwal, Alexander Hinneburg, and Daniel A. Keim. On the surprising behavior of distance metrics in high dimensional space. Lecture Notes in Computer Science, 1973:420-434, 2001. [3] Sunil Arya, David M. Mount, Nathan S. Netanyahu, Ruth Silverman, and Angela Y. Wu. An optimal algorithm for approximate nearest neighbor searching fixed dimensions. Journal of the ACM, 45(6):891-923, 1998. [4] Tetsuo Asano, Danny Z. Chen, Naoki Katoh, and Takeshi Tokuyama. Polynomial-time solutions to image segmentation. In Proceedings of the seventh annual ACM-SI AM symposium on Discrete algorithms, pages 104113. Society for Industrial and Applied Mathematics, 1996. [5] Michael Ashikhmin. Synthesizing natural textures. In Symposium on Interactive 3D Graphics, pages 217-226, 2001. [6] Kevin Bayer, Jonathan Goldstein, Raghu Ramakrishnan, and Uri Shaft. When is "nearest neighbour" meaningful? Lecture Notes in Computer Science, 1540:217-235, 1999. [7] Julian E . Besag. On the statistical analysis of dirty pictures. Journal of the Royal Statistical Society, series B, 48:259-302, 1986. Bibliography 69 [8] James F . Blinn and Martin E . Newell. Texture and reflection in computer graphics. In Communications of the ACM, pages 542-547, 1976. [9] Jeremy. S. De Bonet. Multiresolution sampling procedure for analysis and synthesis of texture images. In Computer Graphics, pages 361-368. A C M S I G G R A P H , 1997. [10] Fred L . Bookstein. Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Trans. Pattern Anal. Mach. Intell., 11(6):567585, 1989. [11] Rama Chellappa and Anil Jain. Markov Random Fields: Theory and Application. Academic Press, 1993. [12] Richard C. Dubes and Anil K . Jain. Random field models in image analysis. Journal of Applied Statistics, 16(2):131-164, 1989. [13] Alexei A . Efros and William T. Freeman. Image quilting for texture synthesis and transfer. In Eugene Fiume, editor, SIGGRAPH 2001, Computer Graphics Proceedings, pages 341-346. A C M Press / A C M S I G G R A P H , 2001. [14] Alexei A . Efros and Thomas K . Leung. Texture synthesis by non-parametric sampling. In IEEE International Conference on Computer Vision, pages 1033-1038, Corfu, Greece, September 1999. [15] Stuart Geman and Donald Geman. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6(6):721-741, 1984. [16] Paul Harrison. A non-hierarchical procedure for re-synthesis of complex textures. In V . Skala, editor, WSCG 2001 Conference Proceedings, 2001. [17] David J . Heeger and James R. Bergen. sis/synthesis. In SIGGRAPH, Pyramid-based texture analy- pages 229-238, 1995. Bibliography 70 [18] Aaron Hertzmann, Charles E. Jacobs, Nuria Oliver, Brian Curless, and David H. Salesin. Image analogies. In Eugene Fiume, editor, SIGGRAPH 2001, Computer Graphics Proceedings, pages 327-340. ACM Press / ACM SIGGRAPH, 2001. [19] Alexander Hinneburg, Charu C. Aggarwal, and Daniel A. Keim. What is the nearest neighbour in high dimensional spaces? In The (VLDB) Journal, pages 506-515, 2000. [20] Josef Hoschek, Dieter Lasser, and Larry L. Schumaker. Fundamentals of computer aided geometric design. A. K. Peters, Ltd., 1993. [21] Michael Cohen Jonathan. Wang tiles for image and texture generation. In John Hart, editor, SIGGRAPH 2003, Computer Graphics Proceedings, pages 287-294. ACM Press / ACM SIGGRAPH, 2003. [22] Bela Julesz. Dialogues on Perception. MIT Press, Cambridge, Mass ; London, Eng., 1995. [23] Vivek Kwatra, Arno Schodl, Irfan Essa, Greg Turk, and Aaron Bobick. Graphcut textures: image and video synthesis using graph cuts. In John Hart, editor, SIGGRAPH 2003, Computer Graphics Proceedings, pages 277-286. ACM Press / ACM SIGGRAPH, 2003. [24] Lin Liang, Ce Liu, Ying-Qing Xu, Baining Guo, and Heung-Yeung Shum. Real-time texture synthesis by patch-based sampling. ACM Trans. Graph., 20(3): 127-150, 2001. [25] Yosef Linde, Andres Buzo, and Robert M. Gray. An algorithm for vector quantizer design. IEEE Transactions on Communications, 28(l):84-95, 1980. [26] Karol Myszkowski, Takehiro Tawara, Hiroyuki Akamine, and Hans-Peter Seidel. Perception-guided global illumination solution for animation rendering. In ACM SIGGRAPH, pages 221-230, 2001. 71 Bibliography [27] Nasser M . Nasrabadi and Robert A . King. Image coding using vector quantization: a review. IEEE Transactions on Communications, 36(8):957971, 1988. [28] Andrew Nealen and Marc Alexa. Hybrid texture synthesis. In Eurographics Symposium on Rendering 2003, pages 97-105, 2000. [29] Rupert Paget and Dennis Longstaff. Texture synthesis via a nonparametric markov random field, 1995. [30] Rupert Paget and Dennis Longstaff. A nonparametric multiscale markov random field model for synthesising natural textures. In Fourth Inter- national Symposium on Signal Processing and its Applications, volume 2, pages 744-747, 1996. [31] Emanuel Parzen. On the estimation of a probability density function and the mode. Annals Mathematical Statistics, 33:1065-1076, 1962. [32] Ken Perlin. A n image synthesizer. In Proceedings of the 12th annual conference on computer graphics and interactive techniques, volume 19(3), pages 287-296, July 1985. [33] Kris Popat and Rosalind Picard. Novel cluster-based probability model for texture synthesis, classification, and compression, 1993. [34] Carsten Rother, Vladimir Kolmogorov, and Andrew Blake, "grabcut": interactive foreground extraction using iterated graph cuts. ACM Trans. Graph., 23(3):309-314, 2004: [35] Heung-Yeung Shum. In search of textons. In Proceedings of the Shape Modeling International 2003, page 197. I E E E Computer Society, 2003. [36] X i n Tong, Jingdan Zhang, Ligang Liu, X i Wang, Baining Guo, and HeungYeung Shum. Synthesis of bidirectional texture functions on arbitrary surfaces. In Proceedings of the 29th annual conference on Computer graphics and interactive techniques, pages 665-672. A C M Press, 2002. Bibliography 72 [37] Minh Tran and Amitava Datta. Synthesising textures using variable neighbourhood searching. In Proc. 7th International Conference on Digital Image Computing: Techniques and Applications, pages 643-652. CSRIO Publishing, 2003. [38] Greg Turk. Texture synthesis on surfaces. In Eugene Fiume, editor, SIGGRAPH 2001, Computer Graphics Proceedings, pages 347-254. ACM Press / ACM SIGGRAPH, 2001. [39] Marcelo Walter, Alain Fournier, and Mark Reimers. Clonal mosaic model for the synthesis of mammalian coat pattern. In Graphics Interface '98, 1998. [40] Roger Weber, Hans-Jorg Schek, and Stephen Blott. A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In Proc. 24th Int. Conf. Very Large Data Bases, VLDB, pages 194-205, 24-27 1998.' [41] Li-Yi Wei. Tile-based texture mapping on graphics hardware. In Graphics Hardware 2004, pages 55-63, August 2004. [42] Li-Yi Wei and Marc Levoy. Fast texture synthesis using tree-structured vector quantization. In Kurt Akeley, editor, Siggraph 2000, Computer Graphics Proceedings, pages 479-488. ACM Press / ACM SIGGRAPH / Addison Wesley Longman, 2000. [43] Li-Yi Wei and Marc Levoy. Texture synthesis by fixed neighborhood searching. PhD thesis, Stanford, 2002. [44] Qing Wu and Yizhou Yu. Feature matching and deformation for texture synthesis. In SIGGRAPH 2003, Computer Graphics Proceedings. A C M Press / ACM SIGGRAPH, 2004. [45] Ying-Qing Xu, Baining Guo, and Heung-Yeung Shum. Chaos mosaic: Fast and memory efficient texture synthesis. Microsoft Research Technical Report MSR-TR-2000-32, April 2000. Bibliography 73 [46] Steve Zelinka and Michael Garland. Towards real-time texture synthesis with the jump map. In Paul Debevec and Simon Gibson, editors, Proceedings of the Thirteenth Eurographics Workshop on Rendering Techniques, pages 99-104. Eurographics Association, 2002. [47] Jingdan Zhang, Kun Zhou, Luiz Velho, Baining Guo, and Heung-Yeung Shum. Synthesis of progressively variant textures on arbitrary surfaces. In John Hart, editor, SIGGRAPH 2003, Computer Graphics Proceedings, pages 295-302. ACM Press / ACM SIGGRAPH, 2003. [48] Song Chun Zhu, Cheng en Guo, Ying Nian Wu, and Yizhou Wang. What are textons? In Proceedings of the 7th European Conference on Computer Vison-Part IV, pages 793-807. Springer-Verlag, 2002.
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Low dimensional search for efficient texture synthesis
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Low dimensional search for efficient texture synthesis Kimberley, Fred 2004
pdf
Page Metadata
Item Metadata
Title | Low dimensional search for efficient texture synthesis |
Creator |
Kimberley, Fred |
Date Issued | 2004 |
Description | Current texture synthesis algorithms rely upon high dimensional approximate nearest neighbour (ANN) searches to determine the best pixel to use at the current position. The feature vectors used in the ANN search are typically between 100 and 300 dimensions. A large amount of research has examined how to reduce the number of feature vectors that need to be searched but very little has been done to speed up the actual comparisons. We present two new texture synthesis algorithms that use an order of magnitude fewer dimensions during the ANN search. In addition, we construct and make use of an error texture that further reduces the time spent comparing two feature vectors. |
Extent | 9176166 bytes |
Genre |
Thesis/Dissertation |
Type |
Text |
File Format | application/pdf |
Language | eng |
Date Available | 2009-11-23 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0051742 |
URI | http://hdl.handle.net/2429/15561 |
Degree |
Master of Science - MSc |
Program |
Computer Science |
Affiliation |
Science, Faculty of Computer Science, Department of |
Degree Grantor | University of British Columbia |
Graduation Date | 2004-11 |
Campus |
UBCV |
Scholarly Level | Graduate |
Aggregated Source Repository | DSpace |
Download
- Media
- 831-ubc_2004-0508.pdf [ 8.75MB ]
- Metadata
- JSON: 831-1.0051742.json
- JSON-LD: 831-1.0051742-ld.json
- RDF/XML (Pretty): 831-1.0051742-rdf.xml
- RDF/JSON: 831-1.0051742-rdf.json
- Turtle: 831-1.0051742-turtle.txt
- N-Triples: 831-1.0051742-rdf-ntriples.txt
- Original Record: 831-1.0051742-source.json
- Full Text
- 831-1.0051742-fulltext.txt
- Citation
- 831-1.0051742.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.831.1-0051742/manifest