Progressive Image Transmission By Linear Quadtree Coding and Wavelet Transformation by Hon Wai Cheung B.A.Sc , State University of New York at Binghamton, 1992 A thesis submitted in partial tulfillment of the requirements for the degree of Master of Applied Science in The Faculty of Graduate Studies (Department of Electrical Engineering) We accept this thesis as conforming to the required stanc|arNd The University of British Columbia March 1997 ©Hon Wai Cheung, 1997 In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission. Department of <£; The University of British Columbia Vancouver, Canada DE-6 (2/88) Abstract Progressive image transmission allows an approximate image to be built quickly and the details to be transmitted progressively through several stages over the channel. This technique appears very useful for picture communication over slow channels. This thesis proposes to use linear quadtree encoding combined with wavelet based technique as well as other methods to achieve a hybrid coding progressive transmission system. Table of Contents Abstract ii List of Figures V List of Tables Viii Acknowledgments iX Chapter One Introduction 1 Chapter Two Pseudo Progressive Transmission System 3 2.1 Progressive Transmission 3 2.2 Display Algorithms 6 2.2.1 Interlacing Display Format 6 2.2.2 Interlaced GIF Display Format 7 Chapter Three Quadrant interlacing Progressive Transmission System11 3.1 Pyramid Data Structure 11 3.2 Display Algorithm 12 3.2.1 Region Quadtree Decomposition 12 3.2.2 Linear Quadtree Encoding 14 3.2.3 Linear Quadtree Decoding 15 3.2.3 Quadrant Interlacing Display Format 16 3.3 Compression Method 21 3.3.1 Quadtree Condensation 21 Chapter Four Subband Coding Progressive Transmission System 22 4.1 System Architecture 22 4.2 Coding Method 23 4.2.1 Subband Coding Techniques 23 4.3 Compression Methods 25 4.3.1 Quantization 25 4.3.17 Scalar Quantization 25 4.3.1.2 Vector Quantization 26 4.3.2 Entropy Coding 26 4.4 Progressive Transmission Scheme 27 Chapter Five Hybrid Coding Progressive Transmission System 29 5.1 System Architecture 29 5.2 Transform Method 30 5.2.1 Wavelet Transforms 30 5.2.2 Daubechies Wavelet Transform With Quadrature Mirror Filter Bank 32 5.3 Progressive Transmission Schemes 38 5.3.1 Linear Quadtree Embedding 40 5.3.2 Zerotree Embedding 42 Chapter Six Results and Evaluation 46 Chapter Seven Conclusion 72 References 75 iv List of Figures Figure 2.1 Richard (64x64) on raster display format 4 Figure 2.2 Interlacing display format 6 Figure 2.3 Richard (64x64) on interlaced GIF display format 9 Figure 2.4 25% of Richard (64x64) on 3 different display formats 9 Figure 2.5 Lenna (256x256) on interlaced GIF display format 10 Figure 2.6 25% of Lenna (256x256) on 3 different display formats 10 Figure 3.1 (a) Bitmap of a 8x8 binary image 13 (b) Region decomposition of the image 13 (c) Quadtree representation of the same image 13 Figure 3.2 (a) Linear quadtree representation of the binary image 16 (b) Euclidean coordinates of the same image 16 Figure 3.3 The flow chart of quadrant interlacing implementation 18 Figure 3.4 Richard (64x64) on quadrant interlacing display format 19 Figure 3.5 Lenna (256x256) on quadrant interlacing display format 20 Figure 4.1 Subband coding progressive transmission system 22 Figure 4.2 (a) System diagram of two-band spitting 24 (b) System diagram of two-band recombination 24 Figure 5.1 Hybrid coding progressive transmission system 29 Figure 5.2 Transform matrix for DAUB4....; 35 Figure 5.3 Pyramid algorithm 35 Figure 5.4 One stage in a multiscale image decomposition 37 v Figure 5.5 Image decomposition 37 Figure 5.6 One stage in a multiscale image reconstruction 38 Figure 6.1 (a) Richard (64x64) simple computer graphic 50 (b) Lenna (128x128) common processing image 50 (c) Brain (256x256) image of human brain obtained by MRI 50 (d) Finger (512x512) FBI finger print sample 50 Figure 6.2 (a) The frequency distribution of the image "Richard" (64x64) and the corresponding wavelet coefficients after several iterations .. 51 (b) The frequency distribution of the image "Lenna" (128x128) and the corresponding wavelet coefficients after several iterations .. 52 (c) The frequency distribution of the image "Brain" (256x256) and the corresponding wavelet coefficients after several iterations .. 53 (d) The frequency distribution of the image "Finger" (512x512) and the corresponding wavelet coefficients after several iterations .. 54 Figure 6.3 (a) The corresponding wavelet coefficients map of figure 6.2(a) ..55 (b) The corresponding wavelet coefficients map of figure 6.2(b) ..56 (c) The corresponding wavelet coefficients map of figure 6.2(c) ..57 (d) The corresponding wavelet coefficients map of figure 6.2(d) ..58 Figure 6.4 (a) Hybrid coding progressive transmission display of "Richard"..59 (b) Hybrid coding progressive transmission display of "Lenna" ...60 (c) Hybrid coding progressive transmission display of "Brain" 61 (d) Hybrid coding progressive transmission display of "Finger" ...62 vi Figure 6.5 (a) Quadrant interlacing display of "Richard" 63 (b) Quadrant interlacing display of "Lenna" 64 (c) Quadrant interlacing display of "Brain" 65 (d) Quadrant interlacing display of "Finger" 66 Figure 6.6 (a) The compressed images by Interlaced GIF 67 (b) The compressed images by Quadrant interlacing 68 (c) The compressed images by Linear quadtree embedding 69 (d) The compressed images by Zerotree embedding 70 (e) The compressed images by SPIHT 71 vii List of Tables Table 6.1 Performance comparisons for the 512x512 Lenna image showing the peak-signal-to-noise rate (PSNR) in dB 48 Table 6.2 Subjective tests on 512x512 image "Lenna" 49 Acknowledgment I would like to thank Dr. Gunther Schrack for his supervision and advice. His enthusiasm and policy made work with him a pleasure. I would like to thank Dr. Panos Nasiopoulos for his contributions at the beginning of this project. I also thank my colleagues Clifford Loo, Lawrence Tang, Philip Liu, and William Woo for their helpful support during the year. The calm friendship of Sandra Tarn is much appreciated. Her support ensured that my time as a student was an enjoyable one. Most importantly I wish to thank my family, whose unquestioning support, both emotional and material, has been with me through all the endeavors of my life. ix Chapter One Introduction It is often a problem how to store and transmit images efficiently. Since the high demand on Internet browsing, limited bandwidth and cost on mobile communication, it is necessary to compress images and transmit them progressively. Because of compression, less storage space is required on both transmitter and receiver sites. Then it will take less time to let the receiver display an image because of progressive transmission. There are numerous research on progressive image transmission has been conducted, such as using JPEG [Zhang, 1994]; Laplacian pyramid coding [Mongatti, 1992]; vector quantization [Riskin, 1994]; stack-run image coding [Tsai, 1996]; space-frequency quantization [Xiong, 1993]; embedded zerotree wavelet [Shapiro, 1993] and set partitioning in hierachical trees [Said, 1996]. In this research, we propose a quadrant interlacing system and a hybrid coding system to achieve such a goal. Two different progressive image transmission techniques are provided by the proposed systems. The first is called the quadrant interlacing technique. It employs a pyramid data structure to decompose the image. It will produce a progressive display while the image is reconstructed. The second technique applies linear quadtree coding and wavelet transforms. By using linear quadtree embedding scheme or zerotree embedding scheme, the wavelet coefficients are 1 embedded into a bit stream sorted by numerical importance. A receiver has been designed so that it will display the image as soon as the first packet of coefficients arrives and will refine the image by adding on the incoming decoded coefficients. This technique will produce a good progressive image display. This thesis proposes the idea of using linear quadtree encoding combined with wavelet theory as well as other methods to achieve a hybrid coding progressive transmission system. In chapter 2, current interlaced display formats are discussed. In chapter 3, an interlaced format by using the pyramid data structure is proposed. In chapter 4, current subband progressive transmission system is described. In chapter 5, a hybrid coding progressive transmission system is proposed. In chapter 6, the performances of the proposed systems are evaluated. In chapter 7, the results of this thesis are summarized and discussed. 2 Chapter Two Pseudo Progressive Transmission System 2.1 Progressive Transmission Conventionally, an image is displayed in raster order - from left to right, top to bottom. Figure 2.1 is an example of a simple image displayed in raster order. However, if the transmission rate is very low, there are advantages to reconstruct the image differently. For example, 10 percent of the way through a raster transmission, only the top tenth of the picture can be seen, and usually this is not very informative. Generally it will be better to be able to see the whole picture at a tenth of the final resolution. This is why progressive image transmission is employed. Progressive image transmission is a class of image transmission techniques where the information concerning the image is transmitted in two or more stages. At the first stage, the transmitter sends the information of a coarse image; at later stages, further details of the image can be transmitted on the request of the viewer at the receiver. A typical application of progressive image transmission is on-line-browsing [Rashwan, 1993][Malak, 1991], where the recipient wishes to see a recognizable image as quickly as possible. If it is the desired image, the recipient will request more information, perhaps at successive stages, to improve the quality of the image; if not, the recipient can terminate the transmission immediately. 3 10 20 30 40 50 60 4 data (0.1%) transmitted 256 data (6.3%) transmitted In this thesis, we present two methods for progressive transmission. One is called "quadrant interlacing". It is a fractal coding technique to provide a progressive transmission effect. The other one is to apply a discrete wavelet transform followed by linear quadtree encoding to arrange the image spatial frequency information hierarchically. 2.2 Display Algorithms 2.2.1 Interlacing Display Format Generally, image pixels are displayed in progressive order sequentially and progressively line by line. It is called raster (more specifically, non-interlaced raster) display format [Witten, 1994]. Figure 2.1 is an example of a raster display format. It is instrumentally the simplest form and is appropriate for scanning still pictures. However, for low transmission rates, other display formats have been considered to let the browser more easily recognize the image. Most of these involve scanning a fraction (a field) of the total number of pixels of a single scan and returning to scan the remaining pixels in subsequent scans. This type of display format is called interlaced raster scanning [Brown, 1967], in that each field, in a well chosen pattern, is spatially interlaced with the previous one. An interlacing format is illustrated in Figure 2.2 First Field Figure 2.2 Interlacing display format 6 2.2.2 Interlaced GIF Display Format The Interlaced GIF [Wilson, 1991] format implements interlacing. It is used to give the browser an idea of what an image looks like before all the bitmap data has been received. A non-interlaced image displays the image in a linear fashion from the top to the bottom. Over a slow link, it may take several minutes for an image to be painted. By the time when 50% of the bitmap data is received, only the top half of the image is displayed. The content of the bottom half is still a mystery to the browser. In a non-interlaced GIF, the pixel lines are displayed in consecutive order. For example, the lines of a 16-line non-interlaced GIF file are stored as: 1,2,3,4,5 16. Interlaced GIF paints quickly over the entire display first as a very low resolution image. Only about 12.5% of the bitmap is displayed in.the first pass. The GIF image is then repainted in three more passes, with each pass supplying more resolution until all of the data is received (12.5%, 25%, and 50% respectively). A browser can usually get a good idea of the entire image when only 30 ~ 50% of the bitmap data has been received. This is useful for knowing whether or not to abort an image being viewed via a slow link. 7 The lines of the same 16-line bitmap in an interlaced GIF are stored as 1,9, 5, 13, 3, 7, 11, 15,2,4,6,8, 10, 12, 14, 16. Figure 2.3 and Figure 2.5 are examples of interlaced GIF display. Richard is a simple (64 x 64) computer image and Lenna is a complex (256 x 256) image from a photograph. A survey was conducted among ten people by asking their subjective advise. It shows that the images could both be recognized at the 50% stage. In the next chapter, another interlaced format is introduced, called the "quadrant interlacing" display format. Figure 2.4 and Figure 2.6 show 25% of the images displayed by three different display formats. It is clear that "quadrant interlacing" provides more image information for the browser. 8 100 150 50 100 150 200 250 8192 data (12.5%) 32768 data (50%) transmitted transmitted transmitted transmitted Figure 2.5 Lenna(256x256) on interlaced GIF display format. Figure 2.6 25% of Lenna (256x256) on 3 different display formats Chapter Three Quadrant Interlacing Progressive Transmission System 3.1 Pyramid Data Structure The progressive transmission technique of the quadrant interlacing progressive transmission system is based on the pyramid data structure [Alexandrov, 1993], where the images at the bottom of the pyramid correspond to a set of reduced-resolution approximations. Progressive image transmission is achieved by transmitting the levels of the pyramid ranging from the lowest resolution level (top level) to the original resolution level (bottom level). Several pyramid data structures have been described in the literature [Burt, 1983] [Tanimoto, 1979]. A pyramid data structure which is useful in representing digital images is the region quadtree [Samet, 1984]. Region quadtrees are attractive for a number of reasons: . Relative simplicity compared to other methods (e.g., discrete-cosine-transform based coding), which makes it an attractive method for applications such as High definition television (HDTV) [Strobach, 1988]. . Allows focusing on specific local areas since the data representation is based on a decomposition [Chitale, 1991]. The decomposition actually results in an image segmentation. This segmentation can be used for a variety of different image processing applications, e.g., progressive transmission [Caron, 1994]. 11 3.2 Display Algorithm 3.2.1 Region Quadtree Decomposition A natural gray-level image usually can be divided into regions of different sizes with varying amounts of detail and information. Such segmentation of the image is useful for an efficient coding of the image data. Region quadtree decomposition [Shusterman, 1994] is a technique which divides the image into two dimensional homogeneous regions. For a given n x n image, its corresponding region quadtree is defined as follows: the root node represents the entire image. If all the pixels of the image are of the same color, the root node itself becomes a leaf node holding the color of the image. Otherwise, the root node has exactly four children (or descendents), the quadrants, which represent the colors of the northwest (NW), northeast (NE), southwest (SW) and southeast (SE) blocks of the image. This decomposition continues for all nodes until the block represented by the corresponding node consists of only one color. Figure 3.1 is an example of a region quadtree representation of an 8 x 8 binary image. Quadrants of each node are arranged in the order of NW, NE, SW and SE from the left. In Figure 3A(a), 0 and 1 represent WHITE and BLACK pixels, respectively. It is assumed that the background color of the image is WHITE. Black blocks and their quadtree nodes are marked with A to P in Figure 3.1(b) and Figure 3.1(c). 12 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 0 0 1 1 1 1 1 1 1 1 0 0 0 1 1 0 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 0 0 A C B D E F G H K L N 0 I M J P (a) (b) • • • L 6 O i l 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 ( C ) Figure 3.1 (a) Bitmap of a 8x8 binary image; (b) Region decomposition of the Image; (c) Quadtree representation of the same image 13 3.2.2 Linear Quadtree Encoding One way to store a quadtree is to use the tree structure. Since non-leaf nodes need space for the pointers to their quadrants, the space requirement for such a representation is high especially when the quadtree consists of a large number of nodes. Due to this fact, representations without using pointers are of interest. One of the pointerless quadtree representations is called the linear quadtree proposed by [Gargantini, 1982]. She pointed out that it is possible to attach a unique location code to a node, to interpret a location code as an interleaved coordinate and to replace a pointer-based quadtree by a list of the location codes of the black nodes. Each leaf node of a quadtree may be represented by a pair of integers {n, I}, where n is a spatial address named the location code and I is the level. The root of the tree structure is declared as level 0, the first subdivision as level 1, etc. The level of a node is thus its distance in the tree from the root. A leaf node at level r is called a pixel. A location code is a quaternary integer; its digits comprise the path which is traversed in the quadtree to reach a leaf node from the root. Quadrants are labeled according to a labeling scheme called the Z-order or Morton path, hence, NW, NE, SW and SE quadrants are labeled 0,1,2, and 3 respectively. The most significant digit of a location code stores the quadrant of level 1, the following digit is the quadrant of level 2, etc., the least significant digit is the quadrant of the pixel-level. A location code has always exactly r quaternary digits. A node at level l<r is represented by the location code of its NW corner pixel, and ends in r-l 0 bits. A 14 linear quadtree is a list of node/level pairs of all black nodes. If a pixel has gray scale instead of being black, a value that indicates the intensity is assigned with the node/level pair as well. Figure 3.2(a) shows the linear quadtree of an (8 x 8) binary image. 3.2.3 Linear Quadtree Decoding The location code of a pixel not only represents the traversal path in a quadtree. It is possible to calculate the Euclidean coordinates of the pixel from the location code [Schrack, 1992]. It is easy to show that the binary representation of the location code n of a pixel is an interleaved coordinate that has the following structure: n = yr-i*r-i • • • YiXiYoXo, where x = x M . . . x 7 x 0 ) y = y M . . . y,y 0 , and x„ y,are the binary digits of the internal representation of the coordinates. Figure 3.2 (b) shows the corresponding Euclidean coordinates of Figure 3.2(a). 15 X A C B D E F G H K L N 0 1 M J P (a) 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 0 0 1 1 1 1 1 1 1 1 0 0 0 1 1 0 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 0 0 (b) Figure 3.2(a) Linear quadtree representation of the binary image: [(7,3), (12,2), (18,3), (24,2), (32,3), (33,3), (36,3), (37,3), (39,3), (44,2), (48,3), (49,3), (50,3), (52,3), (53,3), (56,2)] Figure 3.2(b) Euclidean coordinates of the same image: [(3,1), (2,2), (3,2), (2,3), (3,3), (4,2), (5,2), (4,3), (5,3), (0,4), (1,4), (2,4), (3,4), (3,5), (2,6), (3,6), (2,7), (3,7), (4,4), (5,4), (4,5), (6,4), (7,4), (4,6), (5,6), (4,7), (5,7)] 3.2.4 Quadrant Interlacing Display Format A quadrant interlacing display format is a pyramid data structure of the image. By using the idea of region quadtree decomposition and interlacing, we can provide a better interlaced format. Consider Figure 3.1(a) as an example. Figure 3.1( c ) is the region quadtree representation of the blocks. At the resolution 1, all four blocks are leaf nodes. Therefore the receiver prints black those four quadrants at [(0004, 1), (1004,1), (2004,1), (3004,1)]. At resolution 2, only ten blocks are leaf nodes. So the quadrants [(0104,2), (0304,2), (1004,2), (1204,2), (2004,2), (2 1 04,2), (2304,2), (3004,2), (3 1 04,2), (3204,2)] are printed black. As the resolution increases, the detail of the quadrants is improved. 16 The implementation is shown in Figure 3.3. It involves linear quadtree coding. The upper left corner pixel of each quadrant is defined as the "P-Point". Then the image is divided into four quadrants. If the quadrant has a black node, the "P-Point" and the level of that quadrant will be transmitted. Therefore, at the first time, it will have a maximum of four black quadrants. It is the image at resolution 1. At the next attempt, each quadrant is divided by 4. More "P-Points" are calculated for transmission. When the receiver places them back to the position and fills that specific square, the resolution is improved. If this procedure is operated recursively, the user will be able to recognize the image effectively. Figure 3.4 and Figure 3.5 are examples of quadrant interlacing displays of Richard 64x64 and Lenna 256x256 respectively. A subjective observation experiment has been conducted. It shows that Richard can be recognized at 25% (resolution is 5) of data transmitted while Lenna can be recognized already at only 6.25% (resolution is 6) of data transmitted. 17 Linear quadtree encoding Level =1 Calculate the "P-Point" and the size of each quadrant Quadrant = 1 Yes Figure 3.3 The flow chart of quadrant interlacing implementation 18 Resolution = 1 4 coefficients transmitted 10 20 30 40 50 60 Resolution = 4 256 coefficients transmitted Resolution = 2 16 coefficients transmitted 30 40 50 Resolution = 5 1024 coefficients transmitted Resolution = 3 64 coefficients transmitted 10 20 30 40 50 60 Resolution = 6 4096 coefficients transmitted Figure 3.4 Richard (64X64) on quadrant interlacing display format 50 200 290 150 200 250 Resolution =1 4 coefficients transmitted 50 100 160 200 250 Resolution =2 16 coefficients transmitted so 100 Resolution =3 64 coefficients transmitted Resolution =4 256 coefficients transmitted Resolution =5 1024 coefficients transmitted 60 100 150 200 250 Resolution =6 4096 coefficients transmitted 50 tOO 150 200 250 Resolution =7 16384 coefficients transmitted SO 100 Resolution =8 65536 coefficients transmitted Figure 3.5 Lenna (256x256) on quadrant interlacing display format 3.3 Compression Method 3.3.1 Quadtree Condensation As demonstrated by Figure 3.2, four identical pixels are grouped into a single node and assigned a lower level value to the linear quadtree code, reducing the size of the image file. For example, the segments B, D, J, P can be stored as four linear quadtree codes at level two instead of sixteen linear quadtree codes at level three. Obviously, an image which has more spatial redundancy will perform better. Also, it works well on an image which has many zero nodes because the linear quadtree coding excludes the zero nodes. The degree of compression is image-dependent. Some images, such as Richard, yield a better compression ratio than others. The above technique is a lossless compression. If a user can tolerate some errors, a lossy compression approach can be considered. Take Lenna as an example. There is not a large visual difference between the images of resolution 7 and of resolution 8. Therefore, the image can be stored at resolution 7 and yield a 75% compression. 21 Chapter Four Subband Coding Progressive Transmission System 4.1 System Architecture The design of a subband coding progressive transmission system can be divided into three parts: subband coding techniques, quantization strategies, and error-free encoding techniques. In each part, one has the freedom to choose from a pool of candidates and this choice will ultimately affect the system performance. In this chapter, a general system design is introduced. Most of the systems are using the following architecture. Source image data Reconstructed image data Subband encoding Subband decoding Quantization Entropy encoder Quantization table Entropy code table • Dequantization Entropy decoder Channel Figure 4.1 Subband coding progressive transmission system 22 4.2 Coding Method 4.2.1 Subband Coding Techniques The basic idea of subband coding [Woods, 1986] is to decompose the image into subimages, according to the frequency zones. A bit rate is assigned to each subimage individually according to the energy content of that subimage. The low frequency bands contain maximum information, while high frequency bands contain less information. Hence more bits are allocated to low frequency bands. This results in a high quality coding at low bit rates. In a two-band subband decomposition, the encoder consists of two filters, a lowpass filter hQ(z) and a highpass filter /z,(z), followed by downsampling by two. It gives rise to two subimages which are full band at a lower sampling rate . The whole process is known as decimation. Similarly, on the synthesis side, there is an upsampling by two followed by an interpolation by filters h0(z) and h\{z). Figure 4.2 shows a two-band splitting system. 23 Decimation Decimation Subbands ~* y00. y 0i (a) y 0 0 y 0i Interpolation Interpolation (b) Figure 4.2 (a) System diagram of two-band splitting (b) System diagram of two-band recombination 24 4.3 Compression Methods 4.3.1 Quantization Quantization is the process to convert continuous quantities into digital quantities. The data may be in scalar form or vector form. The conventional techniques such as pulse code modulation perform the quantization on the scalars, e.g. individual real valued samples of the waveform or pixels of an image. 4.3.1.1 Scalar Quantization A scalar quantizer discretizes one sample at a time. |t is a special case of vector quantization. A scalar quantizer Q with L levels is entirely defined by two sets of values: L+1 values x0, x1, . . ., xL called decision levels, which divide the space of real numbers, and L reproduction values (output values) y1, y2, . . ., yL such that: Q(x) = y, if x,_, < x < x, for i = 1,2,..., I Suppose the resolution of an image is N. This will yield 3N+1 subbands. Since the variance of each subband is generally different, a quantizer is designed for each subband. Two types of errors occur during quantization. 1. Quantization noise: this is the difference between the input value and the reproduction value in the domain [ x0, xL ]. 2 5 2. Overload noise: this is the truncation effect when the signal exceeds the boundary decision thresholds x0 and xL and the values are quantized as y1 and VL-4.3.1.2 Vector Quantization Vector quantization is a generalization of scalar quantization in which vectors, or blocks, of data are quantized instead of the data themselves. To apply vector quantization to subband coding, the common approach is still to consider each subband individually. A sub-code-book is generated for each subband, and a multiresolution codebook is obtained by assembling all sub-code-books. Senoo and Girod [Senoo, 1992] compared several vector quantization algorithms for subband image coding. 4.3.2 Entropy Coding After the quantization, the subbands usually are highly correlated. Entropy encoders, such as the Huffman coder or the run-length coder, are used to further compress the data. Commonly, run-length encoding an abundance of zeros, when combined with Huffman encoding of the nonzero values, produces good results. 26 4.4 Progressive Transmission Scheme A progressive image transmission technique for subband coding was proposed by Westerink [Westerink, 1989]. The basic idea of subband coding is to split up the frequency band of a signal and then to encode each subband separately. For the progressive transmission of a subband encoded image, the subbands are sent one after the other. The transmission order of subbands is determined by the total mean squared error, D t o t, which can be expressed as the sum of the mean square error values d, per subband: where N is the number of subbands. To determine the transmission order, the transmitter evaluates the maximum of decrease in total mean square error which is equal to: max— -i*s b, where bj = number of bits for subband /; a) = variance of subband /; S = set of subbands that has not yet been transmitted. The actual transmission order will be made known to the receiver by encoding the subband numbers and sending these along with the corresponding subband data. A specific scanning pattern is used to replace the coefficients within the subband. The zigzag scanning pattern and raster scanning pattern are commonly used. 2 7 There are several disadvantages of this scheme. Firstly, traditional transform coders, such as those using the discrete cosine transform, decompose images into a representation in which the edge information tends to disperse so that many non-zero coefficients are required to represent edges with good fidelity. Secondly, no code embedding occurs within the subband because it has to follow a specific scanning pattern. In the next chapter, we propose a wavelet transform system with linear quadtree code embedding to solve the above problems. 28 Chapter Five Hybrid Coding Progressive Transmission System 5.1 System Architecture The Hybrid coding scheme is characterized by the following concepts: 1. Daubechies wavelet transform 2. Linear quadtree encoding The Daubechies wavelet transform decorrelates most images fairly well. The significant coefficients can be encoded as a quadtree. After the transformation, linear quadtree encoding assigns location codes to the coefficients for embedding. Finally, the coefficients will effectively transmitted by a embedding scheme. Figure 5.1 shows the architecture. r Transmitter Source image data r Forward wavelet • transform Linear quadtree encoding Embedding scheme Channel Reconstructed image data iReceiver Inverse wavelet transform Inverse embedding scheme Figure 5.1 Hybrid coding progressive transmission system 29 5.2 Coding Method 5.2.1 Wavelet Transforms Wavelets are functions generated from one single function ¥ by dilation and translations, The definition of wavelets as dilates of one function means that high frequency wavelets correspond to a<1 or narrow width, while low frequency wavelets have a>1 or wide width. The basic idea of the wavelet transform is to represent an arbitrary function / as a superposition of wavelets. Any such superposition decomposes / into different scale levels, where each level is then further decomposed with a resolution adapted to the level. One way to achieve such a decomposition writes / a s an integral over a and b of vF a , 'with appropriate weighting coefficients. In practice, one prefers to write / as a discrete superposition (as a sum rather than an integral). It introduces a discretization, a = a™,b = nb0a'", with m,n e Z , and a0 > 1, b0 > 0 fixed. The wavelet decomposition is then where t is a one-dimensional variable. The mother wavelet ¥ has to satisfy the condition: 30 f = Tc (fYv J / J nut \J } 111,11 with ,^,,,,,(0 = V*** (t) = <"'2xv{<"t ~ nb0) Decomposition of this type were studied in [Daubechies, 1986]. For a0 > 2, b0 > 1 there exist very special choices of y such that the mnun constitute an orthonormal basis, so that Different bases of this nature were constructed by Daubechies [Daubechies, 1988] and others. All these examples correspond to a multiresolution analysis, a mathematical tool invented by S. Mallat [Mallat, 1989] which is particularly well adapted to the use of wavelet bases in image analysis, and which gives rise to a fast computation algorithm. 31 5.2.2 Daubechies Wavelet Transforms with Quadrature Mirror Filter Bank The discrete wavelet transform, as developed by Daubechies, has many similarities to the Fast Fourier transform. Both take an input vector whose length is normally a power of two and output a different vector of the same length. The entire process is also reversible which means that transform data can be used to reconstruct the original input at any point in the procedure. The Fast Fourier transform yields a unique decomposition in terms of continuous sines and cosines as the transform output. The wavelet transform yields a decomposition which is neither continuous nor unique. The primary advantage of the discrete wavelet transform is that it does not have the limited time-frequency resolution of the Fast Fourier transform and thus provides a more accurate representation of the input [Rioul, 1991]. There are many different types of discrete wavelet transforms which have been explored since the original work in the 1980's and each have their own advantages and disadvantages when performing data analysis. As an example, some provide smoothing the representation of a signal, while others represent the signal more compactly for data compression. A particular class of wavelets may have several different high order solutions which yield a variety of characteristics from highly compacted to highly smooth within the same family [Lu, 1996]. 32 A specific discrete wavelet transform is identified by a particular set of numbers called wavelet filter coefficients. One of the simplest and most localized wavelet transforms is referred to as DAUB4 which is based on four wavelet coefficients first solved by Daubechies [Daubechies, 1986]. This particular wavelet transform is easy to implement, and is computationally efficient. Daubechies discovered that the wavelet transform could be implemented with a specifically designed pair of finite impulse response filters called a quadrature mirror filter pair. A finite impulse response filter performs a dot product between wavelet filter coefficients and the input data vector. The output of this filter is a decimated (every other sample removed) version of the input data. Repeating this process recursively results in a group of signals that divide the spectrum of the original signal into octave bands with successively coarser measurements in time as frequency decreases [Rioul, 1992]. The same effect can be accomplished mathematically by repeatedly applying a transform matrix comprising wavelet coefficients to a column vector of discrete input data. After each application of the transform matrix, the detail data and the smoothed or decimated data must be separated. The transform matrix is then applied to the smoothed data as the process is repeated. Figure 5.2 and 5.3 illustrates the wavelet transform process. The transform matrix (in this case DAUB4) is applied to the input (y1. . . y16) yielding smooth 33 data (s1. . . s8) interleaved with detail data (d1. . . d8). The results are permutated to separate the smooth and detail data (Figure 5.3). The detail data is simply stored while the transform matrix is applied to the smoothed data. Each repetition of the process divides the smooth data in half. The process can be terminated at any point, but usually proceeds until there are only two data points left. The final output is exactly the same number of data points as the input and can be used to reconstruct the original data. Note the pattern of coefficients used for the diagonal of the transform matrix in Figure 5.2. The rows consist of identical pairs of coefficients. The exception to this are the last two rows which simply wrap the values back into the first column The odd rows are the smoothing filters while the even rows yield the detail data. This accounts for the interleaved data output in Figure 5.2. The values wrap in the last row because it is assumed that data represents a repeating waveform. In a signal processing context, this pattern is called a quadrature mirror filter [Vaidyanathan, 1990], 34 "cO cl c2 c3 >1" 'si -c3 c2 -c l cO y2 dl cO cl c2 c3 yi s2 -c3 c2 -c l cO y* d2 cO cl c2 c3 ys s3 -c3 c2 -c l cO yo cB cO cl c2 c3 yi s4 -c3 c2 -c l cO d4 cO cl c2 c3 y9 s5 -c3 c2 -c l cO ylO cB cO cl c2 c3 yll s6 -c3 c2 -c l cO y\2 d6 cO cl c2 c3 yl3 si -c3 c2 -c l cO ylA dl c2 c3 cO cl y\5 sS -c l cO -c3 c2 yl6 Figure 5.2 Transform matrix for DAUB4 s1 s1 s'1 d1 s2 d'1 s2 s 3 s'2 d2 s4 d'2 s3 s 5 s'3 d3 s6 d'3 s4 s7 s'4 d4 Permutate S 8 Transform d>4 s5 d1 d1 d5 • d2 : • d2 s6 d3 d3 d6 d4 d4 s 7 d5 d5 d7 d6 d6 s8 d7 d7 d8 d8 d8 Figure 5.3 Pyramid algorithm 35 The DAUB4 transform matrix is orthogonal, thus the inverse is just a transposed version of the original. The inverse matrix can be used to reverse the wavelet transform and reconstruct the original data. DAUB4 provides a very compact representation of the input and most of the information for reconstruction is contained in the highest amplitude points of the wavelet data. For images, a two-dimensional wavelet transform is needed. It is accomplished by two separate one-dimensional transforms. Figure 5.4 represents one stage in a multiscale pyramidal decomposition of an image: wavelet coefficients of the image are computed, as in the one-dimensional case, using a subband coding algorithm. The h and g are quadrature mirror filters. The coefficients of DAUB4 are given by [Daubechies, 1986]: , 1 + V3 , 3 + V3 , 3 - V 3 , 1-V3 K - —z— «i = —z— >h = —~— K = —-— . 0 2 1 2 n 2 3 2 and -&=(-l)'V,-> w h e r e i = 0,l,2,3 This decomposition provides subimages corresponding to different resolution levels and orientations (see Figure 5.5). The reconstruction scheme of the image is presented Figure 5.6. 36 Initial image corresponding to the resolution level m Downsample by 2 along x Downsample by 2 along x 9 g Downsample by 2 along y Downsample by 2 along y Downsample by 2 along y Downsample by 2 along y Initial image corresponding to the low resolution level m-1 Detail image corresponding to the low resolution level m-1 Figure 5.4 One stage in a multiscale image decomposition Low resolution sub-image horizontal orientation sub-image (LH1) horizontal orientation sub-image (LH2) vertical orientation sub-image (HL1) diagonal orientation sub-image (HH1) vertical orientation sub-image (HL2) diagonal orientation sub-image (HH2) Figure 5.5 Image decomposition 37 Initial image corresponding to the low resolution level m-1 — Detail image corresponding to the low resolution level m-1 Upsample by 2 along y Upsample by 2 along y Upsample by 2 along y Upsample by 2 along y g Upsample by 2 along x g Upsample by 2 along x g Reconstructed image to the resolution level m Figure 5.6 One stage in a multiscale image reconstruction 5.3 Progressive Transmission Schemes The wavelet transform concentrates the original values of a two-dimensional data set or image into a relatively small number of coefficients. The information is carried by the large magnitude coefficients. The majority of the coefficients are relatively small in value and do not contribute much information to the reconstruction of the original data. However, the position of all the coefficients must still be retained for reconstruction of the wavelet coefficient map. 38 The major objective in a progressive transmission scheme is to select the most important information to be transmitted first. The mean square error (Dm s e), i j ' j where N = the number of image pixels F = the original image ; f = reconstructed image W= transform coefficient of the original image w = transform coefficient of the reconstructed image and (I, j) are the pixel coordinates which measures the largest distortion reduction, is used to determine the transmission order of the coefficients. Other proposals for progressive image transmission suggest the same strategy, for example, [DeVore, 1992] stated that after a unitary hierarchical subband transformation, the coefficients with larger magnitude should be transmitted first because they have a larger content of information. Two progressive transmission schemes are presented below. Each has its advantages and disadvantages. The choice of the scheme depends on the application. 39 5.3.1 Linear Quadtree Embedding An image may be represented as a multiscale process by utilizing an orthogonal wavelet transform of the original data. In the dyadic orthogonal case, the discrete wavelet transform provides information about the edges in an image in the form of subbands along three different orientations: horizontal (LH), vertical (HL) and diagonal (HH) (see Figure 5.5). If the nodes on a tree are viewed as the wavelet coefficients, then a causal relationship may be defined between edges of the image at a coarse scale and those in the same spatial position at a finer scale along one orientation. For example: consider an 8 x 8 image, which can be transformed into three different resolutions. For each orientation, it will have three wavelet coefficient subbands. For each coefficient in the coarser band at each orientation, there are four coefficients at the next resolution and sixteen at the finest resolution. The 21 coefficients represent the edge information along one orientation for an 8 x 8 image. The relationships of these coefficients may be represented by a quadtree which means that linear quadtree coding can be applied in order to locate the position of these wavelet coefficients. First, the image is processed by a two-dimensional wavelet transform. Then, each coefficient is assigned a location code by the linear quadtree coding method which described in section 3.2. Each coefficient now becomes a 4 0 coded-coefficient-pair which includes the location code and the value of the coefficient. After sorting the wavelet, coefficients by magnitude, the information is ordered in importance. The transmitter first sends a resolution value (R) which is used to calculate the size of the image: the size of the image = 4R followed by a series of embedded coded-coefficient-pairs. At the receiver, the decoder simply decodes the location code and then places the coefficient into the quadtree for each coded-coefficient-pair. The image can be reconstructed by an inverse wavelet transform of the received coefficients. There are several advantages of this scheme. First, the encoder and decoder design can be very simple and can easily be hardware implemented. Second, the linear quadtree coding and discrete wavelet transform are both very fast operations. Third, the system can be very robust. Because the coded-coefficient-pairs are independent of each other. For example, the channel noise caused by a the burst error only affects a small area on the reconstructed image. Also, the decoding of each coefficient location does not depend on previous information so that the system is stable even over a noisy channel. The disadvantage of this scheme is the poor compression performance. It is due to the attachment of a location code to each coefficient. The characteristic of the 41 following scheme is almost opposite. It results in a good compression ratio by trading off a complicated coding method. 5.3.2 Zerotree Embedding This progressive image transmission scheme is similar to the embedded image coding method proposed by [Shapiro, 1993]. Linear quadtree coding and the neighbor finding method are applied to speed up the search of zero roots. After the discrete wavelet transformation, every coefficient at a given scale can be related to the set of coefficients of the next finer scale of similar orientation (see Figure 5.5). The coefficient at the coarse scale is called the parent, and all coefficients corresponding to the same spatial location at the next finer scale of similar orientation are called the children. For a given parent, the sets of all coefficients at all finer scales of similar orientation corresponding to the same location are called the descendants. For a given child, the sets of all coefficients at all coarser scales of similar orientation corresponding to the same location are called the ancestors. After the linear quadtree coding, the parent-child relationship can be determined by their location code. For example, let P be the location code of a parent. The children are the next four nodes starting at location P x 41 and the next sixteen nodes beginning at location P x 42... etc. For a given location, the location all of 42 the descendants can be calculated by the above method. Also, the scanning of the coefficient map is performed by incrementing the location codes so that no child node is scanned before its parent. Given a threshold level T, a coefficient is called significant if it is larger than T. The entire wavelet coefficient map can be coded as a string of symbols from a 4-symbol alphabet which can be represented by 2 bits each. This is called Zerotree coding. The four symbols used are: (a) Positive significant (P) if the coefficient is positive and significant. (b) Negative significant (N) if the coefficient is negative and significant. (c) Isolated zero (I) if the coefficient is insignificant but has some descendants that are significant. (d) Zero root (Z) if the coefficient and all of its descendants are insignificant. To perform the coding, a sequence of thresholds ( T j ) : are chosen. Two separate files are maintained during encoding and decoding. Initially, the dominant file contains all coded-coefficient-pair, and the subordinate file is empty. During processing, the dominant file contains the coded-coefficient-pair of those coefficients that have been found to be insignificant. The subordinate file contains the magnitude of those coefficients that have been found to be significant. For each threshold, each file is scanned once. 43 During a dominant pass, coefficients in the dominant file are compared to the threshold Tj to determine their significance. The result is coded by the four zerotree coding symbols and appended to the output file for transmission. Each time a coded-coefficient-pair is encoded as significant, its magnitude is appended to the subordinate file, and it is erased from the dominant file. A dominant pass is followed by a subordinate pass. During a subordinate pass, the width of the effective quantizer step size, which defines an uncertainty interval for the true magnitude of the coefficient, is cut in half. For each magnitude on the subordinate file, this refinement can be encoded using a binary alphabet with a "1" symbol indicating that the true value falls in the upper half of the old uncertainty interval and a "0" symbol indicating the lower half. The symbols are then appended to the output file for transmission. The process continues to alternate between dominant passes and subordinate passes where the threshold is halved before each dominant pass. In the decoding operation, each decoded symbol, both during a dominant and a subordinate pass, refines and reduces the width of the uncertainty interval in which the true value of the coefficient may occur. The center of the uncertainty intervals are used as the reconstruction values. When the quadtree of the wavelet coefficients is reconstructed, the coefficients are inverse wavelet transformed in descending order according to their magnitude. 44 The advantage of this scheme is the good compression performance. It takes about one byte to represent the location and the value of the coefficient. Also, it gives a good reconstructed image quality even at high compression ratios. There are several disadvantages of this scheme. First, the encoder and decoder design is relatively complex. Second, the coding scheme is slow due to swapped and scanned information in several files. Third, the system is not robust. Because the reconstruction of the wavelet coefficient quadtree depends on the previous information so that the system may not be stable over a noisy channel. 45 Chapter Six Results and Evaluation Results from computer simulations using the proposed progressive image transmission schemes on four test images are presented. The first image, shown in figure 6.1(a), is a 64x64 computer graphic "Richard". It features a simple image structure with only a few gray levels. The second, shown in figure 6.1(b), is a 128x128 image "Lenna". It is characterized by fine detail. The third, shown in figure 6.1 (c), is a 256x256 image "brain" obtained by magnetic resonance imaging. It represents a possible application in medical image transmission. The fourth, shown in figure 6.1(d), is a 512x512 finger print sample "finger" provided by the FBI. FBI is using one of the wavelet transform algorithms for compression of their finger print files. This image is characterized by many sharp edges. A Matlab toolbox called WaveLab [Buck, 1996] was obtained from Stanford University to perform the two-dimensional Daubechies wavelet transformation. A set of programs were written to implement the different algorithms which are described in the previous chapter. The results are listed below. The first step feeds an image into the Daubechies 4 wavelet transformer. The wavelet transformer concentrates the original values of an image into a relatively small number of coefficients. Figure 6.2(a)-(d) shows the frequency distribution 46 of the original images and the corresponding wavelet coefficients after several iterations. The majority of the coefficients are relatively small in value and do not contribute much information to the reconstruction of the original data. Figure 6.3(a)-(d) are the wavelet coefficients maps of those images. The figures show the images decomposed into subbands by wavelet transformation and could be embedded by linear quadtree coding. The hybrid coding with the linear quadtree embedding scheme and the quadrant interlacing progressive transmission display of each image on each level are shown in figures 6.4(a)-(d) and figures 6.5(a)-(d). For an evaluation of system performance, objective and subjective fidelity criteria are applied on the 512x512 image "Lenna". On objective fidelity measure, the root mean square error (MSE) and the peak-signal-to-noise ratio (PSNR) are calculated. The MSE provides a pixel by pixel measure of the image change from one pass to another. The average square error between the two images, where the size of the image is given as M x N, is given by the expression: MSE = —Yll[s(rn,n)-f(m,n)]2 Mlv „,=1 „=1 where M and N are the number of rows and columns of the images and f(m,n) and g(m,n)are the (m,n) pixels of the original and the reconstructed image, respectively. The peak signal to noise ratio (PSNR) is given by the expression: 47 rara -10104—J d B The coding results for the image "Lenna" at different rates are listed in table 6.1. For comparison purposes, we also include in table 6.1 the results for Interlaced GIF, EZW [Shapiro, 1993] and SPIHT [Said, 1996]. Rate (bpp) Interlaced GIF Quadrant Interlacing Linear Quadtree Embedding Zerotree Embedding EZW SPIHT 0.125 11.64 24.18 26.55 29.32 30.23 31.11 0.25 11.71 25.23 27.47 32.62 33.17 34.14 0.5 11.86 26.37 30.11 35.65 36.28 37.25 1.0 12.19 28.67 33.21 38.77 39.55 40.46 Table 6.1: Performance comparisons for the 512x512 Lenna image showing the peak-signal-to-noise ratio (PSNR) in dB Subjective tests are used to find the answer how good an image looks to a human observer. The subject quality of an image can be evaluated by showing the image to a number of observers and averaging their evaluations. A six-grade quality scale is applied: 1. Excellent - The image is of extremely high quality. 2. Fine - The image is of high quality, providing enjoyable viewing. 3. Passable - The image is of acceptable quality. 4. Marginal - The image is of poor quality and it should be improved. 5. Inferior - The image quality is very poor. 6. Unusable - The image is so bad that it cannot be recognized. 48 Figure 6.6(a)-(e) shows the compressed images by the schemes that are listed in table 6.2. Subjective testing results for the progressive transmission algorithms are shown in Table 6.2. The results have been derived from test sessions, where two observer groups of 3 and 5 people were used. One was an expert group and the other group consisted of non-experts. All groups had quite consistent opinions, although some people said that it was very hard to find any differences between some algorithms. Subjective test grade Rate(bpp) Interlaced . GIF Quadrant Interlacing Linear Quadtree Embedding Zerotree Embedding SPIHT 0.125 6.0 4.8 3.2 1.0 1.0 0.25 6.0 3.2 2.3 1.0 1.0 0.5 6.0 2.4 1.0 1.0 1.0 1 6.0 1.5 1.0 1.0 1.0 Table 6.2: Subjective tests on 512x512 image "Lenna" The results show that the wavelet-based embedding coders consistently outperform the interlacing methods. However, the quadrant interlacing method still produced acceptable results and it is a very fast and simple algorithm. The wavelet-based schemes all produced very high quality images, even the linear quadtree embedding scheme with a high compression ratio. 49 20 40 60 (a) 20 40 60 80 100 120 (b) 100 200 300 400 500 mt 50 100 150 200 250 (c) 100 200 300 400 500 (d) Figure 6.1 (a) Richard (64x64) simple computer graphic (b) Lenna (128x128) common processing image (c) Brain (256x256) image of human brain obtained by MRI (d) Finger (512x512) FBI finger print sample Figure 6.2(a) The frequency distribution of the image "Richard" (64x64) and the corresponding wavelet coefficients after several iterations On each graph: Y axis: Frequency of occurrence X axis: Coefficients value -5000 0 5000 10000 15000 (a) "Lenna" after 7 iterations ioV 20000 -2000 -1000 0 1000 2000 3000 4000 5000 6000 (c) "Lenna" after 5 iterations 10V -500 0 500 . 1000 1500 2000 (e) "Lenna" after 3 iterations -200 -100 0 100 200 300 400 (g) "Lenna" after 1 iteration 500 -2000 0 2000 4000 6000 8000 10000 (b) "Lenna" after 6 iterations -500 0 500 1000 1500 2000 2500 3000 3500 (d) "Lenna" after 4 iterations -200 0 200 400 600 800 1000 (f) "Lenna" after 2 iterations 50 100 150 200 250 300 (h) Original image "Lenna" Figure 6.2(b) The frequency distribution of the original image "Lenna" (128x128) and the corresponding wavelet coefficients after several iterations O n each graph: Y axis: Frequency of occurrence X axis: Coefficients value 52 105 1 0 4 10? 10? 101 -5000 1C5, • x 0 5000 (a) "Brain" after 8 iterations 10« 10 3 1C2-101 10000 -1000 1000 2000 3000 4000 (d) "Brain" after 5 iterations -5000 0 5000 (b) "Brain" after 7 iterations 10000 -2000 0 2000 4000 6000 8000 (c) "Brain" after 6 iterations - 5 0 0 0 500 1000 1500 2000 2500 (e) "Brain" after 4 iterations - 5 0 0 0 500 1000 1500 (f) "Brain" after 3 iterations - 4 0 0 -200 0 200 400 600 800 (g) "Brain" after 2 iterations -200 0 200 400 (h) "Brain" after 1 iteration 600 0 50 100 150 200 250 300 (i) Original image "Brain" Figure 6.2(c) The frequency distribution of the original image "Brain" (256x256) and the corresponding wavelet coefficients after several iterations On each graph: Y axis: Frequency of occurrence X axis: Coefficients value y 53 •2000 0 2000 40M 6000 n 6 (e) "Finger" after 5 iterations 0 500 1000 1500 2000 (g^ J'Finger" after 3 iterations 1 •1000 0 1000 2000 3000 4000 Finger" after 4 iterations -400 -200 0 200 400 600 800 1000 (h) "Finger" after 2 iterations 0 100 200 300 400 (i) "Finger" after 1 iterations 50 100 150 200 0) Original image "Finger" Figure 6.2(d) The frequency distribution of the original image "Finger" (512x512) and the corresponding wavelet coefficients after several iterations On each graph: Y axis: Frequency of occurrence X axis: Coefficients value 54 55 20 40 60 80 100 120 20 40 60 80 100 120 (a) 20 40 60 60 100 120 (c) 20 40 60 80 iOO 120 20 40 60 80 100 120 20 40 60 80 100 120 (g) 20 40 60 80 100 120 (b) 20 40 60 BO 100 120 20 40 60 30 100 120 20 40 60 80 100 120 (d) 20 40 60 80 100 120 20 40 60 80 100 120 (h) Figure 6.3(b) The corresponding wavelet coefficients map of figure 6.2(b) 56 100 160 200 250 (b) 50 100 150 (c) 50 100 1 50 200 250 W GO 200 250 Figure 6.3(c) The corresponding wavelet coefficients map of figure 6.2(c) 100 200 300 400 500 100 200 300 400 500 100 200 300 400 500 100 200 300 400 500 100 200 300 400 500 100 2O0 300 400 500 (a) 100 200 300 400 500 (C) 1O0 200 300 400 500 (e) 100 200 300 400 500 (9) 100 200 300 400 500 100 200 300 400 500 100 200 300 400 500 100 200 300 400 500 100 200 300 400 500 (d) 100 200 300 400 500 (0 100 200 300 400 500 (i) 100 200 300 400 500 Ii) Figure 6.3(d) The corresponding wavelet coefficients map of figure 6.2(d) 20 40 60 Resolution = 1 20 40 60 Resolution = 2 20 40 60 80 100 120 Resolution^ 20 40 60 80 100 120 Resolution=3 20 40 60 80 100 120 Resolution=5 20 40 60 80 100 120 Resolution=7 20 40 60 80 100 120 Resolution=2 20 40 60 80 100 120 Resolution=4 20 40 60 80 100 120 Resolution=6 Figure 6.4(b) Hybrid coding progressive transmission display of "Lenna" 50 100 150 200 250 Resolutions 50 100 150 200 250 Resolut ions 50 100 150 200 250 Resolution=2 50 100 150 200 250 Resolut ions 50 100 150 200 250 Resolution=5 50 100 150 200 250 Resolut ions 50 100 150 200 250 Resolution=7 50 100 150 200 250 Resolut ions Figure 6.4(c) Hybrid coding progressive transmission display of "Brain" 10 20 30 40 50 GO Resolution = 1 Resolution = 2 Resolution = 6 Resolution = 5 Figure 6.5(a) Quadrant interlacing display of "Richard" 63 20 40 60 80 100120 Reso lu t ions 20 40 60 80 100 120 Resolution=3 20 40 60 80 100120 Resolution=5 20 40 60 80 100120 Resolution=7 20 40 60 80 1 00120 Reso lu t i ons 20 40 60 80 100120 Resolution=4 20 40 60 80 100120 Resolution=6 Figure 6.5(b) Quadrant interlacing display of "Lenna" 64 50 100 150 200 250 Resolutions 50 100 150 200 250 Resolutions 50 100 150 200 250 Resolutions 50 100 150 200 250 Resolution=7 50 100 150 200 250 Resolution=2 50 100 150 200 250 Resolutions 50 100 150 200 250 Resolutions 50 1 00 150 200 250 Resolutions Figure 6.5(c) Quadrant interlacing display of "Brain" 65 100 200 300 400 500 Resolutions 100 200 300 400 500 Resolutions 100 200 300 400 500 Resolutions Figure 6.5(d) Quadrant interlacing display of "Finger1 66 100 200 300 400 500 R=0.125;PSNR=11.64 100 200 300 400 500 R=1;PSNR=12.19 100 200 300 400 500 R=0.5;PSNR=11.86 Figure6.6(a) The compressed images by Interlaced GIF 67 Figure 6.6(b) The compressed images by Quadrant interlacing 100 200 300 400 500 R=0.125;PSNR=26.55 100 200 300 400 500 R=1;PSNR=33.21 100 200 300 400 500 R=0.5;PSNR=30.11 Figure 6.6(c) The compressed images by Linear quadtree embedding 69 Figure 6.6(d) The compressed images by Zerotree embedding 100 200 300 400 500 R=0.125;PSNR=31.11 100 200 300 400 500 R=1;PSNR=40.46 Figure 6.6(e) The compressed images by SPIHT Chapter Seven Conclusions Two types of progressive transmission systems have been investigated in this thesis. The performance and computational complexity vary drastically among systems, and there is always a trade-off between the performance and complexity. The quadrant interlacing progressive transmission system employs a region quadtree to decompose the image. By assigning linear quadtree codes on the pixels, an interlacing display format algorithm is formulated which produce a progressive display while the image is reconstructed. According to a subjective test, It performs better than the interlaced GIF. The hybrid coding progressive transmission system employs wavelet transforms with linear quadtree coding. Two schemes to embed the information have been investigated. The first scheme is called linear quadtree embedding. The encoding and decoding processes of this scheme are fast and simple. The algorithm provides an acceptable quality result even for small amounts of information received. The disadvantage is that the compression performance is not as good as for the second scheme. The second scheme is called zerotree embedding. According to an objective test, the performance is very close to the 72 EZW scheme. It provides high quality results even at a bit rate of 0.125 bits per pixel. In conclusion, the hybrid coding progressive transmission system has the highest computational load, but it offers the best performance. On the other hand, the quadrant interlacing system needs the least amount of computation, but its performance is the poorest. Both systems are using the pyramid structure to embed the image data. This approach appears very attractive since the highest level images can be reconstructed directly and with minimum processing. Therefore, it offers a natural advantage of picture browsing capability. How to select a proper progressive transmission technique depends much on the specific application. For example, in the situation of personal computer to personal computer communication using modems, the simple quadrant interlacing technique may be the ideal candidate. On the other hand, the hybrid coding approach can be employed for its superior performance at the expense of high system complexity. The above proposed system designs are novel, and the author welcomes further research. Future research may investigate the following topics: 1. Investigate the effect of different mother wavelets on images other than DAUB4. 2. Compare the systems' performance to other progressive image transmission methods such as progressive JPEG. 73 3. Simulate the systems for cellular radio channels. 4. Fine-tune the systems to optimize their performance. 5. Design program modules for an Internet browser, e.g., Netscape or Internet Explorer. 74 Reference Alexandrov, V.V., Gorsky, N.D., Image representation and processing, Kluwer Academic Publishers, London, 1993. Brown, E.F., "Low resolution TV: Subjective comparison of interlaced and non-interlaced Pictures", Bell Sys. Tech. J., vol. 46, no.1, pp. 199-232, 1967. Buck, J . , Chen, S., Donoho, D., Johnstone, J., Scargle, J. , Wavelab 0.701, a library of matlab routines for wavelet analysis. http://playfair.stanford.edu:80/~wavelab/, 1996. Burt, P.J., Adelson, E.H.," The Laplacian pyramid as a compact image code", IEEE Trans. Commun., vol. COM-31, pp. 532-540, Apr. 1983. Caron, S., Rivest, J.F., "Progressive image transmission by segmentation coding", Canadian Conference of Electrical and Computer Engineering, pp. 509-512, 1994. Chitale, P., "A decomposition technique for image compression", Southeastern Symposium of System Theory, pp. 328-332, 1991. Daubechies, I., Grossmann, A., Meyer, Y . , " Painless nonorthogonal expansions", Journal of Math. Phys., vol. 27, pp. 1271-1283, 1986. Daubechies, l.,"Orthonormal bases of compactly supported wavelet", Communications on Pure and Applied Mathematics, vol. 41, pp. 909-996. 1988. DeVore, R. A., Jawerth, B., Lucier, B. J., "Image compression through wavelet transform coding", IEEE Trans. Information Theory, vol. 38, pp. 719-746, Mar. 1992. Gargantini, I., "An effective way to represent quadtrees", Communication of ACM, vol. 25, pp. 905-910, 1982. Lu, J. , Algazi, R.V., Estes, R.R.J, "Comparative study of wavelet image coders", Optical Engineering, vol. 35, no. 9, pp. 2605-2619, Sept. 1996. Malak, M., Baker, J. , "An image database for low bandwidth communication links", IEEE Conference on Data Compression, pp. 53-62, 1991. Mallat, S., "A theory for multiresolution signal decomposition: the wavelet respresentation", IEEE Trans, on Pattern Anal, and Mach. Intel., vol. 11, no. 7, July 1989. 75 Mongatti, G., Alparone, L, Benelli, G., Baronti, S., Lotti, F., Casini, A., "Progressive image transmission by content driven Laplacian pyramid encoding", IEE Proceedings. Part 1: Communications, Speech & Vision, vol. 139, no. 5, pp. 495-500, Oct. 1992. Rashwan, M.A., Elsherif, M.S., Elsayad, A.M., "Pyramid data structures for on-line image progressive transmission", IEEE Midwest Symposium on Circuits and Systems, pp. 103-106, 1993. Rioul, O., Duhamel, P.," Fast algorithms for discrete and continuous wavelet transforms", IEEE Trans, on Information Theory, pp. 569-586, May 1992. Rioul, O., Vetterli, M., "Wavelet and signal processing", IEEE Signal Processing Magazine, pp. 14-35, Oct. 1991. Riskin, A.E., Ladner, R., Wang, R.Y., Atlas, L.E., "Index assignment for progressive transmission of full-search vector quantization", IEEE Transactions on image processing, vol. 3, no. 3, pp. 307-312, May. 1994. Samet, H.," The quadtree and related hierarchical data structures", Computing Surveys, vol. 16, no. 2, pp. 187-260, Jun. 1984. Said, A., Pearlman, W. A., "A new, fast, and efficient image codec based on set partitioning in hierarchical trees", IEEE Trans. Circuits and Systems for Video Technology, vol. 6, no. 3, pp. 243-250, Jun. 1996. Schrack, G.F., "Finding neighbors of equal size in linear quadtrees and octrees in constant time", CVGIP: Image Understanding, vol. 55, no. 3, pp. 221-230, May 1992. Senoo, T., Girod, B., "Vector quantization for entropy coding of image subbands", IEEE Trans. Image Processing, pp. 526-533, Jan. 1992. Shapiro, J.M., "Embedded image coding using zerotree of wavelet coefficients", IEEE Trans, on Signal Processing, vol. 41, 12, pp. 3445-3463, Dec. 1993. Shusterman, E., Feder, M., "Image compression via improved quadtree decomposition algorithms", IEEE Trans, on Image Processing, vol. 3, no. 2, pp. 207-215, Mar. 1994. Strobach, P., "Quadtree-structured interframe coding of HDTV sequences", in Proc. SPIE Int. Conf. Visual Commun., Image Processing (Cambridge, MA), Nov. 1988. 76 Tanimoto, S.L., "Image transmission with gross information first", Comput., Graph., Image Processing, vol. 9, pp. 72-76, Jan. 1979. Tsai, M.J., Villasenor, J. , Chen, F., "Stack-run image coding", IEEE Transactions on Circuits and Systems for Video Technology, vol. 6, pp. 519-521, Oct. 1996. Vaidyanathan, A., "Wavelet and signal processing", Proceedings of the IEEE, vol. 78, pp. 56-93, 1990. Westerink, P.H., Biemond, J. , Boekee, D.E., "Progressive transmission of images using subband coding", IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2, pp. 1811-1814, 1989. Wilson, P.R., "Standards: past tense and future perfect", IEEE Computer Graphics and Applications, vol.11, pp. 44-48, 1991. Witten, I.H., Moffat, A., Bell, T .C , Managing gigabytes: Compressing and Indexing Documents and Images, Van Nostrand Reinhold, NewYork, 1994. Woods, J.W., "Subband coding of images", IEEE Trans, on Acoustics, Speech, and Signal Processing, vol. 34, no. 5, pp. 1278-1288, Oct. 1986. Xiong, Z, Ramchandran, K., Orchard, M.T., "Joint optimization of scalar and tree structure quantization of wavelet image decomposition", Proceeding of 27 t h Annual Asilomar Conference on Signal, Systems, and Computers, pp. 891-895, Pacific Grove, CA, Nov. 1993 Zhang, Y.Q., Pickholtz, R.L., "Layered image transmission over cellular radio channels", IEEE Transactions on Vehicular Technology, vol. 3, no. 3, pp. 786-794, Aug. 1994. 77
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Progressive image transmission by linear quadtree coding...
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Progressive image transmission by linear quadtree coding and wavelet transformation Cheung, Hon Wai 1997
pdf
Page Metadata
Item Metadata
Title | Progressive image transmission by linear quadtree coding and wavelet transformation |
Creator |
Cheung, Hon Wai |
Date Issued | 1997 |
Description | Progressive image transmission allows an approximate image to be built quickly and the details to be transmitted progressively through several stages over the channel. This technique appears very useful for picture communication over slow channels. This thesis proposes to use linear quadtree encoding combined with wavelet based technique as well as other methods to achieve a hybrid coding progressive transmission system. |
Extent | 10247640 bytes |
Genre |
Thesis/Dissertation |
Type |
Text |
FileFormat | application/pdf |
Language | eng |
Date Available | 2009-03-11 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0064802 |
URI | http://hdl.handle.net/2429/5920 |
Degree |
Master of Applied Science - MASc |
Program |
Electrical and Computer Engineering |
Affiliation |
Applied Science, Faculty of Electrical and Computer Engineering, Department of |
Degree Grantor | University of British Columbia |
GraduationDate | 1997-05 |
Campus |
UBCV |
Scholarly Level | Graduate |
AggregatedSourceRepository | DSpace |
Download
- Media
- 831-ubc_1997-0220.pdf [ 9.77MB ]
- Metadata
- JSON: 831-1.0064802.json
- JSON-LD: 831-1.0064802-ld.json
- RDF/XML (Pretty): 831-1.0064802-rdf.xml
- RDF/JSON: 831-1.0064802-rdf.json
- Turtle: 831-1.0064802-turtle.txt
- N-Triples: 831-1.0064802-rdf-ntriples.txt
- Original Record: 831-1.0064802-source.json
- Full Text
- 831-1.0064802-fulltext.txt
- Citation
- 831-1.0064802.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
https://iiif.library.ubc.ca/presentation/dsp.831.1-0064802/manifest