B L O C K I N G A R T I F A C T S R E M O V A L O F D I S C R E T E C O S I N E T R A N S F O R M B A S E D D E C O M P R E S S E D I M A G E S by YING LUO B.Sc, Electrical Engineering, Huazhong University of Science and Technology, Wuhan, China, 1991 A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF APPLIED SCIENCE in THE FACULTY OF GRADUATE STUDIES DEPARTMENT OF ELECTRICAL ENGINEERING We accept this thesis as conforming to the reauired standard THE UNIVERSITY OF BRITISH COLUMBIA June 1999 © Ying Luo, 1999 In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission. Department of tkcirlC^ g ^ e g The University of British Columbia Vancouver, Canada Date J * W • 1)1°) DE-6 (2/88) Abs t rac t Block-based DCT image/video compression methods have been successfully used in image and video compression applications due to their nearly optimal energy compaction property and relative ease of implementation. However, compression distortion becomes signifi-cant when these algorithms are used to code images or video under a certain bit rate. The objective of this thesis is to offer post-processing solutions to reduce the block-based DCT image compression artifacts. The emphasis will be placed on the blocking artifacts since the blocking effect is one of the most noticeable degradations of block transform coding. We first analyze the causes and properties of the blocking effect. Then we propose a robust method which detects the locations of the image block boundaries where this effect occurs. The detection results using this method are proved to be accurate at different compression bit rates. Several practical blocking artifacts removal techniques are then reviewed and analyzed. The performance of each technique is studied and evaluated. Finally, a new blocking artifact reduction algorithm which is also effective at very low bit rate is proposed. Our algorithm takes advantage of the fact that the original pixel levels in the same block are of good continuity and we use this property and the correlation between the neighboring blocks to reduce the discontinuity of the pixels across the boundaries where blocki-ness appear. Our algorithm can highly preserve the high frequency components while smoothing out the boundary discontinuity. Simulation results show that the proposed algorithm significantly reduces the blocking artifacts in both the objective and the subjective measures. It can effectively remove blocking artifacts even at very low bit rates. The amount of computation it requires is also ii acceptable compared to the iterative blocking artifact removal techniques. iii TABLE OF CONTENTS Abstract ii Table of Contents iv List of Tables vii List of Figures viii Acknowledgements xi 1 Introduction 1 1.1 Background 1 1.2 Thesis Objectives 2 1.3 Thesis outline 2 2 Compression Artifacts 4 2.1 Introduction 4 2.2 Human vision system 4 2.2.1 Weber ratio 4 2.2.2 Spatial frequency characteristics of the HVS 5 2.2.3 Color perception 6 2.3 Coding schemes 6 2.3.1 DCT transform coding 6 2.3.2 Motion representation and video compression 9 2.4 Visual artifacts in the compressed images and video streams 12 2.4.1 Blocking artifacts 12 2.4.2 Resolution loss 14 2.4.3 Ringing artifact 14 2.4.4 Mosquito artifact 14 iv 2.4.5 Color bleeding and chrominance mismatch 17 3 Blocking Artifacts Detection 18 3.1 Introduction 18 3.2 Gradient threshold method 18 3.3 Difference of Slope (DOS) 19 3.4 Experimental results 22 3.5 Summary 32 4 Blockiness Removal Techniques 33 4.1 Introduction 33 4.2 Blockiness reduction schemes used in the encoding process 33 4.2.1 Block overlapping method 33 4.2.2 Lapped orthogonal transform 34 4.3 Blockiness reduction schemes used after the decoding process 35 4.3.1 Symmetric, two-dimensional filtering 35 4.3.2 Anisotropic filtering 42 4.3.3 Adaptive filtering using edge classification 49 4.3.4 Adaptive filtering using local statistical estimation 51 4.3.5 Iterative blocking artifact removal techniques 53 4.3.6 Dithering 53 5 A New Blockiness Reduction Scheme 56 5.1 Introduction 56 5.2 The frequency representation of the spatial block discontinuity in the DC domain 57 5.3 Recover the continuity of block C in the DCT domain 60 5.4 Examples 62 5.4.1 Examples 1 62 5.4.2 Example 2 64 V 5.5 Computation cost for blockiness reduction procedure in the DCT domain 67 5.5.1 The computation cost for calculating DCT coefficients 67 5.5.2 The computation cost for coefficients modification 70 5.5.3 The computation cost for inverse DCT 71 5.5.3 Summary 71 5.6 Spatial filtering 71 5.7 Results 73 6 Conclusions and Future Work 81 6.1 Compression artifacts classification 81 6.2 The design of blocking artifacts detector 81 6.3 Analysis of blocking artifacts removal techniques 82 6.4 A new blockiness reduction scheme 83 6.5 Future work 84 Bibliography 85 Appendix A 87 vi L i s t of Tables 4.1 Symmetric filter coefficients ." 36 4.2 PSNR comparison of Indian image before and after post-processing 42 4.3 PSNR comparison of Lena image before and after post-processing 42 4.4 PSNR comparison of Peppers image before and after post-processing 42 4.5 Anisotropic filter coefficients used to filter vertical boundary 43 4.6 Anisotropic filter coefficients used to filter horizontal boundary 43 4.7 PSNR comparison of Lena image before and after post-processing 49 4.8 PSNR comparison of Indian image before and after post-processing 49 4.9 PSNR comparison of Peppers image before and after post-processing 49 5.1 PSNR comparison of Lena image before and after post-processing 74 5.2 PSNR comparison of Indian image before and after post-processing 74 5.3 PSNR comparison of Peppers image before and after post-processing 74 vii L i s t of F igures 2.1 The frequency sensitivity of the eye to a sinusoidal grating [5] 5 2.2 Block diagram of DCT-Based Coding Algorithm [4] 8 2.3 Illustration of I-frames, P-frames, and B-frames 10 2.4 Encoder Prediction Loop [4] 11 2.5 Decoder block diagram [4] 11 2.6 Blocking artifacts 15 2.7 Magnified blocking artifacts 16 3.1 Sobel Operators 18 3.2 Difference of Slope 19 3.3 Original image and Blocking Artifacts 23 3.4 Detection result by Sobel Operators 24 3.5 Detection result by DOS operators 25 3.6 Original Image 26 3.7 Decompressed image with blocking artifacts 27 3.8 Detection result for original image by Sobel operators 28 3.9 Detection result for decompressed image by Sobel operators 29 3.10 Detection result for original image by DOS operator 30 3.11 Detection result for decompressed image by DOS operator 31 4.1 a Decompressed image, bitrate=0.298 bpp, PSNR=32.51 dB 38 4. lb Post-processed image by symmetric filter, bitrate=0.298 bpp, PSNR=32.52 dB...39 V l l l 4.2a Decompressed image, bitrate=0.248 bpp, PSNR=31.2 dB 40 4.2b Post-processed image by symmetric filter, bitrate=0.248 bpp, PSNR=31.52 dB..41 4.3a Decompressed image, bitrate=0.298 bpp, PSNR=32.51 dB 45 4.3b Post-processed image by the anisotropic filters, bitrate=0.298 bpp, PSNR=33.04 dB 46 4.4a Decompressed image, bitrate=0.248 bpp, PSNR=31.2 dB 47 4.4b Post-processed image by the anisotropic filters, bitrate=0.248 bpp, PSNR=31.83 dB 48 5.1 Blockiness Diagram 58 5.2 Original block data of example 1 63 5.3 DCT coefficients for blocks A, B and C 63 5.4 Modified coefficients of block C : 64 5.5 Block data after modification 64 5.6 Original block data.of example 2 65 5.7 DCT coefficients of Blocks A and B 66 5.8 DCT coefficients of Block C before and after modification 66 5.9 Block data after modification 67 5.10 Blockiness Diagram 68 5.11 Fast 1 -D 8 points DCT diagram 70 5.12 Epsilon filter 72 5.13a Decompressed image, bitrate=0.298 bpp, PSNR=32.51 dB 75 5.13b Post-processed image by our DCT domain filter (without spatial filtering), bitrate=0.298 bpp, PSNR=32.56 dB 76 5.13c Post-processed image by our algorithm (with spatial filtering), ix bitrate=0.298 bpp, PSNR=32.56 dB 77 5.14a Decompressed image, bitrate=0.248 bpp, PSNR=31.2 dB 78 5.14b Post-processed image by our DCT domain filter (without spatial filtering), bitrate=0.248 bpp, PSNR=31.36 dB 79 5.14c Post-processed image by our algorithm (with spatial filtering), bitrate=0.248 bpp, PSNR=31.97 dB 80 A. 1 Decompressed Lena image, bitrate=0.199 bpp, PSNR=30.40 dB 87 A.2 Post-processed Lena image, bitrate=0.199 bpp, PSNR=31.05 dB 88 A.3 Decompressed Lena image, bitrate=0.271 bpp, PSNR=31.95 dB 89 A.4 Post-processed Lena image, bitrate=0.271 bpp, PSNR=32.33 dB 90 A.5 Decompressed Peppers image, bitrate=0.15 bpp, PSNR=27.97 dB 91 A.6 Post-processed Peppers image, bitrate=0.15 bpp, PSNR=29.11 dB 92 A.7 Decompressed Peppers image, bitrate=0.22 bpp, PSNR=30.39 dB 93 A.8 Decompressed Peppers image, bitrate=0.22 bpp, PSNR=31.27 dB 94 A.9 Decompressed Indian image, bitrate=0.273 bpp, PSNR=28.17 dB 95 A. 10 Post-processed Indian image, bitrate=0.273 bpp, PSNR=28.50 dB 96 A. 11 Decompressed Indian image, bitrate=0.382 bpp, PSNR=29.65 dB 97 A. 12 Post-processed Indian image, bitrate=0.382 bpp, PSNR=29.74 dB 98 X Acknowledgment I would sincerely like to thank my supervisor, Dr. Rabab Ward, for her support and guidance throughout the research and the writing of this thesis. Her valuable advice, encourage-ment and patience are greatly appreciated. Thanks also to all fellow graduate students who were always ready to share their knowledge and expertise. xi Chap te r 1 In t roduct ion 1.1 Background Over the past few decades, many digital applications have been taking place such as video conferencing, HDTV, DVD, etc. In the mean time, the demand for storing more images and transmitting more data with limited space and bandwidth keeps increasing. The popular solution to this problem is to compress the images by removing the redundancy in images so that they take less memory to store and less time to transmit. Image/video compression techniques offer better solutions to reduce file sizes while preserving the original content. There are two forms of compression: lossless and lossy. Lossless compression removes the redundancy in images without removing any image detail. For this kind of compression the reconstructed images are identical to the original ones. Unlike lossless compression, lossy compression offers a trade-off between the amount of information lost and the degree of compres-sion. Therefore lossy compression achieves far greater savings, albeit at the cost of some image degradation. Block transform coding has been recognized for several decades as an attractive lossy compression technique. It takes advantage of the fact that unitary block transformations have the property of packing the signal energy. These transformations are able to concentrate the signal information in its lower spatial frequencies and most of the higher order spatial frequencies have zero or near-zero amplitude. After applying some state-of-art quantization and entropy coding l 2 techniques, high compression rates are achieved. Entropy coding is lossless and has no influence on the spatial noise distribution in the decompressed image/video. It is quantization which provides the most amount of compression. This is the stage at which image/video information is lost. The compression ratio or alternatively the coding bit rate is controlled by the quantization step. By suitably choosing the quantization step value, a high compression ratio can be achieved without affecting the visual image quality. However, compression distortion becomes significant when the images are coded below a certain bit rate. Block-based DCT coding has been successfully used in image and video compression applications due to its nearly optimal energy compaction property and relative ease of implemen-tation. It provides the basis of two prevalent compression algorithms: JPEG and MPEG. JPEG which stands for Joint Photographic Experts Group is the standard algorithm for still image compression. MPEG which stands for Moving Picture Experts Group addresses the compression of video signals. 1.2 Thesis Objectives The objective of this thesis is to classify, analyze and offer solutions to the artifacts of DCT based decompressed images. Emphasis will be placed on the blocking artifacts removal since blocking effects cause the most noticeable degradation of block DCT transform coding. 1.3 Thesis outline In the next chapter, we are going to classify and characterize different types of transform coding artifacts. An effort will be made to reveal the causes of the artifacts. In Chapter 3, we first examine the properties of the blocking artifacts, and use this information to develop an objective criteria to detect blocking artifacts. The performance of this criteria will also be evaluated. Several artifacts removal techniques will be introduced in Chapter 4. The performance of each technique will be studied and evaluated. Finally, in Chapter 5, a new blocking artifacts reduction algorithm which is also effective at very low bit rate is proposed. This new approach is based on the analysis of the frequency distribution of the blockiness energy. The performance of the proposed method will be evaluated using both objective and subjective measures. In the last chapter, conclusions about the coding artifacts reduction algorithm will be drawn, and possible future work brought out by this thesis research will be listed and described. Chap te r 2 Compress ion Ar t i fac ts 2.1 Introduction In this chapter, we will first introduce some properties of the human visual system since these properties play a very important role in the perception of the artifacts. The sensitivity of the human visual perception to an artifact depends on the type of artifact. These artifacts are visually different from each other. The spatial activities and brightness surrounding an artifact area should be considered since they have masking effects on the artifact. To better understand the cause of compression artifacts, we will then briefly review DCT transform coding and other compression techniques. Finally, we will introduce and classify several compression artifacts. An attempt will be made to reveal the causes of the artifacts. 2.2 Human visual system The human visual system has several important properties. These include brightness per-ception, spatial frequency response and color perception. These properties are considered for the purpose of improving the subjective visual quality in the encoding procedure. They should also be considered in the artifacts removal procedure since they affect the perception of the artifacts. 2.2.1 Weber ratio The Weber ratio provides the key factor to the brightness-intensity relationship. It indi-cates the functional relationship between brightness and intensity. Assuming / is the intensity of 4 5 the background and AI is the difference value at which the contrast between two different intensi-ties is just noticeable then the ratio of AI to I is roughly a constant [6] which is Weber ratio indicates that A.I is proportional to / . This means the sensitivity of the human visual system to image detail is a function of the local average brightness. Loss of detail in dark areas of the image is less visible than in brighter areas. 2.2.2 Spatial frequency characteristics of the HVS The human visual system resembles a lowpass/bandpass filter. The frequency sensitivity of the eye to a sinusoidal grating is shown in Figure 2.1. This curve suggests that the human visual system is most sensitive to mid-frequencies and least sensitive to high frequencies [5]. Therefore, image details presented in areas with high spatial activities such as textured areas are much less visible than those in relatively flat areas. ( 2 . 1 ) 1000 . Contrast A Sensitivity 100 10 0.1 10 Spatial Frequency (cycles/degree) Figure 2.1 The frequency sensitivity of the eye to a sinusoidal grating [5]. 2.2.3 Color perception The human visual system responds differently to the luminance and chrominance compo-nents of an image. It is less sensitive to high frequencies in the chrominance components than to high frequencies in the luminance component. 2.3 Coding schemes 2.3.1 DCT transform coding The general image transform coding strategy can be summarized as follows. A digital image is first divided into subimages of 8x8 pixels known as blocks. Then each block undergoes a two-dimensional discrete transform. The discrete cosine transform (DCT) has become the most widely used discrete transform in today's transform image coding systems because of its near optimal compaction property, and because it can be implemented very efficiently by fast algo-rithms. The 8x8 DCT is defined as following: X[kvk2] = - J J CkCkx[n},n2) cos and its inverse DCT is given by (2n, + \ )kln-16 cos p(2«2 + 1 )^27t~| 16 (2.2) x[nvn2] = - J J CkCkX[kvk2]cos = 0k2 = 0 •(2«j + 1 )fe|7C-i r(2n2+ \)k2K-16 i r - cos 16 (2.3) where J _ 72 for kp k2 = 0 (2.4) 1 otherwise Note that the two dimensional DCT is a real and separable transform which is obtained by a 1-D 7 transformation carried on the rows of the image block followed by a 1-D transformation of the resulting columns [6]. The block transform coefficients reflect the frequency characteristics of the image block. The upper left corner of the transformed block corresponds to the low-frequency coefficients of the DCT, while the bottom right corner corresponds to the highest frequencies. Due to the fact that natural images tend to have more energy at low frequencies, the energy of the transformed block is concentrated in the upper left corner of the block. Most of the higher order spatial frequencies have zero or near-zero amplitudes. To achieve high data compression, only the dominant coeffi-cients are retained. Therefore, a quantization weighting matrix is used to round the coefficients to the nearest integer and most of the high order coefficients are rounded to zero. This weighting matrix is also called "Q-table". The Q-table controls the quantization step-size, which is related to the bit rate. By scaling the Q-table, different number of coefficients are retained and thus different degrees of compression can be realized. High compression ratios can be achieved without visibly affecting the image quality by choosing suitable quantization steps. Since humans are more sensi-tive to errors in the low frequency components of the image than those in the high frequencies, the Q-table has larger values in its entries that corresponding to the high frequencies than those that correspond to low frequencies. As a result, high frequency components are quantized more coarsely than low frequency components. A quantization matrix used in the TM5 codec is shown in (2.5). 8 8 16 19 22 26 27 29 34| 16 16 22 24 27 29 34 371 19 22 26 27 29 34 34 38 22 22 26 27 29 34 37 40 22 26 27 29 32 35 40 48 26 27 29 32 35 40 48 58 26 27 29 34 38 46 56 69 27 29 35 38 46 56 69 83 After the quantization, the DC coefficients are coded differentially in order to take into account the strong correlation between the DC coefficients of neighboring blocks. The AC coeffi-cients are all entropy coded on a block-by-block basis. The compressed image is either transmit-ted or stored depending on the application. Decoding constitutes an inverse operation of this procedure. General block diagrams of the transform encoding and decoding are shown in Figure 2.2. 8x8 blocks Source Image Data F D C T Quantizer Table Specification Entropy Encoder T Table Specification Compressed Image Data D C T - Based Encoder Compressed Image Data Entropy Decoder 1 Dequantizer 1 Table Specification Table Specification IDCT Reconstructed Image Data D C T - Based Decoder Figure 2.2 Block diagram of DCT-Based Coding Algorithm [4]. 9 The DCT-Based coding forms the basis of the JPEG algorithm. This coding type is also used as the basis of spatial redundancy reduction in the MPEG algorithm. 2.3.2 Motion representation and video compression A video sequence can be considered as a sequence of still images. If a frame is encoded independently of other frames, then this frame coding is called intraframe coding. Intraframes provide the capability of accessing any random frame in stored video. However, for motion pic-tures, high compression ratios can not be achieved by encoding every frame as an intraframe since this does not take into account the extensive temporal redundancy present in a video sequence. Thus MPEG's basic scheme is to predict motion from frame to frame in the temporal direction, and then to encode the differences in the frames. This encoding is called interframe encoding which involves coding a frame in reference to a previous or a future frame. In order to reduce the temporal redundancy, two types of interframe coding techniques are suggested: forward prediction and bidirectional prediction. Thus, three types of pictures are con-sidered in MPEG: Intraframes (I-frames), predicted frames (P-frames) and bidirectional frames (B-frames). Intraframes are typically distributed periodically throughout the consecutive frames of a video sequence. They are used as random access points and they provide the lowest compres-sion ratios. A P-frame is coded using forward prediction which is based on a previous reference frame. P-frames have significantly higher compression ratios than those of I-frames. Bidirectional frames are coded using forward, backward prediction, or interpolation. B-frames provide the highest amount of compression but requires both a past and a future reference frame for predic-tion. Figure 2.3 illustrates this coding scheme. Motion prediction is carried on the luminance channel on 16x16 pixels macroblocks basis. 10 Frame order 0 1 2 3 4 5 6 7 8 9 Figure 2.3 Illustration of I-frames, P-frames, and B-frames. Given a macroblock in a current B frame which is being encoded, a close match to that macrob-lock is searched in a previous, a future frame or both. For a P-frame, only the previous frame is searched. The corresponding macroblock-sized region in the reference frame can be offset in its position to that of the considered macroblock in the current frame. These macroblock offsets are known as motion vectors. Therefore, given a set of motion vectors, one vector for each macrob-lock, we can use macroblocks in a previous, a future frame or both to represent the image in the current frame. Since the motion compensated prediction is not able to represent the image accu-rately, the prediction error, i.e. the difference between the predicted picture values and those of the original pictures needs to be coded. The spatial signal of the prediction error, as well as the intraframe coded picture are then DCT encoded. DCT are done on 8x8 block basis, and the DCT coefficients are quantized by a quantizer. In the quantization process, the precision of each of the transform coefficients is adjusted to its relative importance to the human visual perception. This relative coefficient precision information is represented by a quantizer matrix. Each value in the quantizer matrix represents the coarseness of quantization of the related DCT coefficient. The 11 degree of coarseness of the entire macroblock is adjusted by the scale factor.of the quantizer. The macroblocks which are deemed to be visually less important, such as regions with large spatial activity, can be quantized more coarsely by using large scale factors. After the quantization, the quantized DCT coefficients, the motion vectors, and the quantization parameters are further com-pressed using a combination of run-length and entropy coding. The encoder and decoder diagrams are shown in Figure 2.4 and Figure 2.5, respectively. Motion estimator Motion Compensator Anchor Frame storage Coded video bitstream Entrophy coding Figure 2.4 Encoder Prediction Loop [4]. Coded video bitstream-Channel buffer Variable-lengtrJ. decoder Inverse quantizer IDCT Decoded video Motion compensator I Anchor frame storage Figure 2.5 Decoder block diagram [4]. 12 2.4 Visual artifacts in the compressed images and video streams To provide users with quality image/video services, causes of these artifacts and their effects on viewing quality need to be understood [8]. 2.4.1 Blocking artifacts Due to image block segmentation and the subsequent independent transform coding of the blocks, existing correlations among adjacent block pixels in a frame are not pursued. The signal continuities between blocks are not always guaranteed and the discontinuity between two blocks may become visible at the boundary between those two blocks. This artifact is called the "block-ing effect," which is more prominent at low-bit rate coding. This kind of edge artifact is composed only of horizontal and vertical edges at the boundaries between two blocks. The image recon-structed by transform image coding at 0.19 bit per pixel(bpp) is shown in Figure 2.6, where the blocking effect can be easily noticed and is visually unpleasant. At low bit rates, the high frequency components of an image cannot be adequately repre-sented in the transform domain since many of the higher order transform coefficients are quan-tized to zero. For a slowly varying region with no image edges, blocking artifacts appear in the reconstructed image as a regular pattern of visible block boundaries. In a complex image region, besides the sharp boundary intensity change, the original image edges appear irregular. The regu-lar pattern and their fixed grid size make them visually prominent. Blocking artifacts can also be caused by motion compensation. Starting with an empty, present frame, the predicted macroblocks are offset by the motion vectors from the reference frame to the present frame, the resultant image is composed of copied macroblocks and obviously 13 have blocking effect. This effect can be removed by adding the prediction error image. However, if this prediction error image is inadequately coded, blocking effects remain in the reconstructed images. Typically, there are three types of blocking artifacts according to human visual perception. They are described briefly below. 2.4.1.1 Staircase noise Staircase noise arises in an image region where there is an isolated, clearly distinguishable diagonal edge or feature. The diagonal edges that used to be smooth in the original image are rep-resented by a number of horizontal or vertical steps (Figure 2.7a) in the reconstructed image. This is because edges can not be well reconstructed by the coarsely quantized coefficients and because each image block is processed independently. As a result, the continuity of an edge across a block boundary is not assured. 2.4.1.2 Grid noise and false contouring Grid noise (Figure 2.7b) appears as a regular pattern of visible grid in the relative flat areas of the image. False contouring appears as abrupt intensity change in the reconstructed image in areas with gradually changing luminance. In these two kinds of regions, the variations in lumi-nance are small, and so, most of the signal energy is concentrated in the DC coefficient. Due to the coarse quantization, the AC coefficients represent the gradual change in luminance are quantized to zero. Without these AC coefficients, an abrupt change in luminance across the block boundary is produced in the reconstructed image, which is perceived as a patchiness or contouring effect. 14 2.4.1.3 Texture noise Texture noise (Figure 2.7c) is a type of quantization distortion which appears in textured image regions, such as a collection of randomly-oriented edges. It can be a combination of stair-case and grid noise. Due to the masking property of the human visual system in areas of high detail, the staircase effect in texture regions is not as prominent as that in strong edge regions. Instead, it appears as loss of detail and smearing. 2.4.2 Resolution loss Resolution loss is caused by lowpass filtering or equivalently by suppression of higher order DCT coefficients through coarse quantization. It appears as a loss of spatial detail in the reconstructed images. 2.4.3 Ringing artifact The ringing effect is visually prominent along high contrast edges with generally smooth background in the reconstructed image. Strong rippling may result around a high contrast edge. This appears as shimmering or rippling outwards from the edge up to the block's boundary [8]. Ringing artifacts can not be predicted easily. They can result in edge artifacts at object edges with any orientation. They may consist of a series of edges located very close to each other and their visibility can be affected by the masking effects caused by surrounding high frequency contents. 2.4.4 Mosquito artifact The mosquito artifacts are temporal artifacts related to high frequency distortion and cod-ing inaccuracies. They are seen as fluctuations of luminance chrominance levels in a video sequence. They usually occur around sharp edges with smooth background or in relative Original image Reconstructed image, bitrate=0.19 bpp Figure 2.6 Blocking artifacts. (a) Staircase noise (b) Grid noise and false contouring (c) Texture noise Figure 2.7 Magnified blocking artifacts. 17 stationary areas containing high spatial activities. 2.4.5 Color bleeding and chrominance mismatch Color bleeding is a kind of color artifact due to the chrominance information loss. It results in a smearing of the color between areas with high contrast chrominance values. Chromi-nance mismatch appears as color misplacement of a macroblock with respect to the color of the surrounding area. This is caused by motion compensation since motion compensation is carried out using luminance information only. Chap te r 3 B l o c k i n g Ar t i fac ts Detect ion 3.1 Introduction As most image artifacts occur in certain places in an image, artifacts detection is an impor-tant step to be carried out prior to the removal or reduction of the artifact intensity. It can also be used to evaluate the image/video compression quality. In this chapter, we focus on blockiness detection. 3.2 Gradient threshold method As mentioned earlier, blocking effects appear as discontinuities across block boundaries. Since blocking artifacts have the property of having precise spatial localization, they are easier to detect than other impairments. However, usually not all block boundaries have blocking appear-ances or effects. Therefore, one common method of detecting the block boundaries where block-ing effects occur is to apply gradient operators such as sobel filters [5] (Figure 3.1) along the block boundaries. Operator (a) is used to detect vertical edges and operator (b) is used to detect horizontal edges. The region with high gradient is considered as having edges. Strong edges are detected by thresholding the gradients. -1 0 1 -1 -2 -1 -2 0 2 0 0 0 -1 0 1 1 2 1 (a) (b) Figure 3.1 Sobel Operators. 18 19 With such a method, both the genuine edges along the block boundaries as well as the blocking artifacts will be detected. Therefore we must distinguish real edges from false edges caused by blocking artifacts. To solve this problem, we introduce another metric which is similar to the one defined in [11]. 3.3 Difference of Slope (DOS) Although both the genuine edges in an image and the blocking artifacts have large gradi-ents, they are different in appearance. For natural images, the transition of the intensity can be considered to be regular as shown in Figure 3.2 and edges are unlikely to line up with block boundaries exactly. However, blocking artifacts manifest themselves as abrupt intensity changes across block boundaries and have regular patterns. Block Boundary Intensity value N - l I 0 I Figure 3.2 Difference of Slope. The intensity slope between two adjacent pixels reflects the intensity change between them. Thus within each block the degree of intensity continuity close to the block's boundary can be measured by this intensity slope. For two horizontally adjacent blocks k-1 and k, the slope for 20 the k-1 block at the ith row and for the last N-l and N-2 column positions of the block is defined by: dk_x{i) = xk_x{i,N-\)-xk_x{i,N-2), (3.6) where xk _ , denotes the intensities of block k-1. For the kih block, and for the 0 and first column positions of the ith row, the intensity slope is: dk(i) = xk(i, \ )-xk(i,0). (3.7) It is also reasonable to assume that the average of these two slopes reflects the degree of continu-ity within these two adjacent blocks. This average is defined by: D(i) = [dk_](i) + dk(i)]/2 (3.8) Similarly, the degree of intensity continuity across two horizontally adjacent blocks can be mea-sured by the intensity slope across their boundary. This quantity is given by: D{i) = xk(i, 0) -xk_\(i, N- 1) (3.9) For natural images, the transition of the intensity is regular. Thus, the value of D(i) and D(i) should be similar. For JPEG or MPEG compressed images, the intensity continuity is pre-served within each block but it is not ensured at the boundary between any two blocks. D(i) increases a lot compared to D(i) due to blocking artifacts. Therefore, the difference between £)(/) and D(i) forms a good estimate of the blocking artifacts. This estimate is defined by: 8(0 = D(i)-D(i) = [xk(i, 0) -x k_ ,(/, N- 1)] - [dk_,(0 + dk(i)]/2 21 =\x {i,Q)-\xk{i,\)-\x (i,N-\) + \xk_x{i,N-Z) (3.10) 2 k 2. L k - 1 2 Thus the original image may have a large gradient across the boundary of two blocks, but 8, the difference of slope, is small. Blocking artifacts manifest themselves as regular patterns, N-\ thus they can be detected by thresholding ^ e(/), the sum of e(i') across the boundary. The i = 0 threshold should be adaptive to the local brightness according to Weber's law. As mentioned in the previous chapter, the visibility of blockiness also depends on the local spatial activity. Blocking artifacts are usually prominent at regions with low local spatial activi-ties, while the regions which have genuine edges usually have high local spatial activities. Thus local spatial activity can be used in a criterion to distinguish genuine edges from those due to blocking artifacts. An estimate of the local spatial activity is the local variance. However, this esti-mate require much computational effort. To avoid having to carry out more computations, we use the variation of e to measure this local spatial activities. Let and Then if the variation dmax = MAX(E(i))J = 0, . . . T V , (3.11) dmm = MIN(e(i)),i = 0,...N, (3.12) d m a x - d m i n < T l ' (3- 13> the region is considered as having low spatial activity, otherwise, it is considered as having high spatial activity and post filtering such as low pass filtering will not be performed in the region with high spatial activity so to avoid unwanted smoothing. 22 In summary, the blockiness detection is based on two measurements: [ > d max-d min < T \ 2 > N- 1 i = 0 >T2. When the blockiness is visible, the variations of the intensity across the boundaries should have similar pattern (either increasing or decreasing). This will make the first measurement very small and the second one very large. And thus this boundary is marked as having a blocking arti-fact. 3.4 Experimental results According to several tests, the detections using our difference of slope, DOS operators are rather accurate. Sobel operators on the other hand can hardly tell the difference between real edges and blocking artifacts. Figure 3.4a shows the edge map detected from the original image (Figure 3.3a) by Sobel operators. Obviously, these are genuine edges which are faultily identified as blocking artifacts. Figure 3.4b shows the edge map detected from decompressed image (Figure 3.3b) by Sobel operators. The detected edges include genuine edges and those due to blocking artifacts. If we apply DOS operators to the original image, then no edges (blocking artifacts) are detected (Figure 3.5a). If we apply DOS operators to the image with blocking artifacts (Figure 3.3b), edges due to blocking artifacts are accurately detected (Figure 3.5b). Another set of detec-tion results using the Sobel operators and the DOS operator is shown in Figure 3.6 to Figure 3.11. (b) Blocking Artifacts Figure 3.3 Original image and Blocking Artifacts. (a) Edges in the original image detected by Sobel operators (b) Blocking Artifacts detected by Sobel operators Figure 3.4 Detection result by Sobel Operators. (a) No blocking artifacts is detected in the original image by DOS operators (b) Blocking artifacts detected by DOS operators Figure 3.5 Detection result by DOS operators. Figure 3.6 Original Image. Figure 3.7 Decompressed image with blocking artifacts. Figure 3.8 Detection result for original image by Sobel operators. Figure 3.9 Detection result for decompressed image by Sobel operators. Figure 3.10 Detection result for original image by DOS operator. Figure 3.11 Detection result for decompressed image by DOS operator. 32 3.5 Summary In this chapter we have shown how edges due to blocking artifacts can be detected. We first showed that using Sobel operators, the genuine edges as well as the ones due to blocking artifacts are detected. Then we introduced the difference of slope (DOS) operator and showed that this operator only detects the edges due to the blocking artifacts. C h a p t e r 4 B lock iness Remova l Techniques 4.1 Introduction The "Blocking artifact" is one of the most noticeable degradation of the block transform coding. During the past few years, several types of techniques have been developed to reduce the blocking effects. They can be applied during the encoding process and they can also be applied during or after the decoding process. In this chapter, we will review and compare several deblock-ing techniques. 4.2 Blockiness reduction schemes used in the encoding process 4.2.1 Block overlapping method In conventional transform image coding, the image is always segmented into non-overlap-ping, rectangular shape blocks for easy processing and for yielding better compression ratios. The discontinuities at block boundaries appear as a regular pattern and are thus very visually annoy-ing. Instead of dividing an image into non-overlapped blocks, an overlap encoding method is sug-gested by Reeve and Lim [7]. This technique employed overlapped blocks where there is a small overlap around the perimeter of each block. The overlapped pixels at the boundaries would then be encoded twice. The boundary discontinuity is reduced by averaging the values of overlapped pixels in the decoding process. Block overlapping has been shown to be effective in reducing the blocking effect. However, this approach increases the amount of computations and it increases the bit rate as well since redundant information (overlapped pixels) has to be encoded. There is a 13% 33 34 increase in the bit rate (based on a 256x256 pixel image with 16x16 pixel subimages) according to [7]. 4.2.2 Lapped orthogonal transform A more efficient technique to reduce blocking effect is called Lapped Orthogonal Trans-form (LOT) [17]. In a traditional transformation, the number of transform coefficients is the same as the number of the input samples. This new type of transformation is capable of mapping L sam-ples into N transform coefficients, L>N. For a traditional 1-D transformation case, assume that the incoming signal vector X has the length MN. This vector is subdivided into M segments and each segment has N samples. Let the vector Y denotes the transform coefficients of X and that Y is given by Y = TX, where T is the transpose of an MN x MN block-diagonal matrix, in the form (4. 1) T = D 0" D (4.2) Lo DJ where D is a N x N matrix, whose columns are the transform basis. The LOT can also be defined as (4. 1) except that T is given by On T = (4. 3) 0 2J 35 where P0 is an L x N (N < L < 2N) matrix and P] & P2 are L'xiV {L 28.75 28.98 31.05 31.83 32.47 32.77 33.04 4.3.3 Adaptive filtering using edge classification The human visual system is not sensitive to blockiness that occur in texture and edge regions. Low-pass filtering at texture and edge regions also results in undesired blurring. Therefore a low-pass filtering that adapts to different regions is able to prevent low-pass blurring of regions with texture and edges. In [10], [14], [15], [19], different edge classification procedures are used to distinguish between monotone, textured and strong edge regions of the image. The spatial filtering schemes are then driven by the output of the edge classifiers. Edge and texture 50 regions are not filtered since humans are more sensitive to errors in the low frequency components of the image than those in the high frequencies. There are different kinds of edge classification methods. The classification can be done in the spatial or DCT domains. In [9], [10] and [14], spatial methods such as gradient / threshold detector and histogram methods are used. However, it is difficult to detect edges of images with complex content which are coarsely quantized. The block boundaries may be artificially identified as edges. In [15] and [19], block classification is performed by examining the DCT coefficients. The blocks which have none zero coefficients outside the (2x2 or 3 x 3 ) top left coiner region of the D C T 8 x 8 transform are declared as edge or texture regions. This method is briefly described as following. A block is marked as low frequency block if CDCT^i)9KloW = 0 . (4-9) Similarly, a block is marked as high frequency block if CDCT&D'Khigh*®' (4-10) where CDCT(i, j) is the 8 x 8 block of DCT coefficients of block (i, j), • is the element-by ele-ment multiplication, Klow and Khi h are the test matrices used for detection of low frequency blocks and high frequency blocks and are defined by 57 0 0 0 K low (4. 11) and high 0 0 0 0 0 0 0 0 0 (4. 12) respectively. 0 is the 8x8 matrix of zeros. 4.3.4 Adaptive filtering using local statistical estimation A method with impressive post-processing results is given in [18]. In this algorithm, the filtering strategy relies on the estimation of the quantization error Tm the spatial domain for each n x n block. The estimation of Jis given by: n - 1 n - 1 u = Ov = 0 where Euv is determined by the quantization step and the probability distribution of the original unquantized DCT coefficient at frequency (u, v). The definition of Euv is: Euv = J ( y - C u v ) p u v ( y ) d y ^ ' v p u v ( y ) d y u,v = 1, n, (4. 14) 52 where puv is the probability distribution of the DCT coefficient at frequency (u, v) before quanti-zation, and luv, ruv are the left and right quantization boundaries. Cuv is the minimum mean squared error estimate of each DCT coefficient which is calculated by: Cllv = E[y\ \uv t, smooth the corresponding pixels via JC = x - at and y = y + at, and for each d <-t, smooth the pixels via x. = x + at and y = y - at. 6. To alleviate the discontinuity offset problem caused by the smoothing of boundary pixels, the smoothing process is iterated until the center of the block is reached. Note that since most of the high order DCT coefficients are quantized to zero, thus after this coarse quantization, it is difficult to get a good estimate of the probability distribution of the 53 DCT coefficient at each frequency from the mean and variance of quantized coefficients. If we can not get an accurate estimate of T for each block, we may risk over-smoothing or under-smoothing in the filtering process. Another drawback of this method is that the algorithm relies on the values of the DCT coefficients and quantization factors and thus has to access data from the decoder. Thus it has to be built inside the decoder and cannot be applied after the decoder. 4.3.5 Iterative blocking artifact removal techniques The projection onto convex sets (POCS) based recovery algorithm [11]-[12] is another powerful spatially-adaptive approach. Its basic idea is to optimize the value of the quantized transform coefficients subject to some smoothness constraint and some quantization constraint. A low-pass filter is applied along block boundaries to remove the discontinuity of block boundaries. The DCT coefficients of the restored image are re-mapped back to their original quantization levels to prevent oversmoothing of images. Such procedure needs to be iterated several times. The major drawback of this approach is its high computational complexity. It takes at least 3-5 iterations for this algorithm to converge, where both forward and inverse DCT are required in each iteration, this is besides other required computational overhead [12]. It also needs to access data from the decoder, and thus it has to be built inside the decoder too. 4.3.6 Dithering This method [10] adds a carefully-controlled, random dither signal to the transform coeffi-cients during the decoding process. The dither signal conforms to a model of the average energy distribution of the transform coefficients prior to quantization. At the encoder, each N x N coefficients block undergoes a quantization process to round 54 each coefficient value to the nearest integer that can be described mathematically as: FK i = rnd{Fk /qk>,) 0 < k, I < N - 1, (4. 16) where Fkt i is the quantized value of Fk t , the (k, l)th transform coefficient in the transformed block, N is the sub-block size, k, I are the matrix row and column indices respectively, and qk [ is the corresponding quantization coefficient. At the decoder, a reverse procedure is applied and the reconstructed transform coefficients F'k l are generated by: Fk,i = Qk,rpkj- (4.i7) In [20], McLaren proposes the following model for the average energy distribution P( j of the DCT coefficients: Pk l = 34.lOco^94 - 1 0 < k, I < N - 1, (4.18) r~2 2 where G)k t = 4k + I is the spatial frequency corresponding to transform coefficient F'k t . The dither signal is then generated by weighting each Pi • by a Gaussian signal G(0, a) with zero mean and a variance. In order to avoid introducing an excessively large random com-ponent, the dither signal has to be restricted within a maximum permissible amplitude.This is achieved by thresholding G(0, a). Thus, the final form of the proposed dithering scheme is given by: 55 f9*./-^./ + ^,/-G(0,o) if \G(0,G)\ al and P0 < P, is that lower order coefficients are more accurate than higher order coefficients because of the nature of the quantiza-tion matrix and thus they need less filtering. The modified coefficients of block C are shown in Figure 5.4. The modified pixel values of block C calculated by (5. 11) and (5. 12)are listed in Fig-ure 5.5a. As we can see from Figure 5.5b that the large intensity step across the block boundary is now broken into several small steps. 63 Block C 40 40 40 40 40 40 40 40 80 80 80 80 80 80 80 80 40 40 40 40 40 40 40 40 80 80 80 80 80 80 80 80 40 40 40 40 40 40 40 40 80 80 80 80 80 80 80 80 40 40 40 40 40 40 40 40 80 80 80 80 80 80 80 80 40 40 40 40 40 40 40 40 80 80 80 80 80 80 80 80 40 40 40 40 40 40 40 40 80 80 80 80 80 80 80 80 40 40 40 40 40 40 40 40 80 80 80 80 80 80 80 80 40 40 40 40 40 40 40 40 80 80 80 80 80 80 80 80 Block A Block B (a) Original block data gray level A pixels (b) The intensity plot of a row of block data. Figure 5.2 Original block data of example 1. F (u,v) Fb(u,v) F iu, v) 320 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 640 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 480.000 -144.980 0 50.910 0 -34.017 0 28.839 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Figure 5.3 D C T coefficients for blocks A, B and C. 64 480.000 - 86990 0 25.460 0 -17.010 0 14.42 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Figure 5.4 Modified coefficients of block C. Block C 40 40 40 40 47 48 49 50 70 71 72 73 80 80 80 80 40 40 40 40 47 48 49 50 70 71 72 73 80 80 80 80 40 40 40 40 47 48 49 50 70 71 72 73 80 80 80 80 40 40 40 40 47 48 49 50 70 71 72 73 80 80 80 80 40 40 40 40 47 48 49 50 70 71 72 73 80 80 80 80 40 40 40 40 47 48 49 50 70 71 72 73 80 80 80 80 40 40 40 40 47 48 49 50 70 71 72 73 80 80 80 80 40 40 40 40 47 48 49 50 70 71 72 73 80 80 80 80 Block A Block B (a) The modified block data gray level pixels (b) The intensity plot of a row of block data. Figure 5.5 Block data after modification. 5.4.2 Example 2 Now let us take a look at another example to see how well does this deblocking method preserve the image details. The original data is listed in Figure 5.6. We can see that there is a big 65 step at the block boundary and some small intensity variations within blocks. The DCT coefficient matrices of block A, block B are listed in Figure 5.7. If we apply low-pass spatial filter to the blocks, those small variations will be wiped out. However, after applying our above method, the intensity step at the boundary is reduced a lot while the small variations within the blocks are preserved. The DCT coefficient matrices of block C and the modified coefficients of block C are listed in Figure 5.8. The modified pixel values of block C calculated by (5. 11) and (5. 12) are listed in Figure 5.9. Block c 20 20 23 25 28 31 29 30 56 58 62 59 58 60 61 62 20 20 23 25 28 31 29 30 56 58 62 59 58 60 61 62 20 20 23 25 28 31 29 30 56 58 62 59 58 60 61 62 20 20 23 25 28 31 29 30 56 58 62 59 58 60 61 62 20 20 23 25 28 31 29 30 56 58 62 59 58 60 61 62 20 20 23 25 28 31 29 30 56 58 62 59 58 60 61 62 20 20 23 25 28 31 29 30 56 58 62 59 58 60 61 62 20 20 23 25 28 31 29 30 56 58 62 59 58 60 61 62 Block A Block B (a) Original block data gray level Jt (b) The intensity plot of a row of block data Figure 5.6 Original block data of example 2 F («, v) 206.0 -22.322 -4.656 2.953 0 0.784 3.472 -0.659 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Fb («. v) 467.0 -7.063 -0.219 -6.906 -4.250 0.844 3.156 1.188 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Figure 5.7 D C T coefficients of Blocks A and B. F iu,v) 353.0 -77.375 3.219 23.156 -4.938 - 14.094 -5.156 12.563 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 FMc{-u, v) 348.188 -52.313 3.219 10.594 -4.938 -7.031 -5.156 6.406 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Figure 5.8 D C T coefficients of Block C before and after modification. 67 Block C 20 20 23 25 32 35 34 35 50 52 57 54 58 60 61 62 20 20 23 25 32 35 34 35 50 52 57 54 58 60 61 62 20 20 23 25 32 35 34 35 50 52 57 54 58 60 61 62 20 20 23 25 32 35 34 35 50 52 57 54 58 60 61 62 20 20 23 25 32 35 34 35 50 52 57 54 58 60 61 62 20 20 23 25 32 35 34 35 50 52 57 54 58 60 61 62 20 20 23 25 32 35 34 35 50 52 57 54 58 60 61 62 20 20 23 25 32 35 34 35 50 52 57 54 58 60 61 62 Block A Block B (a) The modified block data gray level 4 6S^ (b) The intensity plot of a row of block data Figure 5.9 Block data after modification. 5.5 Computation cost for blockiness reduction procedure in the DCT domain The computational cost for blockiness reduction procedure in the DCT domain includes three parts: calculating the DCT coefficients, coefficients modifications and the inverse DCT. 5.5.1 The computation cost for calculating DCT coefficients Most of the computational effort is spent in calculating the DCT coefficients. Let's con-sider the number of computations needed to remove the appearance of one horizontal and one ver-tical edge of a block. Let 8x4 pixels of the right half of block A and the 8 x 4 pixels of the left 68 half of block B form a block denoted as block C (see Figure 5.10a). Let 4x8 pixels of the bottom half of block A and the 4 x 8 pixels of the upper half of block D form a block denoted as block E (see Figure 5.10b). The D C T coefficients of the original block, block C and block E are required. (a) Horizontal Blockiness ** V e r t i c a l Blockiness Figure 5.10 Blockiness Diagram. The horizontal blockiness removal procedure, i.e. removing the vertical edge between block A and B , needs to calculate 5 D C T coefficients for each of the original blocks (A & B) and block C, these are the D C coefficient and the 4 odd-numbered coefficients in the first row of each block. Please note that the results of calculating the 5 D C T coefficients for the first row of blocks A and B wil l be used in removing the edges between blocks A & A ' and B & B ' . Similarly, the vertical blockiness removal procedure needs to calculate the D C coefficient and the 4 odd-num-bered coefficients in the first column of each original blocks (A & D) and block E . The D C T coef-ficients for the first column of original blocks can also be used twice. This means to remove the edges between A & B and between A & D four times the cost of calculating the 4 odd-numbered D C T coefficients of one row (or column) and three times the calculation cost of calculating one D C coefficient are required. Let's consider the calculation involved in the first row 2-D D C T coef-ficients for example. 69 N-lN-l Fc(0,v) = CUCV2 £ c ( M ) W D C r ( 0 , £ ) W D C r ( v , / ) k = Ol = 0 N - 1 / / V -i = o \k = o J WDCT{v, I) . (5. 13) Thus, this calculation takes two stages. First we calculate the summation of each column. Let N- 1 5(/)= £ c ( M ) , / = 0. . . /V-l . (5.14) k = 0 Obviously, this takes 64 additions. Then we take 1-D DCT of x(l). We can use a fast algorithm to calculate the 1-D DCT of x(l). Since we only need to calculate the DC and the 4 odd-numbered coefficients, the decimation-in-frequency (DIF) fast DCT may be a good choice. The 1-D Hou algorithm [22] provides a numerically stable, fast algorithm for computing the 1-D DCT coefficients. The logic diagram of calculating the odd-numbered coefficients of the 8-point 1-D DCT is illustrated in Figure 5.11. We can see 16 additions, 8 multiplications and 4 shifts ( x 2) are needed to calculated all the 4 odd-numbered coefficients. Therefore, the total computation cost for calculating the odd-numbered DCT coefficients is 64+16=80 additions, 8 multiplications, and 4 shifts. The DC coeffi-cient is the sum of x(l) (I = 0...N- 1) and thus this takes 8 additions. We also need to calculate F(3, 3) of both block C and block E for constraints checking. F(3, 3) can be calculated by (5. 1). This takes about 64 additions and 64 multiplications. In summary, the total computation cost needed for calculating the DCT coefficients is: 70 x(0) x(2) x(4) x(6) x(7) x(5) x(3) x(1) -L±M1 •1 h + • - -1 u X(1) X(3) X(5) -X(7) Figure 5.11 Fast 1-D 8 points D C T diagram. 80 x 4 + 8 x 3 + 64 x 2 = 472 additions, 8x4 + 64x2 = 160 multiplications, and 4x4 = 16 shifts. This makes a total of 648 operations. This is less than the cost of computing one block 2-D DCT, since calculating one block 2-D DCT is equivalent to calculating 16 1-D DCT transforms and each 1-D DCT transform takes 29 additions, 12 multiplications and 5 shifts according to Hou's fast algorithm. This makes a total of (29 + 12 + 5) x 16 = 736 operations. 5.5.2 The computation cost for coefficients modification For the horizontal blockiness reduction procedure, the computation cost for constraints checking, equations (5. 5), (5. 6) and (5. 7), will be 2 additions and 3 comparisons. The computa-71 tion cost for coefficients modifications, equations (5. 8) and (5. 9), will be 2 x 5 = 10 additions and 2 x 5 - 10 multiplications. The same number of computations is required in the vertical blockiness reduction proce-dure. Thus this makes a total of (2 + 3+10+10)x2 = 50 operations. 5.5.3 The computation cost for inverse DCT The rest of computation cost is spent in calculating inverse DCT. The cost of calculating £,(/), / = 0...7 (see equation (5. 12)) is 8 x 5 = 40 additions and 8 x 4 = 32 multiplica-tions. The cost of changing original intensity value by £( / ) will be 64 additions according to equation (5. 11). Since such procedure is performed twice, one for horizontal blockiness reduction and one for vertical blockiness reduction, the total computation cost for calculating inverse DCT is 2 x (40 + 64) = 208 additions and 2 x 32 = 64 multiplications. This makes a total of 272 oper-ations. 5.5.4 Summary In summary, the total computation cost used for blockiness reduction in DCT domain is 648 + 50 + 272 = 970 operations per block. We can carry on the constraints checkings first. If the constraints checkings are failed, then the subsequent computations are not needed. 5.6 Spatial filtering For those boundaries which were detected to have heavy blockiness appearance in the 72 originally decompressed image, we can further apply a circular-asymmetric two-dimensional filter [13] which is adapted to the orientation of the block boundary. An epsilon-filter [21] may also be applied to remove any blocking artifacts that might have remained. The later filter also helps to reduce other coding artifacts, such as ringing artifacts. The epsilon-filter described in Figure 5.12 is proven to be able to smooth out fairly uniform regions while preserving sharp edges. Let xi j be the pixel value of the image, and y • • be its value after applying the epsilon fil-tering. c c yt, j = xi, j - N X X f(*i, j ~ xi + kj+i)' <5-15> k = —cl = —c where f(x) is shown in Figure 5.12 and TV is the number of pixels in 2c x 2c (c=l or 2) neighbor-hood of xi f(x) i fc { —fc —fc t Figure 5.12 Epsilon filter. The epsilon-filter performs a conditional smoothing operation. The small variations in intensity values of neighboring pixels are taken into account while large variations are ignored, thereby smoothing out fairly uniform regions while preserving sharp edges. The epsilon parame-73 ter, E represents the filtering "strength" of this postfilter. The choice of the value of the parameter epsilon should clearly depend on the strength of the artifacts, i.e. on the quality of the decoded image—lower values of epsilon are desirable when the image quality is high, higher values are desirable when the image quality is poor. 5.7 Results From our experiments, we have found that the proposed algorithm produces better subjec-tive quality and between 0.1 - 1.3 dB increase in PSNR over that of JPEG images and MPEG2 TM5. Figure 5.13 a shows the decoded JPEG image at 0.298 bpp. The post-processing results obtained by our method with and without spatial filtering are shown in Figure 5.13 b and Figure 5.13 c, respectively. As we can see, our method produces better subjective quality than the results obtained by the symmetric or anisotropic filtering. Most of the blocking artifacts in the decoded image after post-processing by our DCT domain filter (without spatial filtering) are removed except for those in the texture regions. However, blocking effects are less noticeable in the texture regions, especially in a video stream. Therefore, when the computation time is critical, spatial fil-tering may not be applied. Our method can efficiently eliminate blocking effects even for images coded at very low bit rate. It smooths out the undesired block edges while retaining the sharpness of the image. Fig-ure 5.14 a shows the decoded JPEG image at 0.248 bpp, which is very blocky and the blocking effects are hardly removed by symmetric or anisotropic filtering (see Figure 4.2 b and Figure 4.4 b). Figure 5.14 b shows the image processed by the DCT domain filter only. The blocking effects are almost gone except for several very strong blockiness. After applying spatial filtering, the blocking artifacts are smoothed out completely (Figure 5.14 c). More post-processing results 74 obtained by our method are shown in Appendix A. The PSNR comparison of several images before and after post-processing at different compression rates are shown in Table 5.1 to Table 5.3. The main attraction of our algorithm is that we can highly preserve the high frequency components while smoothing out the boundary discontinuities. This is because only the DCT coefficients related to the blocking artifact are modified while other frequency components remain the same. The modification also takes into account the correlation between neighboring blocks. Our algorithm is adaptive to different contents and qualities. Table 5.1 PSNR comparison of Lena image before and after post-processing. Compression Bit-Rate (bpp) .198 .251 .273 .320 .339 .379 .400 PSNR Before Post-Processing (dB) 26.52 27.70 28.24 29.02 2935 29.91 30.15 PSNR After Post-Processing (dB) 27.42 28.49 28.92 29.56 29.81 30.25 30.43 Table 5.2 PSNR comparison of Indian image before and after post-processing. Compression Bit-Rale (bpp) .200 .248 .273 .318 .341 .382 .423 PSNR Before Post-Processing (dB) 26.82 27.78 28.17 28.86 29.15 29.65 30.07 PSNR After Post-Processing (dB) 27.29 28.10 28.49 29.11 29.33 29.73 30.10 Table 5.3 PSNR comparison of Peppers image before and after post-processing. Compression Bit-Rate (bpp) .150 .185 .220 .248 .274 .285 .298 PSNR Before Post-Processing (dB 27.97 29.27 30.39 31.20 31.20 32.21 32.51 PSNR After Post-Processing (dB) 29.11 30.28 31.27 31.97 32.54 32.83 33.09 Figure 5.13 a Decompressed image, bitrate=0.298 bpp, PSNR=32.51 dB. Figure 5.13 b Post-processed image by our D C T domain filter (without spatial filtering), bitrate=0.298 bpp, PSNR=32.56 dB. Figure 5.13c Post-processed image by our algorithm (with spatial filtering), bitrate=0.298 bpp, PSNR=33.09 dB. Figure 5.14 a Decompressed image, bitrate=0.248 bpp, PSNR=31.2 dB Figure 5.14 b Post-processed image by our D C T domain filter (without spatial filtering), bitrate=0.248 bpp, PSNR=31.36 dB. Figure 5.14 c Post-processed image by our algorithm (with spatial filtering), bitrate=0.248 bpp, PSNR=31.97 dB. Chapter 6 Conclusions and Future Work This thesis addresses artifacts that arise in block based DCT image compression and in specific the blocking effects. The objective of the thesis is to classify, analyze and find solutions to reduce the visual appearance of blocking artifacts. The thesis covers the following topics. 6.1 Compression artifacts classification In this part, we classified several most noticeable compression artifacts. The appearance of each type of artifact and their visual perception to human eyes are examined and analyzed. Further more, by analyzing the DCT transform coding and other compression techniques such as motion compression, etc., the cause of each type of artifact is revealed. Understanding the cause and properties of compression artifacts is the basis of artifacts reduction algorithm design. 6.2 The design of blocking artifacts detector We first discussed a common method used for blocking artifacts detection. This is the gradient threshold method. We then introduced the difference of slope method. These two methods are edge detectors. However, the difference of slope method is designed according to the properties of the blocking artifacts. As a result, it is able to distinguish genuine edges from those of blocking artifacts. According to several experimental tests, the detection results using our difference of slope method are more accurate than those obtained by the gradient threshold method. 81 82 6.3 Analysis of blocking artifacts removal techniques In general, there are two classes of blocking artifacts removal techniques. One is applied during encoding process, the other is applied during or after the decoding process. The blocking artifacts removal techniques applied in the encoding process are proved to be effective in reducing the blocking effects. However, the standard compression codec has to be changed according to each specific implementation. Each method has its own problem too. The bit rate is increased when using block overlapping method. Lapped orthogonal transform requires 20-30 percent more computations. Blocking artifacts removal techniques applied during or after the decoding process have the advantage of not interfering with the standard encoding processes. This makes the application more flexible. Several post processing methods have been proposed for blocking artifacts removal. The symmetric, two-dimensional filtering is easy to implement. It needs the least computation efforts too. However, it has the drawback of high-frequency detail loss due to its low-pass filtering nature and it doesn't work at very low bit rate, i.e. less than 1/4 bpp. Anisotropic filtering is proved to be able to effectively reduce the blocking effects while preserving the sharpness of the image since it is designed to take into account the property of the directional appearance of blocking artifacts. However, this method doesn't work well at very low bit rates either. To avoid undesired blurring that results from the above filters, adaptive filtering using edge classification is another possible choice. The classification can be done in the spatial or DCT 83 domains. The spatial filtering scheme is then driven by the output of the edge classifier. Edge and texture regions are usually not filtered since humans are more sensitive to low frequency errors than to high frequency ones. However, it is difficult to detect edges of images with complex content which are coarsely quantized. The block boundaries may be artificially identified as edges. Adaptive filtering using local statistical estimation is a method which needs to estimate the local quantization errors in the spatial domain. Since most of the high order DCT coefficients are quantized to zero after coarse quantization, it is difficult to get a good estimate of the local spatial quantization error. Another drawback of this method is that the algorithm has to access data from decoder. Thus it has to be built inside the decoder. Iterative blocking artifact removal technique is another powerful spatial-adaptive approach. The major drawback of this approach is its high computational complexity. It has to be built inside the decoder too. Dithering is a method of providing masking effects and illusion of continuous-tone pictures by introducing a carefully-controlled, random dither signal which conforms to a model of the average energy distribution of the transform coefficients prior to quantization during the decoding process. 6.4 A new blockiness reduction scheme We propose a new approach which performs blockiness reduction in the DCT domain to reduce the block-to-block discontinuities present in the spatial domain. Our algorithm takes advantage of the fact that the original pixel levels in the same block are of good continuity and it 84 uses this property and the correlation between the neighboring blocks to reduce the discontinuity of the pixels across the boundaries. Our algorithm can highly preserve the high frequency components while smoothing out the boundary discontinuity. Simulation results show that the proposed algorithm significantly reduces the blocking artifacts in both objective and subjective measures. It is able to be adaptive to different image content and qualities. It can effectively remove blocking artifacts even at very low bit rate. The amount of computation it requires is also acceptable compared to iterative blocking artifact removal techniques. 6.5 Future work Blocking artifact is the most noticeable degradation resulting from block based compres-sion. The new algorithm proposed in this thesis is proved to be a powerful blockiness reduction method. However, since it requires a large number of computations, the design of a real-time version algorithm to be applied in the video signal processing is very important. Mosquito noise is another prominent noise of motion picture compression. Unlike the blocking artifacts, mosquito effects are difficult to detect and thus are difficult to remove. Further study of mosquito detection and removal forms worthwhile further research. Bibliography [1] Gregory K. Wallace, "The JPEG Still Picture Compression Standard," Communications of the ACM, April 1991, Vol.34, No. 4, pp:29-44. [2] Didier Le Gall, "MPEG: A Video Compression Standard for Multimedia Applications," Communications of The ACM, April 1991, Vol. 34, No. 4, pp:47-58. [3] ISO/IEC 13818-2, 1995. [4] "Guide to the use of the ATSC Digital Television Standard," Advanced Television Systems Committee. [5] Anil K. Jain, "Fundamentals of Digital Image Procesing/'Prentice Hall, Englewood Cliffs, 1989. [6] Mark J.T. Smith, Alen Docef, "A study guide for digital Image Processing," Scientific Publisher, 1997. [7] H.Reeve and J. Lim, "Reduction of Blocking Effect in Image Coding," Proc. ICASSP'83, pp.1212-1215, 1983. [8] H.R.Wu, M.Yuen and B.Qiu, "Video Coding Distortion Classification and Quantitative Impairment Metrics," Proceedings of ICSP' 96, 962-965. [9] H.C. Kim and H.W. Park, "Signal Adaptive Post-processing for Blocking Effects Reduction in JPEG Image," IEEE International Conference on Image Processing . v2 , 1996,41-44. [10] J.D. McDonnell, R.N. Shorten and A.D. Fagan, "An Edge Classification Based Approach To The Post-Processing of Transform Encoded Images," Proc. ICASSP '94, (V): 329-332, 1994. [11] Shigenobu Minami and Avideh Zakhor, "An Optimization Approach for Removing Blocking Effects in Transform Coding," IEEE Transactions On Circuits And Systems For Video Technology, Vol. 5, No. 2, pp:74-82, April, 1995. 85 Bibliography 86 [12] Yongyi Yang, Nikolas P. Galatsanos and Aggelos K. Katsaggelos, "Projection-Based Spatially Adaptive Reconstruction of Block-Transform Compressed Images," IEEE Transactions On Image Processing, Vol. 4, No. 7, pp: 896-908July 1995. [13] Kou-Hu Tzou, "Post-Filtering of Transform-Coded Images," SPIE Vol. 974 Applications of Digital Image Processing XI (1988), pp: 121-126. [14] Lee, H.C. Kim, and H.W. Park, "Blocking Effect Reduction of JPEG Images by Signal Adaptive Filtering," IEEE Transactions on Image Processing, Vol. 7, No. 2, pp:229-234, Feb. 1998. [15] William E. Lynch, Amy R. Reibman and Bede Liu, "Post Processing Transform Coded Images Using Edges," Proc. ICASSP'95, pp: 2323-2326, 1995.Y.L. [16] Thomas Meier, King N. Ngan and Gregory Crebbin, "A Region-based Algorithm for Enhancement of Images Degraded by Blocking Effects," IEEE, 1996, 405:408 [17] Henrique S. Malvar and David H. Staelin, "The LOT: Transform Coding Without Blocking Effects," IEEE Transactions On Acoustics, Speech, and Signal Processing, Vol. 37, No. 4, pp: 553-559, April 1989. [18] Jim Chou, Matthew Crouse and Kannan Ramchandran, "A Simple Algorithm for Removing Blocking Artifacts in Block-Transform Coded Images," IEEE Signal Processing Letters, Vol. 5, No.2, pp: 33-35, Feb. 1998. [19] Jianping Hu, Nadir Sinaceur, Fu Li, Kwok-Wai Tarn, Zhigang Fan, "Removal of Blocking and Ringing Artifacts In Transform Coded Images," Proc. ICASSP'97, pp: 2565-2568. [20] David L. McLaren, "Removal of Subjective Redundancy From DCT-coded Images," IEE Proceedings-I, Vol. 138, No. 5, Oct. 1991, pp:345-350. [21] Arnaud Jacquin, and Hiroyuki Okada, "Content-Adaptive Postfiltering for Very Low Bit Rate Video," Data Compression Conference, 1997. pp: 111-120. [22] Hsieh S. Hou, "A Fast Recursive Algorithm for Computing the Discrete Cosine Transform," IEEE Tran. on ASSP, Vol. 35, No. 10, Oct. 1987, pp: 1455-1461. [23] Si Jun Huang, "Adaptive Noise Reduction and Image Sharpening for Digital Video Compression," Proc. of the IEEE International Conference on Systems Man and Cybernetecs. v4, 1997, pp:3142-3147. Appendix A. Figure A .2 Post-processed Lena image, bitrate=0.199 bpp, PSNR=31.05 dB Figure A.3 Decompressed Lena image, bitrate=0.271 bpp, PSNR=31.95 dB. Figure A.4 Post-processed Lena image, bitrate=0.271 bpp, PSNR=32.33 dB. Figure A.5 Decompressed Peppers image, bitrate=0.15 bpp, PSNR=27.97 dB. Figure A.6 Post-processed Peppers image, bitrate=0.15 bpp, PSNR=29.11 dB. Figure A.7 Decompressed Peppers image, bitrate=0.22 bpp, PSNR=30.39 dB. Figure A.8 Decompressed Peppers image, bitrate=0.22 bpp, PSNR=31.27 dB. Figure A . 9 Decompressed Indian image, bitrate=0.273 bpp, PSNR=28.17 dB. Figure A. 10 Post-processed Indian image, bitrate=0.273 bpp, PSNR=28.50 dB. Figure A. 11 Decompressed Indian image, bitrate=0.382 bpp, PSNR=29.65 dB. Appendix A. Figure A. 12 Post-processed Indian image, bitrate=0.382 bpp, PSNR=29.74 dB. **