Image Derivative Estimation and Its Applications to Edge Detection, Quality Monitoring and Copyright Protection

by

Ehsan Nezhadarya

M.A.Sc., Sharif University of Technology, 2005
B.A.Sc., Sahand University of Technology, 2003

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF Doctor of Philosophy in THE FACULTY OF GRADUATE STUDIES (Electrical and Computer Engineering)

The University of British Columbia (Vancouver)

May 2013

© Ehsan Nezhadarya, 2013

Abstract

Multi-order image derivatives are used in many image processing and computer vision applications, such as edge detection, feature extraction, image enhancement, segmentation, matching, watermarking and quality assessment. In some applications, the image derivatives are modified and then inverse-transformed to the image domain. For example, one approach to image de-noising is to keep the significant image derivatives and shrink the non-significant ones; the de-noised image is then reconstructed from the modified derivatives. The main challenge here is how to inverse-transform the derivatives to the image domain. This thesis proposes different algorithms to estimate image derivatives and applies them to image de-noising, watermarking and quality assessment.

For noisy color images, we present a method that yields accurate and robust estimates of the gradient magnitude and direction. This method obtains the gradient in a certain direction by applying a prefilter and a postfilter in the perpendicular direction. Simulation results show that the proposed method outperforms state-of-the-art methods. We also present a multi-scale derivative transform (MSDT) that obtains the gradient at a given image scale using the detail horizontal, vertical and diagonal wavelet coefficients of the image at that scale. The inverse transform is designed such that any change in the image derivative results in the minimum possible change in the image.
The MSDT is used to derive a novel multi-scale image watermarking method. This method embeds the watermark bits in the angles of the significant gradient vectors, at different image scales. Experimental results show that the proposed method outperforms other watermarking methods in terms of robustness to attacks, imperceptibility of the watermark and watermark capacity.

The MSDT is then used to obtain a semi-blind method for video quality assessment. The method embeds pseudo-random binary watermarks in the derivative vectors of the original undistorted video. The quality of the distorted video is estimated from the similarity between the embedded and the extracted watermarks. Simulation results on video distorted by compression/decompression show that the proposed method can accurately estimate the quality of a video and its frames over a wide range of compression ratios.

Preface

This Ph.D. thesis presents the research conducted by Ehsan Nezhadarya under the supervision of Dr. Rabab K. Ward. The list of publications resulting from this work is given below. The work presented in Chapter 2 has been published in papers [4] and [8]. The content of Chapter 3 appears in papers [2] and [5]. Chapter 4 presents our work in papers [3] and [6]. Finally, our work on video quality assessment, published in paper [1], is included in Chapter 5. The work presented in all of these manuscripts was performed by Ehsan Nezhadarya, including designing and implementing the proposed algorithms, performing all experiments, analyzing the results and writing the manuscripts. The work was conducted with the guidance and editorial input of Dr. Z. Jane Wang and Dr. Rabab K. Ward. This thesis was written by Ehsan Nezhadarya and revised by Dr. Rabab K. Ward.

• Journal Papers

1. Ehsan Nezhadarya and Rabab K. Ward, "Semi-blind Quality Estimation of H.264/AVC Compressed Video Using Digital Watermarking", Elsevier Digital Signal Processing, under revision.

2.
Ehsan Nezhadarya, Rabab K. Ward and Z. Jane Wang, "Multiscale Derivative Transform Theory and Its Applications", submitted to IEEE Transactions on Image Processing (TIP), TIP-08430-2012.

3. Ehsan Nezhadarya, Z. Jane Wang and Rabab Ward, "Robust Image Watermarking Based on Multiscale Gradient Direction Quantization", IEEE Transactions on Information Forensics and Security (TIFS), vol. 6, no. 4, Dec. 2011.

4. Ehsan Nezhadarya and Rabab Ward, "A New Scheme for Robust Gradient Vector Estimation in Color Images", IEEE Transactions on Image Processing (TIP), vol. 20, no. 8, pp. 2211-2220, Aug. 2011.

• Conference Papers

5. Ehsan Nezhadarya, Rabab Ward and Z. Jane Wang, "Wavelet-Based Gradient Transform And Its Applications", IEEE International Workshop on Multimedia Signal Processing (MMSP), Banff, Canada, September 2012.

6. Ehsan Nezhadarya, Z. Jane Wang and Rabab Ward, "A New Data Hiding Method Using Angle Quantization Index Modulation In Gradient Domain", IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), Prague, Czech Republic, pp. 2440-2443, May 2011.

7. Ehsan Nezhadarya, Z. Jane Wang and Rabab Ward, "Watermark Survival Chance (WSC) Concept for Improving Watermark Robustness Against JPEG Compression", IEEE Int. Conf. on Image Processing (ICIP), Hong Kong, pp. 3697-3700, Sep. 2010.

8. Ehsan Nezhadarya and Rabab Ward, "A Robust Morphological Gradient Estimator and Edge Detector for Color Images", IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), Dallas, TX, pp. 1062-1065, March 2010.

9. Ehsan Nezhadarya and Rabab Ward, "An Efficient Method For Robust Gradient Estimation of RGB Color Images", IEEE Int. Conf. on Image Processing (ICIP), Cairo, Egypt, pp. 701-704, November 2009.

10. Ehsan Nezhadarya, Z. Jane Wang and Rabab Ward, "Image Quality Monitoring Using Spread Spectrum Watermarking", IEEE Int. Conf. on Image Processing (ICIP), Cairo, Egypt, pp. 2233-2236, November 2009.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
Glossary
Acknowledgments
1 Introduction
  1.1 Image Gradient Estimation
    1.1.1 Gradient Estimation in Noisy Images
    1.1.2 Multiscale Gradient Estimation
    1.1.3 Invertible Multiscale Gradient Estimation
  1.2 Application to Image Watermarking
    1.2.1 Watermarking in the Gradient Domain
  1.3 Application to Image/Video Quality Monitoring
    1.3.1 Quality Assessment Overview
    1.3.2 Available Approaches for Quality Monitoring in Multimedia Networks
    1.3.3 Quality Monitoring Using Gradient-based Data Hiding
  1.4 Thesis Contributions
2 Noise Robust Gradient Vector Estimation in Color Images
  2.1 Introduction
  2.2 Robust Color Gradient Estimation
    2.2.1 Some Definitions and Notations
    2.2.2 The Proposed Class of Color Gradient Estimators
    2.2.3 The Proposed Highpass, Lowpass and Aggregation Operators
    2.2.4 Computational Considerations
  2.3 Experimental Results
    2.3.1 Artificial Color Images
    2.3.2 Natural Color Images
    2.3.3 Gradient Estimation
  2.4 Summary
3 Multi-scale Derivative Transform Theory and Applications
  3.1 Introduction
  3.2 Background
    3.2.1 Multiscale Image Derivative Estimation Using 2-D CWT
    3.2.2 Traditional Multiscale Image Derivative Estimation Using 2-D DWT
  3.3 The Proposed Multiscale Derivative Transform
    3.3.1 Forward Derivative Transform
    3.3.2 Inverse Derivative Transform
  3.4 Potential Applications of the Proposed Transform
    3.4.1 Texture Features For Classification
    3.4.2 Derivative Vector Quantization for Image Watermarking
    3.4.3 Image Quality Assessment
    3.4.4 Multiscale Edge Detection
  3.5 Summary
4 Image Watermarking Based on Gradient Direction Quantization
  4.1 Angle Quantization Index Modulation (AQIM)
  4.2 Proposed Watermark Embedding Method
    4.2.1 Gradient Direction Based Watermarking
    4.2.2 Obtaining FD Gradient Vectors Using Multi-scale Derivative Transform
    4.2.3 Effect of Changing the Gradient Angle on DWT Coefficients
    4.2.4 Embedding the Watermark Bit in the Gradient Angle
    4.2.5 Overcoming the Bit-ordering Error Using Uniform Scrambling of the Gradient Vectors
    4.2.6 Overcoming the Problem of Extracting a Watermark From an Unwatermarked Gradient Vector
  4.3 Proposed Watermark Decoding Method
  4.4 Simulation Results and Discussions
    4.4.1 Comparison Between Single-scale GDWM and Multi-scale GDWM
    4.4.2 Performance Comparison of GDWM and AQIM
    4.4.3 Robustness to Attacks
    4.4.4 Comparison With Other Watermarking Methods
  4.5 Summary
5 Semi-blind Quality Estimation of Compressed Videos in the Image Derivative Domain
  5.1 Introduction
  5.2 Background
    5.2.1 Spread Transform Dither Modulation (STDM)
  5.3 The Proposed Video Quality Assessment Method
    5.3.1 Watermark Embedding Method
    5.3.2 Enhancing the Capacity-Imperceptibility-Robustness Trade-off by Scrambling the Gradient Fields
    5.3.3 Watermark Extraction Method
    5.3.4 Watermark Bit Error Rate Assessment
  5.4 The Quality Assessment Method
  5.5 Experimental Results
    5.5.1 The Effect of Quantization Step Size on QA Curves
    5.5.2 The Effect of Gradient Block Size on QA Curves
    5.5.3 Multi-scale Watermarking for Quality Assessment
    5.5.4 The Effect of GOP Length on Temporal Flicker Impairment
  5.6 Summary
6 Conclusions and Future Work
  6.1 Research Significance and Contributions
  6.2 Directions for Future Work
Bibliography

List of Tables

Table 2.1  FOM values obtained from the edge maps of the noisy artificial test image for 5 different realizations of noise. The image is corrupted by a mixture of uncorrelated Gaussian noise with σ = 20 and uncorrelated salt and pepper noise with probability p = 0.2.
Table 4.1  The sensitivity of the uniformity metric to different scrambling parameters, at wavelet levels 3, 4 and 5.
Table 4.2  The effect of changing the scrambling parameters p and q on BER (%). The results show the deviation of the BER (%) from BERopt under different types of attacks. (Message length = 64 bits, PSNR = 42 dB)
Table 4.3  Angular quantization step sizes at wavelet levels 3, 4 and 5.
Table 4.4  The SSIM comparison between the single-scale GDWM (SS GDWM) and multi-scale GDWM (MS GDWM).
Table 4.5  The BER (%) results of GDWM under different types of attacks.
Table 4.6  The BER (%) results of the proposed GDWM method under AWGN and Salt & Pepper attacks.
Table 4.7  The BER comparison between the proposed GDWM and MWT-EMD [12] under different types of attacks. (Message length = 64 bits, PSNR = 42 dB)
Table 4.8  The BER comparison between the proposed GDWM and Wang's method [104] under different types of attacks. (Message length = 256 bits, PSNR = 42 dB)
Table 4.9  The BER comparison between the proposed GDWM method and Akhaee's method [5] under different types of attacks. (Message length = 128 bits)
Table 4.10  The BER comparison between the proposed GDWM method and the VLQIM method [48] used for embedding the watermark in image Lena, under AWGN and JPEG compression attacks. (Message length = 128 bits, PSNR = 45.70 dB)
Table 5.1  MS values of the PSNRc-BER curves, obtained to estimate the quality of the test video sequences after H.264/AVC compression.
Table 5.2  MS values of the SSIMc-BER curves, obtained to estimate the quality of the test video sequences after H.264/AVC compression.

List of Figures

Figure 1.1  (Left) Image "Lena". (Right) Just Noticeable Difference (JND) values obtained for each pixel of image Lena, based on the method proposed in [9].
Figure 2.1  The proposed gradient estimation approach applied in a window of size 5 × 5.
Figure 2.2  Behavior of the color edge detectors for the artificial test image corrupted by a mixture of uncorrelated Gaussian noise with σ = 20 and uncorrelated salt and pepper noise with probability p = 0.2. (a) Noisy image, (b) MVD result (FOM=0.6628), (c) RCMG result (FOM=0.9457), (d) FPGA-Canny result (FOM=0.8760), (e) Proposed MVD-Median-Max result (FOM=0.9684), (f) Proposed MVD-Median-Mean result (FOM=0.9713), (g) Proposed RCMG-Median-Max result (FOM=0.9850), (h) Proposed RCMG-Median-Mean result (FOM=0.9949).
Figure 2.3  The average FOM-SNR curves of the proposed edge detectors, for the artificial test image corrupted by 10 different realizations of the uncorrelated Gaussian noise (a) and uncorrelated salt and pepper noise (b). The curves are given for the proposed MVD-Median-Mean, MVD-Median-Max, RCMG-Median-Mean and RCMG-Median-Max edge detectors.
Figure 2.4  The average FOM-SNR curves of the proposed edge detectors, for the artificial test image corrupted by 10 different realizations of uncorrelated Gaussian noise (a) and uncorrelated salt and pepper noise (b). The curves are given for the MVD, RCMG, FPGA-Canny and the proposed RCMG-Median-Mean edge detectors.
Figure 2.5  Color edge results for the Kodak House image. (a) Original image, (b) MVD result, (c) RCMG result, (d) Proposed RCMG-Median-Mean result.
Figure 2.6  Color edge results for the Kodak House image. (a) Closeup of the region marked by the dashed horizontal rectangle in Fig. 2.5(a), (b) MVD result, (c) RCMG result, (d) Proposed RCMG-Median-Mean result.
Figure 2.7  Color edge results for the Kodak House image. (a) Closeup of the region marked by the dashed vertical rectangle in Fig. 2.5(a), (b) MVD result, (c) RCMG result, (d) Proposed MVD-Median-Max result, (e) Proposed MVD-Median-Mean result, (f) Proposed RCMG-Median-Max result, (g) Proposed RCMG-Median-Mean result.
Figure 2.8  Color edge results for the Kodak House image. (a) FPGA-Canny result (with window size 3 × 3), before adding noise, (b) FPGA-Canny result (with window size 5 × 5), (c) FPGA-Canny result (with window size 3 × 3), after adding salt and pepper noise with p = 0.3 to the original image, (d) Proposed RCMG-Median-Mean result, before adding noise, (e) RCMG-Median-Mean result, after adding salt and pepper noise with p = 0.3 to the original image.
Figure 2.9  The average FOM results for 20 standard images corrupted by uncorrelated Gaussian noise (a), and uncorrelated impulse noise (b). The results are produced by MVD, RCMG, FPGA-Canny and the proposed MVD-Median-Max, MVD-Median-Mean, RCMG-Median-Max and RCMG-Median-Mean methods.
Figure 2.10  Gradient estimation results. (a) Original image, (b) RCMG, (c) Color Canny operator, (d) Proposed RCMG-Median-Mean.
Figure 2.11  The average gradient vector PSNRg results for 20 standard images corrupted by uncorrelated Gaussian noise (a), and uncorrelated impulse noise (b). The results are produced by Canny, RCMG and the proposed RCMG-Median-Mean.
Figure 3.1  The gradient fields at wavelet scales j = 1, 2, 3 and 4, using the Haar wavelet transform. (a) Synthetic image. (b) (Upper row) the traditional Haar-Haar gradient estimates (obtained using Eq. (3.28)); (bottom row) the proposed multiscale Haar-FD gradient estimates (obtained using Eq. (3.25)).
Figure 3.2  Kodak "Houses" image.
Figure 3.3  Edge results for the Kodak Houses image, obtained using the Haar-Haar (a) and Haar-FD WDT matrices (b). (c) The closeup of (a), at the region marked with the dashed rectangle in Fig. 3.2. (d) The closeup of (b). The arrows mark the areas where the image details are blurred.
Figure 4.1  Illustration of the 5-level gradient field, obtained from a 5-level wavelet decomposition.
Figure 4.2  The block diagram of the proposed watermark embedding scheme.
Figure 4.3  (Left) Image "Lena". (Right) Just noticeable difference (JND) values obtained for each pixel of image Lena, based on the method proposed in [9].
Figure 4.4  The effect of different types of attacks on the significant gradient vectors of level 4. (a) Original image. (b) Grayvalue scaling change by a scaling factor of 2. (c) Gaussian filtering with window size of 5. (d) JPEG compression with quality factor of 50.
Figure 4.5  Illustration of the proposed absolute angle quantization index modulation (AAQIM). (Vectors before and after watermarking are represented by black and gray arrows, respectively.)
Figure 4.6  Block diagram of the proposed watermark decoding method.
Figure 4.7  Illustration of the proposed GMMG filters f1 and f2. Letters "M" and "G" stand for median and Gaussian filters, respectively.
Figure 4.8  (Upper row) Original test images "Peppers", "Baboon", "Barbara" and "Lena". (Lower row) Watermarked test images using the proposed GDWM method, with a 256-bit watermark embedded and PSNR = 42 dB.
Figure 4.9  The comparison of the performance of the AQIM and the proposed GDWM methods for the AWGN attack (DWR = 25 dB).
Figure 5.1  Illustration of STDM. Codebooks U0 and U1 are represented by solid (or '0') and dashed (or '1') lines, respectively.
Figure 5.2  The block diagram of the proposed gradient-based STDM watermark embedding scheme.
Figure 5.3  The 2-D quantization lattice used in the 2-D STDM method.
Figure 5.4  (a) Gradient field of image Lena, at wavelet scale 4. (b) Scrambled gradient field.
Figure 5.5  (a) The proposed algorithm used to scramble the vector x, using the seed S. (b) The proposed descrambling algorithm.
Figure 5.6  The block diagram of the proposed gradient-based STDM watermark extraction scheme.
Figure 5.7  The watermark bit errors at wavelet scales j = 1 (left), j = 2 (middle) and j = 3 (right) of an image distorted by H.264/AVC compression/decompression, using the quantization parameters (a) QP=10, (b) QP=20 and (c) QP=30.
Figure 5.8  An example of an average PSNR-BER quality assessment curve. The dots represent the quality curves of each watermarked frame. The black curve is obtained by averaging the quality curves of the watermarked video frames.
Figure 5.9  First frames of the original (top row) and the watermarked (bottom row) test videos "Suzie", "Carphone", "Foreman" and "Mobile". The PSNR of the watermarked frames is fixed at PSNRw = 46.80 dB.
Figure 5.10  The PSNRc-BER quality assessment (QA) curves obtained for the "Foreman" video sequence at gradient scales j = 1, 2, 3, using different values of the quantization step size ∆j. PSNRw and PSNRc denote the PSNRs of the watermarked video before and after H.264/AVC compression/decompression, respectively.
Figure 5.11  The PSNRc-BER quality assessment (QA) curves obtained for the "Foreman" video sequence by embedding the watermark bits in blocks of size [3, 2], [6, 4] and [12, 8].
Figure 5.12  PSNRc-BER curves obtained for the video sequences "Suzie", "Carphone", "Foreman" and "Mobile" at scales j = 1, 2, 3. The average PSNRc-BER curves are marked by dashed lines.
Figure 5.13  SSIMc-BER curves of the video sequence "Foreman" at scales j = 1, 2, 3. The average SSIMc-BER curve is marked by the dashed line.
Figure 5.14  The Flicker metric vs. the GOP length (NGOP) curves, obtained for the watermarked test videos "Suzie", "Carphone", "Foreman" and "Mobile".
Glossary

AAQIM  Absolute Angle Quantization Index Modulation
AQIM  Angle Quantization Index Modulation
BER  Bit Error Rate
CMG  Color Morphological Gradient
CWT  Continuous Wavelet Transform
DCT  Discrete Cosine Transform
DM  Dither Modulation
DOG  Derivative of Gaussian
DWR  Document-to-Watermark Ratio
DWT  Discrete Wavelet Transform
FD  Finite Difference
FOM  Figure of Merit
FPGA  Fuzzy Peer Group Averaging
FR  Full Reference
GDWM  Gradient Direction Watermarking
GOP  Group of Pictures
HVS  Human Visual System
JM  Joint Model (H.264/AVC reference software)
JND  Just Noticeable Difference
LOG  Laplacian of Gaussian
MAE  Mean Absolute Error
MMAE  Minimum Mean Absolute Error
MOS  Mean Opinion Score
MPEG  Moving Picture Experts Group
MS  Merit Score
MSDT  Multi-scale Derivative Transform
MSE  Mean Square Error
MVD  Minimum Vector Dispersion
NR  No Reference
PSNR  Peak Signal-to-Noise Ratio
QA  Quality Assessment
QF  Quality Factor
QIM  Quantization Index Modulation
QP  Quantization Parameter
RCMG  Robust Color Morphological Gradient
RR  Reduced Reference
SIFT  Scale-Invariant Feature Transform
SS  Spread Spectrum
SSIM  Structural Similarity
STDM  Spread Transform Dither Modulation
VLQIM  Vector Logarithmic Quantization Index Modulation
VOS  Vector Order Statistics
WD  Wavelet-Derivative
WDT  Wavelet-Derivative Transform
WNR  Watermark-to-Noise Ratio

Acknowledgments

I would like to express my sincere gratitude to my supervisor, Prof. Rabab K. Ward, for her great help and support. She is not only a kind person but also a great teacher, from whom I have learned a lot during my Ph.D. I also want to thank Dr. Jane Wang for her great help with the technical details of a large portion of my work. My deepest gratitude goes to my mother, Parvaneh, my father, Khosro, and my sister, Sonia, for their unflagging love, support and patience throughout my life; this dissertation, as well as all my accomplishments so far, would be impossible without them.
Last but not least, I am thankful to all my labmates in the Signal and Image Processing Lab, Farhad Faradji, Kaan Ersahin, Ilker Hacihaliloglu, Mohsen Amiri, Mani Malek Esmaeili, Lino Coria Mendoza, Ehsan Vahedi, Tanaya Guha, Simon Fauvel, Xinyi Yong, Angshul Majumdar, Hossein Bashashati, Hamid Palangi, Hamidreza Tohidi, and all my friends in the Digital Multimedia Lab. Thank you all for providing such a friendly environment.

To my family

Chapter 1

Introduction

Obtaining the first- and higher-order image derivatives is often a necessary task in a large number of applications. These applications can be divided into two main categories. In the first category, the image derivatives are calculated and used as features in high-level image processing tasks. For example, the edges of an image are usually obtained by applying first- or second-order derivative operators to the image [84]. The first-order derivative (i.e., the image gradient) can also be used in the structure tensor matrix to find corners and junctions of an image [3, 6, 44, 53, 55, 86]. Image segmentation methods that are based on active contours employ the gradient vectors as the external force in the force-balance equation [52, 59, 61, 113]. Other applications where the gradient vectors (and more specifically, gradient directions) are used as features include image matching (as in the Scale-Invariant Feature Transform (SIFT)) [62], skeletonisation [42], image registration [66] and quality assessment [30, 76]. A common observation among the applications in this first category is that no modification to the image is made. One of the challenges in this type of application is to obtain estimates of the gradient direction (and of the higher-order image derivatives) that are robust to noise. Since the image derivatives constitute the high-frequency portion of the image, and since noise is also mainly of the high-frequency type, the presence of noise reduces the accuracy of gradient estimation.
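As one concrete instance of this first category, the structure tensor mentioned above can be built directly from first-order derivatives. The sketch below is our own minimal illustration, not a method from this thesis: the 3 × 3 averaging window, the constant k = 0.05 and the toy test image are illustrative choices.

```python
import numpy as np

def structure_tensor_response(img, k=0.05):
    """Harris-style corner response from the structure tensor: first-order
    derivatives are used as features, with no modification to the image."""
    iy, ix = np.gradient(img.astype(float))    # vertical and horizontal derivatives

    def box3(a):
        # Average each tensor entry over a 3x3 neighborhood (zero padding).
        p = np.pad(a, 1)
        return sum(p[i:i + a.shape[0], j:j + a.shape[1]]
                   for i in range(3) for j in range(3)) / 9.0

    jxx, jxy, jyy = box3(ix * ix), box3(ix * iy), box3(iy * iy)
    det = jxx * jyy - jxy ** 2
    trace = jxx + jyy
    return det - k * trace ** 2   # large positive values indicate corners

# A single bright square: the response is positive at its corners,
# negative along its edges, and zero in the flat interior.
img = np.zeros((16, 16))
img[4:12, 4:12] = 1.0
r = structure_tensor_response(img)
```

Because both eigenvalues of the tensor are large only where the gradient direction varies locally, the response separates corners from edges and flat regions.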
The second category of gradient applications involves cases where the gradient vectors are modified, depending on the application. The modified gradient vectors are then inverse-transformed to the image domain, resulting in changes to the original image. These applications include image denoising (total variation and nonlinear anisotropic diffusion filtering methods) [25, 36, 43], contrast enhancement [35] and watermarking [46]. The main challenge in this type of application is to find an estimate of both the direction and magnitude of the gradient (and of the higher-order image derivatives) that is accurate and invertible (i.e., one from which a good estimate of the image can be obtained). We refer to this as an image derivative transform. Obtaining an invertible image derivative estimate is even more challenging when the derivative vectors need to be estimated at multiple image scales.

In this thesis, we propose different gradient and higher-order derivative estimation methods for the two types of applications mentioned above. In Chapter 2, we present a new noise-robust method for estimating both the gradient direction and magnitude of color images. The proposed method can be used in the first category of applications mentioned above. In Chapter 3, the Multi-scale Derivative Transform (MSDT), a general framework for invertible first-order and higher-order derivative estimation, is presented. Besides being invertible, the MSDT is shown to give accurate estimates of both the gradient direction and magnitude. The proposed transform is applied to image watermarking and quality monitoring in Chapters 4 and 5, respectively. We propose embedding the watermark in the directions of the significant gradient vectors of the image. We show that the gradient direction forms a robust feature of an image, since its value is not easily changed unless the image quality is severely degraded.
This makes the inserted watermark both imperceptible and robust to attacks.

In this chapter, we provide background information and a literature review for the topics discussed in the next chapters. The concept of gradient estimation is discussed in Section 1.1. In Subsection 1.1.1, a literature review of noise-robust gradient estimation methods is provided for grayscale and color images. The multiscale non-invertible and invertible gradient estimation methods are presented in Subsections 1.1.2 and 1.1.3, respectively. Sections 1.2 and 1.3 discuss how the image gradient information can be utilized in image watermarking and quality monitoring applications, respectively. Finally, an overview of the thesis contributions is presented in Section 1.4.

1.1 Image Gradient Estimation

The gradient vector ∇f of a gray-scale continuous 2-dimensional image f(x, y) is defined as

    ∇f = (∂f/∂x) i + (∂f/∂y) j = (∂f/∂x, ∂f/∂y)    (1.1)

where i and j denote the orthogonal unit vectors in the x and y directions, respectively. At each point (x, y), the angle of the gradient vector ∇f indicates the direction of the maximum change in f, and the gradient magnitude gives the absolute value of this change. If the gradient magnitude is larger than a threshold T, the point (x, y) is called an edge point.

To calculate the gradient of a discrete image, the continuous derivative operator is replaced with a discrete filter. Let g_h[n] and g_v[n] denote the horizontal and vertical gradient components of a discrete image X at pixel n = (n1, n2). They are usually obtained from the inner product of X with shifted versions of a horizontal highpass filter ∆h and a vertical highpass filter ∆v:

    g_h[n] = ⟨X, ∆h_n⟩  and  g_v[n] = ⟨X, ∆v_n⟩    (1.2)

where ∆h_n and ∆v_n respectively represent the versions of the horizontal and vertical gradient filters shifted to point n = (n1, n2). One of the problems with gradient filters is their susceptibility to image noise.
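Eq. (1.2) can be sketched in a few lines of NumPy. This is our own illustration, not code from the thesis: the Sobel masks and the ramp test image are illustrative choices, and the correlation is written out by hand to keep the sketch self-contained.

```python
import numpy as np

def gradient_filters(image, dh, dv):
    """Estimate the horizontal/vertical gradient components by correlating
    the image with the highpass filters dh and dv, as in Eq. (1.2).
    Boundary pixels are handled by zero padding."""
    def correlate2d(x, k):
        kh, kw = k.shape
        xp = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
        out = np.zeros_like(x, dtype=float)
        for i in range(x.shape[0]):
            for j in range(x.shape[1]):
                out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
        return out
    gh = correlate2d(image, dh)
    gv = correlate2d(image, dv)
    return gh, gv, np.hypot(gh, gv), np.arctan2(gv, gh)

# Sobel masks as one common choice of highpass filters. Note that
# sobel_h = outer([1, 2, 1], [-1, 0, 1]): a lowpass subfilter applied
# vertically and a derivative subfilter applied horizontally.
sobel_h = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
sobel_v = sobel_h.T

img = np.tile(np.arange(8, dtype=float), (8, 1))  # horizontal ramp, slope 1
gh, gv, mag, ang = gradient_filters(img, sobel_h, sobel_v)
```

For the interior pixels of this ramp, the horizontal component responds to the constant slope while the vertical component is zero, so the gradient angle points along the x axis.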
In Subsection 1.1.1, we deal with the problem of image gradient estimation in noisy images.

1.1.1 Gradient Estimation in Noisy Images

The image gradient constitutes the high-frequency portion of the image. Since noise is also mainly of the high-frequency type, its presence reduces the accuracy of gradient estimation. It is desirable to make the gradient estimator robust not only to one type of noise, but to any kind or combination of noise. In noisy environments, prefiltering algorithms should be employed to reduce the effects of noise on the gradient vectors. Different methods that can be used as the prefiltering step for gradient estimation have been proposed. Examples of such methods are the bilateral filter [97], trilateral filter [40], chromatic filter [64], anisotropic diffusion filter [88], adaptive nearest neighbor filter [81] and peer group averaging filter [69]. Image prefiltering techniques reduce the noise; however, they introduce some problems, including blurring, corrupting or removing the fine details and gradients of the image, generating new artifacts in the image, and increasing the computational complexity of the gradient estimation process. To avoid these problems in gray-scale images, noise-reducing filters are usually embedded in the gradient estimators as subfilters. These estimators perform differentiation in one coordinate direction and lowpass filtering in the orthogonal direction simultaneously. The Sobel, Prewitt and Derivative of Gaussian (DOG) operators are some of the gradient operators that use subfilters to reduce the effect of noise in grayscale images [84]. To estimate the image gradient in noisy color images, various types of gradient estimators have been proposed [38, 82, 89, 96]. From a general point of view, these methods can be divided into two major categories:

Non-directional difference operators: These methods define a distance that is based on a non-directional difference measure of the gradient magnitude.
Typical representatives of the non-directional difference class are the Vector Order Statistics (VOS) based gradient estimators and edge detectors [98, 99], such as the Minimum Vector Dispersion (MVD) edge detector. Directional difference operators These methods usually define some form of vector gradients to determine the direction of the gradient at each pixel, as well as its magnitude [34, 50, 89]. The first type of directional difference methods, called component-wise methods, are extensions of the grayscale gradient operators to color images. These operators calculate the gradient vectors for each color component and then combine them to get the final gradient vector. Examples of such combinations are the vector sum of the gradients of each color component [95] and the combination of the gradients of hue, saturation and intensity computed in the HSI coordinates [23]. The second type of directional difference methods, called vector-wise methods, are based on finding the magnitude and direction of the gradient vectors directly from the color vectors. These methods preserve the vector nature of color throughout the computation [91, 109]. The advantage of directional methods over non-directional methods is their ability to find the gradient direction as well as the gradient magnitude. However, most directional gradient estimators proposed in the literature are not as robust as the non-directional estimators. In Chapter 2, a new vector-wise directional difference color gradient estimator that yields an accurate and robust estimation of the gradient magnitude and direction is proposed. Computer simulation results show that the proposed method outperforms other proposed color gradient estimators and edge detectors. 1.1.2 Multiscale Gradient Estimation To obtain the gradient vectors directly from the image, different operators such as Finite Difference (FD), Roberts, Sobel, Prewitt and the Derivative of Gaussian (DOG) have been proposed [84].
Since the abruptness of the edges varies in different parts of an image, multiscale versions of these operators should be used. However, except for the DOG operator, it is not clear what values the FD, Roberts, Prewitt and Sobel gradient masks should take at different scales. Thus, irrespective of the image scale, the gradient is traditionally obtained by applying the same mask (operator) to the image. However, this method has a high computational complexity. 1.1.3 Invertible Multiscale Gradient Estimation Multiscale gradient estimators obtain estimates of the gradient of different scaled versions of an image. They are frequently used in many applications, such as image matching, image segmentation and quality assessment, to extract the image features at different scales. In other applications, such as image watermarking and denoising, the multiscale gradient vectors of the image may have to be modified. However, when the modified gradient vectors are inverse-transformed to the image domain, they result in changes to the original image. Also, traditional multi-scale gradient estimators usually result in estimates that are highly redundant. It is therefore not clear how these estimates could be inverted back to the image domain [105]. To deal with this problem, a wavelet-based image derivative estimation technique was proposed [65]. In this method1, the horizontal and vertical wavelet subbands were used to estimate the horizontal and vertical derivatives of the different scaled versions of an image [105]. The wavelet transform has the advantage of being both computationally efficient and invertible. However, this type of derivative estimation has the following problem: the wavelet-based derivative vector is obtained as a function of only the horizontal and vertical wavelet coefficients (and not the diagonal ones). This may decrease the gradient estimation accuracy.
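The traditional wavelet-based scheme described above can be sketched with a one-level Haar decomposition (a minimal illustration in our own notation; the thesis follows the method of [65, 105]): the horizontal and vertical detail subbands act as row- and column-direction derivative estimates, while the diagonal subband is simply ignored.

```python
import numpy as np

def haar_subbands(X):
    """One-level 2-D Haar analysis of an even-sized grayscale image X.
    Returns the approximation LL and the detail subbands H, V, D."""
    a, b = X[0::2, 0::2], X[0::2, 1::2]
    c, d = X[1::2, 0::2], X[1::2, 1::2]
    LL = (a + b + c + d) / 2
    H = (a - b + c - d) / 2   # horizontal detail: row-direction differences
    V = (a + b - c - d) / 2   # vertical detail: column-direction differences
    D = (a - b - c + d) / 2   # diagonal detail, unused by the traditional estimator
    return LL, H, V, D

# A vertical step edge: only the horizontal detail subband responds,
# acting as a coarse horizontal derivative estimate.
X = np.tile([0., 8., 8., 8.], (4, 1))
LL, H, V, D = haar_subbands(X)
```

Because D is discarded by the traditional estimator, gradient information that falls into the diagonal subband is lost, which is exactly the accuracy issue noted above.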
In Chapter 3, we propose a derivative transform, called the Multi-scale Derivative Transform (MSDT), that obtains the first and higher-order derivatives of an image for different scaled versions of the image. Unlike traditional wavelet-based image derivative estimators that use only the horizontal and vertical wavelet coefficients to obtain the derivatives, the proposed transform maps the diagonal as well as the horizontal and vertical wavelet coefficients to estimate the horizontal and vertical derivatives of the image. We show that MSDT has the potential to perform better than the traditional wavelet-based gradient estimation method in applications such as texture classification, edge detection, quality assessment and image watermarking. In the next section, we discuss image watermarking and the important role of image gradient estimation in imperceptible watermarking. 1.2 Application to Image Watermarking With the widespread penetration of the Internet and digital multimedia technologies, the interest in copyright protection of digital media has been rapidly increasing. Digital watermarking has emerged as a possible solution for intellectual property rights protection. In digital watermarking, additional information, called the watermark, is imperceptibly embedded into the original digital content. Watermarking methods make a trade-off between different requirements such as perceptual invisibility, robustness, security, data capacity and availability of side information [58]. For example, to increase the robustness of a watermark, the watermark strength needs to be increased; however, this makes the watermark more visible.
1 The traditional multiscale image derivative estimation method is discussed in more detail in Sections 3.2.1 and 3.2.2.
Figure 1.1: (Left) Image “Lena”. (Right) Just Noticeable Difference (JND) values obtained for each pixel of image Lena, based on the method proposed in [9].
The invisibility requirement of watermarking limits the maximum number of watermark bits that can be embedded into a digital signal. 1.2.1 Watermarking in the Gradient Domain To achieve a proper fidelity-robustness tradeoff, Human Visual System (HVS) models can be employed in watermark embedding. By exploiting HVS models, one can embed perceptually invisible watermarks that have higher energy than that obtained when this energy is distributed evenly over the image. Towards this aim, a Just Noticeable Difference (JND) model, such as Watson’s JND [108], can be obtained for each transform-domain coefficient. Fig. 1.1 shows the JND values obtained for image Lena using the method proposed in [9]. The gray value at each pixel indicates the histogram-equalized value of JND, scaled to the range [0, 1]. It is clear from the JND image that the HVS is less sensitive to distortions around the significant edges (or the significant gradient vectors) than to those in smooth areas. Therefore, the edges and object borders are good candidates for embedding a watermark. Also, modern lossy compression algorithms leave these edges relatively untouched, to maintain high image or video quality. These properties make the significant gradient vectors and edges ideal for watermark embedding. The most important issue with gradient watermarking is to reconstruct the watermarked image from the watermarked gradient vectors (i.e. inverting the gradients back to the image domain). In Chapter 4, we propose an image watermarking scheme, called Gradient Direction Watermarking (GDWM), that embeds the watermark in the direction of the significant gradient vectors obtained at multiple image scales. To deal with the gradient invertibility issue, we use a gradient estimation method that is very similar to our proposed Multi-scale Derivative Transform (MSDT).
Experimental results show that the proposed image watermarking method outperforms the state-of-the-art image watermarking methods in terms of both watermark imperceptibility and robustness. In the next section, we show the application of gradient estimation in image/video quality monitoring by means of watermarking. 1.3 Application to Image/Video Quality Monitoring The automatic monitoring of the quality of media is important as it allows content providers to optimize their streaming services and bill the end users according to the received quality of service. 1.3.1 Quality Assessment Overview Image quality metrics are divided into two broad categories: subjective and objective methods. Subjective methods The main goal of subjective image quality assessment methods is to obtain the average user (viewer) opinion on the quality of the image/video processed by the system. The obvious solution is to average the scores given to each image by different individuals, to form a Mean Opinion Score (MOS) [110]. The MOS provides a numerical indication of the perceived quality of the received media after compression and/or transmission. Objective methods Objective image evaluation techniques are mathematical models that approximate the results of the subjective quality assessment. To obtain good estimates of the Mean Opinion Score (MOS), each objective metric should obey the following conditions [106]: • Estimation Accuracy: the metric should correlate well with MOS. • Estimation Consistency: the metric should perform well regardless of the input data type. Based on the availability of the original signal, objective methods can be classified as: • Full Reference (FR): Full-reference metrics refer to the case where the original signal is fully available. They measure the numerical differences between the original and the degraded signals. Most existing metrics are FR techniques.
Over the last few years, several objective image quality metrics have been commonly used, such as the Mean Square Error (MSE), Peak Signal-to-Noise Ratio (PSNR), Watson’s JND metric [108], SSIM [107], VIF [92] and VSNR [26]. These metrics are unsuitable for on-line applications, since they require the availability of both the received and the original signals. • Reduced Reference (RR): Reduced reference refers to those cases in which only a compressed or a partial version of the original signal exists. RR methods extract some key information (or features) from the original and the degraded signals and assess the quality based on the extracted information. • No Reference (NR): No-reference or blind metrics assume that an original version (or reference) of a signal is not available. A no-reference image quality metric must therefore have the ability to estimate the visual quality using only the information found in the received image. NR techniques are currently only viable in situations for which prior assumptions about the signal or the types of distortions can be made. These metrics often attempt to quantify the effects of various distortion artifacts. In particular, for block-based video compression schemes such as the standards MPEG-1/2/4 and ITU H.263, the main forms of distortion include the blocking effect [24], blurring and ringing [75]. 1.3.2 Available Approaches for Quality Monitoring in Multimedia Networks The design requirements of an effective quality monitoring method in digital multimedia networks are: • No reference to original data: For real-time image quality evaluation over digital multimedia networks, only no-reference or reduced-reference metrics can be used. This is because the absence of the reference image at the receiver side makes it impossible to directly compare the received image with the original one.
• Low computational complexity: To allow widespread implementation even in handheld devices, the algorithm should be of low complexity. • Local and global distortion measurement: The monitoring method should allow the localization of distortions, to reflect the perceptual errors that occur in only portions of the signal, e.g. from packet erasure. The method should also allow the computation of a metric that “measures” the extent of global distortions, e.g. due to transcoding or other channel noise. • Quality estimation over a wide range of distortions: The quality monitoring method should be able to estimate the quality for a wide range of distortions. This makes the method flexible to different channel conditions. Two different methodologies are used in monitoring the quality of service: • Passive quality monitoring: This approach calculates the network losses using certain measures. The obtained measures of the losses are then used to generate estimates of the image quality using available loss-distortion models. Most of the work on packet loss-distortion modeling, such as [60, 63, 85], focuses on some of the characteristics of the network or the application. In these studies, the distortion in the decoded frame sequence is modeled as a function of the packet loss rate. Correlating the packet loss with the subjective quality may be a crude approximation, as not all packets contribute equally to the perceived signal quality. • Active quality monitoring: In this approach, some information, such as a watermark, is embedded into the host image before its transmission, and the image quality is estimated at the receiver side by extracting the watermark from the received image and comparing it with the original watermark.
As the embedded watermark and the original image are subjected to the same distortions introduced by the transmission channel, the amount of quality degradation in the embedded watermark is used to estimate the distortions introduced in the image. 1.3.3 Quality Monitoring Using Gradient-based Data Hiding As discussed in Subsection 1.2.1, embedding the watermark in the significant gradient vectors of the image renders the watermark both imperceptible and robust to attacks. The robustness of the watermark to large distortions is a mandatory requirement for the copyright protection task. To monitor the image/video quality, however, the watermark should accurately estimate both small and large distortions. To efficiently estimate the image quality for a wide range of distortions, the watermarked image should contain fragile, semi-fragile and robust watermark bits together in one image. The fragile, semi-fragile and robust watermarks are used to estimate mild, moderate and severe distortions, respectively. This is because robust watermark bits are more sensitive to large distortions, while fragile watermark bits are more sensitive to small distortions. In Chapter 5, we present a reduced-reference watermarking-based quality assessment method for H.264/AVC compressed videos. To estimate the quality for a wide range of compression ratios, the watermark is embedded at multiple image gradient scales. As we have shown in [71], the watermark bits that are embedded at large wavelet scales are more robust to channel distortions than those embedded at small scales. This means that small distortions change the watermark bits embedded at small wavelet scales more than those embedded at large scales. For very small distortions, the watermark bits at large wavelet scales may not be affected.
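The active-monitoring idea above reduces to a toy computation (our own minimal sketch; the actual method operates on multi-scale derivative vectors): compare the embedded and extracted watermark bit strings and read lower agreement as lower received quality.

```python
import numpy as np

def watermark_similarity(embedded, extracted):
    """Fraction of watermark bits that agree after transmission.
    Stronger channel distortion flips more bits, lowering the similarity,
    which is then mapped to a quality estimate."""
    return float(np.mean(np.asarray(embedded) == np.asarray(extracted)))

embedded  = np.array([1, 0, 1, 1, 0, 0, 1, 0])
extracted = np.array([1, 0, 0, 1, 0, 0, 1, 1])  # two bits flipped by the channel
sim = watermark_similarity(embedded, extracted)
```

Here two of eight bits were flipped, giving a similarity of 0.75; with fragile, semi-fragile and robust bit groups, the same comparison yields separate similarity readings for mild, moderate and severe distortions.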
The simulation results show that the proposed method can accurately estimate the quality of a video and its frames in terms of the Peak Signal-to-Noise Ratio (PSNR) and the Structural Similarity (SSIM) quality measures. 1.4 Thesis Contributions The novel contributions of the thesis are as follows: 1. A novel robust method for estimating the gradient magnitude and direction in color images: This method, presented in Chapter 2, robustly estimates the gradient in the x and y directions. The robustness against noise is achieved by prefiltering and postfiltering the gradient in each direction. To reduce the edge blurring introduced by these filters, the gradient in a certain direction is obtained by applying the prefilter and postfilter in the perpendicular direction. Computer simulation results show that the proposed method outperforms other recently proposed color gradient estimators and edge detectors. It is computationally more efficient than the state-of-the-art gradient estimators and is able to accurately estimate the gradient direction as well as the gradient magnitude. 2. A new invertible transform for estimating the first and higher-order derivatives of images at multiple scales: This transform, called the Multi-scale Derivative Transform (MSDT), is presented in Chapter 3. MSDT estimates the first and higher-order derivatives of images at multiple scales. To calculate these derivatives, MSDT uses the detail wavelet coefficients of the image. Unlike traditional wavelet-based image derivative estimators that use only the horizontal and vertical wavelet coefficients, the proposed transform maps the diagonal as well as the horizontal and vertical wavelet coefficients to the horizontal and vertical derivatives of the image. The inverse transform is designed such that any change in the image derivative domain results in the minimum possible change in the wavelet coefficients.
The potential applications of the proposed transform in texture feature extraction, edge detection, image quality assessment and image watermarking are also discussed in this chapter. 3. A novel robust image watermarking scheme for copyright protection: We propose a robust quantization-based image watermarking scheme, called Gradient Direction Watermarking (GDWM), in Chapter 4. GDWM is based on the uniform quantization of the direction of gradient vectors. In GDWM, the watermark bits are embedded by quantizing the angles of significant gradient vectors at multiple wavelet scales. The proposed scheme has the following advantages: 1) increased invisibility of the embedded watermark, because the watermark is embedded in significant gradient vectors; 2) robustness to amplitude scaling attacks, because the watermark is embedded in the angles of the gradient vectors; and 3) increased watermarking capacity, as the scheme uses multiple-scale embedding. Experimental results show that the proposed GDWM outperforms other watermarking methods and is robust to a wide range of attacks, e.g. Gaussian filtering, amplitude scaling, median filtering, sharpening, JPEG compression, Gaussian noise, salt & pepper noise and scaling. 4. A new semi-blind method for monitoring the quality of compressed/decompressed videos: This method, presented in Chapter 5, embeds pseudo-random binary watermarks in the I-frames of the original undistorted video. To assess the quality of a segment of a distorted watermarked video, the watermark bits are extracted and the quality is estimated based on the similarity between the embedded and the extracted watermarks. To enable quality assessment for a large range of distortions, the derivative vectors of different scaled versions of each I-frame of the original video are obtained using the proposed Multi-scale Derivative Transform (MSDT).
The proposed method was tested on different video sequences that were distorted by compression/decompression using H.264/AVC with different quality factors. The simulation results show that the proposed method can accurately estimate the quality of a video and its frames in terms of the Peak Signal-to-Noise Ratio (PSNR) and the Structural Similarity (SSIM) quality measures. Chapter 2 Noise Robust Gradient Vector Estimation in Color Images 2.1 Introduction Gradient estimation is an essential task in image processing and scene analysis. The gradient in grayscale images can be thought of as the change in the gray level, but in color images, the “color gradient” is defined based on the change in the color vector. In other words, besides the intensity of a color pixel, its hue and saturation must be taken into account. In noisy environments, since the image gradient constitutes the high-frequency portion of the image and the noise is mainly of the high-frequency type, the presence of noise reduces the accuracy of the gradient estimation. Most linear operators tend to blur and distort the edges and are more susceptible to noise. To accurately estimate the gradient of an image, nonlinear methods are shown to be more effective than linear methods. However, most of the nonlinear gradient estimators proposed in the literature use a non-directional difference operator and are only able to estimate the gradient magnitude (and not the gradient direction). In this chapter, a new nonlinear color gradient estimator that yields accurate and robust estimates of both the gradient magnitude and the gradient direction is proposed. This approach uses non-directional difference operators to obtain a directional difference operator1.
Figure 2.1: The proposed gradient estimation approach applied in a window of size 5 × 5.
For each image window, to obtain an estimate of the gradient in the horizontal and vertical directions, the non-directional operator is applied to each row and column. To further reduce the effect of noise, the results are also lowpass filtered in an appropriate manner. Highpass (H), lowpass (L) and aggregation (A) operators are combined so that an accurate estimate of the gradient vector can be obtained at each pixel. This chapter is organized as follows. Section 2.2 presents the mathematical formulation of the proposed class of gradient estimators. Different highpass, lowpass and aggregation operators are also proposed. In Section 2.3, the quantitative and qualitative evaluations of the method are presented. Finally, conclusions are given in Section 2.4. 2.2 Robust Color Gradient Estimation 2.2.1 Some Definitions and Notations Notation: In this chapter, bold lower case letters x denote vectors. The vector or matrix elements are denoted by lower case letters with an index, e.g. x_i or x_ij. Furthermore, the entry (i, j) of a matrix X is also shown as (X)_i,j. The color vector at pixel (i, j) in the discrete color image is represented as f_i,j. The letters “L”, “H” and “A” stand for “lowpass”, “highpass” and “averaging” filters.
1 The definitions of directional and non-directional difference operators are given in Section 1.1.1.
2.2.2 The Proposed Class of Color Gradient Estimators In this section, a new class of color gradient operators is proposed. To obtain an accurate estimation of the gradient, conventional gradient estimators often use a highpass filter in one direction to generate the gradient in that direction and then a lowpass filter in the perpendicular direction to reduce the effect of noise [84]. For example, the Prewitt operator applies [−1, 0, 1] as the highpass filter (or the differentiation operator) and [1, 1, 1] as the lowpass filter (or the averaging operator).
These filters are shown to be effective for images corrupted by short-tailed noise (e.g. Gaussian noise). Estimators based on minimizing the Mean Square Error (MSE), such as the sample mean, have the disadvantage of heavily weighting the outliers. When the image is corrupted with a long-tailed noise (e.g. impulse noise), the Mean Absolute Error (MAE) or other more complex criteria should be minimized. Estimators for this type of noise have a nonlinear nature, such as the median filter in the case of Minimum Mean Absolute Error (MMAE). Such nonlinear filters have been shown to be more efficient in obtaining an accurate estimate of the gradient in noisy images. A problem with nonlinear filters is the order in which the filters are applied. In the case of linear shift-invariant filters, changing this order does not alter the result. For example, to obtain the horizontal gradient, one can first use a linear shift-invariant highpass filter on each row followed by a linear shift-invariant lowpass filter on each column of the image. Alternatively, the lowpass filter can be applied to each column first and then the highpass filter to each row. When nonlinear filters are used, applying the filters in different orders usually changes the result. To solve this problem, our proposed class of gradient estimators aggregates all the results obtained from applying the lowpass and highpass filters in different orders, as explained below and as shown in Fig. 2.1. Let the color vectors in a window W of size 5 × 5 be denoted by f_i,j, i = 1, 2, ..., 5, j = 1, 2, ..., 5. This window size is a good compromise between gradient localization and noise reduction. To estimate the gradient in the horizontal direction x at the center pixel, denoted by g_x, each row i is first highpass filtered by the operator H1, to obtain the value f^H_i for each row i:

f^H_i = H1(f_i,1, f_i,2, ..., f_i,5), i = 1, 2, ..., 5    (2.1)

In the 1-dimensional case, H1 may be defined as the norm of the difference vector. To reduce the effect of noise in the gradient estimation, the resulting values f^H_i, i = 1, 2, ..., 5, are lowpass filtered using an operator L2:

f^HL_x = L2(f^H_1, f^H_2, ..., f^H_5)    (2.2)

By changing the order of implementing the above two filters, and, in the more general case, changing the operators (refer to the scheme on the left-hand side of Fig. 2.1), the following equations are obtained:

f^L_j = L1(f_1,j, f_2,j, ..., f_5,j), j = 1, 2, ..., 5    (2.3)

f^LH_x = H2(f^L_1, f^L_2, ..., f^L_5)    (2.4)

Finally, the horizontal gradient g_x is obtained by applying an aggregation operator A (e.g. arithmetic mean, weighted mean, geometric mean, ...) to the two gradient estimates obtained from the two combinations, (f^HL_x, f^LH_x):

g_x = A(f^HL_x, f^LH_x)    (2.5)

Eqs. (2.1)-(2.5) can be summarized as

g_x = A(L2(H1(f_i,j)), H2(L1(f_i,j)))    (2.6)

The equations for estimating g_y, the gradient in direction y, are similar to those obtained above for the horizontal gradient g_x, and are illustrated by the scheme on the right-hand side of Fig. 2.1. The operations to obtain g_y can similarly be summarized as

g_y = A(H2(L1(f_i,j)), L2(H1(f_i,j)))    (2.7)

Depending on how the operators H1, H2, L1, L2 and A are defined, the resulting gradient may be the Jacobian matrix G, the gradient vector g or the gradient magnitude g (for edge detection). When g_x and g_y are vectors, the gradient matrix G is obtained as

G = (g_x, g_y) = [ (g_x)_i  (g_y)_i
                   (g_x)_j  (g_y)_j
                   (g_x)_k  (g_y)_k ]    (2.8)

If g_x and g_y are scalars, then the gradient G can be simplified to a 2D gradient vector g as

g = g_x + j g_y    (2.9)

Note that Eqs. (2.6) and (2.7) should be applied to each window W of the image. 2.2.3 The Proposed Highpass, Lowpass and Aggregation Operators To obtain the values of g_x and g_y (Eqs. (2.6) and (2.7)), the operators H1, H2, L1, L2 and A should be determined.
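The two-branch pipeline of Eqs. (2.6) and (2.7) can be sketched as follows (the operator choices here are illustrative stand-ins of ours, not the operators defined in this section):

```python
import numpy as np

def gx_window(W, H1, L2, H2, L1, A):
    """Eq. (2.6) for one 5x5 grayscale window W:
    g_x = A(L2(H1(rows)), H2(L1(columns)))."""
    fH = np.array([H1(W[i, :]) for i in range(5)])  # Eq. (2.1): highpass each row
    fHL = L2(fH)                                    # Eq. (2.2): lowpass the row results
    fL = np.array([L1(W[:, j]) for j in range(5)])  # Eq. (2.3): lowpass each column
    fLH = H2(fL)                                    # Eq. (2.4): highpass the column results
    return A(fHL, fLH)                              # Eq. (2.5): aggregate both branches

# Illustrative stand-in operators: H = largest absolute difference from the
# centre sample, L = median, A = arithmetic mean.
H = lambda v: np.max(np.abs(v - v[2]))
L = np.median
A = lambda a, b: 0.5 * (a + b)

W = np.tile([0., 0., 0., 9., 9.], (5, 1))  # vertical step edge in the window
gx = gx_window(W, H1=H, L2=L, H2=H, L1=L, A=A)
```

Because the median and max are nonlinear, L2(H1(·)) and H2(L1(·)) generally differ; computing both orders and aggregating them is exactly the point of Eqs. (2.6) and (2.7).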
The choice of these operators depends on the type of noise. Operators H1 and H2 should be difference estimators that are insensitive to noise. Such operators should accurately estimate the gradient magnitude of a set of noisy color vectors. The Minimum Vector Dispersion (MVD) [98, 99] and the Robust Color Morphological Gradient (RCMG) [38] operators are good candidates for this purpose. MVD is presently a popular robust vector method in color edge detection, and RCMG represents the state of the art in color gradient estimation and edge detection. The choice of the lowpass filters L1 and L2 (such as the Mean or the Median) should be based on the type of noise present in the image. The proposed robust color gradient estimators are MVD-Median-Max, MVD-Median-Mean, RCMG-Median-Max and RCMG-Median-Mean; i.e. MVD or RCMG will be used for H1 and H2, the Median will be used for L1 and L2, and the Max or Mean will be the aggregation operator A. Each estimator uses a different set of highpass, lowpass and aggregation operators to provide robustness to noise. MVD-Median-Max and MVD-Median-Mean Estimators Both these estimators have the same H and L operators. They differ only in the choice of A (the aggregation operator). In each case, the operators are defined as follows: H1, H2: For the set of color vectors, let a row (or a column) of a window W of size 5 × 5 be denoted by F = {f_j | j = 1, 2, ..., 5} and let the vectors f_(1), f_(2), ..., f_(5) denote the R-ordered [8] sequence of the vectors f_j. Based on the MVD operator [98], H1 is defined as

f^H = H1(f_1, f_2, ..., f_5) = e_ĵ    (2.10)

where ĵ is obtained from

ĵ = arg min_j ||e_j||    (2.11)

where e_j is given by

e_j = f_(6−j) − (1/l) Σ_{i=1}^{l} f_(i), j = 1, 2, 3    (2.12)

where the parameter l denotes the number of color vectors that are averaged for finding the α-trimmed mean.
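The MVD-based highpass operator of Eqs. (2.10)-(2.12) can be sketched as below; the aggregate-distance R-ordering criterion and the function names are our assumptions for illustration, not fixed by the text:

```python
import numpy as np

def mvd_highpass(F, l=2):
    """Sketch of the MVD-based H1 for a row of 5 color vectors F (shape (5, 3)).
    R-ordering here ranks vectors by aggregate distance to all others
    (an assumption of this sketch)."""
    # R-order the vectors: most central (smallest aggregate distance) first.
    dists = np.array([np.sum(np.linalg.norm(F - f, axis=1)) for f in F])
    ordered = F[np.argsort(dists)]            # f_(1), ..., f_(5)
    trimmed_mean = ordered[:l].mean(axis=0)   # alpha-trimmed mean of the l most central vectors
    # e_j = f_(6-j) - trimmed mean, j = 1, 2, 3 (Eq. (2.12));
    # keep the minimum-dispersion difference vector (Eqs. (2.10)-(2.11)).
    e = [ordered[5 - j] - trimmed_mean for j in (1, 2, 3)]
    j_hat = int(np.argmin([np.linalg.norm(v) for v in e]))
    return e[j_hat]

# Four identical vectors plus one impulse outlier: the minimum-dispersion
# rule skips the outlier, so the output difference vector is exactly zero.
F = np.array([[10., 10., 10.]] * 4 + [[200., 0., 0.]])
eh = mvd_highpass(F)
```

The outlier ranks last in the R-ordering and the arg-min over j discards it, which is precisely the impulse-noise robustness claimed for MVD.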
The reason behind choosing this operator as a highpass filter is the robustness of the MVD operator to both Gaussian and impulse noise. The operator H2 is defined in a similar way as the operator H1, but with a norm, i.e.

f^LH = H2(f^L_1, ..., f^L_5) = ||H1(f^L_1, ..., f^L_5)||    (2.13)

L1, L2: Operators L1 and L2 should be lowpass operators that are insensitive to both Gaussian and impulse noise. Since H1 and H2 are also capable of reducing both types of noise, the lowpass operators should be designed so that they do not introduce edge blurring as a result of noise removal. Thus, operators L1 and L2 are chosen to be the median filter, but with L1 being slightly different from L2, as follows: The lowpass operator L2 of the vectors f^H_1, f^H_2, ..., f^H_5 is defined as

f^HL = L2(f^H_1, ..., f^H_5) = ||median(f^H_1, ..., f^H_5)||    (2.14)

where the function median is the element-wise median. The operator L1 is defined in a similar way as the operator L2, but without taking the norm:

f^L = L1(f_1, ..., f_5) = median(f_1, ..., f_5)    (2.15)

A: The aggregation operator A in the MVD-Median-Max estimator is chosen to be the signed maximum function, defined by

g_x = A(f^HL, f^LH, {f^L_1, ..., f^L_5})
    = max(f^HL, f^LH) · sign((f^L_4 + f^L_5)/2 − (f^L_1 + f^L_2)/2)
g_y = A(f^HL, f^LH, {f^H_1, ..., f^H_5})
    = max(f^HL, f^LH) · sign((f^H_4 + f^H_5)/2 − (f^H_1 + f^H_2)/2)    (2.16)

In the MVD-Median-Mean estimator, however, the signed mean function is used, which is given by

g_x = A(f^HL, f^LH, {f^L_1, ..., f^L_5})
    = ((f^HL + f^LH)/2) · sign((f^L_4 + f^L_5)/2 − (f^L_1 + f^L_2)/2)
g_y = A(f^HL, f^LH, {f^H_1, ..., f^H_5})
    = ((f^HL + f^LH)/2) · sign((f^H_4 + f^H_5)/2 − (f^H_1 + f^H_2)/2)    (2.17)

RCMG-Median-Max and RCMG-Median-Mean Estimators Similar to the estimators proposed above, both the RCMG-Median-Max and RCMG-Median-Mean estimators share the same H and L operators. They differ in the aggregation function.
In each case, the operators are defined as follows: H1, H2: In our proposed RCMG-Median-Mean estimator, operators H1 and H2 are chosen to be the RCMG gradient vector estimator and the RCMG gradient magnitude estimator, respectively, as follows: For the set of color vectors, let a row (or a column) of a window W of size 5 × 5 be denoted by F = {f_j | j = 1, 2, ..., 5}. H1 is the RCMG operator [38], defined as

f^H = H1(f_1, f_2, ..., f_5) = e_î,ĵ    (2.18)

where î and ĵ are obtained from

(î, ĵ) = arg max_{i,j} ||e_i,j||    (2.19)

The vector difference e_i,j is given by

e_i,j = f_i − f_j, i, j ∈ {F − R_s}    (2.20)

where R_s denotes the set of s pairs of vectors removed from the set F [38]. The operator H2 is defined in a similar way as the operator H1, but with a norm:

f^LH = H2(f^L_1, ..., f^L_5) = ||H1(f^L_1, ..., f^L_5)||_2    (2.21)

L1, L2: The lowpass operators L1 and L2 are defined in the same way as in Eqs. (2.14) and (2.15). A: The aggregation functions A in the RCMG-Median-Max and RCMG-Median-Mean estimators are defined as in Eqs. (2.16) and (2.17), respectively. Note that for a window W of size 5 × 5, RCMG-Median-Max and RCMG-Median-Mean can be used only with s = 1. Among the gradient estimators proposed above, RCMG-Median-Mean is expected to be superior. The main reasons for this claim are: • The RCMG operator has been shown to be more powerful than MVD for gradient estimation when the image is corrupted by salt & pepper noise [38]. In the case of Gaussian noise, RCMG has almost the same performance as MVD for high PSNR values. However, for low PSNRs, MVD is shown to be superior. • Although the Mean filter as an aggregation operator may slightly blur the image, the Max filter is less effective in reducing the effect of noise. It can be shown that the Mean significantly reduces the artifacts produced by noise.
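A simplified sketch of the RCMG-style highpass operator of Eqs. (2.18)-(2.20) follows; the pair-removal procedure reflects our own reading of [38] and is an assumption of this sketch:

```python
import numpy as np
from itertools import combinations

def rcmg_highpass(F, s=1):
    """Simplified RCMG-style H1 for a set of color vectors F (shape (n, 3)):
    remove the s most widely separated pairs (the set R_s of Eq. (2.20)),
    then return the largest remaining pairwise difference vector."""
    idx = list(range(len(F)))
    for _ in range(s):
        # Drop the pair of vectors with the largest dispersion.
        i, j = max(combinations(idx, 2),
                   key=lambda p: np.linalg.norm(F[p[0]] - F[p[1]]))
        idx = [k for k in idx if k not in (i, j)]
    # Eqs. (2.18)-(2.19): the maximum remaining difference vector.
    i_hat, j_hat = max(combinations(idx, 2),
                       key=lambda p: np.linalg.norm(F[p[0]] - F[p[1]]))
    return F[i_hat] - F[j_hat]

# Two dark vectors, two mid-gray vectors and one bright impulse outlier:
# removing the most dispersed pair discards the outlier, so the output
# reflects the true dark-to-gray transition rather than the impulse.
F = np.array([[0., 0., 0.], [0., 0., 0.],
              [50., 50., 50.], [50., 50., 50.],
              [255., 255., 255.]])
eh = rcmg_highpass(F, s=1)
```

The resulting difference vector has magnitude 50 per channel, the true edge contrast, rather than the 255-level jump the impulse would have produced.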
2.2.4 Computational Considerations

As stated in [38], the computational complexity of the Color Morphological Gradient (CMG) and MVD, applied on n vectors, is O(n^2), due to the n(n−1)/2 vector distances that have to be calculated. This is also the case for the RCMG, whose computational complexity can be shown to be (s+1)·O(n^2), where s indicates the number of pairs of removed vectors. Unlike the RCMG and MVD methods, where sorting is applied to every pixel of the window, the RCMG-Median-Mean estimator only sorts the vectors in each row or column. Therefore, the computational complexity of our RCMG-Median-Mean, RCMG-Median-Max, MVD-Median-Mean and MVD-Median-Max can be shown to be O((√n)^2)·O(√n), i.e. O(n^{3/2}), which is less than the computational complexity of RCMG and MVD. Using an efficient sliding algorithm, the complexity of the RCMG and MVD methods reduces to O(n^{3/2}), while the complexity of our method reduces to O(n).

2.3 Experimental Results

We conducted a set of experiments to compare the performance of each proposed estimator, both qualitatively and quantitatively. The proposed color gradient estimators were applied to various synthetic and natural color images with different types of noise. The images used in the experiments were RGB color images. Uncorrelated Gaussian noise with varying standard deviation, uncorrelated salt and pepper impulse noise with different probabilities, and mixtures of these two types of noise were used. The proposed scheme can be used both as a gradient estimator and as an edge detector (using the gradient magnitude at each pixel). As color edge detectors, the proposed MVD-Median-Mean, MVD-Median-Max, RCMG-Median-Mean and RCMG-Median-Max estimators are compared with each other and with two other color edge detectors: the MVD edge detector [98] and the RCMG [38].
As mentioned above, MVD is presently a popular robust vector method in color edge detection, and RCMG represents the state of the art in color edge detection. We also compare our proposed estimators with the color implementation of the Canny gradient estimator [50] after prefiltering with the Fuzzy Peer Group Averaging (FPGA) filter [69]. The FPGA filter is chosen because it is the state of the art in color image denoising. The parameters used in all the experiments are k = 8 and l = 12 for the MVD, s = 8 for the RCMG, k = 1 and l = 2 for the MVD-Median-Mean and MVD-Median-Max, and s = 1 for the RCMG-Median-Mean and RCMG-Median-Max. For the FPGA-Canny gradient estimator, we used the same parameter setting proposed by the authors [69]. Since the parameters of the FPGA filter adapt to the amount of image noise, in our simulations we assumed that the type and amount of noise are known. All the methods were implemented using 5 × 5 windows. Since prefiltering with FPGA using 5 × 5 windows removes the fine details of the image, FPGA-Canny was implemented with 3 × 3 filter windows.

Table 2.1: FOM values obtained from the edge maps of the noisy artificial test image for 5 different realizations of noise. The image is corrupted by a mixture of uncorrelated Gaussian noise with σ = 20 and uncorrelated salt and pepper noise with probability p = 0.2.

Method              FOM1    FOM2    FOM3    FOM4    FOM5
MVD                 0.6592  0.6628  0.6394  0.6462  0.6511
RCMG                0.9404  0.9457  0.9337  0.9350  0.9431
FPGA-Canny          0.8559  0.8760  0.8643  0.8270  0.8505
MVD-Median-Max      0.9654  0.9684  0.9654  0.9778  0.9666
MVD-Median-Mean     0.9604  0.9713  0.9621  0.9557  0.9647
RCMG-Median-Max     0.9818  0.9850  0.9741  0.9687  0.9840
RCMG-Median-Mean    0.9937  0.9949  0.9926  0.9917  0.9946
To quantitatively evaluate the noise robustness of each method, the edge maps of the images, before and after adding noise, are compared using Pratt's Figure of Merit (FOM) [84], which is defined as

FOM = (1 / max{N_I, N_A}) · Σ_{k=1}^{N_A} 1 / (1 + α · d_k^2)    (2.22)

where N_I is the number of actual edge pixels and N_A is the number of detected edge pixels. The distance d_k is the distance from the kth detected edge pixel to the corresponding actual edge pixel. The scaling parameter α is usually set to 1/9. In this work, the edge maps are obtained by adjusting the edge threshold until the maximum FOM value is found.

2.3.1 Artificial Color Images

A 128 × 128 artificial test image is created to compare the methods. Uncorrelated Gaussian noise with zero mean and standard deviation σ = 20, and uncorrelated salt and pepper impulses with probability p = 0.2, are added to the image. The FOM values of each edge detector, obtained for 5 different realizations of the noise, are given in Table 2.1. The noisy image and the edge maps obtained from each edge detector for one of the noise realizations are shown in Fig. 2.2(b)-(h). The FOM value of the MVD method is 0.6628, which shows that MVD leads to a very noisy edge map. It is also obvious that the edge points obtained from MVD are thicker than those of both RCMG and the proposed edge detectors. The RCMG method has an FOM of 0.9457 and results in a less noisy edge map. Furthermore, it has better edge localization than MVD, due to its thinner edges. It can be clearly seen that the main problem with the RCMG method arises when the noise corrupts the edges. Another drawback of RCMG is in detecting corners. Since the RCMG is implemented using 5 × 5 windows, it assumes that most of the pixels in the neighborhood of a corner are noise and should be removed. The FPGA-Canny method solves the corner problem of the RCMG, because it is implemented with 3 × 3 windows. However, its low robustness to salt and pepper noise results in an FOM of 0.8760.
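Pratt's FOM of Eq. (2.22) is straightforward to compute from two binary edge maps. A brute-force Python sketch, assuming d_k measures from each detected edge pixel to the nearest actual edge pixel:

```python
import numpy as np

def pratt_fom(ideal_map, detected_map, alpha=1.0 / 9.0):
    """Pratt's Figure of Merit (sketch of Eq. (2.22)).

    ideal_map, detected_map : boolean edge maps of the same shape.
    d_k is the distance from the k-th detected edge pixel to the
    nearest actual edge pixel (found by brute force, for clarity).
    """
    ideal = np.argwhere(ideal_map)
    detected = np.argwhere(detected_map)
    n_i, n_a = len(ideal), len(detected)
    if n_i == 0 or n_a == 0:
        return 0.0
    total = 0.0
    for p in detected:
        d2 = np.min(np.sum((ideal - p) ** 2, axis=1))   # d_k^2
        total += 1.0 / (1.0 + alpha * d2)
    return total / max(n_i, n_a)

# A perfectly recovered edge map scores 1.0; shifting every detected
# pixel one column over lowers the score to 1/(1 + 1/9) = 0.9.
truth = np.zeros((8, 8), dtype=bool)
truth[:, 4] = True
shifted = np.roll(truth, 1, axis=1)
print(pratt_fom(truth, truth), pratt_fom(truth, shifted))
```

The max{N_I, N_A} normalization penalizes both missed edges (N_A < N_I) and spurious detections (N_A > N_I), which is why thick, noisy edge maps score poorly even when they cover the true edges.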
Note that when an impulse remains after prefiltering the image, the edge detector marks all of its neighboring pixels as edge points. This appears as "small squares" in Fig. 2.2(d), and is why a prefiltering method used in edge detection should be very robust to salt and pepper noise. The problem of discriminating the edges from noise is better resolved by our proposed class of edge detectors. The MVD-Median-Max, MVD-Median-Mean, RCMG-Median-Max and RCMG-Median-Mean, with FOM = 0.9684, 0.9713, 0.9850 and 0.9949, respectively, are shown to be very robust to noise. Their edge localization is better than that of the RCMG and MVD methods. As previously mentioned, in each filtering window, the highpass and lowpass operators of the proposed edge detectors are all implemented row-wise and then column-wise. Such an implementation blurs the image less than when all the pixels in a window are processed together. This leads to better corner detection than the RCMG. From Fig. 2.2(e)-(h), among the proposed edge detectors, RCMG-Median-Mean is shown to be superior, due to its better quantitative and qualitative performance.

Figure 2.2: Behavior of the color edge detectors for the artificial test image corrupted by a mixture of uncorrelated Gaussian noise with σ = 20 and uncorrelated salt and pepper noise with probability p = 0.2. (a) Noisy image, (b) MVD result (FOM=0.6628), (c) RCMG result (FOM=0.9457), (d) FPGA-Canny result (FOM=0.8760), (e) Proposed MVD-Median-Max result (FOM=0.9684), (f) Proposed MVD-Median-Mean result (FOM=0.9713), (g) Proposed RCMG-Median-Max result (FOM=0.9850), (h) Proposed RCMG-Median-Mean result (FOM=0.9949).

In order to compare the performance of the proposed edge detectors, the test image is corrupted by uncorrelated Gaussian noise and uncorrelated salt and pepper noise. The results of averaging the FOM values over 10 different realizations of the Gaussian noise for each edge detector are shown in Fig. 2.3(a). Fig. 2.3(b) shows the averaged FOM results of the edge detectors for different probabilities of salt and pepper noise. Examination of the FOM plots for both types of noise shows the superiority of RCMG-Median-Mean over the other proposed edge detectors. The RCMG-Median-Max, MVD-Median-Mean and MVD-Median-Max edge detectors exhibit different performance for uncorrelated Gaussian versus salt and pepper noise.

Figure 2.3: The average FOM-SNR curves of the proposed edge detectors, for the artificial test image corrupted by 10 different realizations of uncorrelated Gaussian noise (a) and uncorrelated salt and pepper noise (b). The curves are given for the proposed MVD-Median-Mean, MVD-Median-Max, RCMG-Median-Mean and RCMG-Median-Max edge detectors.

In Fig. 2.4, RCMG-Median-Mean, as a prominent representative of the proposed class of edge detectors, is also compared with the MVD, RCMG and FPGA-Canny methods. The curves represent the average of the FOM values over 10 different realizations of the Gaussian and salt and pepper noise.

Figure 2.4: The average FOM-SNR curves for the artificial test image corrupted by 10 different realizations of uncorrelated Gaussian noise (a) and uncorrelated salt and pepper noise (b). The curves are given for the MVD, RCMG, FPGA-Canny and the proposed RCMG-Median-Mean edge detectors.
It is shown that in the case of Gaussian noise, FPGA-Canny gives the best results, due to its "selective averaging" feature. For the case of salt and pepper noise, RCMG-Median-Mean has the best performance at all noise levels.

2.3.2 Natural Color Images

In order to qualitatively evaluate the performance of the edge detectors on natural color images, 20 different color test images are evaluated. As a representative of those images, the Kodak color image "House" [1], shown in Fig. 2.5(a), is chosen, because the large amount of detail in the image makes it a good choice for comparing different edge detectors. Fig. 2.5(b)-(d) show the unthresholded gradient magnitude responses (of the noise-free image) of the MVD, RCMG and RCMG-Median-Mean edge detectors, respectively. Note that since the result of FPGA-Canny for noise-free images is the same as that of Canny, FPGA-Canny is not considered for comparison in this example. The RCMG-Median-Mean operator produces stronger and more continuous edges (with better corner detection) than both the MVD and the RCMG. To illustrate this point, Fig. 2.6 and Fig. 2.7 present closeups of this image (the regions marked with dashed rectangles). Both figures show that the edges produced by the MVD are weak and discontinuous. The RCMG gives more continuous and stronger edges than the MVD. Unlike the MVD and RCMG methods, which miss the corners, MVD-Median-Max and MVD-Median-Mean are more capable of detecting the corners. However, MVD-Median-Max and MVD-Median-Mean still lead to discontinuous edge maps. The problem is solved by the RCMG-Median-Max and RCMG-Median-Mean estimators, which maintain corners and produce smooth edges.

Figure 2.5: Color edge results for the Kodak House image. (a) Original image, (b) MVD result, (c) RCMG result, (d) Proposed RCMG-Median-Mean result.

Figure 2.6: Color edge results for the Kodak House image.
(a) Closeup of the region marked by the dashed horizontal rectangle in Fig. 2.5(a), (b) MVD result, (c) RCMG result, (d) Proposed RCMG-Median-Mean result.

Figure 2.7: Color edge results for the Kodak House image. (a) Closeup of the region marked by the dashed vertical rectangle in Fig. 2.5(a), (b) MVD result, (c) RCMG result, (d) Proposed MVD-Median-Max result, (e) Proposed MVD-Median-Mean result, (f) Proposed RCMG-Median-Max result, (g) Proposed RCMG-Median-Mean result.

Further examination of Fig. 2.7 shows that when two edges are very close to each other, the MVD and RCMG methods merge the edges and give a blurry version of the gradient magnitude. However, better discrimination between these edges is obtained by the proposed estimators, especially the RCMG-Median-Mean, due to their better edge localization.

To qualitatively compare the proposed RCMG-Median-Mean method with FPGA-Canny, both methods are applied to the image shown in Fig. 2.7(a). Fig. 2.8 depicts the results, before adding noise and after adding salt and pepper noise with probability p = 0.3. Fig. 2.8(b) shows that using FPGA-Canny with 5 × 5 windows removes the fine details of the image, in comparison with Fig. 2.8(a), which uses windows of size 3 × 3.

Figure 2.8: Color edge results for the Kodak House image. (a) FPGA-Canny result (with window size 3 × 3), before adding noise, (b) FPGA-Canny result (with window size 5 × 5), (c) FPGA-Canny result (with window size 3 × 3), after adding salt and pepper noise with p = 0.3 to the original image, (d) Proposed RCMG-Median-Mean result, before adding noise, (e) RCMG-Median-Mean result, after adding salt and pepper noise with p = 0.3 to the original image.

By comparing Fig. 2.8(b) and Fig. 2.8(d), we conclude that edge detection by RCMG-Median-Mean with 5 × 5 windows is superior to edge detection after prefiltering with 5 × 5 windows, in the sense of maintaining the fine edges. Fig.
2.8(c) and Fig. 2.8(e) show the results of applying FPGA-Canny (with 3 × 3 windows) and RCMG-Median-Mean (with 5 × 5 windows) to the noisy image. It is clear that when the image is highly distorted by noise, FPGA-Canny (despite using 3 × 3 windows) removes the fine details more than RCMG-Median-Mean does. The large amount of noise remaining in the image after prefiltering with the FPGA also shows the superiority of our proposed method over FPGA-Canny.

To quantitatively evaluate the noise robustness of each method, the edge maps of a database of 20 standard test images, before and after adding noise, are compared using Pratt's FOM [84]. Uncorrelated Gaussian noise with different standard deviations and uncorrelated impulse noise with different probabilities are used to test the noise robustness of each edge detector. To obtain the edge maps, the edge threshold for each test was adjusted until the maximum FOM value was achieved. The curves obtained from averaging the FOM-SNR curves of the images are shown in Fig. 2.9. When the SNR is higher than 10 dB, the performance of the edge detectors starts to converge. However, at extremely low SNRs (less than 5 dB), FPGA-Canny yields better results in the case of Gaussian noise, while for salt and pepper noise RCMG-Median-Mean always gives the best FOMs. RCMG-Median-Max is shown to rank third for Gaussian noise and second for salt and pepper noise. MVD-Median-Max and MVD-Median-Mean are good edge detectors for very noisy images, as they lead to better FOMs than RCMG at SNRs lower than 10 dB. However, this is not the case at higher SNRs, where RCMG shows better performance. The MVD, due to its very low FOM results, is almost incomparable with the other edge detectors. The proposed RCMG-Median-Mean method is shown to be superior to both RCMG and MVD at all SNRs. The performance of RCMG-Median-Mean for Gaussian noise is very close to that of FPGA-Canny, especially for SNRs higher than 5 dB. For salt and pepper noise, the RCMG-Median-Mean edge detector outperforms FPGA-Canny.

Figure 2.9: The average FOM results for 20 standard images corrupted by uncorrelated Gaussian noise (a) and uncorrelated impulse noise (b). The results are produced by the MVD, RCMG, FPGA-Canny and the proposed MVD-Median-Max, MVD-Median-Mean, RCMG-Median-Max and RCMG-Median-Mean methods.

2.3.3 Gradient Estimation

To qualitatively evaluate the gradient vectors of each method, the gradient field of an artificial image, shown in Fig. 2.10(a), is obtained. Since the MVD edge detector cannot estimate the gradient vector, the color variant of the Canny operator [50] is chosen for comparison. The Canny operator also represents FPGA-Canny, because on noise-free images they have the same gradient estimation performance. For the RCMG gradient estimator, the direction of the maximum gradient is defined by the line joining the pixel positions of the two vectors that constitute the pair that are furthest apart. Since this pair may lie in different directions, this seems to be an inaccurate definition of the edge direction. As shown in Fig. 2.10(b), even in the noiseless image, the RCMG gradient vectors deviate from the correct edge directions. The gradient vectors obtained using the Canny operator, shown in Fig. 2.10(c), estimate the edge direction more accurately. However, the edge (or gradient) localization of the Canny operator is worse than that of the RCMG, especially on diagonal edges. The problems with edge direction and localization are resolved by the RCMG-Median-Mean estimator, as shown in Fig. 2.10(d).
RCMG-Median-Mean has both the gradient direction accuracy of the Canny operator and the gradient localization of the RCMG.

To test the noise robustness of the gradient vectors of each method, the PSNR_g measure is used. It is defined by

PSNR_g (dB) = 10 log10 ( MAX^2 / Σ_i Σ_j || ν(i,j) − ω(i,j) ||^2 )    (2.23)

where ν(i,j) and ω(i,j) are the gradient vectors at pixel (i,j) in the original and noisy images, respectively, and MAX denotes the maximum possible gradient magnitude in the image. PSNR_g is a good measure of the change in the gradient vectors as a result of noise.

Figure 2.10: Gradient estimation results. (a) Original image, (b) RCMG, (c) Color Canny operator, (d) Proposed RCMG-Median-Mean.

The average PSNR_g-SNR curves of the images from the same database are shown in Fig. 2.11. The gradient vectors estimated by RCMG-Median-Mean are shown to be more robust than those obtained from the RCMG, Canny and FPGA-Canny, for both uncorrelated Gaussian and uncorrelated salt and pepper noise. Note that for SNRs higher than 10 dB for Gaussian noise, and higher than 3 dB for salt and pepper noise, RCMG-Median-Mean estimates the gradient vectors better than FPGA-Canny.

Figure 2.11: The average gradient vector PSNR_g results for 20 standard images corrupted by uncorrelated Gaussian noise (a) and uncorrelated impulse noise (b). The results are produced by Canny, RCMG and the proposed RCMG-Median-Mean.

2.4 Summary

In this chapter, a new scheme for accurately estimating the gradient vectors and detecting the edges in noisy color images is proposed. To obtain an accurate estimate of the gradient direction as well as the magnitude, three operators are employed:
The first is a highpass operator, which is a gradient magnitude estimator or edge detector. The second is a lowpass filter, for reducing the effect of noise, and the third is an aggregation operator, for combining the results. Using different types of each of these three operators, four different gradient estimators are proposed. For estimating the gradient magnitude and direction at each pixel, our best performing proposed estimator is the RCMG-Median-Mean, which is a combination of the RCMG (as a highpass filter), the Median (as a lowpass filter) and the Mean (as an aggregation operator). The proposed RCMG-Median-Mean estimator is evaluated both quantitatively and qualitatively. Experimental results show that this estimator outperforms recently proposed state-of-the-art edge detectors and gradient estimators. The advantages of the RCMG-Median-Mean gradient estimator are summarized as follows:

1. Since the RCMG-Median-Mean method applies the RCMG row-wise followed by the Median operator column-wise (and not to all the color vectors together), its computational complexity is less than that of the MVD, RCMG and FPGA-Canny, as long as all the methods use windows of the same size. Applying the operators row-wise and column-wise also improves the performance of the method in detecting corners.

2. For noiseless images, the RCMG-Median-Mean yields more accurate estimates of the gradient direction than other estimators (such as RCMG and MVD).

3. Since, for each window, the gradient in each direction is obtained by applying the Median filter in the perpendicular direction, the RCMG-Median-Mean method does not significantly blur the edges.

4. Finally, the RCMG-Median-Mean method is robust to both Gaussian and salt and pepper noise.
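The three-operator structure summarized above can be illustrated for a single window. The following is a deliberately simplified grayscale sketch, with a scalar max-difference standing in for the RCMG/MVD highpass; the actual estimators operate on color vectors:

```python
import numpy as np

def grad_x(win):
    """g_x for one 5 x 5 grayscale window: a toy version of the
    highpass/lowpass/aggregation structure. A max-difference stands in
    for the RCMG or MVD highpass, the median is the lowpass, and the
    signed mean of Eq. (2.17) is the aggregation."""
    highpass = lambda v: float(v.max() - v.min())
    # Order 1 (HL): highpass each row, then median across the rows.
    f_hl = np.median([highpass(row) for row in win])
    # Order 2 (LH): median each column, then highpass across the result.
    f_low = np.median(win, axis=0)
    f_lh = highpass(f_low)
    # Signed-mean aggregation orients the magnitude estimate.
    sign = np.sign((f_low[3] + f_low[4]) / 2 - (f_low[0] + f_low[1]) / 2)
    return (f_hl + f_lh) / 2 * sign

# A vertical step edge, brighter on the right, gives a positive g_x:
win = np.tile(np.array([0.0, 0.0, 0.0, 1.0, 1.0]), (5, 1))
print(grad_x(win))   # -> 1.0
```

Because the median runs perpendicular to the differencing direction, an isolated impulse in one row perturbs only that row's highpass response, which the median across rows then rejects; this is the mechanism behind advantages 3 and 4 above.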
Chapter 3

Multi-scale Derivative Transform Theory and Applications

3.1 Introduction

Multiscale gradient estimators, such as the multi-scale Derivative of Gaussian (DOG), are frequently used to extract image features at different scales. These are used in applications such as image matching, image segmentation and quality assessment. In other applications, however, it is desired to modify the multiscale gradients of the image. In this case, the resulting modified gradient vectors are then inverse-transformed to the image domain, resulting in changes to the original image. These applications include image denoising (total variation and nonlinear anisotropic diffusion filtering methods) [25, 36, 43], contrast enhancement [35] and watermarking [46].

The wavelet-based image derivative estimation technique proposed by Mallat [65] is widely used to obtain the image gradient at different scales. In this method, the horizontal and vertical wavelet subbands are used to estimate the horizontal and vertical derivatives of different scaled versions of an image [105]. The wavelet transform has the advantage of being both computationally efficient and invertible. However, this type of derivative estimation has the following problem: the wavelet-based derivative vector is a function of only the horizontal and vertical wavelet coefficients (and not the diagonal ones). This is because, as shown in Section 3.2.1, a wavelet with p vanishing moments can only estimate the pre-smoothed version of the pth-order derivative, and the pre-smoothing filter (embedded in the derivative operator) removes the diagonal wavelet components. Thus, estimating the derivative vector by the traditional wavelet-based approach may decrease the estimation accuracy. To solve this problem, the shearlet transform has been used to estimate the derivative vectors at different scales of the image [115].
This transform captures the anisotropic information of the image more efficiently than the wavelet transform. However, its highly redundant nature makes it unsuitable for some applications, such as image watermarking. In image watermarking applications, if the values of the derivative vectors obtained from the shearlet transform of the image are modified (due to watermark embedding), the modified vectors may differ from those obtained after inverse-transforming them to the image domain and then transforming back to the derivative domain [73].

In this chapter, to take advantage of the computational efficiency and invertibility of the wavelet transform, and to employ all the detail wavelet subbands in estimating the image derivatives, we propose a derivative transform that obtains the image gradient for different scaled versions of the image using the wavelet transform. This transform obtains the derivative vectors as a linear function of the multi-scale wavelet detail coefficients. At each pixel, the proposed forward derivative transform maps the diagonal, as well as the horizontal and vertical, wavelet coefficients to the horizontal and vertical derivative components. The proposed inverse derivative transform is designed based on the criterion that any change in the derivative vectors should result in the minimum possible change in the original wavelet coefficients. In image watermarking applications, this renders a watermark that was embedded in the image derivative domain as invisible as possible in the image domain.

The rest of this chapter is organized as follows: Section 3.2 gives a brief overview of the traditional wavelet-based image derivative estimation approach. The proposed framework for multi-scale image derivative estimation is presented in Section 3.3. The forward and inverse derivative transforms are also described in this section.
As a special case, we focus on obtaining the multiscale first- and higher-order finite difference (FD) derivatives of an image. In Section 3.4, the potential applications of the proposed transform in texture feature extraction, edge detection, image quality assessment and image watermarking are discussed. Conclusions are given in Section 3.5.

3.2 Background

In subsection 3.2.1 below, we give an overview of the traditional continuous wavelet-based image derivative estimation. Its discrete version is then discussed in subsection 3.2.2.

3.2.1 Multiscale Image Derivative Estimation Using the 2-D CWT

In this subsection, the relationship between the pth-order derivative vector of a continuous image and its 2-D continuous wavelet transform is obtained. The 2-dimensional Continuous Wavelet Transform (CWT) of a continuous image x(t1, t2) can be defined as

W_{s,u1,u2} x = ⟨x, ψ_{s,u1,u2}⟩ = (1/√s) ∫_{−∞}^{∞} ∫_{−∞}^{∞} x(t1, t2) ψ( (t1 − u1)/s , (t2 − u2)/s ) dt1 dt2    (3.1)

where u1 and u2 denote the horizontal and vertical shifts, respectively, and s denotes the continuous wavelet scale. Let us assume that the unshifted (i.e. u1 = u2 = 0) and unscaled (i.e. s = 1) wavelets ψ^1(t1, t2) and ψ^2(t1, t2) have p vanishing moments in directions x and y, respectively, i.e.

∫_{−∞}^{+∞} t1^k ψ^1(t1, t2) dt1 = 0 for 0 ≤ k < p ,
∫_{−∞}^{+∞} t2^k ψ^2(t1, t2) dt2 = 0 for 0 ≤ k < p.    (3.2)

It has been shown in [65] that ψ^1(t1, t2) and ψ^2(t1, t2) with p vanishing moments can be written as

ψ^1(t1, t2) = (−1)^p ∂^p ρ(t1, t2) / ∂t1^p ,  ψ^2(t1, t2) = (−1)^p ∂^p ρ(t1, t2) / ∂t2^p    (3.3)

where ρ(t1, t2) is a smoothing function whose double integral is nonzero. Let ψ^1_s(t1, t2) and ψ^2_s(t1, t2) denote the scaled versions of ψ^1(t1, t2) and ψ^2(t1, t2), respectively, given by

ψ^1_s(t1, t2) = (1/s^2) ψ^1(t1/s, t2/s) ,  ψ^2_s(t1, t2) = (1/s^2) ψ^2(t1/s, t2/s)    (3.4)

Using Eqs.
(3.3) and (3.4), it is easy to show that the horizontal and vertical wavelet components of the 2-D image x(t1, t2), at scale s and point (u1, u2), can be obtained as

W^1_s x = x ∗ ψ^1_s = (−s)^p ∂^p ( x(t1, t2) ∗ ρ_s(t1, t2) ) / ∂t1^p
W^2_s x = x ∗ ψ^2_s = (−s)^p ∂^p ( x(t1, t2) ∗ ρ_s(t1, t2) ) / ∂t2^p    (3.5)

where ∗ denotes the convolution operator,

ρ_s(t1, t2) = (1/s^2) ρ(t1/s, t2/s),    (3.6)

and W^k_s x = { W^k_{s,u1,u2} x ; (u1, u2) ∈ R^2 }, where the superscripts k = 1, 2 denote the horizontal and vertical directions, respectively. Eq. (3.5) shows that, by using wavelets with p vanishing moments, the vector of wavelet coefficients W_{s,u1,u2} x = [W^1_{s,u1,u2} x, W^2_{s,u1,u2} x]^T can be interpreted as the pth-order derivative vector of x ∗ ρ_s at point (u1, u2).

3.2.2 Traditional Multiscale Image Derivative Estimation Using the 2-D DWT

A two-dimensional discrete wavelet transform (2-D DWT) decomposes a discrete image X into 2-D lowpass scaling functions Φ_{J,n} and bandpass wavelets Ψ^k_{j,n} such that

X = Σ_{n∈Z^2} a_J[n] Φ_{J,n} + Σ_{k=1}^{3} Σ_{j=1}^{J} Σ_{n∈Z^2} d^k_j[n] Ψ^k_{j,n}    (3.7)

where j denotes the discrete scale, n = (n1, n2) denotes the spatial shift of each wavelet in the image, and J represents the maximum wavelet scale. As seen, at scale J, the 2-D DWT decomposes an image into the highpass subbands HL, LH and HH (denoted by the superscripts k = 1, 2, 3) and a lowpass subband LL. a_J[n] and d^k_j[n] are called the approximation and the detail wavelet coefficients, respectively. In an orthogonal 2-D DWT, these coefficients can be obtained by projecting the image X on Φ_{J,n} and Ψ^k_{j,n}, respectively, i.e.

a_J[n] = ⟨X, Φ_{J,n}⟩ ,  d^k_j[n] = ⟨X, Ψ^k_{j,n}⟩ ,  1 ≤ k ≤ 3.    (3.8)

Similar to the continuous case (see Eq. (3.5)), using a wavelet with p vanishing moments, the vector of the horizontal and vertical wavelet coefficients, i.e. [d^1_j[n], d^2_j[n]]^T, can be interpreted as a pth-order derivative vector of the smoothed version of image X, obtained using the wavelet at scale j and spatial shift n = (n1, n2).
That is, the horizontal and vertical image derivatives are obtained as

[ gh_j[n] ]   [ 1 0 0 ]   [ d^1_j[n] ]
[ gv_j[n] ] = [ 0 1 0 ] · [ d^2_j[n] ]    (3.9)
                          [ d^3_j[n] ]

where gh_j[n] and gv_j[n] denote the pth-order horizontal and vertical image derivatives (at scale j and point n = (n1, n2)), respectively. As seen, the 2 × 3 matrix that relates the wavelet coefficients to the derivative components is fixed across the scales j. It can also be seen that the value of the derivative vector obtained using the traditional wavelet-based derivative estimation (see Eq. (3.9)) is independent of the diagonal wavelet coefficient. This may decrease the accuracy of the image derivative estimate.

To calculate the derivative of a discrete image, other operators, such as the Finite Difference (FD), Roberts, Prewitt and Sobel operators, could also be used. Each derivative operator has its own advantages and disadvantages in different applications. For example, while the finite difference operator yields an accurate estimate of the image derivative, it is highly sensitive to image noise. In the next section, we propose a new framework for obtaining different types of image derivatives at multiple scales.

3.3 The Proposed Multiscale Derivative Transform

In subsection 3.3.1 below, we introduce the multiscale derivative transform. At each scale, the forward transform maps the corresponding horizontal, vertical and diagonal wavelet coefficients to the horizontal and vertical derivatives. The mapping function at each scale j depends on the wavelet basis and the derivative operator at the same scale. Since the derivative operator is not known at scales j > 1, we assume that the relationship between the derivative vectors and the wavelet coefficients is the same at all scales. Thus, the mapping function derived at scale j = 1 is also used at scales j > 1. Note that this assumption is consistent with the traditional wavelet-based derivative estimation method (see Eq. (3.9)).
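As a concrete illustration of the traditional mapping in Eq. (3.9): one Haar analysis level followed by the fixed 2 × 3 selection matrix. This is a minimal numpy sketch; the subband sign conventions below are one common choice and may differ from the particular wavelet filters used elsewhere:

```python
import numpy as np

def haar_dwt2_level1(x):
    """One level of an orthogonal 2-D Haar DWT (sketch). Returns the
    approximation a and the three detail subbands d1 (horizontal),
    d2 (vertical) and d3 (diagonal) of Eqs. (3.7)-(3.8)."""
    b = x.reshape(x.shape[0] // 2, 2, x.shape[1] // 2, 2)
    tl, tr = b[:, 0, :, 0], b[:, 0, :, 1]   # top-left / top-right of each 2x2 block
    bl, br = b[:, 1, :, 0], b[:, 1, :, 1]   # bottom-left / bottom-right
    a  = ( tl + tr + bl + br) / 2.0
    d1 = (-tl + tr - bl + br) / 2.0         # responds to horizontal change
    d2 = ( tl + tr - bl - br) / 2.0         # responds to vertical change
    d3 = ( tl - tr - bl + br) / 2.0         # diagonal detail
    return a, d1, d2, d3

# Traditional estimate (Eq. (3.9)): the fixed 2x3 matrix merely selects
# d1 and d2; the diagonal subband d3 never enters.
M = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
x = np.arange(16.0).reshape(4, 4)           # toy 4x4 image
a, d1, d2, d3 = haar_dwt2_level1(x)
gh, gv = M @ np.stack([d1, d2, d3]).reshape(3, -1)
```

On this toy ramp image, every block yields the same horizontal and vertical detail values, and changing d3 would leave gh and gv untouched, which is exactly the limitation the proposed transform addresses.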
The inverse derivative transform, discussed in subsection 3.3.2, is designed such that any change in the derivative values results in the minimum possible change in the wavelet coefficients. In image watermarking applications, this renders a watermark that was embedded in the image derivative domain as invisible as possible in the image domain.

3.3.1 Forward Derivative Transform

The derivative vector g_m at image pixel m = (m1, m2) is composed of a horizontal and a vertical component: g_m = gh[m] + i gv[m]. To obtain the values of gh[m] and gv[m], a horizontal derivative operator ∆h and a vertical derivative operator ∆v are applied to the image X:

gh[m] = ⟨X, ∆h_m⟩ and gv[m] = ⟨X, ∆v_m⟩    (3.10)

where ∆h_m and ∆v_m denote the versions of the filters ∆h and ∆v shifted to point m = (m1, m2). Note that the derivative operators ∆h and ∆v could be gradient operators, such as FD, DOG, Roberts and Sobel, or higher-order derivative operators, such as the Laplacian of Gaussian (LOG). For different scales j of an image X, to obtain the image derivative vector (i.e. g_{j,m} = gh_j[m] + i gv_j[m]), the horizontal and vertical derivative filters at scale j, denoted by ∆h_{j,m} and ∆v_{j,m}, are applied to X:

gh_j[m] = ⟨X, ∆h_{j,m}⟩ and gv_j[m] = ⟨X, ∆v_{j,m}⟩    (3.11)

∆h_{j,m} and ∆v_{j,m} can be represented in terms of the orthogonal wavelets Ψ^k_{j,n} as

∆h_{j,m} = Σ_{k=1}^{3} ⟨∆h_{j,m}, Ψ^k_{j,n}⟩ Ψ^k_{j,n}
∆v_{j,m} = Σ_{k=1}^{3} ⟨∆v_{j,m}, Ψ^k_{j,n}⟩ Ψ^k_{j,n}    (3.12)

where Ψ^k_{j,n} denotes the version of the wavelet Ψ^k_j shifted by n = (n1, n2). Using Eqs. (3.11) and (3.12), the horizontal and vertical derivatives gh_j[m] and gv_j[m] are obtained as

gh_j[m] = ⟨X, ∆h_{j,m}⟩ = Σ_{k=1}^{3} ⟨∆h_{j,m}, Ψ^k_{j,n}⟩ ⟨X, Ψ^k_{j,n}⟩
gv_j[m] = ⟨X, ∆v_{j,m}⟩ = Σ_{k=1}^{3} ⟨∆v_{j,m}, Ψ^k_{j,n}⟩ ⟨X, Ψ^k_{j,n}⟩    (3.13)

To obtain the Forward Derivative Transform, i.e. to represent the derivative of the image in terms of the wavelet coefficients, we combine Eqs.
(3.8) and (3.13) as

\[
g_{h_j}[m] = \sum_{k=1}^{3} \langle \Delta^h_{j,m}, \Psi^k_{j,n} \rangle\, d^k_j[n], \qquad
g_{v_j}[m] = \sum_{k=1}^{3} \langle \Delta^v_{j,m}, \Psi^k_{j,n} \rangle\, d^k_j[n]
\qquad (3.14)
\]

The Forward Derivative Transform (3.14) can also be represented in matrix form as

\[
g_{j,m} = A_{j,m,n}\, d_{j,n}
\qquad (3.15)
\]

where \(d_{j,n} = [d^1_j[n], d^2_j[n], d^3_j[n]]^T\) denotes the vector of wavelet coefficients at point \(n\), and \(g_{j,m} = [g_{h_j}[m], g_{v_j}[m]]^T\) represents the derivative vector at point \(m\). We call the matrix \(A_{j,m,n}\) the Wavelet-Derivative (WD) transform matrix. This matrix determines the relationship between the derivative vector and the wavelet coefficients, and is given by

\[
A_{j,m,n} =
\begin{bmatrix}
\langle \Delta^h_{j,m}, \Psi^1_{j,n} \rangle & \langle \Delta^h_{j,m}, \Psi^2_{j,n} \rangle & \langle \Delta^h_{j,m}, \Psi^3_{j,n} \rangle \\
\langle \Delta^v_{j,m}, \Psi^1_{j,n} \rangle & \langle \Delta^v_{j,m}, \Psi^2_{j,n} \rangle & \langle \Delta^v_{j,m}, \Psi^3_{j,n} \rangle
\end{bmatrix}
\qquad (3.16)
\]

It is important to note that the entries of the Wavelet-Derivative (WD) transform matrix depend on both the derivative filters \(\Delta^h_{j,m}\) and \(\Delta^v_{j,m}\), and the wavelet filters \(\Psi^k_{j,n}\). Thus, in the rest of the chapter, each WD transform matrix is named by combining the names of the wavelet and derivative filters used to obtain it, e.g. Haar-FD, Haar-Haar, Sym2-FD2, and so forth. To obtain the WD transform matrix \(A_{j,m,n}\), the derivative filters \(\Delta^h_{j,m}\) and \(\Delta^v_{j,m}\) should be determined at each scale \(j\). In the case of the multiscale DOG operator [20], \(\Delta^h_{j,m}\) and \(\Delta^v_{j,m}\) are defined as the first derivatives of a 2-D Gaussian function (with variance \(\sigma_j^2\)) in the horizontal and vertical directions, respectively. Thus, depending on the values of the DOG operator and the wavelet filters, the matrix \(A_{j,m,n}\) will take different values at different scales. For other types of derivative operators, such as the finite difference, Roberts, and so on, the derivative masks \(\Delta^h_{j,m}\) and \(\Delta^v_{j,m}\) are only defined for \(j = 1\). At larger scales (i.e. \(j > 1\)), it is not clear how these filters can be designed. Thus, the WD transform matrix can only be calculated for \(j = 1\).
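The entries of Eq. (3.16) can be evaluated numerically as inner products. The sketch below (plain Python, illustrative names) uses the 2-D Haar wavelets and the Roberts and FD masks introduced in Example 1 below, and reproduces the matrices of Eqs. (3.22) and (3.25):

```python
# Sketch: building the WD transform matrix of Eq. (3.16) by taking inner
# products of the derivative masks with the Haar wavelets of Eq. (3.20).

def inner(a, b):
    return sum(a[i][j] * b[i][j]
               for i in range(len(a)) for j in range(len(a[0])))

# 2-D Haar wavelets, Eq. (3.20)
PSI1 = [[-0.5, 0.5], [-0.5, 0.5]]
PSI2 = [[0.5, 0.5], [-0.5, -0.5]]
PSI3 = [[-0.5, 0.5], [0.5, -0.5]]

def wd_matrix(dh, dv):
    """Eq. (3.16): 2x3 matrix of inner products with Psi^1..Psi^3."""
    return [[inner(dh, p) for p in (PSI1, PSI2, PSI3)],
            [inner(dv, p) for p in (PSI1, PSI2, PSI3)]]

# Roberts masks, Eq. (3.21)
A_roberts = wd_matrix([[1, 0], [0, -1]], [[0, 1], [-1, 0]])
# FD masks, Eq. (3.24)
A_fd = wd_matrix([[-1, 1], [0, 0]], [[1, 0], [-1, 0]])

print(A_roberts)  # [[-1.0, 1.0, 0.0], [1.0, 1.0, 0.0]]   (Eq. 3.22)
print(A_fd)       # [[1.0, 0.0, 1.0], [0.0, 1.0, -1.0]]   (Eq. 3.25)
```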
To find the derivative masks for scales \(j > 1\) of the image, we assume the following: for a given scale \(j\) of the wavelet, the derivative scale structure changes in a way similar to that of the wavelet. Namely, if the scale of the wavelet is reduced in size by half, the scale of the derivative operator is also reduced by half. Therefore, the similarity between each wavelet and the corresponding derivative mask is maintained across scales. Thus, by using the inner product as a similarity measure, we get

\[
\langle \Delta^h_{j,m}, \Psi^k_{j,n} \rangle = \langle \Delta^h_{1,m}, \Psi^k_{1,n} \rangle, \qquad
\langle \Delta^v_{j,m}, \Psi^k_{j,n} \rangle = \langle \Delta^v_{1,m}, \Psi^k_{1,n} \rangle
\qquad (3.17)
\]

Substituting Eq. (3.17) into (3.12), the horizontal and vertical derivative masks \(\Delta^h_{j,m}\) and \(\Delta^v_{j,m}\) are obtained as

\[
\Delta^h_{j,m} = \sum_{k=1}^{3} \langle \Delta^h_{1,m}, \Psi^k_{1,n} \rangle\, \Psi^k_{j,n}, \qquad
\Delta^v_{j,m} = \sum_{k=1}^{3} \langle \Delta^v_{1,m}, \Psi^k_{1,n} \rangle\, \Psi^k_{j,n}
\qquad (3.18)
\]

which results in

\[
A_{j,m,n} = A_{1,m,n}
\qquad (3.19)
\]

Note that Eq. (3.19) is consistent with the traditional wavelet-based derivative estimation method (see Eq. (3.9)), in the sense that the relationship between the wavelet coefficients and the derivative components is the same for all scales \(j\).

Example 1

As a first example, we obtain the multiscale versions of the first-order derivative vectors, when the Roberts and Finite Difference (FD) gradient operators are used. To calculate the WD transform matrix (Eq. (3.16)), we employ the 2-D Haar wavelets \(\Psi^1_{1,0}\), \(\Psi^2_{1,0}\) and \(\Psi^3_{1,0}\), defined as

\[
\Psi^1_{1,0} = \frac{1}{2}\begin{bmatrix} -1 & +1 \\ -1 & +1 \end{bmatrix}, \quad
\Psi^2_{1,0} = \frac{1}{2}\begin{bmatrix} +1 & +1 \\ -1 & -1 \end{bmatrix}, \quad
\Psi^3_{1,0} = \frac{1}{2}\begin{bmatrix} -1 & +1 \\ +1 & -1 \end{bmatrix}
\qquad (3.20)
\]

The Haar wavelet is chosen because it has only one vanishing moment, and thus can be used as a first-order derivative estimator.

The Haar-Roberts WD transform matrix: the unscaled and unshifted (i.e. \(j = 1\) and \(m = (0, 0)\)) horizontal and vertical Roberts gradient masks \(\Delta^h_{1,0}\) and \(\Delta^v_{1,0}\) are

\[
\Delta^h_{1,0} = \begin{bmatrix} +1 & 0 \\ 0 & -1 \end{bmatrix}, \quad
\Delta^v_{1,0} = \begin{bmatrix} 0 & +1 \\ -1 & 0 \end{bmatrix}
\qquad (3.21)
\]

Using Eqs. (3.16) and (3.21), the Haar-Roberts WD transform matrix at scale \(j = 1\) is obtained as

\[
A^{Haar\text{-}Roberts}_{1,0,0} = \begin{bmatrix} -1 & 1 & 0 \\ 1 & 1 & 0 \end{bmatrix}
\qquad (3.22)
\]

Using Eq.
(3.19), when \(m = n\), the relationship between \(g_{j,n}\) and \(d_{j,n}\) becomes

\[
g_{j,n} = \begin{bmatrix} -1 & 1 & 0 \\ 1 & 1 & 0 \end{bmatrix} d_{j,n}
\qquad (3.23)
\]

The Haar-FD WD transform matrix: the Finite Difference (FD) horizontal and vertical gradient masks are

\[
\Delta^h_{1,0} = \begin{bmatrix} -1 & +1 \\ 0 & 0 \end{bmatrix}, \quad
\Delta^v_{1,0} = \begin{bmatrix} +1 & 0 \\ -1 & 0 \end{bmatrix}
\qquad (3.24)
\]

Thus, the Haar-FD WD transform matrix at scale \(j = 1\) is obtained as

\[
A^{Haar\text{-}FD}_{1,0,0} = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & -1 \end{bmatrix}
\qquad (3.25)
\]

In this case, the relationship between \(g_{j,n}\) and \(d_{j,n}\) becomes

\[
g_{j,n} = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & -1 \end{bmatrix} d_{j,n}
\qquad (3.26)
\]

Example 2

In this example, we show that the traditional wavelet-based image derivative estimator (see Eq. (3.9)) is a special case of the proposed multiscale derivative estimation framework. The traditional wavelet-based derivative estimator uses the horizontal and vertical wavelet coefficients as the horizontal and vertical image derivatives, respectively. Therefore, the derivative operators are equivalent to the wavelet filters, i.e.

\[
\Delta^h_{j,n} = \Psi^1_{j,n}, \qquad \Delta^v_{j,n} = \Psi^2_{j,n}
\qquad (3.27)
\]

Thus, the corresponding WD transform matrix shall be named the \(\Psi-\Psi\) matrix. At scale \(j = 1\) and shift \((\Delta n_1, \Delta n_2) = (0, 0)\), the \(\Psi-\Psi\) matrix is given by

\[
A^{\Psi-\Psi}_{1,0,0} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}
\qquad (3.28)
\]

In the case of the traditional Haar wavelet-based gradient estimation (which is equivalent to using the Haar-Haar WD transform matrix in the proposed framework), the horizontal and vertical Haar gradient operators (at scale \(j = 1\)) are the same as the Haar filters, i.e.

\[
\Delta^h_{1,0} = \frac{1}{2}\begin{bmatrix} -1 & +1 \\ -1 & +1 \end{bmatrix}, \quad
\Delta^v_{1,0} = \frac{1}{2}\begin{bmatrix} +1 & +1 \\ -1 & -1 \end{bmatrix}
\qquad (3.29)
\]

Example 3

As a third example, we obtain the multiscale 2nd-order FD derivative of an image. For this purpose, the length-4 Symlet wavelet is chosen; its two vanishing moments make it suitable for 2nd-order derivative estimation. Thus, \(\Psi^1_{1,0}\), \(\Psi^2_{1,0}\) and \(\Psi^3_{1,0}\) (at point \(n = (0, 0)\)) would be of size 4×4. However, the horizontal and vertical 2nd-order FD masks, given by

\[
\Delta^h_{1,0} = \begin{bmatrix} 1 & -2 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \quad
\Delta^v_{1,0} = \begin{bmatrix} +1 & 0 & 0 \\ -2 & 0 & 0 \\ +1 & 0 & 0 \end{bmatrix}
\qquad (3.30)
\]

are of size 3×3. Since the inner product operation in Eq.
(3.16) needs the size of the derivative operator to be equal to the size of the wavelet filters, we zero-pad the derivative operators. Depending on where the zeros are added, the derivative can be obtained at \(m \in \{(0, 0), (0, 1), (1, 0), (1, 1)\}\). Similarly, using Eq. (3.16), the Sym2-FD2 matrices \(A_{j,m,n}\) at \(m \in \{(n_1, n_2), (n_1, n_2+1), (n_1+1, n_2), (n_1+1, n_2+1)\}\) are obtained as

\[
A^{Sym2\text{-}FD2}_{j,(n_1,n_2),(n_1,n_2)} = \begin{bmatrix} -0.56 & 0.13 & 0.15 \\ -0.13 & 0.56 & 0.15 \end{bmatrix}, \quad
A^{Sym2\text{-}FD2}_{j,(n_1,n_2+1),(n_1,n_2)} = \begin{bmatrix} 1.15 & -0.22 & -0.97 \\ 0.03 & -0.03 & -0.31 \end{bmatrix},
\]
\[
A^{Sym2\text{-}FD2}_{j,(n_1+1,n_2),(n_1,n_2)} = \begin{bmatrix} 0.97 & 0.26 & 0.22 \\ 0.26 & -1.15 & -0.31 \end{bmatrix}, \quad
A^{Sym2\text{-}FD2}_{j,(n_1+1,n_2+1),(n_1,n_2)} = \begin{bmatrix} 1.99 & -0.06 & -0.53 \\ 0.06 & -1.99 & -0.53 \end{bmatrix}
\qquad (3.31)
\]

Let \(m - n\) be denoted by \(\Delta n = (\Delta n_1, \Delta n_2)\). Since \(A_{j,m,n}\) is fixed for different values of \(n_1\) and \(n_2\), in the rest of the chapter we denote \(A_{j,m,n}\) by \(A_{j,\Delta n} = A_{j,(\Delta n_1,\Delta n_2)}\). With this notation, Eq. (3.15) can be written as

\[
g_{j,n+\Delta n} = A_{j,n+\Delta n,n}\, d_{j,n}
\qquad (3.32)
\]

or, in a more simplified form,

\[
g_{j,\Delta n} = A_{j,\Delta n}\, d_j
\qquad (3.33)
\]

The parameter \(\Delta n\) gives the position of the pixel where the image derivative is calculated, relative to the position of the wavelet coefficient.

3.3.2 Inverse Derivative Transform

In this subsection, we investigate the process of obtaining the inverse derivative transform, i.e. obtaining the wavelet coefficients from the derivative vectors. The goal here is to minimize the amount of distortion introduced in the wavelet coefficients as a result of a change in the derivative vectors. In image watermarking applications, such a criterion is important to ensure the invisibility of the watermark: a watermark that is embedded in the image derivative domain should be as invisible as possible in the image domain. Let us assume that a change \(\Delta g_{j,\Delta n}\) in the derivative vector \(g_{j,\Delta n}\) corresponds to a change \(\Delta d_j\) in the vector of wavelet coefficients \(d_j\).
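The least-norm recovery derived in Eqs. (3.36)-(3.38) below can be sketched in plain Python for a 2×3 WD matrix, with the 2×2 inversion written out explicitly; names and test values are illustrative:

```python
# Sketch of the least-norm coefficient update dd = A^T (A A^T)^{-1} dg
# (Eqs. 3.37-3.38) for a 2x3 WD matrix A, with no external libraries.

def least_norm_update(A, dg):
    """Return dd (length 3) with A @ dd = dg and minimal ||dd||_2."""
    # M = A A^T (2x2)
    M = [[sum(A[i][k] * A[j][k] for k in range(3)) for j in range(2)]
         for i in range(2)]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    Minv = [[ M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det,  M[0][0] / det]]
    y = [Minv[i][0] * dg[0] + Minv[i][1] * dg[1] for i in range(2)]
    return [A[0][k] * y[0] + A[1][k] * y[1] for k in range(3)]  # A^T y

A_fd = [[1, 0, 1], [0, 1, -1]]     # Haar-FD matrix, Eq. (3.25)
dg = [0.3, -0.1]                   # change made in the gradient domain
dd = least_norm_update(A_fd, dg)

# Check the constraint of Eq. (3.35): A dd == dg (up to float rounding)
back = [sum(A_fd[i][k] * dd[k] for k in range(3)) for i in range(2)]
print(dd, back)
```

Note that a zero gradient-domain change yields a zero coefficient change, matching the remark after Eq. (3.38).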
Since \(A_{j,\Delta n}\) is a matrix of size 2×3, the problem of computing \(d_j + \Delta d_j\) from \(g_{j,\Delta n} + \Delta g_{j,\Delta n}\) is that of solving an underdetermined set of equations. This can be expressed as

\[
g_{j,\Delta n} + \Delta g_{j,\Delta n} = A_{j,\Delta n}(d_j + \Delta d_j)
\qquad (3.34)
\]

Assuming that the vector \(d_j\) (from which \(g_{j,\Delta n}\) is computed) is already known, the problem simplifies to

\[
\Delta g_{j,\Delta n} = A_{j,\Delta n}\, \Delta d_j
\qquad (3.35)
\]

To reduce the amount of change in the image, we minimize the change in the wavelet coefficients:

\[
\min \|\Delta d_j\|_2 \quad \text{subject to} \quad \Delta g_{j,\Delta n} = A_{j,\Delta n}\, \Delta d_j
\qquad (3.36)
\]

The least-norm solution \((\Delta d_j)_{ln}\) of the above underdetermined equations can be computed using the Moore-Penrose pseudoinverse of \(A_{j,\Delta n}\), denoted by \(A^{+}_{j,\Delta n}\), as

\[
(\Delta d_j)_{ln} = A^{+}_{j,\Delta n}\, \Delta g_{j,\Delta n} = A^{T}_{j,\Delta n}\big(A_{j,\Delta n} A^{T}_{j,\Delta n}\big)^{-1} \Delta g_{j,\Delta n}
\qquad (3.37)
\]

The inverse derivative transform is thus obtained as

\[
\tilde{d}_j = d_j + (\Delta d_j)_{ln} = d_j + A^{T}_{j,\Delta n}\big(A_{j,\Delta n} A^{T}_{j,\Delta n}\big)^{-1} \Delta g_{j,\Delta n}
\qquad (3.38)
\]

Note that if no modification is made to the derivative vector \(g_{j,\Delta n}\) (i.e. \(\Delta g_{j,\Delta n} = 0\)), the inverse transform yields the original wavelet coefficients (i.e. \(\tilde{d}_j = d_j\)).

3.4 Potential Applications of the Proposed Transform

In this section, some applications of the proposed derivative transform are discussed. In each application, the performance of the proposed multiscale derivative transform that applies the Haar-FD WDT matrix to the Haar wavelet coefficients of an image (see Eq. (3.26)) is discussed. This is also compared with the traditionally used Haar wavelet-based derivative (see Eq. (3.9)), which is the same as that obtained by applying the Haar-Haar WDT matrix to the Haar wavelet coefficients of the image.

3.4.1 Texture Features for Classification

The derivative vectors can be used as features for classifying different texture images. Choosing an appropriate derivative operator is a key component in texture classification. It is easy to verify (see Eq.
(3.26)) that the proposed multiscale Haar-FD derivative transform uses the horizontal, vertical and diagonal wavelet coefficients to obtain the derivative vector at each point. However, the traditional Haar-Haar gradient operator (see Eq. (3.9)) employs only the horizontal and vertical wavelet coefficients. Thus, the multiscale Haar-FD derivative transform is expected to be more sensitive than the Haar-Haar operator to changes in the diagonal wavelet subband.

Figure 3.1: The gradient fields at wavelet scales j = 1, 2, 3 and 4, using the Haar wavelet transform. (a) Synthetic image. (b) (Upper row) the traditional Haar-Haar gradient estimates (obtained using Eq. (3.28)); (bottom row) the proposed multiscale Haar-FD gradient estimates (obtained using Eq. (3.25)).

To evaluate the performance of the traditional Haar-Haar derivative transform and the proposed multiscale Haar-FD derivative transform, both methods are applied to a synthetic texture image (see Fig. 3.1(a)) that has relatively large diagonal wavelet coefficients. The 2-D Haar wavelet transform is used to obtain the wavelet coefficients at different scales. The wavelet coefficients are then transformed to the derivative vectors using the Haar-Haar and the Haar-FD WDT matrices. The derivative fields at wavelet scales j = 1, 2, 3 and 4 are shown in Fig. 3.1(b). As seen, the average magnitude of the Haar-FD derivative vectors is higher than that of the Haar-Haar derivative vectors. Since the Haar-Haar gradient estimator neglects the diagonal wavelet coefficients, it may not be suitable for estimating the gradients in texture images that have high-energy coefficients in their diagonal subbands.
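This sensitivity difference is easy to reproduce on a toy pattern whose energy is purely diagonal; the sketch below (illustrative names and data) evaluates Eqs. (3.9) and (3.26) on a single checkerboard block:

```python
import math

# Illustration of the Fig. 3.1 effect on a purely diagonal pattern: a
# checkerboard block has all its energy in the diagonal subband, so the
# Haar-Haar estimate sees no gradient while Haar-FD does. (Toy data.)

def block_details(a, b, c, d):
    """Detail coefficients of one 2x2 block [[a, b], [c, d]], Eq. (3.20)."""
    return ((-a + b - c + d) / 2,   # d1 (horizontal)
            ( a + b - c - d) / 2,   # d2 (vertical)
            (-a + b + c - d) / 2)   # d3 (diagonal)

d1, d2, d3 = block_details(1, 0, 0, 1)      # checkerboard: 1 0 / 0 1

mag_haar = math.hypot(d1, d2)               # Haar-Haar, Eq. (3.9)
mag_fd = math.hypot(d1 + d3, d2 - d3)       # Haar-FD,   Eq. (3.26)
print(mag_haar, mag_fd)    # 0.0 vs ~1.414: only Haar-FD responds
```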
3.4.2 Derivative Vector Quantization for Image Watermarking

In some applications, such as image watermarking [73], uniform quantization of the derivative vector angles is used. The derivative vectors are thus changed, due to the quantization of their angles. One of the goals in such applications is to minimize the image distortion introduced as a result of the change in the derivative vectors. Therefore, the derivative operator should be chosen such that the resulting distortion in the image (or its wavelet coefficients) is as low as possible. In this chapter, we use the variance of the change in the wavelet coefficients, denoted by \(D_d\), as an indicator (metric) of the distortion. Since the derivative vector angles are uniformly quantized, the distribution of the change in the derivative vector angle will also be uniform. Thus, the changes in the horizontal and vertical derivative fields can be modeled by two independent AWGN processes. This leads to the average change in the derivative vectors being equal to 0, i.e. \(E\{\Delta g\} = 0\). The variance of the change in the wavelet coefficients is thus obtained as

\[
\begin{aligned}
D_d &= E\{\|\Delta d\|_2^2\} = E\{\mathrm{Tr}[(\Delta d)(\Delta d)^T]\} \\
    &= E\{\mathrm{Tr}[A^T (A A^T)^{-1} \Delta g\, \Delta g^T (A A^T)^{-1} A]\} \\
    &= \mathrm{Tr}[A^T (A A^T)^{-1} C_{\Delta g} (A A^T)^{-1} A] \\
    &= \mathrm{Tr}[(A A^T)^{-1} C_{\Delta g} (A A^T)^{-1} A A^T] = \mathrm{Tr}[(A A^T)^{-1} C_{\Delta g}]
\end{aligned}
\qquad (3.39)
\]

where \(\mathrm{Tr}[\cdot]\) denotes the matrix trace, and \(C_{\Delta g}\) indicates the covariance matrix of \(\Delta g\). Assuming that \(\Delta g_h\) and \(\Delta g_v\) are two uncorrelated AWGN processes with variances \(\sigma^2_{\Delta g_h}\) and \(\sigma^2_{\Delta g_v}\), the covariance matrix \(C_{\Delta g}\) can be written as

\[
C_{\Delta g} = \begin{bmatrix} \sigma^2_{\Delta g_h} & 0 \\ 0 & \sigma^2_{\Delta g_v} \end{bmatrix}
\qquad (3.40)
\]

Using Eqs. (3.39) and (3.40), the variances of the change in the wavelet coefficients, resulting from a change in the traditional Haar-Haar and the proposed Haar-FD gradient vectors, are obtained as

\[
D_d^{Haar} = \sigma^2_{\Delta g_h} + \sigma^2_{\Delta g_v}, \qquad
D_d^{FD} = \tfrac{2}{3}\sigma^2_{\Delta g_h} + \tfrac{2}{3}\sigma^2_{\Delta g_v}
\qquad (3.41)
\]

We can see that the wavelet coefficients are less distorted when the proposed multiscale Haar-FD gradient transform is used.
In other words, any change in the FD gradient domain distorts the image less than the same change made to the Haar gradient vectors.

3.4.3 Image Quality Assessment

The gradient vectors can also be used to evaluate the amount of distortion introduced in an image [28, 117]. One goal in image quality assessment is to detect and measure even minute changes in the image. Thus, the higher the gradient sensitivity, the more the changes in the image are transferred to the gradient domain. As a result, the image quality can be measured more accurately. Below, as an indicator of the sensitivity of each gradient transform, we measure the variance of the changes in the gradient vectors, denoted by \(D_g\). Assume that the change in an image is modeled by AWGN noise. It is easy to show that, using an orthonormal wavelet basis (such as Symlet wavelets), the changes in the wavelet coefficients \(\Delta d\) can also be modeled as AWGN noise. Under this assumption, the expected (average) change in the wavelet coefficients is \(E\{\Delta d\} = 0\). The variance of the changes in the gradient vectors is obtained as

\[
D_g = E\{\|\Delta g\|_2^2\} = E\{\mathrm{Tr}[(\Delta g)(\Delta g)^T]\}
    = E\{\mathrm{Tr}[A \Delta d\, \Delta d^T A^T]\} = \mathrm{Tr}[A\, C_{\Delta d}\, A^T]
\qquad (3.42)
\]

and, because of the cyclic property of the matrix trace, we get

\[
D_g = \mathrm{Tr}[(A^T A)\, C_{\Delta d}]
\qquad (3.43)
\]

where \(\mathrm{Tr}[\cdot]\) denotes the matrix trace and \(C_{\Delta d}\) denotes the covariance matrix of \(\Delta d\). Since \(\Delta d\) is AWGN noise, \(C_{\Delta d}\) can be expressed as

\[
C_{\Delta d} = \begin{bmatrix}
\sigma^2_{\Delta d_1} & 0 & 0 \\
0 & \sigma^2_{\Delta d_2} & 0 \\
0 & 0 & \sigma^2_{\Delta d_3}
\end{bmatrix}
\qquad (3.44)
\]

Using Eqs. (3.43) and (3.44), the variances of the changes in the traditional Haar and the proposed FD gradient vectors, denoted by \(D_g^{Haar}\) and \(D_g^{FD}\) respectively, can be obtained as

\[
D_g^{Haar} = \sigma^2_{\Delta d_1} + \sigma^2_{\Delta d_2}, \qquad
D_g^{FD} = \sigma^2_{\Delta d_1} + \sigma^2_{\Delta d_2} + 2\sigma^2_{\Delta d_3}
\qquad (3.45)
\]

We can see that the sensitivity of the proposed multiscale Haar-FD derivative transform to AWGN changes in the wavelet coefficients is larger than that of the Haar-Haar transform.
This is because the Haar-Haar derivative transform does not take the diagonal coefficients into consideration, and is thus insensitive to any change in these coefficients. This makes the proposed multiscale Haar-FD derivative transform more suitable for image quality assessment purposes. On the other hand, the Haar-Haar derivative transform may be more appropriate for image denoising, due to its better presmoothing filter.

3.4.4 Multiscale Edge Detection

Accurate estimation of the gradient vectors is an essential step in obtaining the edges of an image. In this subsection, to detect the edges, we first estimate the gradient vectors of the image at multiple scales, using the proposed transform. Then, the point-wise product of the gradient magnitudes at all scales is obtained, as proposed in [87]. To compare the performance of the proposed multiscale transform that uses the Haar-FD derivative transform with the traditional Haar-Haar transform, the unthresholded edges of the gray-scale version of the Kodak Houses image [1], shown in Fig. 3.2, are detected. The edge maps obtained using the Haar-Haar and Haar-FD WDT matrices are shown in Figs. 3.3(a) and 3.3(b), respectively. Figs. 3.3(c) and 3.3(d) show closeups of Figs. 3.3(a) and 3.3(b) (at the region marked with a black dashed rectangle in Fig. 3.2), respectively. As expected, the Haar-FD derivative transform yields a more accurate edge map with less blurring of image details. This can be verified by comparing the edge maps shown in Figs. 3.3(c) and 3.3(d), at the areas marked by white arrows.

Figure 3.2: Kodak "Houses" image.

Figure 3.3: Edge results for the Kodak Houses image, obtained using the Haar-Haar (a) and Haar-FD (b) WDT matrices. (c) Closeup of (a), at the region marked with the dashed rectangle in Fig. 3.2. (d) Closeup of (b). The arrows mark the areas where the image details are blurred.
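The comparisons in Eqs. (3.41) and (3.45) can be reproduced numerically. The sketch below (plain Python, unit noise variances, illustrative names) evaluates the two traces for the Haar-Haar and Haar-FD matrices:

```python
# Sketch: evaluating D_d = Tr[(A A^T)^{-1} C_dg] (Eq. 3.39) and
# D_g = Tr[(A^T A) C_dd] (Eq. 3.43) with unit noise variances,
# reproducing the comparisons of Eqs. (3.41) and (3.45).

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(r) for r in zip(*X)]

def inv2(M):
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[ M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det,  M[0][0] / det]]

def trace(X):
    return sum(X[i][i] for i in range(len(X)))

def D_d(A):  # wavelet-domain distortion, unit-variance angle noise
    return trace(inv2(matmul(A, transpose(A))))

def D_g(A):  # gradient-domain sensitivity, unit-variance coeff. noise
    return trace(matmul(transpose(A), A))

A_haar = [[1, 0, 0], [0, 1, 0]]   # Haar-Haar, Eq. (3.28)
A_fd   = [[1, 0, 1], [0, 1, -1]]  # Haar-FD,  Eq. (3.25)

print(D_d(A_haar), D_d(A_fd))  # 2.0 vs ~1.33 -> FD distorts the image less
print(D_g(A_haar), D_g(A_fd))  # 2   vs 4     -> FD is more sensitive
```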
3.5 Summary

In this chapter, a wavelet-based method for obtaining the first- and higher-order derivatives of 2-D images at multiple scales is proposed. Previous wavelet-based derivative estimation methods calculate the image derivative using only the coefficients of the horizontal and vertical subbands at different scales. This is equivalent to using the wavelet bases as derivative operators at different scales. In most applications, however, other types of gradient operators, such as Finite Difference (FD), Derivative of Gaussian (DOG) and Roberts, are used to obtain the image derivatives. In this chapter, we obtain these types of image derivatives at different scales in terms of the wavelet coefficients. In the proposed transform, at each image scale, the values of the derivative vectors are obtained as a function of the coefficients of the horizontal, vertical and diagonal wavelet subbands. The inverse transform is designed such that any change in the gradient domain results in the minimum possible change in the wavelet coefficients. By using the diagonal wavelet subband, the performance of the proposed gradient transform when applied to texture feature extraction, edge detection and image quality assessment is higher than that of the conventional wavelet-based gradient estimation method.

Chapter 4
Image Watermarking Based on Gradient Direction Quantization

Digital watermarking has proven to be a promising tool in many applications, such as copyright protection, broadcast monitoring, fingerprinting, authentication and device control. Watermarking approaches can generally be classified into two categories [112]: Spread Spectrum (SS) based watermarking [32], [83] and quantization based watermarking [27, 57, 70]. SS watermarking adds a pseudo-random noise-like watermark to the host signal and has been shown to be robust to many types of attacks.
Based on the distribution of the coefficients in the watermark domain, different types of optimum and locally optimum decoders have been proposed [31], [10], [5], [49]. Many SS based methods have been developed. For example, Wang et al. used a key-dependent, randomly generated wavelet filter bank to embed the watermark [104]. Lahouari et al. proposed a robust watermarking algorithm based on the balanced multiwavelet transform [41]. Bi et al. proposed a watermarking scheme based on the multi-band wavelet transform and empirical mode decomposition (MWT-EMD) [12]. In quantization watermarking, a set of features extracted from the host signal is quantized so that each watermark bit is represented by a quantized feature value. Kundur and Hatzinakos proposed a fragile watermarking approach for tamper proofing, where the watermark is embedded by quantizing the DWT coefficients [56]. Chen and Wornell [27] introduced Quantization Index Modulation (QIM) as a class of data-hiding codes, which yields larger watermarking capacity than SS based methods [10]. Gonzalez and Balado proposed a quantized projection method that combines QIM and SS [78]. Chen and Lin [29] embedded the watermark by modulating the mean of a set of wavelet coefficients. Wang and Lin embedded the watermark by quantizing the super trees in the wavelet domain [103]. Bao and Ma proposed a watermarking method based on quantizing the singular values of the wavelet coefficients [7]. Kalantari and Ahadi proposed a logarithmic quantization index modulation (LQIM) [48] that leads to more robust and less perceptible watermarks than the conventional QIM. Recently, a QIM-based method that employs quad-tree decomposition to find the visually significant image regions was proposed [79]. Quantization based watermarking methods are fragile to amplitude scaling attacks. Such attacks do not usually degrade the quality of the attacked media, but may severely increase the bit error rate (BER).
Over the last few years, many improved techniques have been proposed to deal with this issue, including the use of pilot signals [37, 93, 94] and the design of amplitude-scale invariant codes, such as Trellis codes [67], orthogonal dirty paper codes [4] and order-preserving lattice codes [16]. Ourique et al. proposed Angle Quantization Index Modulation (AQIM), where only the angle of a vector of image features is quantized [74]. Embedding the watermark in the vector's angle makes the watermark robust to changes in the vector magnitude, such as amplitude scaling attacks. One promising feature for embedding the watermark using AQIM is the angle of the gradient vectors with large magnitudes, which we refer to as significant gradient vectors. A few watermarking methods that rely on the significant image wavelet coefficients (which usually correspond to significant edges) have been proposed [14, 15, 47, 90]. Based on the concept of sub-image histogram consistency, Kim and Oh presented a watermarking method for text document images using edge direction histograms [54]. Since the histogram consistency is only valid for text documents, their method cannot be directly applied to watermarking other media types. To our knowledge, no image watermarking method based on the directions of the significant gradient vectors has been proposed so far. In this chapter, we propose an image watermarking scheme that embeds the watermark using uniform quantization of the direction of the significant gradient vectors obtained at multiple wavelet scales.
The proposed method has the following advantages: 1) by embedding the watermark in the direction of the gradient vectors (using angle quantization techniques), the watermark is rendered robust to amplitude scaling attacks; 2) by embedding the watermark in the significant gradient vectors, the imperceptibility of the embedded watermark is increased, since the HVS is less sensitive to minor changes in edges and textured areas than in smooth regions; 3) by embedding the watermark at multiple scales, the watermarking capacity is enhanced, i.e. more watermark bits can be embedded in the image. Traditional multi-scale gradient estimators, such as the multi-scale Sobel estimator [84], are redundant and suffer from inter-scale dependency. To avoid this problem, we employ the DWT to estimate the gradient vectors of different scaled versions of the image. To quantize the gradient direction, we propose the Absolute Angle Quantization Index Modulation (AAQIM). AAQIM solves the problem of angle discontinuity at θ = π by quantizing the absolute value of the angle. To quantize the gradient angle, we first derive the relationship between the gradient angle and the DWT coefficients. The corresponding DWT coefficients are then modified based on the changes introduced by quantizing the gradient angles. Thus, to embed the watermark bits, the gradient field that corresponds to each wavelet scale is obtained using the Haar-FD WD matrix of Eq. (3.25). This is illustrated in Fig. 4.1, where each gradient vector \(g_j\) corresponds to the three wavelet coefficients \(d^1_j\), \(d^2_j\) and \(d^3_j\). A straightforward way to embed the watermark bits is to partition the gradient fields into non-overlapping blocks. One watermark bit is then embedded in each block. The bit is inserted into the most significant gradient vectors of the block. Embedding the watermark in the significant vectors makes it robust to attacks.
In natural images, however, some parts of the image may contain all or most of the significant vectors, while other parts may have none. This can cause the following two problems: 1) the watermark bits embedded in blocks that do not contain any significant vectors will be less robust to attacks; 2) the most significant vectors in some blocks may have magnitudes that are close to each other. If a watermark bit is embedded in the most significant vector of such a block, even a mild attack can alter this vector so that it loses its status as the most significant vector. The decoder may then mistakenly extract the watermark bit from an unwatermarked vector.

Figure 4.1: Illustration of the 5-level gradient field, obtained from a 5-level wavelet decomposition.

Our proposed solution to these problems is similar to the method proposed in [112], which addresses the issue of uneven embedding capacity. To address problem 1 above, we uniformly scramble the locations of all the gradient vectors, so that each block contains at least one significant gradient vector. Uniform vector scrambling increases the gradient magnitude entropy, and thus reduces the probability of finding two vectors with similar magnitudes in each block. This also partially addresses problem 2. Increasing the difference in the magnitudes of the watermarked and unwatermarked vectors is also proposed, to help identify the watermarked vectors correctly. In the rest of this chapter, a brief overview of Angle Quantization Index Modulation (AQIM) is given in Section 4.1. In Section 4.2, the watermark embedding method is proposed. The gradient direction and magnitude are obtained in terms of the DWT coefficients, and the proposed AAQIM and uniform significant vector scrambling technique are introduced in this section. The watermark decoding method is proposed in Section 4.3.
The experimental results and performance comparisons are given in Section 4.4. Finally, conclusions are given in Section 4.5.

Figure 4.2: The block diagram of the proposed watermark embedding scheme.

4.1 Angle Quantization Index Modulation (AQIM)

AQIM [74] is an extension of the Quantization Index Modulation (QIM) [27] method. The quantization function, denoted by \(Q(\theta)\), maps a real angle \(\theta\) to a binary number as follows:

\[
Q(\theta) =
\begin{cases}
0 & \text{if } \lfloor \theta/\Delta \rfloor \text{ is even} \\
1 & \text{if } \lfloor \theta/\Delta \rfloor \text{ is odd}
\end{cases}
\qquad (4.1)
\]

where the positive real number \(\Delta\) represents the angular quantization step size and \(\lfloor \cdot \rfloor\) denotes the floor function. The following rules are used to embed a watermark bit \(w\) into an angle \(\theta\):

• If \(Q(\theta) = w\), then \(\theta\) takes the value of the angle at the center of the sector it lies in.
• If \(Q(\theta) \neq w\), then \(\theta\) takes the value of the angle at the center of one of the two adjacent sectors, whichever is closer to \(\theta\).

These rules can be expressed as:

\[
\theta^w =
\begin{cases}
\Delta \lceil \theta/\Delta \rceil - \Delta/2 & \text{if } Q(\theta) = w \\
\Delta \lceil \theta/\Delta \rceil + \Delta/2 & \text{if } Q(\theta) \neq w \text{ and } \theta > (\Delta \lceil \theta/\Delta \rceil - \Delta/2) \\
\Delta \lfloor \theta/\Delta \rfloor - \Delta/2 & \text{if } Q(\theta) \neq w \text{ and } \theta \leq (\Delta \lceil \theta/\Delta \rceil - \Delta/2)
\end{cases}
\qquad (4.2)
\]

4.2 Proposed Watermark Embedding Method

Fig. 4.2 shows the block diagram of the proposed embedding scheme. The watermark is embedded by changing the value of the angle (the direction) of the gradient vectors. First, the 2-D DWT is applied to the image. At each scale, we obtain the gradient vectors in terms of the horizontal, vertical and diagonal wavelet coefficients. To embed the watermark, the values of the DWT coefficients that correspond to the angles of the significant gradient vectors are changed. We discuss the details in the following subsections.

4.2.1 Gradient Direction Based Watermarking

To achieve a high fidelity-robustness tradeoff, Human Visual System (HVS) models can be employed in watermark embedding. Towards this aim, the Just Noticeable Difference (JND) can be obtained for each transform-domain coefficient. Fig. 4.3 shows the JND values obtained for the image Lena using the method proposed in [9].
The gray value at each pixel indicates the histogram-equalized value of the JND, scaled to the range [0, 1]. It is clear from the JND image that the HVS is less sensitive to distortions around the significant edges (or the significant gradient vectors) than to those in smooth areas. Therefore, edges, object borders and significant gradient vectors are good candidates for embedding a watermark. The watermark can be embedded in the gradient magnitude and/or in the gradient direction. One disadvantage of the former option is the sensitivity of the watermark to amplitude scaling attacks. The second option, embedding the watermark in the gradient direction, is robust to many types of attacks, such as scaling, Gaussian filtering and compression, as these attacks do not severely affect the gradient direction.

Figure 4.3: (Left) Image "Lena". (Right) The just noticeable difference (JND) values obtained for each pixel of image Lena, based on the method proposed in [9].

Fig. 4.4(a) shows a diagonal ramp edge. The significant gradient vectors at each pixel are depicted by arrows. As shown in Figs. 4.4(b)-(d), the magnitudes of the gradient vectors are easily changed by scaling, Gaussian filtering and JPEG compression. The angles of the significant gradient vectors, however, remain almost unchanged. The gradient directions form a robust feature of an image, since their values are not easily changed unless the image quality is severely degraded. The above properties of the gradient direction make the inserted watermark both robust and imperceptible. We therefore propose embedding the watermark in the directions of the significant gradient vectors of the image.

4.2.2 Obtaining FD Gradient Vectors Using the Multi-scale Derivative Transform

In this section, we obtain the Finite Difference (FD) gradient vectors at different image scales by using the Multi-scale Derivative Transform proposed in Chapter 3. Using Eq.
(3.25), the FD gradient vectors are obtained as

\[
g_j[n] = \frac{d^1_j[n] + d^3_j[n]}{2} + i\, \frac{d^2_j[n] - d^3_j[n]}{2}
\qquad (4.3)
\]

where \(d^k\) denotes the 2-D Haar wavelet coefficients; superscripts 1, 2 and 3 denote the horizontal, vertical and diagonal directions, respectively; and \(j\) and \(n\) respectively represent the scale and position where the image gradient is calculated. This is shown in Fig. 4.1.

Figure 4.4: The effect of different types of attacks on the significant gradient vectors of level 4. (a) Original image. (b) Gray-value scaling by a factor of 2. (c) Gaussian filtering with a window size of 5. (d) JPEG compression with a quality factor of 50.

The direction \(\theta_j[n]\) and the magnitude \(r_j[n]\) of the gradient vector \(g_j[n]\) can be expressed as

\[
\tan(\theta_j[n]) = \frac{d^2_j[n] - d^3_j[n]}{d^1_j[n] + d^3_j[n]}
\qquad (4.4)
\]

\[
r_j[n] = \frac{1}{2}\sqrt{\big(d^1_j[n] + d^3_j[n]\big)^2 + \big(d^2_j[n] - d^3_j[n]\big)^2}
\qquad (4.5)
\]

We use the gradient direction \(\theta_j\) in the range \((-\pi, \pi]\).

4.2.3 Effect of Changing the Gradient Angle on the DWT Coefficients

Assume that a change \(d\theta_j\) in the gradient angle corresponds to changes \(\partial d^1_j\), \(\partial d^2_j\) and \(\partial d^3_j\) in the wavelet coefficients \(d^1_j\), \(d^2_j\) and \(d^3_j\), respectively. To find the \(\partial d^k_j\), \(k = 1, 2, 3\), in terms of \(d\theta_j\), we first relate the differential of \(\theta_j\) to the changes in \(d^1_j\), \(d^2_j\) and \(d^3_j\). We take the logarithm of both sides of Eq. (4.4):

\[
\ln(\tan(\theta_j)) = \ln\big(d^2_j[n] - d^3_j[n]\big) - \ln\big(d^1_j[n] + d^3_j[n]\big)
\qquad (4.6)
\]

By using the first-order Taylor approximation of each term, we obtain

\[
\frac{1 + \tan^2(\theta_j)}{\tan(\theta_j)}\, d\theta_j \simeq \frac{\partial d^2_j - \partial d^3_j}{d^2_j - d^3_j} - \frac{\partial d^1_j + \partial d^3_j}{d^1_j + d^3_j}
\qquad (4.7)
\]

Note that the approximation error is negligible when \(|\partial d^k| \ll |d^k|\). After further simplification, we obtain \(d\theta_j\) as a function of \(\partial d^1_j\), \(\partial d^2_j\) and \(\partial d^3_j\):

\[
d\theta_j \simeq \frac{-\tan(\theta_j)\,\partial d^1_j + \partial d^2_j - (1 + \tan(\theta_j))\,\partial d^3_j}{(1 + \tan^2(\theta_j))\,(d^1_j + d^3_j)}
\qquad (4.8)
\]

Since we embed the watermark bit in the gradient direction, the change in the gradient magnitude \(r\) (Eq. (4.5)) should be zero, i.e. the gradient magnitude remains the same.
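The first-order relation in Eq. (4.8) can be checked numerically by perturbing a coefficient triple and comparing the predicted angle change with the exact one; the values below are arbitrary test data:

```python
import math

# Numerical check of the first-order approximation in Eq. (4.8).

def gradient(d1, d2, d3):
    """Eq. (4.3): g = (d1+d3)/2 + i (d2-d3)/2, returned as (real, imag)."""
    return (d1 + d3) / 2.0, (d2 - d3) / 2.0

def dtheta_approx(d1, d2, d3, pd1, pd2, pd3):
    """Eq. (4.8): predicted angle change for coefficient changes pd*."""
    t = (d2 - d3) / (d1 + d3)               # tan(theta), Eq. (4.4)
    return (-t * pd1 + pd2 - (1 + t) * pd3) / ((1 + t * t) * (d1 + d3))

d = (2.0, 1.0, 0.5)
pd = (1e-4, -2e-4, 5e-5)                    # small coefficient changes

gh, gv = gradient(*d)
theta = math.atan2(gv, gh)
gh2, gv2 = gradient(*(x + p for x, p in zip(d, pd)))
theta2 = math.atan2(gv2, gh2)

print(theta2 - theta, dtheta_approx(*d, *pd))  # agree to about 1e-8
```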
To simplify the equations, we denote 4r^2 by R and use the first-order Taylor approximation of both sides of Eq. (4.5), so that

    dR_j / 2 ≃ (d_j^1 + d_j^3)(∂d_j^1 + ∂d_j^3) + (d_j^2 − d_j^3)(∂d_j^2 − ∂d_j^3).    (4.9)

Since dR_j = 0, Eq. (4.9) results in

    ∂d_j^1 + tan(θ_j) ∂d_j^2 + (1 − tan(θ_j)) ∂d_j^3 ≃ 0.    (4.10)

To make the watermark as imperceptible as possible, the change in the wavelet coefficients due to embedding the watermark should be as small as possible. Therefore, we minimize the overall change in the wavelet coefficients:

    min over {∂d_j^1, ∂d_j^2, ∂d_j^3} of (∂d_j^1)^2 + (∂d_j^2)^2 + (∂d_j^3)^2.    (4.11)

Thus the problem is to minimize Eq. (4.11) subject to the constraints given in Eqs. (4.8) and (4.10). This results in

    (∂d_j^1)_opt ≃ (−2 tan(θ_j) + 1) U
    (∂d_j^2)_opt ≃ (−tan(θ_j) + 2) U    (4.12)
    (∂d_j^3)_opt ≃ −(tan(θ_j) + 1) U

where U = (1/3)(d_j^1 + d_j^3) dθ_j. Thus, the watermarked DWT coefficients can be expressed as

    (d_j^1)_w = d_j^1 + (∂d_j^1)_opt,
    (d_j^2)_w = d_j^2 + (∂d_j^2)_opt,    (4.13)
    (d_j^3)_w = d_j^3 + (∂d_j^3)_opt.

Note that the watermark strength is determined by the quantization step size ∆ in Eq. (4.15).

4.2.4 Embedding the Watermark Bit in the Gradient Angle

As mentioned before, AQIM is one of the best methods for angle quantization. However, it does not account for the angle discontinuity at θ = π. The discontinuity problem arises when the angle θ is close to π. As an example, consider the problem of distributing dθ among the angles of N gradient vectors {x_1, ..., x_i, ..., x_N} such that each gradient angle is changed by dθ_i and Σ_{i=1}^{N} dθ_i = dθ. Assume the angle θ_k of one gradient vector x_k is positive and close to π, such that (π − θ_k) = ε. If θ_k is increased by dθ_k such that dθ_k > ε, then, since the angle is defined in the range θ ∈ (−π, π], the resultant change in θ_k would be (2π − dθ_k) instead of dθ_k. This may severely affect dθ and lead to a false watermark bit. In the proposed watermarking method, as shown in Eq.
(4.12), the change in each DWT coefficient is computed in terms of dθ. If the resultant change in the angle is obtained as (2π − dθ) instead of dθ, the large changes in the DWT coefficients may yield a perceptible watermark. To address this angle discontinuity issue, we propose the Absolute Angle Quantization Index Modulation (AAQIM). In AAQIM, instead of quantizing the value of the angle, its absolute value is quantized in the interval |θ| ∈ [0, π].

Figure 4.5: Illustration of the proposed absolute angle quantization index modulation (AAQIM). (Vectors before and after watermarking are represented by black and gray arrows, respectively.)

The absolute angle quantization function is defined as follows:

    Q(|θ|) = 0 if ⌊|θ|/∆⌋ is even,
             1 if ⌊|θ|/∆⌋ is odd.    (4.14)

To calculate the watermarked angle, we use the following rule:

    |θ^w| = ∆⌈|θ|/∆⌉ − ∆/2,  if Q(|θ|) = w
            ∆⌈|θ|/∆⌉ + ∆/2,  if Q(|θ|) ≠ w and { ∆⌈|θ|/∆⌉ − ∆/2 < |θ| ≤ π − ∆/2, or |θ| ≤ ∆/2 }    (4.15)
            ∆⌊|θ|/∆⌋ − ∆/2,  if Q(|θ|) ≠ w and { ∆/2 < |θ| ≤ ∆⌈|θ|/∆⌉ − ∆/2, or |θ| > π − ∆/2 }

where w denotes the watermark bit to be embedded (i.e. w = 0 or 1). Fig. 4.5 shows how AAQIM changes the value of the angle when embedding the watermark bit. The watermarked angle θ^w and the change in the angle dθ can be obtained as

    θ^w = |θ^w| sign(θ),  dθ = θ^w − θ.    (4.16)

Note that when the watermark bit is embedded in a single angle, another solution for the angle discontinuity problem of AQIM is to replace dθ with dθ^0, given by

    dθ^0 = dθ − 2π if dθ > π,
           dθ + 2π if dθ < −π.    (4.17)

However, this solution is not appropriate for embedding the watermark bit in the average of multiple angles.

4.2.5 Overcoming the Bit-ordering Error Using Uniform Scrambling of the Gradient Vectors

As previously mentioned, at each wavelet scale, the watermark bits are embedded in the angles of the significant gradient vectors.
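The AAQIM embedding rule of Eqs. (4.14)-(4.16) can be sketched as follows. This is our own reading of the three-branch rule (function names are ours); angles lying exactly on a bin boundary are not handled.

```python
import numpy as np

def Q_abs(a, step):
    """Eq. (4.14): parity of the quantization bin containing |theta|."""
    return int(np.floor(a / step)) % 2

def aaqim_embed(theta, w, step):
    """Embed bit w in angle theta via AAQIM (Eqs. (4.15)-(4.16)).
    Assumes pi/step is an integer, as with the steps of Table 4.3."""
    a = abs(theta)
    centre = step * np.ceil(a / step) - step / 2.0    # centre of current bin
    if Q_abs(a, step) == w:
        aw = centre                                   # parity already correct
    elif (centre < a <= np.pi - step / 2.0) or (a <= step / 2.0):
        aw = step * np.ceil(a / step) + step / 2.0    # move up to the next bin centre
    else:
        aw = step * np.floor(a / step) - step / 2.0   # move down to the previous bin centre
    return aw if theta >= 0 else -aw                  # Eq. (4.16): restore the sign
```

Because the quantization acts on |θ|, the result always stays inside [0, π]: near |θ| = π the rule is forced into the "move down" branch, which is exactly the discontinuity fix AAQIM provides over plain AQIM.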
To find the significant vectors, the straightforward way is to sort the gradient vectors in descending order of their magnitudes. After ordering the watermark bits, the first watermark bit is embedded in the vector with the highest rank, the second bit in the vector ranked second, and so on. The decoder can then determine where each bit is embedded by ordering the gradient vectors in the same way. One problem associated with this scheme is that noise or other types of attacks may change the magnitudes of the gradient vectors and thus their ranking. This may cause a misalignment between the embedded and extracted watermark bits; we call this type of error the bit-ordering error. Instead of sorting the gradient vectors to determine which watermark bit goes in which gradient vector, we divide the transform domain into non-overlapping blocks and assign each bit to one block. As such, the decoder knows which bit is in which block. To assign a watermark bit to a block, we embed the bit in the significant gradient vectors of that block. Thus, any change in one of the significant gradient vectors would only affect the decoded bit in its own block. Since the proposed method does not rely on sorting the gradient vectors, noise will not affect the alignment of the embedded and extracted watermark bits. In natural images, however, some parts of the image may have all or most of the significant vectors, while other parts may have none. This could cause the following two problems: 1) the watermark bits that are embedded in blocks that do not contain any significant vectors will be less robust to attacks; 2) the most significant vectors in some blocks may have magnitudes that are close to each other. If a watermark bit is embedded in the most significant vector of such a block, even a small amount of attack can alter this vector so that it loses its status as the most significant vector.
Thus the decoder may mistakenly extract the watermark bit from an unwatermarked vector. To overcome these problems, our solution is similar to the technique proposed in [112] for the problem of uneven embedding capacity. Before embedding the watermark bits, we spread the locations of the significant gradient vectors uniformly over the transform domain. To spread the significant gradient vectors, we propose geometrically scrambling the locations of all the gradient vectors. As a result, in the ideal case, the significant gradient vectors are mapped to separate blocks. We then embed each watermark bit in the most significant gradient vectors of each block. Finally, the gradient vectors are descrambled to their original locations in the image. The steps of the proposed method can be summarized as follows:

1. Partition each gradient field into small non-overlapping blocks. The number of blocks is equal to the number of bits to be inserted.

2. Scramble the gradient vectors uniformly over the transform domain so that, ideally, each block contains at least one significant gradient vector.

3. Find the significant gradient vectors of each block. If a block does not contain any significant vector, the vector with the largest magnitude in that block is taken as the significant vector.

4. Embed each watermark bit in the most significant vectors of each block, using a fixed ordering of the blocks. Each bit can be embedded in one or more vectors.

5. Descramble the gradient field.

Image Scrambling and Descrambling Methods

The scrambling method should be a geometric transform that ideally results in a uniform distribution of the locations of the significant gradient vectors. Different geometric image scrambling methods have been proposed. The Fibonacci transformation [119], the Arnold Cat transformation [68] and the Gray Code transformation [118] are the most widely used for image scrambling.
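The block-wise embedding of steps 1-4 can be sketched as follows (scrambling is omitted; helper names are hypothetical, and the angle quantization is reduced to a simple bin-parity rule on |angle| rather than the full AAQIM rule of Eq. (4.15)):

```python
import numpy as np

def embed_bits_blockwise(grad, bits, block, step=np.pi / 16):
    """Partition the complex gradient field into blocks (one bit per
    block), take the most significant vector of each block, and move its
    |angle| to the centre of a bin whose parity equals the bit."""
    g = grad.copy()
    bh, bw = block
    origins = [(r, c) for r in range(0, g.shape[0], bh)
                      for c in range(0, g.shape[1], bw)]
    assert len(bits) == len(origins)             # step 1: one bit per block
    for bit, (r, c) in zip(bits, origins):
        blk = g[r:r + bh, c:c + bw]
        k = np.unravel_index(np.argmax(np.abs(blk)), blk.shape)  # step 3
        mag, ang = np.abs(blk[k]), np.angle(blk[k])
        a = abs(ang)
        if int(a // step) % 2 != bit:            # step 4: jump to an adjacent bin
            a = a + step if a + step < np.pi else a - step
        a = np.floor(a / step) * step + step / 2.0        # snap to the bin centre
        blk[k] = mag * np.exp(1j * a * (1.0 if ang >= 0 else -1.0))
    return g

def extract_bits_blockwise(grad, block, step=np.pi / 16):
    """Decoder side: re-find the most significant vector of each block
    and read the parity of its quantized |angle|."""
    bh, bw = block
    bits = []
    for r in range(0, grad.shape[0], bh):
        for c in range(0, grad.shape[1], bw):
            blk = grad[r:r + bh, c:c + bw]
            k = np.unravel_index(np.argmax(np.abs(blk)), blk.shape)
            bits.append(int(abs(np.angle(blk[k])) // step) % 2)
    return bits
```

Because only the phase of the selected vector is altered, the magnitude ranking inside each block is preserved, so the decoder re-selects the same vector.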
It has been shown in [114] that most of these geometric transforms are special cases of the affine modular transformation [120]. We therefore employ the affine modular transform, due to its generality. In an image of size N × N, let the coordinate vectors of a pixel in the original and scrambled images be denoted by [x, y]^T and [x′, y′]^T, respectively. The affine modular mapping from [x, y]^T to [x′, y′]^T is defined as

    [x′; y′] = ( [a, b; c, d] [x; y] + [e; f] ) mod N    (4.18)

where a, b, c, d, e and f are scrambling parameters. If the absolute value of the determinant of the matrix M = [a, b; c, d] equals 1 (i.e. |det(M)| = |ad − bc| = 1), the transform is area-preserving and a one-to-one mapping. For a given positive integer N, the necessary and sufficient condition for the periodicity of the affine modular transform is that det(M) and N be coprime [120]. By using the periodicity property of this mapping, the original image can be recovered from the scrambled image. Let the smallest period of this map be denoted by t_N. If the scrambled image is constructed by applying this transformation t times, the descrambled image can be exactly recovered by consecutively applying the same transformation t_N − t more times [114]. To reduce the number of parameters needed in embedding and to make the mapping area-preserving, the elements of the matrix M are assigned the values a = 1, b = p, c = q and d = 1 + pq. The mapping matrix M is then given by

    M = [1, p; q, 1 + pq].
(4.19)

Table 4.1: The sensitivity of the uniformity metric to different scrambling parameters, at wavelet levels 3, 4 and 5

Image     Parameter Set            Level 3   Level 4   Level 5
Peppers   p=var, q=1, e=0, f=0     0.0027    0.0051    0.0033
Peppers   p=2, q=var, e=0, f=0     0.0022    0.0045    0.0042
Peppers   p=2, q=2, e=var, f=0     0.0011    0         0
Peppers   p=2, q=2, e=0, f=var     0.0011    0         0
Baboon    p=var, q=2, e=0, f=0     0.0027    0.0051    0.0033
Baboon    p=2, q=var, e=0, f=0     0.0022    0.0045    0.0042
Baboon    p=2, q=2, e=var, f=0     0.0011    0         0
Baboon    p=2, q=2, e=0, f=var     0.0011    0         0
Barbara   p=var, q=1, e=0, f=0     0.0059    0.0050    0.0020
Barbara   p=2, q=var, e=0, f=0     0.0067    0.0050    0.0016
Barbara   p=2, q=1, e=var, f=0     0         0         0
Barbara   p=2, q=1, e=0, f=var     0         0         0
Lena      p=var, q=1, e=0, f=0     0.0034    0.0041    0.0018
Lena      p=2, q=var, e=0, f=0     0.0050    0.0050    0.0022
Lena      p=2, q=1, e=var, f=0     0.0013    0         0
Lena      p=2, q=1, e=0, f=var     0         0         0

Obtaining the Scrambling Parameters that Result in a Quasi-uniform Distribution

It can be shown that certain choices of the parameters p, q, e and f result in a quasi-uniform distribution of the locations of the significant gradient vectors. To find such parameters, we define the uniformity metric (UM) as

    UM = N_SB / N_B    (4.20)

where N_SB is the number of image blocks containing at least one significant gradient vector and N_B is the total number of blocks in the image. In the ideal case, UM = 1, which means that the scrambling method has uniformly distributed the significant gradient vectors over the whole image. The scrambling parameters p, q, e and f are then obtained from

    (p, q, e, f)_opt = arg max over (p, q, e, f) of N_SB / N_B.    (4.21)

To determine the most effective scrambling parameters, we calculate the sensitivity of UM to each scrambling parameter. Table 4.1 shows the sensitivity of UM to each of the parameters p, q, e and f, obtained for the images Peppers, Baboon, Barbara and Lena of size 512 × 512, with block sizes 4 × 8, 4 × 2 and 2 × 2 at wavelet levels 3, 4 and 5, respectively. The watermarks have 256 bits. The sensitivity of UM to a parameter x ∈ {p, q, e, f} is denoted by S_x and is estimated by

    S_x ≃ σ(UM) / σ(x)    (4.22)

where σ(·) denotes the standard deviation.
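A minimal sketch of the scrambling map with the matrix of Eq. (4.19), its exact inverse, and the uniformity search of Eqs. (4.20)-(4.21) restricted to p and q, is given below (function names and the boolean significance mask are our own assumptions):

```python
import numpy as np

def scramble(mask, p, q, N):
    """Map pixel (x, y) to M @ (x, y) mod N with M = [[1, p], [q, 1+pq]];
    det(M) = 1, so the map is one-to-one on the N x N grid."""
    out = np.zeros_like(mask)
    xs, ys = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    out[(xs + p * ys) % N, (q * xs + (1 + p * q) * ys) % N] = mask
    return out

def descramble(mask, p, q, N):
    """Apply the exact inverse M^{-1} = [[1+pq, -p], [-q, 1]] mod N."""
    out = np.zeros_like(mask)
    xs, ys = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    out[((1 + p * q) * xs - p * ys) % N, (-q * xs + ys) % N] = mask
    return out

def uniformity(mask, block):
    """Eq. (4.20): fraction of blocks holding >= 1 significant vector."""
    bh, bw = block
    N = mask.shape[0]
    hits = sum(mask[r:r + bh, c:c + bw].any()
               for r in range(0, N, bh) for c in range(0, N, bw))
    return hits / ((N // bh) * (N // bw))

def best_params(mask, block, prange=range(1, 6)):
    """Exhaustive search for the (p, q) pair maximising UM."""
    return max(((p, q) for p in prange for q in prange),
               key=lambda pq: uniformity(
                   scramble(mask, pq[0], pq[1], mask.shape[0]), block))
```

Descrambling via the inverse matrix avoids iterating the forward map t_N − t times, which is valid here because det(M) = 1 makes the inverse exact over the integers mod N.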
To calculate the sensitivity of UM to one parameter, the other parameters are fixed at their optimum values. Table 4.1 shows that the uniformity metric UM is not sensitive to the parameters e and f. In all cases, S_f is equal to 0 and S_e is mostly between 0 and 0.0011. Since UM is not very sensitive to e and f, these parameters can be dropped from Eq. (4.18). Eliminating these parameters helps simplify the optimization of Eq. (4.21) and speeds up the watermark embedding process. The scrambling transformation therefore becomes

    [x′; y′] = [1, p; q, 1 + pq] [x; y] mod N    (4.23)

where the scrambling parameters p and q are integers that are computed from

    (p, q)_opt = arg max over (p, q) of N_SB / N_B.    (4.24)

The values of p_opt and q_opt are found via exhaustive search. The chaotic mapping of Eq. (4.23), which is called the generalized Arnold Cat map, has been shown to have a high randomness score and a low computational complexity [116]. Note that since the number of gradient vectors at wavelet levels 3, 4 and 5 is much smaller than the total number of gradient vectors in the image, applying the generalized Arnold Cat map to these vectors is computationally feasible.

Table 4.2: The effect of changing the scrambling parameters p and q on the BER (%). The results show the deviation of the BER (%) from BER_opt under different types of attacks. (Message length=64 bits, PSNR=42 dB)

Image     Gaussian Filter          Median Filter             JPEG            AWGN
          3×3  5×5  7×7  9×9      3×3  5×5   7×7   9×9      QF=10  QF=20    σ=10  σ=20  σ=30
Peppers   0    0    0    0        0    0.57  1.96  7.01     1.71   0.14     0.14  1.51  5.40
Baboon    0    0    0    0        0    2.63  5.64  7.92     1.59   0        0     1.02  5.33
Barbara   0    0    0    0        0    0     2.36  6.19     1.47   0        0     0.85  4.23
Lena      0    0    0    0        0    0.31  1.34  3.86     1.31   0        0.13  1.16  5.37

By optimizing the parameters p and q, the watermarking becomes more robust to attacks, and thus the Bit Error Rate (BER) in decoding is reduced. For watermark decoding, the optimum values of the parameters p and q need to be sent to the
This however increases the required information. One solution to this problem is to use fixed values of these parameters at the transmitter and the receiver sides. However, this will not always result in minimal BER. To evaluate the effect of using non-optimum parameters on the BER, we quantify the amount of deviation of the BER from BERopt as D 2 = E (BER − BERopt )2 . (4.25) Table 4.2 shows the effect of changing the integer parameters p and q, in the interval [1, 5], on the BER deviation D. A random 64-bit binary message is embedded at wavelet levels 4 and 5, with block sizes 4 × 8 and 2 × 4, respectively. The parameters (p, q)opt for images Pappers, Baboon, Barbara and Lena are set to (3, 4), (1, 1), (2, 3) and (2, 1), respectively. We note that, for each type of attacks, the more severe the attack, the more effective it is to use the optimum values of the scrambling parameters. Therefore, for attacks that do not severely degrade the watermarked image, fixed non-optimum scrambling parameters can be used in both the transmitter and the receiver sides. 4.2.6 Overcoming the Problem of Extracting a Watermark From an Unwatermarked Gradient Vector As discussed in subsection 4.2.5, each watermark bit is embedded in the most significant gradient vectors of each block. However, an attack may reduce the magnitude of these vectors such that they are no longer identified as significant vectors. In that case, the receiver assumes a gradient vector that has not been 73 watermarked, as watermarked. Thus, an error that we call watermark extraction from an unwatermarked vector error is generated. The probability of the error increases when the magnitudes of the significant vectors are close together. It can be shown that scrambling the gradient vectors would significantly reduce this error. Such uniform scrambling increases the gradient magnitude entropy of each block. Thus the probability of finding two vectors in a block with similar magnitudes decreases. 
To further reduce this error, we propose to increase the gap between the watermarked and unwatermarked gradient vectors by increasing the magnitudes of the watermarked vectors. This is done as follows. To make the change in the magnitude as imperceptible as possible, we minimize the overall change in the DWT coefficients:

    min over {∂d_j^1, ∂d_j^2, ∂d_j^3} of (∂d_j^1)^2 + (∂d_j^2)^2 + (∂d_j^3)^2.    (4.26)

To find the relationship between the gradient magnitude change dr_j and the ∂d_j^k, we expand Eq. (4.5) using the first-order Taylor approximation of R_j = 4r_j^2 as

    dR_j / 2 ≃ (d_j^1 + d_j^3)(∂d_j^1 + ∂d_j^3) + (d_j^2 − d_j^3)(∂d_j^2 − ∂d_j^3).    (4.27)

Since we are only increasing the magnitude of the gradient vectors, dθ_j = 0. Thus, from Eq. (4.8), we have

    −tan(θ_j) ∂d_j^1 + ∂d_j^2 − (1 + tan(θ_j)) ∂d_j^3 ≃ 0.    (4.28)

Note that since the magnitude of each gradient vector is changed after embedding the watermark, the angle θ_j in Eq. (4.28) represents the angle of the gradient vector after inserting the watermark, i.e. θ_j^w. By minimizing Eq. (4.26) subject to the constraints given in Eqs. (4.27) and (4.28), and by replacing θ_j with θ_j^w, the following relationship between (∂d_j^1)_opt, (∂d_j^2)_opt, (∂d_j^3)_opt and dR_j is obtained:

    (∂d_j^1)_opt ≃ (2 + tan(θ_j^w)) T
    (∂d_j^2)_opt ≃ (1 + 2 tan(θ_j^w)) T    (4.29)
    (∂d_j^3)_opt ≃ (1 − tan(θ_j^w)) T

where T = dR_j / ( 6 ((d_j^1)_w + (d_j^3)_w)(1 + tan^2(θ_j^w)) ) and (d_j^k)_w denotes the watermarked DWT coefficient of wavelet level j in the k-th detail subband. To determine the amount of magnitude change dr in each watermarked significant gradient vector, the following facts should be taken into account:

• Significant vectors with large magnitudes are less susceptible to attacks than vectors with small magnitudes. Thus, the increase in the low-magnitude vectors should be larger than that in the large-magnitude vectors (i.e. dr ∝ 1/r).
• Significant vectors whose magnitudes are close to those of insignificant vectors should be increased more than the other significant vectors (i.e. dr ∝ 1/(r − r_insignificant)), because it is more probable that their positions are switched with insignificant vectors.

We propose the following equation for the change in the magnitude of each watermarked vector, as it satisfies the above conditions:

    dr = (α / r) e^{−(r − r_LIS)}    (4.30)

where α is a constant that adjusts the overall gradient magnitude change in the image and r_LIS is the magnitude of the largest insignificant gradient vector in each block.

4.3 Proposed Watermark Decoding Method

The watermark bits are extracted by reversing the embedding steps, as shown in Fig. 4.6.

Figure 4.6: Block diagram of the proposed watermark decoding method.

At the transmitter side, each watermark bit is embedded into the BR most significant gradient vectors of each block, where BR is the number of most significant vectors used. For example, when BR = 1, only the gradient vector with the largest magnitude is used; when BR = 2, the two most significant vectors are used, and so on. In this thesis, BR = 2 is employed in all the experiments. At the receiver side, we extract the watermark bit from each of the BR most significant vectors and assign weights to the extracted bits based on the following rules:

• A watermark bit extracted from a gradient vector with a large magnitude should be given more weight than a bit extracted from a small gradient vector.

• A watermark bit extracted from an angle close to the centroid of a sector in the quantization circle (see Fig. 4.5) should be given more weight than a bit extracted from an angle close to a sector boundary.

Based on these two facts, each extracted watermark bit is weighted by

    a_k = r_k^γ ( ∆/2 − | |θ_k| − ∆⌊|θ_k|/∆⌋ − ∆/2 | ),  k = 1, ..., BR    (4.31)

where γ is a constant that represents the importance of magnitude weighting versus angle weighting. In our simulations, we have set BR = 2 and γ = 4.
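The weighting rule of Eq. (4.31) and the weighted vote of Eqs. (4.32)-(4.33) can be sketched as follows (function names are ours; vectors whose angles fall exactly on bin boundaries receive zero weight and are not handled specially here):

```python
import numpy as np

def bit_weight(r, theta, step, gamma=4):
    """Eq. (4.31): weight an extracted bit by the vector magnitude and
    by how close |theta| lies to the centre of its quantization bin."""
    a = abs(theta)
    dist_to_centre = abs(a - step * np.floor(a / step) - step / 2.0)
    return (r ** gamma) * (step / 2.0 - dist_to_centre)

def decode_block(vectors, step, gamma=4):
    """Weighted vote over the BR most significant vectors of one block:
    w_hat = 1 if the weighted mean of the extracted parities >= 0.5
    (Eqs. (4.32)-(4.33))."""
    ws = [bit_weight(abs(v), np.angle(v), step, gamma) for v in vectors]
    qs = [int(abs(np.angle(v)) // step) % 2 for v in vectors]   # Eq. (4.14)
    wq = sum(w * q for w, q in zip(ws, qs)) / sum(ws)
    return 1 if wq >= 0.5 else 0
```

With γ = 4, a vector that is, say, ten times larger than its companion dominates the vote by a factor of 10^4, so a noisy low-magnitude vector can rarely flip the decoded bit.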
Based on the weights determined by Eq. (4.31), the decoded watermark bit of each block is given the value

    ŵ = 1 if WQ ≥ 0.5,  and 0 otherwise    (4.32)

where WQ is defined as

    WQ = ( Σ_{k=1}^{BR} a_k Q(|θ_k|) ) / ( Σ_{k=1}^{BR} a_k )    (4.33)

and Q(|θ_k|) is the absolute angle quantization function defined in Eq. (4.14).

One problem that arises with all gradient-based methods is their low robustness to salt & pepper noise. The problem is due to the high-frequency nature of both the gradient vectors and salt & pepper noise. The long-tailed nature of this noise can drastically change the gradient vectors. To deal with this issue, we add a prefiltering stage before decoding the watermark bits. As filtering the watermarked image may introduce distortion and reduce the watermark detectability, this prefiltering stage should be carefully designed. We propose a nonlinear filter for the prefiltering stage, called the Gaussian-Median-Median-Gaussian (GMMG) filter, that averages the results of two filters f1 and f2 with windows of size 3 × 3.

Figure 4.7: Illustration of the proposed GMMG filters f1 and f2. Letters “M” and “G” stand for median and Gaussian filters, respectively.

The GMMG filters are illustrated in Fig. 4.7. In filter f1, the median filter (M) is implemented row-wise and the Gaussian filter (G) is implemented column-wise. Filter f2 implements the median and Gaussian filters column-wise and row-wise, respectively. Since the median filter is nonlinear, the filters f1 and f2 are also nonlinear, and changing the order of the constituent filters usually changes the result. For example, one can first use a median filter on each row followed by a Gaussian filter on each column of the image; alternatively, the Gaussian filter can be applied to each column first and then the median filter to each row. To account for this order dependence in f1 and f2, we first implement the median filter followed by the Gaussian filter, and then the Gaussian filter followed by the median filter; the results of the two orderings are then aggregated for each of f1 and f2.

Table 4.3: Angular quantization step sizes at wavelet levels 3, 4 and 5.
Image     ∆3 (rad)   ∆4 (rad)   ∆5 (rad)
Peppers   π/12       π/22       π/34
Baboon    π/12       π/16       π/18
Barbara   π/12       π/22       π/28
Lena      π/12       π/26       π/28

4.4 Simulation Results and Discussions

To evaluate the performance of the proposed method, we embed different 256-bit pseudo-random binary watermarks in the grayscale test images “Peppers”, “Baboon”, “Barbara” and “Lena”. All test images are of size 512 × 512. To render the watermark as imperceptible as possible, length-10 Symlet wavelets are used. Since this wavelet has 5 vanishing moments, applying Eq. (4.3) with length-10 Symlet coefficients approximately results in the pre-smoothed 5th-order gradient of the image. The gradient operator obtained using the Symlet filter is a complicated highpass filter. For a 256-bit watermark, 128, 64 and 64 bits are embedded in the gradient fields at wavelet levels 3, 4 and 5, respectively. The gradient field corresponding to each wavelet scale is then partitioned into blocks. Since each watermark bit is embedded in one block, the block size is determined by the number of bits to be embedded at each scale. To embed 128, 64 and 64 watermark bits, we used block sizes of 4 × 8, 4 × 4 and 2 × 2 at gradient levels 3, 4 and 5, respectively. To measure the similarity of the original image to the watermarked image, the PSNR is employed. The angular quantization step size ∆_j (which determines the watermark strength) is obtained separately for each image. Table 4.3 shows the optimum values of ∆_j for each test image. It is seen that the quantization step size generally does not change much across the images. Probably due to the masking effect, the value of ∆ for the highly textured image Baboon is slightly larger.
The original and watermarked images shown in Fig. 4.8 illustrate that the watermark embedded with the proposed Gradient Direction Watermarking (GDWM) method is imperceptible at PSNR = 42 dB.

Figure 4.8: (Upper row) Original test images “Peppers”, “Baboon”, “Barbara” and “Lena”. (Lower row) Test images watermarked using the proposed GDWM method, with a 256-bit watermark embedded and PSNR = 42 dB.

4.4.1 Comparison Between Single-scale GDWM and Multi-scale GDWM

A major advantage of the proposed GDWM method is that the watermark capacity can be increased by embedding the watermark at multiple scales. To compare the single-scale version with the multi-scale version of GDWM, we average the results of embedding 100 different 256-bit pseudo-random binary watermarks in the images Peppers, Baboon and Lena. To simulate the single-scale GDWM (SS GDWM), all 256 bits are embedded at wavelet scale 3, using blocks of size 4 × 4. In the multi-scale GDWM (MS GDWM), we embed 128, 64 and 64 bits at wavelet levels 3, 4 and 5, respectively. In both methods, we set the watermarking parameters such that a certain BER is obtained under a specific attack. To compare the original and the watermarked images, the SSIM metric is used, due to its compatibility with the human visual system [107]. The mean SSIM (MSSIM) comparison results for the Gaussian filtering, AWGN and JPEG compression attacks are shown in Table 4.4.

Table 4.4: The SSIM comparison between the single-scale GDWM (SS GDWM) and multi-scale GDWM (MS GDWM)

Image     Method     Gaussian Filter (7×7)   JPEG (Q=10)   AWGN (σ=10)
                     (BER=0.02)              (BER=0.13)    (BER=0.02)
Peppers   SS GDWM    0.9853                  0.9918        0.9918
Peppers   MS GDWM    0.9946                  0.9950        0.9955
Baboon    SS GDWM    0.9974                  0.9991        0.9993
Baboon    MS GDWM    0.9987                  0.9991        0.9991
Lena      SS GDWM    0.9935                  0.9944        0.9944
Lena      MS GDWM    0.9962                  0.9962        0.9966

Assuming that the JND threshold is set to MSSIM = 0.98, we note that both methods yield imperceptible watermarks.
However, at fixed values of watermark robustness and capacity, MS GDWM yields a better quality of the watermarked image than SS GDWM. Thus, by fixing the watermark robustness and perceptibility values, more bits can be embedded into the image by using the proposed MS GDWM.

4.4.2 Performance Comparison of GDWM and AQIM

In this subsection, we compare the proposed GDWM method with AQIM [74]. The Document-to-Watermark Ratio (DWR), the Watermark-to-Noise Ratio (WNR) and the probability of error (PE) are obtained based on the definitions given in [77]. The PE-WNR curve for AQIM is obtained using Monte Carlo simulations. To calculate the PE-WNR curves for GDWM, 1000 different realizations of a 256-bit pseudo-random binary watermark are embedded in 20 different images. The probability of error is calculated using the BER at the decoder side. Fig. 4.9 illustrates the results of AQIM and GDWM, with DWR = 25 dB. It is clear that the proposed GDWM methods outperform AQIM.

4.4.3 Robustness to Attacks

To evaluate the robustness of the proposed scheme, each watermarked image is distorted by Gaussian filtering, median filtering, JPEG compression, Gaussian noise, salt & pepper noise and rotation. After each attack, the watermark is extracted and compared with the original watermark in terms of the Bit Error Rate (BER).

Figure 4.9: The comparison of the performance of the AQIM and the proposed GDWM methods for the AWGN attack (DWR=25dB).
Table 4.5: The BER (%) results of GDWM under different types of attacks

Image    Amplitude Scaling   Gaussian Filter   Median Filter    JPEG Compression (Q)           Rotation (θ)
         (scale=2)           3×3    5×5        3×3    5×5       20    30    40    50    60     −0.5    0.5
Peppers  0                   0      0.58       1.17   6.64      1.41  0.18  0.04  0     0      37.47   35.89
Baboon   0                   0      0.96       5.03   19.75     1.41  0.62  0.19  0.07  0      38.73   36.42
Barbara  0                   0      0.13       1.10   8.14      1.69  0.16  0.05  0     0      38.44   39.90
Lena     0                   0      0.17       0      7.10      1.65  0.41  0.22  0.10  0      36.86   36.37

The BER is obtained by averaging over 100 runs with 100 different pseudo-random binary watermarks. The BER (%) results of the proposed GDWM method under the different types of attacks are shown in Table 4.5. The message length is 256 bits. The parameters (p, q)_opt for the images Peppers, Baboon, Barbara and Lena are set to (2, 2), (2, 2), (2, 3) and (2, 3), respectively. The corresponding PSNR values for GDWM are 43.06 dB, 43.29 dB, 42.70 dB and 43.54 dB, respectively.

Robustness Against Amplitude Scaling Attack

The BER results of the GDWM scheme under amplitude scaling attacks are shown in Table 4.5. It can be seen that the proposed method is extremely robust to this attack (i.e. BER = 0). This is due to embedding the watermark in the angles of the gradient vectors.

Robustness Against Gaussian Filtering

The watermarked image is filtered by Gaussian filters with different standard deviations and filter lengths. Table 4.5 shows the BER results of the test images attacked by Gaussian filters of size W × W, where W ∈ {3, 5}. The standard deviation σ of each filter is set to W/6. As expected, since the Gaussian filter changes mainly the magnitude of the gradient vectors, the proposed GDWM method is robust to this attack.

Robustness Against Median Filtering

The median filter, being nonlinear, results in a much larger change in the direction of a gradient vector than the linear Gaussian filter.
However, unlike the Gaussian lowpass filter, which blurs the edges of an image, the median filter preserves the edge points. It has been shown that devising a watermarking scheme that is robust to median filtering is challenging [51], because the median filter can severely damage the watermark. The BER results of our proposed GDWM method under median filtering attacks are shown in Table 4.5, where the median filters are implemented using 3 × 3 and 5 × 5 windows. It is clear that watermark detection after a median filtering attack is less reliable in highly textured images, such as Baboon.

Robustness Against Lossy JPEG Compression

JPEG compression is the most popular image compression standard for Internet applications. Thus, for multimedia watermarking applications, a watermark should be very robust to JPEG compression attacks. The BER results of the proposed GDWM method under JPEG compression attacks are shown in Table 4.5. The JPEG Quality Factor (QF) is varied from 20 to 60. It can be seen that the proposed method is quite robust against JPEG compression, especially for the practical compression range of still images, i.e. QF ≥ 50.

Robustness Against Rotation Attack

The rotation attack is a practical but challenging attack for blind watermarking. Due to the bit-ordering errors in the watermarked image before and after rotation, even a small image rotation could significantly increase the BERs of the extracted watermark bits. Table 4.5 shows the BER results of GDWM under rotation attacks, where each test image is rotated by −0.5◦ and 0.5◦.

Table 4.6: The BER (%) results of the proposed GDWM method under AWGN and Salt & Pepper attacks

Image    AWGN (σ)               Salt & Pepper (p)
         10     20     30       0.01   0.02   0.04
Peppers  1.32   13.47  25.92    0.11   0.26   1.57
Baboon   1.28   12.48  26.28    0.54   1.07   3.77
Barbara  1.40   12.87  26.59    0.14   0.64   2.63
Lena     1.85   13.52  26.39    0.03   1.53   4.44
It is noted that the proposed methods do not show great robustness to rotation (and translation) attacks. This is because the conventional wavelet transform is not invariant to rotation (and translation). However, the proposed scheme can be modified to increase the robustness to these attacks by using the complex wavelet (CW) or stationary wavelet (SW) transforms, though the gradient vectors of the image should be represented in terms of CWT or SWT coefficients instead. Robustness Against AWGN Additive white Gaussian noise (AWGN) is probably the most commonly used short-tailed noise for modeling communication channels and noisy environments. A desirable watermark should be robust to AWGN. Because the proposed method embeds the watermark bits in the significant gradient vectors, the effect of noise is significantly reduced. To evaluate the robustness of the proposed method, AWGN with standard deviation σ ∈ {10, 20, 30} (in the range of [0, 255]) is added to the watermarked image. The BER results are shown in Table 4.6. Robustness Against Salt & Pepper Noise Salt & pepper noise is the most commonly used long-tailed noise in image processing. Since such noise is not additive, it is hard to remove without incurring changes to the image itself. However, by applying the prefiltering proposed in Sec- 83 Table 4.7: The BER comparison between the proposed GDWM and MWT-EMD [12] under different types of attacks (Message length=64 bits, PSNR=42 dB) Image Method Peppers Peppers Baboon Baboon Lena Lena MWT-EMD GDWM MWT-EMD GDWM MWT-EMD GDWM 3×3 0 0 0 0 0 0 Gaussian Filter 5×5 7×7 9×9 1.56 N/A N/A 0 0 0 0 N/A N/A 0 0 0 3.13 N/A N/A 0 0 0 3×3 0 0 0 0 0 0 Median Filter 5×5 7×7 7.81 9.36 1.56 0 12.5 12.5 0.50 3.81 9.38 12.50 0 0.65 9×9 51.56 4.62 78.13 12.31 51.56 3.84 JPEG (Q = 20) 0 0.06 0 0 0 0 Salt & Pepper (p = 0.08) 2.51 0.40 3.34 0.40 2.67 0.28 Rotation (θ = 0.5) 40.63 22.87 45.31 20.81 43.75 20.87 tion 4.3, the results become acceptable. 
Table 4.6 shows the BER results of GDWM when salt & pepper noise is added to the watermarked images, with probability p ∈ {0.01, 0.02, 0.04}. By investigating the BER results of the proposed GDWM method under different attacks, we note that the proposed GDWM is robust to salt & pepper noise, AWGN, Gaussian filtering, median filtering and JPEG compression attacks. The results show that GDWM provides almost similar robustness to amplitude scaling, Gaussian filtering, median filtering and JPEG compression attacks. For the AWGN and salt & pepper noise attacks, GDWM is shown to yield superior performance.

4.4.4 Comparison With Other Watermarking Methods

To evaluate the performance of the proposed GDWM method, we compare it with four watermarking methods, proposed in [104], [12], [5] and [48]. Methods [104] and [12] are chosen because they are state-of-the-art methods in spread-spectrum watermarking, and [48] is chosen as one of the state-of-the-art methods in quantization-based watermarking. Table 4.7 compares the BER results of the proposed GDWM method with those of MWT-EMD [12] against Gaussian filtering, median filtering, JPEG compression, AWGN, salt & pepper noise and rotation attacks. As in [12], for each test image, we embed a 64-bit watermark with a PSNR of 42 dB, where 32 bits are embedded in the gradient vectors at level 4 and 32 bits are embedded in the gradient vectors at level 5. The sizes of the blocks at wavelet levels 4 and 5 are set to 4 × 8 and 2 × 4, respectively. The mean structural similarity (SSIM) values of the watermarked images Peppers, Baboon and Lena are 0.9964, 0.9991 and 0.9984, respectively.
The results clearly demonstrate that the proposed GDWM method consistently outperforms MWT-EMD under all the considered attacks. Also, it is worth noting that the results of GDWM are based on the averaged BERs obtained from embedding 100 different watermarks in each image. Our simulations show that by embedding the message “SYS Univ” only, the BERs of our GDWM for the JPEG compressed images “Peppers”, “Baboon” and “Lena” (with QF = 5) are 4.68, 3.12 and 6.25, respectively, while the BERs obtained from MWT-EMD for the same images are 6.25, 4.69 and 6.25, respectively.

In Table 4.8, we compare the proposed GDWM method with [104], for median filtering, JPEG compression, AWGN and salt & pepper noise attacks. As in [104], the watermark length in both methods is 256 bits and the PSNR of all the watermarked test images is 42 dB. The mean structural similarity (SSIM) values of the watermarked images Peppers, Baboon and Lena are 0.9961, 0.9991 and 0.9964, respectively. The results confirm that GDWM outperforms Wang's watermarking method under the studied attacks.

Table 4.8: The BER comparison between the proposed GDWM and Wang's method [104] under different types of attacks (Message length = 256 bits, PSNR = 42 dB).

Image     Method   Median Filter 3×3   JPEG Q=11   AWGN σ=10   S&P p=0.01
Peppers   Wang     29.35               26.10       1.25        2.00
Peppers   GDWM     1.17                10.68       1.32        0.11
Baboon    Wang     31.65               16.95       1.30        1.95
Baboon    GDWM     5.03                9.86        1.28        0.54
Barbara   Wang     24.95               16.45       1.45        2.25
Barbara   GDWM     1.10                8.64        1.40        0.14
Lena      Wang     30.80               29.80       1.45        2.45
Lena      GDWM     0                   8.64        1.85        0.03

Table 4.9: The BER comparison between the proposed GDWM method and Akhaee's method [5] under different types of attacks (Message length = 128 bits).

Image     Method   Scaling (s=0.8)   JPEG (Q=20)   AWGN (σ=20)   S&P (p=0.05)
Baboon    Akhaee   3.20              1.80          0.30          2.89
Baboon    GDWM     0                 0             1.48          0.89
Barbara   Akhaee   2.34              0.40          0.10          1.48
Barbara   GDWM     0                 0             1.07          0.43
Table 4.9 shows the results of comparing GDWM with the non-blind method proposed in [5]. In both methods, we embed a 128-bit pseudo-random binary message in the test images Baboon and Barbara, with PSNRs of 39.53 dB and 36.63 dB, respectively. The mean SSIM values of the watermarked images for GDWM are 0.9977 and 0.9932, respectively. Each watermarked image is distorted by scaling, JPEG compression, AWGN and salt & pepper noise attacks. The BER results show that the proposed method outperforms [5] under all the studied attacks, except for the AWGN attack.

Finally, we compare our method with the Vector Logarithmic Quantization Index Modulation (VLQIM) method, proposed by Kalantari and Ahadi [48]. The results of embedding the watermark in the image Lena, under JPEG compression and AWGN attacks, are shown in Table 4.10. In both methods, a 128-bit watermark is embedded with PSNR = 45.70 dB. The mean SSIM value of the watermarked image for GDWM is 0.9988. The BER results show that the proposed method is more robust than the VLQIM method to the additive white Gaussian noise (AWGN) attack, for different values of σ. Under the JPEG compression attack, VLQIM is more robust than GDWM only for quality factors lying in the range [8, 12]. For quality factors outside this range, the proposed GDWM method outperforms VLQIM.

Table 4.10: The BER comparison between the proposed GDWM method and the VLQIM method [48], used for embedding the watermark in image Lena, under AWGN and JPEG compression attacks (Message length = 128 bits, PSNR = 45.70 dB).

Method   AWGN (σ)                 JPEG Compression (Q)
         5      20      35        4        10     16     20
VLQIM    0      10.16   23.44     37.5     3.91   0      0
GDWM     0      2.34    20.31     32.03    6.25   0      0

4.5 Summary

In this chapter, we present a gradient direction quantization-based watermarking scheme. The proposed method embeds the watermark bits in the direction (angle) of significant gradient vectors, at multiple wavelet scales.
To embed the watermark in the gradient direction, we find the gradient vector in terms of the wavelet coefficients in subbands LH, HL and HH. The gradient angle is then quantized by modifying the DWT coefficients that correspond to the gradient vector. To embed the watermark in each gradient angle, the Absolute Angle Quantization Index Modulation (AAQIM) is proposed. To extract the watermark correctly, the decoder should be able to identify the gradient vectors that were watermarked and the embedding order (i.e. which bit was embedded in which vector). To solve these problems, we propose scrambling the positions of the gradient vectors uniformly over the wavelet transform of the image. Increasing the difference in magnitude between the watermarked and the unwatermarked vectors was also proposed, to help identify the watermarked vectors correctly. The simulation results demonstrate that the proposed method yields superior robustness to different types of attacks, in comparison with other state-of-the-art watermarking methods.

Chapter 5

Semi-blind Quality Estimation of Compressed Videos in the Image Derivative Domain

5.1 Introduction

The automatic monitoring of the quality of transmitted media is important, as it allows content providers to optimize their streaming services and bill the end users according to the received quality of service. Different image (and video) Quality Assessment (QA) methods have been proposed in the literature. These methods can be broadly classified into three categories, depending on whether none, some or all of the original image information is present at the receiver. These categories are called no-reference (blind) methods, reduced-reference (semi-blind) methods and full-reference methods [110]. For real-time image quality evaluation over digital multimedia networks, only the no-reference or reduced-reference metrics can be used.
This is because the absence of the reference image at the receiver side makes it impossible to directly compare the received image with the original one. Watermarking forms a promising no-reference or reduced-reference mechanism for image quality assessment [19, 101]. The watermarking methods that have been proposed for quality monitoring share some basic ideas. The watermark is embedded into the original image at the transmitter side, and the image quality is estimated at the receiver side by comparing the embedded and the extracted watermarks. As the embedded watermark and the original image are subject to the same distortions introduced by the transmission channel, the distortions introduced in the image can be estimated by comparing the extracted watermark with the embedded one. Towards this aim, a few watermarking-based quality measurement methods have been proposed in the last few years [17]. These QA methods can be classified into two broad classes: spread-spectrum (SS) and quantization-based methods.

Spread-spectrum (SS) based methods: These methods estimate the quality by inserting a pseudo-random watermark into the host image (or video), using the spread-spectrum (SS) watermarking approach [2, 10, 31, 32, 83]. These methods have been shown to be robust against many types of attacks. To evaluate the quality of MPEG-2 and MPEG-4 compressed videos, Campisi et al. [18, 19] use an additive SS-based watermarking method to embed a fragile watermark in the middle-high frequency coefficients in the DCT domain. In this method, the watermark is embedded in each video frame. Carli et al. [21] have proposed embedding the watermark in the perceptually important areas of the video frames. The motion, contrast and color information of the video frames are used to calculate the perceptual importance of each area. Bossi et al.
[13] use a semi-fragile SS-based watermarking method in the DCT domain to estimate the quality of MPEG-2 compressed video. We have also proposed a blind image quality assessment scheme, based on Watson's Just Noticeable Difference (JND) modulated spread-spectrum watermarking [72]. To render the watermark more imperceptible, Farias et al. [39] have embedded a fragile watermark in the moving areas of the video. By using a spread-spectrum watermarking method in the MPEG-2 DCT domain, they could estimate the quality of compressed video at different bit rates. In another work [22], they propose a no-reference objective metric for assessing the quality of MPEG-2 compressed videos. The semi-fragile watermarks are embedded in the mid-frequency DCT coefficients of each video frame.

Quantization-based (random-binning-like) methods: These methods estimate the quality of the image or video by embedding a watermark using the quantization-based watermarking approach [27, 57, 70]. In this approach, a set of features extracted from the host signal are quantized, so that each watermark bit is represented by a certain quantized feature value. It has been shown that these methods yield a larger watermarking capacity than SS methods [10]. Wang et al. [101, 102] assessed the quality of JPEG images and MPEG-2 compressed videos using quantization-based watermarking of the DWT coefficients. By adjusting the watermarking parameters using automatic control techniques, it was shown that the watermark Bit Error Rate (BER) is correlated with the compressed image or video quality.

Most of the existing watermarking-based QA methods have some of the following common limitations:

1. The video QA methods that embed the watermark in each of the video frames suffer from the temporal flicker artifact. This artifact arises when the changes in the watermark between consecutive video frames are perceptible [111].

2.
The fragile watermark used by some methods is only appropriate for estimating the quality of images that are slightly distorted.

3. The majority of these methods cannot localize the image distortion [45].

4. The SS-based QA methods suffer from the host-interference problem [33]. This is because the host signal itself acts as a source of interference when extracting the watermark, and this may reduce the watermark detector's performance.

5. For the case when the distortion introduced in each frame results from compressing/decompressing the video, existing quantization-based QA methods must send, along with each frame, a separate set of values of the quantization parameters or quality assessment parameters used in that frame. This unfortunately increases the communication overhead.

In this chapter, we propose a new watermarking-based QA method for estimating the quality of videos distorted by H.264/AVC compression/decompression. To address each of the five problems mentioned above, we embed the watermark as follows:

1. To address the problem of temporal flicker impairment (i.e. problem 1 above), the watermarks are inserted in the intra-coded frames (I-frames) only. Since no watermark is embedded in the other frames of a Group of Pictures (GOP), the temporal flicker impairment is highly reduced.

2. To address problem 2, we obtain different scaled versions of each I-frame. We then embed the watermark bits in the gradient vectors of each scaled version of the image. The watermarks embedded at small scales are used to estimate mild distortions, while those embedded in the versions of the image reduced by higher ratios (i.e. at larger scales) are used to measure severe distortions. This is because the watermark bits that are embedded at larger scales are more robust than those embedded at smaller scales.
Therefore, using the multi-scale watermark embedding technique, the frame quality can be estimated for small as well as large distortions.

3. To address problem 3, i.e. to localize the distortion at each scale, each watermark bit is embedded into the gradient vectors of a known block. To insert each watermark bit into each block, the Spread Transform Dither Modulation (STDM) [27] method is employed. The STDM takes advantage of the high watermarking capacity of the quantization-based watermarking approach and the robustness of the SS approach.

4. The use of the STDM method also takes care of problem 4 above, as the host-interference problem does not arise when this method is used.

5. The Bit Error Rate (BER) between the embedded and the extracted watermarks is used as a metric to assess the quality of the video. To estimate the video quality, our method sends the parameters of the average quality assessment curve (i.e. the PSNR-BER or SSIM-BER curve) of the whole video. The receiver calculates the BER, and the PSNR (or SSIM) is then estimated using this PSNR-BER (or SSIM-BER) curve. By sending only the parameters of the quality assessment curve for the whole video, instead of those for every video frame, and by setting the quantization parameters at certain fixed values for all the frames, problem 5 above is solved.

A brief overview of the Spread Transform Dither Modulation (STDM) method is given in Section 5.2. The proposed watermarking-based video quality assessment method is described in Section 5.3. The results are shown and compared in Section 5.5, and conclusions are given in Section 5.6.

5.2 Background

5.2.1 Spread Transform Dither Modulation (STDM)

Figure 5.1: Illustration of STDM. Codebooks U0 and U1 are represented by solid (or ‘0’) and dashed (or ‘1’) lines, respectively.

The STDM is a watermarking method that combines the high watermarking capacity of the quantization-based watermarking approach with the robustness of the SS approach [27].
Let f denote a K-dimensional vector of image features to be watermarked. As shown in Fig. 5.1, the STDM embeds a watermark bit b ∈ {0, 1} in f by quantizing the inner product ρ between f and a spreading sequence s [11]. When the reference sequence s has a unit norm, ρ is the magnitude of the projection of f on s:

ρ = P_f = f · s = Σ_{i=1}^{K} (f)_i (s)_i        (5.1)

where P_f denotes the projection of f on s. To quantize the correlation value ρ, the STDM method uses the Dither Modulation (DM) technique [27]. As shown in Fig. 5.1, the DM associates the following two codebooks, U0 and U1, with the watermark bits 0 and 1, respectively:

U0 = { kΔ + α : k ∈ Z }            (5.2)
U1 = { kΔ + Δ/2 + α : k ∈ Z }      (5.3)

where α is a parameter used as a secret key, k denotes the index of each element in the codebook and Δ is a positive real number called the quantization step size. The watermark bit b ∈ {0, 1} is inserted into ρ using

ρ_w = arg min_{u_{b,k} ∈ U_b} |u_{b,k} − ρ|      (5.4)

where u_{b,k} are the elements of the codebook U_b and ρ_w denotes the watermarked version of ρ. The watermarked feature vector f_w is obtained as

f_w = f − ρ s + ρ_w s                            (5.5)

To extract the watermark bit at the decoder, the correlation between the received watermarked feature vector f′ and the spreading sequence s (i.e. ρ′ = f′ · s) is first calculated. The watermark bit b* is then obtained as

b* = arg min_{b ∈ {0,1}} min_{u_{b,k} ∈ U_b} |u_{b,k} − ρ′|      (5.6)

5.3 The Proposed Video Quality Assessment Method

As we have shown in [71], the watermark bits that are embedded at large wavelet scales of an image are more robust (to channel distortions) than those embedded at small scales. This means that a small distortion may not affect the watermark embedded in the wavelet coefficients of the large wavelet scales. To be able to efficiently estimate the image quality for a wide range of distortions (including small distortions), we should therefore embed not only robust watermark bits, but also fragile and semi-fragile bits, all in one image.
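The STDM embedding and decoding rules of Eqs. (5.1)-(5.6) can be sketched as follows. This is a minimal sketch, assuming a real-valued feature vector and α = 0 by default; the function names are ours.

```python
import numpy as np

def stdm_embed(f, s, bit, delta, alpha=0.0):
    """Embed one bit in feature vector f by quantizing its projection onto
    the (normalized) spreading sequence s (Eqs. 5.1-5.5)."""
    s = s / np.linalg.norm(s)             # unit-norm spreading sequence
    rho = float(f @ s)                    # projection, Eq. (5.1)
    # Nearest element of U_b = {k*delta + offset}, Eq. (5.4):
    offset = alpha + (delta / 2.0) * bit  # U1 is U0 shifted by delta/2
    rho_w = np.round((rho - offset) / delta) * delta + offset
    return f - rho * s + rho_w * s        # Eq. (5.5)

def stdm_extract(f_recv, s, delta, alpha=0.0):
    """Decode the bit whose codebook contains the point closest to the
    received correlation (Eq. 5.6)."""
    s = s / np.linalg.norm(s)
    rho = float(f_recv @ s)
    dists = []
    for b in (0, 1):
        offset = alpha + (delta / 2.0) * b
        q = np.round((rho - offset) / delta) * delta + offset
        dists.append(abs(q - rho))
    return int(np.argmin(dists))
```

A freshly embedded bit always decodes correctly, and any distortion that moves ρ′ by less than Δ/4 cannot flip it; this quantization margin is what makes the step size Δ a robustness knob.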
In this case, the bits embedded in the coefficients of the small, medium and large scales are considered to correspond to fragile, semi-fragile and robust watermark bits, respectively. In the proposed method, fragile, semi-fragile and robust watermark bits are inserted at wavelet scales j = 1, 2 and 3, respectively. A scrambling technique is also proposed to increase the watermark robustness in case large distortions are to be estimated.

5.3.1 Watermark Embedding Method

The proposed watermark embedding scheme is shown in Fig. 5.2. The embedding steps are summarized as follows:

Figure 5.2: The block diagram of the proposed gradient-based STDM watermark embedding scheme.

Step 1 The I-frames of the video sequence are selected for embedding the watermark. As will be shown in subsection 5.5.4, inserting the watermarks into the I-frames reduces the temporal flicker artifact. Equally important is that this also results in faster video watermarking and quality assessment than when every frame is watermarked.

Step 2 The 2-dimensional discrete wavelet coefficients of the image are obtained for different wavelet scales.

Step 3 At each wavelet scale, the version of the image corresponding to this scale is considered. For example, if the image is of size 512 × 512, the scaled versions of the image corresponding to wavelet scales j = 1, 2 and 3 are of size 256 × 256, 128 × 128 and 64 × 64, respectively. The gradient vectors of each scaled version are obtained in terms of the LH, HL and HH wavelet coefficients of the corresponding scale. This is done using the Multi-scale Derivative Transform (MSDT) (see Eq. (3.15)).

Step 4 At each scale, the locations of the gradient vectors of the image are uniformly scrambled, using the proposed scrambling algorithm discussed in subsection 5.3.2. It has been shown that scrambling the gradient vectors leads to a better capacity-imperceptibility-robustness tradeoff [73, 112].
Thus, the scrambling technique yields a more robust (and more imperceptible) watermark that can be used to estimate larger image distortions. The details of the scrambling method are given in subsection 5.3.2.

Step 5 To embed the bits of the watermark in a frame, the gradient field at each scale is partitioned into blocks of size K1 × K2. Since one watermark bit is embedded in each block, the number and size of the blocks depend on the number of bits to be embedded.

Step 6 In each block, one bit of the watermark is embedded in all the gradient vectors of that block, using the Spread Transform Dither Modulation (STDM). To embed a watermark bit in a host vector, the STDM quantizes the correlation between the host vector and a spreading sequence of the same size, using a 1-D quantization lattice [11]. In the proposed method, the correlation of the gradient vectors g = [g_h, g_v]^T of a block (of size K1 × K2) with the spreading sequence s (of size K1 × K2) is obtained. This results in a 2-D correlation vector ρ = [ρ_h, ρ_v]^T, which is quantized using the 2-D quantization lattice shown in Fig. 5.3. In Fig. 5.3, the lattice points corresponding to bits 0 and 1 are marked by × and ◦, respectively.

Figure 5.3: The 2-D quantization lattice used in the 2-D STDM method.

Step 7 The locations of the watermarked gradient vectors at each wavelet scale are descrambled, using the descrambling method associated with the scrambling method in Step 4. The descrambling method is explained in subsection 5.3.2.

Step 8 For each scale used in Step 3, the wavelet coefficients of the watermarked image are obtained in terms of their corresponding watermarked gradient vectors, using the inverse MSDT.

Step 9 Finally, the watermarked image is obtained by applying the inverse wavelet transform to the watermarked wavelet coefficients.

Figure 5.4: (a) Gradient field of image Lena, at wavelet scale 4. (b) Scrambled gradient field.
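The 2-D quantization of Step 6 can be sketched as follows. The exact lattice geometry of Fig. 5.3 is not fully specified here, so this sketch assumes a common construction: the bit-0 lattice is ΔZ², and the bit-1 lattice is the same grid shifted by Δ/2 along both axes; the function names are ours.

```python
import numpy as np

DELTA = 2.0  # quantization step size (illustrative value)

def nearest_lattice_point(rho, bit, delta=DELTA):
    """Nearest point of the assumed lattice for the given bit:
    bit 0 -> delta*Z^2, bit 1 -> delta*Z^2 shifted by delta/2 per axis."""
    offset = (delta / 2.0) * bit
    return np.round((rho - offset) / delta) * delta + offset

def embed_2d(rho, bit, delta=DELTA):
    """Move the 2-D correlation vector rho = [rho_h, rho_v] to the
    nearest point of the lattice associated with the watermark bit."""
    return nearest_lattice_point(np.asarray(rho, dtype=float), bit, delta)

def decode_2d(rho_recv, delta=DELTA):
    """Return the bit whose lattice contains the point closest to the
    received correlation vector."""
    rho_recv = np.asarray(rho_recv, dtype=float)
    d = [np.linalg.norm(nearest_lattice_point(rho_recv, b, delta) - rho_recv)
         for b in (0, 1)]
    return int(np.argmin(d))
```

The minimum distance between the two lattices governs how much the correlation vector may drift under an attack before a bit flips, the 2-D analogue of the Δ/4 margin of 1-D dither modulation.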
5.3.2 Enhancing the Capacity-Imperceptibility-Robustness Tradeoff by Scrambling the Gradient Fields

As mentioned in Step 5, a watermark bit is embedded in a block of the gradient vector field corresponding to the different scaled versions of the image. As shown in [73], a watermark bit that is embedded in a block that has one or more significant gradient vectors (i.e. gradient vectors with large magnitudes) gives a better robustness-imperceptibility tradeoff than a watermark bit embedded in a block that does not contain significant gradient vectors. In natural images, some parts of the image may have many significant gradient vectors, while other parts may have none. Thus, the watermark bits embedded in parts of the image with no significant gradient vectors will be less robust (and more perceptible) than the bits embedded in the other parts. One solution to this problem is to uniformly scramble the locations of the gradient vectors, so that every block contains at least one significant gradient vector. Thus, for the same robustness level, the watermark scrambling technique yields less perceptible watermarks. Fig. 5.4(a) shows the gradient vectors of the image Lena of size 256 × 256, at wavelet level 4. The size of the gradient field is 32 × 32. As shown in Fig. 5.4(b), scrambling the positions of the gradient vectors increases the probability that each block has at least one significant gradient vector.

The scrambling method should be a geometric transform that results in a pseudo-uniform distribution of the locations of the significant gradient vectors. To scramble a gradient field of size N1 × N2, we first reshape the field into a row of gradient vectors of length N1·N2, using a raster scan. The row of vectors is then scrambled using our proposed algorithm, shown in Fig. 5.5(a). Finally, the scrambled row is reshaped into a gradient field of size N1 × N2, using the inverse raster scan. The corresponding descrambling algorithm is shown in Fig. 5.5(b).

(a)
Ensure: y = Scramble(x, S)
  N ← length(x)
  n ← Rand(N, S)    {Generates a pseudo-random sequence of length N, using seed S.}
  p ← SortInd(n)    {First sorts n to form ns, then returns the index of each element of ns in the unsorted sequence n.}
  y[p] ← x

(b)
Ensure: x = Descramble(y, S)
  N ← length(y)
  n ← Rand(N, S)    {Generates a pseudo-random sequence of length N, using seed S.}
  p ← SortInd(n)    {First sorts n to form ns, then returns the index of each element of ns in the unsorted sequence n.}
  q ← SortInd(p)
  x[q] ← y

Figure 5.5: (a) The proposed algorithm used to scramble the vector x, using the seed S. (b) The proposed descrambling algorithm.

5.3.3 Watermark Extraction Method

The watermark bits are extracted by following the embedding steps in reverse, as shown in Fig. 5.6.

Figure 5.6: The block diagram of the proposed gradient-based STDM watermark extraction scheme.

The gradient vectors of each I-frame are first obtained using the forward gradient transform. The gradient fields at the different image scales are then partitioned into non-overlapping blocks of size K1 × K2. The gradient fields are scrambled using the algorithm shown in Fig. 5.5(a). Each watermark bit is then extracted from its block of gradient vectors, using the STDM decoder (see Eq. (5.6)). Note that since the inner product of the 2-D vector field of the received watermarked gradient vectors g′ = g′_h + i g′_v with the 2-D spreading sequence s yields a 2-D correlation vector ρ′ = ρ′_h + i ρ′_v, the 2-D QIM lattice shown in Fig. 5.3 is used to decode the watermark bits.
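The Scramble/Descramble procedures of Fig. 5.5 translate directly into NumPy; this sketch realizes `Rand` with a seeded generator and `SortInd` with `argsort` (both are our choices of concrete primitives, not mandated by the thesis).

```python
import numpy as np

def scramble(x, seed):
    """Fig. 5.5(a): permute x with a key-dependent pseudo-random permutation."""
    x = np.asarray(x)
    n = np.random.default_rng(seed).random(len(x))  # n <- Rand(N, S)
    p = np.argsort(n)                               # p <- SortInd(n)
    y = np.empty_like(x)
    y[p] = x                                        # y[p] <- x
    return y

def descramble(y, seed):
    """Fig. 5.5(b): regenerate the same permutation and invert it."""
    y = np.asarray(y)
    n = np.random.default_rng(seed).random(len(y))  # n <- Rand(N, S)
    p = np.argsort(n)                               # p <- SortInd(n)
    q = np.argsort(p)                               # q <- SortInd(p), the inverse of p
    x = np.empty_like(y)
    x[q] = y                                        # x[q] <- y
    return x
```

For a gradient field, the 2-D array would be flattened with a raster scan (`ravel()`) before scrambling and reshaped back afterwards, as described above. Because `argsort` of a permutation is its inverse, `descramble(scramble(x, S), S)` recovers x exactly.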
5.3.4 Watermark Bit Error Rate Assessment

To estimate the amount of degradation introduced in an I-frame of a watermarked video, we first obtain the bit error rate (BER_j) of the binary watermarks at each scale j. BER_j is the number of watermark bits whose extracted values differ from their embedded values, normalized by the total number of watermark bits embedded at scale j. Our proposed quality metric is the average of the BERs of the watermark bits embedded at the different scales used, i.e.

BER = (1/J) Σ_{j=1}^{J} BER_j        (5.7)

where J denotes the number of scales used to embed the watermarks. To obtain the BER of the whole video, we average the watermark BER values over all the watermarked frames of the video. Note that we expect the BER to increase as the amount of distortion introduced into the video increases. It is well known that by increasing the value of the quantization parameter QP of the video encoder, more distortion is introduced into the image. The quantization parameter controls the compression ratio and the video quality: the higher the QP value, the larger the compression ratio and frame distortion.

Fig. 5.7 illustrates the locations of the watermark bit errors for the case where the watermark bits are embedded in the gradient vectors of an image at the three wavelet scales j = 1, 2, 3. The leftmost, middle and rightmost parts of Fig. 5.7 correspond to scales 1, 2 and 3, respectively. The pixels with bit errors are shown in white. Figs. 5.7(a), 5.7(b) and 5.7(c) show the watermark bit errors after compressing/decompressing the watermarked image by H.264/AVC, using the quantization parameters QP = 10, 20 and 30, respectively.

Figure 5.7: The watermark bit errors at wavelet scales j = 1 (left), j = 2 (middle) and j = 3 (right) of an image distorted by H.264/AVC compression/decompression, using the quantization parameters (a) QP = 10, (b) QP = 20 and (c) QP = 30.
In this case, BER_j is equal to the normalized number of white pixels at scale j. As seen, increasing the quantization parameter QP (i.e. increasing the amount of image distortion) increases the BER at each scale. From this figure, we can also conclude that the bit errors decrease as the scale increases (i.e. the watermark robustness increases with the scale). This will be discussed in more detail in subsection 5.5.1.

5.4 The Quality Assessment Method

Our proposed video Quality Assessment (QA) method estimates the quality of the video frames using the PSNR (or SSIM) metric. The quality scores are obtained as a function of the watermark BER. The video quality is assessed by the following steps:

Step 1 The QA curve (i.e. BER vs PSNR, or BER vs SSIM) is plotted for each video frame before transmission. To obtain the QA curve, the video is distorted. In our experiments, we used H.264/AVC with different values of the Quantization Parameter (QP) to obtain different distortion levels. To obtain the frame QA curve, we compute the values of the PSNR and the BER for different values of the QP. The video QA curve is then obtained by averaging the frame QA curves. The QA curves corresponding to each video frame and the average QA curve are shown in Fig. 5.8. The vertical error bars in the average QA curve indicate the variance of the QA curves over the video frames.

Step 2 The average QA curve is sent to the receiver along with the video. By sending only the QA curve of the whole video, instead of the QA curve of every video frame, the communication overhead is highly reduced. To further save transmission bandwidth, the video QA curve is approximated by a linear function. Thus, only the slope and y-intercept of the linear curve need to be transmitted.
The approximation error depends on the linearity of the average QA curve: a linear QA curve results in a small approximation error, while a non-linear curve results in a large approximation error and thus a large quality estimation error.

Figure 5.8: An example of an average PSNR-BER quality assessment curve. The dots represent the quality curves of each watermarked frame. The black curve is obtained by averaging the quality curves of the watermarked video frames.

Step 3 At the receiver side, the quality measure (i.e. the PSNR or SSIM value) is estimated using the reconstructed QA curve and the BER of the extracted watermark.

The QA curves depend on the values of the watermark block size K1 × K2, the quantization step size Δ and the scale j used to embed the watermark. To find the values of K1, K2, Δ and j that lead to reliable QA curves, we quantify the desired characteristics of a QA curve by defining a Merit Score (MS). The MS is based on the following characteristics of a QA curve:

Distortion range The QA curve should be able to estimate the image quality for a large range of distortions (i.e. both low and high PSNR values); that is, the MS of the QA curve should be proportional to the estimation interval range:

MS ∝ R_PSNR        (5.8)

where R_PSNR denotes the range of the possible PSNR (or SSIM) values of the compressed/decompressed watermarked frame.

Linearity of the QA curve It is not desirable to operate in regions of the QA curve where the PSNR is sensitive to extremely small changes in the BER, such as the saturated regions in the top left or bottom right of Fig. 5.8 (i.e. when 0 ≤ BER < BER_Low or BER_High < BER ≤ 50%). In these regions, a small change in the watermark BER would result in a large PSNR estimation error. To have less saturation, one way is to require the QA curve to be as linear as possible.
As mentioned in Step 2 above, the linearity of the QA curve also saves channel bandwidth, by reducing the side information that needs to be sent along with the video. To measure the linearity of the QA curve, we calculate the correlation coefficient between the BER and the PSNR values. Thus, the merit score of the QA curve should be proportional to the correlation coefficient ρ_PSNR,BER, i.e.

MS ∝ ρ_PSNR,BER        (5.9)

Note that to calculate the correlation coefficient ρ_PSNR,BER, only the BER values in the range [BER_Low, BER_High] are used.

Consistency of the QA curve To reduce the amount of side information (and thus to save bandwidth), only the average QA curve is transmitted with each video. Thus, at the receiver side, the quality of each frame is estimated using only the average QA curve (see Fig. 5.8). Using one QA curve reduces the amount of side information that needs to be sent to the receiver. However, it unfortunately increases the quality estimation error. This error is highly correlated with the standard deviation of the QA curve. To keep the estimation error low, the standard deviation of the QA curve (i.e. the standard deviation of the BER corresponding to each PSNR) should be as small as possible. This can be written as

MS ∝ 1 / (1 + σ̄_BER)        (5.10)

where σ_BER denotes the standard deviation of the BER at a certain PSNR, and σ̄_BER indicates the average of σ_BER over the different PSNRs.

Combining Eqs. (5.8), (5.9) and (5.10), we define the QA merit score (MS) as follows:

MS = (R_PSNR · ρ_PSNR,BER) / (1 + σ̄_BER)        (5.11)

Note that in Eqs. (5.8)-(5.11), the BER is limited to the range [BER_Low, BER_High]. The bigger the MS value, the more efficient the QA curve will be for estimating the frame quality.

Figure 5.9: First frames of the original (top row) and the watermarked (bottom row) test videos “Suzie”, “Carphone”, “Foreman” and “Mobile”. The PSNR of the watermarked frames is fixed at PSNR_w = 46.80 dB.
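The merit score of Eq. (5.11), the linear fit of Step 2 and the receiver-side inversion of Step 3 can be sketched as follows. This is a minimal sketch; the BER_Low/BER_High values and all data below are illustrative, and the absolute value of the correlation coefficient is used since BER decreases as PSNR increases.

```python
import numpy as np

def merit_score(psnr, ber, sigma_ber, ber_low=2.0, ber_high=45.0):
    """QA merit score, Eq. (5.11): PSNR range (Eq. 5.8) times the
    PSNR-BER correlation (Eq. 5.9), over 1 + mean BER std (Eq. 5.10).
    Only samples with BER in [ber_low, ber_high] are used."""
    psnr, ber, sigma_ber = map(np.asarray, (psnr, ber, sigma_ber))
    mask = (ber >= ber_low) & (ber <= ber_high)          # unsaturated region
    r_psnr = psnr[mask].max() - psnr[mask].min()         # R_PSNR
    rho = abs(np.corrcoef(psnr[mask], ber[mask])[0, 1])  # |rho_PSNR,BER|
    return r_psnr * rho / (1.0 + sigma_ber[mask].mean())

def fit_qa_line(psnr, ber):
    """Step 2: linear approximation BER = a*PSNR + b of the average
    QA curve; only (a, b) need be sent along with the video."""
    a, b = np.polyfit(psnr, ber, 1)
    return a, b

def estimate_psnr(ber_measured, a, b):
    """Step 3 (receiver side): invert the linear QA curve."""
    return (ber_measured - b) / a
```

On a perfectly linear synthetic curve the score reduces to the usable PSNR range itself, and the receiver recovers the frame PSNR exactly from the measured watermark BER.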
Other factors besides the ones mentioned above may also be important for assessing the video quality. These include the spatial and temporal localization of the distortion and the consistency of the QA curve for different types of distortions.

5.5 Experimental Results

In this section, the performance of the proposed QA method is experimentally evaluated when different embedding parameters are used. Different video sequences that are compressed/decompressed with the H.264/AVC codec are studied. The tests are performed on the QCIF (176×144) 4:2:0 sequences “Suzie”, “Carphone”, “Foreman” and “Mobile”, at a sampling rate of 30 frames per second (fps). The watermarks are embedded in the gradient vectors of the luma (Y) component of each I-frame, at wavelet scales j = 1, 2, 3. To obtain the gradient vectors, the Haar-FD WGT matrix is used. At each gradient scale, the watermark bits are embedded in blocks of gradient vectors, using the STDM method. Zero-mean white Gaussian pseudo-random noise is used as the spreading sequence of the STDM method. The value of the quantization step size ∆_j at each scale is chosen such that the watermark is imperceptible and the QA merit score (MS) is maximized.

Before distorting the video by H.264/AVC, we measure the watermark invisibility using PSNR_w (or SSIM_w). The invisibility of the watermark is guaranteed by keeping the PSNR_w and SSIM_w of the watermarked videos larger than 46 dB and 0.9950, respectively. The first I-frames of the original and the corresponding watermarked test videos are shown in Fig. 5.9. As seen, the watermark is imperceptible in all the test video frames. To obtain the H.264/AVC compressed video, the JM 18.2 reference software is used with the following configuration: ProfileIDC=77 (main profile), LevelIDC=40 and IntraPeriod=10.
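The role of the quantization step size ∆_j can be illustrated with a minimal scalar quantization-index-modulation sketch. This is a deliberate simplification of the 2-D STDM used in the thesis (no spreading sequence, no block projection), and all names are illustrative. The embedded bit survives any perturbation smaller than ∆/4, which is why a larger step size yields a QA curve suited to larger distortions, at the cost of a more perceptible watermark.

```python
import numpy as np

def qim_embed(x, bit, delta):
    """Embed one bit in sample x by quantizing it to one of two
    interleaved lattices of step `delta` (scalar QIM sketch)."""
    offset = 0.0 if bit == 0 else delta / 2.0
    return np.round((x - offset) / delta) * delta + offset

def qim_extract(y, delta):
    """Decode the bit as the nearer of the two lattices."""
    d0 = abs(y - np.round(y / delta) * delta)
    d1 = abs(y - (np.round((y - delta / 2) / delta) * delta + delta / 2))
    return 0 if d0 <= d1 else 1

# The same fixed distortion (0.05) flips the bit when delta is small
# (0.05 > delta/4) but leaves it decodable when delta is large.
x = 3.17
for delta in (0.1, 0.4):
    y = qim_embed(x, 1, delta)
    print(delta, qim_extract(y + 0.05, delta))
```

Sweeping the distortion strength while counting decoding errors over many samples traces out exactly the kind of BER-vs-distortion curve used as a QA curve here.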
5.5.1 The Effect of Quantization Step Size on QA Curves

In this subsection, we investigate the effect of ∆_j, the STDM quantization step size, on the QA curve when the watermark is inserted in only one of the J scaled versions of the image. Figs. 5.10(a), 5.10(b) and 5.10(c) show the QA curves of the “Foreman” video sequence when the watermark is inserted at only one of the scales j = 1, 2, 3, respectively. For each curve, the PSNR_c (the PSNR of the video after introducing the distortion) and the watermark BER are obtained by changing the value of the quantization parameter (QP) in the range [2, 44]. At each scale j, the watermarks are embedded using different values of the quantization step size ∆_j. As seen, by increasing the quantization step size (i.e. increasing the image distortion caused by embedding the watermark) at each scale, the QA curve becomes more suitable for estimating the smaller PSNR_c values (i.e. the ones with larger distortions). This is because a large ∆_j renders the watermark more robust to large distortions. However, a large ∆_j may yield a more perceptible watermark. Comparing the QA curves at scales j = 1, 2, 3 shows that at a fixed PSNR_w and a fixed PSNR_c, the BER decreases as j increases.
That is, the watermarks embedded at large scales are more robust than those embedded at small scales. Thus, the QA curves obtained using larger scales are more suitable for estimating the quality of highly distorted videos, while the QA curves obtained at smaller scales are more suitable for estimating the quality of videos with small distortions. Therefore, we use multiple scales to embed the watermark.

Figure 5.10: The PSNRc-BER quality assessment (QA) curves obtained for the “Foreman” video sequence at gradient scales j = 1, 2, 3, using different values of the quantization step size ∆_j. PSNR_w and PSNR_c denote the PSNRs of the watermarked video before and after H.264/AVC compression/decompression, respectively.

Figure 5.11: The PSNRc-BER quality assessment (QA) curves obtained for the “Foreman” video sequence by embedding the watermark bits in blocks of size [3, 2], [6, 4] and [12, 8].

5.5.2 The Effect of Gradient Block Size on QA Curves

In this subsection, we investigate the effect of changing the watermarking block size (i.e.
K1 × K2) on the QA curve. Fig. 5.11 shows the QA curves obtained using blocks of size [3, 2], [6, 4] and [12, 8]. As an example, only the curves corresponding to scale j = 1 are shown. The PSNR_w is fixed at 55.88 dB. As seen, a large block size yields smaller BERs, i.e. a more robust watermark. However, this higher robustness is obtained at the expense of less image distortion localization [45]. Besides, as shown in Fig. 5.11, a large block size results in a higher standard deviation σ of the QA curve (see Eq. (5.10)). This is probably because the number of blocks (and thus the number of watermark bits) used to calculate the BER decreases as the block size increases. Thus, the amount of uncertainty in calculating the BER is higher when larger block sizes are used.

To quantitatively compare the performance of the QA curves corresponding to each block size, we use the Merit Score (MS) (see Eq. (5.11)). To calculate the MS, we limit the BER to the interval 1%-49%. The MS values of the PSNRc-BER curves for the blocks [3, 2], [6, 4] and [12, 8] are 4.06, 2.58 and 2.11, respectively. Therefore, [3, 2] is the best block size for the proposed QA method.

5.5.3 Multi-scale Watermarking for Quality Assessment

In this subsection, we evaluate and compare the performance of the proposed QA scheme in two scenarios: 1) the watermark is embedded in the gradient vectors of only one of the scaled versions of the image; 2) the watermark is embedded in the gradient vectors at multiple scales. As seen in Subsection 5.5.1, embedding the watermark at one single scale is only useful for evaluating the image quality over a small range of distortions. The QA curves corresponding to small scales could be used to estimate the small distortions, whereas the QA curves corresponding to large scales could be used to estimate large distortions.
Our solution to the problem of assessing the quality over a large range of distortions is to take the average of the QA curves at scales j = 1, 2, 3, by averaging the BER_j values at each PSNR_c (see Eq. (5.7)). The PSNRc-BER curves obtained for four different test video sequences at scales j = 1, 2, 3 are shown in Fig. 5.12. For each video sequence, the average PSNR of the watermarked I-frames (before introducing the distortion) is fixed at PSNR_w = 46.85 dB. In all the videos, the quantization step sizes at scales j = 1, 2, 3 are set to ∆1 = 0.0550, ∆2 = 0.15 and ∆3 = 0.4250, respectively. This parameter setting results in the highest merit score for each video sequence. As seen, the average PSNRc-BER curves (marked by dashed lines) behave more linearly than each individual PSNRc-BERj curve.

The MS values of the test video sequences are given in Table 5.1. MS_{j=1}, MS_{j=2} and MS_{j=3} denote the MS values of the QA curves at scales j = 1, 2, 3, respectively, and MS_{j=1,2,3} is the MS of the average QA curve. As seen, the average PSNRc-BER curves give higher MS scores than each individual single-scale QA curve.

We also evaluate the efficiency of the proposed method in estimating the SSIM of each frame. SSIM is employed due to its compatibility with the human visual system [107]. Fig. 5.13 shows the SSIMc-BER curves of the “Foreman” video sequence.
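The averaging of the per-scale QA curves, and the receiver-side use of the averaged curve, can be sketched as follows. The per-scale BER values below are synthetic placeholders (the real curves come from the H.264/AVC runs), and the function names are illustrative.

```python
import numpy as np

# Synthetic per-scale QA curves: BER_j sampled at common PSNR_c values.
psnr_c = np.array([30.0, 35.0, 40.0, 45.0, 50.0])
ber_by_scale = {
    1: np.array([50.0, 48.0, 35.0, 15.0, 2.0]),   # small scale: fragile
    2: np.array([48.0, 40.0, 25.0, 10.0, 1.0]),
    3: np.array([40.0, 28.0, 15.0, 5.0, 0.5]),    # large scale: robust
}

# Average QA curve: mean of the BER_j values at each PSNR_c
# (cf. Eq. (5.7)).
avg_ber = np.mean(list(ber_by_scale.values()), axis=0)

def estimate_psnr(measured_ber):
    """Receiver side: invert the (monotonically decreasing) average
    curve by linear interpolation of the measured average BER."""
    # np.interp requires increasing x values, so flip both arrays.
    return float(np.interp(measured_ber, avg_ber[::-1], psnr_c[::-1]))
```

Since the average curve is monotonically decreasing, linear interpolation gives a unique PSNR estimate for any measured average BER inside the curve's range.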
As seen, the average SSIMc-BER curve (marked by the dashed line) is not as linear as the average PSNRc-BER curve shown in Fig. 5.12. However, the QA curve is monotonic and thus can be converted to a linear curve using a nonlinear regression technique. To quantitatively compare the performance of the QA curves, we use the MS metric; the results for the four video sequences “Suzie”, “Carphone”, “Foreman” and “Mobile” are given in Table 5.2.

Figure 5.12: PSNRc-BER curves obtained for the video sequences “Suzie”, “Carphone”, “Foreman” and “Mobile” at scales j = 1, 2, 3. The average PSNRc-BER curves are marked by dashed lines.

Table 5.1: MS values of the PSNRc-BER curves, obtained to estimate the quality of the test video sequences after H.264/AVC compression.

              Suzie   Carphone   Foreman   Mobile
MS_{j=1}       2.86       3.42      3.35     3.87
MS_{j=2}       2.04       2.82      2.55     2.42
MS_{j=3}       1.21       1.98      1.39     1.46
MS_{j=1,2,3}   7.24       8.17      8.72    11.76

Figure 5.13: SSIMc-BER curves of the video sequence “Foreman” at scales j = 1, 2, 3. The average SSIMc-BER curve is marked by the dashed line.
As seen, the “Mobile” video sequence has the best MS value. This is probably because of the large amount of spatial detail and object motion in this sequence. “Suzie” has the lowest MS among the tested video sequences.

5.5.4 The Effect of GOP Length on Temporal Flicker Impairment

In this subsection, we investigate the effect of varying the number of frames in the Group of Pictures (GOP) on the temporal flicker effect. We then find a suitable GOP length for our proposed QA method by taking into account both the flicker effect and the temporal localization.

Table 5.2: MS values of the SSIMc-BER curves, obtained to estimate the quality of the test video sequences after H.264/AVC compression.

               Suzie   Carphone   Foreman   Mobile
MS_{j=1}      0.0067     0.0063    0.0067   0.0023
MS_{j=2}      0.0146     0.0123    0.0169   0.0077
MS_{j=3}      0.0180     0.0246    0.0220   0.0246
MS_{j=1,2,3}  0.0503     0.0432    0.0601   0.0716

Temporal localization of the video quality is the ability to estimate the quality over small time intervals. The temporal flicker effect results from perceptual changes of the watermark between consecutive video frames. To quantitatively measure the temporal flicker of each watermarked test video, the metric proposed in [111] is used. Fig. 5.14 shows the average flicker curves of the test videos after inserting the watermark into the I-frames. As seen, by increasing the GOP length N_GOP, the flicker is reduced in all the test videos. However, the larger the value of N_GOP, the less the temporal localization will be. Thus, the N_GOP value should be chosen based on the tradeoff between the temporal flicker and the temporal quality localization. The examination of the Flicker-N_GOP curves in Fig. 5.14 shows that the flicker metric does not change much in the range N_GOP ≥ 10. This leads us to the conclusion that N_GOP = 10 is an appropriate GOP length, in terms of both keeping the temporal flicker small and the temporal localization high.
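The tradeoff above can be made concrete. With watermarks embedded only in I-frames, one quality estimate is produced per GOP, so at the 30 fps rate used here a GOP length of 10 yields a quality report roughly every third of a second; the helper name is illustrative.

```python
def quality_report_interval(fps, n_gop):
    """With watermarks only in I-frames, one quality estimate is
    produced per GOP, i.e. every n_gop / fps seconds."""
    return n_gop / fps

# At 30 fps, the chosen GOP length of 10 gives a quality estimate
# roughly every third of a second.
print(quality_report_interval(30, 10))   # ~0.333 s
```

Doubling N_GOP halves the reporting rate, which is the loss of temporal localization weighed against the reduced flicker.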
Note that if N_GOP < 10, to decrease the flicker effect, the watermark strength (i.e. the quantization step size) should be reduced. This, however, results in a less robust watermark, which is only appropriate for estimating the quality when the distortion is small.

Figure 5.14: The flicker metric vs. the GOP length (N_GOP) curves, obtained for the watermarked test videos “Suzie”, “Carphone”, “Foreman” and “Mobile”.

5.6 Summary

In this chapter, a novel reduced-reference method for assessing the quality of H.264/AVC compressed video is proposed. The proposed method embeds known semi-fragile watermarks into the I-frames of the video sequence. Based on the watermark bit errors introduced by H.264/AVC compression/decompression, the quality of each frame is estimated at the receiver side. The watermark bits are embedded in the multi-scale gradient domain, using two-dimensional spread transform dither modulation (2-D STDM). The STDM method is used because of its favorable capacity-robustness-imperceptibility tradeoff.

The quality assessment method finds the BER of the watermark bits at the receiver side and estimates the video quality (PSNR or SSIM) using the QA curves. To assess the effectiveness of the quality curves in estimating the quality and to find the best watermarking parameters, a new merit score (MS) is proposed. The linearity of the quality assessment curve, the possible estimation range and the consistency of the curve across the video frames are the three factors taken into account in the calculation of the proposed merit score. The experimental results show that embedding the watermark in the multi-scaled versions of the image yields a better merit score than embedding in one scale only.
The watermark embedded at small gradient scales is used to estimate the small distortions, while the watermark embedded at large scales is used to estimate the large distortions. It is also shown that embedding the watermark using a GOP length of 10 yields a good tradeoff between the temporal flicker and the temporal localization.

In the future, we plan to estimate the video quality for other types of channel distortions, such as AWGN noise and packet loss. Some of the challenges of this project are: 1) to find out the type of the distortions affecting the video; 2) to find a method that estimates the quality for a specific type of distortion while being robust to the other types; and 3) to estimate the quality in terms of other Human Visual System (HVS) based video quality metrics, such as VQM [80] and MPQM [100].

Chapter 6

Conclusions and Future Work

6.1 Research Significance and Contributions

Estimation of the first and higher-order image derivatives is an essential task in many image processing and computer vision applications. The aim of this thesis is twofold: the first is to address the main challenges faced by image derivative-based methods, and the second is to apply these methods to real-life problems. Towards that aim, the objectives of this thesis are:

1. To accurately estimate both the gradient direction and magnitude in noisy images, especially in noisy color images.

2. To efficiently compute the derivatives at multiple scales of an image.

3. To find an invertible image derivative transform that is able to map the image derivatives from the derivative domain to the image domain.

4. To show that this transform can be applied to image watermarking.

5. To use this transform for video quality monitoring.

In this thesis, we deal with these problems and propose novel methods to solve them. The main contributions of this thesis are itemized as follows:

1.
A method for the accurate and robust estimation of both the gradient magnitude and direction. This method, presented in Chapter 2, addresses Objective 1 above. It robustly estimates the gradient in the x and y directions. The robustness against noise is achieved by prefiltering and postfiltering the gradient in each direction. To reduce the edge blurring introduced by these filters, the gradient in a certain direction is obtained by applying the prefilter and postfilter in the perpendicular direction. The basic elements employed in each window are highpass, lowpass and aggregation operators. The highpass operator is used as a gradient estimator, the lowpass operator is used for prefiltering and postfiltering, and the aggregation operator aggregates the prefiltered and postfiltered gradients. Four different combinations of highpass, lowpass and aggregation operators are proposed: MVD-Median-Mean, MVD-Median-Max, RCMG-Median-Mean and RCMG-Median-Max. For estimating the gradient magnitude and direction at each pixel, our best performing estimator is the RCMG-Median-Mean, which combines the RCMG (as a highpass filter), the median (as a lowpass filter) and the mean (as an aggregation operator). The advantages of the RCMG-Median-Mean gradient estimator are summarized as follows:

• RCMG-Median-Mean outperforms other recently proposed color gradient estimators and edge detectors.

• Since the RCMG-Median-Mean method applies the RCMG and median operators row-wise and column-wise (and not to all the color vectors together), it is computationally more efficient than the state-of-the-art gradient estimators. Applying the operators row-wise and column-wise also improves the performance of the method in corner detection.

• For noiseless images, the RCMG-Median-Mean yields more accurate estimates of the gradient direction than other estimators (such as RCMG and MVD).
• Since, for each window, the gradient in each direction is obtained by applying the median filter in the perpendicular direction, the RCMG-Median-Mean method does not significantly blur the edges.

• Finally, the RCMG-Median-Mean method is robust to both Gaussian and salt-and-pepper noise.

2. Multi-scale Derivative Transform (MSDT). This transform, derived in Chapter 3, addresses Objectives 2 and 3 above. It is a multi-scale and invertible derivative transform. To calculate the first and higher-order image derivatives, the MSDT uses the detail wavelet coefficients of the image. Unlike traditional wavelet-based image derivative estimators that use only the horizontal and vertical wavelet coefficients, the proposed transform maps the diagonal as well as the horizontal and vertical wavelet coefficients to the horizontal and vertical derivatives of the image. The inverse transform is designed such that any change in the image derivative domain results in the minimum possible change in the wavelet coefficients. Applications of this transform to copyright protection and video quality monitoring are discussed in Chapters 4 and 5, respectively.

3. A novel image watermarking scheme. This method, called Gradient Direction Watermarking (GDWM), is presented in Chapter 4. It is described as follows:

(a) We first show that the significant gradient vectors of an image are good image features for embedding the watermark. This is because embedding the watermark in the significant gradient vectors renders the watermark both imperceptible and robust to many types of attacks, such as image compression and AWGN noise.

(b) To make the watermark robust to image intensity scaling attacks, we embed the watermark in the angles of the significant gradient vectors. To embed the watermark in each gradient angle, Absolute Angle Quantization Index Modulation (AAQIM) is proposed.
(c) The watermark bits are embedded by quantizing the angles of significant gradient vectors at multiple wavelet scales. To obtain these angles, we obtain the gradient vectors at different image scales using the MSDT, which is proposed in Chapter 3.

(d) To extract the watermark correctly, the decoder should be able to identify the gradient vectors that were watermarked and the embedding order (i.e. which bit was embedded in which vector). To solve this problem, we propose scrambling the locations of the gradient vectors uniformly over the wavelet transform of the image. This technique, explained in more detail in Chapter 4, also increases the watermark robustness.

The proposed watermarking scheme has the following advantages: 1) increased invisibility of the embedded watermark, because the watermark is embedded in significant gradient vectors; 2) robustness to amplitude scaling attacks, because the watermark is embedded in the angles of the gradient vectors; and 3) increased watermarking capacity, as the scheme uses multiple-scale embedding. It is shown that the proposed GDWM outperforms other watermarking methods and is robust to a wide range of attacks, e.g. Gaussian filtering, amplitude scaling, median filtering, sharpening, JPEG compression, Gaussian noise, salt-and-pepper noise and scaling.

4. A novel reduced-reference video quality assessment method to monitor the quality of compressed/decompressed videos. This method is another application of the MSDT and is proposed in Chapter 5. The proposed method embeds known semi-fragile watermarks into the I-frames of the video sequence. Based on the watermark bit errors introduced by compression/decompression (in this thesis, we study those introduced by H.264/AVC), the quality of each frame is estimated at the receiver side. The watermark bits are embedded in the multi-scale derivative domain, using two-dimensional spread transform dither modulation (2-D STDM) [27].
The STDM method is used because of its high watermarking capacity and robustness to distortions. To find the best watermarking parameters in the proposed video quality monitoring method, a merit score (MS) is proposed. The linearity of the quality assessment curve, the possible quality estimation range and the consistency of the curve across the video frames are the three factors taken into account in the calculation of the proposed merit score. Experimental results show that embedding the watermark in the multi-scaled versions of the image yields a better merit score than embedding in one scale only. The watermark embedded in the angles of the gradient vectors at small scales is used to estimate the small distortions, while the watermark embedded at large scales is used to estimate the large distortions.

6.2 Directions for Future Work

The directions for future work can be itemized as follows:

• Color image denoising using robust gradient estimates: The robust gradient estimation method proposed in Chapter 2 can be combined with bilateral filtering to yield more noise reduction in a smaller number of iterations than when bilateral filtering is used alone. This is because the result of bilateral filtering relies heavily on robust estimation of local pixel differences. Since in noisy images the pixel differences are also noisy, the bilateral filter may need more iterations to reduce the image noise. By using our proposed method to robustly estimate the image gradient at each pixel, we expect the bilateral filter to reduce more noise in a smaller number of iterations.

• Testing the performance of other nondirectional difference operators in the proposed color gradient estimation framework: As mentioned in Chapter 2, the proposed nonlinear color gradient operator uses a nondirectional difference operator to obtain a directional difference operator.
Different directional gradient estimators are proposed in Chapter 2 using only two nondirectional operators, RCMG and MVD. One direction for future work is to test the performance of other nondirectional difference operators in the proposed framework.

• Employing MSDT in other applications: The Multi-scale Derivative Transform (MSDT) proposed in Chapter 3 can be used in other applications such as image contrast enhancement, scale-invariant feature extraction, image denoising and matching.

• Using non-separable directional transforms to calculate the gradient: To calculate the first and higher-order image derivatives, the MSDT uses the detail wavelet coefficients of the image. Since the wavelet transform is not rotation-invariant, the gradient vectors obtained using the MSDT will be rotation-variant. This may cause problems in applications where robustness (or invariance) to image rotation is necessary. To overcome this problem, non-separable directional transforms, such as the contourlet, curvelet and ridgelet transforms, should be used to calculate the image gradients at different scales.

• Video quality monitoring for a wide range of distortions: A challenging direction for video quality monitoring is to propose a method to estimate the quality of a video distorted by different types of channel distortions, such as AWGN noise and packet loss. Some of the challenges of this work are: 1) to find out the type of the distortions affecting the video; 2) to find a method that estimates the quality for a specific type of distortion while being robust to the other types.

Bibliography

[1] URL: http://r0k.us/graphics/kodak/kodim08.html. → pages 28, 54

[2] Locally optimum detection for Barni's multiplicative watermarking in DWT domain. Signal Proces., 88(1):117-130, 2008. → pages 89

[3] Corner detection based on gradient correlation matrices of planar curves. Pattern Recognition, 43(4):1207-1223, 2010. → pages 1

[4] A. Abrardo and M. Barni.
Orthogonal dirty paper coding for informed data hiding. Security, Steganography, and Watermarking of Multimedia Contents VI, 5306(1):274-285, 2004. → pages 58

[5] M. Akhaee, S. Sahraeian, and F. Marvasti. Contourlet-based image watermarking using optimum detector in a noisy environment. IEEE Trans. on Image Proces., 19(4):967-980, Apr. 2010. → pages xi, 57, 84, 85, 86

[6] S. Ando. Image field categorization and edge/corner detection from gradient covariance. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(2):179-190, Feb. 2000. → pages 1

[7] P. Bao and X. Ma. Image adaptive watermarking using wavelet domain singular value decomposition. IEEE Trans. on Circ. and Sys. for Video Tech., 15(1):96-102, 2005. → pages 58

[8] V. Barnett. The ordering of multivariate data. Journal of the Royal Statistical Society, Series A (General), 139(3):318-355, 1976. → pages 19

[9] M. Barni, F. Bartolini, and A. Piva. Improved wavelet-based watermarking through pixel-wise masking. IEEE Trans. on Image Proces., 10(5):783-791, May 2001. → pages xiii, xv, 7, 62, 63

[10] M. Barni, F. Bartolini, A. De Rosa, and A. Piva. Optimum decoding and detection of multiplicative watermarks. IEEE Trans. on Signal Proces., 51(4):1118-1123, Apr. 2003. → pages 57, 58, 89, 90

[11] F. Bartolini, M. Barni, and A. Piva. Performance analysis of ST-DM watermarking in presence of nonadditive attacks. IEEE Transactions on Signal Processing, 52(10):2965-2974, Oct. 2004. → pages 92, 96

[12] N. Bi, Q. Sun, D. Huang, Z. Yang, and J. Huang. Robust image watermarking based on multiband wavelets and empirical mode decomposition. IEEE Trans. on Image Proces., 16(8):1956-1966, Aug. 2007. → pages xi, 57, 84

[13] S. Bossi, F. Mapelli, and R. Lancini. Semi-fragile watermarking for video quality evaluation in broadcast scenario. ICIP 2005, 1:I-161-4, Sept. 2005. → pages 89

[14] A. Bovik, T. Huang, and D. Munson, Jr. A robust wavelet-based watermarking algorithm using edge detection. Int.
Journal of Comp. Sci., 2(3):206-211, Dec. 2007. → pages 58

[15] A. Bovik, T. Huang, and D. Munson, Jr. Robust wavelet-based video watermarking using edge detection. Advances in Elec. Eng. and Comp. Sci., 39:173-182, 2009. → pages 58

[16] B. A. Bradley. Improvement to CDF grounded lattice codes. Security, Steganography, and Watermarking of Multimedia Contents VI, 5306(1):212-223, 2004. → pages 58

[17] T. Brandao. Image communication quality assessment based on watermarking, an overview. Instituto de Telecomunicações, Tech. Rep., v0.02, 2005. → pages 89

[18] P. Campisi, M. Carli, G. Giunta, and A. Neri. Tracing watermarking for multimedia communication quality assessment. IEEE International Conference on Communications, 2:1154-1158, 2002. → pages 89

[19] P. Campisi, M. Carli, G. Giunta, and A. Neri. Blind quality assessment system for multimedia communications using tracing watermarking. IEEE Transactions on Signal Processing, 51(4):996-1002, Apr. 2003. → pages 88, 89

[20] J. Canny. A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8(6):679-698, Nov. 1986. → pages 44

[21] M. Carli, M. Farias, E. Drelie Gelasca, R. Tedesco, and A. Neri. Quality assessment using data hiding on perceptually important areas. IEEE International Conference on Image Processing, 3:III-1200-3, Sept. 2005. → pages 89

[22] M. Carli, M. Farias, E. Drelie Gelasca, R. Tedesco, and A. Neri. Quality assessment using data hiding on perceptually important areas. IEEE International Conference on Image Processing, 3:III-1200-3, Sept. 2005. → pages 89

[23] T. Carron and P. Lambert. Color edge detector using jointly hue, saturation and intensity. Proceedings of IEEE International Conference on Image Processing, 3:977-981 vol. 3, Nov. 1994. → pages 4

[24] J. Caviedes and F. Oberti. A new sharpness metric based on local kurtosis, edge and energy information. Signal Processing: Image Communication, 19(2):147-161, 2004. → pages 9

[25] T. Chan, A.
Marquina, and P. Mulet. High-order total variation-based image restoration. SIAM J. Sci. Comput., 22:503-516, Feb. 2000. ISSN 1064-8275. → pages 2, 37

[26] D. Chandler and S. Hemami. VSNR: A wavelet-based visual signal-to-noise ratio for natural images. IEEE Transactions on Image Processing, 16(9):2284-2298, Sept. 2007. → pages 9

[27] B. Chen and G. Wornell. Quantization index modulation: a class of provably good methods for digital watermarking and information embedding. IEEE Trans. on Information Theory, 47(4):1423-1443, May 2001. → pages 57, 61, 90, 91, 92, 93, 117

[28] G.-H. Chen, C.-L. Yang, and S.-L. Xie. Gradient-based structural similarity for image quality assessment. 2006 IEEE International Conference on Image Processing, pages 2929-2932, Oct. 2006. → pages 53

[29] L.-H. Chen and J.-J. Lin. Mean quantization based image watermarking. Image and Vision Computing, 21(8):717-727, 2003. ISSN 0262-8856. → pages 58

[30] M.-J. Chen and A. Bovik. No-reference image blur assessment using multiscale gradient. International Workshop on Quality of Multimedia Experience, pages 70-74, July 2009. → pages 1

[31] Q. Cheng and T. Huang. An additive approach to transform-domain information hiding and optimum detection structure. IEEE Trans. on Multimedia, 3(3):273-284, Sep. 2001. → pages 57, 89

[32] I. Cox, J. Kilian, F. Leighton, and T. Shamoon. Secure spread spectrum watermarking for multimedia. IEEE Trans. on Image Proces., 6(12):1673-1687, Dec. 1997. → pages 57, 89

[33] I. Cox, M. Miller, and A. McKellips. Watermarking as communications with side information. Proceedings of the IEEE, 87(7):1127-1141, July 1999. → pages 90

[34] A. Cumani. Edge detection in multispectral images. Comput. Vis. Graph. Image Process.: Graphical Models Image Processing, 53, 1991. → pages 4

[35] G. Deng. A generalized unsharp masking algorithm. IEEE Transactions on Image Processing, 20(5):1249-1261, May 2011. → pages 2, 37

[36] S. Didas, J. Weickert, and B. Burgeth.
Properties of higher order nonlinear diffusion filtering. Journal of Mathematical Imaging and Vision, 35:208-226, 2009. → pages 2, 37

[37] J. J. Eggers, R. Baeuml, and B. Girod. Estimation of amplitude modifications before SCS watermark detection. Security and Watermarking of Multimedia Contents IV, 4675(1):387-398, 2002. → pages 58

[38] A. N. Evans and X. U. Liu. A morphological gradient approach to color edge detection. IEEE Transactions on Image Processing, 15(6):1454-1463, June 2006. → pages 4, 18, 21, 22

[39] M. Farias, M. Carli, A. Neri, and S. K. Mitra. Video quality assessment based on data hiding driven by optical flow information. Proc. SPIE, 5294:190-200, 2003. → pages 89

[40] R. Garnett, T. Huegerich, C. Chui, and W. He. A universal noise removal algorithm with an impulse detector. IEEE Transactions on Image Processing, 14(11):1747-1754, Nov. 2005. → pages 4

[41] L. Ghouti, A. Bouridane, M. Ibrahim, and S. Boussakta. Digital image watermarking using balanced multiwavelets. IEEE Trans. on Signal Proces., 54(4):1519-1536, 2006. → pages 57

[42] W.-B. Goh and K.-Y. Chan. The multiresolution gradient vector field skeleton. Pattern Recognition, 40(4):1255-1269, 2007. → pages 1

[43] M. R. Hajiaboli. An anisotropic fourth-order diffusion filter for image noise removal. Int. J. Comput. Vision, 92:177-191, Apr. 2011. ISSN 0920-5691. → pages 2, 37

[44] C. Harris and M. Stephens. A combined corner and edge detector. Proceedings of the 4th Alvey Vision Conference, pages 147-151, 1988. → pages 1

[45] M. J. Holliman and M. M. Yeung. Watermarking for automatic quality monitoring. Volume 4675, pages 458-469. SPIE, 2002. → pages 90, 107

[46] Y. Hu, H.-K. Lee, K. Chen, and J. Li. Difference expansion based reversible data hiding using two embedding directions. IEEE Transactions on Multimedia, 10(8):1500-1512, Dec. 2008. → pages 2, 37

[47] D. E. M. J. N. Ellinas. A robust watermarking scheme based on edge detection and contrast sensitivity function. Proc.
of Int. Conf. Comp. Vision Theory and Applications, 2007. → pages 58 [48] N. Kalantari and S. Ahadi. A logarithmic quantization index modulation for perceptually better data hiding. IEEE Trans. on Image Proces., 19(6): 1504 –1517, 2010. → pages xii, 58, 84, 86 [49] N. Kalantari, S. Ahadi, and M. Vafadust. A robust image watermarking in the ridgelet domain using universally optimum decoder. IEEE Trans. on Circ. and Sys. for Video Tech., 20(3):396 –406, 2010. → pages 57 [50] T. Kanade. Image understanding research at CMU. Proc. Image Understanding Workshop, 2:32–40, 1987. → pages 4, 23, 33 [51] X. Kang, J. Huang, and W. Zeng. Improving robustness of quantization-based image watermarking via adaptive receiver. IEEE Trans. on Multimedia, 10(6):953 –959, Oct. 2008. → pages 82 [52] M. Kass, A. Witkin, and D. Terzopoulos. Snakes: Active contour models. International Journal Of Computer Vision, 1(4):321–331, 1988. → pages 1 [53] C. S. Kenney, B. S. Manjunath, M. Zuliani, G. Hewer, and A. V. Nevel. A condition number for point matching with application to registration and post-registration error estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25:1437–1454, 2004. → pages 1 [54] Y.-W. Kim and I.-S. Oh. Watermarking text document images using edge direction histograms. Pattern Recogn. Lett., 25(11):1243–1251, 2004. → pages 58 124 [55] U. Kthe. Edge and junction detection with an improved structure tensor. Pattern Recognition, 2781:25–32, 2003. → pages 1 [56] D. Kundur and D. Hatzinakos. Digital watermarking for telltale tamper proofing and authentication. Proc. of the IEEE, 87(7):1167 –1180, Jul. 1999. → pages 57 [57] D. Kundur and D. Hatzinakos. Diversity and attack characterization for improved robust watermarking. IEEE Trans. on Signal Proces., 49(10): 2383 –2396, Oct. 2001. → pages 57, 90 [58] G. Langelaar, I. Setyawan, and R. Lagendijk. Watermarking digital image and video data. a state-of-the-art overview. IEEE Signal Proces. 
Mag., 17 (5):20 –46, Sep. 2000. → pages 7 [59] B. Li and S. Acton. Active contour external force using vector field convolution for image segmentation. IEEE Transactions on Image Processing, 16(8):2096 –2106, Aug. 2007. → pages 1 [60] Y. Liang, J. Apostolopoulos, and B. Girod. Analysis of packet loss for compressed video: does burst-length matter? IEEE International Conference on Acoustics, Speech, and Signal Processing, (ICASSP ’03)., 5: 684–7, April 2003. → pages 10 [61] T. Liu, H. Zhou, F. Lin, Y. Pang, and J. Wu. Improving image segmentation by gradient vector flow and mean shift. Pattern Recogn. Lett., 29:90–95, January 2008. ISSN 0167-8655. → pages 1 [62] D. G. Lowe. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vision, 60:91–110, Nov. 2004. ISSN 0920-5691. → pages 1 [63] L. Lu and X. Lu. Quality assessing of video over a packet network. In DMAMH ’07: Proceedings of the Second Workshop on Digital Media and its Application in Museum & Heritage, pages 365–369, Washington, DC, USA, 2007. IEEE Computer Society. → pages 10 [64] L. Lucchese and S. Mitra. A new class of chromatic filters for color image processing. theory and applications. IEEE Transactions on Image Processing, 13(4):534 –548, apr. 2004. → pages 4 [65] S. Mallat and W. Hwang. Singularity detection and processing with wavelets. Information Theory, IEEE Transactions on, 38(2):617 –643, March 1992. → pages 6, 37, 39 125 [66] P. Markelj, D. Tomazevic, F. Pernus, and B. Likar. Robust gradient-based 3-D/2-D registration of CT and MR to X-ray images. IEEE Transactions on Medical Imaging, 27(12):1704 –1714, Dec. 2008. → pages 1 [67] M. Miller, G. Doerr, and I. Cox. Applying informed coding and embedding to design a robust high-capacity watermark. IEEE Trans. on Image Proces., 13(6):792 –807, Jun. 2004. → pages 58 [68] C. Ming and P. Xi-jian. Image steganography based on arnold transform. Computer Application Research, 1:235–237, 2006. → pages 70 [69] S. Morillas, V. 
Gregori, and A. Hervas. Fuzzy peer groups for reducing mixed gaussian-impulse noise from color images. IEEE Transactions on Image Processing, 18(7):1452 –1466, Jul. 2009. → pages 4, 23 [70] P. Moulin and R. Koetter. Data-hiding codes. Proc. of the IEEE, 93(12): 2083 –2126, Dec. 2005. → pages 57, 90 [71] E. Nezhadarya, Z. Wang, and R. Ward. 17th IEEE International Conference on Image Processing (ICIP). → pages 11, 94 [72] E. Nezhadarya, Z. Wang, and R. Ward. Image quality monitoring using spread spectrum watermarking. 16th IEEE International Conference on Image Processing (ICIP), pages 2233 –2236, Nov. 2009. → pages 89 [73] E. Nezhadarya, Z. Wang, and R. Ward. Robust image watermarking based on multiscale gradient direction quantization. IEEE Transactions on Information Forensics and Security, (99):1, 2011. → pages 38, 51, 95, 97 [74] F. Ourique, V. Licks, R. Jordan, and F. Perez-Gonzalez. Angle qim: a novel watermark embedding scheme robust against amplitude scaling distortions. Proc. IEEE Int. Conf. on Acoust., Speech, and Signal Proces. (ICASSP ’05), 2:ii/797 – ii/800 Vol. 2, Mar. 2005. → pages 58, 61, 80 [75] F. D. P. Marziliano, S. Winkler, and T. Ebrahimi. Perceptual blur and ringing metrics: application to jpeg2000. Signal Processing: Image Communication, 19(2):163 – 172, 2004. → pages 9 [76] B. Pan, Z. Lu, and H. Xie. Mean intensity gradient: An effective global parameter for quality assessment of the speckle patterns used in digital image correlation. Optics and Lasers in Engineering, 48(4):469 – 477, 2010. → pages 1 126 [77] F. Perez-Gonzalez, C. Mosquera, M. Barni, and A. Abrardo. Rational dither modulation: a high-rate data-hiding method invariant to gain attacks. IEEE Trans. on Signal Proces., 53(10):3960 – 3975, Oct. 2005. → pages 80 [78] F. Perez-Gonzlez and F. Balado. Quantized projection data hiding. Proc. of Int. Conf. on Image Proces., 2:889–892, 2002. → pages 58 [79] A. Phadikar, S. P. Maity, and B. Verma. 
Region based QIM digital watermarking scheme for image database in DCT domain. Computers & Electrical Engineering, In Press, 2011. → pages 58 [80] M. Pinson and S. Wolf. A new standardized method for objectively measuring video quality. IEEE Transactions on Broadcasting, 50(3):312 – 322, Sept. 2004. → pages 113 [81] K. Plataniotis, D. Androutsos, V. Sri, and A. Venetsanopoulos. Nearest-neighbour multichannel filter. Electronics Letters, 31(22):1910 –1911, Oct. 1995. → pages 4 [82] S. Z. K. Plataniotis and A. Venetsanopoulos. Comprehensive analysis of edge detection in color image processing. Opt. Eng., 38:612–625, Apr. 1999. → pages 4 [83] C. Podilchuk and W. Zeng. Image-adaptive watermarking using visual models. IEEE Journal on Sel. Areas in Comm., 16(4):525 –539, May. 1998. → pages 57, 89 [84] W. Pratt. Digital Image Processing. Wiley, New York, 1991. → pages 1, 4, 5, 16, 23, 31, 59 [85] A. Reibman, V. Vaishampayan, and Y. Sermadevi. Quality monitoring of video over a packet network. IEEE Transactions on Multimedia, 6(2): 327–334, April 2004. → pages 10 [86] K. Rohr. Localization properties of direct corner detectors. Journal of Mathematical Imaging and Vision, 4:139–150, 1994. → pages 1 [87] A. Rosenfeld. A nonlinear edge detection technique. Proceedings of the IEEE, 58(5):814 – 816, May 1970. → pages 54 [88] G. Sapiro and D. Ringach. Anisotropic diffusion of multivalued images with applications to color filtering. IEEE Transactions on Image Processing, 5(11):1582 –1586, Nov. 1996. → pages 4 127 [89] J. Scharcanski and A. Venetsanopoulos. Edge detection of color images using directional operators. IEEE Trans. on Circ. and Sys. for Video Tech., 7(2):397–401, April 1997. → pages 4 [90] V. Senthil and R. Bhaskaran. Digital image watermarking using edge detection and wavelets with robustness analysis against jpeg compression attacks. Int. Conf. on Innovations in Inf. Tech., Dec. 2008. → pages 58 [91] L. Shafarenko, M. Petrou, and J. Kittler. 
Automatic watershed segmentation of randomly textured color images. IEEE Transactions on Image Processing, 6(11):1530–1544, Nov 1997. → pages 5 [92] H. Sheikh and A. Bovik. Image information and visual quality. IEEE Transactions on Image Processing, 15(2):430 –444, Feb. 2006. → pages 9 [93] I. Shterev and R. Lagendijk. Amplitude scale estimation for quantization-based watermarking. IEEE Trans. on Signal Proces., 54(11): 4146 –4155, Nov. 2006. → pages 58 [94] I. D. Shterev and R. L. Lagendijk. Maximum likelihood amplitude scale estimation for quantization-based watermarking in the presence of dither. Security, Steganography, and Watermarking of Multimedia Contents VII, 5681(1):516–527, 2005. → pages 58 [95] D. Z. Silvano. A note on the gradient of a multi-image. Comput. Vision Graph. Image Process., 33(1):116–125, 1986. ISSN 0734-189X. → pages 4 [96] P. J. Toivanen, J. Ansamki, J. P. S. Parkkinen, and J. Mielikinen. Edge detection in multispectral images using the self-organizing map. Pattern Recognition Letters, 24(16):2987 – 2994, 2003. → pages 4 [97] C. Tomasi and R. Manduchi. Bilateral filtering for gray and color images. Sixth International Conference on Computer Vision, pages 839 –846, Jan. 1998. → pages 4 [98] P. Trahanias and A. Venetsanopoulos. Color edge detection using vector order statistics. IEEE Transactions on Image Processing, 2(2):259–264, April 1993. → pages 4, 18, 19, 22 [99] P. Trahanias and A. Venetsanopoulos. Vector order statistics operators as color edge detectors. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 26(1):135–143, Feb 1996. → pages 4, 18 128 [100] C. J. van den Branden Lambrecht, E. Lambrecht, and O. Verscheure. Perceptual quality measure using a spatio-temporal model of the human visual system. Proc. SPIE Digital Video Compression: Algorithms and Technologies, 2668:450–461, 1996. → pages 113 [101] S. Wang, D. Zheng, J. Zhao, T. W. J., and F. Speranza. 
A digital watermarking and perceptual model based video quality measurement. Proceedings of the IEEE Conference on Instrumentation and Measurement Technology, 3:1729–1734, May 2005. → pages 88, 90 [102] S. Wang, D. Zheng, J. Zhao, W. J. Tam, and F. Speranza. An image quality evaluation method based on digital watermarking. IEEE Transactions on Circuits and Systems for Video Technology, 17(1):98–105, January 2007. → pages 90 [103] S.-H. Wang and Y.-P. Lin. Wavelet tree quantization for copyright protection watermarking. IEEE Trans. on Image Proces., 13(2):154 –165, 2004. → pages 58 [104] Y. Wang, J. Doherty, and R. Van Dyck. A wavelet-based watermarking algorithm for ownership verification of digital images. IEEE Trans. on Image Proces., 11(2):77 –88, Feb. 2002. → pages xi, 57, 84, 85 [105] Y.-P. Wang. Image representations using multiscale differential operators. IEEE Transactions on Image Processing, 8(12):1757 –1771, Dec. 1999. → pages 6, 37 [106] Z. Wang, H. R. Sheikh, and A. C. Bovik. Objective Video Quality Assessment, in The Handbook of Video Databases: Design and Applications. CRC Press, 2003. → pages 8 [107] Z. Wang, A. Bovik, H. Sheikh, and E. Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600 –612, April 2004. → pages 9, 79, 108 [108] A. B. Watson. DCT quantization matrices visually optimized for individual images. In J. P. Allebach and B. E. Rogowitz, editors, Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, volume 1913, pages 202–216, Sept. 1993. → pages 7, 9 [109] S. Wesolkowski and M. E. Jernigan. Color edge detection in rgb using jointly euclidean distance and vector angle. Proc. IAPR Vis. Interf., pages 1–9, 1999. → pages 5 129 [110] S. Winkler. Digital video quality: vision models and metrics. Wiley, 2005. → pages 8, 88 [111] S. Winkler, E. D. Gelasca, and T. Ebrahimi. Toward perceptual metrics for video watermark evaluation. in Proc. 
of SPIE, Applications of Digital Image Processing, pages 371–378, 2003. → pages 90, 111 [112] M. Wu and B. Liu. Data hiding in image and video .i. fundamental issues and solutions. IEEE Trans. on Image Proces., 12(6):685 – 695, Jun. 2003. → pages 57, 60, 69, 95 [113] C. Xu and J. Prince. Snakes, shapes, and gradient vector flow. IEEE Transactions on Image Processing, 7(3):359 –369, March 1998. → pages 1 [114] Q. Z. Xue Yang, Xiaoyang Yu and J. Jia. Image encryption algorithm based on universal modular transformation. Inf. Tech. Journal, 9(4):680–685, 2010. → pages 70 [115] S. Yi, D. Labate, G. Easley, and H. Krim. A shearlet approach to edge analysis and detection. IEEE Transactions on Image Processing, 18(5):929 –941, May 2009. → pages 38 [116] L. Zhao, X. Liao, D. Xiao, T. Xiang, Q. Zhou, and S. Duan. True random number generation from mobile telephone photo based on chaotic cryptography. Chaos, Solitons & Fractals, 42(3):1692 – 1699, 2009. → pages 72 [117] J. Zhu and N. Wang. Image quality assessment by visual gradient similarity. IEEE Transactions on Image Processing, PP(99):1, 2011. → pages 53 [118] J. Zou and R. Ward. Introducing two new image scrambling methods. IEEE Pacific Rim Conf. on Comm., Comp. and signal Proces., 2:708 – 711 vol.2, Aug. 2003. → pages 70 [119] J. Zou, R. Ward, and D. Qi. A new digital image scrambling method based on fibonacci numbers. Proc. of Int. Symp. on Cir. and Sys., 3:965–968, May. 2004. → pages 70 [120] J. Zou, X. Tie, R. Ward, and D. Qi. Some novel image scrambling methods based on affine modular matrix transformation. Journal of Info. and Compt. Sci., 2(1):223–227, March 2005. → pages 70 130
Item Metadata
Title | Image derivative estimation and its applications to edge detection, quality monitoring and copyright protection |
Creator | Nezhadarya, Ehsan |
Publisher | University of British Columbia |
Date Issued | 2013 |
Description | Multi-order image derivatives are used in many image processing and computer vision applications, such as edge detection, feature extraction, image enhancement, segmentation, matching, watermarking and quality assessment. In some applications, the image derivatives are modified and then inverse-transformed to the image domain. For example, one approach for image denoising is to keep the significant image derivatives and shrink the non-significant derivatives. The denoised image is then reconstructed from the modified derivatives. The main challenge here is how to inverse-transform the derivatives to the image domain. This thesis proposes different algorithms to estimate the image derivatives and applies them to image denoising, watermarking and quality assessment. For noisy color images, we present a method that yields accurate and robust estimates of the gradient magnitude and direction. This method obtains the gradient in a given direction by applying a prefilter and a postfilter in the perpendicular direction. Simulation results show that the proposed method outperforms state-of-the-art methods. We also present a multi-scale derivative transform, MSDT, that obtains the gradient at a given image scale using the detail horizontal, vertical and diagonal wavelet coefficients of the image at that scale. The inverse transform is designed such that any change in the image derivative results in the minimum possible change in the image. The MSDT is used to derive a novel multi-scale image watermarking method. This method embeds the watermark bits in the angles of the significant gradient vectors, at different image scales. Experimental results show that the proposed method outperforms other watermarking methods in terms of robustness to attacks, imperceptibility of the watermark and watermark capacity. The MSDT is then used to obtain a semi-blind method for video quality assessment. The method embeds pseudo-random binary watermarks in the derivative vectors of the original undistorted video. The quality of the distorted video is estimated based on the similarity between the embedded and the extracted watermarks. The simulation results on video distorted by compression/decompression show that the proposed method can accurately estimate the quality of a video and its frames for a wide range of compression ratios. |
Genre | Thesis/Dissertation |
Type | Text |
Language | eng |
Date Available | 2013-05-24 |
Provider | Vancouver : University of British Columbia Library |
Rights | Attribution-NonCommercial-NoDerivatives 4.0 International |
IsShownAt | 10.14288/1.0071982 |
URI | http://hdl.handle.net/2429/44504 |
Degree | Doctor of Philosophy - PhD |
Program | Electrical and Computer Engineering |
Affiliation | Faculty of Applied Science, Department of Electrical and Computer Engineering |
Degree Grantor | University of British Columbia |
GraduationDate | 2013-05 |
Campus | UBCV |
Scholarly Level | Graduate |
Rights URI | http://creativecommons.org/licenses/by-nc-nd/4.0/ |
AggregatedSourceRepository | DSpace |