UBC Theses and Dissertations


Perceptually based compression of emerging digital media content. Azimi Hashemi, Maryam. 2014.


Perceptually Based Compression of Emerging Digital Media Content

by

Maryam Azimi Hashemi

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF APPLIED SCIENCE in The Faculty of Graduate and Postdoctoral Studies (Electrical and Computer Engineering)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

October 2014

© Maryam Azimi Hashemi, 2014

Abstract

Digital video has become ubiquitous in our everyday lives; everywhere we look, there are devices that can display, capture, and transmit video. Recent advances in technology have made it possible to capture and display HD stereoscopic (3D) and High Dynamic Range (HDR) videos. However, current broadcasting networks do not even have sufficient capacity to transmit large amounts of HD content, let alone 3D and HDR. The limitations of current compression technologies are the motivation behind this thesis, which proposes novel methods for further improving the efficiency of compression techniques applied to emerging digital media formats.

As a first step, we participated in the standardization efforts of High Efficiency Video Coding (HEVC), the latest video compression standard. The knowledge gained from this study became the foundation for the research that followed. We first propose a new method for encoding stereoscopic videos asymmetrically. In traditional asymmetric stereoscopic video coding, the quality of one view is reduced while the other view is kept at its original quality. However, this approach is not fair to people with one dominant eye. We address this problem by reducing the quality of alternating horizontal slices in both views. Subjective tests show that the quality, sharpness and depth of the videos encoded by our method are close to those of the original, and that the proposed method is an effective technique for stereoscopic video coding.
In this thesis we also focus on HDR video technology, modifying the HEVC standard to better characterize HDR content. We first identify a quality metric whose performance on compressed HDR content is highly correlated with subjective results. We then propose a new Lagrangian multiplier that uses this quality metric to strike the best balance between the bit rate and distortion of the HDR video inside the rate-distortion process of the encoder. The updated Lagrange multiplier is implemented in the HEVC reference software. Our experimental results show that, for the same bitrate, the subjective quality scores of the videos encoded by the HDR-adapted encoder are higher than those of the videos encoded with the reference encoder.

Preface

All of the work presented in this thesis was conducted in the Digital Multimedia Laboratory at the University of British Columbia, Vancouver campus. All figures and tables used in this thesis are original.

A version of Chapter 2 has been published as Pourazad, M.T.; Doutre, C.; Azimi, M.; Nasiopoulos, P., "HEVC: The New Gold Standard for Video Compression: How Does HEVC Compare with H.264/AVC," IEEE Consumer Electronics Magazine, July 2012. I followed the standardization developments very closely, collected data and helped with the manuscript composition. M.T. Pourazad and C. Doutre were involved in cross-checking the data as well as in the manuscript writing and edits. P. Nasiopoulos was the supervisor on this project and was involved with research concept formation and manuscript edits.

A version of Chapter 3 has been published as M. Azimi, S. Valizadeh, X. Li, L. Coria, and P. Nasiopoulos, "Subjective study on asymmetric stereoscopic video with low-pass filtered slices," 2012 International Conference on Computing, Networking and Communications (ICNC), Maui, HI, USA, Feb. 2012. I was the lead investigator, responsible for all areas of research and data collection, as well as the majority of the manuscript composition.
S. Valizadeh and X. Li were responsible for cross-checking the results and were involved in the research concept formation as well as the subjective test setup. L. Coria aided with manuscript edits. P. Nasiopoulos was the supervisor on this project and was involved with research concept formation and manuscript edits.

Part of the results of Chapter 3 are taken from the work published as S. Valizadeh, M. Azimi, and P. Nasiopoulos, "Bitrate reduction in asymmetric stereoscopic video with low-pass filtered slices," IEEE International Conference on Consumer Electronics (ICCE), Las Vegas, USA, Jan. 2012. I was responsible for test data collection and research concept formation. S. Valizadeh was involved in the manuscript writing as well as the subjective test setup. P. Nasiopoulos was the supervisor on this project and was involved with research concept formation and manuscript edits.

A version of Chapter 4 has been published as Azimi, M.; Banitalebi, A.; Dong, Y.; Pourazad, M.T.; Nasiopoulos, P., "A survey on the performance of the existing full reference HDR video quality metrics: A new HDR video dataset for quality evaluation purposes," International Conference on Multimedia Signal Processing, Venice, Italy, November 2014. I was the lead investigator, responsible for all areas of research and data collection, as well as the manuscript composition. A. Banitalebi-Dehkordi was involved in research concept formation, data collection and implementation, as well as manuscript edits. Y. Dong was also involved in early research concept formation. M.T. Pourazad was involved in the early stages of research concept formation and aided with manuscript edits. P. Nasiopoulos was the supervisor on this project and was involved with research concept formation and manuscript edits.

A version of Chapter 5 is to be submitted to an IEEE journal as M. Azimi, M.T. Pourazad, and P.
Nasiopoulos, "Rate Distortion Optimization of High Efficiency Video Coding for High Dynamic Range Video Data," and parts of the same chapter have been published as M. Azimi, M.T. Pourazad, and P. Nasiopoulos, "Rate-Distortion Optimization for High Dynamic Range Video Coding," 6th International Symposium on Communications, Control, and Signal Processing (ISCCSP), Athens, Greece, May 2014. I was the lead investigator, responsible for all areas of research and data collection, as well as the manuscript composition. M.T. Pourazad was involved in the early stages of research concept formation and aided with manuscript edits. P. Nasiopoulos was the supervisor on this project and was involved with research concept formation and manuscript edits.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
List of Acronyms
Acknowledgments
Dedication
1 Introduction
2 Background
2.1 Video Coding Standards: From H.264/AVC to HEVC
2.1.1 Overview of the HEVC standard
2.2 Stereoscopic Video Coding
2.2.1 Asymmetric Stereoscopic Video Coding
2.3 High Dynamic Range (HDR) Video Coding
2.3.1 The Rate-Distortion Optimization Problem in Video Coding
3 A Novel Method for Stereoscopic Video Coding with Low-pass Filtered Slices
3.1 Introduction
3.2 Proposed Method
3.2.1 Test Setup
3.2.2 Viewers
3.2.3 Display
3.2.4 Test Assessment
3.3 Results
3.4 Conclusion
4 Evaluating the Performance of Existing Full-Reference Quality Metrics on High Dynamic Range (HDR) Video Content
4.1 Introduction
4.2 DML Dataset
4.3 Experiment Setting
4.3.1 Objective Test Procedure
4.3.2 Subjective Test Procedure
4.4 Results and Discussions
4.5 Conclusion
5 Rate Distortion Optimization of High Efficiency Video Coding for High Dynamic Range (HDR) Video
5.1 Introduction
5.2 Proposed Method
5.2.1 Subjective Test Setup
5.3 Results
5.4 Conclusion
6 Conclusion and Future Work
6.1 Conclusion
6.2 Future Work
Bibliography

List of Tables

Table 2-1. Luma intra prediction modes supported
Table 2-2. Luma sub-pel interpolation filter coefficients
Table 3-1. Parameters in test setup
Table 4-1. Description of the DML HDR Video Dataset

List of Figures

Figure 2-1. Block diagram of HM 12.1 encoder
Figure 2-2. Partitioning of a 64x64 LCU to various sizes of CUs
Figure 2-3. Square and non-square Prediction Units
Figure 2-4. An example of arranging Transform Units in an LCU
Figure 2-5. Luma intra prediction modes of HEVC (above), and H.264/AVC (below)
Figure 2-6. Sub-pel interpolation from full-pel samples
Figure 2-7. Asymmetric stereoscopic video coding idea
Figure 2-8. Decision process for an Intra Prediction Unit
Figure 3-1. Odd slices of the left view and the even slices of the right view are filtered
Figure 3-2. (a) Left and right frames with unsmoothed edges; (b) left and right frames with smoothed edges
Figure 3-3. (a) First video sequence: Mother and Kid; (b) second video sequence: Two Dolls
Figure 3-6. Sharpness, depth and quality of Mother and Kid averaged over the viewers, with smoothed edges
Figure 3-7. Sharpness, depth and quality of Two Dolls averaged over the viewers, with smoothed edges
Figure 3-8. Mean Opinion Score of the video sequence versus the required bitrate with H.264 compression software
Figure 4-1. Snapshots of the first frames of the HDR test video sequences (tone-mapped versions): (a) Hallway, (b) MainMall, (c) MainMallTree, (d) Playground, (e) Strangers, (f) Table, and (g) Tree
Figure 4-2. The prototype HDR display
Figure 5-1. SI and TI of the HDR videos in the DML dataset
Figure 5-2. Optimal points on (a) iso-rate and (b) iso-distortion lines for the MainMall sequence
Figure 5-3. The estimated λ-QP relationship for all the HDR sequences using HM 12.1: (a) for P/B frames, and (b) and (c) for I frames
Figure 5-4. (a) Subjective test procedure, and (b) rating scale
Figure 5-5. Rate-MOS of the sequence Playground
Figure 5-6. Rate-MOS of the sequence Table
Figure 5-7. Rate-MOS of the sequence Playground
Figure 5-8. Rate-MOS of the sequence Table
Figure 5-9. Rate-MOS of the sequence Strangers

List of Acronyms

2D    2-Dimensional
3D    3-Dimensional
AMP    Asymmetric Motion Partitions
AVC    Advanced Video Coding
CfP    Call for Proposals
CIF    Common Intermediate Format
CU    Coding Unit
DCT    Discrete Cosine Transform
DM    Direct Mode
DML    Digital Multimedia Lab
DSIS    Double-Stimulus Impairment Scale
FMO    Flexible Macroblock Ordering
HD    High Definition
HDR    High Dynamic Range
HDR-VDP-2    High Dynamic Range Visible Difference Predictor
HEVC    High Efficiency Video Coding
ITU-T    International Telecommunication Union - Telecommunication Standardization Sector
JCT-VC    Joint Collaborative Team on Video Coding
LCU    Largest Coding Unit
LDR    Low Dynamic Range
LM    Linear Mode
MC    Motion Compensation
MOS    Mean Opinion Score
MPEG    Moving Picture Experts Group
MSE    Mean Squared Error
MV    Motion Vector
PSNR    Peak Signal-to-Noise Ratio
PU    Perceptually Uniform
PU    Prediction Unit
QP    Quantization Parameter
RDO    Rate Distortion Optimization
SAO    Sample Adaptive Offset
SDR    Standard Dynamic Range
SSIM    Structural SIMilarity index
TU    Transform Unit
TV    Television
UHD    Ultra High Definition
VCEG    Video Coding Experts Group
VIF    Visual Information Fidelity
WPP    Wavefront Parallel Processing

Acknowledgments

I would like to express my most sincere and special gratitude to my supervisor and mentor, Dr. Panos Nasiopoulos, who generously took charge of my supervision, offered me such a fascinating topic to work on, and patiently guided me through this thesis with his knowledge and experience. I would also like to thank Dr. Mahsa Pourazad for her support and help during the different stages of this thesis.
Dedication

This thesis is dedicated to:

My mother, who always believed in me and never stopped encouraging me. Thank you, Mom, for all the sacrifices you made for me so that I could be who I am today.

My beloved sister Mozhgan, from whom I learnt to never give up and never stop learning and trying. Mozhgan, thanks for being so great!

My beloved sister Mehrnoush, for all her love and emotional support throughout my whole life. Thank you, Mehrnoush, for being there whenever I needed you.

My great role model in life, who happens to be a great brother too, Mehran, without whom I would not be writing these lines in this thesis. Thanks, Mehran, for always showing me the right path.

My husband, my love and support in life, Nima, who never stopped believing in me and encouraging me to go after my passions. Thanks for always being there, through joy and sadness.

And last but not least, my beloved father. Thank you, Dad, for all the love and support you offered me throughout my whole life. Thanks for making my life so easy and joyful that I could go after whatever fascinated me. You are always in my heart.

1 Introduction

Ever since the emergence of imaging and video technology, the efforts of the research community, industry and academia alike, have been aiming at achieving a close-to-real-life viewing experience, giving the viewer the perception of "being there". From black and white to color, low resolution to high resolution, 2D to 3D viewing, and recently from Low Dynamic Range (LDR) to High Dynamic Range (HDR) technology, this goal has been approached step by step. The introduction of any new technology in digital media comes with its own challenges. In the case of 3-Dimensional (3D) video, the capturing, transmission, storage and display of videos have been revolutionized. 3D cameras and displays have been largely commercialized, and appropriate 3D coding schemes have been developed and standardized.
In order to broadcast 3D videos with decent quality at the viewer side, their bitrate has to shrink to meet bandwidth limitations. Stereoscopic videos consist of two views of the perceived scene, which are captured by two side-by-side aligned cameras; these views are known as the left and right views, respectively. Therefore, in the case of 3D or stereoscopic videos, the number of frames is at least double that of 2D videos. That is why designing an efficient compression scheme for stereoscopic videos is even more critical than for the 2D case.

By taking advantage of the spatial and temporal similarities between video frames, as in the case of 2D videos, the size and bitrate of 3D videos may be reduced. Moreover, since the left and right views of a stereoscopic video represent the same scene from two points of view, the similarities between the two frames are substantial. By removing these redundancies, 3D or stereoscopic videos may be further compressed. This is the basic idea employed in compression schemes for stereoscopic videos. The state-of-the-art video coding standard H.264/AVC [1] and the recently developed and standardized High Efficiency Video Coding standard, also known in short as HEVC [2], take advantage of these ideas. They use one view as the reference view and predict the other view from the reference one by using their similarities. The idea is extended to multi-view videos as well.

In order to further compress stereoscopic videos, a property of the Human Visual System known as the Binocular Vision Theory [3] is taken into account. Theoretically, if the quality of one view in a stereoscopic pair is lowered while the other one is unaltered, the perceived quality of the 3D version will be the same as that of the original. In other words, the degradation of the quality in one of the views does not necessarily affect the overall quality perception of the stereo pair.
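As a rough illustration of this idea, the NumPy sketch below low-pass filters the odd horizontal slices of the left view and the even slices of the right view, so that every image row stays at original quality in exactly one view (the complementary-slice scheme pursued in Chapter 3). The 16-row slice height and the 5x5 box blur are illustrative assumptions, not the parameters used in the thesis.

```python
import numpy as np

def box_blur(img, k=5):
    """Illustrative low-pass filter: k x k box blur with edge padding."""
    pad = k // 2
    padded = np.pad(img.astype(float), pad, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def filter_complementary_slices(left, right, slice_h=16, k=5):
    """Blur odd slices of the left view and even slices of the right view,
    so each horizontal band stays at original quality in exactly one view."""
    left_f = left.astype(float).copy()
    right_f = right.astype(float).copy()
    left_blur, right_blur = box_blur(left, k), box_blur(right, k)
    for i, top in enumerate(range(0, left.shape[0], slice_h)):
        band = slice(top, min(top + slice_h, left.shape[0]))
        if i % 2:                     # odd slice: degrade the left view
            left_f[band] = left_blur[band]
        else:                         # even slice: degrade the right view
            right_f[band] = right_blur[band]
    return left_f, right_f
```

Because binocular suppression lets the sharper view dominate in each band, neither eye sees only degraded content, which is what makes such a scheme fair to viewers with either dominant eye.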
By applying this theory to stereoscopic video compression schemes, the amount of data required for transmitting or storing stereoscopic pairs can be noticeably reduced. This compression approach is known as Asymmetric Stereoscopic Video Coding [4][5].

In this thesis we present a new compression scheme for stereoscopic video coding, which is based on asymmetric stereoscopic video coding. Unlike all other asymmetric methods, in the proposed method quality degradation is applied to both views. However, the areas degraded in the left view are kept at original quality in the right view and vice versa. Our method has the advantage of being fair to both left-eye-dominant and right-eye-dominant observers.

Apart from 3D video technology, the recent advances in High Dynamic Range (HDR) technology have also introduced challenges in capturing, transmitting, storing and displaying HDR content [6]. However, unlike 3D video, HDR technologies have not been widely commercialized yet. Nevertheless, due to HDR's genuine viewing quality and its obvious advantages, it is expected that HDR technology will find its way into the consumer market very soon.

HDR technology owes its viewing quality to the higher dynamic range that it is capable of capturing, storing and displaying. As opposed to Low Dynamic Range technology, HDR covers the brightness range and the color gamut that the Human Visual System (HVS) is capable of perceiving. The color, brightness and, overall, the whole presentation of HDR video are as alive as the actual captured scene.

Since HDR data contains more information about the captured scene, it is represented by a higher number of bits per color channel than the conventional 8 bits per channel used for LDR. Currently, HDR data can take up to 16 bits per channel, depending on the format used. Hence, HDR data is significantly larger in size than existing LDR data.
While HDR technology is maturing, backward compatibility will be very important in enabling this new market. In order to benefit from captured higher-dynamic-range video, we need to use current transmission channels and LDR displays. HDR content may be converted to an LDR equivalent using tone-mapping schemes [7][8][9][10], allowing it to be transmitted and viewed on LDR displays. Tone-mapping methods essentially quantize HDR data into 8-bit data.

Content producers and display manufacturers are progressively adapting their technologies to embrace the true-to-life quality that HDR video offers. However, one area that has not received the required attention so far is the compression and transmission of HDR data. Although video compression standards such as H.264/AVC and HEVC are designed to work on higher (than LDR) bit-depth data, their Rate-Distortion Optimization (RDO) and, thus, their efficiency are based on LDR data characteristics, while visual quality is measured using metrics designed for LDR content. Therefore, when the above encoders are used on HDR videos, there is no guarantee that the quality of the encoded video is the best achievable for a given bitrate limitation. In order to increase the compression efficiency of the two standards for HDR streams, we should redesign their optimization process by considering the HDR video characteristics.
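At its core, the RDO process chooses, for each block, the coding mode minimizing the Lagrangian cost J = D + λ·R. A minimal sketch of that decision follows, with made-up distortion and rate numbers; in an HDR-adapted encoder, the distortion term would come from an HDR-correlated quality metric rather than plain SSE/MSE, and λ from a retuned λ-QP relationship.

```python
def rd_mode_decision(candidates, lam):
    """Return the candidate minimizing the Lagrangian cost J = D + lam * R.

    Each candidate is a (mode_name, distortion, rate_bits) tuple. The
    distortion and rate values used below are hypothetical placeholders,
    not measurements from any real encoder."""
    return min(candidates, key=lambda c: c[1] + lam * c[2])

# Illustrative numbers only: a larger lambda favors cheaper (fewer-bit) modes.
modes = [("skip", 120.0, 4), ("intra", 35.0, 90), ("inter", 50.0, 40)]
best_low_lambda = rd_mode_decision(modes, lam=0.85)   # quality-leaning choice
best_high_lambda = rd_mode_decision(modes, lam=5.0)   # rate-leaning choice
```

The λ value thus controls the operating point on the rate-distortion curve, which is exactly why replacing the distortion measure for HDR content also requires rederiving λ, as done in Chapter 5.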
The metric whose results correlate best with those of the subjective tests in the presence of compression artifacts is the one that we employ to improve the RDO performance of H.264/AVC and HEVC on HDR content.

The rest of this thesis is organized as follows. Chapter 2 provides background information on existing video coding standards, 3D video coding methods, and HDR video compression and rate-distortion optimization. In Chapter 3 we present a new method for compressing stereoscopic videos that takes advantage of the Human Visual System's (HVS) stereo-specific characteristics. Chapter 4 presents a performance comparison of the existing LDR and HDR metrics on HDR video data in order to identify the most effective one to be used in the rate-distortion optimization process of Chapter 5. In Chapter 5 we present our rate-distortion optimization for the H.264/AVC and HEVC standards, designed to improve their compression efficiency when applied to HDR video content.

2 Background

In this chapter we provide background information on video compression standards, stereoscopic video compression methods, and High Dynamic Range (HDR) imaging and video compression techniques. In Section 2.1 we give a detailed summary of the existing video compression standard H.264/AVC and the newly emerged standard known as High Efficiency Video Coding (HEVC), together with a comparison of their coding efficiency. In Section 2.2 we provide basic information on stereoscopic video coding schemes and specifically on asymmetric stereoscopic video coding methods. Finally, in Section 2.3 we give an overview of the existing HDR video compression methods and the Rate-Distortion Optimization process used in the above-mentioned video compression standards.

2.1 Video Coding Standards: From H.264/AVC to HEVC

H.264/AVC [1] has been the most widely used video compression standard in recent history. Digital video sequences consist of consecutive 2D still images.
Hence, in video compression, in addition to removing redundancies within a single image (the main principle behind image compression standards such as JPEG [11]), redundancies between temporally adjacent frames are also exploited in order to achieve additional bit-rate reduction. Some frames in a video sequence are encoded without any reference to other frames, since they serve as references for other frames. These frames are encoded as still images using intra coding methods, i.e., only spatial redundancies are removed; they are known as Intra (I) frames. Other frames are predicted and encoded with reference to preceding frames; these are called Predicted (P) frames. Finally, some frames in a video sequence are predicted using information from either a previous or a following frame, choosing the best match between the two and thus achieving more efficient compression; these are known as Bi-predicted (B) frames. Typically, I frames require more bits than P frames, and P frames more than B frames.

For each group of pixels in P and B frames, usually known as blocks, there exists a reference block in previously encoded frames. Each block that is encoded with reference to another block is called an inter-coded block. For each inter-coded block, a motion vector (MV) is calculated, which represents how far and in which direction the predicted block has moved with respect to its reference block. The predicted block is reconstructed using the reference frame number, the motion vector and the residual prediction error, which is the difference between the reference block and the actual block being predicted. This technique is known as Motion Compensation (MC). A transform, usually the Discrete Cosine Transform (DCT), is applied to the residual prediction error, followed by quantization, scanning and entropy coding.
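The motion estimation step just described can be sketched as an exhaustive block-matching search over a small window; real encoders use far faster search patterns and SAD/SATD shortcuts, so this full search is purely illustrative.

```python
import numpy as np

def motion_search(ref, block, top, left, radius=4):
    """Exhaustive block matching: find the offset (dy, dx) within
    +/-radius of (top, left) whose window in `ref` minimizes the sum of
    absolute differences (SAD) against `block`."""
    h, w = block.shape
    best_mv, best_sad = (0, 0), float("inf")
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if 0 <= y <= ref.shape[0] - h and 0 <= x <= ref.shape[1] - w:
                sad = np.abs(ref[y:y + h, x:x + w] - block).sum()
                if sad < best_sad:
                    best_mv, best_sad = (dy, dx), sad
    return best_mv, best_sad

# The encoder transmits the MV plus the (transformed, quantized) residual;
# the decoder rebuilds the block as the MV-shifted reference window + residual.
```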
In the following subsection we present an overview of HEVC, the latest and most advanced video coding standard, and the new features that lead to its improved performance compared to H.264/AVC.

2.1.1 Overview of the HEVC standard¹

Recent advances in the digital video world have made it possible to capture and display higher-quality videos that come close to real-life experiences; Ultra High Definition (UHD) resolution, 3D and High Dynamic Range (HDR) videos are some examples. The amount of raw data needed to represent these videos is by far larger than for conventional videos. Internet and broadcasting networks barely have sufficient capacity to transmit large amounts of High Definition (HD) content, let alone UHD. With the ever-increasing size of digital videos, an efficient video coding standard is critical for capturing, storing, streaming and displaying them.

The limitations of current technologies prompted the ISO/IEC Moving Picture Experts Group (MPEG) and the ITU-T Video Coding Experts Group (VCEG) to establish the Joint Collaborative Team on Video Coding (JCT-VC), with the objective of developing a new high-performance video coding standard. A formal Call for Proposals (CfP) on video compression technology was issued in January 2010, and the received proposals were presented at the first JCT-VC meeting in April 2010. The evaluations that followed showed that some proposals could reach the same visual quality as the H.264/MPEG-4 AVC High profile at only half the bit rate, at the cost of a two- to ten-fold increase in computational complexity.

¹ This work has been published as Pourazad, M.T.; Doutre, C.; Azimi, M.; Nasiopoulos, P., "HEVC: The New Gold Standard for Video Compression: How Does HEVC Compare with H.264/AVC," IEEE Consumer Electronics Magazine, July 2012.
Since then, JCT-VC has put considerable effort towards the development of a new compression standard known as the High Efficiency Video Coding (HEVC) standard, with the aim of significantly improving compression efficiency compared to the existing H.264/AVC High profile. Eventually, in April 2013, HEVC was ratified as the new video compression standard [12].

Figure 2-1 shows the block diagram of the basic HEVC design, as implemented in the most recent software codec, HM 12.1. As expected, the main structure of the HEVC encoder resembles that of H.264/AVC.

Figure 2-1. Block Diagram of the HM 12.1 Encoder

Some of the key elements of the HEVC standard include: 1) a more flexible block structure, with block sizes ranging from 64x64 down to 8x8 pixels using recursive quad-tree partitioning, 2) improved mechanisms to support parallel encoding and decoding, including “tiles” and wavefront parallel processing (WPP), 3) more intra prediction modes (35 in total, most of which are directional), which can be applied at several block sizes, 4) support for several different integer transforms, ranging from 32x32 down to 4x4 pixels, as well as non-square transforms, 5) improved motion information encoding, and 6) extensive in-loop processing of reconstructed pictures, including a de-blocking filter, sample adaptive offset (SAO) and adaptive loop filtering.

2.1.1.1 HEVC: The Features that Made the Difference

In this section, we review the design of HEVC and discuss the features that differentiate it from H.264/AVC.

2.1.1.1.1 Picture Partitioning

Similar to conventional video coding standards, HEVC is a block-based hybrid coding scheme. One of the major contributors to the higher compression performance of HEVC is the introduction of larger block structures with flexible sub-partitioning mechanisms.
The basic block in HEVC is known as the Largest Coding Unit (LCU) and can be recursively split into smaller Coding Units (CUs), which in turn can be split into smaller Prediction Units (PUs) and Transform Units (TUs). These concepts are explained in the following subsections.

2.1.1.1.2 Coding Units

In H.264/AVC, each picture is partitioned into 16x16 macroblocks and each macroblock can be further split into smaller blocks (as small as 4x4) for prediction [1]. Other standards, such as MPEG-2 and H.263, were more rigid regarding block sizes for motion compensation and transforms. Such a rigid structure often causes encoding to be less efficient when the picture resolution changes, because the encoder is fine-tuned for a particular type of content (i.e., CIF, SD, etc.). As the picture resolution of videos increases from SD to HD and beyond, frames are more likely to contain large smooth regions, which can be encoded more effectively when large block sizes are used. This is the reason that HEVC supports larger encoding blocks than H.264/AVC, while it also has a more flexible partitioning structure that allows smaller blocks to be used for more textured and, in general, “uneven” regions. In HEVC, each picture is partitioned into rectangular picture areas called Largest Coding Units (LCUs) that can be as large as 64x64. The LCU notion in HEVC is generally similar to that of a macroblock in previous coding standards. LCUs can be further split into smaller units called Coding Units (CUs), which are used as the basic unit for intra and inter coding. A CU can be as large as its LCU or can be recursively split into four equally-sized CUs and become as small as 8x8, depending on the picture content. Due to this recursive quarter-size splitting, a content-adaptive coding tree structure comprised of CUs is created in HEVC [13]. Figure 2-2 shows an example of partitioning a 64x64 LCU into various sizes of CUs.

Figure 2-2. Partitioning of a 64x64 LCU into various sizes of CUs
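The recursive quarter-size splitting of an LCU can be sketched as follows. The split criterion here is a simple variance threshold, which is our illustrative stand-in for the encoder's actual rate-distortion-based decision; the function name and threshold value are ours:

```python
import numpy as np

def quadtree_split(frame, top, left, size, min_size=8, var_threshold=100.0):
    """Recursively split a square region into four equal CUs until the
    region is smooth enough (low variance) or the minimum CU size is
    reached. Returns a list of (top, left, size) leaf coding units."""
    region = frame[top:top + size, left:left + size]
    if size <= min_size or np.var(region) < var_threshold:
        return [(top, left, size)]
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves += quadtree_split(frame, top + dy, left + dx, half,
                                     min_size, var_threshold)
    return leaves

# A smooth 64x64 LCU stays one CU; a noisy one splits all the way to 8x8
smooth = np.zeros((64, 64))
noisy = np.random.default_rng(0).normal(0.0, 50.0, (64, 64))
```

A content-adaptive tree emerges naturally: smooth regions end up as large leaves, textured regions as many small ones.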
2.1.1.1.3 Prediction Units

Each coding unit can be further split into smaller units, which form the basis for prediction. These units are called Prediction Units (PUs). Each CU may contain one or more PUs, and each prediction unit can be as large as its root CU or as small as 4x4 in luma samples. While an LCU can be recursively split into smaller and smaller CUs, the splitting of a CU into PUs is non-recursive (it can only be done once). Prediction units can be square or rectangular. In particular, a CU of size 2Nx2N can be split into two PUs of size Nx2N or 2NxN, or four PUs of size NxN. Also, in the case of inter-coded blocks, CUs can be split asymmetrically (see sub-section 2.1.1.1.7 for more details). This allows partitioning that matches the boundaries of the objects in the picture [13]. Figure 2-3 shows examples of partitioning a CU into square and non-square prediction units.

Figure 2-3. Square and non-square Prediction Units

2.1.1.1.4 Transform Units

Similar to previous video coding standards, HEVC applies a DCT-like transform to the residuals to de-correlate the data. In HEVC, a Transform Unit (TU) is the basic unit for the transform and quantization processes. The size and shape of a TU depend on the size of the Prediction Unit. Square TUs can be as small as 4x4 or as large as 32x32, while non-square TUs can have sizes of 32x8, 8x32, 16x4 or 4x16 luma samples. Each CU may contain one or more TUs; each square CU may be split into smaller TUs in a quad-tree segmentation structure. Figure 2-4 shows an example of how multiple TUs are arranged in an LCU [13].

Figure 2-4. An example of arranging Transform Units in an LCU

2.1.1.1.5 Slices and Tile Structure

HEVC introduced tiles as a means to support parallel processing, with more flexibility than normal slices in H.264/AVC, but considerably lower complexity than Flexible Macroblock Ordering (FMO) in H.264/AVC [14].
Tiles are specified by vertical and horizontal boundaries whose intersections partition a picture into rectangular regions [14]. Within each tile, LCUs are processed in raster scan order. Similarly, the tiles themselves are processed in raster scan order within a picture.

2.1.1.1.6 Intra-frame Coding

Like H.264/AVC, HEVC uses block-based intra prediction to take advantage of spatial correlation within a picture. HEVC follows the basic idea of H.264/AVC intra prediction, but makes it far more flexible. HEVC has 35 luma intra prediction modes compared to 9 in H.264/AVC. Furthermore, intra prediction can be done at different block sizes, ranging from 4x4 to 64x64 (whatever size the PU has). Figure 2-5 shows the luma intra prediction modes of HEVC versus those of H.264/AVC. The number of supported prediction modes varies based on the PU size, as shown in Table 2-1 [15].

Table 2-1. Luma intra prediction modes supported
PU Size | Intra Prediction Modes
4x4     | 0-16, 34
8x8     | 0-34
16x16   | 0-34
32x32   | 0-34
64x64   | 0-2, 34

Planar mode is another intra prediction mode in HEVC, which is useful for predicting smooth picture regions. The idea is to provide a spatial prediction that is continuous along the block boundaries. The prediction is generated from the average of two linear interpolations (horizontal and vertical). The right column and the bottom row are generated by replicating the top-right pixel and the bottom-left pixel of the current unit, respectively [16]. HEVC also supports more chroma intra prediction modes than H.264/AVC: while H.264/AVC supports 4 chroma modes, HEVC has 6: DM (Direct mode), LM (Linear mode), Vertical (mode 0), Horizontal (mode 1), DC (mode 2) and Planar (mode 3). In principle, DM and LM exploit the correlation between the luma and chroma components [17].
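The planar mode just described can be sketched as follows. This is a simplified rendition of the averaging idea, not the bit-exact HEVC formula with its integer rounding; the function name is ours:

```python
import numpy as np

def planar_predict(top, left, size):
    """Simplified planar intra prediction: average of a horizontal
    interpolation (left neighbour toward the replicated top-right pixel)
    and a vertical interpolation (top neighbour toward the replicated
    bottom-left pixel), giving a smooth surface continuous at the
    block boundaries."""
    n = size
    top_right, bottom_left = top[n - 1], left[n - 1]
    pred = np.zeros((n, n))
    for y in range(n):
        for x in range(n):
            horiz = (n - 1 - x) * left[y] + (x + 1) * top_right
            vert = (n - 1 - y) * top[x] + (y + 1) * bottom_left
            pred[y, x] = (horiz + vert) / (2.0 * n)
    return pred

# Flat neighbouring samples should yield a flat prediction
top = np.full(4, 128.0)
left = np.full(4, 128.0)
pred = planar_predict(top, left, 4)
```

With constant neighbours the two interpolations agree everywhere, so the predicted block is constant as well, which is exactly the smooth-region behaviour the mode targets.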
Figure 2-5. Luma intra prediction modes of HEVC (above) and H.264/AVC (below)

2.1.1.1.7 Inter Prediction

Inter-coded pictures are those coded with reference to other pictures. Inter prediction takes advantage of the similarities between each picture and its temporal neighbours and exploits these similarities. The enhancements of inter prediction introduced in HEVC compared to H.264/AVC are described below.

2.1.1.1.8 Improved Sub-pixel Interpolation

As in H.264/AVC, the accuracy of motion compensation in HEVC is 1/4 pel for luma samples. If a motion vector points at an integer position of a luma sample, the corresponding samples of the reference pictures form the prediction signal; otherwise, a non-integer position of the luma samples is interpolated. To obtain the non-integer luma samples, separable one-dimensional 8-tap and 7-tap interpolation filters are applied horizontally and vertically to generate luma half-pel and quarter-pel samples, respectively [18]. An illustration of the surrounding full-pel samples used in generating the fractional-pel values is provided in Figure 2-6, and the filter coefficients for each non-integer luma position are listed in Table 2-2. Note that, unlike H.264/AVC, the quarter-pel values are calculated from the integer luma samples with a longer filter, instead of using bilinear interpolation on the neighbouring half-pel and integer-pel values. The prediction values for the chroma components are similarly generated by applying a one-dimensional 4-tap DCT-based interpolation filter. The accuracy of chroma sample prediction is 1/8 pel.

Figure 2-6. Sub-pel interpolation from full-pel samples
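The half-pel (2/4 position) filter from Table 2-2 can be tried on a one-dimensional row of luma samples. This is only a sketch: the helper function and sample values are ours, and a real encoder applies the separable filters in two dimensions with integer arithmetic and clipping:

```python
import numpy as np

# 8-tap DCT-based filter for the half-pel (2/4) luma position (Table 2-2);
# its coefficients sum to 64, so 64 is the normalization factor
HALF_PEL_FILTER = np.array([-1, 4, -11, 40, 40, -11, 4, -1], dtype=float)

def half_pel_interpolate(samples, i):
    """Interpolate the half-pel value between integer samples i and i+1
    by applying the 8-tap filter (4 taps on each side) and normalizing."""
    taps = samples[i - 3:i + 5]
    return float(np.dot(taps, HALF_PEL_FILTER)) / 64.0

# A step edge: the half-pel value between the 10s and 20s lands midway
row = np.array([10, 10, 10, 10, 20, 20, 20, 20], dtype=float)
value = half_pel_interpolate(row, 3)
```

On a constant region the filter reproduces the input exactly (the taps sum to 64), and across this symmetric step it returns the midpoint, 15.0.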
Table 2-2. Luma sub-pel interpolation filter coefficients
Position | Filter coefficients
1/4      | {-1, 4, -10, 58, 17, -5, 1, 0}
2/4      | {-1, 4, -11, 40, 40, -11, 4, -1}
3/4      | {1, -5, 17, 58, -10, 4, -1, 0}

2.1.1.1.9 Motion Parameter Encoding and Improved Skip Mode

In H.264/AVC, motion vectors (MVs) are encoded by calculating a predicted motion vector and encoding the difference between the desired MV and the predicted one. The predicted MV is formed as the median of three surrounding MVs (left, above and above-right). Furthermore, H.264/AVC has a SKIP mode, where no motion parameters or quantized residuals are encoded in the bitstream; instead, the motion parameters are inferred from a co-located MB in the previous frame.

In HEVC, MVs can be predicted either spatially or temporally (using MVs from previously coded pictures). Furthermore, HEVC introduces a technique called “motion merge” [19], where a number of candidate motion parameter sets are derived from spatially and temporally close PUs and one is chosen by the encoder and signalled in the bit-stream. A motion parameter set consists of a motion vector, a reference picture index, and a reference list flag. For every inter-coded PU, the encoder can choose between 1) explicit encoding of motion parameters (i.e., using motion vector prediction and encoding the MV difference and reference picture), 2) motion merge mode, or 3) SKIP mode (which is closely related to motion merge). In the new SKIP mode in HEVC, the encoder also encodes the index of a motion merge candidate, and the motion parameters for the current CU are copied from the selected candidate, as in motion merge mode. However, in SKIP mode no residual data is encoded, so the only information placed in the bit-stream for the CU is the SKIP mode flag and the motion merge index.
This allows areas of the picture that change very little between frames, or that have constant motion, to be encoded using very few bits.

2.1.1.2 Performance Compared to H.264/AVC

HEVC outperforms H.264/AVC by 29.14% to 48.17% in terms of bit rate, or 1.4 dB to 1.9 dB in terms of PSNR. A subjective comparison of the quality of compressed videos, for the same (linearly interpolated) Mean Opinion Score (MOS) points, shows that HEVC outperforms H.264/AVC, yielding average bitrate savings of 58% [20]. These objective and subjective results confirm that the goal of developing a high efficiency video coding standard that delivers the same visual quality as H.264/MPEG-4 AVC High profile at only half the bit rate has been accomplished.

2.2 Stereoscopic Video Coding

As stated in the previous section, the existing video coding standards take advantage of the spatial and temporal correlation between frames of a video sequence in order to compress the amount of information needed to represent them. In addition to these temporal and spatial redundancies, in stereoscopic videos there is a noticeable correlation between the left and right views that can be exploited to compress them further. This approach is used in H.264/AVC Multiview Video Coding [21] and, more recently, in the HEVC Multiview Extension [22]. At the encoder side, generally one view is selected as the reference view and the other view is predicted from it. For each non-reference view, supplemental information is sent to the decoder, containing those parts of the non-reference view that do not exist in the reference view. Hence, instead of sending the whole picture for the non-reference view, only a small part of it is sent. This method significantly improves the coding efficiency for stereoscopic video sequences [21][22]. Despite these bit-rate savings, stereoscopic video coding schemes still require far more bandwidth than their monoscopic counterparts.
Asymmetric stereoscopic video coding [4][5] is one technique that uses knowledge of the Human Visual System to further improve the compression of stereo content.

2.2.1 Asymmetric Stereoscopic Video Coding

The main idea behind asymmetric stereoscopic video coding is based on the suppression theory of binocular vision [3]. This theory describes a unique characteristic of the human visual system: a stereoscopic pair with one high-quality and one low-quality view may be perceived in 3D as well as the high-quality view alone. Figure 2-7 depicts the idea of asymmetric stereoscopic video coding.

Figure 2-7. Asymmetric stereoscopic video coding idea

Perkins [3] was the first to introduce mixed-resolution stereoscopic pairs, in 1992. Based on the suppression theory of binocular vision, he created stereoscopic pairs in which one view has lower resolution than the other. He generated the low-resolution views by subsampling in both the horizontal and vertical directions by a factor of 4 and reconstructing them at the decoder end using bilinear interpolation. Using this technique, coding of the mixed-resolution stereoscopic pair requires only 6% more bitrate than that needed for coding the high-resolution view alone. Subjective evaluations showed that the final perceived quality of the mixed-resolution stereo pair is close to that of the equal-resolution stereo pair. The depth perception was also similar for both 3D pairs.

In [5], the effect of mixed-resolution stereoscopic video on the perceived quality, sharpness and depth is studied. In this work one view, the left view, keeps its original resolution while the other view's resolution is decreased spatially and temporally. For spatial processing, 1/4 and 1/2 vertical and horizontal resolutions were applied. One filter implemented for temporal filtering was the averaging of each pixel value in the current frame with that of the next frame; the other was dropping and repeating alternate frames.
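The mixed-resolution idea underlying both studies, encoding one view at reduced spatial resolution and upsampling it at the decoder for display, can be sketched as follows (our own illustrative helper functions and test image; real systems would filter before subsampling):

```python
import numpy as np

def subsample(view, factor=4):
    """Keep every `factor`-th pixel horizontally and vertically."""
    return view[::factor, ::factor]

def bilinear_upsample(low, factor=4):
    """Reconstruct the low-resolution view with bilinear interpolation."""
    h, w = low.shape
    ys = np.linspace(0, h - 1, h * factor)
    xs = np.linspace(0, w - 1, w * factor)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]
    wx = (xs - x0)[None, :]
    top = low[np.ix_(y0, x0)] * (1 - wx) + low[np.ix_(y0, x1)] * wx
    bot = low[np.ix_(y1, x0)] * (1 - wx) + low[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bot * wy

right_view = np.random.default_rng(1).random((64, 64))
low = subsample(right_view)        # 16x16: 1/16 of the original samples
restored = bilinear_upsample(low)  # back to 64x64 for display
```

The subsampled view carries only 1/16 of the samples, which is why the bitrate overhead of the second view can be so small.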
The perceived quality, sharpness and depth of the original-resolution sequences were compared to those of the mixed-resolution ones. The subjective results showed that the quality and sharpness of the mixed-resolution sequences were close to those of the equal-resolution ones, while the depth perception of both pairs was the same. However, these results applied only to the spatially filtered sequences; the temporal filtering noticeably degraded the perceived quality and sharpness of the stereoscopic pair.

Another asymmetric approach to coding stereoscopic videos is proposed in [23]. Here, instead of filtering the views spatially, both views are coded at their original quality and resolution, yet with different Quantization Parameters (QPs). In the H.264/AVC and HEVC encoders, the QP controls the quality of the encoded video. For the left view, which is the reference view, a smaller QP is chosen than for the right view, which is the view with lower quality. It is shown in [23] that there exists a threshold on the quality degradation between the views. A Just Noticeable Difference threshold of 2 dB between the left and the right views is used, i.e., whatever the PSNR of the left view is, it is lowered by 2 dB for the degraded right view. The authors also apply asymmetry to the chroma information of the two views, in the sense that for the right view no chroma channels are encoded, only luma. At the decoder side, the chroma information of the right view is reconstructed from the chroma channels of the left view. Although some parts of the reconstructed right view lack chromatic content, viewers were able to perceive a colour 3D video from the stereoscopic pair close to the quality of the original colour stream. In summary, discarding the chrominance channels of one view reduces the bitrate needed for transmission and storage of stereoscopic videos.
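The 2 dB just-noticeable-difference rule used in [23] can be expressed with a small sketch. The helper functions are our own illustration; the PSNR peak of 255 assumes 8-bit content:

```python
import numpy as np

def psnr(original, coded, peak=255.0):
    """Peak signal-to-noise ratio in dB between two frames."""
    mse = np.mean((original.astype(float) - coded.astype(float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def within_jnd(psnr_left, psnr_right, threshold_db=2.0):
    """True if the degraded right view stays within the 2 dB
    just-noticeable-difference of the reference left view."""
    return (psnr_left - psnr_right) <= threshold_db

# Synthetic example: a uniform error of +/-1 gives MSE 1, +/-2 gives MSE 4,
# which puts the right view about 6 dB below the left (well past the JND)
orig = np.full((16, 16), 128.0)
left = orig + 1.0
right = orig + 2.0
```

Doubling the error magnitude quadruples the MSE, lowering PSNR by 10·log10(4) ≈ 6 dB, which this sketch flags as exceeding the threshold.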
Based on the subjective test results, the depth perception of the stereoscopic pair encoded using the proposed method is the same as that of the full-colour coded stereoscopic video.

Despite the fact that all the above-mentioned methods reduce the bandwidth or the memory required for the storage of stereoscopic videos, they do not take into account the eye dominance factor. Humans differ in their eye dominance: some are right-eyed while others are left-eyed. With the above methods, an observer with a dominant left eye may be exposed to a stereoscopic video in which the left view has lower quality or lower resolution. This can affect the perceived quality of the video. Moreover, the quality imbalance between the two views could be a cause of visual fatigue in longer sequences.

Interleaving of the low and high quality frames between the left and right views over time is investigated in [24]. For one interval of time the left view consists of the lower-quality frames, while for the next interval the lower-quality frames reside in the right view. This way the imbalance is offset to some extent. However, subjective tests showed that the switch between the views' qualities was noticeable, although subjects could not detect the cross-switch when it occurred at a scene cut. In this thesis we propose a new technique to asymmetrically encode stereoscopic videos and overcome the imbalance between the presented qualities of the left and right views. We subjectively evaluate the effect of our method on the quality, depth and sharpness perception of the encoded stereoscopic videos.

2.3 High Dynamic Range (HDR) Video Coding

The ultimate goal of digital media technologies has always been to generate and deliver pictures and videos with real-life quality.
Adding depth perception with the introduction of 3D technology and improving video resolution up to HD, 4K and 8K Ultra HD are some of the recent efforts in the direction of giving the viewer the impression of “being there”. HDR is now regarded as one of the most advanced technological developments in digital media, significantly elevating viewers' quality of experience. The growing popularity of High Dynamic Range video in multimedia applications owes much to its genuine viewing quality. As opposed to standard Low Dynamic Range (LDR) technology, HDR is capable of handling a higher dynamic range and hence a wider brightness range and colour gamut. HDR content shown on HDR displays can be as lifelike as the actual captured scene in terms of colour and brightness range.

With the introduction of HDR, capturing processes and devices, transmission, viewing displays, and media technologies have all been deeply affected. HDR video cameras are capable of recording a higher dynamic range of a scene than LDR cameras. On the other hand, the representation of the captured HDR data differs from that of LDR videos, as HDR data holds more information: it is represented by a higher number of bits, generally 10 up to 16 bits per colour sample, as opposed to the conventional 8-bit data in the LDR case. Moreover, the medium on which the HDR data is viewed should also be capable of emitting enough light so that the high dynamic range data can be presented at its respective level of brightness and dynamic range. Although content producers and display manufacturers are adapting to the new HDR technology to benefit from its real-life quality of experience, the compression and transmission of HDR data has not been adequately addressed.
The existing video coding standards such as H.264/AVC [1] and High Efficiency Video Coding (HEVC) [2] are optimized for LDR videos in the rate-distortion sense, i.e., LDR content is encoded with the least possible distortion for a given bitrate. Currently, both the H.264/AVC and HEVC codecs support higher bit-depth data as their input. In [25] we evaluated the performance of the HEVC video coding standard on HDR video content and compared it to that of the H.264/AVC video compression standard on the same content. Subjective evaluations of the results using an HDR display show that viewers clearly prefer the videos coded by HEVC to the ones encoded using H.264/AVC. In particular, HEVC outperforms H.264/AVC by an average of 10.18% in terms of mean opinion score and 25.08% in terms of bit rate savings. However, despite the superiority of the HEVC standard over H.264/AVC in terms of encoding HDR data, neither of them is designed based on the properties and characteristics of High Dynamic Range data.

In general, video compression algorithms take advantage of the spatial similarities within a frame and the temporal similarities between temporally adjacent frames (as explained in Section 2.1). Frames are divided into rectangular blocks, and similarities within these blocks are used to reduce information using prediction modes and entropy coding techniques. Deciding which mode to encode a block with involves finding the distortion and the rate each prediction mode will generate. The mode that is eventually selected is the one that yields minimum distortion for a specific bitrate bound, or the one that yields minimum rate while its distortion does not exceed a specific distortion bound. Hence, the trade-off between minimizing the distortion and the rate has to be optimized. This process is known as the rate-distortion optimization (RDO) problem in video encoding. The following section describes the RDO problem.
2.3.1 The Rate-Distortion Optimization Problem in Video Coding

As explained in Section 2.1, High Efficiency Video Coding is the most recent video coding standard, with higher compression efficiency than prior standards such as H.264/AVC [2]. HEVC, like any other video coding standard, takes advantage of temporal similarities between frames and spatial similarities inside each frame. The basic coding unit in HEVC is the Coding Unit (CU), which is simply a rectangular block inside a frame. Each coding unit consists of Prediction Units, which are the basis of inter and intra prediction (see Section 2.1.1.1.3). An intra Prediction Unit can be as large as 64x64 or can be split recursively into smaller 32x32, 16x16, 8x8 and 4x4 units. The supported prediction modes of each intra Prediction Unit depend on its size; for instance, a 32x32 intra prediction unit can use 35 intra prediction modes, while a 64x64 unit supports only 4 modes.

Similarly, for inter Prediction Units there exist several supported partitioning modes. Each inter-predicted PU is encoded using a reference unit inside a reference frame along with its associated Motion Vector. At the decoder side, the encoded unit is reconstructed using the residual signal, the prediction mode, and the associated information such as the Motion Vector (in the case of inter prediction). Thus, the quality of the final decoded frame at a given bitrate highly depends on the encoder's performance in selecting the partition sizes, prediction modes, motion vectors (in the case of inter prediction), and transform modes. The mode selection process must respect the rate bound, i.e., the resulting bitrate of the encoded video should not exceed the maximum allowed bitrate. There is a trade-off between the rate and the distortion of the encoded video.
To obtain the highest possible video quality at a given bitrate, rate-distortion optimization algorithms are utilized during the encoding process. The goal of these algorithms is to minimize the distortion of the encoded video at a fixed bitrate or, equivalently, to minimize the bit rate at a given distortion, depending on the encoder setting. The case of a fixed bitrate with minimized distortion is expressed as follows:

min{D},    subject to R ≤ Rmax, (1)

where D is the distortion, R is the rate of the video, and Rmax is the maximum allowed bitrate. To solve this optimization problem, the encoder utilizes the Lagrangian multiplier method:

min{J},    where J = D + λR, (2)

where J is the Rate-Distortion (RD) cost function that has to be minimized and λ is a Lagrange multiplier. An optimized λ guarantees an optimized cost at each rate and distortion. More specifically, the encoder exploits the RDO process during inter and intra prediction, so that a balance between the bitrate and the distortion of the CU to be coded is achieved. For inter prediction, to select the best Motion Vector, the RD cost minimization problem (2) is expressed as:

min{J},    where JMotionVector = D + λMotionVector R, (3)

and for the mode selection process (intra prediction), the RD optimization problem (2) becomes:

min{J},    where Jmode = D + λmode R. (4)

By employing an optimized value for λ in (3) and (4) and finding the partitioning structure, intra prediction modes, and motion vectors that minimize the RD cost, the best possible compressed video quality is obtained at a given bitrate. Figure 2-8 illustrates an example of the intra mode decision process for a 64x64 CU. As observed, after calculating the RD cost for different partition sizes and different modes, the partitioning and the intra modes that yield the least RD cost are eventually selected.
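The cost comparison in (2) can be sketched as follows. The QP-dependent multiplier used here is the HM-style relation λ = α · 2^((QP−12)/3); α = 1 and the candidate (mode, distortion, rate) numbers are purely illustrative:

```python
def hevc_lambda(qp, alpha=1.0):
    """HM-style Lagrange multiplier: lambda = alpha * 2^((QP - 12) / 3).
    alpha = 1.0 here is an illustrative placeholder for the constant."""
    return alpha * 2.0 ** ((qp - 12) / 3.0)

def best_mode(candidates, lam):
    """Pick the (mode, distortion, rate) candidate minimizing J = D + lambda*R."""
    return min(candidates, key=lambda c: c[1] + lam * c[2])

# Hypothetical intra candidates: (mode name, SSD distortion, rate in bits)
candidates = [
    ("intra_DC", 400.0, 20),
    ("intra_planar", 350.0, 24),
    ("angular_10", 300.0, 60),
]
```

At a high QP (large λ) the rate term dominates and the cheap DC mode wins; at a low QP (small λ) the distortion term dominates and the accurate but expensive angular mode wins, which is exactly the trade-off the RDO process encodes.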
The decision process for inter prediction is more complex than for intra prediction, due to the flexibility in selecting a reference frame and motion vector orientation, as well as the half- and quarter-pixel accuracy of the motion vector. The inter prediction process becomes even more complex in the case of B-frames due to bi-directional prediction (there are more available options for prediction to minimize (3)).

Figure 2-8. Decision process for an Intra Prediction Unit

The optimized λ was experimentally found for LDR videos in [26] and consistently employed in the video coding standards as follows:

λmode = c × qstep², (5)

λMotionVector = √λmode, (6)

where c is a constant parameter and qstep is the quantization step size. The quantization step size is a function of the Quantization Parameter (QP). Presently, in the HEVC standard, the λ parameter used for encoding is defined as follows:

λmode = α × 2^((QP − 12)/3), (7)

where α is a constant parameter. The λ parameter in HEVC is empirically estimated by striking a balance between the bitrate and the quality in terms of PSNR for LDR videos. As observed from (7), and as proven in [26], the Lagrangian multiplier λ is dependent on QP, and QP controls video quality. For the inter and intra prediction processes, λmode and λMotionVector are derived from (7) and (6), respectively. Then, for each PU, the bitrate and distortion are measured and the RD cost J is calculated accordingly using (2). The distortion is measured by a distortion metric, mainly the Sum of Absolute Differences (SAD), the Sum of Squared Differences (SSD), or the Mean Squared Error (MSE). By finding the minimum J for each Prediction Unit, it is guaranteed that the entire encoded video has the minimum possible bitrate at a given quality level in terms of PSNR.

However, PSNR has been shown not to be a good predictor of video quality, as it does not accurately correlate with how the human visual system works [28].
A new approach for the RDO process was proposed in [29], which employs the Structural Similarity (SSIM) index [30] as the distortion metric inside the RDO process. The results show quality improvement in videos encoded using this method compared to ones encoded by the reference encoder at the same bitrate. Nevertheless, both PSNR and SSIM are LDR-driven quality metrics and are designed to work on LDR content. Considering that the current Lagrangian multiplier is estimated for LDR video applications to balance between the bitrate and PSNR-based video quality, its application to encoding HDR video content does not necessarily strike a balance between bitrate and distortion, and hence the quality of the encoded video might not be the best achievable. In order to find the optimal λ for HDR data, the process of finding the optimal λ has to be redone using HDR content and an HDR distortion metric. In this thesis we propose a new Lagrangian multiplier for the RDO process inside the encoder, using an HDR quality metric as the evaluator of the distortion. The proposed Lagrangian multiplier is implemented inside the most recent version of the HEVC codec software. To evaluate how the new Lagrangian multiplier improves the quality of the encoded HDR video, we subjectively compare the quality of the videos encoded by the proposed encoder with that of the ones encoded by the original encoder over a set of HDR video data.

3 A Novel Method for Stereoscopic Video Coding with Low-pass Filtered Slices2

3.1 Introduction

The existing asymmetric stereoscopic video encoding schemes degrade the quality of one of the views, either by low-pass filtering it or by reducing its resolution. This reduces the bitrate required to encode the views while, based on the suppression theory of binocular vision [3], their perceived quality remains unchanged in the eyes of human observers.
If we assume the quality-degraded view is the left one and the sequence is viewed by a left-eye dominant observer, then the quality perceived by that observer may be lower than that perceived by a right-eyed observer. This shortcoming makes these methods unfair to observers with different dominant eyes. In this chapter, we propose a novel method for asymmetrically encoding stereoscopic video content that takes into account the significance of the eye dominance factor. Our method is fair to both right-eyed and left-eyed viewers, in the sense that both groups are exposed to equal quality in the left and right views while the quality of part of each view is degraded. We divide the frames of each view into slices and apply low-pass filtering to the odd slices of one view and the even slices of the other. We ran subjective tests to quantify the perceived sharpness, depth and quality of the resulting stereoscopic videos and compared these results to those of the original videos. Performance evaluations showed that, despite the quality degradation (up to a threshold) in both views, the perceived sharpness, quality and depth of the stereo pair were close to those of the original-quality stereo pair.

2 A version of this chapter has been published as Azimi, M.; Valizadeh, S.; Xiaokang Li; Coria, L.; Nasiopoulos, P., "Subjective study on asymmetric stereoscopic video with low-pass filtered slices," International Conference on Computing, Networking and Communications (ICNC), Maui, USA, Feb. 2012

The rest of this chapter is organized as follows. Section 3.2 describes our method. Then, in Section 3.3, results are presented and discussed. Finally, conclusions are drawn in Section 3.4.

3.2 Proposed Method

In our method, we examine the perceived sharpness, depth and quality of stereoscopic videos after low-pass filtering alternate horizontal slices in the right and left views.
A variety of filter strengths and horizontal slice sizes are considered. Figure 3-1 shows one example where the odd slices (1, 3 and 5) of the left view and the even slices (2 and 4) of the right view are low-pass filtered. We consider only horizontal slices, since with other slicing directions (e.g., vertical) the horizontal disparity between the two views could cause the same object to be filtered in both views.

Figure 3-1. Odd slices of the left view and the even slices of the right view are filtered

In our study, we consider several numbers of horizontal slices per frame: 2, 4, 10, 40 and 72. Performance evaluations show that 10 slices per frame provide the best visual quality for all levels of low-pass filtering, with a fair distribution of reduced quality between the two views. An excessive number of slices, such as 40 or 72, seems to annoy the viewers.

Our choice of low-pass filter is a 15x15 Gaussian filter of the following form:

Gσ(x, y) = (1 / (2πσ²)) · exp(−(x² + y²) / (2σ²))   (8)

We apply a Gaussian filter since it is a typical and well-known low-pass filter. We pick a window size of 15x15 so that filters of different strengths can be generated for the different slice sizes simply by changing the sigma in (8). In the first set of our tests, the strength of the filter follows a pulse-train pattern, which allows us to apply it to every other slice while keeping the in-between slices at their original quality. We ensure a smooth transition between filtered and unfiltered slices by applying weaker filtering at the slice edges than at the center. We control the strength of the Gaussian filter by generating different sigma values from a bell function: for pixels near the slice edges we apply weak filters with sigma close to zero, and as the pixels get farther from the edges the sigma increases, so that the strongest filtering is applied to the pixels in the center of every slice.
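As an illustration, the pulse-pattern variant of this scheme (a fixed sigma for every filtered slice, without the bell-shaped taper) can be sketched in a few lines of Python. The slice geometry and the use of SciPy's gaussian_filter are our own simplifications, not the thesis implementation:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def filter_alternate_slices(view, n_slices=10, sigma=3.0, filter_odd=True):
    """Blur alternating horizontal slices of one view (pulse pattern).
    Slices are numbered from 1 at the top, as in Figure 3-1."""
    out = view.astype(np.float64)
    bounds = np.linspace(0, view.shape[0], n_slices + 1).astype(int)
    for i, (top, bot) in enumerate(zip(bounds[:-1], bounds[1:]), start=1):
        if (i % 2 == 1) == filter_odd:
            # Each selected slice is blurred with a fixed-sigma Gaussian.
            out[top:bot] = gaussian_filter(out[top:bot], sigma=sigma)
    return out

rng = np.random.default_rng(0)
left = rng.random((1080, 1920))
right = rng.random((1080, 1920))
left_f = filter_alternate_slices(left, filter_odd=True)    # odd slices blurred
right_f = filter_alternate_slices(right, filter_odd=False) # even slices blurred
```

With n_slices=10 on 1080-line views, each slice spans 108 rows; sigma=3 corresponds to the filtering strength found to be visually safe in these tests.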
Figure 3-2 (a) and Figure 3-2 (b) show the filtering pattern of the right and left views in the first and second set of tests, respectively. The grey slices are the low-pass filtered ones; the shade of grey represents the strength of the low-pass filter.

Figure 3-2 (a) Left and right frames with unsmoothed edges. (b) Left and right frames with smoothed edges

3.2.1 Test Setup

3.2.1.1 Video Sequences

We consider two representative stereo video sequences for our tests. The first video, "Mother and Kid", shows a lady standing with her kid. This video was taken outdoors and contains different levels of detail, such as human faces, and textures such as bushes and grass. The second video, "Two Dolls", shows two dolls being moved in front of the camera. This video was shot indoors and has less detail than "Mother and Kid". Both sequences are silent. Each sequence is 10 seconds long at 30 fps, and the resolution of each view is 1920 × 1080 pixels. Our test videos were captured with two identical HD camcorders (1080i, 60Hz, NTSC) set up in parallel. Figure 3-3 (a) and (b) show the first frame of the right view of "Mother and Kid" and "Two Dolls", respectively.

Figure 3-3. (a) First video sequence: Mother and Kid. (b) Second video sequence: Two Dolls.

3.2.1.2 Test Design

We apply four different levels of Gaussian filtering, produced by four different sigma values (1, 3, 10 and 30), to each of our test sequences. Adding the original video to these four low-pass filtered versions gives five versions of each sequence, resulting in ten unique stereo sequences. In order to see how people perceive the quality degradation obtained by the same filters in 2D and 3D videos, we take the left view of each sequence as its 2D version and apply the above-mentioned filters to it. This results in five unique non-stereo test videos for each sequence.
As a next step, in order to have smoothed slice edges, we consider three different sigma values (3, 10 and 30) for the maximum filter strength. The filter with the highest strength (the maximum sigma) is applied at the center row of each slice, and the strength gradually decreases to the minimum sigma value of 1 for pixel rows closer to the edges. This implementation follows the pattern described in Section 3.2 and yields three additional unique test videos for each sequence. In summary, the above procedure results in eight stereoscopic test sequences as well as five monoscopic ones for each sequence. We ask the viewers to rate the overall quality, depth and sharpness of the test sequences after viewing the original one. All test sequences were shown in random order and the subjects were not aware of the test objectives. Table 3-1 summarizes the parameters used in our experiments.

Table 3-1 Parameters in test setup

Filter type: Gaussian
Number of horizontal slices: 10
Display method: Stereo and non-stereo
Video sequences: Mother and Kid (outdoor), Two Dolls (indoor)
Filter strength: Pulse pattern with sigma of 1, 3, 10 and 30; bell shape with sigma of 3, 10 and 30

3.2.2 Viewers

We showed our tests to 14 viewers, aged between 23 and 38 years with an average age of 28. Gender distribution was not controlled. All viewers were screened for visual acuity, color vision and contrast sensitivity; only viewers who passed the screening participated in the experiment.

3.2.3 Display

We used a 65" 3D HD TV with 16:9 aspect ratio to show the videos to the viewers. We inserted a 10-second grey field between test sequences to allow the viewers' eyes to rest and to give them enough time to rate the perceived sharpness, depth and quality of the videos. The viewers' distance from the display was four times the height of the display. The room in which we conducted the tests was consistent with ITU-R Recommendation BT.500 [31].
The duration of the test was approximately 12 minutes per participant.

3.2.4 Test Assessment

We asked the viewers to rate the sharpness, overall quality and depth perception of the stereoscopic video sequences, whereas for the non-stereo videos they rated only the quality and sharpness, on a vertical rating scale. The scale used in the tests had five equal-length labels: Excellent, Good, Fair, Poor and Bad. We used a linear transformation from this scale to numbers between 0 and 100, which were then averaged over the viewers for each video sequence. Ratings were made using the double-stimulus continuous-quality method described in ITU-R Recommendation BT.500 [31]. The original video was shown to viewers prior to each modified video.

3.3 Results

Figure 3-4 shows the results of our subjective test for the "Mother and Kid" video sequence (video 1). The vertical axis shows the averaged ratings for both the stereo and non-stereo video sequences, and the horizontal axis shows the sigma value of the Gaussian filter applied to the sequences. Sharpness and quality are shown in the top and bottom plots, respectively, while depth perception for the stereo video sequence is shown in the middle plot. The viewers' evaluations of the non-stereoscopic (2D) videos provide an indication of how strong the filtering is and how it affects the quality and sharpness of the non-stereo content. An overall observation for "Mother and Kid" is that the quality and sharpness of the low-pass filtered stereo videos are much better than those of the low-pass filtered non-stereo videos. This is because, in the stereoscopic case, the high-quality slices in one view mask the blur in the low-pass filtered slices of the other view; this masking does not apply to monoscopic videos, since there is only one view.
Figure 3-4 also shows that the quality and sharpness of the low-pass filtered stereo video are rated close to those of the original video up to a threshold in filtering strength. These results indicate that we can low-pass filter alternate slices of both views without significantly reducing the overall perceived quality of the stereo pair. In this case, we may conclude that a 15x15 Gaussian filter with sigma = 3 is a safe bound below which most people cannot perceive the quality degradation; beyond this point, we observe significant degradation in perceived quality and sharpness.

Figure 3-5 shows the results for the second video sequence, "Two Dolls". As with "Mother and Kid", the sharpness and quality of the low-pass filtered stereo videos are rated higher than those of their low-pass filtered non-stereo counterparts. Additionally, the sharpness and quality of filtered stereo pairs are rated close to the original stereo for filter strengths up to sigma = 3. Here the sharpness ratings are slightly higher than those for "Mother and Kid", which may be because this video contains fewer relevant details (faces and textures). We can also see from the depth plots of both Figure 3-4 and Figure 3-5 that low-pass filtering does not affect the perceived depth of the stereo pairs; it remained unchanged for all levels of filtering that we applied.

Our next test is designed to determine whether better quality and sharpness ratings can be achieved by gradually smoothing the slice edges. To achieve this, we applied strong filtering in the middle of each slice and reduced the amount of filtering closer to the edges. Figure 3-6 shows the results of our subjective tests on sharpness, depth and video quality for "Mother and Kid" with smoothed edges and compares them to those with the original filter (unsmoothed edges).
We observe that smoothing the slice edges results in slightly better stereo video quality and sharpness. Figure 3-7 shows the results for the second video sequence, "Two Dolls". For this sequence as well, our subjects rated the sharpness and quality of the videos with smoothed edges slightly better than those without smoothing.

In [32] we measured the bitrate reduction achieved by our new asymmetric stereoscopic video coding scheme. We applied the same low-pass filtering to horizontal slices of each view, with the corresponding slices in the other view kept at the original quality, and measured the bitrate reduction with the H.264/AVC standard. As depicted in Figure 3-8, significant bitrate reduction can be achieved with video quality close to that of the original video. The points on the RD curve of Figure 3-8 represent the original encoded video, followed by videos low-pass filtered with a Gaussian filter with sigma 1, 3, 10 and 30.

Figure 3-4 Sharpness, depth and quality of video sequence Mother and Kid averaged over the viewers

Figure 3-5 Sharpness, depth and quality of video sequence Two Dolls averaged over the viewers

Figure 3-6 Sharpness, depth and quality of Mother and Kid averaged over the viewers with smoothed edges

Figure 3-7 Sharpness, depth and quality of Two Dolls averaged over the viewers with smoothed edges

Figure 3-8 Mean Opinion Score of the video sequence versus the required bitrate with H.264 compression software

3.4 Conclusion

We proposed a modified scheme for asymmetric coding of stereoscopic video in which the frames of both the left and right views are divided into horizontal slices. Half of these slices are low-pass filtered while the corresponding slices in the other view are kept at the original quality. We tested the perceived sharpness, quality and depth of the video sequences subjectively.
Viewers rated the sharpness and quality of our modified asymmetric videos close to those of the original stereoscopic videos up to a filtering-strength threshold (15x15 Gaussian with sigma = 3), while the same amount of filtering was quite apparent in the monoscopic videos. Our implementation of asymmetric video coding has the advantage over conventional asymmetric methods of being fair to viewers with one dominant eye, because the filtered slices are divided between both views.

We also combined the proposed method with H.264/AVC encoding of the processed videos. The results showed significant bitrate reduction for the videos processed with our method compared to the original ones, while the video quality was rated close to that of the original video. These results confirm that low-pass filtering horizontal slices of the stereoscopic video before compression is an effective technique for reducing the transmission bandwidth and storage required for stereoscopic videos, while preserving the quality of the original stereoscopic video.

4 Evaluating the Performance of Existing Full-Reference Quality Metrics on High Dynamic Range (HDR) Video Content3

4.1 Introduction

The final quality of HDR videos depends on how well the quality of the data is preserved along the broadcasting chain (i.e., acquisition, transmission, and display). Subjective evaluation of HDR content quality is the ideal assessment measure. However, subjective evaluation of multimedia content is not always practical or efficient in applications such as compression, broadcasting and video streaming. In such cases, objective quality metrics are required to assess visual quality. This is particularly important when designing compression standards, since using an accurate, content-specific quality metric will significantly improve the coder's efficiency for the given media content.
One approach to evaluating the quality of HDR content is to extend the use of LDR quality metrics to HDR content. To this end, the HDR data first needs to be processed so that its pixel values fall into a range supported by LDR quality metrics; this method is known as perceptually uniform (PU) encoding [33]. Another very simple but effective technique for employing LDR metrics on HDR data is based on multi-exposure tone-mapping: the HDR stream is tone-mapped to several LDR streams with different exposure ranges, the LDR metric is applied to each LDR stream, and the resulting numerical quality values are averaged [34]. In addition to these LDR-metric-based approaches, a limited number of quality metrics have been developed specifically for HDR content. The Dynamic Range Independent metrics DRI-VDP [35] and DRI-VQM [36] are two HDR quality metrics that provide a visible-difference map. In other words, these metrics predict the visibility of distortions as a map, but they do not generate one single numerical value for quality. However, the HDR quality metric proposed in [37], known as HDR-VDP-2, generates a quality value in addition to the distortion map.

3 A version of this chapter has been published as Azimi, M.; Banitalebi, A.; Dong, Y.; Pourazad, M.T.; Nasiopoulos, P., "A survey on the performance of the existing full reference HDR video quality metrics: A new HDR video dataset for quality evaluation purposes," the 12th International Conference on Multimedia Signal Processing (ICMSP 2014), Venice, Italy, November 2014.

The performance of most of the above-mentioned quality metrics has been tested only on LDR data. One reason might be the lack of a comprehensive HDR video database. To the best of our knowledge, the only existing publicly available HDR video dataset is the one provided by Čadík et al. [38].
The HDR videos in this dataset are designed and rendered for computer graphics applications: their resolution is low (512x512 or lower), they are short (at most 60 frames, about 3 seconds), and they contain scenes with low motion. The main focus of this work is to evaluate the performance of the existing LDR and HDR metrics on HDR video content, which in turn will provide a better understanding of how well each of these metrics works and whether they can be applied in capturing, compressing, and transmitting HDR data. To this end, a comprehensive HDR video database called "DML-HDR" is created and made publicly available to the research community [39]. A series of subjective tests is performed on an HDR display to evaluate the quality of the DML-HDR videos in the presence of several representative types of artifacts. Then, the correlation between the results from the existing LDR and HDR quality metrics and those from the subjective tests is measured to determine the most effective existing quality metric for HDR.

4.2 DML Dataset

One challenge in evaluating HDR video quality is the lack of a representative HDR video dataset. To this end, a comprehensive HDR video database called "DML-HDR" is created [39]. This dataset consists of seven HDR videos, all captured with a professional camera capable of capturing HDR video (RED Scarlet-X) at up to 16 bits per pixel. All videos represent natural scenes. Each video sequence is approximately 10 seconds long with a frame rate of 30 frames per second (fps), and all sequences are recorded at 2048×1080 resolution. Table 4-1 summarizes the characteristics of each video sequence, and snapshots of these videos are shown in Figure 4-1. Please note that the frames in Figure 4-1 are tone-mapped for display on LDR media. The captured videos are available both in RGBE and in YUV 12-bit format.
RGBE is a lossless HDR video format in which each pixel is encoded with 4 bytes: one byte for the red mantissa, one for the green mantissa, one for the blue mantissa, and one byte for a common exponent [40][41]. The YUV 12-bit format consists of three channels, Y for luma and U and V for chroma, with each channel represented by integer values between 0 and 4095 (12 bits). In the DML-HDR video dataset, in addition to the reference video sequences, five distinct distorted versions of each sequence are also provided. The five representative types of distortion applied to each video are listed below.

Table 4-1 Description of the DML-HDR video dataset

Sequence      | Motion level | Detail level | Frames | Frame rate (fps) | Resolution | Format
Hallway       | Low          | Low          | 270    | 30               | 2048x1080  | YUV 4:2:0
MainMall      | High         | High         | 240    | 30               | 2048x1080  | YUV 4:2:0
MainMallTree  | Low          | High         | 240    | 30               | 2048x1080  | YUV 4:2:0
Playground    | Low          | Medium       | 222    | 30               | 2048x1080  | YUV 4:2:0
Stranger      | Low          | High         | 300    | 30               | 2048x1080  | YUV 4:2:0
Table         | Low          | Low          | 300    | 30               | 2048x1080  | YUV 4:2:0
Tree          | Low          | High         | 250    | 30               | 2048x1080  | YUV 4:2:0

Figure 4-1 Snapshots of the first frames of the HDR test video sequences (tone-mapped versions): (a) Hallway, (b) MainMall, (c) MainMallTree, (d) Playground, (e) Strangers, (f) Table, and (g) Tree

Additive White Gaussian Noise (AWGN): white Gaussian noise with a mean of zero and a standard deviation of 0.002 was added to all frames of each video. Based on our experience with LDR videos, this standard deviation may seem too small. However, observations of the distorted HDR videos on the HDR display showed that AWGN with a standard deviation of 0.002 is visible, which may be due to the larger dynamic range of HDR videos compared to LDR videos. Note that before adding the AWGN, all pixel values were normalized between 0 and 1; after adding the noise, the pixel values were converted back to the original scale.
Mean intensity shift: the luminance of the HDR videos was globally increased in all frames of each video sequence by 10% of the maximum scene luminance.

Salt and pepper noise: salt and pepper noise was added to 2% of the pixels in each frame of the videos, with the affected pixels chosen at random.

Low-pass filtering: an 8×8 Gaussian low-pass filter with a standard deviation of 8 was applied to each frame of all the sequences, averaging out rapid changes in intensity in each frame.

Compression artifacts: all the videos were encoded using the HEVC encoder (HM software version 12.1 [42]) with the random access main10 profile configuration. The HEVC encoder settings were as follows: hierarchical B pictures, group of pictures (GOP) size of 8, internal bit depth of 12, input video format of YUV 4:2:0 progressive, and CABAC entropy coding and rate-distortion optimized quantization (RDOQ) enabled. The quantization parameter (QP) was set to 22, 27, 32, and 37 in order to simulate impaired videos with a wide range of compression distortion. The compressed videos are available in 12-bit YUV format in the DML-HDR video dataset, since YUV is the default format used by the HEVC reference software (HM software [42][43]), whereas all the other distorted videos are available in HDR format (.hdr).

4.3 Experiment Setting

The performance of the objective quality metrics is evaluated by comparing their quality scores with the Mean Opinion Scores (MOS) on the set of distorted videos in the DML-HDR dataset. The following subsections elaborate on the objective and subjective test procedures used in our experiment.

4.3.1 Objective Test Procedure

In order to meaningfully test LDR metrics on HDR content, the HDR data has to be adapted to the LDR domain. One method for doing so is the Perceptually Uniform (PU) encoding method [33].
PU encoding transforms luminance values in the range of 10^-5 cd/m2 to 10^8 cd/m2 into approximately perceptually uniform LDR values. Another very simple yet effective technique for employing LDR metrics on HDR data is the multi-exposure method [34]. In this method, the HDR data is tone-mapped at several exposures, uniformly distributed over the dynamic range of the data. The quality of each tone-mapped video, which is in turn an LDR video, is computed with the LDR metric, and the average over all the tone-mapped versions forms the final quality score. In our test, both methods are applied to the HDR data so that LDR metrics can be tested on it. The LDR metrics used in our experiment are PSNR, SSIM [45] and VIF [46]. Among the existing HDR metrics, HDR-VDP-2 is used in our experiment, as it is the state-of-the-art full-reference metric that works for all luminance conditions (both LDR and HDR) [37]. This metric is designed based on Daly's visual difference predictor (VDP) [47]. HDR-VDP-2 mimics the human visual system and is designed to predict the visibility of changes caused by artifacts in the test image. The inputs of the metric include the reference image, the test image, and parameters such as the maximum physical luminance of the display, the angular resolution of the image, and further options describing the viewing environment. The output of the metric is a probability map that gives, for each image region, the probability that a human observer detects a dissimilarity between the reference and test images. A pooling strategy then converts the probability map into a quality score between 0 and 100 [37], where 0 represents the lowest quality and 100 the highest, meaning the reference and test images are identical in terms of quality.
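The multi-exposure adaptation described earlier in this section can be sketched as follows. The exposure-plus-gamma tone-mapping operator, the 0.18 mid-grey anchor for the exposure range, and the use of PSNR as the stand-in LDR metric are all simplifying assumptions of this sketch, not the procedure of [34]:

```python
import numpy as np

def tone_map(hdr, exposure):
    # Hypothetical operator: scale by exposure, clip, gamma-encode to a
    # continuous 0-255 LDR scale (a real pipeline would quantize to 8 bits).
    return np.clip(hdr * exposure, 0.0, 1.0) ** (1 / 2.2) * 255.0

def psnr(ref, test):
    mse = np.mean((ref - test) ** 2)
    return np.inf if mse == 0 else 10 * np.log10(255.0 ** 2 / mse)

def multi_exposure_quality(ref_hdr, test_hdr, n_exposures=5):
    """Apply an LDR metric at several exposures spread over the dynamic
    range of the reference, then average the per-exposure scores."""
    lo = ref_hdr[ref_hdr > 0].min()
    hi = ref_hdr.max()
    # Anchor mid-grey (0.18, an assumption) to the darkest/brightest pixels.
    exposures = np.logspace(np.log10(0.18 / hi), np.log10(0.18 / lo),
                            n_exposures)
    return float(np.mean([psnr(tone_map(ref_hdr, e), tone_map(test_hdr, e))
                          for e in exposures]))

rng = np.random.default_rng(1)
ref = 10.0 ** rng.uniform(-2, 3, (64, 64))     # synthetic HDR luminance
noisy = ref * rng.normal(1.0, 0.1, ref.shape)  # mildly distorted copy
score = multi_exposure_quality(ref, noisy)
```

Any full-reference LDR metric (e.g., SSIM or VIF) can be substituted for the PSNR function in this averaging loop.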
The older versions of HDR-VDP-2 (HDR-VDP 1.7 and HDR-VDP 1.0) provide only a distortion (difference) probability map and do not quantify the visual distortion, and thus are not used in our experiment. Similarly, DRI-VDP and DRI-VQM [35] have been excluded from our test, since they only provide a distortion map without a quantitative quality value. The HDR-VDP-2 Matlab code is publicly available at [48].

4.3.2 Subjective Test Procedure

The subjective evaluations were conducted in a room complying with the ITU-R BT.500-13 Recommendation [31]. Prior to the actual experiment, a training session was shown to the observers to familiarize them with the rating procedure. The test sessions were designed based on the Double-Stimulus Impairment Scale (DSIS) method [31]. In particular, after each 10-second reference video, a 3-second gray interval was shown, followed by the 10-second test video. Another 4-second gray interval was allocated after the test video, allowing the viewers to rate the quality of the test video with respect to that of the reference. The test videos are the distorted videos from the DML-HDR video dataset described in 4.2. The scoring is based on a discrete scheme in which a numerical value from 1 (worst quality) to 10 (identical quality) is assigned to each test video, representing its quality with respect to the reference video [31]. Note that in order to stabilize the subjects' opinions, a few dummy video pairs were presented at the beginning of the test and the subjects were asked to rate them; the scores collected for these videos were discarded from the final results.

Figure 4-2 The prototype HDR display

The videos were displayed on an HDR TV prototype built based on the concept explained in [49]. As illustrated in Figure 4-2, this system consists of two main parts: 1) a 40-inch full HD LCD panel in the front, and 2) a projector with HD resolution at the back to provide the backside luminance.
The contrast range of the projector is 20000:1. The original HDR video signal is split into two streams, which are sent to the projector and the LCD (see [49] for details). The input signal to the projector includes only the luminance information of the HDR content, while the input signal to the LCD includes both the luma and chroma information of the HDR video. With this configuration, the light output of each pixel is effectively the result of two modulations whose individual dynamic ranges multiply, yielding an HDR signal. This HDR display system is capable of emitting light at a maximum brightness of 2700 cd/m2.

Eighteen adult subjects, 10 males and 8 females, participated in our experiment. The subjects' ages ranged from 19 to 35 years. Prior to the tests, all subjects were screened for color blindness using the Ishihara chart and for visual acuity using the Snellen charts. Subjects who failed the pre-screening did not participate in the test.

4.4 Results and Discussions

After collecting the subjective results, outlier subjects were detected according to the ITU-R BT.500-13 recommendation [31]; no outlier was detected in this test. The Mean Opinion Score (MOS) for each impaired video was calculated by averaging the scores over all subjects, with a 95% confidence interval. Table 4-2 summarizes the correlation between the objective quality scores and the subjective test results. In order to estimate each metric's accuracy, the Pearson Linear Correlation Coefficient (PCC) is calculated between the MOS values and the obtained objective quality indices. The Spearman Rank Order Correlation Coefficient (SCC) is also computed to estimate the monotonicity of the metrics' results. The PCC and SCC in each column are calculated over the entire video dataset.
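As a sketch, the two correlation coefficients can be computed with SciPy; the MOS and metric values below are made-up placeholders, not numbers from the experiment:

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Hypothetical MOS values (1-10 scale) and objective quality scores
# (0-100 scale) for six test videos.
mos    = np.array([8.2, 6.5, 4.1, 3.0, 7.4, 5.2])
metric = np.array([92.0, 80.5, 55.3, 40.1, 88.2, 64.7])

pcc, _ = pearsonr(metric, mos)    # accuracy of the metric's predictions
scc, _ = spearmanr(metric, mos)   # monotonicity of the predictions
```

Both coefficients lie in [-1, 1]; values near 1 mean the objective metric tracks the subjective scores closely.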
The results are reported for three impairment categories: a) compression artifacts; b) AWGN, intensity shifting, salt & pepper noise, and low-pass filtering; and c) all the impairments used in our study. As observed from Table 4-2, in the presence of AWGN, intensity shifting, salt & pepper noise, and low-pass filtering, VIF yields the best performance compared to HDR-VDP-2 and the other LDR metrics used in our experiment, regardless of the employed adaptation method (i.e., multi-exposure or PU encoding).

Table 4-2 Correlation of subjective responses with predictions of objective quality metrics

                        AWGN, intensity shift,    Compression            All
                        salt & pepper, low-pass   (QP: 22, 27, 32, 37)   impairments
Metric/Method           PCC       SCC             PCC       SCC          PCC       SCC
HDR-VDP-2               0.3639    0.3686          0.9270    0.8113       0.4871    0.3413
PSNR (PU encoding)      0.6754    0.4122          0.7444    0.7355       0.6361    0.7096
SSIM (PU encoding)      0.5634    0.5004          0.8881    0.7470       0.4526    0.5146
VIF (PU encoding)       0.9723    0.8703          0.8490    0.7929       0.8522    0.8462
PSNR (Multi-Exposure)   0.8631    0.4799          0.7744    0.6163       0.5180    0.7303
SSIM (Multi-Exposure)   0.7065    0.4724          0.8932    0.6988       0.5400    0.5198
VIF (Multi-Exposure)    0.92981   0.7273          0.7842    0.6830       0.6450    0.7517

In the presence of compression artifacts, however, HDR-VDP-2 outperforms all other tested metrics. This means that any effort to design an HDR-specific rate-distortion optimization scheme for HEVC should be based on the HDR-VDP-2 quality metric; this is the basis of the work presented in Chapter 5.

4.5 Conclusion

The main purpose of this chapter was to investigate the performance of existing quality metrics in evaluating the quality of HDR content. To this end, a representative HDR dataset was captured and several types of impairment, including compression, were applied.
The dataset included 40 test videos with five types of distortion. A standardized subjective test procedure was implemented. In the experiment, not only HDR quality metrics but also schemes based on LDR quality metrics were used to predict the quality of the HDR videos. The experimental results showed that in the presence of compression distortion, HDR-VDP-2 outperforms all other metrics, while VIF with PU encoding yields the best performance in the presence of all the other tested impairments.

5 Rate Distortion Optimization of High Efficiency Video Coding for High Dynamic Range (HDR) Video4

5.1 Introduction

Since the current Lagrangian multiplier for encoding HDR video content employs an LDR distortion metric, it does not necessarily strike a balance between the rate and distortion of the encoded HDR videos, and hence the quality of the encoded video is not expected to be the best achievable by the encoder at a given rate. As described in Section 2.3, in order to find the optimal λ for HDR content, the RDO process has to be redone with HDR data. Moreover, the distortion of the encoded HDR data has to be calculated according to the characteristics of HDR content, which requires metrics specifically designed for HDR data or LDR metrics adapted to HDR data. It is shown in Chapter 4 that HDR-VDP-2 [37] results correlate well with subjects' mean opinion scores for HDR video quality in the presence of compression artifacts; in other words, HDR-VDP-2 is the most promising of the existing state-of-the-art quality metrics for predicting the quality of HDR videos with compression artifacts. In this thesis we have reworked the λ-optimization process for HDR content using the HEVC encoder software.
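To make the role of λ concrete, the mode decision inside an RDO-based encoder can be reduced to the following minimal sketch: among candidate coding choices, pick the one minimizing the Lagrangian cost J = D + λ·R. The candidate (distortion, rate) pairs below are made up for illustration only:

```python
def best_mode(modes, lam):
    """Return the index of the candidate minimizing J = D + lam * R,
    where `modes` is a list of (distortion, rate) pairs."""
    costs = [d + lam * r for d, r in modes]
    return costs.index(min(costs))

# Hypothetical candidates: (distortion, rate in bits).
candidates = [(120.0, 300), (90.0, 520), (60.0, 1400)]
low_lam_choice  = best_mode(candidates, lam=0.01)  # favours low distortion
high_lam_choice = best_mode(candidates, lam=1.0)   # favours low rate
```

A small λ steers the encoder toward the low-distortion, high-rate candidate, while a large λ steers it toward the cheap-rate candidate; this is why the λ-QP pairing must match the distortion metric in use.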
In order to come up with an updated λ customized for HDR video characteristics, we use HDR-VDP-2 as the distortion metric, since it correlates well with human observers' ratings and is therefore a suitable metric for HDR content. The next section describes our method and its implementation in the HEVC encoder in detail.

4 A version of this chapter has been published as Azimi, M.; Pourazad, M.T.; Nasiopoulos, P., "Rate Distortion Optimization of High Efficiency Video Coding for High Dynamic Range Video Data," the 6th International Symposium on Communications, Control, and Signal Processing (ISCCSP 2014), May 2014, and Azimi, M.; Pourazad, M.T.; Nasiopoulos, P., "Rate-Distortion Optimization for High Dynamic Range Video Coding," the 6th International Symposium on Communications, Control, and Signal Processing (ISCCSP 2014), May 2014.

5.2 Proposed Method

In order to come up with a new Lagrangian multiplier (λ) customized for HDR video content, we need to derive a new relationship between λ and QP that reflects HDR characteristics. In [26], the relationship between λ and QP for LDR videos is determined by fixing λ and varying QP: the optimal λ-QP relationship is found by encoding a set of LDR videos and quantifying the quality of the encoded videos with PSNR as the distortion metric. However, as stated earlier, since the videos used in that work were LDR and the metric used (PSNR) is designed for LDR data, the estimated λ is not necessarily the optimal Lagrangian multiplier for encoding HDR data. To find the relationship between λ and QP for coding HDR videos with the HEVC standard, we follow the same procedure as in [26], using an HDR dataset and HDR-VDP-2 as the distortion metric. In order to find the relationship between the optimal λ and QP, a set of λ and QP values that yields the minimum distortion at several rate points has to be identified.
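The search procedure can be sketched as follows; toy_encode is a made-up rate-distortion model that only serves to make the sketch runnable, standing in for actual HEVC encodes scored with HDR-VDP-2:

```python
import numpy as np
from collections import defaultdict

def find_optimal_lambda_qp(encode, lambdas, qps, bucket=0.05):
    """Grid search: encode with every (lambda, QP) combination, group the
    combinations whose rates fall into the same log-scale bucket (the
    iso-rate sets S_i), and keep the minimum-distortion member of each."""
    iso_rate = defaultdict(list)
    for lam in lambdas:
        for qp in qps:
            rate, dist = encode(lam, qp)
            key = round(np.log2(rate) / bucket)  # near-equal rates share a key
            iso_rate[key].append((dist, lam, qp))
    # S_opt: for each iso-rate set, the (lambda, QP) with minimum distortion.
    return [min(group)[1:] for group in iso_rate.values()]

# Hypothetical rate-distortion model standing in for a real HEVC encode.
def toy_encode(lam, qp):
    rate = 1e5 * 2.0 ** (-qp / 6.0) * (1.0 + 0.1 / np.sqrt(lam))
    dist = qp + 0.05 * (np.log2(lam) - (qp - 12) / 3.0) ** 2
    return rate, dist

s_opt = find_optimal_lambda_qp(toy_encode,
                               lambdas=[4, 25, 100, 250, 400, 730, 1000],
                               qps=range(24, 45, 2))
```

The λ and QP grids above match the values used in the experiment; fitting a curve through the returned optimal pairs yields the λ-QP relationship.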
To this end, we encode five of the HDR videos from the DML dataset, as explained in Section 4.2, with the latest HEVC software, HM 12.1 [42], using several combinations of QP and λ. Table 4-1 summarizes the specifications of the video sequences selected from the dataset for our experiment. Figure 5-1 shows the Temporal and Spatial perceptual Information of the selected videos, calculated on tone-mapped versions of these videos based on [43].

Figure 5-1 SI and TI of the HDR videos in the DML dataset

The QP range is 24 to 44 with a step size of 2. For the λ range, we used λ values similar to the ones used in [26] for the optimization of video encoders: 4, 25, 100, 250, 400, 730, and 1000. Each HDR test sequence is encoded several times using different combinations of λ and QP. For each combination, the Rate (R) is calculated from the HEVC-coded HDR video and the Distortion (D) is measured using the HDR-VDP-2 metric. The inputs of the HDR-VDP-2 metric include an HDR frame before compression (as a reference image), its decoded version (as a test image), and parameters such as the maximum physical luminance of the display, the angular resolution of the frame, and the viewing environment settings. As explained in 4.3.1, the output of the metric is a probability map that determines the probability of a human observer detecting a dissimilarity between the reference and test images in each image region. A pooling strategy then converts the probability map into a quality score between 0 and 100 [37], where 0 represents the lowest quality and 100 stands for the highest quality, meaning the reference and test images are identical in terms of quality. For each scene, the encoded videos that result in a similar rate (R) are grouped together as a set of λ-QP pairs. If we specify i different bitrate points, then the set of λ-QP pairs at the i-th bitrate point (Ri) is defined as follows:

Si = {(λ, QP) : R(λ, QP) = Ri}    (9)

Each iso-rate line shown in Figure 5-2 (a) represents the λs and QPs that result in the rate Ri for the "MainMall" sequence. Figure 5-2 (b) shows the quality of the encoded HDR videos in terms of the HDR-VDP-2 quality score for all combinations of λ and QP. The (λ, QP) combination in Si (see (9)) that yields the minimum Distortion (D) is identified as the optimal (λ, QP) combination, Sopt, expressed as follows:

Sopt = argmin(λ, QP)∈Si D(λ, QP)    (10)

In other words, Sopt is the (λ, QP) pair that minimizes the distortion (D) at a given rate point (Ri), where D is represented in the form of HDR-VDP-2 quality values. The circles in Figure 5-2 (b) correspond to the optimal combinations of (λ, QP) at each Ri that lead to the maximum HDR-VDP-2 quality score for the "MainMall" sequence. The optimal combinations of λ and QP (Sopt) at each rate level Ri are used to estimate the λ-QP relationship. Figure 5-3 shows the optimal combination sets (Sopt) obtained from the different video sequences in the dataset for P/B and I frames. We fit a curve to the obtained optimal points (see the blue curve in Figure 5-3). To obtain this curve, we have assumed that the relationship between λ and QP for HDR content follows a function of the same form as the one for LDR content (see equation (5)), but with different coefficients, which need to be determined specifically for HDR content.

Figure 5-2 Optimal points on (a) iso-rate and (b) iso-distortion lines for the MainMall sequence

Figure 5-3 (a) shows the estimated λ for P/B frames, which is as follows:

λ = 1.5764 × 2^(a1·QP + b1)    (11)

The estimated λ for I frames is shown in Figure 5-3 (b) and (c) for QP ≤ 36 and QP > 36, respectively. For I frames, the coefficients of the λ function are estimated separately for QP ≤ 36 and QP > 36 to increase the accuracy. The λ for I frames is expressed as follows:

λ = 0.1994 × 2^(a2·QP + b2),  QP ≤ 36
λ = 3.1776 × 2^(a3·QP + b3),  QP > 36    (12)

where the exponent coefficients a1, b1, a2, b2, a3, and b3 are the values obtained from the curve fitting shown in Figure 5-3.

To customize the RD optimization of HEVC for HDR content, we implemented our estimated λ-QP functions in the HEVC encoder reference software (HM 12.1 [42]). Our HDR-customized HEVC encoder software is available to the public for further research and comparison studies [50]. The HDR video dataset was encoded again using the updated encoder with the updated lambda. The compressed videos were then objectively and subjectively compared with the ones encoded by the unmodified version of the reference HEVC encoder. The following sections elaborate on the subjective test setup and the performance evaluation of our HDR-customized HEVC encoder. The results of the subjective evaluations are presented and discussed in Section 5.3.

Figure 5-3 The estimated λ-QP relationship for all the HDR sequences using HM 12.1: (a) for P/B frames and (b), (c) for I frames

5.2.1 Subjective Test Setup

Subjective tests were conducted to compare the quality of HDR videos compressed using our HDR-customized HEVC encoder (with the updated Lagrangian Multiplier function) [50] with that of videos compressed using the unmodified HEVC reference encoder software [42]. In both encoders, the random access high efficiency (RA-HE) configuration was used, to ensure the highest compression performance. The RA-HE configuration is as follows: hierarchical B pictures, GOP length 8, with ALF, SAO, and RDOQ enabled. The Intra period is set to 32, as the frame rate of the videos is 30 fps. The test environment was compliant with Recommendation ITU-R BT.500-13 [31]. The tests were designed based on the DSIS (Double-Stimulus Impairment Scale) method in [31].
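The search over (λ, QP) combinations in Section 5.2 — grouping encodes into iso-rate sets and keeping, in each set, the pair with the best HDR-VDP-2 score — can be sketched as follows. This is a simplified stand-in: the relative rate-matching tolerance and the sample data are hypothetical, not values from the experiment.

```python
def optimal_points(encodes, rate_tol=0.02):
    """encodes: list of dicts {'lam', 'qp', 'rate', 'quality'}, where
    'quality' is an HDR-VDP-2 score (higher is better).
    Groups encodes whose rates agree within rate_tol (relative) into
    iso-rate sets, then returns, per set, the entry maximizing quality."""
    groups = []
    for e in sorted(encodes, key=lambda e: e["rate"]):
        if groups and abs(e["rate"] - groups[-1][0]["rate"]) <= rate_tol * groups[-1][0]["rate"]:
            groups[-1].append(e)  # same iso-rate set
        else:
            groups.append([e])    # start a new iso-rate set
    return [max(g, key=lambda e: e["quality"]) for g in groups]

# Toy data: two encodes land near 1000 (one iso-rate set), one near 2000.
encodes = [
    {"lam": 4,   "qp": 30, "rate": 1000, "quality": 60.0},
    {"lam": 25,  "qp": 28, "rate": 1010, "quality": 65.0},
    {"lam": 100, "qp": 24, "rate": 2000, "quality": 80.0},
]
print([(e["lam"], e["qp"]) for e in optimal_points(encodes)])
```

In the actual experiment, each selected (λ, QP) pair corresponds to one circled point on the iso-rate lines of Figure 5-2; the collection of such points over all sequences is what the λ-QP curve is fitted to.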
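The resulting piecewise λ-QP mapping can be wrapped in a small helper. Note that only the leading constants (1.5764, 0.1994, 3.1776) are taken from the fitted curves quoted above; the exponent coefficients A1 through B3 below are placeholders standing in for the fitted values shown in Figure 5-3, not the thesis's actual numbers.

```python
# Placeholder exponent coefficients (assumptions for illustration only;
# the fitted values are those of Figure 5-3).
A1, B1 = 0.11, -3.0   # P/B frames
A2, B2 = 0.12, -2.0   # I frames, QP <= 36
A3, B3 = 0.10, -3.5   # I frames, QP > 36

def hdr_lambda(qp, frame_type):
    """Piecewise lambda(QP) of the form  c * 2**(a*QP + b)."""
    if frame_type in ("P", "B"):
        return 1.5764 * 2 ** (A1 * qp + B1)
    # I frames: separate fits below and above QP = 36.
    return (0.1994 * 2 ** (A2 * qp + B2) if qp <= 36
            else 3.1776 * 2 ** (A3 * qp + B3))
```

As with the standard HEVC λ, the mapping grows exponentially with QP, so coarser quantization tolerates proportionally more rate in the mode-decision cost.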
In a double-stimulus subjective test, a pair consisting of an original video and a test video is shown to the subjects in sequence, with a 3-second gray interval between the original and test video. The subjects are then given 5 seconds to rate the overall perceived quality of the test video relative to the original one. Figure 5-4 (a) illustrates the timing of the test. In our test, the original video was the uncompressed version of the HDR video and the test video was the HDR video compressed either by the reference HEVC encoder software or by the updated HEVC encoder software (using our estimated λ function). The rating scheme uses a discrete labeled quality scale from 1 (worst quality) to 10 (identical quality), as shown in Figure 5-4 (b). The videos labeled "A" are always the original (uncompressed) videos and the videos labeled "B" are the test ones, encoded either by the unmodified HEVC reference encoder software or by our HDR-customized HEVC encoder. Note that, in order to stabilize the subjects' opinions, a few dummy video pairs were presented at the beginning of the test and the subjects were asked to rate them; the collected scores for these videos were discarded from the final results. For the subjective tests, the five video sequences listed in Table 4-1 are used. Each video was encoded using four different QP settings (27, 32, 37, and 42), once using the reference encoder software and once using the modified encoder software (with our proposed λ function), resulting in 40 test videos. Based on Recommendation ITU-R BT.500-13 [31], the original videos were also inserted as test videos to examine the subjects' reliability. The order of the videos was random and the whole test lasted approximately 20 minutes. The videos were displayed on a prototype HDR TV display, as explained in 4.3.2. Sixteen subjects with an age range of 20-35 took part in the test.
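The per-video ratings collected on the 1-10 scale are later pooled into a mean opinion score (MOS) with a 95% confidence interval. A minimal sketch of that pooling, using the normal approximation commonly applied to BT.500-style tests (the score vector below is invented for illustration):

```python
import math

def mos_and_ci(scores):
    """Mean opinion score and 95% confidence interval half-width
    (normal approximation with the sample standard deviation)."""
    n = len(scores)
    mean = sum(scores) / n
    var = sum((s - mean) ** 2 for s in scores) / (n - 1)  # sample variance
    ci = 1.96 * math.sqrt(var / n)
    return mean, ci

# Hypothetical ratings from 16 subjects for one test video.
mos, ci = mos_and_ci([7, 8, 8, 9, 7, 8, 8, 9, 7, 8, 9, 8, 7, 8, 8, 9])
print(round(mos, 2), round(ci, 2))
```

The half-width ci is what the error bars in the rate-MOS figures of Section 5.3 represent.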
Prior to the test, all participants were screened for color blindness and visual acuity using the Ishihara and Snellen charts, respectively. Those who failed the screening tests did not participate. All subjects were non-experts in the field and were not aware of the objective of the test. Subjects were first familiarized with the video-quality rating procedure in a training session before the actual test began. Each test session consisted of at most three subjects, sitting 2.5 meters away from the TV as suggested in [31]. Once the subjective test results were collected, an outlier detection test was performed according to Recommendation ITU-R BT.500-13 [31] to discard unreliable results; no outliers were detected in our test. The Mean Opinion Score (MOS) of the viewers was then calculated for each test video. The subjective test results are reported and discussed in the following section.

Figure 5-4 (a) Subjective test procedure, and (b) rating scale

5.3 Results

To evaluate the performance of our estimated λ function for compression of HDR content, the subjective quality of HDR videos compressed using the original HEVC encoder reference software is compared with that of videos compressed using our HDR-customized HEVC encoder (updated with our proposed λ function). Figures 5-5, 5-6, 5-7, 5-8, and 5-9 depict the subjective quality of the encoded videos at different bitrates, with 95% confidence intervals, for the sequences MainMall, MainMallTree, Playground, Table, and Strangers, respectively. As observed, at the same bitrate, the videos encoded with our proposed λ function achieve a relatively higher MOS quality score compared to the ones encoded with the standard HEVC λ function. In order to calculate the average coding efficiency of the proposed method based on subjective quality scores, i.e.
mean opinion score (MOS), we employed the "Subjective Comparison of Encoders based on fitted Curves" (SCENIC) method, originally proposed in [51]. In this method, a logistic function is used to fit the RD values, and the differences in MOS and bitrate are calculated between the fitted RD curves. This method is expected to report more realistic coding efficiency results in terms of MOS than the conventional Bjøntegaard model [52], which is widely used for calculating coding efficiency based on PSNR values. Based on the SCENIC method, for the same MOS, our proposed method yields bitrate savings of 21.37% for the "MainMall", 41.43% for the "Playground", 23.98% for the "MainMallTree", 7.05% for the "Table", and 15.56% for the "Strangers" sequence compared to the videos encoded by the reference encoder. On the other hand, for the same bitrate levels, the improvement in MOS was 0.52 for "MainMall", 0.98 for "Playground", 0.49 for "MainMallTree", 0.33 for "Table", and 0.72 for "Strangers". The bitrate savings are higher for the sequences that have higher TI and SI indices, where TI represents the temporal information (level of motion) and SI represents the spatial information of the scene. In summary, our experimental results confirm that changing the Lagrangian multiplier to address HDR content characteristics significantly improves the coding efficiency of the HEVC encoder for HDR video applications.
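The SCENIC comparison fits logistic curves to each encoder's rate-MOS points and measures the bitrate difference between the fitted curves at equal MOS. A simplified stand-in using linear interpolation instead of the logistic fit (and invented rate-MOS points) illustrates the idea:

```python
# Simplified stand-in for SCENIC: interpolate each encoder's rate-MOS
# points linearly and average the relative bitrate difference at equal
# MOS over the overlapping MOS range.

def rate_at_mos(points, mos):
    """points: list of (rate, mos) sorted by mos; linear interpolation."""
    for (r0, m0), (r1, m1) in zip(points, points[1:]):
        if m0 <= mos <= m1:
            t = (mos - m0) / (m1 - m0)
            return r0 + t * (r1 - r0)
    raise ValueError("MOS outside curve range")

def avg_bitrate_saving(ref, test, samples=50):
    """Average percent bitrate saving of `test` over `ref` at equal MOS."""
    lo = max(ref[0][1], test[0][1])
    hi = min(ref[-1][1], test[-1][1])
    savings = []
    for i in range(samples + 1):
        m = lo + (hi - lo) * i / samples
        savings.append(1 - rate_at_mos(test, m) / rate_at_mos(ref, m))
    return 100 * sum(savings) / len(savings)

# Hypothetical curves: the test encoder reaches each MOS at 80% the rate.
ref  = [(500, 5.0), (1000, 6.5), (2000, 8.0)]
test = [(400, 5.0), (800, 6.5), (1600, 8.0)]
print(round(avg_bitrate_saving(ref, test), 1))  # 20.0 (% saving)
```

The actual SCENIC method replaces the linear interpolation with a fitted logistic function, which behaves better near the saturated ends of the MOS scale.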
Figure 5-5 Rate-MOS curve for the MainMall sequence (reference encoder vs. proposed encoder)

Figure 5-6 Rate-MOS curve for the MainMallTree sequence (reference encoder vs. proposed encoder)

Figure 5-7 Rate-MOS curve for the Playground sequence (reference encoder vs. proposed encoder)

Figure 5-8 Rate-MOS curve for the Table sequence (reference encoder vs. proposed encoder)

Figure 5-9 Rate-MOS curve for the Strangers sequence (reference encoder vs. proposed encoder)

5.4 Conclusion

In this study, we proposed a Lagrangian Multiplier (λ) for the compression of HDR content based on the High Efficiency Video Coding (HEVC) standard. The proposed Lagrangian Multiplier, like the one for LDR videos, has a non-linear dependency on QP, but with different constants. The proposed Lagrangian Multiplier was implemented in the HEVC reference software (HM 12.1). The subjective tests as well as the objective results confirm that our HDR-customized encoder is significantly more efficient than the reference encoder for HDR video applications.

6 Conclusion and Future Work

6.1 Conclusion

In this thesis, we first gave an overview of the recently standardized HEVC compression scheme and then investigated methods for increasing the compression efficiency of two of the most recent advances in multimedia technology: 3D and HDR content.
Increasing the compression efficiency of stereoscopic video encoding was achieved by taking advantage of asymmetric coding while accommodating eye dominance; our approach was implemented in the existing H.264/AVC compression standard, since at that time HEVC had not yet been finalized and its codec was still unstable. The latter effort involved increasing the compression efficiency of both the H.264/AVC and HEVC video compression standards for HDR video data. In Chapter 3, we proposed a modified scheme for asymmetric coding of stereoscopic content. In our approach, videos are divided into horizontal slices in both the left and right views. Half of these slices are low-pass filtered (with a 15x15 Gaussian filter with sigma = 3), while the corresponding slices in the other view are of original quality. The perceived sharpness, quality, and depth of the original stereoscopic videos were close to those of the videos encoded with our proposed method, despite the fact that the filtering was visually quite apparent in the monoscopic videos. Our implementation of asymmetric video coding has the advantage over conventional asymmetric methods of being fair to viewers with one dominant eye, because the filtered slices are divided between both views. Our results showed that a significant bitrate reduction can be achieved while the encoded 3D video quality remains close to that of the original video. In Chapter 4, the performance of existing quality metrics in evaluating the quality of HDR content was investigated. To this end, a representative HDR dataset was captured and several types of impairments were applied; the dataset included 40 test videos with five types of distortions. Our subjective tests followed the ITU-recommended procedure. In the experiment, HDR quality metrics as well as the proposed schemes based on LDR quality metrics were used to predict the quality of HDR videos.
Performance evaluations showed that, in the presence of compression distortions, HDR-VDP-2 outperforms all other metrics, while overall, VIF using PU encoding yields the best performance in the presence of all the tested impairments. In Chapter 5, we customized the rate distortion optimization process of the H.264/AVC and HEVC standards based on HDR data characteristics. The Lagrangian Multiplier (λ) that the encoder uses to decide on the best encoding mode and motion vector was updated accordingly. The subjective results confirm that the encoders (both H.264/AVC and HEVC) with the updated Lagrangian Multiplier outperform the original encoders in terms of bitrate savings and/or encoded video quality improvement.

6.2 Future Work

Our asymmetric stereoscopic video compression method does not take into account the motion level, degree of detail, and brightness level of the stereoscopic video content. By considering these factors, the amount of quality degradation in the low-pass filtered slices could be adjusted accordingly. Video content with a lower degree of detail and motion could be degraded more, resulting in higher bitrate savings compared to content with higher levels of detail and motion. To this end, methods for evaluating motion and detail may be employed to identify a video's motion and texture index, and a threshold of filtering strength can be found for different ranges of texture and/or motion. In other words, videos with a higher motion/texture index can be compressed more than those with lower indices, improving coding efficiency without significantly affecting the 3D quality. In our work on optimizing the H.264/AVC and HEVC standards for HDR video content, we utilized HDR-VDP-2 as the video distortion metric. Presently, MPEG is developing a metric for HDR video, known as PSNR-2000, which has less overhead than HDR-VDP-2.
By employing that metric inside the RDO process of the encoder software, instead of the conventional MSE or SAD, it is expected that greater bitrate savings and better quality will be achieved. These techniques can be combined to design a video encoder for 3D UD HDR video data, which is expected to be the next generation of digital media content format.

Bibliography

[1] T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, "Overview of the H.264/AVC video coding standard," IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 560-576, July 2003.
[2] G. J. Sullivan, J.-R. Ohm, W.-J. Han, and T. Wiegand, "Overview of the High Efficiency Video Coding (HEVC) standard," IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 12, pp. 1649-1668, Dec. 2012.
[3] B. Julesz, Foundations of Cyclopean Perception. Chicago, IL: Univ. Chicago Press, 1971.
[4] M. G. Perkins, "Data compression of stereopairs," IEEE Trans. Commun., vol. 40, pp. 684-696, Apr. 1992.
[5] L. Stelmach, W. J. Tam, D. Meegan, and A. Vincent, "Stereo image quality: effects of mixed spatio-temporal resolution," IEEE Transactions on Circuits and Systems for Video Technology, vol. 10, no. 2, pp. 188-193, Mar. 2000.
[6] E. Reinhard, High Dynamic Range Imaging: Acquisition, Display, and Image-Based Lighting. San Francisco, CA: Elsevier/Morgan Kaufmann, 2006.
[7] E. Reinhard, M. Stark, P. Shirley, and J. Ferwerda, "Photographic tone reproduction for digital images," ACM Trans. Graphics, vol. 21, 2002.
[8] F. Durand and J. Dorsey, "Fast bilateral filtering for the display of high-dynamic-range images," ACM Trans. Graphics, vol. 21, no. 3, 2002.
[9] G. W. Larson, H. Rushmeier, and C. Piatko, "A visibility matching tone reproduction operator for high dynamic range scenes," IEEE Trans. Visualization and Computer Graphics, vol. 3, no. 4, 1997.
[10] F. Drago, K. Myszkowski, T. Annen, and N.
Chiba, "Adaptive logarithmic mapping for displaying high contrast scenes," Computer Graphics Forum, vol. 22, no. 3, 2003.
[11] ITU-T and ISO/IEC JTC1, "Digital compression and coding of continuous-tone still images," ISO/IEC 10918-1 / ITU-T Recommendation T.81 (JPEG), September 1992.
[12] "H.265: High efficiency video coding," ITU, June 2013.
[13] K. McCann, B. Bross, W.-J. Han, S. Sekiguchi, and G. J. Sullivan, "High Efficiency Video Coding (HEVC) Test Model 5 (HM 5) encoder description," JCTVC-G1102, November 2011.
[14] A. Fuldseth, M. Horowitz, S. Xu, A. Segall, and M. Zhou, "Tiles," JCTVC-F335, July 2011.
[15] B. Bross, W.-J. Han, G. J. Sullivan, J.-R. Ohm, and T. Wiegand, "High Efficiency Video Coding (HEVC) text specification draft 6," JCTVC-H1003, February 2012.
[16] J. Chen and T. Lee, "Planar intra prediction improvement," JCTVC-F483, July 2011.
[17] H. Li, B. Li, L. Li, J. Zhang, H. Yang, and H. Yu, "Non-CE6: Simplification of intra chroma mode coding," JCTVC-H0326, February 2012.
[18] E. Alshina, A. Alshin, J.-H. Park, J. Lou, and K. Minoo, "CE3: 7 taps interpolation filters for quarter pel position MC from Samsung and Motorola Mobility," JCTVC-G778, Geneva, November 2011.
[19] JCT-VC, "Encoder-side description of Test Model under Consideration," JCTVC-B204, JCT-VC Meeting, Geneva, July 2010.
[20] G. J. Sullivan, J.-R. Ohm, F. Bossen, and T. Wiegand, "JCT-VC AHG report: HM subjective quality investigation," JCTVC-H0022, February 2012.
[21] P. Merkle, K. Müller, A. Smolic, and T. Wiegand, "Efficient compression of multi-view video exploiting inter-view dependencies based on H.264/MPEG4-AVC," IEEE International Conference on Multimedia and Expo, pp. 1717-1720, July 2006.
[22] H. Schwarz, C. Bartnik, S. Bosse, H. Brust, T. Hinz, H. Lakshman, D. Marpe, P. Merkle, K. Müller, H. Rhee, G. Tech, M. Winken, and T. Wiegand, "3D video coding using advanced prediction, depth modeling, and encoder control methods," IEEE Intl. Conf.
on Image Processing, Oct. 2012.
[23] B. Bai, P. Boulanger, and J. Harms, "An efficient multiview video compression scheme," IEEE International Conference on Multimedia and Expo (ICME 2005), pp. 836-839, July 2005.
[24] W. J. Tam, L. B. Stelmach, and S. Subramaniam, "Stereoscopic video: Asymmetrical coding with temporal interleaving," Stereoscopic Displays and Virtual Reality Systems VIII, vol. 4297, pp. 299-306, 2001.
[25] A. Banitalebi-Dehkordi, M. Azimi, M. T. Pourazad, and P. Nasiopoulos, "Compression of high dynamic range video using the HEVC and H.264/AVC standards," invited paper, 10th International Conference on Heterogeneous Networking for Quality, Reliability, Security and Robustness, Rhodes, Greece, August 2014.
[26] G. J. Sullivan and T. Wiegand, "Rate-distortion optimization for video compression," IEEE Signal Processing Magazine, vol. 15, no. 6, pp. 74-90, Nov. 1998.
[27] T. Wiegand, H. Schwarz, A. Joch, F. Kossentini, and G. J. Sullivan, "Rate-constrained coder control and comparison of video coding standards," IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 688-703, July 2003.
[28] Z. Wang and A. C. Bovik, "Mean squared error: Love it or leave it?," IEEE Signal Processing Magazine, vol. 26, no. 1, pp. 98-117, January 2009.
[29] S. Wang, A. Rehman, Z. Wang, S. Ma, and W. Gao, "SSIM-motivated rate-distortion optimization for video coding," IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 4, pp. 516-529, April 2012.
[30] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: from error visibility to structural similarity," IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600-612, April 2004.
[31] Recommendation ITU-R BT.500-13, "Methodology for the subjective assessment of the quality of television pictures," ITU, 2012.
[32] S. Valizadeh, M. Azimi, and P. Nasiopoulos, "Bitrate reduction in asymmetric stereoscopic video with low-pass filtered slices," IEEE International Conference on Consumer Electronics (ICCE), pp. 170-171, Jan. 2012.
[33] T. O. Aydin, R. Mantiuk, and H.-P. Seidel, "Extending quality metrics to full luminance range images," SPIE Human Vision and Electronic Imaging XIII, San Jose, USA, Jan. 2008.
[34] J. Munkberg, P. Clarberg, J. Hasselgren, and T. Akenine-Möller, "High dynamic range texture compression for graphics hardware," ACM Transactions on Graphics, vol. 25, no. 3, p. 698, Jul. 2006.
[35] T. O. Aydin, R. Mantiuk, K. Myszkowski, and H.-P. Seidel, "Dynamic range independent image quality assessment," ACM SIGGRAPH 2008 papers (SIGGRAPH '08), Article 69, 10 pages, August 2008.
[36] T. O. Aydin, M. Čadík, K. Myszkowski, and H.-P. Seidel, "Video quality assessment for computer graphics applications," ACM Transactions on Graphics (Proc. of SIGGRAPH Asia '10), pp. 1-10, Seoul, Korea, 2010.
[37] R. Mantiuk, K. J. Kim, A. G. Rempel, and W. Heidrich, "HDR-VDP-2: a calibrated visual metric for visibility and quality predictions in all luminance conditions," ACM Trans. Graphics, vol. 30, no. 4, Article 40, 14 pages, July 2011.
[38] M. Čadík, T. O. Aydin, K. Myszkowski, and H.-P. Seidel, "On evaluation of video quality metrics: an HDR dataset for computer graphics applications," Human Vision and Electronic Imaging XVI, Proc. SPIE 7865, February 2011.
[39] Digital Media Lab High Dynamic Range (DML-HDR) video dataset, created at the University of British Columbia, available at http://dml.ece.ubc.ca, November 2013.
[40] W. Lin and C.-C. Jay Kuo, "Perceptual visual quality metrics: A survey," Journal of Visual Communication and Image Representation, vol. 22, no. 4, pp. 297-312, May 2011.
[41] G. Ward, "Real pixels," in J. Arvo (ed.), Graphics Gems II, pp.
80-83, San Diego, CA, USA: Academic Press, 1992.
[42] HEVC reference software, https://hevc.hhi.fraunhofer.de/, retrieved 4 January 2012.
[43] ITU-T, "P.910: Subjective video quality assessment methods for multimedia applications," Tech. Rep. P.910, 1992.
[44] F. Bossen, D. Flynn, and K. Suhring, "HEVC reference software manual," JVT-AE010, London, UK, 2013.
[45] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: From error visibility to structural similarity," IEEE Trans. Image Processing, vol. 13, pp. 600-612, Apr. 2004.
[46] H. R. Sheikh and A. C. Bovik, "Image information and visual quality," IEEE Transactions on Image Processing, vol. 15, no. 2, pp. 430-444, 2006.
[47] S. J. Daly, "Visible differences predictor: an algorithm for the assessment of image fidelity," Human Vision, Visual Processing, and Digital Display III, Proc. SPIE 1666, Aug. 1992.
[48] HDR-VDP software, http://sourceforge.net/projects/hdrvdp/files/hdrvdp/, retrieved 1 February 2012.
[49] H. Seetzen et al., "High dynamic range display system," ACM Trans. Graphics, vol. 23, no. 3, pp. 760-768, 2004.
[50] HDR-customized HEVC encoder software, http://dml.ece.ubc.ca/data/HDR-OPT-HEVC, available 30 September 2014.
[51] P. Hanhart and T. Ebrahimi, "Calculation of average coding efficiency based on subjective quality scores," Journal of Visual Communication and Image Representation, vol. 25, no. 3, pp. 555-564, April 2014.
[52] G. Bjøntegaard, "Calculation of average PSNR differences between RD-curves," Technical Report VCEG-M33, ITU-T SG16/Q6, Austin, TX, USA, 2001.
