UBC Theses and Dissertations


Scalable coding of H.264 video Ugur, Kemal 2004

SCALABLE CODING OF H.264 VIDEO

by

KEMAL UGUR
B.Sc., Middle East Technical University, 2001

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF APPLIED SCIENCE in THE FACULTY OF GRADUATE STUDIES, ELECTRICAL AND COMPUTER ENGINEERING

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
June 2004

© Kemal Ugur, 2004

Library Authorization

In presenting this thesis in partial fulfillment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Name of Author: Kemal Ugur
Title of Thesis: Scalable Coding of H.264 Video
Degree: Master of Applied Science
Department: Electrical and Computer Engineering
Year: 2004

The University of British Columbia, Vancouver, BC, Canada

Abstract

Real-time transmission of digital video over media such as the Internet and wireless networks has recently been receiving much attention. A major challenge of video transmission over such networks is the variation of the available bandwidth over time. Traditional video coding standards, whose main objective is to optimize the quality of the transmitted video at a given bitrate, do not offer effective solutions to the bandwidth variation problem. To deal with this problem, different scalable video coding techniques have been developed. The latest video coding standard, H.264, provides superior compression efficiency over all previous standards. This standard, however, does not include tools for coding the video in a scalable fashion.
In this thesis, we introduce methods that allow encoding and transmission of H.264 video in a scalable fashion. The method we propose is an adaptation of the existing MPEG-4 Fine Granular Scalability (FGS) structure to the H.264 standard. Our proposed algorithm minimizes the number of additional bits needed to adapt the advanced features of H.264 to the FGS system. Our proposed system has the advantages of being highly error resilient and having low computational complexity. Due to its structure, the FGS standard has low coding efficiency when compared to single-layer coding. To overcome this problem, we also introduce a hybrid method that combines our proposed H.264 based FGS approach with the stream-switching approach employed in the H.264 standard. By combining different techniques, our proposed system offers a complete solution for all kinds of applications. The proposed system outperforms existing systems by offering optimum bandwidth utilization and improved video quality for the end user.
Contents

Abstract
Contents
Table of Figures
Acknowledgements

Chapter 1  Introduction
  1.1 Thesis Objective
  1.2 Thesis Outline
  1.3 Introduction to Video Coding
    1.3.1 Fundamentals of Video Coding
    1.3.2 Overview of Scalable Video Coding
  1.4 Overview of the MPEG-4 FGS Video Coding Standard
    1.4.1 MPEG-4 FGS Encoder Structure
    1.4.2 MPEG-4 FGS Decoder Structure
  1.5 Overview of the H.264 Video Coding Standard
    1.5.1 Advances in Motion Compensated Prediction
    1.5.2 Advances in Transform Coding
    1.5.3 Advances in Entropy Coding

Chapter 2  H.264 Based Fine Granular Scalability (FGS)
  2.1 Introduction
  2.2 Trivial Extension of FGS into H.264
    2.2.1 Drawbacks of the Trivial Extension of FGS to H.264
  2.3 Proposed H.264 Based FGS System
    2.3.1 Proposed Transform Coding Structure
    2.3.2 Proposed Entropy Coding Structure
  2.4 Experimental Results
    2.4.1 Experimental Results for the Proposed CBP Coding Scheme
    2.4.2 Experimental Results for the Proposed Entropy Coding Scheme
  2.5 Conclusion

Chapter 3  Hybrid Structure using Stream-Switching and FGS for Scalable H.264 Video Transmission
  3.1 Stream Switching and SP-Frames
    3.1.1 Overview of Stream Switching
    3.1.2 Overview of the SP Frame Switching Concept used in H.264
    3.1.3 Comparison of Stream Switching and Scalable Video Coding
  3.2 Combining Stream-Switching and FGS
    3.2.1 Adaptive Bitrate Selection for Stream Switching
    3.2.2 Generalized Adaptive Rate Selection
  3.3 Experimental Results
  3.4 Conclusion

Chapter 4  Conclusions and Future Work
  4.1 Conclusions
  4.2 Future Work

Appendix
  A. Exp-Golomb Codes for Entropy Coding
  B. VLC Codes for CBPCODE
    B.1 CBP Codes for the First Bitplane
    B.2 CBP Codes for the Second Bitplane
  C. VLC Codes for SUBCBPCODE
  D. RUN-EOP Statistics
    D.1 BUS Sequence, Base Layer at 500 Kbps
    D.2 BUS Sequence, Base Layer at 1.5 Mbps
    D.3 MOBILE Sequence, Base Layer at 500 Kbps
    D.4 MOBILE Sequence, Base Layer at 1.5 Mbps
    D.5 TEMPETE Sequence, Base Layer at 500 Kbps
    D.6 TEMPETE Sequence, Base Layer at 1.5 Mbps
  E. Proposed VLC Tables
    E.1 (RUN,EOP) Symbols for the First Bitplane
    E.3 (RUN,EOP) Symbols for the Second Bitplane
    E.4 (RUN,EOP) Symbols for the Other Bitplanes

Bibliography

Table of Figures

Figure 1 Hybrid video coder block diagram
Figure 2 Illustration of Block Matching Motion Estimation
Figure 3 Coding and display order of Frames in a typical video bitstream
Figure 4 Zigzag Scan Order for DCT Coefficients
Figure 5 Architecture of a Streaming Video System
Figure 6 Block Diagram of MPEG-2 SNR Scalable Decoder
Figure 7 Block Diagrams of two types of SNR Scalable Encoders: (a) Enhancement Layer Residue is used in the motion prediction loop at the base layer; (b) Enhancement Layer Residue is not used
Figure 8 Frame Structure used in Temporal Scalable Systems
Figure 9 Frame Structure used in Spatial Scalability Systems
Figure 10 Block diagram of Spatial Scalable Encoder
Figure 11 Illustration of the FGS video delivery system, comprising Encoder, Streaming Server and Decoder
Figure 12 Block diagram of MPEG-4 FGS Encoder
Figure 13 Maximum Number of Bitplanes Needed for Bitplane Coding (maximum_level_y=6, maximum_level_u=4, maximum_level_v=4)
Figure 14 The Bitplane Generation Process
Figure 15 Block Diagram of the MPEG-4 FGS Decoder
Figure 16 Block Diagram of the direct implementation of the FGS Encoder on the H.264 Encoder
Figure 17 Block diagrams of corresponding decoders for two cases of direct implementation of FGS on H.264: (a) Residual signal is calculated after the deblocking filter at the base layer; (b) Residual signal is calculated before the deblocking filter at the base layer
Figure 18 FGS Macroblock Structures for two different transform sizes: (a) the 8x8 DCT Transform used by MPEG-4; (b) the 4x4 Integer Transform used by H.264
Figure 19 Grouping Scheme for the Simple CBP Coding
Figure 20 Grouping Scheme for Hierarchical CBP Coding
Figure 21 Illustration of Hierarchical CBP Coding
Figure 22 Block Diagram of the Main Hierarchical CBP Coding Algorithm
Figure 23 Block Diagram of the groupcbp Procedure
Figure 24 Block Diagram of the blockcbp Procedure
Figure 25 Example of Hierarchical CBP Coding, Group Structure of the First Bitplane
Figure 26 Example of Hierarchical CBP Coding, Group Structure of the Second Bitplane
Figure 27 Data recovery using reversible codewords
Figure 28 (RUN,EOP) Statistics for the BUS Sequence at 500 Kbps
Figure 29 Switching between streams using SP-Frames
Figure 30 Structure of the proposed hybrid system
Figure 31 Performance of the Proposed Approach Compared with two other approaches: (i) Scalable Video Coding using FGS; (ii) Stream Switching using SP Frames
Figure 32 R-D Performance Comparison of the Proposed and Stream Switching Approaches
Figure 33 R-D Performance Comparison of the Proposed FGS Approach
Figure 34 R-D Performance of the Adaptive Rate Selection Algorithm
Figure 35 (RUN,EOP) Statistics for the BUS Sequence at 500 Kbps

Acknowledgements

I would like to thank my research supervisors Dr. Panos Nasiopoulos and Dr.
Rabab Ward for the valuable guidance and support they provided throughout my research. I am grateful to them for the fruitful discussions we had about research and life in general, which will guide me through the rest of my life.

I would also like to thank my dear friends. Without them, I would not have enjoyed my days in Canada. I especially want to thank my friends Emre, Doruk, Caglar, Ari, Juan and my friends at Koerner's for being the reason for my relaxed, happy and stress-free times in Vancouver. I want to thank Sebnem for her dear support during the last stages of this thesis.

I want to thank my colleagues at the Image Processing Lab. All of you have brilliant skills, and I hope the future will be bright for you. I want to thank Dr. Mehran Azimi for his support and kindness as the administrator of our lab during my studies.

Finally, I would like to thank my beloved family, my mother Ayfer, my father Avni, and my sister Burcu for their continuous support of my graduate studies. I spent the last couple of years, the years my sister needed me the most, away from home pursuing a graduate degree. I wish I had been able to see her growing up, but I hope the time spent abroad will be worth it in the future. I dedicate this thesis to my beloved sister, Burcu.

CHAPTER 1

1 Introduction

1.1 Thesis Objective

In networks used for video transmission, such as wireless networks and the Internet, the available bandwidth is not constant but varies over time. This variation in the available bandwidth poses a problem for a video transmission system. Traditional video coding standards, whose objective is to optimize the quality of the video at a given bitrate, cannot cope with this bandwidth variation problem effectively. Scalable video coding techniques have been developed to address this bandwidth variation problem more efficiently.
Scalable Video Coding (SVC) is a video coding framework that enables a system to adapt the quality of the video sequence to the underlying channel's available bandwidth. Unlike traditional video coding standards, the objective of scalable video coding is to optimize the video quality over a bitrate range instead of at a given bitrate, as the bandwidth available to each user can change over time according to the characteristics of each channel. All popular video coding standards, such as MPEG-2 and MPEG-4, include some scalability tools. The latest video coding standard, H.264, provides superior compression efficiency over all previous standards, but it does not include tools for coding the video in a scalable fashion.

In this work, we introduce scalability to H.264 so that it can be used more efficiently in network environments with time-varying bandwidth. We use the latest scalable video coding standard, Fine Granular Scalability (FGS), originally developed for MPEG-4, and adapt it to H.264. We chose FGS to introduce scalability to H.264 because FGS has low implementation complexity and is highly flexible. The research is based on already established industry standards that are proven to be superior to other methods. We modify the techniques present in FGS and introduce novel techniques, so that the proposed scalable H.264 solution has low complexity, high coding efficiency and high error resiliency.

We also introduce a hybrid method that combines the FGS approach with the stream-switching approach employed in the H.264 standard. By combining different techniques, our proposed system offers a complete solution for all kinds of applications. The proposed system outperforms existing systems by offering optimum bandwidth utilization and improved video quality for the end user.
1.2 Thesis Outline

In the remaining part of this chapter, we first present the necessary background information on the fundamentals of video coding, scalable video coding and the different types of scalable video coding techniques (Section 1.3). Following that, we present an overview of the MPEG-4 FGS scalable video coding standard in Section 1.4. An overview of the H.264 video coding standard is presented in Section 1.5.

In Chapter 2, we first discuss a trivial extension of FGS to H.264 and then present our proposed novel H.264 based FGS structure. In Chapter 3, we present our proposed approach that further combines the highly efficient features present in H.264 with the flexible structure of FGS. This approach combines the stream-switching structure of H.264 with our H.264 based FGS structure to provide an overall highly efficient and flexible system. Chapter 4 presents the conclusions of the research, together with suggestions for future work.

1.3 Introduction to Video Coding

Digital video applications have grown tremendously in the past few years. Such applications include DVD-video (digital versatile disc), digital cable and direct broadcast systems (DBS), videophone and videoconferencing. In addition, recent advances have made Multimedia Messaging Service (MMS) over wireless networks and high quality video streaming [1] over the Internet possible. The main driving force behind all these applications is the advance in efficient representation of digital video data using modern video coding methods. Video coding is used wherever digital video communication, processing, acquisition and reproduction occur.

The need for video coding is clear when one considers the amount of storage space or transmission bandwidth required for raw (uncompressed) video data.
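This requirement can be made concrete with a quick calculation, sketched below with DVD/PAL-style parameters (720x480 pixels, 25 frames per second, three 8-bit color components):

```python
# Raw bitrate of an uncompressed standard-definition sequence.
width, height = 720, 480          # pixels (a common DVD resolution)
frames_per_second = 25            # PAL/SECAM frame rate
bits_per_pixel = 3 * 8            # three color components, 8 bits each

raw_bitrate = width * height * bits_per_pixel * frames_per_second
print(raw_bitrate / 1e6)  # 207.36 Mbit/s, i.e. "over 200 Mbits/s"
```

At roughly 207 Mbit/s, even a few minutes of raw video dwarfs the capacity of optical media, which is why compression ratios in the hundreds are needed.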
Consider a video program having a resolution of 720x480 pixels (a common resolution used on DVDs), to be played at 25 frames per second (the standard in PAL/SECAM). The bitrate of this video, with three color components at 8 bits per pixel, will be over 200 Mbits/s. To store this video on current DVD discs, compression by a factor of at least 200 is required. A similar requirement holds when a digital video transmission scenario is considered. In summary, it is clear that efficient video coding is needed for feasible video transmission and storage. This need was recognized by international standards organizations, resulting in several standards for digital video coding, such as ISO/IEC MPEG-2 [2] and ITU-T H.263 [4]. In the next subsection, common techniques used in video coding standards are presented.

1.3.1 Fundamentals of Video Coding

Video coding can be viewed as the coding of a sequence of images; in other words, image coding with a temporal component. Therefore, techniques used for image coding can be applied to video coding as well. Image coding techniques essentially exploit the statistical redundancy in the spatial domain to achieve high compression ratios. Spatial redundancy exists in images because of the high correlation between the brightness and color of a given pixel and those of the nearby pixels within the same picture. Techniques that exploit spatial redundancies are often referred to as intra-coding methods. The most popular image coding standards are transform-based [6]. In transform coding, the raw image is divided into blocks and a transform is applied to each image block to compact the signal energy into a smaller number of coefficients. The coefficients of the transformed blocks are quantized and the quantized values are entropy coded to form the image bitstream.

In addition to the spatial redundancy in each picture, for a typical video sequence there also exists temporal redundancy between consecutive pictures. This is due to the fact that pictures are sampled at very short time intervals (such as 40 ms for a 25 frames-per-second sequence) and the picture content usually changes only slightly in this small amount of time. Exploiting temporal redundancy is referred to as inter-frame coding in video coding terminology.

To remove the temporal redundancies in a video sequence, all popular video coding standards [2, 4] use Motion Compensated Prediction (MCP). MCP-based coders allow information about motion between frames to be transmitted as side information in the output video bitstream. Generally, MCP consists of two stages. The first stage estimates
In addition to spatial redundancy in each picture for a typical video sequence, there also exists a temporal redundancy between consecutive pictures. This is due to the fact that pictures are sampled in very short time intervals (such as 40 ms. for a 25 frame-persecond sequence) and the picture content usually changes slightly in this small amount of time. Exploiting temporal redundancy is referred to as inter-frame-coding in video coding terminology. To remove the temporal redundancies in a video sequence, all popular video coding standards [2, 4] use Motion Compensated Prediction (MCP). MCP-based coders allow information about motion between frames to be transmitted as side information in the output video bitstream. Generally, MCP consists of two stages. The first stage estimates 4  the motion between the current encoded frame and a reference frame where reference is one of the previously reconstructed frames. This first stage is generally referred to as motion estimation (ME). The second stage creates a prediction for the current frame using the estimated motion parameters and the previous reconstructed frames. This stage is referred as motion compensation (MC). Video coders that use both intra and inter-frame coding to achieve high compression ratios are called hybrid video coders and form the basis of all popular video coding standards. Block diagram of a basic hybrid-video coder is illustrated in Figure 1. This hybrid video encoder uses MCP to remove the temporal redundancies and transform coding to remove the spatial redundancies. After the redundancies are removed, the resulting signal is quantized, and then entropy coded to obtain the output video bitstream. The details of the coding process are explained below. 
Figure 1 Hybrid video coder block diagram

1.3.1.1 Overview of Motion Compensated Prediction (MCP)

MCP is the essential technique used in video coders to remove the temporal redundancy present between the frames of a video sequence. In the output video bitstream, motion information between frames is transmitted as side information. MCP can be analyzed in two stages: the motion estimation stage and the motion compensation stage.

The role of motion estimation is to find the best prediction for the current frame from a reference frame, using a specified motion model. Several motion models have been presented in the literature, such as pixel-recursive [9] and variable size block matching [11], but translational block-matching motion estimation is the most widely adopted technique due to its simplicity and good performance. In this model, the current frame to be coded is divided into blocks, and for each block the best matching block is searched for in the reference frame. The spatial position of the best matching block is used to calculate the motion vector for the current block. This motion estimation process is illustrated in Figure 2.

The motion compensation stage forms the prediction for the current frame using the reference frame and the obtained motion vector information. The difference between the obtained prediction and the current frame is called the prediction error. This error is due to two assumptions implied in the motion compensation model: first, that all the pixels within a block undergo the same motion; second, that the block's motion is translational. This prediction error is then coded using transform-based spatial coding methods.
The coded prediction error and the motion information together form the output video bitstream.

Figure 2 Illustration of Block Matching Motion Estimation

There are three types of frames, classified according to which reference frames they use in the motion estimation stage: intra coded (I) frames, predictive coded (P) frames and bidirectionally predictive coded (B) frames. For I frames, MCP is not performed and the whole frame is intra-coded. The first frame of a video sequence has to be coded as an I frame, as there are no reference frames available at the start of coding and hence MCP cannot be performed. If I frames are placed periodically in the bitstream, the decoder has the capability of random access into the video sequence. Thus, fast forward of the video sequence can be achieved by decoding and displaying only the I frames. Random access is also very important in digital TV broadcast, as viewers may change from one video program to another at any time [7]. Because I frames do not exploit temporal redundancy, their coding efficiency is low.

MCP is used to code the P and B frames. P frames are coded using prediction from the last I or P frame, whichever happens to be closer. This kind of prediction is called forward prediction, as the reference frame occurs temporally before the current frame. The coding efficiency of P frames is significantly higher than that of I frames, due to the MCP process involved. Besides forward prediction, B frames also use backward prediction, where the reference frame occurs temporally after the current frame. Higher coding efficiency is achieved by using both past and future frames as references. Note that a B frame is not used for predicting any other frame. This makes it more tolerant to errors, as any error in its encoding will not propagate to other frames through the prediction process.
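The translational block-matching search described above can be sketched as a small full-search routine. This is an illustration only: the block size, search range and the SAD cost are common choices, not values fixed by any standard, and real encoders use much faster search strategies.

```python
# Full-search block-matching motion estimation on plain 2-D luma arrays.

def sad(ref, cur, rx, ry, cx, cy, n):
    """Sum of absolute differences between the n x n block of the
    current frame at (cy, cx) and the reference frame at (ry, rx)."""
    return sum(abs(cur[cy + i][cx + j] - ref[ry + i][rx + j])
               for i in range(n) for j in range(n))

def motion_vector(ref, cur, cx, cy, n=4, search=2):
    """Return the (dx, dy) that minimizes SAD within +/- search pixels."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = cy + dy, cx + dx
            if 0 <= ry and ry + n <= h and 0 <= rx and rx + n <= w:
                cost = sad(ref, cur, rx, ry, cx, cy, n)
                if best is None or cost < best[0]:
                    best = (cost, dx, dy)
    return best[1], best[2]
```

For a current frame that is simply the reference shifted one pixel to the right, the search recovers the motion vector (-1, 0) for an interior block: the best match for a block lies one pixel to its left in the reference.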
Furthermore, B frames can be coded at a lower quality than that of the reference pictures, resulting in further bit savings [8]. Because a B frame uses a reference that may be temporally subsequent, that reference frame must be coded and made available prior to coding the B frame. Therefore, the display order and the coding order of frames are different. Figure 3 illustrates this difference in a typical coded video bitstream.

Figure 3 Coding and display order of Frames in a typical video bitstream

1.3.1.2 Transform Coding

After the motion compensated prediction (MCP) process is completed and a prediction is formed for the current frame, this prediction is subtracted from the current original frame to form the residual signal. The temporal redundancy is reduced at the MCP stage, but there is still spatial redundancy present in the residual signal. The most widely used method to exploit this spatial redundancy is transform coding, in which a transform is applied to the residual signal to decorrelate it and compact its energy into a smaller number of coefficients. After the signal is decorrelated, the resulting coefficients are entropy coded.

The transform that gives the best energy compaction is the Karhunen-Loeve transform (KLT) [10]. The rows of the KLT consist of the eigenvectors of the autocorrelation matrix of the input signal. The autocorrelation matrix for a random process X is a matrix whose (i,j)-th element is given by

  [R]_{ij} = E[ X_n X_{n+|i-j|} ]

It can be shown that this transform minimizes the geometric mean of the variances of the transform coefficients [7]. However, the transform is data dependent and must be recomputed for every input signal if the input signal is non-stationary. This makes the KLT impractical for video coding.

The Discrete Cosine Transform (DCT) is the most widely adopted transform in image and video coding standards. The DCT is a good approximation to the KLT and is data independent. The DCT gets its name from the fact that the rows of the NxN transform matrix C are obtained as a function of cosines. In video coding, the DCT is applied to 8x8 blocks of data and the transform is given as:
The Discrete Cosine Transform (DCT) is the most widely adopted transform in image and video coding standards. DCT is a suitable approximation to KLT and is data independent. DCT gets its name from the fact that rows of the NxN transform matrix C are obtained as a function of cosines. In video coding, DCT is applied to an 8x8 block data and the transform is given as:  9  1 (2j + \)i7T . . . . — cos— — i = 0, / = 0,..., N - 1 7Y 2N r  A  [ch =  2_ IN  (2y + \)in 2N  N-l,j  =  = 0,l,...N -1  For Markov sources with high correlation coefficient, the compaction ability of DCT is very close to that of KLT [7]. As video and image can be modeled as a highly correlated Markov sources, DCT is chosen to be part of the many video and image coding standards. Because DCT is defined in terms of floating-point values, its implementation on digital processors is not efficient. Also, the floating-point nature of DCT introduces a mismatch between the decoded data in the encoder and the decoder. This error causes degradation in the quality of the decoded video. Because of these drawbacks, H.264 standard replaced the popular DCT with a low complexity 4x4 transform specified with integer arithmetic. The transform matrix H is designed as:  H  1 1 -1 -2  1 -1 -1 2  1 -2 1 -1  It should be noted that, the rows of this transform is orthogonal, but do not have the same norm. This difference in norm is compensated in the quantization stage. The implementation of this transform on digital processors is very efficient as computing its direct and inverse transform could be carried with only additions and shifts, no multiplications [13]. Also, it is observed that the smaller block size used decreases some artifacts known as ringing and that occur at low bitrates [15].  10  1.3.1.3 Quantization and Entropy Coding The quantization stage of the video coder creates a lossy representation of the input. 
The quantization process divides the transform coefficients by a quantization parameter and then rounds them to the nearest integer. The quantization parameter determines the quality loss and the amount of bit savings. High values of the quantization parameter result in more loss of information and a decrease in video quality, but achieve a higher compression ratio. Smaller values result in less information loss, which in turn increases the output video quality, but at the expense of a smaller compression ratio.

The quantized transform coefficients are assembled into a one-dimensional array using a zigzag scan pattern, as illustrated in Figure 4. The first coefficient placed in the one-dimensional array is the DC coefficient of the block. The DC coefficient is followed by the AC coefficients, ordered roughly from low frequency to high frequency. The assembled one-dimensional array is coded using "run-level" coding: the number of consecutive zeros before a nonzero DCT coefficient is called a "run", and the absolute value of the nonzero DCT coefficient is called a "level".

Entropy coding is the last stage of the video coding process. In this stage, the "run-level" symbols are coded in a lossless fashion along with the motion vectors and side information. During entropy coding, the input symbols are mapped to binary variable-length codewords. Symbols that occur more frequently are represented with fewer bits, whereas more bits are used for symbols that occur less often. Different entropy coding methods differ in how the codewords are generated; the most common techniques used in video compression are Huffman coding and arithmetic coding.
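The transform, quantize, zigzag and run-level steps of Sections 1.3.1.2 and 1.3.1.3 can be sketched together for a 4x4 block. The butterfly below implements the H.264-style integer transform with only additions, subtractions and a shift; the plain divide-and-round quantizer and the step size are illustrative only (the real H.264 quantizer also folds in the unequal row norms of H), and the signed value stands in for the (level, sign) pair that standards code separately.

```python
# Standard 4x4 zigzag scan order as (row, col) pairs.
ZIGZAG_4x4 = [(0,0),(0,1),(1,0),(2,0),(1,1),(0,2),(0,3),(1,2),
              (2,1),(3,0),(3,1),(2,2),(1,3),(2,3),(3,2),(3,3)]

def transform_row(a, b, c, d):
    # Butterfly form of H = [[1,1,1,1],[2,1,-1,-2],[1,-1,-1,1],[1,-2,2,-1]]:
    # only additions, subtractions and one shift (<< 1 is the factor of 2).
    s0, s1, d0, d1 = a + d, b + c, a - d, b - c
    return (s0 + s1, (d0 << 1) + d1, s0 - s1, d0 - (d1 << 1))

def forward_transform(block):
    """Y = H X H^T for a 4x4 block: transform rows, then columns."""
    rows = [transform_row(*r) for r in block]
    out_cols = [transform_row(*c) for c in zip(*rows)]
    return [list(r) for r in zip(*out_cols)]

def quantize(block, step):
    """Plain divide-and-round quantizer (illustrative, not H.264's)."""
    return [[round(v / step) for v in row] for row in block]

def run_level(block):
    """Zigzag-scan a quantized block and emit (run, level) pairs."""
    scan = [block[y][x] for (y, x) in ZIGZAG_4x4]
    pairs, run = [], 0
    for v in scan:
        if v:
            pairs.append((run, v))
            run = 0
        else:
            run += 1
    return pairs
```

For a flat (all-ones) block, only the DC coefficient survives, so the run-level stage reduces the whole block to a single (0, level) symbol.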
Figure 4 Zigzag Scan Order for DCT Coefficients

1.3.2 Overview of Scalable Video Coding

With the emergence of broadband wireless networks, wireless video transmission has been receiving great attention. At the same time, streaming of audiovisual content over the Internet is emerging as an important application. The primary challenge of transmitting video over wireless media and the Internet is the random fluctuation in the bandwidth available to each user [1]. In order to deliver the best visual quality to each user, video coding technologies need to deal with the problems created by bandwidth variations. Scalable Video Coding (SVC) is a video coding framework that aims to cope with the bandwidth variation problem. It enables the streaming system to adapt the quality of the video sequence to the underlying channel's available bandwidth.

Figure 5 Architecture of a Streaming Video System

A typical system configuration for the next generation of networked video applications is illustrated in Figure 5. In this configuration, video encoding takes place before the data are transmitted to the streaming server. For this reason, at encoding time, the bandwidth available for the video sequence to be streamed is not known. Also, the bandwidth available to each user can change dynamically according to the characteristics of each channel. As a result, the video encoder cannot know the bitrate at which the video quality should be optimized. Because of this uncertainty in the streaming bitrate, the objective of video coding for networked video is to optimize the video quality over a bitrate range instead of at a given bitrate [2].

Previous video coding standards (such as MPEG-2) include several layered scalable techniques.
In layered scalable coding techniques, a video sequence is coded into a base layer and an enhancement layer. If the decoder receives only the base layer, the video sequence is reconstructed with a minimal quality. If the enhancement layer is also received by the decoder, the reconstructed video quality is increased. For layered scalable coding techniques, the enhancement layer stream must be completely received by the decoder; otherwise the video quality is not enhanced. The three different techniques for layered scalable video coding are: signal-to-noise ratio (SNR) scalability, temporal scalability and spatial scalability. In the next three subsections, a brief overview of each of these techniques is presented.

Figure 6 Block Diagram of MPEG-2 SNR Scalable Decoder

1.3.2.1 Layered SNR Scalability

SNR scalability refers to the technique that codes the video sequence into two layers at the same frame rate and the same spatial resolution, but with different quantization levels. Figure 6 shows the two-layer SNR scalable decoder in the MPEG-2 video standard. The Variable Length Decoding block decodes the base layer bitstream. The decoded information includes the motion vectors and the quantized Discrete Cosine Transform (DCT) coefficients. The quantized DCT coefficients are reconstructed by inverse quantization. Similarly, the enhancement bitstream is decoded in the Variable Length Decoding block and the residual DCT coefficients are then reconstructed by inverse quantization. The reconstructed residual DCT coefficients are added to the base layer reconstructed DCT coefficients to obtain the higher accuracy DCT coefficients.
The inverse DCT is then applied to the higher accuracy DCT coefficients to obtain the image-domain difference frame. The motion compensated frame is added to the image-domain difference frame to form the decoded sequence.

The SNR scalable decoder is standardized in MPEG-2 and uses the enhancement layer residue information in the motion compensation loop. However, MPEG-2 does not standardize how the scalable encoder generates the base and enhancement layer streams. Depending on whether or not the encoder uses the enhancement layer information in the motion prediction, the coding efficiency of the base and enhancement layers may change. Two standard-compliant encoders are illustrated in Figures 7a and 7b. In these encoders, the motion compensated prediction is formed using the reconstructed picture held in the frame store memory. This prediction is then subtracted from the original video to form the prediction difference. The latter is DCT transformed and then quantized using a high quantization parameter (coarse quantization, low quality). The base layer bitstream is formed by variable length coding of the quantized DCT coefficients. In the feedback path of the encoder, the quantized coefficients are reconstructed using inverse quantization with the same high quantization parameter. The enhancement layer residue is formed by taking the difference between the original prediction error DCT coefficients and the base layer reconstructed DCT coefficients. The enhancement layer residue is quantized using a smaller quantization parameter (fine quantization, high quality) and variable length coded to produce the enhancement layer bitstream.
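The coarse-base, fine-residue idea behind this two-layer SNR structure can be sketched on a single coefficient. The step sizes below are illustrative values, not parameters from MPEG-2, and the motion compensation loop is omitted.

```python
# Two-layer SNR refinement of one transform coefficient: the base layer
# carries a coarsely quantized value, the enhancement layer carries the
# residue re-quantized with a finer step.

def quantize(v, step):
    return round(v / step)

def dequantize(q, step):
    return q * step

def encode_two_layers(coeff, coarse=16, fine=4):
    base_q = quantize(coeff, coarse)                  # coarse base layer
    residue = coeff - dequantize(base_q, coarse)      # what the base missed
    enh_q = quantize(residue, fine)                   # fine enhancement layer
    return base_q, enh_q

def decode(base_q, enh_q=None, coarse=16, fine=4):
    rec = dequantize(base_q, coarse)
    if enh_q is not None:           # enhancement layer was received
        rec += dequantize(enh_q, fine)
    return rec

base_q, enh_q = encode_two_layers(37)
# decode(base_q) -> 32 (base only); decode(base_q, enh_q) -> 36 (refined)
```

A decoder that receives only the base layer reconstructs a coarse value, while a decoder that also receives the enhancement layer gets a strictly more accurate one; this is exactly the quality step the text describes.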
Figure 7 Block Diagrams of two types of SNR Scalable Encoders (a) Enhancement layer residue is used in the motion prediction loop at the base layer (b) Enhancement layer residue is not used

The inverse quantized values produced in the encoding of the enhancement layer are then added to the inverse quantized values at the base layer in the feedback loop. The reconstructed frames are formed by applying the inverse DCT and are stored in the frame store memory. The reconstructed frames stored in the memory of the encoder are identical to the frames stored in the SNR scalable decoder's memory. However, for the case where a decoder does not receive the enhancement layer bitstream, the reconstructed frames at the decoder side and the encoder side will not be the same. This is because the decoder will only use the base layer information to form the reconstruction, whereas the encoder had used both the base and the enhancement layer information. The mismatch between the reconstructed encoder and decoder frames causes errors to accumulate in the decoded base-layer video sequence. This error is called drift. The drift problem decreases the coding efficiency of the base layer video.
On the other hand, the high quality reference frames used in the motion compensated prediction increase the coding efficiency when the decoder receives and decodes the enhancement layer as well. Hence, this SNR scalable encoder results in low coding efficiency for the base layer, but high coding efficiency for the enhancement layer. The encoder illustrated in Figure 7b only uses the base layer information to form the prediction. In this case, the drift problem at the base layer is removed. However, if the decoder receives and decodes the enhancement layer, a drift will still occur due to a similar mismatch between the reconstructed encoded and decoded frames. Therefore, for this SNR scalable encoder, the base layer coding efficiency is high, but the enhancement layer coding efficiency is low due to the drift problem. To summarize, for the layered SNR scalable decoder standardized in MPEG-2, there are two possibilities: either the base layer has a poor performance to ensure a good performance for the enhancement layer, or the enhancement layer has a poor performance to ensure a good performance for the base layer.

1.3.2.2 Temporal Scalability

In layered temporal scalability, video is coded into two layers at the same spatial resolution but at different frame rates. If the decoder receives and decodes only the base layer, the video sequence is displayed at a low frame rate. The enhancement layer fills in the missing frames and, upon decoding, the video can be displayed at a higher frame rate. Several techniques are used for temporal scalable coding [3]. Figure 8 shows a possible frame structure for temporal scalability. In this structure, the prediction at the base layer is formed only from the base layer. This ensures that the decoder will be able to correctly decode the sequence even if only the base layer is received. The enhancement layer provides the additional frames needed to decode the sequence at a higher frame rate.
The prediction at the enhancement layer can be formed using either the base layer or the enhancement layer itself.

Figure 8 Frame Structure used in Temporal Scalable Systems

1.3.2.3 Spatial Scalability

Spatial scalability refers to the technique where the video is coded into two layers at the same frame rate but with different spatial resolutions. The base layer is coded at a low resolution, whereas the enhancement layer is coded at a higher resolution. At the time of encoding, the up-sampled base layer picture can be used as a prediction for the enhancement layer. The MPEG-4 spatial scalable decoder allows a "bi-directional" prediction at the enhancement layer. Both the up-sampled picture from the base layer and the previously reconstructed frame from the enhancement layer can be used as prediction for the frames at the enhancement layer. Figure 9 shows the picture structure for this kind of scalability. The frames are coded as either P or B type at the enhancement layer. The frame at the enhancement layer that is temporally coincident with an I-frame at the base layer is encoded as a P-frame. The frame at the enhancement layer that is temporally coincident with a P-frame at the base layer is encoded as a B-frame. For the P-frames at the enhancement layer, the prediction is the up-sampled reconstructed frame from the temporally coincident I-frame at the base layer. The B-frames at the enhancement layer allow "bi-directional" prediction, using the up-sampled reconstructed frame from the base layer as the backward reference and the previously reconstructed frame in the enhancement layer as the forward reference. For the cases where the prediction from the base layer is selected, the motion vectors are not encoded, to reduce the amount of side information transmitted.

Figure 9 Frame Structure used in Spatial Scalability Systems

Figure 10 illustrates the block diagram of the discussed spatial scalable encoder.
The original video signal is downsampled to generate the low resolution video signal. The low resolution video signal is encoded separately and the resulting bitstream represents the base-layer information. The reconstructed low resolution frames are upsampled and made available as an additional prediction for the enhancement layer frames. The resulting prediction error of this combined prediction is encoded and forms the enhancement layer bitstream.

Figure 10 Block diagram of Spatial Scalable Encoder

1.4 Overview of MPEG-4 FGS Video Coding Standard

The system for delivering MPEG-4 FGS video is illustrated in Figure 11. This system consists of three components: the FGS encoder, the streaming server and the FGS decoder. The FGS encoder encodes the original video into two layers, a base layer and an enhancement layer. Because of the variation in the transmission bandwidth over time, the FGS encoder does not know at what bitrate the video is going to be transmitted. For this reason, the base layer is encoded at the minimum bitrate that is guaranteed by the transmitting channel, Rmin. The enhancement layer is encoded at the maximum bitrate that the transmission channel can deliver, Rmax. During transmission, the streaming server truncates the enhancement layer bitstream according to the available bandwidth.
The number of bits sent to the decoder depends on the available bandwidth at the time of transmission. Thus, an FGS decoder receives the base layer and the truncated enhancement layer bitstreams. The quality of the decoded video is proportionally related to the number of bits received by the decoder for the corresponding frame. To summarize, FGS uses three components to deliver the video to the end user:
1. Scalable Video Encoder: encodes the video in a scalable manner at the highest possible quality.
2. Streaming Server: delivers the scalable video to a given client. Maximum bandwidth utilization is achieved by truncating the video bitstream according to the available bandwidth.
3. Decoder: decodes a truncated video bitstream. The reconstructed video quality decreases according to the amount of truncation performed at the streaming server.
The FGS encoder and decoder are further described in the next subsections.

Figure 11 Structure of the end-to-end FGS video delivery system. The system comprises an Encoder, a Streaming Server and a Decoder

1.4.1 MPEG-4 FGS Encoder Structure

We illustrate the MPEG-4 FGS encoder in Figure 12. FGS produces an MPEG-4 non-scalable base layer encoded at a bit-rate Rbase and an enhancement layer encoded using bitplane coding with a maximum bit-rate of Rmax. To encode the enhancement layer, first the residual frame is formed by taking the difference of the original (high quality) and the reconstructed base layer (low quality) frames. The residual frame is then DCT transformed to remove the spatial redundancy.
The obtained DCT coefficients are bitplane and entropy coded to form the enhancement layer bitstream. The main steps of FGS enhancement layer coding can be summarized as:
1. Constructing the residual frame
2. DCT transforming the residual frame to decrease the spatial correlation
3. Bitplane encoding of the DCT coefficients
4. Entropy encoding of the bitplane encoded symbols

Figure 12 Block diagram of MPEG-4 FGS Encoder

1.4.1.1 Bitplane Coding of DCT Coefficients

The residual frames, found by subtracting the base layer frames from the enhancement layer frames, are coded by bitplane coding instead of conventional DCT coding. In conventional DCT coding, the quantized DCT coefficients are zigzag scanned, and then a symbol for every non-zero coefficient within the block (containing its value and information regarding the number of consecutive zeros before it) is found. The resulting symbols are mapped to binary codewords using a VLC table. In bitplane coding, every DCT coefficient is treated as a binary number of several bits instead of a decimal integer of a certain value [21, 22]. For each block in the residual frame, the absolute values of its coefficients are scanned in the zigzag order as shown in Figure 4 and then assembled into a one dimensional array as shown on the left hand side of Figure 14. In Figure 14, we assume the absolute value of any coefficient is between 0 and 31, meaning 5 bitplanes are needed to represent all the coefficients correctly. The maximum number of bitplanes needed for each frame is found before bitplane coding, at the "Find Maximum" stage. It should be noted that the number of bitplanes needed to code the luminance and chrominance components of the frame may be different, as illustrated in Figure 13. Therefore, there are three syntax values, maximum_level_y, maximum_level_u and maximum_level_v, and they are coded in the frame header to indicate the maximum numbers of bitplanes for the Y, U and V components of the frame respectively (for the case illustrated in Figure 13, maximum_level_y is 6, maximum_level_u is 4 and maximum_level_v is 4).
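The "Find Maximum" stage above amounts to computing the bit length of the largest absolute coefficient in each colour component. The sketch below illustrates this with made-up coefficient lists; the values are chosen so the result matches the maximum_level_y=6, maximum_level_u=4, maximum_level_v=4 example in the text.

```python
# Sketch of the "Find Maximum" stage: the number of bitplanes needed to
# represent the largest absolute coefficient in each colour component.
# Coefficient values below are hypothetical.

def bitplanes_needed(coefficients):
    max_abs = max(abs(c) for c in coefficients)
    return max_abs.bit_length()   # e.g. 31 -> 5 bitplanes, 32 -> 6

y_coeffs = [40, -13, 7, 0, -2]    # luminance residual coefficients
u_coeffs = [9, -4, 1]             # chrominance (U)
v_coeffs = [-11, 3, 0]            # chrominance (V)

# These three values would be written into the frame header as
# maximum_level_y, maximum_level_u and maximum_level_v.
print(bitplanes_needed(y_coeffs),
      bitplanes_needed(u_coeffs),
      bitplanes_needed(v_coeffs))
```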
It should be noted that, the number of bitplanes needed to code the luminance and chrominance components of the frame may be different as illustrated in Figure 13. Therefore, there are three syntax values maximum_Ievel_y, maximum_level_u, maximum_level_v and they are coded in the frame header to indicate the maximum numbers of bit-planes for the Y-U-V components of the frame respectively (For the case illustrated in Figure 13, maximum_level_y is 6, in a x i in u m l e v e 1_ u is 4 and m a x i m u m l e velv is 4).  25  Figure 13 Maximum Number of Bitplanes Needed for Bitplane Coding maximum_level_y=6, maximum_level_y=4, maximum_level_v=4  When each entry in this array is written in binary form, a binary matrix results (see right hand side of Figure 14). A bit-plane of a block is defined as the one dimensional array of bits corresponding to a column of this binary matrix. The first bit-plane corresponds to the binary bits formed by the Most Significant Bit's (MSB)'s of the coefficients, whereas the MSB-1 form the second bit-plane and so on. Transform Coefficients (Decimal)  Tranform Coefficients (Binary)  16  1  |  0  15  o  |  14  0  j  i  19  1  |  1 |  1  0 |  0  •  0  0  CO c/^  S  |  j 0 | 0 | i I | l |  0  j j  0 I  1  0  1 |  1  •II j 0 j 0 |  0  — CO  r^j  S  ->  CO c/*>  c-> 02  c/i  5!  -^r CQ cy>  Figure 14. The Bitplane Generation Process  After all the bitplanes are formed for each 8x8 transform block of the frame, symbols are generated for each bitplane. For each 1 in the bitplane, a symbol is formed. Each symbol  26  has two components, RUN and EOP. RUN specifies the number of consecutive zeros before the 1 and EOP specifies whether there are any more 1 left on this bitplane. If a bitplane contains all zeros, a special symbol A L L Z E R O is formed to represent it. For example, consider a bitplane that consists of following elements: 10000011000000001000...0  There are four l ' s in the array, hence four symbols are generated. 
The first symbol refers to the first 1 in the array, which happens to be the first element of the array. There are no 0's preceding the first 1, so the RUN is 0 for the first symbol. Because there are more 1's after the first 1, EOP is 0. So the first symbol is generated as (0,0). There are five 0's between the first and the next 1, so the RUN for the second symbol is 5. EOP is still zero, as there are more 1's in the bitplane. So the second symbol is generated as (5,0). Similarly, the next symbol is found to be (0,0). The last symbol refers to the last 1 in the bitplane array. There are eight 0's preceding the last 1, so the RUN is found to be 8 for this symbol. As this is the last 1 in the bitplane array, EOP is 1. So the last symbol is found to be (8,1). To summarize, (0,0), (5,0), (0,0) and (8,1) are the symbols generated for this bitplane.

For the first and second bitplanes, it is very probable that most of the (8x8) blocks in a (16x16) macroblock will have ALL_ZERO bitplanes, that is, every entry in the bitplane is zero. Instead of coding the ALL_ZERO bitplanes separately, it is more efficient to group the ALL_ZERO bitplanes in the macroblock and code them together. This is only done for the first and second bitplanes. The FGS standard uses the Coded Block Pattern (CBP) for this purpose. The CBP is a variable length coded binary string, placed at the macroblock header of the bitstream, which specifies which blocks are ALL_ZERO within the macroblock. For details of the CBP coding specified in FGS, please refer to [1].

1.4.1.2 Entropy Coding

At the final stage, the (RUN, EOP) symbols of the enhancement layer are variable length coded (VLC). In VLC coding, the generated symbols are mapped into binary codewords according to the symbols' statistics. These binary codewords are stored at the encoder and constitute the VLC table for the enhancement layer. The same VLC table is also stored at the decoder, allowing identical reconstruction of the coded symbols.
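The (RUN, EOP) symbol generation walked through above can be sketched as a short scan over a bitplane array. This is an illustrative sketch of the rule as stated in the text (it covers the per-bitplane symbols and the ALL_ZERO case, not the CBP or VLC stages); the final print reproduces the worked example.

```python
# Sketch of (RUN, EOP) symbol generation for one bitplane, following the
# worked example in the text. An all-zero bitplane is represented by the
# special ALL_ZERO symbol instead.

def bitplane_symbols(bits):
    ones = [i for i, b in enumerate(bits) if b == 1]
    if not ones:
        return ["ALL_ZERO"]
    symbols = []
    prev_end = 0
    for k, pos in enumerate(ones):
        run = pos - prev_end                   # zeros since the previous 1
        eop = 1 if k == len(ones) - 1 else 0   # is this the last 1 on the plane?
        symbols.append((run, eop))
        prev_end = pos + 1
    return symbols

# The example bitplane from the text: 1 00000 1 1 00000000 1 000...0
plane = [1,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,1,0,0,0]
print(bitplane_symbols(plane))   # [(0, 0), (5, 0), (0, 0), (8, 1)]
```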
1.4.2 MPEG-4 FGS Decoder Structure

Figure 15 illustrates the MPEG-4 FGS decoder. The structure of the FGS decoder is similar to that of the FGS encoder. The FGS decoder consists of two layers, a base layer and an enhancement layer. The FGS decoder base layer is a standard MPEG-4 decoder that outputs the base layer video with the minimum quality. The FGS enhancement layer decoder is built on top of the base layer decoder to generate the enhancement video. The enhancement layer decoder operates on a truncated version of the enhancement layer bitstream. After the enhancement layer decoding process is done, its output is added to the output of the base layer decoding process to produce the high quality, enhancement video. The decoding steps of the enhancement layer for FGS decoding are presented below. The enhancement layer bitstream is first decoded with an entropy decoder. The outputs of the entropy decoder are (RUN, EOP) symbols and some syntax elements that will be used in bitplane decoding. The next step is the bitplane decoding step, where the DCT coefficients are reconstructed. Note that, unless all the enhancement layer information is transmitted to the decoder, the reconstructed DCT coefficients at the decoder side are not identical to the DCT coefficients encoded at the FGS encoder before transmission. The more information the decoder receives, the more accurate the reconstructed DCT coefficients are. The reconstructed coefficients are then inverse transformed and the result is added to the base layer video to obtain the enhancement video.
Figure 15 Block Diagram of the MPEG-4 FGS Decoder

1.5 Overview of H.264 Video Coding Standard

H.264 is the newest video coding standard and was developed jointly by the ITU-T Video Coding Experts Group and the ISO/IEC Moving Picture Experts Group. H.264 includes a number of advances in video coding technology, making it highly efficient in terms of coding and network friendliness. The design of the standard is based on the conventional block based motion compensated video coding concept described in the previous sections. However, the design also includes several new features that result in a 50% bit rate saving when compared with previous standards [18]. In this section, the main advancements offered by the H.264 video coding standard are presented; for further details please refer to [15].

1.5.1 Advances in Motion Compensated Prediction

H.264 is much more flexible in terms of motion compensation block sizes and can support a luminance motion compensation block size as small as 4x4. The use of a smaller block size in the motion compensation stage allows the encoder to describe complex motions more accurately, thus decreasing the prediction error. In addition, H.264 supports quarter-pel motion compensation, which further improves the coding efficiency of the video coding system. H.264 supports multiple reference pictures for motion compensation through the addition of new inter-prediction types. These features enable motion to be represented much more accurately than in previous standards. They also increase the coding efficiency of the system considerably. The H.264 standard also includes an in-loop deblocking filter to decrease the blocking artifacts and increase the coding efficiency of the video.
The blocking artifacts originate from both the motion compensated prediction and the residual coding stages of the process and are especially visible at low bitrates. Although the application of a deblocking filter after the video is decoded has been used with previous video coding standards, H.264 places such a filter inside the motion compensation loop. This improves the coder's ability to do inter-prediction, which in turn results in a better compression ratio.

1.5.2 Advances in Transform Coding

One of the most important features of H.264 is its use of a different transform. Unlike major video coding standards such as MPEG-2 and MPEG-4 that use an 8x8 DCT, H.264 uses a 4x4 Integer Transform. It was observed that the smaller block size decreases some of the artifacts associated with transform coding. Apart from the size, the low complexity nature of the 4x4 Integer Transform makes it very efficient to implement on hardware platforms such as ASICs or digital signal processors. Unlike the DCT, the Integer Transform was designed to allow an exact-match inverse transform. This eliminates the "drift" problem due to a slight mismatch between the encoder and decoder representations of the video.

1.5.3 Advances in Entropy Coding

H.264 includes two methods for entropy coding: the first is Context Adaptive Variable Length Coding (CAVLC) and the second is Context Adaptive Binary Arithmetic Coding (CABAC). Both coding methods use context based adaptivity to improve the performance of the encoder for all types of sequences. CAVLC has relatively low computational complexity and also includes Reversible Exp-Golomb codes to code some syntax elements. Reversible Exp-Golomb codes can be used to improve the error resilience of the system and are further described in Section 2.3.2.1. CABAC is more powerful than CAVLC and significantly improves the coding performance of the system, but at the cost of additional encoding/decoding complexity.
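The 4x4 Integer Transform mentioned above uses only small integer weights, so its forward core can be computed exactly with integer arithmetic. The sketch below applies the well-known H.264 forward core matrix as a plain matrix product; the per-coefficient normalisation factors (which in H.264 are folded into the quantization step) are omitted, and the input block is a hypothetical residual.

```python
# Sketch of the H.264 4x4 forward core transform, Y = C * X * C^T.
# Normalisation is folded into quantization in the real codec, so this
# core stays integer-exact.

C = [[1,  1,  1,  1],
     [2,  1, -1, -2],
     [1, -1, -1,  1],
     [1, -2,  2, -1]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transpose(m):
    return [list(row) for row in zip(*m)]

def forward_core_transform(block):
    return matmul(matmul(C, block), transpose(C))

# A hypothetical 4x4 residual block.
X = [[ 5, 11,  8, 10],
     [ 9,  8,  4, 12],
     [ 1, 10, 11,  4],
     [19,  6, 15,  7]]
Y = forward_core_transform(X)
print(Y[0][0])   # the DC term equals the sum of all 16 samples
```

Because every weight in C is 1 or 2, the multiplications can be realised with additions and shifts, which is one reason the transform maps so cheaply onto hardware.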
CHAPTER 2

2 H.264 Based Fine Granular Scalability (FGS)

2.1 Introduction

The latest video coding standard, H.264, provides superior compression efficiency to all previous standards, but it does not include tools for coding the video in a scalable fashion. We introduce scalability to the H.264 standard, so that it can be used more efficiently in network environments where the bandwidth varies over time. This chapter presents the details of our developed scalable H.264 structure. This structure is based on the latest scalable video coding standard, Fine Granular Scalability (FGS), which was originally developed for MPEG-4. The proposed structure is not a straightforward extension of FGS in which the FGS structure is implemented on H.264 without any modifications. The techniques present in FGS are modified and novel techniques are developed in order to achieve the best adaptation of FGS to H.264. By modifying the FGS structure, we achieve low complexity, high coding efficiency and high error resiliency for the overall system.

FGS is the latest scalable video coding standard that was developed within the MPEG committee and is included in the MPEG-4 Streaming Video Profile. FGS encodes the video into two different layers, the base layer and the enhancement layer. The enhancement layer is encoded using bitplane coding, and its fine granular scalable nature makes the FGS standard a very flexible coding tool for adapting to the dynamic bandwidth changes of the underlying network. The base layer of the FGS standard is encoded using traditional video coding technologies. The current MPEG-4 FGS standard uses MPEG-4 to encode the base layer. H.264, the latest video coding standard, developed by the Joint Video Team (JVT) of ITU-T and ISO, provides superior compression efficiency to MPEG-4. Because at a given bitrate H.264 is able to provide better video quality than previous video coding standards, it is predicted to be widely adopted.
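The fine granular nature of the bitplane-coded enhancement layer mentioned above can be illustrated numerically: decoding only the top k bitplanes of a coefficient yields a progressively refined approximation, so the enhancement bitstream can be cut at any bitplane boundary. This is an illustrative sketch of that refinement on a single hypothetical coefficient, not the FGS bitstream syntax.

```python
# Sketch of bitplane refinement: reconstruct a coefficient from only its
# `received_planes` most significant bitplanes (missing LSB planes are zeroed).

def truncate_to_bitplanes(value, total_planes, received_planes):
    dropped = total_planes - received_planes
    magnitude = (abs(value) >> dropped) << dropped   # zero the missing planes
    return -magnitude if value < 0 else magnitude

coeff = 22   # binary 10110 over 5 bitplanes
for k in range(6):
    print(k, truncate_to_bitplanes(coeff, 5, k))
```

Each additional received bitplane can only move the approximation closer to the true value, which is why the decoded quality grows with every extra bit the streaming server lets through.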
One possible application area for H.264 is video communications over best-effort networks, where the available bandwidth for video transmission varies with time. Although H.264 offers better efficiency than MPEG-4 in terms of compression ratio, it lacks tools that make it scalable for use at different bitrates. One possible way of introducing scalability to H.264 is to directly apply the FGS process as is done in MPEG-4. Thus the FGS base layer is encoded using H.264 instead of MPEG-4, while the same process as in the MPEG-4 FGS enhancement layer is applied as is [12]. Such a straightforward extension of FGS is possible due to FGS' design, which allows the use of any video coding standard for encoding the base layer video. This, however, presents serious drawbacks because of the fundamentally different video coding tools used in H.264 and MPEG-4. Firstly, encoding the enhancement layer using FGS (as in MPEG-4 coding) introduces Discrete Cosine Transform (DCT) computations to the H.264 system, which uses the Integer Transform. This significantly increases the complexity of both the encoder and decoder (particularly the latter). Secondly, the resulting system would fail to encode the enhancement layer using the advanced techniques introduced by H.264, which have proved to significantly improve the picture quality, increase the error resilience and decrease the complexity of the overall system.

In this thesis, we overcome the aforementioned drawbacks by modifying the FGS video coding standard and by introducing new techniques. The developed tools increase the error resilience and decrease the encoding and decoding complexity of the scalable video coding system. In Section 2.2, we first present the trivial extension of FGS to H.264 and discuss its drawbacks. Following that discussion, our proposed H.264 based FGS structure is presented in Section 2.3. In Section 2.4, we present the experimental results. We summarize and conclude the chapter in Section 2.5.
2.2 Trivial Extension of FGS into H.264

Figure 16 illustrates the encoder that results from a straightforward implementation of FGS on H.264. The base layer is encoded using H.264 instead of MPEG-4. There are two different approaches to calculating the residual signal, resulting in two different ways of encoding and decoding. This separation is due to the in-loop deblocking filter present in the H.264 standard. The residue signal for the enhancement layer can be formed by taking the difference between the original signal and the reconstructed base layer signal right after the deblocking filter is applied to the base layer signal. Alternatively, the residue signal can be formed using the base layer signal prior to the filtering operation. In this case, an additional deblocking operation would be needed at the decoder side for the enhancement layer to reduce the blockiness of the decoded video, which in turn increases the complexity of the decoder. Figure 17 shows the decoders for both cases. As mentioned before, this direct implementation is not the most efficient solution for an H.264 based FGS encoder. The reason for this is explained in the following subsection.

Figure 16 Block Diagram of the direct implementation of FGS Encoder on H.264 Encoder
Figure 17 Block diagrams of the corresponding decoders for the two cases of direct implementation of FGS on H.264 (a) Residual signal is calculated after the deblocking filter at the base layer (b) Residual signal is calculated before the deblocking filter at the base layer

2.2.1 Drawbacks of the Trivial Extension of FGS to H.264

MPEG-4 FGS employs the DCT for transform coding at both the base and the enhancement layers. On the other hand, the H.264 video coding standard replaces the DCT with a low complexity 4x4 Integer Transform. The encoder used for the trivial extension of FGS, depicted in Figure 16, uses the 4x4 Integer Transform at the base layer and the DCT at the enhancement layer. Using two different transforms introduces additional complexity to the entire system (both the encoder and the decoder). Also, by using the DCT at the enhancement layer, the system cannot make use of the superior features of the 4x4 Integer Transform, such as its low implementation complexity and increased subjective quality [13]. The FGS video coding standard uses four different VLC tables at the entropy coding stage. In contrast, H.264 employs Reversible Exp-Golomb codewords, for which no VLC table needs to be stored. Reversible Exp-Golomb codewords also increase the error resilience of the system and can be implemented very efficiently on digital processors [23]. In the trivial extension of FGS over H.264, the enhancement layer cannot take advantage of these features of Exp-Golomb coding. Additional complexity is also introduced to the system by its need to store four more VLC tables. In summary, the trivial extension of FGS introduces new computation blocks that increase the complexity of both the encoder and decoder (particularly the latter). Secondly, the system fails to encode the enhancement layer using the advanced techniques introduced by H.264.
Secondly, the system fails to encode the enhancement layer using the advanced techniques introduced by H.264.  37  2.3 Proposed  H.264 based FGS  System  In this section, the proposed H.264 based FGS system is presented. Our proposed system mainly modifies Transform Coding and Entropy Coding structures of the FGS standard. The technical details of these modifications are presented in the following subsections. 2.3.1 Proposed Transform Coding Structure In our proposed encoder, the DCT transform at the enhancement layer is replaced by the H.264 4x4 low complexity Integer Transform. Consequently, the original FGS macroblock structure has to be changed since the size of the transform has changed. Figure 18 compares the macroblock structures for the 8x8 DCT and 4x4 Integer Transforms. For the case of the 8x8 transform, one macroblock contains 4 blocks of luminance and 2 blocks of chrominance that is a total of 6 transform blocks. On the other hand, the smaller 4x4 transform results in 16 blocks of luminance and 8 blocks of chrominance, for a total of 24 transform blocks for each macroblock. This increased number of transform blocks increases the number of bits needed to code the Coded Block Pattern (CBP) for each macroblock at the macroblock header, and as a result decreases the coding efficiency. CBP is a variable length coded binary string, placed at the macroblock header of the bitstream. For details of CBP coding specified in FGS, please refer to [1].  38  Y  Y  U  U  u  u  V V  V  16x16 Macroblock  Figure 18. F G S Macroblock Structures for two different transform sizes Left: The 8x8 D C T Transform used by M P E G - 4 Right: 4x4 Integer Transform used by H.264  2.3.1.1 Proposed CBP Coding Scheme The reason behind the increased overhead in CBP is that for the 4x4 Integer Transform, one macroblock contains 24 blocks instead of 6, and thus, each CBP code needs to provide information for more blocks. The increased overhead can be analyzed by considering Figure 19. 
In Figure 19, the 24 (4x4) blocks within a macroblock are grouped into 4. Each (8x8) group contains four blocks of luminance and two blocks of chrominance. The structure of the resulting 8x8 groups is the same as that of the MPEG-4 FGS macroblock, as shown in Figure 19. Hence, the same algorithm as MPEG-4 FGS CBP coding can be used to code the CBP for each of the new groups. This approach results in using 4 CBP codewords for each macroblock, so the number of bits spent on the CBP is approximately quadrupled. In order to reduce this overhead, we propose a hierarchical scheme to code the CBP. The main idea behind this scheme is to group the transform blocks into larger groups and to code the CBP in steps. The proposed hierarchical CBP coding scheme is presented in the next subsection. The experimental results of the proposed scheme are presented in Section 2.4.

Figure 19 Grouping Scheme for the Simple CBP Coding

2.3.1.1.1 Hierarchical CBP Coding Scheme

In the proposed CBP coding scheme, the 4x4 blocks are grouped into groups of four as illustrated in Figure 20. This scheme arranges the blocks into 6 groups, each containing four transform blocks of either luminance or chrominance. The proposed CBP coding scheme refers to blocks in a hierarchical fashion (see Figure 21). There are two steps in the proposed CBP coding scheme for the first bitplane. At the first step, each group within the macroblock is checked to determine whether all the 4x4 blocks belonging to the group are ALL_ZERO or not. If all the blocks within the group are ALL_ZERO, then the group is classified as an ALL_ZERO group; otherwise, the group is classified as non-zero.
14i  15  Group 3  ;  I  ;  17.  iu  i g ;'  19  Group 4  i  IV ! V • 20; ;  21  ! V !! V I  22 ; ;  23  Group 5  Figure 20. Grouping Scheme for Hierarchical C B P Coding  At the second step of the proposed CBP coding scheme, non-zero groups are considered only. The reason for this is, A L L Z E R O and non-zero blocks can co-exist in a non-zero group, whereas only A L L Z E R O groups exist in an A L L Z E R O group. Thus, if a group is classified as A L L Z E R O at Step-1, no further information is required for the blocks within that group. For example, in Figure 21, the groups numbered 0,2,3 and 4 are found to be A L L Z E R O (shown shaded in Step-1). The blocks belonging to those groups are not coded at Step-2 of CBP coding. At Step-2, only blocks belonging to non-zero groups are coded (groups 1 and 5).  41  'mmmm YJmmm ? '  m M  m m  m m  mm, E^E^/ Original Block Structure mtZj for the Macroblock  m m  m  m  ,  STEP 1 of CBP Coding Shaded Groups are Coded at Step-1 Unshaded are Coded at Step-2  /  m  /  m  m  /  m  /  /  m / m / /  /  m  m  STEP2ofCBPCoding  /  Figure 21 Illustration of Hierarchical C B P Coding  So, the two steps of our proposed CBP coding algorithm can be summarized as follows: 1. First step of CBP. At this step, the CBP specifies information about each group within the macroblock. (Step 1 at Figure 21). From now on, the procedure for specifying this information will be referred to as group_cbp. This procedure indicates if each  group  is A L L Z E R O or not.  42  2. Second step of CBP. At this step, the CBP specifies information about each block within a nonzero group. (Step 2 at Figure 21). This procedure is referred to as blockcbp. This procedure indicates if each block is A L L Z E R O or not. Figure 22 illustrates the proposed main algorithm used to create the CBP code for a macroblock. 
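The two-step classification described above can be sketched in Python as follows. The function name, data layout, and block numbering are illustrative assumptions (the numbering follows the grouping of Figure 20), not part of the normative proposal; the subsequent variable-length coding of the resulting strings is omitted.

```python
# Sketch of the two-step hierarchical CBP classification for one bitplane.
# Groups: 2x2 quads of luminance blocks 0-15, then the U and V block quads.
GROUPS = [[0, 1, 4, 5], [2, 3, 6, 7], [8, 9, 12, 13], [10, 11, 14, 15],
          [16, 17, 18, 19], [20, 21, 22, 23]]

def hierarchical_cbp(nonzero_blocks):
    """nonzero_blocks: set of 4x4-block indices that have a non-zero bit in
    this bitplane. Returns (CBP_CODE, {group: SUB_CBP_CODE}) as bit strings,
    where a '1' marks an ALL_ZERO group/block and a '0' a non-zero one."""
    # Step 1 (group_cbp): one bit per group.
    cbp_code = "".join(
        "0" if any(b in nonzero_blocks for b in g) else "1" for g in GROUPS)
    # Step 2 (block_cbp): one SUB_CBP_CODE for each non-zero group only.
    sub_codes = {i: "".join("0" if b in nonzero_blocks else "1" for b in g)
                 for i, g in enumerate(GROUPS) if cbp_code[i] == "0"}
    return cbp_code, sub_codes
```

For the luminance-only example of Section 2.3.1.1.2, where blocks 2, 7 and 10 are non-zero in the first bitplane, the first four bits of CBP_CODE come out as 1010, and the SUB_CBP_CODEs for groups 1 and 3 as 0110 and 0111, matching the worked example.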
Figure 22. Block Diagram of the Main Hierarchical CBP Coding Algorithm: at the first bitplane, group_cbp is invoked for Step-1; at the second bitplane, either group_cbp for Step-1 or block_cbp for all the 8x8 groups within the macroblock; at the third bitplane and above, block_cbp for all the 8x8 groups within the macroblock.

As mentioned before, and as can also be seen from Figure 22, group_cbp and block_cbp are the two procedures used to code the CBP. Based on the characteristics of the macroblock, either the group_cbp or the block_cbp procedure is used. Also, the CBP coding for the first bitplane, the second bitplane and the bitplanes above the second differs slightly. The details of these procedures for the different cases are explained in the following sections.

Step-1 of CBP Coding - group_cbp Procedure

The aim of this procedure is to specify which groups within the macroblock are ALL_ZERO. Figure 23 illustrates the algorithm for the group_cbp procedure. It should be noted that this procedure is not invoked for all the bitplanes of the macroblock. As can be seen from the main algorithm depicted in Figure 22, the cases where this procedure is invoked can be summarized as:

• For the first bitplane of every macroblock
• For the second bitplane of a macroblock, if the macroblock has an ALL_ZERO first bitplane

This procedure generates a binary string called CBP_CODE for the entire macroblock. CBP_CODE specifies which groups within the macroblock are ALL_ZERO: a binary 1 means that the corresponding group is ALL_ZERO, while a 0 represents a non-zero group. It should be noted that, if any block within a group is not ALL_ZERO, then that group is not an ALL_ZERO group. After the CBP_CODE is generated, it is variable length coded using the VLC tables presented in Appendix B. The VLC tables for CBP_CODE are based on Exp-Golomb codewords, which differ from the codewords present in the FGS standard.
The details of Exp-Golomb coding are explained in Section 2.3.2.1. The VLC tables are constructed based on the statistics of the CBP_CODE.

If a group is ALL_ZERO, then all the blocks within it are ALL_ZERO and no further information is needed in the CBP for those blocks. However, an ambiguity exists for non-zero groups, since the CBP_CODE does not specify which of the blocks belonging to a non-zero group are ALL_ZERO. In order to address these blocks, the block_cbp procedure is invoked.

Figure 23. Block Diagram of group_cbp Procedure: generate the CBP_CODE for the macroblock, put the VLC code for the CBP_CODE, then, for each 8x8 group in the MB, invoke block_cbp for that group.

Step-2 of CBP Coding - block_cbp Procedure

The aim of this procedure is to determine which blocks within a group are ALL_ZERO. The algorithm for this procedure is illustrated in Figure 24. It should be noted that not all the groups within a macroblock are coded at this step. As can be seen from the main algorithm and the group_cbp procedure, depicted in Figure 22 and Figure 23 respectively, this procedure is invoked in the following cases:

• For each non-zero group at the first and second bitplanes
• For all the groups of a macroblock at the second bitplane, if the entire macroblock has a non-zero first bitplane
• For all the groups at the third bitplane and above

This procedure first checks whether all the blocks within the group are ALL_ZERO at the lower bitplanes (i.e., if we are coding a group at the third bitplane, we first check whether this group has ALL_ZERO first and second bitplanes). If all the blocks within the group are ALL_ZERO at previous bitplanes, then a variable length binary string called SUB_CBP_CODE is generated. SUB_CBP_CODE specifies which blocks within the group are ALL_ZERO: a binary 1 means that the corresponding block is ALL_ZERO, while a 0 represents a non-zero block. For example, if only the first block in the group is ALL_ZERO, then the SUB_CBP_CODE would be 1000. After the SUB_CBP_CODE is generated, it is variable length coded using the VLC tables presented in Table 22 in Appendix C. The VLC tables are constructed based on the statistics of the SUB_CBP_CODE.

Figure 24. Block Diagram of block_cbp Procedure: if all the 4x4 blocks in the group are ALL_ZERO at previous bitplanes, generate the SUB_CBP_CODE and put its VLC code; otherwise, find the number (cnt) of blocks that are ALL_ZERO at previous bitplanes and place cnt bits specifying whether those blocks are ALL_ZERO.

If not all the blocks within the group are ALL_ZERO at lower bitplanes (i.e., the group contains a block that was non-zero at a lower bitplane), a different approach is taken. Blocks that are non-zero at previous bitplanes have a very low probability of being ALL_ZERO at the current bitplane; thus, they are excluded from the process. For the other blocks (those with ALL_ZERO previous bitplanes), one bit is used to specify whether they are ALL_ZERO at the current bitplane. Suppose that, in a given group, only two blocks were non-zero at lower bitplanes, and that the first of the two remaining blocks is ALL_ZERO at the current bitplane while the second is not. Then the code placed in the bitstream is 10.

2.3.1.1.2 Example of Hierarchical CBP Coding

The following example illustrates how the CBP is coded using the proposed Hierarchical CBP Coding method. It considers the CBP coding of a single macroblock for the first and second bitplanes. For simplicity, we only consider the coding of the luminance component (i.e., the macroblock under consideration does not contain any color components). The structure of the macroblock under consideration for the first bitplane is shown in Figure 25.
For this macroblock, the first bitplanes of blocks 2, 7 and 10 contain non-zero coefficients, whereas all the rest have ALL_ZERO first bitplanes. For the first bitplane, the group_cbp procedure is invoked for all the groups of the macroblock. Groups 0 and 2 are classified as ALL_ZERO groups, because all the blocks' bitplanes within these groups (i.e., blocks 0, 1, 4 and 5 for Group-0, and blocks 8, 9, 12 and 13 for Group-2) are ALL_ZERO. Groups 1 and 3 are classified as non-zero, due to the non-zero bitplanes these groups contain (the bitplanes of blocks 2 and 7 are non-zero for Group-1, and the bitplane of block 10 is non-zero for Group-3). So the CBP_CODE for the macroblock is found to be 1010. The first and third bits of CBP_CODE are 1, indicating Group-0 and Group-2 as ALL_ZERO; the second and fourth bits are 0, indicating Group-1 and Group-3 as non-zero. After this, the VLC code corresponding to the CBP_CODE is found using the tables in Appendix B and placed in the bitstream. For this case, the VLC code is found to be 0001101.

Figure 25. Example on Hierarchical CBP Coding, Group Structure of the First Bitplane. Blocks 2, 7 and 10 are NOT ALL_ZERO; all the other blocks are ALL_ZERO. Group 0 = blocks {0, 1, 4, 5}, Group 1 = {2, 3, 6, 7}, Group 2 = {8, 9, 12, 13}, Group 3 = {10, 11, 14, 15}.

At the next step, the block_cbp procedure is invoked for the non-zero groups, so Group-1 and Group-3 (having non-zero first bitplanes) are further coded using block_cbp. The block_cbp procedure first checks whether all the blocks within the group are ALL_ZERO at the lower bitplanes. As the current bitplane being coded is the first one, all the blocks within Group-1 and Group-3 are defined as ALL_ZERO at lower bitplanes.
For these groups, a binary string called SUB_CBP_CODE is generated and placed in the bitstream. SUB_CBP_CODE is similar to CBP_CODE, but it specifies which blocks, rather than which groups, are ALL_ZERO. In Group-1, Block-3 and Block-6 have ALL_ZERO bitplanes, so the SUB_CBP_CODE for Group-1 is 0110 (the second and third blocks within Group-1 have ALL_ZERO bitplanes). In Group-3, Block-11, Block-14 and Block-15 have ALL_ZERO bitplanes, so the SUB_CBP_CODE for Group-3 is 0111 (the second, third and fourth blocks within Group-3 have ALL_ZERO bitplanes). After the SUB_CBP_CODE is constructed for all the non-zero groups, their VLC codes are found using the tables presented in Appendix C.

|         | SUB_CBP_CODE | VLC Code |
| Group 1 | 0110         | 00110    |
| Group 3 | 0111         | 00111    |

This step concludes the CBP coding for the first bitplane of the macroblock. The code placed for the CBP of this macroblock at the first bitplane is 001100 00110 00111, for a total of 16 bits. It should be noted that, in general, the number of bits needed to code the CBP of a macroblock is lower than in this specific example; for a detailed analysis, please refer to the experimental results at the end of this section. Figure 26 illustrates the structure of the macroblock under consideration for the second bitplane.

Figure 26. Example on Hierarchical CBP Coding, Group Structure of the Second Bitplane. Blocks 0, 2, 3, 7, 10, 11, 14 and 15 are NOT ALL_ZERO; blocks 1, 4, 5, 6, 8, 9, 12 and 13 are ALL_ZERO.

It is first checked whether all the groups within the macroblock are ALL_ZERO at the first bitplane. For this example, Group-1 and Group-3 are non-zero at the first bitplane. Thus, the block_cbp procedure is invoked for each group within the macroblock. In the block_cbp procedure, it is first checked whether all the blocks within the group are ALL_ZERO at the first bitplane. For our example, all the blocks belonging to Group-0 and Group-2 are ALL_ZERO at the first bitplane, so for these groups the SUB_CBP_CODE binary string is generated. In Group-0, only Block-0 has a non-zero bitplane; the rest of the blocks have ALL_ZERO bitplanes, so the SUB_CBP_CODE is found to be 0111 (only the first block within the group is non-zero). In Group-2, all the blocks have ALL_ZERO bitplanes, so the SUB_CBP_CODE is found to be 1111. After the SUB_CBP_CODE is constructed for these groups, their VLC codes are found using the tables presented in Appendix C.

|         | SUB_CBP_CODE | VLC Code |
| Group 0 | 0111         | 00111    |
| Group 2 | 1111         | 1        |

Not all the blocks within Group-1 and Group-3 are ALL_ZERO at the first bitplane; therefore, SUB_CBP_CODE is not used for these groups, and the different approach is taken. First, each block within those groups is checked for an ALL_ZERO first bitplane. For the blocks having ALL_ZERO first bitplanes, one bit is used to specify whether they have ALL_ZERO second bitplanes. Thus, in this approach, the number of bits placed in the bitstream is equal to the number of blocks that were ALL_ZERO at previous bitplanes. In Group-1, Block-3 and Block-6 have ALL_ZERO first bitplanes; Block-3 has a non-zero second bitplane and Block-6 an ALL_ZERO second bitplane. So for this group, the binary string 01 is generated and placed in the bitstream (the first bit specifies that Block-3 is non-zero at the second bitplane, and the second bit that Block-6 is ALL_ZERO at the second bitplane). In Group-3, only Block-10 has a non-zero first bitplane, while Block-11, Block-14 and Block-15 have ALL_ZERO first bitplanes (three bits are used for Group-3). As seen from Figure 26, all three of these blocks are non-zero at the second bitplane.
So for this group, the binary string 000 is generated and placed in the bitstream. The total code placed for the CBP of this macroblock at the second bitplane is 00111 1 01 000, for a total of 11 bits.

2.3.1.2 Summary of Proposed Transform Coding Structure

In this section, we presented our novel structure that replaces the 8x8 DCT at the enhancement layer with the 4x4 Integer Transform. This replacement decreases the complexity of the entire system (of both the encoder and the decoder). Also, by using the Integer Transform at the enhancement layer, the system can make use of its superior features, such as its low implementation complexity and increased subjective quality. The consequence of using the Integer Transform instead of the DCT is a change to the original FGS macroblock structure, due to the different transform size. For the 8x8 transform, one macroblock contains 4 blocks of luminance and 2 blocks of chrominance, for a total of 6 transform blocks; the smaller 4x4 transform results in 16 blocks of luminance and 8 blocks of chrominance, for a total of 24 transform blocks per macroblock. This increased number of transform blocks increases the number of bits needed to code the Coded Block Pattern (CBP) for each macroblock at the macroblock header, and as a result decreases the coding efficiency. In order to reduce this overhead, we presented our novel scheme that codes the CBP more efficiently; its main idea is to group the transform blocks into larger groups and to code the CBP in steps.

2.3.2 Proposed Entropy Coding Structure

As mentioned earlier, the entropy coding technique used in FGS differs from that of H.264. FGS uses four different VLC tables to code the symbols resulting from bitplane coding.
H.264 uses Reversible Exp-Golomb codewords to code some of its syntax elements (Context-Adaptive VLC and Context-Adaptive Binary Arithmetic Coding are other entropy coding techniques supported by H.264, but they are not considered in this thesis). Using different entropy coding techniques in the enhancement and base layers of the H.264 FGS system increases the complexity of the system; it is therefore desirable to have the same structure in both layers. Reversible Exp-Golomb coding also has the following advantages over the FGS entropy coding technique:

1. The Exp-Golomb codewords standardized in H.264 can be implemented very efficiently on digital processors [23].

2. Exp-Golomb codewords increase the error resilience of the system due to their reversible nature [14].

These advantages are particularly important in wireless video communication environments, which are usually highly error-prone. Moreover, the size and cost limitations of the low-end processors embedded in mobile units severely limit the complexity of the algorithms that can be used; for these applications, the low complexity of the Exp-Golomb codes offers an additional advantage. Based on all these reasons, we replace the VLC technique present in FGS with its H.264 counterpart, based on Reversible Exp-Golomb codewords. In this section, we first present the details of the Exp-Golomb coding process adopted in H.264. Following that overview, we present the details of the proposed entropy coding scheme for our H.264 based FGS system.

2.3.2.1 Overview of Reversible Exp-Golomb Coding

The Exp-Golomb codeword table used in H.264 entropy coding has the regular structure:

1
0 1 x_0
0 0 1 x_1 x_0
0 0 0 1 x_2 x_1 x_0
0 0 0 0 1 x_3 x_2 x_1 x_0

where each x_i takes the value 0 or 1. Each codeword is referred to by its length in bits, L = 2n + 1, and its INFO = x_{n-1} ... x_1 x_0. The codewords are numbered from 0 upwards.
When the number of bits L and the INFO are known, the regular structure of the table makes it possible to create a codeword, which eliminates the need to store a VLC table for the codewords. The first 10 codewords and their corresponding code numbers are presented in Table 1; Exp-Golomb codes for a larger sample are given in Appendix A, Table 13.

| Code Number | Codeword |
| 0 | 1 |
| 1 | 010 |
| 2 | 011 |
| 3 | 00100 |
| 4 | 00101 |
| 5 | 00110 |
| 6 | 00111 |
| 7 | 0001000 |
| 8 | 0001001 |
| 9 | 0001010 |

Table 1. First ten Exp-Golomb Codewords

A decoder decodes a codeword by reading the (n+1)-bit prefix followed by the n bits of INFO. The (n+1)-bit prefix is a string of n zeros followed by a 1 (i.e., 0 0 0 0 1 for n = 4); the n bits following the prefix give the INFO. These codewords are characterized as reversible, which means that decoding them from the reverse direction is also possible. For non-reversible codewords, recovery of the data occurring after erroneous bits in the bitstream is not possible: the data up to the next resynchronization marker is lost, although it may contain no errors. Reversible codewords, however, make it possible to decode the error-free data occurring after the erroneous bits (see Figure 27). This increases the error resiliency of the system, a feature that becomes very important in error-prone transmission environments such as wireless networks and the Internet. Generally speaking, using reversible codewords is less efficient in terms of compression ratio, but shows better performance in the presence of losses.
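Because of this regular structure, a codeword can be built and parsed arithmetically, without a stored table. The following Python sketch (function names are illustrative; only forward-direction parsing is modeled, not the reverse-direction decoding enabled by the reversible variant) maps a code number to its codeword and back:

```python
def eg_encode(code_num):
    """Exp-Golomb codeword for a code number: n zeros, a 1, then the n INFO
    bits, so the length is L = 2n + 1 and code_num = 2**n - 1 + INFO."""
    n = (code_num + 1).bit_length() - 1          # number of leading zeros
    info = code_num + 1 - (1 << n)               # value of the INFO bits
    return "0" * n + "1" + (format(info, "0%db" % n) if n else "")

def eg_decode(bits, pos=0):
    """Parse one codeword starting at bit offset pos in a 0/1 string;
    return (code_num, position just past the codeword)."""
    n = bits.index("1", pos) - pos               # count the zero prefix
    info = int(bits[pos + n + 1 : pos + 2 * n + 1] or "0", 2)
    return (1 << n) - 1 + info, pos + 2 * n + 1
```

Parsing a bitstream is then a matter of repeatedly calling eg_decode and advancing the returned position; this matches the prefix-then-INFO reading described above.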
Figure 27. Data recovery using reversible codewords: data before the error is decoded in the forward direction, and the error-free data between the error and the next resynchronization marker is recovered by decoding in the reverse direction.

2.3.2.2 Proposed Entropy Coding Structure

The FGS standard uses four different VLC tables to code the (RUN,EOP) symbols resulting from bitplane coding (refer to Section 1.4.1.1 for the details of bitplane coding). This entropy coding structure is replaced by its H.264 counterpart, which uses Reversible Exp-Golomb codewords. In order to construct a VLC table to entropy code an information source, one needs to know the statistics of the symbols that the source generates. In our case, we want to code the (RUN,EOP) symbols resulting from the bitplane coding process. The FGS standard gives the statistics of these symbols based on DCT coding; as the DCT is replaced by the Integer Transform in our proposed system, the statistics of the (RUN,EOP) symbols have changed. Therefore, we have to collect the statistics of the symbols resulting from bitplane coding based on the Integer Transform. For this purpose, different sequences were encoded at different bitrates, and the statistics for the (RUN,EOP) symbols resulting from bitplane coding of the 4x4 Integer Transform coefficients were collected. The sequences and their characteristics are presented in Table 2.

| Sequence | Characteristic | Base Layer Bitrates | Size | Number of Frames |
| Tempete | Camera zooming out of a flower, no motion, medium texture detail | 500 Kbps, 1.5 Mbps | CIF (352x288) | 300 |
| Bus | Camera panning from left to right following a bus, medium-high motion and medium texture detail | 500 Kbps, 1.5 Mbps | CIF (352x288) | 300 |
| Mobile | Camera following a toy train, low motion, very high texture detail | 500 Kbps, 1.5 Mbps | CIF (352x288) | 300 |

Table 2. Sequences used to gather statistics for the (RUN,EOP) symbols

We present the statistics obtained from the Bus sequence coded at 500 Kbps in Figure 28.
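As a reminder of how these (RUN,EOP) symbols arise, the sketch below illustrates the bitplane coding convention summarized in Section 1.4.1.1, under the assumption (labeled as such) that RUN counts the zeros preceding each 1 in the zigzag-ordered bitplane of a 4x4 block and EOP flags the last 1 in that block; the function name and data layout are illustrative, not normative.

```python
def run_eop_symbols(bitplane_bits):
    """bitplane_bits: 16 zigzag-ordered 0/1 values for one 4x4 block at a
    given bitplane level. Returns the (RUN, EOP) pairs for that block:
    RUN = number of zeros preceding each 1; EOP = 1 only for the last 1.
    Assumption: this follows the MPEG-4 FGS bitplane-coding convention."""
    ones = [i for i, bit in enumerate(bitplane_bits) if bit]
    symbols, prev = [], -1
    for count, index in enumerate(ones, start=1):
        symbols.append((index - prev - 1, 1 if count == len(ones) else 0))
        prev = index
    return symbols
```

With only 16 coefficients per 4x4 block, RUN can never exceed 15, which is why the proposed tables need no ESCAPE code.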
The statistics for the other cases can be found in Appendix D.

1 At the time of this work, the H.264 codec did not have a mechanism for bitrate control. The target bitrate is achieved by changing the quantization parameter for the sequence; hence it does not represent the exact bitrate, but rather an approximate one.

Each figure comprises four graphs, showing the statistics of the symbols at the four bitplane levels. In each graph, the first half covers the symbols with EOP=0 and the second half those with EOP=1. By definition, the first bitplane does not contain an ALL_ZERO symbol; for the other bitplanes, the ALL_ZERO symbol is shown at the middle, just after the symbols belonging to EOP=0.

Figure 28. (RUN,EOP) Statistics for the Bus Sequence at 500 Kbps (histograms of symbol frequencies per bitplane).

MPEG-4 FGS uses four different VLC tables to code the (RUN,EOP) symbols: one for the first bitplane, one for the second, one for the third, and one for the rest of the bitplanes (fourth and above). The reason for this is that the statistics of the symbols vary significantly among bitplanes, and constructing a new VLC table for each increases the coding efficiency of the system. For the proposed structure, it was observed that three VLC tables (one for the first bitplane, one for the second bitplane and one for the rest of the bitplanes) are enough to characterize the distribution of the symbols. There are two reasons for decreasing the number of VLC tables.
The first reason is that the decrease in the number of possible symbols (RUN can be at most 15 in our case instead of 63, due to the smaller transform size) limits the amount of variation the symbols can exhibit. The second reason is that a slight variation in the statistics cannot be captured by the Exp-Golomb codewords. It is important to note that these tables, unlike those of MPEG-4 FGS, do not contain an ESCAPE code. MPEG-4 FGS uses ESCAPE to signal a symbol with a large RUN value, because the probabilities of large RUN values are very small; following the ESCAPE code, 6 bits are used to code the RUN and 1 bit for the EOP. In the proposed structure, the maximum value of RUN is 15 and the number of symbols is significantly lower than in MPEG-4 FGS. For this reason, we omit the use of ESCAPE in VLC coding. After the statistics are obtained, the VLC tables are constructed. The codewords in the VLC tables are Reversible Exp-Golomb codewords, the same as those used by H.264. The following table presents the proposed VLC table for the first bitplane; the tables for the other bitplanes can be found in Appendix E.

| Index | (RUN,EOP) | Code |
| 0 | (0,0) | 1 |
| 1 | (0,1) | 010 |
| 2 | (1,0) | 011 |
| 3 | (1,1) | 00100 |
| 4 | (2,1) | 00101 |
| 5 | (2,0) | 00110 |
| 6 | (3,1) | 00111 |
| 7 | (3,0) | 0001000 |
| 8 | (4,1) | 0001001 |
| 9 | (5,1) | 0001010 |
| 10 | (4,0) | 0001011 |
| 11 | (6,1) | 0001100 |
| 12 | (5,0) | 0001101 |
| 13 | (7,1) | 0001110 |
| 14 | (8,1) | 0001111 |
| 15 | (6,0) | 000010000 |
| 16 | (9,1) | 000010001 |
| 17 | (7,0) | 000010010 |
| 18 | (10,1) | 000010011 |
| 19 | (8,0) | 000010100 |
| 20 | (11,1) | 000010101 |
| 21 | (9,0) | 000010110 |
| 22 | (12,1) | 000010111 |
| 23 | (14,1) | 000011000 |
| 24 | (10,0) | 000011001 |
| 25 | (13,1) | 000011010 |
| 26 | (11,0) | 000011011 |
| 27 | (15,1) | 000011100 |
| 28 | (13,0) | 000011101 |
| 29 | (12,0) | 000011110 |
| 30 | (14,0) | 000011111 |

Table 3. Proposed VLC Table for the First Bitplane

2.4 Experimental Results

In this chapter, we presented our proposed H.264 based FGS system. Our proposed system modifies the original FGS structure to achieve the best adaptation of FGS to H.264.
We first replaced the DCT at the enhancement layer with the 4x4 Integer Transform, to decrease the complexity of the system. This modification brings an overhead in CBP coding at the macroblock header, due to the smaller transform size. To code the CBP more efficiently, we developed the novel Hierarchical CBP Coding Scheme presented in Section 2.3.1.1.1. Secondly, the entropy coding scheme of FGS was modified to achieve higher compression ratios, lower complexity and increased error resilience for the overall system. In this section, we present experimental results for our proposed system. In order to provide a better comparison of each developed technology, we present the experimental results in two subsections: Section 2.4.1 presents the performance of our proposed Hierarchical CBP Coding Scheme, and Section 2.4.2 presents the experimental results for the proposed entropy coding structure.

2.4.1 Experimental Results for the Proposed CBP Coding Scheme

The aim of the proposed Hierarchical CBP Coding scheme is to code the CBP of our H.264 based FGS system more efficiently than the scheme of the FGS standard does. This section presents a detailed analysis of the bit savings in CBP coding that can be achieved using our proposed scheme. For this analysis, we compare the number of bits used to code the CBP in our H.264 based FGS system using two different methods. The first method is our proposed Hierarchical CBP Coding scheme, presented in Section 2.3.1.1. The second method uses the same CBP coding structure as MPEG-4 FGS; however, for this method the blocks within the macroblock are grouped as in Figure 19, so that each group's structure is identical to the FGS macroblock structure. As shown in Figure 19, the 24 (4x4) blocks within a macroblock are grouped into 4, and the structure of each (8x8) group is the same as the MPEG-4 FGS macroblock structure.
Thus, the same coding algorithm as MPEG-4 FGS CBP coding is used to code the CBP for each of the groups. This approach results in 4 CBP codewords for each macroblock, so the number of bits spent on the CBP is approximately quadrupled. For this experiment, we use the test sequences presented in Table 2. Because the techniques for coding the CBP at the first and second bitplanes differ, separate results are presented for each bitplane. The following tables present the average number of bits used for coding the CBP with both methods. It is seen that more bits are allocated for coding the CBP at the second bitplane, no matter which method is used. This is due to the fact that, at the second bitplane, there are fewer ALL_ZERO and more non-zero blocks; fewer blocks can therefore be grouped and coded together as ALL_ZERO, decreasing the efficiency of CBP coding. At the first bitplane, almost all the blocks are ALL_ZERO, and the CBP coding can efficiently group and code them together, using fewer bits overall. It is clearly seen that the proposed Hierarchical CBP Coding Scheme significantly outperforms the FGS CBP coding scheme in the H.264 based FGS system on all the sequences. On average, the proposed scheme uses 70% fewer bits than the FGS scheme for coding the CBP at the first and second bitplanes; in some cases it uses up to 75% fewer bits. As mentioned before, the CBP coding scheme of the FGS standard is not suitable for our H.264 based FGS system, due to the increased number of transform blocks within the macroblock. The proposed Hierarchical CBP coding scheme decreases this overhead significantly, so that the 4x4 Integer Transform can be used more efficiently in the enhancement layer.
| Bits Used for CBP Coding | Proposed Scheme, First Bitplane | Proposed Scheme, Second Bitplane | FGS Scheme, First Bitplane | FGS Scheme, Second Bitplane |
| | 450 | 2051 | 1653 | 6257 |

Table 4. Average Number of Bits Used for CBP Coding for Bus Sequence

| Bits Used for CBP Coding | Proposed Scheme, First Bitplane | Proposed Scheme, Second Bitplane | FGS Scheme, First Bitplane | FGS Scheme, Second Bitplane |
| | 513 | 2147 | 1953 | 7244 |

Table 5. Average Number of Bits Used for CBP Coding for Tempete Sequence

| Bits Used for CBP Coding | Proposed Scheme, First Bitplane | Proposed Scheme, Second Bitplane | FGS Scheme, First Bitplane | FGS Scheme, Second Bitplane |
| | 339 | 1937 | 2004 | 7453 |

Table 6. Average Number of Bits Used for CBP Coding for Mobile Sequence

2.4.2 Experimental Results for the Proposed Entropy Coding Scheme

In this section, we compare the coding efficiency of the proposed entropy coding scheme, presented in Section 2.3.2, with the one standardized in FGS. For this comparison, we encode different sequences at different base layer bitrates, using two entropy coding methods: i) the proposed entropy coding method, and ii) the original FGS entropy coding method. The sequences used for this experiment are presented in Table 2. After encoding the sequences with both methods, the number of bits used to code the (RUN,EOP) symbols at each bitplane level is found for each method. Table 7 presents the results for all the sequences at the different bitplane levels. Each row of Table 7 presents the results for a sequence encoded at a specific base layer bitrate; the columns are grouped into four, each group presenting the results for one bitplane level (the first, second, third and fourth bitplanes). The number of bits used by the two entropy coding methods is presented side by side for each case. We also illustrate the results for each sequence separately in Table 8, Table 9 and Table 10.
In these tables, there are two figures for each sequence, representing the different bitrates that the base layers are encoded at. When compared with FGS entropy coding method, the proposed method uses 7% less number of bits on average to code the (RUN,EOP) symbols. For all the sequences, the proposed entropy coding method outperforms FGS entropy coding at all bitplane levels, except for the first one. At bitplane levels higher than the first one, the performance of the proposed table goes up to 22% better than the standard FGS. Also, as mentioned previously in this section, the proposed method has high resilience to errors and less computational complexity. In conclusion, our proposed entropy coding scheme that is based on 4x4 Integer Transform, achieves 7% coding efficiency gain on average with increased error resiliency and less computational complexity over the FGS entropy coding method, in our H.264 based FGS system.  65  -6.4%  876300  735792  757194  882159  751718  1.6%  3.0%  0.7%  1.8%  964557  909247  958044  808554  867021  1001262  946014  987387  834000  3.0%  3.7%  3.9%  3.0%  3.0%  714023  791827  799157  658771  804711  809052  1016858  914871  792700  899462  11.7%  22.1%  22.1%  16.9%  10.8%  Bitplane 4  481014  -8.8%  734221  899774  840804  Bitplane 3  511615  537204  -5.7%  885134  1.0%  Bitplane 2  584541  460814  -7.7%  745780  Bitplane 1  487332  597791  737767  as as  644091  -9.5%  n o  527785  I  577955  o  18.6%  Percent Improvement  a.  o  O o o" 3 o-  871830  Bit Usage F G S x  709261  Bit Usage Proposed  M  1004211  Percent Improvement  2 ft"  974428 929701  Bit Usage F G S  o  5  576352  929701  re  Bit Usage Proposed  re < re  Percent Improvement JO  o ^?  
[Bar charts: number of bits per bitplane (bitplanes 1 to 5) for the proposed RVLC table versus the FGS VLC table.]

Table 8 Coding Efficiency Test Results for the Proposed RVLC Table. Sequence is Mobile coded at (a) 500 Kbps (b) 1.5 Mbps

[Bar charts: number of bits per bitplane (bitplanes 1 to 5) for the proposed RVLC table versus the FGS VLC table.]

Table 9 Coding Efficiency Test Results for the Proposed RVLC Table. Sequence is Bus coded at (a) 500 Kbps (b) 1.5 Mbps

[Bar charts: number of bits per bitplane (bitplanes 1 to 5) for the proposed RVLC table versus the FGS VLC table.]

Table 10 Coding Efficiency Test Results for the Proposed RVLC Table. Sequence is Tempete coded at (a) 500 Kbps (b) 1.5 Mbps

2.5 Conclusion

In this chapter, we presented our proposed H.264 based FGS structure. Instead of simply extending FGS to use H.264 at the base layer, the proposed structure modifies the FGS coding blocks to take full advantage of H.264's superior features present in the base layer. The proposed modifications fall into two categories:

1. The DCT is replaced by the 4x4 Integer Transform at the enhancement layer.
2.
The entropy coding structure is modified to use the Reversible Exp-Golomb coding technique.

We replaced the DCT at the enhancement layer by the 4x4 Integer Transform and introduced the novel Hierarchical CBP Coding structure, which significantly decreases the overhead caused by using the 4x4 Integer Transform. The VLC tables at the enhancement layer are changed, and the new tables are built with Reversible Exp-Golomb codewords using the symbol statistics resulting from the 4x4 Integer Transform. By modifying the standard FGS structure, the complexity of the encoder-decoder pair is decreased and the error resilience of the overall system is increased. Performance evaluations have shown that our method also improves the coding efficiency of the system by 7% on average.

CHAPTER 3

3 Hybrid Structure using Stream-Switching and FGS for Scalable H.264 Video Transmission

As mentioned before, the H.264 video coding standard lacks scalability, i.e., video adaptation to different bitrates. Instead, H.264 uses a different approach, called stream-switching, to cope with fluctuations of the available bandwidth of the underlying network over which the media information is transmitted. In the stream-switching approach, the video is independently coded into several non-scalable bitstreams of different bitrates. The system dynamically switches between these differently coded versions of the video depending on bandwidth availability. The advantage of this method is its high coding efficiency, which results from the independent coding of the non-scalable bitstreams. However, this method provides only a coarse capability for adapting to changing bandwidth conditions, due to the limited number of bitstreams. There are two main reasons for having a limited number of bitstreams. The first is that the encoder needs to encode the original sequence at several different rates.
This increases the complexity of the system, and there is therefore a trade-off between the number of bitstreams that can be offered and the cost of the system. Second, since all the generated bitstreams need to be stored at a streaming server, storage requirements may be a limiting factor. For the stream-switching approach, the H.264 video coding standard has specified special key frames, called Synchronization-Predictive (SP) frames, that allow efficient switching between video bitstreams [17].

In this chapter, we present a unique method of combining FGS scalable video coding with stream-switching techniques to maximize the video quality for the end user. Unlike other scalable switching systems that have been proposed, our system is based on established standards. This means that streams created by the proposed method can still be processed by an existing H.264 decoder that supports stream switching but has no FGS capability. We have also developed a novel algorithm to select the rates of the base layer streams adaptively. The proposed algorithm involves encoding the video at different rates with different enhancement layer streams, and an R-D performance analysis of these streams. The results of this analysis are used to determine the optimal rates at which the base layers should be encoded and where the switching between streams takes place. In Section 3.1, we first give a brief overview of the stream-switching technique and the SP-frame concept used in H.264. A brief comparison of stream-switching and scalable video coding is also presented, along with the advantages and disadvantages of both methods. Section 3.2 presents our proposed hybrid approach and the novel adaptive rate selection algorithm. Section 3.3 presents the performance evaluation of the proposed approach and compares it with:

1. the FGS enabled H.264 video compression system without switching capability, as proposed in Chapter 2, and with
2.
the H.264 video compression system with stream-switching capabilities only, as specified by the H.264 standard (i.e., without FGS support).

3.1 Stream-Switching and SP-Frames

3.1.1 Overview of Stream-Switching

Stream-switching is a technique used in video communication systems to cope with bandwidth variations. In this technique, the video is independently coded into several streams at different bitrates and quality levels. After encoding, the streaming server dynamically switches between the streams according to the available bandwidth, in order to accommodate the bandwidth variations. One important restriction of stream-switching is that the streaming server cannot switch streams at arbitrary frames, but only at key frames. The reason is that the temporal predictive coding techniques of present video coding standards make the frame being coded depend on previously decoded frames. Consider an example where two bitstreams are generated independently at different quality levels. Let {..., P1,n-1, P1,n, P1,n+1, ...} and {..., P2,n-1, P2,n, P2,n+1, ...} denote the sequences of decoded frames from bitstream 1 and bitstream 2, respectively. Assume all these frames are P-frames and that switching takes place at time instant n, i.e., the server sends {..., P1,n-1, P2,n, P2,n+1, ...}. In this case, the decoder cannot decode P2,n correctly, since the reference frame used to encode P2,n, which is P2,n-1, is not received. This mismatch leads to erroneous decoding, which further propagates due to motion compensation. For this reason, in existing video coding standards, switching between bitstreams is only possible at frames that do not use information prior to their location, i.e., I-frames. However, placing I-frames periodically in the bitstream reduces the coding efficiency, as these frames do not exploit any temporal redundancy. The H.264 standard introduces a new frame type, called the SP-frame, for this purpose. SP-frames make use of motion compensated predictive
SP-frames make use of motion compensated predictive  73  coding so to exploit the temporal redundancy in the sequence, in a similar manner to that of P-frames. However, identical SP-frames can be reconstructed even when different reference frames are used for their prediction [17]. In the next subsection, we describe the SP-frame concept introduced in the H.264 video coding standard. 3.1.2 Overview of the SP Frame Switching Concept used in H.264 The stream-switching operation is realized by placing keyframes that do not use information prior to their corresponding temporal locations. This approach, however, decreases the coding efficiency of the system, as these keyframes do not exploit the temporal redundancies of the video sequence. H.264 introduced a new frame type, called SP-frame, for this purpose. Similar to Pframes, SP-frames make use of motion compensated predictive coding to exploit temporal redundancy in the video sequence. However, unlike P-frames, SP-frames allow identical frames to be reconstructed even when they are predicted using different reference frames. This property of SP-frames allows them to be used instead of I-frames in stream-switching applications. In this section, the technical details of SP-frames are overviewed. It should be noted that, SP-frames can be used for other applications such as random access, error recovery and error resiliency, but only the stream-switching application is considered here. In order to explain how SP-frames are used during stream-switching, consider an example illustrated in Figure 29. Let's assume that a bitstream is encoded twice at two different bitrates. Their corresponding frames are denoted by {Pu, Pij, SPu, Pi,4, Pi,5} and {P2.1, P2.2.SP2.3,  P2.4. Pi,5}  for the first and second bitstreams, respectively (see Figure  74  29). In each bitstream, SP-frames are placed at the same temporal location that switching is desired to take place (in this case it is SPJJ and SP2j).  
[Figure 29: Two bitstreams, bitstream 1 with frames P1,1, P1,2, SP1,3, P1,4, P1,5 and bitstream 2 with frames P2,1, P2,2, SP2,3, P2,4, P2,5, with a secondary SP-frame enabling switching between them.]

Figure 29 Switching between streams using SP-Frames

SP-frames placed within a bitstream are called primary SP-frames. For each primary SP-frame, another SP-frame, called a secondary SP-frame, is generated; it allows switching from that bitstream to the other one. The secondary SP-frames are used only during switching. At the streaming server, the two bitstreams (bitstream 1 and bitstream 2) and all the secondary SP-frames needed for switching are stored. At the time of switching, the streaming server sends the secondary SP-frame corresponding to the stream that the server switches to. For example, if we switch from P1,2 to P2,4, then the secondary SP-frame SP12,3 is used in between (see Figure 29). Similarly, if we switch from P2,2 to P1,4, then a different SP-frame, SP21,3, is used. A secondary SP-frame results in the same future frames as its primary SP-frame, even though it uses a different reference frame. At the time of switching, the decoder receives the secondary SP-frame (SP12,3), whose reconstruction is identical to that of its respective primary frame (SP2,3). The next frame the decoder receives just after switching is P2,4, which uses SP2,3 as reference. The decoding process continues normally without any error, as the reconstructions of frames SP2,3 and SP12,3 are identical although they use different reference frames.

3.1.3 Comparison of Stream-Switching and Scalable Video Coding

In this section we compare the two approaches: i) stream-switching and ii) scalable video coding. In particular, the SP-frame approach introduced in H.264 and the FGS approach are considered. The common objective of these two approaches is to cope with bandwidth variations and offer optimal video quality to the end user. However, the way FGS tries to achieve this objective is quite different from that of the stream-switching approach. The former was proposed and described in Chapter 2.
The FGS encoder generates one bitstream that contains both the base and the enhancement layers. The base layer is encoded at a bitrate Rbase, and the enhancement layer is encoded using bitplane coding up to a maximum bitrate Rmax. The bitstream is scaled at the FGS streaming server by truncating the enhancement layer portion of the video according to the available bandwidth. The video quality at the end user is directly proportional to the amount of enhancement layer information being sent. The main advantages of this method are its low complexity and its high flexibility. The low complexity is due to the fact that the same encoded bitstream is used for all the different bitrates, and thus encoding is performed only once. In addition, there is minimal overhead for the streaming server, as it only performs a simple truncation to achieve scalability. FGS is highly flexible, since the streaming server can truncate the enhancement layer to any desired bitrate, maximizing bandwidth utilization by using all the available bandwidth to send video information. As its name implies, FGS allows scalability in a fine-granular manner. Unlike the layered scalable technologies implemented in previous video coding standards, FGS video quality can be adjusted to any bitrate between Rbase and Rmax.

The main disadvantage of FGS video coding is its low coding efficiency when compared to single layer coding. In particular, for bitrates that are considerably higher than the base layer bitrate Rbase, the penalty in coding efficiency becomes significant. This is because low quality base layer frames are used as references for motion estimation, and as a result, the temporal redundancies in the enhancement layer are not fully exploited. The other approach, stream-switching, simply encodes the same video at different quality levels and bitrates. The streaming server switches dynamically between the streams to accommodate the variations of the available bandwidth.
When the available bandwidth is high, high-quality video is sent to the end user. If the available bandwidth drops, the server switches to the low quality version of the video. For the reasons discussed in the previous section, switching can take place only at key frames. Thus, the system responds to a bandwidth variation more slowly than the scalable video coding approach does. Compared to scalable video coding, the stream-switching approach cannot use a single bitstream, but rather needs two or more bitstreams of different bitrates. For this reason, stream-switching involves more computational complexity than scalable video coding, as the encoding process must be repeated two or more times, depending on the number of bitstreams used. Another disadvantage of stream-switching is the poor bandwidth utilization achieved by the system. It should be noted that the bandwidth utilization of a stream-switching system increases with the number of bitstreams used, but the larger the number of streams, the more impractical the system becomes. Compared with FGS, the stream-switching approach cannot adapt to bandwidth variations in a fine-granular way, as it is not based on scalable coding of the video. Despite these disadvantages, the stream-switching technique has very high coding efficiency (due to the independent coding of non-scalable video), which makes it very attractive. The following table summarizes the advantages and disadvantages of both approaches, providing a quick overview.
Scalability Step  Coarse capability in adapting  High, close to 100%. High - Depending on streaming server, it can adapt instantly. Fine Granular.  to changing bandwidth. Computational  Encoding should be  Encoding is performed once, for  Complexity  performed several times  base and enhancement layer.  depending on the number of bitstreams. Table 11 Comparison of Stream-Switching Approach with Scalable Video Coding Approach  In the next section, we propose a hybrid method that is a combination of the FGS method with Stream-Switching. The proposed hybrid method takes advantage of both methods to improve bandwidth utilization and video quality.  79  3.2 Combining Stream-Switching and FGS The main disadvantage of FGS is its low coding efficiency at high enhancement layer bitrates, which is due to the low quality reference frames used in motion estimation. The main disadvantage of stream-switching is its coarse capability in adapting to bandwidth changes and its low bandwidth utilization. We aim to eliminate those two disadvantages with our combined FGS - stream-switching architecture. Thus, the combined system is not only scalable and can adapt to bandwidth changes in a fine granular way, but it also has high coding efficiency. Figure 30 illustrates the architecture behind our proposed hybrid method for the case of two bitstreams. Our system encodes the video into two independent scalable bitstreams. Each bitstream is an H.264 based FGS stream consisting of a base layer and an enhancement layer. Based on the available bandwidth, the streaming server sends one of the base layers along with its corresponding enhancement layer portion. If there is a low bandwidth variation that can be accommodated by the enhancement layer, the streaming server continues to send the same base layer along with its enhancement layer. However, the streaming server switches between scalable bitstreams if high bandwidth variations occur.  80  1 I i  p 2  r  r -  i i  i  P P  2  1. 
[Figure 30: Two scalable bitstreams, each with an H.264 base layer (I, P and SP frames) and an FGS enhancement layer. Low bandwidth variations are accommodated by the FGS enhancement layer; high bandwidth variations cause switching between streams.]

Figure 30 Structure of the proposed hybrid system

Therefore, the streaming server performs both switching and scaling operations. As can be observed from Figure 30, low bandwidth variations are accommodated by FGS, but if the variation exceeds a certain threshold, the system switches streams. This structure increases the overall efficiency since: 1) bandwidth utilization is always at 100%, and 2) the picture quality increases, with the FGS enhancement layer operating in its higher efficiency regions.

Assume that the network bandwidth changes dynamically in the range [Rmin, Rmax]. The base layer bitrates are Rbase,1 and Rbase,2 for the base layers of the low and high quality scalable streams respectively, where Rmin <= Rbase,1 < Rbase,2 <= Rmax. If at a given time instant the available bandwidth Ravailable is greater than the low quality base layer bitrate but lower than the high quality base layer bitrate, that is, Rmin <= Rbase,1 <= Ravailable < Rbase,2, then the streaming server sends the low quality video and truncates its corresponding scalable stream to utilize the rest of the available bitrate. The amount of enhancement layer transmitted is then Renhancement,1 = Ravailable - Rbase,1.

One of the challenges in the proposed system, and in stream-switching systems in general, is how to choose the bitrates of the independent streams. In the next subsection, we present a novel adaptive rate selection method for stream-switching. For the sake of simplicity, only two independent non-scalable bitstreams are considered first. Later, the algorithm is generalized to more than two bitstreams that also carry scalable enhancement layer information.
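The server-side rule described above (send the highest-quality base layer that fits the channel, and spend the remaining bandwidth on that stream's FGS enhancement layer) can be sketched as follows. The function name and the example rates are illustrative only; they are not part of the H.264 standard or of any specific server implementation.

```python
# Sketch of the hybrid server's per-instant decision rule: pick the
# best base layer that fits, then fill the remaining bandwidth with
# that stream's FGS enhancement layer.

def select_layers(r_available, base_rates):
    """Return (stream index, enhancement-layer rate) for one instant.

    base_rates must be sorted ascending, with base_rates[0] <= r_available
    (the lowest base layer fits any bandwidth the channel can deliver).
    """
    # Highest-quality base layer that still fits the channel.
    i = max(k for k, r in enumerate(base_rates) if r <= r_available)
    # The remaining bandwidth carries the FGS enhancement layer:
    # R_enhancement,i = R_available - R_base,i
    return i, r_available - base_rates[i]

# Two scalable streams with base layers at 30 and 100 Kbps.
bases = [30, 100]
print(select_layers(80, bases))    # low-quality base plus 50 Kbps of FGS
print(select_layers(180, bases))   # switch to high-quality base plus 80 Kbps of FGS
```

A small rise in bandwidth only changes the second element of the returned pair (more enhancement layer), while a rise past the next base rate changes the first element, which corresponds to switching streams at the next SP-frame.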
3.2.1 Adaptive Bitrate Selection for Stream-Switching

The bitrates of the streams used for switching are important parameters that affect the performance of the system in a dynamic environment. Assume the available bandwidth fluctuates in the range [Rmin, Rmax], and that two streams, one of low quality and the other of higher quality, with bitrates R1 and R2 respectively, are used to cover this bandwidth range. One condition on the bitrates is Rmin <= R1 < R2 <= Rmax. Also, the bitrate of the lowest quality stream should not be higher than the minimum available bandwidth, so that a stream can be sent at any available bandwidth (i.e., R1 = Rmin). Thus, for two streams, only the bitrate of the higher quality stream is variable. The bandwidth range can simply be divided into equal portions, making R2 the midpoint of the fluctuating bandwidth range, R2 = 0.5(Rmax + Rmin). This technique extends straightforwardly to n streams. However, this bitrate selection does not use any distortion measure and may not yield the best R-D performance. We developed an adaptive rate selection by analyzing the encoded video quality at different rates. In the case of two streams, we seek the rate R2 of the higher quality stream that minimizes the total distortion over the fluctuating bandwidth range, given by:

    sum (Raval = Rmin .. R2) D1 + sum (Raval = R2 .. Rmax) D2,    Rmin < R2 < Rmax    (3.1)

where D1 and D2 are the measured distortions of the low and high quality decoded streams, respectively. The distortion measure D is the mean square error, given by:

    D = (1 / (M N)) * sum (i = 1 .. M) sum (j = 1 .. N) (I(i, j) - K(i, j))^2    (3.2)

where I and K are the two pictures and M and N are their height and width, respectively.
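A brute-force search implementing Equations (3.1) and (3.2) might look like the following sketch. The distortion-versus-bandwidth curves are synthetic stand-ins for measured per-rate MSE values; the function names are ours, not the thesis software's.

```python
# Sketch of the two-stream rate selection of Equation (3.1): try every
# candidate R2 and keep the one minimizing the summed distortion over
# the bandwidth range.

def mse(pic_i, pic_k):
    """Mean square error of Equation (3.2) for two equal-size pictures."""
    n = sum(len(row) for row in pic_i)
    return sum((a - b) ** 2
               for row_i, row_k in zip(pic_i, pic_k)
               for a, b in zip(row_i, row_k)) / n

def best_r2(r_min, r_max, d1, d2):
    """d1[r], d2[r]: distortion of the low/high quality stream when the
    available bandwidth is r Kbps. Returns the R2 minimizing (3.1)."""
    def total(r2):
        return (sum(d1[r] for r in range(r_min, r2)) +
                sum(d2[r] for r in range(r2, r_max + 1)))
    return min(range(r_min + 1, r_max), key=total)

# Synthetic curves: the high quality stream only pays off once enough
# bandwidth is available (the two curves cross near 225 Kbps).
d1 = {r: 1200 - 2 * r for r in range(30, 251)}
d2 = {r: 3001 - 10 * r for r in range(30, 251)}
print(best_r2(30, 250, d1, d2))
```

For n streams (Equation (3.3)), the same scan runs over ascending tuples (R2, ..., Rn) instead of a single R2, which stays tractable only for small n and a coarse rate grid.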
3.2.2 Generalized Adaptive Rate Selection

Generalizing the above problem to n streams requires finding the rates (R1, R2, ..., Rn), where R1 = Rmin, R1 < R2 < ... < Rn < Rn+1 and Rn+1 = Rmax, that minimize the total distortion given by Equation (3.3):

    sum (i = 1 .. n) sum (Raval = Ri .. Ri+1) Di    (3.3)

To accommodate the R-D characteristics of the FGS enhancement layer in the adaptive rate selection, the problem is similar and entails finding the bitrates (R1, R2, ..., Rn), where R1 = Rmin, R1 < R2 < ... < Rn < Rn+1 = Rmax, except that the total distortion measure is now a modified version of Equation (3.3):

    sum (i = 1 .. n) sum (Raval = Ri .. Ri+1) D(i, bpj)    (3.4)

where D(i, bpj) is the distortion of the i-th base stream decoded together with bpj bitplanes of its enhancement layer. The number of bitplanes sent is the maximum number that can be sent for the given base layer stream at the available bandwidth Raval.

3.3 Experimental Results

We compare the proposed hybrid FGS and stream-switching approach with the FGS and stream-switching approaches separately. To evaluate the performance of the proposed algorithm, we consider a transmission channel whose available bandwidth changes dynamically. For this specific experiment, the available bandwidth is simulated to increase from 30 Kbps up to 250 Kbps and then decrease back to 30 Kbps over the course of 90 frames, which corresponds to 3 seconds at a rate of 30 frames/second. These test conditions are specified by the MPEG Scalable Video Coding group in the recent Call for Proposals on Scalable Video Coding Technology [25] as one of the experimental test conditions. For our experiment, we first encode the Foreman sequence at different bitrates and encode the corresponding enhancement layers with the proposed H.264 based FGS encoder presented in Chapter 2.
Then, our adaptive rate selection algorithm is used to determine the best bitrate at which switching between bitstreams is performed. Afterwards, depending on the available bandwidth at a given time instant, the network simulator chooses the base and enhancement layer streams to send to the decoder. Below, we evaluate the performance of the following three methods:

1. The proposed combination of FGS with stream-switching, using adaptive rate selection
2. Scalable video coding using FGS
3. Stream-switching using SP frames

The parameters for the base layer encoding are presented in Table 12.

SP Picture Periodicity:                                  Every 15 frames (i.e., every half second for 30 fps video)
Quantization Parameter for SP frames (QPSP):             Same as the Quantization Parameter for P frames (QP)
Quantization Parameter of secondary SP frames (QPSP2):   QPSP - 6
Frame Structure:                                         I P P P ... P P P SP P P P ...

Table 12 Encoding Parameters for H.264 Base Layers

We first compare the performance of the three approaches over a channel whose available bandwidth changes over time. The available bandwidth first increases from 30 Kbps to 250 Kbps through frames 1 to 45, and starts to decrease back to 30 Kbps at frame 45. The base layer bitrate for FGS and the low quality bitrate for the proposed and stream-switching approaches are the same, equal to the lowest bandwidth that the channel can deliver (30 Kbps). For the stream-switching approach, the bitrate of the high quality bitstream is found to be 100 Kbps using Equation (3.3), which describes the
This results in a higher overall efficiency. The average PSNR gain is 2.9 dB, while for some frames the gain goes up to 3.5 dB. In addition, when our hybrid method is compared with the stream-switching approach, it is clearly seen that the video quality keeps increasing as the available bandwidth increases. This is because our method fully utilizes the available bandwidth. In this case, the average PSNR gain is 1.5 dB and can go up to 3 dB for some frames. The R-D performance of the proposed approach is further analyzed and compared with the Stream-Switching and FGS approaches in Figure 32 and Figure 33, respectively. It is clear from Figure 32, that the advantages of the proposed approach over Stream Switching are mainly the increased bandwidth utilization and granular adaptation of the system to the varying bandwidth. When compared with FGS, the proposed approach does not suffer from the low coding efficiency at high enhancement layer bitrates as it is seen from Figure 33. In our final experiment, we analyze the performance of the proposed adaptive rate selection algorithm. We use the proposed hybrid method and use two algorithms to find the bitrate of the high quality video (adaptive and non-adaptive rate selection algorithms). For the non-adaptive case, the bitrate of the high-quality base layer is found by dividing the bandwidth range into two (i.e., (30+250)/2 Kbps). Figure 34 illustrates  86  the results for this experiment. On average, the adaptive rate selection algorithm results in, 1 dB performance increase.  Figure 31 Performance of the Proposed Approach Compared with two other approaches i. Scalable Video Coding using F G S ii. 
Stream-Switching using SP Frames

[Plot: R-D performance for Foreman QCIF, PSNR versus bitrate (Kbps), comparing the proposed approach with the stream-switching approach.]

Figure 32 R-D Performance Comparison of the Proposed and Stream-Switching Approaches

[Plot: R-D performance for Foreman QCIF, PSNR versus bitrate (Kbps), comparing the proposed approach with the FGS approach.]

Figure 33 R-D Performance Comparison of the Proposed and FGS Approaches

[Plot: Performance of adaptive bitrate selection, PSNR versus bitrate (Kbps), comparing adaptive and non-adaptive rate selection.]

Figure 34 R-D Performance of the Adaptive Rate Selection Algorithm

3.4 Conclusion

In this chapter, we introduced a novel hybrid approach that combines FGS scalability with stream-switching based on H.264's SP frame concept. We also introduced a novel R-D optimized adaptive rate selection algorithm for choosing the rates of the base layer streams. Combining FGS with SP frames is made possible by our H.264 based FGS technology, presented in Chapter 2. For high bandwidth variations, our proposed system switches from a low-quality stream to a higher-quality stream, whereas low bandwidth variations are accommodated by using only the corresponding FGS enhancement layer. This way, the FGS enhancement layer mostly operates in the high efficiency regions of its R-D curve. In a network environment where the bandwidth changes dynamically, our proposed hybrid method outperforms FGS by 2.9 dB and the stream-switching approach by 1.5 dB on average.

CHAPTER 4

4 Conclusions and Future Work

4.1 Conclusions

In networks used for video transmission, such as wireless networks and the Internet, the available bandwidth for video transmission is not constant but varies over time.
This variation in the available bandwidth poses a problem for a video transmission system. Traditional video coding standards, whose objective is to optimize the quality of the video at a given bitrate, cannot cope with this bandwidth variation problem effectively. Scalable video coding techniques have been developed to address this bandwidth variation problem more efficiently. Scalable Video Coding (SVC) is a video coding framework that enables a system to adapt the quality of the video sequence to the underlying channel's available bandwidth. All popular video coding standards, such as MPEG-2 and MPEG-4, include some scalability tools. The latest video coding standard, H.264, provides superior compression efficiency over all previous standards, but it does not include tools for coding the video in a scalable fashion. In this work, we developed a scalable video coding scheme based on the most advanced video coding standard, H.264. Until now, the H.264 standard offered only limited scalability, and there was no solution achieving highly flexible fine-granular scalability with the H.264 standard. To achieve scalability with H.264, Fine Granular Scalability (FGS), originally developed for MPEG-4, is adapted to H.264. We modified the techniques present in FGS and developed novel techniques, so that the proposed scalable
For some cases the proposed scheme uses up to 75% less bits than the FGS scheme. We adapt the entropy coding method of H.264 standard to the FGS structure. The method is based on Reversible Exp-Golomb coding and it is proved to be highly error resilient with low computational complexity. By replacing the entropy coding method, we achieve 7% gain in coding efficiency. To overcome this problem, we also introduce a hybrid method that combines our proposed H.264 based FGS approach with the stream-switching approach employed in the H.264 standard. By combining different techniques, our proposed system offers a complete solution for all kinds of applications. The proposed system outperforms existing systems by offering optimum bandwidth utilization and improved video quality for the end user. We also introduce a novel R-D optimized adaptive rate selection algorithm for choosing the bitrates of the base layer streams. In a network environment where bandwidth changes dynamically, our proposed hybrid method outperforms FGS by 2.9 dB and the stream-switching approach by 1.5 dB on average. Combining FGS with SP  91  frames is made possible by using our H.264 based FGS technology, presented in Chapter 2.  4.2 Future Work The proposed H.264 based FGS is designed to introduce minimal computation complexity to the overall system. However, in the scope of this work, no formal testing and analysis was performed to analyze the exact amount of complexity gain achieved. Also, by introducing Reversible Exp-Golomb codewords to the FGS enhancement layer, the error resiliency of the system is increased. This feature should be further tested and analyzed for real-world application scenarios, such as 3G wireless environments or the Internet. H.264 video coding standard offers several methods for entropy coding. In this thesis, we used Reversible Exp-Golomb coding method due to its low computational complexity and high error resiliency. 
However, other entropy coding methods, such as Context Adaptive Variable Length Coding (CAVLC) and Context Adaptive Binary Arithmetic Coding (CABAC), could also be incorporated into our system. The combined approach introduced in Chapter 3 gives very good results and could be further optimized for certain bitrates. The developed approach is well suited to pre-recorded video, where the entire video stream is available at the time of streaming; it could be further developed for real-time streaming applications. The latest trends in scalable video coding are mostly based on wavelet coding tools and motion compensated temporal filtering (MCTF) based video codecs. Future work could include the incorporation of these tools into existing video coding standards.

APPENDIX

A. Exp-Golomb Codes for Entropy Coding

The following table presents the Exp-Golomb codewords and their corresponding code numbers used for entropy coding. For the rest of the tables in the appendix, only the Exp-Golomb code number is indicated where a codeword is specified.
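The codeword construction tabulated below can be sketched as follows. This is the plain (unsigned) Exp-Golomb mapping; the Reversible Exp-Golomb variant used in our enhancement layer is a related code, and only the basic construction is shown here. Code number n is written as M zero bits followed by the (M+1)-bit binary representation of n+1, where M = floor(log2(n+1)):

```python
def exp_golomb(n):
    # Codeword for code number n: M leading zeros followed by the binary
    # representation of n+1 (which has M+1 bits), where M = floor(log2(n+1)).
    suffix = bin(n + 1)[2:]
    return "0" * (len(suffix) - 1) + suffix

for n in (0, 1, 2, 7, 15, 31):
    print(n, exp_golomb(n))
```

Because the suffix always starts with a 1 bit, a decoder can recover M by counting leading zeros and then read the next M+1 bits, which is what makes the code instantaneously decodable with very little computation.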
Code Number  Codeword      Bits      Code Number  Codeword      Bits
0            1             1         32           00000100001   11
1            010           3         33           00000100010   11
2            011           3         34           00000100011   11
3            00100         5         35           00000100100   11
4            00101         5         36           00000100101   11
5            00110         5         37           00000100110   11
6            00111         5         38           00000100111   11
7            0001000       7         39           00000101000   11
8            0001001       7         40           00000101001   11
9            0001010       7         41           00000101010   11
10           0001011       7         42           00000101011   11
11           0001100       7         43           00000101100   11
12           0001101       7         44           00000101101   11
13           0001110       7         45           00000101110   11
14           0001111       7         46           00000101111   11
15           000010000     9         47           00000110000   11
16           000010001     9         48           00000110001   11
17           000010010     9         49           00000110010   11
18           000010011     9         50           00000110011   11
19           000010100     9         51           00000110100   11
20           000010101     9         52           00000110101   11
21           000010110     9         53           00000110110   11
22           000010111     9         54           00000110111   11
23           000011000     9         55           00000111000   11
24           000011001     9         56           00000111001   11
25           000011010     9         57           00000111010   11
26           000011011     9         58           00000111011   11
27           000011100     9         59           00000111100   11
28           000011101     9         60           00000111101   11
29           000011110     9         61           00000111110   11
30           000011111     9         62           00000111111   11
31           00000100000   11

Table 13 Exp-Golomb Codes for Entropy Coding

B. VLC Codes for CBP_CODE

B.1. CBP Codes for the First Bitplane

Category 0

All the four luminance and two color components are present in the bitplane.
Category 0 uses the following block layout (sixteen 4x4 luminance blocks Y0-Y15 and eight 4x4 chrominance blocks U16-U19 and V20-V23, organized into six groups):

Y0  Y1  Y2  Y3       Group 0 = {Y0, Y1, Y4, Y5}      Group 1 = {Y2, Y3, Y6, Y7}
Y4  Y5  Y6  Y7       Group 2 = {Y8, Y9, Y12, Y13}    Group 3 = {Y10, Y11, Y14, Y15}
Y8  Y9  Y10 Y11      Group 4 = {U16, U17, U18, U19}
Y12 Y13 Y14 Y15      Group 5 = {V20, V21, V22, V23}

CBP (uv,yyyy)  Code  Bits      CBP (uv,yyyy)  Code  Bits
11,1111         0     1        00,1001        32    11
11,0000         1     3        00,1010        33    11
11,0001         2     3        00,1011        34    11
11,0010         3     5        00,1100        35    11
11,0011         4     5        00,1101        36    11
11,0100         5     5        00,1110        37    11
11,0101         6     5        00,1111        38    11
11,0110         7     7        01,0010        39    11
11,0111         8     7        01,0011        40    11
11,1000         9     7        01,0110        41    11
11,1001        10     7        01,0111        42    11
11,1010        11     7        01,1000        43    11
11,1011        12     7        01,1001        44    11
11,1100        13     7        01,1010        45    11
11,1101        14     7        01,1011        46    11
11,1110        15     9        01,1100        47    11
00,0000        16     9        01,1101        48    11
00,0001        17     9        01,1110        49    11
00,0010        18     9        01,1111        50    11
00,0011        19     9        10,0011        51    11
00,0100        20     9        10,0100        52    11
00,0101        21     9        10,0101        53    11
00,0110        22     9        10,0110        54    11
00,0111        23     9        10,0111        55    11
01,0000        24     9        10,1000        56    11
01,0001        25     9        10,1001        57    11
01,0100        26     9        10,1010        58    11
01,0101        27     9        10,1011        59    11
10,0000        28     9        10,1100        60    11
10,0001        29     9        10,1101        61    11
10,0010        30     9        10,1110        62    11
00,1000        31    11        10,1111        63    13

Table 14 CBP Codes for the First Bitplane, Category 0

Category 1-2

All the four luminance components and only one color component (U or V) are present in the bitplane. The luminance blocks keep the group layout of Category 0 (Groups 0-3); the single chrominance component forms Group 4.

CBP (u/v,yyyy)  Code  Bits
1,1111           0     1
1,0111           1     3
1,0011           2     3
1,1011           3     5
1,1101           4     5
1,1110           5     5
1,0001           6     5
1,0010           7     7
1,0100           8     7
1,0101           9     7
1,0110          10     7
1,1001          11     7
1,1010          12     7
1,1100          13     7
0,0001          14     7
0,0000          15     9
0,0010          16     9
0,0011          17     9
0,0100          18     9
0,0101          19     9
0,0110          20     9
0,0111          21     9
0,1001          22     9
0,1010          23     9
0,1011          24     9
0,1100          25     9
0,1101          26     9
0,1110          27     9
0,1111          28     9
1,0000          29     9
1,1000          30     9
0,1000          31    11

Table 15 CBP Codes for the First Bitplane, Category 1-2

Category 3

All the four luminance components are present without any color component in the bitplane (Groups 0-3 only).

CBP (yyyy)  Code  Bits
1111         0     1
0111         1     3
0011         2     3
1011         3     5
1101         4     5
1110         5     5
0001         6     5
0010         7     7
0100         8     7
0101         9     7
0110        10     7
1001        11     7
1010        12     7
1100        13     7
0000        14     7
1000        15     9

Table 16 CBP Codes for the First Bitplane, Category 3

Category 4

Only the two color components are present in the bitplane, without any luminance components: Group 0 = {U16, U17, U18, U19} and Group 1 = {V20, V21, V22, V23}.

CBP (uv)  Fixed Length Code
00        00
01        01
10        10
11        11

Table 17 CBP Codes for the First Bitplane, Category 4

Category 5-6

Only one of the color components (either U or V) is present in the bitplane, without any luminance components: Group 0 = {U/V 16, U/V 17, U/V 18, U/V 19}.

CBP (u/v)  Fixed Length Code
0          0
1          1

Table 18 CBP Codes for the First Bitplane, Category 5-6

B.2. CBP Codes for the Second Bitplane

Category 0

CBP_CODE is used at the second bitplane, and the second bitplane contains all the luminance and chrominance components.
When the first bitplane contained all the luminance and chrominance components (all of Groups 0-5 coded), the CBP of the second bitplane is coded with the same table as Table 14.

Table 19 CBP Codes for the Second Bitplane, Category 0

Category 1-2

CBP_CODE is used at the second bitplane, and the second bitplane contains all the four luminance components and only one color component (U or V).

CBP (u/v,yyyy)  Code  Bits
0,1111           0     1
0,0111           1     3
0,0011           2     3
0,1011           3     5
0,1101           4     5
0,1110           5     5
0,0001           6     5
0,0100           7     7
0,0101           8     7
0,0110           9     7
0,1001          10     7
0,1010          11     7
0,1100          12     7
1,0001          13     7
0,0010          14     7
1,0011          15     9
1,0100          16     9
1,0101          17     9
1,0110          18     9
1,0111          19     9
1,0010          20     9
1,1111          21     9
1,1001          22     9
1,1010          23     9
1,1011          24     9
1,1100          25     9
1,1101          26     9
1,1110          27     9
1,0000          28     9
0,0000          29     9
0,1000          30     9
1,1000          31    11

Table 20 CBP Codes for the Second Bitplane, Category 1-2

Category 3

CBP_CODE is used at the second bitplane, and the second bitplane contains all the four luminance components but no color component. Same as Table 17.

Table 21 CBP Codes for the Second Bitplane, Category 3

C.
VLC Codes for SUB_CBP_CODE

The following table illustrates the SUB_CBP_CODE VLC codes used in the blockcbp procedure of CBP coding.

(Block_0, Block_1, Block_2, Block_3)  SUB_CBP_CODE
1111                                  1
0000                                  00000
0001                                  00001
0010                                  00010
0011                                  00011
0100                                  00100
0101                                  00101
0110                                  00110
0111                                  00111
1000                                  01000
1001                                  01001
1010                                  01010
1011                                  01011
1100                                  01100
1101                                  01101
1110                                  01110

Table 22 VLC Codes for SUB_CBP_CODE

D. RUN - EOP Statistics

D.1. BUS Sequence, Base Layer at 500 Kbps

Figure 35 (RUN, EOP) Statistics for the BUS Sequence at 500 Kbps (histograms of symbol counts for the First, Second, Third and Fourth Bitplanes)

D.2. BUS Sequence, Base Layer at 1.5 Mbps

(RUN, EOP) statistics histograms for the First, Second, Third and Fourth Bitplanes.

D.3.
MOBILE Sequence, Base Layer at 500 Kbps

(RUN, EOP) statistics histograms for the First, Second, Third and Fourth Bitplanes.

D.4. MOBILE Sequence, Base Layer at 1.5 Mbps

(RUN, EOP) statistics histograms for the First, Second, Third and Fourth Bitplanes.

D.5. TEMPETE Sequence, Base Layer at 500 Kbps

(RUN, EOP) statistics histograms for the First, Second, Third and Fourth Bitplanes.

D.6.
TEMPETE Sequence, Base Layer at 1.5 Mbps

(RUN, EOP) statistics histograms for the First, Second, Third and Fourth Bitplanes.

E. Proposed VLC Tables

E.1. (RUN,EOP) Symbols for the First Bitplane

Index  (RUN,EOP)   Code
0      (0,0)       1
1      (0,1)       010
2      (1,0)       011
3      (1,1)       00100
4      (2,1)       00101
5      (2,0)       00110
6      (3,1)       00111
7      (3,0)       0001000
8      (4,1)       0001001
9      (5,1)       0001010
10     (4,0)       0001011
11     (6,1)       0001100
12     (5,0)       0001101
13     (7,1)       0001110
14     (8,1)       0001111
15     (6,0)       000010000
16     (9,1)       000010001
17     (7,0)       000010010
18     (10,1)      000010011
19     (8,0)       000010100
20     (11,1)      000010101
21     (9,0)       000010110
22     (12,1)      000010111
23     (14,1)      000011000
24     (10,0)      000011001
25     (13,1)      000011010
26     (11,0)      000011011
27     (15,1)      000011100
28     (13,0)      000011101
29     (12,0)      000011110
30     (14,0)      000011111

E.3.
(RUN,EOP) Symbols for the Second Bitplane

Index  (RUN,EOP)   Code
0      (0,0)       1
1      (1,0)       010
2      (2,0)       011
3      (0,1)       00100
4      (3,0)       00101
5      (1,1)       00110
6      ALL-ZERO    00111
7      (2,1)       0001000
8      (4,0)       0001001
9      (3,1)       0001010
10     (4,1)       0001011
11     (5,1)       0001100
12     (5,0)       0001101
13     (6,1)       0001110
14     (6,0)       0001111
15     (7,1)       000010000
16     (7,0)       000010001
17     (8,1)       000010010
18     (9,1)       000010011
19     (8,0)       000010100
20     (10,1)      000010101
21     (11,1)      000010110
22     (10,0)      000010111
23     (9,0)       000011000
24     (12,1)      000011001
25     (11,0)      000011010
26     (14,1)      000011011
27     (13,1)      000011100
28     (15,1)      000011101
29     (12,0)      000011110
30     (13,0)      000011111
31     (14,0)      00000100000

E.4. (RUN,EOP) Symbols for the Other Bitplanes

Index  (RUN,EOP)   Code
0      (0,0)       1
1      (1,0)       010
2      (2,0)       011
3      (0,1)       00100
4      (3,0)       00101
5      (1,1)       00110
6      (4,0)       00111
7      (2,1)       0001000
8      (5,0)       0001001
9      (3,1)       0001010
10     (4,1)       0001011
11     (6,0)       0001100
12     (5,1)       0001101
13     (7,0)       0001110
14     (6,1)       0001111
15     (8,0)       000010000
16     (7,1)       000010001
17     (8,1)       000010010
18     ALL-ZERO    000010011
19     (9,0)       000010100
20     (9,1)       000010101
21     (10,0)      000010110
22     (10,1)      000010111
23     (11,1)      000011000
24     (11,0)      000011001
25     (12,1)      000011010
26     (12,0)      000011011
27     (13,1)      000011100
28     (14,1)      000011101
29     (13,0)      000011110
30     (14,0)      000011111
31     (15,1)      00000100000

Bibliography

1. H. Radha, M. van der Schaar, and Y. Chen, "The MPEG-4 Fine-Grained Scalable Video Coding Method for Multimedia Streaming Over IP," IEEE Transactions on Multimedia, vol. 3, no. 1, pp. 53-68, Mar. 2001.

2. Information Technology: Generic Coding of Moving Video and Associated Audio Information, ISO/IEC CD 13818 MPEG-2 International Standard, pt. 1-3, 1992.

3. ISO/IEC JTC1, "Generic Coding of Audiovisual Objects - Part 2: Visual (MPEG-4 Visual)", ISO/IEC 14496-2, Version 1: Jan. 1999; Version 2: Jan. 2000; Version 3: Jan. 2001.

4. ITU-T Recommendation H.263, "Video Coding for Low Bit-Rate Communication", Version 1: Nov. 1995; Version 2: Jan. 1998; Version 3: Nov. 2000.
5. ISO/IEC 15444-1: Information technology - JPEG 2000 image coding system - Part 1: Core coding system, 2000.

6. ISO/IEC IS 10918-1 | ITU-T Recommendation T.81 - JPEG Image Coding Standard - Part 1.

7. K. Sayood, Introduction to Data Compression. Morgan Kaufmann Publishers, Inc., 1996.

8. B. Girod, "Why B-pictures work: a theory of multi-hypothesis motion-compensated prediction," Proc. IEEE International Conference on Image Processing (ICIP), vol. II, pp. 213-217, Chicago, October 1998.

9. H.G. Musmann, P. Pirsch and H.J. Grallert, "Advances in picture coding", Proc. IEEE, vol. 73, no. 4, pp. 523-548, April 1985.

10. K.R. Rao, J.J. Hwang, Techniques & Standards for Image & Audio Coding, Upper Saddle River, NJ: Prentice-Hall, 1996.

11. G.J. Sullivan, R.L. Baker, "Rate-distortion optimization for tree-structured source coding with multi-way node decisions", Proc. IEEE ICASSP-92, vol. 3, pp. 393-396, March 1992.

12. Y. He, F. Wu, S. Li, Y. Zhong, and S. Yang, "H.26L-based Fine Granularity Scalable Video Coding," in Proc. of IEEE International Symposium on Circuits and Systems (ISCAS), Scottsdale, Arizona, USA, vol. 4, pp. 548-551, May 2002.

13. H.S. Malvar, A. Hallapuro, M. Karczewicz, L. Kerofsky, "Low-complexity transform and quantization in H.264/AVC", IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 598-603, July 2003.

14. R. Talluri, "Error-Resilient Video Coding in the MPEG-4 Standard," IEEE Communications Magazine, vol. 36, no. 6, pp. 112-119, June 1998.

15. T. Wiegand, G.J. Sullivan, G. Bjøntegaard, A. Luthra, "Overview of the H.264/AVC video coding standard", IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 560-576, July 2003.

16. P. List, A. Joch, J. Lainema, G. Bjøntegaard, M. Karczewicz, "Adaptive deblocking filter", IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 614-619, July 2003.
17. M. Karczewicz, R. Kurceren, "The SP- and SI-frames design for H.264/AVC", IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 637-644, July 2003.

18. T. Wiegand, H. Schwarz, A. Joch, F. Kossentini, G.J. Sullivan, "Rate-constrained coder control and comparison of video coding standards", IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 688-703, July 2003.

19. Xiaoyan Sun, Feng Wu, Shipeng Li, Wen Gao, Ya-Qin Zhang, "Seamless switching of scalable video bitstreams for efficient streaming", IEEE International Symposium on Circuits and Systems, vol. 3, pp. 385-388, May 2002.

20. Yuwen He, Feng Wu, Shipeng Li, Yuzhuo Zhong, Shiqiang Yang, "H.26L-based fine granularity scalable video coding", IEEE International Symposium on Circuits and Systems, vol. 4, pp. 548-551, May 2002.

21. W. Li, F. Ling, and H. Sun, "Bitplane coding of DCT coefficients," ISO/IEC JTC1/SC29/WG11, MPEG97/M2691, Oct. 22, 1997.

22. F. Ling, W. Li, and H. Sun, "Bitplane coding of DCT coefficients for image and video compression," in Proc. SPIE Visual Communications and Image Processing (VCIP), San Jose, CA, Jan. 25-27, 1999.

23. L. Kerofsky, M. Zhou, "Reduced Complexity VLC", Joint Video Team of ISO/IEC JTC1/SC29/WG11 & ITU-T SG16/Q.6, Doc. JVT-B029, Geneva, Switzerland, Feb. 2002.

24. "Draft ITU-T Recommendation H.264 and Draft ISO/IEC 14496-10 AVC," Joint Video Team of ISO/IEC JTC1/SC29/WG11 & ITU-T SG16/Q.6, Doc. JVT-G050, T. Wiegand, Ed., Pattaya, Thailand, Mar. 2003.

25. ISO/IEC JTC1/SC29/WG11, "Call for Proposals on Scalable Video Coding Technology", MPEG2003/N6193, Waikoloa, December 2003.
