Joint MPEG-2 Coding for Multi-Program Broadcasting of Pre-Recorded Video

by

Irene Koo

B.A.Sc. (Electrical Engineering), The University of British Columbia, 1995

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF APPLIED SCIENCE

in

THE FACULTY OF GRADUATE STUDIES
DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
August 1998
© Irene Koo, 1998

In presenting this thesis in partial fulfillment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Electrical and Computer Engineering
The University of British Columbia
2356 Main Mall
Vancouver, Canada V6T 1Z4

Date:

Abstract

The tradeoff between picture quality and bandwidth usage is one of the most prominent issues in the world of broadcasting. Since broadcasters are able to simultaneously transmit multiple streams in a channel, they face the challenge of guaranteeing a certain picture quality required by each of the video streams transmitted while maximizing the number of video streams carried in each channel. To address this problem, we developed an MPEG-2 based multi-program video coding system, suitable for digital TV broadcasting, video on demand, and high definition TV over broadcast satellite networks with limited bandwidth. The system can be easily implemented for commercial use in digital broadcasting applications.
Compared to present broadcast systems and for the same level of guaranteed picture quality, our system greatly increases the number of video streams transmitted in each channel. As a result, a large number of transponders can be freed to carry real-time broadcasting. By switching from tape storage to video server technology, the need for numerous playback (VTR) systems at the headend is eliminated. In addition, the majority of the complete MPEG-2 encoders are replaced by much less complex MPEG-2 transcoders. The freeing of numerous transponders, the elimination of numerous playback systems, and the replacement of the complete MPEG-2 encoders with MPEG-2 transcoders provide a much more cost-effective solution for the broadcast stations.

Table of Contents

Joint MPEG-2 Coding for Multi-Program Broadcasting of Pre-Recorded Video
Abstract
Table of Contents
List of Figures
List of Tables
Acknowledgements
Chapter 1 Introduction
Chapter 2 Background
2.1 MPEG-2 Overview
2.1.1 MPEG-2 Video Stream Syntax
2.1.2 MPEG-2 Video Coding
2.2 Related Works
2.2.1 Video Coding of a Single Source
2.2.2 Video Coding of Multiple Sources
2.2.2.1 Statistical Multiplexing
2.2.2.2 Joint Bit-rate Control
2.3 Summary
Chapter 3 Two-Stage Joint Bit-Rate Coding
3.1 An Overview of the Two-Stage Joint Bit-Rate Coding System
3.2 First Stage Encoder
3.2.1 Fixed Quantization
3.2.2 Video Server: Trade-off between Cost and Picture Quality
3.2.3 Complexity Files
3.3 Joint Bit-Rate Controller
3.3.1 Admission Test
3.3.2 Processing Procedure for a Group of Pictures
3.3.3 Parameters for Optimizing Transcoders' Performance
3.3.3.1 First Parameter Set: Picture Complexities
3.3.3.2 Second Parameter Set: Target Picture Bit-Rates
3.4 Transcoders
3.4.1 The Transcoding Processing
3.4.1.1 Initialization
3.4.1.2 Transcoding Procedure for a Group of Pictures
3.4.2 Picture Bit-Rate Distribution
3.4.2.1 Transcoding with Given Picture Complexities
3.4.2.2 Transcoding with Given Picture Bits Distribution
3.5 Summary
Chapter 4 Simulation Results and Discussions
4.1 Setup
4.2 Simulation Results
4.3 Timing Analysis
4.4 Summary
Chapter 5 Summary and Future Work
5.1 Summary
5.2 Future Work
Chapter 6 Bibliography

List of Figures

Figure 1.1: Today's broadcast system
Figure 1.2: Multi-channel compression system
Figure 2.1: Relationship between I-, P-, and B-pictures in a Group of Pictures
Figure 2.2: Block diagram of MPEG video encoding
Figure 3.1: Block diagram of the two-stage joint bit-rate coding system
Figure 3.2: Block diagram of the first-stage video encoder
Figure 3.3: Default quantizer matrices defined in the MPEG-2 video specification
Figure 3.4: Picture quality (PSNR) of test sequences encoded using 3 fixed quantizer scales and the default quantizer matrices
Figure 3.5: The modified intra quantizer matrix
Figure 3.6: Picture quality (PSNR) comparisons of test sequences encoded with the default and the modified intra quantizer matrix
Figure 3.7: PSNR-Q and R-Q curves of Microsoft Robot
Figure 3.8: PSNR-Q and R-Q curves of Garden
Figure 3.9: Picture quality comparisons among Microsoft Robot encoded at fixed quantizer scales 7, 8, and 10
Figure 3.10: Picture quality comparisons among Garden encoded at fixed quantizer scales 7, 8, and 10
Figure 3.11: Data flow diagram of the joint bit-rate control process
Figure 3.12: Flow diagram of GOP target adjustment algorithm
Figure 3.13: GOP bit allocation performed by the joint bit-rate controller for Microsoft Robot, Ballet, Bus, and Garden
Figure 3.14: GOP bit allocation performed by the joint bit-rate controller for Segment 1, Segment 2, Segment 3, and Segment 4 of The Man in the Iron Mask trailer
Figure 3.15: Block diagram of the transcoding process
Figure 3.16: Data flow diagram of the transcoding process
Figure 3.17: The picture quality of The Man in the Iron Mask trailer segments transcoded with GOP targets only and the picture quality of the same segments transcoded with GOP targets and picture complexities
Figure 3.18: The picture quality of Man in the Iron Mask trailer segments transcoded with GOP targets only and the picture quality of the same segments transcoded with GOP and picture targets
Figure 4.1: GOP complexities and bit-rates of The Man in the Iron Mask video sequences
Figure 4.2: GOP complexities and bit-rates of the second set of video sequences

List of Tables

Table 3.1: PSNR values of test sequences encoded with the default and proposed intra quantizer matrices
Table 3.2: Comparison of PSNR values for video streams encoded with joint bit-rate coding using GOP targets only, with joint bit-rate coding using GOP targets and picture complexities, and with joint bit-rate coding using GOP and picture targets
Table 3.3: Bit-rates of the four Man in the Iron Mask video segments assigned by the joint bit-rate controller
Table 4.1(a): Bit-rates of The Man in the Iron Mask video sequences encoded using the joint bit-rate coding system
Table 4.1(b): Average GOP complexity of The Man in the Iron Mask video sequences
Table 4.2(a): Bit-rates of the video sequences from the second test set encoded using the joint bit-rate coding system
Table 4.2(b): Average GOP complexity of the video sequences from the second test set
Table 4.3: PSNR standard deviations for Man in the Iron Mask video sequences encoded using our two-stage joint bit-rate coding system and encoded independently using the TM5 method
Table 4.4: PSNR standard deviations for the second set of video sequences encoded using our two-stage joint bit-rate coding system and encoded independently using the TM5 method
Table 4.5: CPU time used in transcoding and encoding The Man in the Iron Mask video sequences
Table 4.6: CPU time used in transcoding and encoding the second set of video sequences

Acknowledgements

I owe my greatest debt to my supervisors, Professor Rabab Ward and Professor Panos Nasiopoulos, for guiding me from the initial development of the project to the writing of this thesis and for giving me invaluable advice and feedback. Without their support, this thesis could not have been written.

I would like to thank my parents for their patience and encouragement and for their understanding of my not being at home all the time. Many thanks to my sister, Jenny, for taking care of mom and dad when I spent much of my time at school and for introducing me to a wonderful group of friends.

I wish to express my greatest gratitude to Amy, Grace, Precy, and Rebecca. They have given me friendships since the days of my undergraduate studies. I cherish every moment we spent together talking, laughing, as well as comforting each other. Hopefully, we will have many more gatherings like we used to have.

I would also like to take this opportunity to thank a group of friends who made the past year an enjoyable experience.
Ice-skating, birthday celebrating, playing board games, seeing movies, barbecuing, having fun at Playland, watching Symphony of Fire, and of course, golfing (How can we ever forget that!) have created lasting memories.

I also want to express special thanks to Miranda and all the friends in the Image Processing Lab as well as those in the Communications Lab. They have been very helpful when I presented various ideas and problems to them. And to all other people who care about my well-being, "Thank you!"

Chapter 1 Introduction

The introduction of the MPEG-1 standard in 1991 and MPEG-2 in 1993 has led to an increasing number of market opportunities for new communication services that provide their customers with more flexibility, capability, and convenience than the existing systems. One of the newer applications resulting from the standardization of MPEG is Digital Satellite Broadcast (DSB). Analog satellite broadcasting has been around since 1962, the year when the first satellite TV transmission via Telstar I, an eight-minute experimental broadcast from France to the U.S., occurred. In 1976, Taylor Howard of San Andreas, California, became the first individual to receive C-band satellite TV signals on a homebuilt system [24]. However, due to the high cost of hardware, it was not until 1980 that the first home satellite system priced below US$10,000 became available. By 1989, the price of a home satellite system had dropped to around US$2,000, and the industry was able to ship only 383,000 units. However, after the demonstration of digital video compression by General Instrument and SkyPix in 1991 and the introduction of new U.S. legislation guaranteeing access to satellite-delivered cable programming services by alternative multi-channel video providers such as Digital Broadcast Satellite (DBS) operators, the number of shipped home satellite systems sky-rocketed to 4 million units by 1993 [24].
Currently, at least three major Direct-to-Home Satellite Broadcasting companies are providing DSB services in the United States and together offer more than 540 channels of programming [25]. Today, there are over 7.3 million DSB subscribers in the United States [28].

Another application that makes exclusive use of the MPEG-2 standard is Digital Television Broadcast (DTV). Discussions about providing digital television broadcasting services in the U.S. began in 1987, when 58 broadcasters requested the Federal Communications Commission (FCC) to establish a broadcast standard for high-resolution TV services [26]. However, due to the technical implications and political issues involved, it was not until April 3, 1997, that the FCC was finally able to give its orders for the launch of digital TV. "The FCC requires the affiliates of the top four networks in the top ten markets to be on-the-air with a digital signal by May 1, 1999. Affiliates of the top four networks in markets 11-30 must be on-the-air by November 1, 1999" [29]. When a modulation scheme such as Vestigial Side Band (VSB) or Quadrature Amplitude Modulation (QAM) is applied to the 6 MHz analog channel used today, the channel can sustain 19.3 Mbps of data. This bandwidth can accommodate one High Definition TV (HDTV) video program or five Standard Definition TV (SDTV) programs. An HDTV signal has about four times the resolution of today's NTSC signal, while an SDTV signal has a resolution similar to today's NTSC signal. Since both DSB and DTV can support multiple programs in a single channel, broadcasters are given both the flexibility and the challenge of coding and transmitting multiple sources down a fixed-bandwidth channel.

A typical digital satellite broadcast system consists of three parts: uplink earth stations (headend), distribution satellites, and downlink earth stations.
Each transponder in a satellite has the bandwidth to support multiple transmission channels, and each transmission channel can carry multiple video streams. The uplink earth station is where the encoding and multiplexing of video streams take place. The video coding system located in an uplink earth station comprises a set of video recording and playback systems, a video cassette archive, a cassette library management system, a scheduling system, and a set of multi-channel compression systems (see Figure 1.1). A multi-channel compression system usually consists of a set of encoders, a channel controller, a multiplexer, and a supervisory system (see Figure 1.2). Based on the program schedules generated by the scheduling system, the video programs to be broadcast soon are retrieved from the archive and placed in the set of playback systems. Real-time video programs are also played back immediately. The signal out of each playback system is fed into a compression system. The necessary service parameters of a video program, such as bit-rate, are set according to the contracted quality of service (QoS), the video program itself, and the coexisting video programs in the channel. The assigned bit-rate does vary when the coding environment (e.g., the number of video programs coexisting in the channel), the characteristics of the video program, or the characteristics of the other coexisting video programs change. However, the assigned bit-rate usually stays constant for a long period of time before it is changed. After the parameters are specified, the video programs are encoded. The channel controller controls the multiplexing of the video streams.

Several drawbacks have been recognized in the current broadcast systems:
1. Inefficient use of bandwidth resulting from constant bit-rate (CBR) coding. Much work has already proven that CBR coding is inherently inefficient in terms of bandwidth usage.
2. Large picture quality fluctuation resulting from CBR coding. In addition to poor bandwidth usage, CBR coding also generates large fluctuations in picture quality.
3. Problems with the use of videocassette tapes for archiving. One video playback system is needed for each pre-recorded video program to be transmitted. Although a videocassette tape is a storage medium with huge storage capacity, it does not support random access. Videotapes are subject to physical deterioration when they are used frequently. A videotape cannot easily be shared when several people try to access the same tape simultaneously, so multiple copies of the tape are usually needed to accommodate the demand. Most importantly, videocassette tapes require large physical storage space.
4. The need for numerous encoders and the high cost of the broadcast systems. Since video programs are encoded immediately before transmission, one encoder is needed for each of the video programs to be transmitted. The high cost of the encoders increases the cost of a broadcast system substantially.

Figure 1.1: Today's broadcast system. (Blocks shown: scheduling system, video archive, library management system, video playback systems, and multichannel compression systems.)

Figure 1.2: Multi-channel compression system. (Blocks shown: encoders, multiplexer, and supervisory system.)

The goal of this thesis is to develop a simple yet effective multi-channel video coding system. The system should address the problems encountered by today's broadcast systems. More specifically, the coding system
1. should allocate bandwidth to each video program efficiently,
2. should offer a consistent picture quality,
3. should have a lower cost than the cost of today's systems, and
4. should not use cassette tape for archiving.

In order to meet the above requirements, we propose for our system
1. the use of a bit allocation scheme that assigns to each video stream only the necessary number of bits and minimizes the fluctuations in picture quality,
2. the use of a set of transcoders, which are structurally less complex than an encoder, for encoding the video programs immediately before transmission, and
3. the use of a video server for storing video programs.

We develop a two-stage joint bit-rate coding system that has all the characteristics just described. The system is product-oriented and can be easily implemented for commercial use. The system has three components: a video encoder and a video server, a joint bit-rate controller, and a set of transcoders. The system performs multi-channel video encoding in two stages. During the first stage, the system performs high bit-rate, high-quality video encoding on the video sources. During the second stage, it converts the high bit-rate video streams to lower bit-rate video streams. When the lower bit-rate video streams are multiplexed, the resulting stream satisfies the network bandwidth requirement.

Traditionally, the complexity of a video stream is determined on a picture basis, and the complexity of a picture is estimated from the statistics of some other encoded picture. Such estimation is clearly not accurate in most cases. For example, the complexities of the pictures immediately before and immediately after a scene change are obviously uncorrelated. To overcome the picture complexity estimation problem, our system measures and records the picture complexities from actual data while it is performing the encoding during the first stage. To simplify the bandwidth allocation decisions in the second stage, we also define a group of pictures (GOP) complexity measure that indicates the complexity of a GOP. By utilizing the recorded GOP complexities and picture complexities, our joint bit-rate control method distributes the necessary bandwidth to each video source.
Finally, the set of transcoders converts the encoded high bit-rate video streams to ones that have their bit-rates determined using the joint bit-rate control method.

This system guarantees the efficient use of the available bandwidth. For the same level of contracted QoS, it significantly increases the number of transmitted video streams per channel. Compared to existing applications, this implementation offers a very cost-effective solution by greatly reducing the number of playback systems as well as "complete" MPEG-2 encoders needed at the broadcasting headend and by "freeing" transponders for real-time broadcasting.

The organization of the thesis is as follows: in Chapter 2, we give an overview of the MPEG video encoding process and discuss some of the related works. In Chapter 3, we describe our two-stage joint bit-rate coding solution. In Chapter 4, we present the simulation results of our system. Finally, in Chapter 5, we provide a summary and possible future work.

Chapter 2 Background

MPEG is an audio-visual communication standard that is found in many applications. The first version of MPEG, MPEG-1 [23], was designed for digital medium storage of audio-visual signals at 1.5 Mbps. The video coding of MPEG-1 was targeted for sources with SIF resolution (352x240 at 30 non-interlaced frames/s or 352x288 at 25 non-interlaced frames/s) at bit-rates of about 1.2 Mbps, while the audio coding was targeted at bit-rates around 250 kbps [3]. MPEG-2 is an extension to MPEG-1. It is more generic and has more features and capabilities than MPEG-1. MPEG-2 is designed for both digital medium storage and transmission and is intended for interlaced CCIR601 sources at about 4 Mbps. However, MPEG-2 supports bit-rates as high as 429 Gbps [3]. MPEG-2 is found in numerous applications. Digital Versatile Disc (DVD), Digital Video Broadcasting (DVB), and Digital TV (DTV) are three of the more prominent examples.
The focus of this chapter is to give the reader an overview of MPEG-2 video coding and to describe some of the work that has been done on this subject. Chapter 2 is organized as follows: in Section 2.1, we outline the structure of an MPEG-2 coded bit stream and briefly review the fundamental concepts underlying the MPEG-2 video coding standard. In Section 2.2, we present some of the previous work on the MPEG-2 video coding of a single source as well as the simultaneous coding of multiple sources.

2.1 MPEG-2 Overview

The MPEG-2 audio-visual coding standard currently consists of 9 parts, covering different aspects such as systems, video, audio, compliance testing, and real-time interface for system decoders. Part two of the standard, which we refer to as the MPEG-2 video specification [2], specifies the syntax and the decoding semantics for an MPEG-2 compliant video stream. The specification does not specify the encoding process for MPEG-2 video streams. It is up to the system designers to design systems that produce MPEG-2 compliant video streams.

2.1.1 MPEG-2 Video Stream Syntax

The MPEG-2 video specification puts the information of a video stream into a hierarchical structure that consists of six layers: sequence layer, group of pictures (GOP) layer, picture layer, slice layer, macroblock layer, and block layer. All but the block layer have their own headers, which store information pertaining to their respective layers. For example, the sequence header includes information such as bit-rate and optional quantizer matrices, which are relevant to the entire video stream. The GOP header contains information such as a time code for supporting random access, fast search, and editing [3]. The picture header comprises information such as picture coding type, picture structure, and scan format, which are specific to that particular picture.
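The six-layer nesting listed above can be pictured as a set of nested containers. The following is a minimal sketch only: the class names, field names, and field choices are illustrative assumptions that loosely follow the header contents mentioned in the text, not the syntax element names of the MPEG-2 video specification.

```python
from dataclasses import dataclass, field
from typing import List

# Layer order, outermost first, as listed in the text.
LAYERS = ["sequence", "GOP", "picture", "slice", "macroblock", "block"]

@dataclass
class Block:
    # The block layer has no header; it holds only 8x8 = 64 values.
    values: List[int] = field(default_factory=lambda: [0] * 64)

@dataclass
class Macroblock:
    quantizer_scale: int  # hypothetical field name
    blocks: List[Block] = field(default_factory=list)

@dataclass
class Slice:
    quantizer_scale: int  # hypothetical field name
    macroblocks: List[Macroblock] = field(default_factory=list)

@dataclass
class Picture:
    coding_type: str  # "I", "P", or "B"
    slices: List[Slice] = field(default_factory=list)

@dataclass
class GroupOfPictures:
    time_code: str  # supports random access, fast search, and editing
    pictures: List[Picture] = field(default_factory=list)

@dataclass
class Sequence:
    bit_rate: int  # bits per second, relevant to the entire stream
    gops: List[GroupOfPictures] = field(default_factory=list)

def count_coding_types(gop: GroupOfPictures) -> dict:
    """Count how many pictures of each coding type a GOP contains."""
    counts = {"I": 0, "P": 0, "B": 0}
    for picture in gop.pictures:
        counts[picture.coding_type] += 1
    return counts

# Illustrative GOP built from a repeating I/B/P pattern (values are made up).
gop = GroupOfPictures("00:00:00:00", [Picture(t) for t in "IBBPBBPBB"])
sequence = Sequence(bit_rate=4_000_000, gops=[gop])
```

Calling `count_coding_types(gop)` on this example GOP gives one I-picture, two P-pictures, and six B-pictures.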
Three picture coding types are defined in MPEG to exploit the spatial redundancy and temporal redundancy that are inherent in any video sequence. The three picture coding types are Intra-coded pictures (I-pictures), Predictive pictures (P-pictures), and Bi-directionally predictive pictures (B-pictures). I-pictures are coded independently. P-pictures are coded with reference to the most recently coded I- or P-picture. B-pictures are coded with reference to two pictures. One is the most recent I- or P-picture. The other is the first I- or P-picture after the B-picture. These different picture coding types usually appear in a repeating pattern such as IBBPBBPBB in the video sequence. Such a repeating sequence of images makes up a GOP. Figure 2.1 gives an example of the GOP structure and points out the pictures from which the P-pictures and B-pictures are predicted.

A slice is a series of consecutive macroblocks that are located on the same row of a picture. It is defined to aid resynchronization in case of transmission errors. The slice layer contains information such as the quantizer scale applicable to all the macroblocks in the slice. A macroblock is the smallest codable unit. It is made up of four 8x8 luminance blocks and at least one 8x8 block for each of the two chrominance components. In the macroblock layer, information such as the macroblock type, the quantizer scale, the motion type, the motion vectors, and the macroblock pattern of a macroblock is found. A block consists of 8x8 pixel values of an image. The pixel values in a block are discrete cosine transformed (DCT) and then quantized.

Figure 2.1: Relationship between I-, P-, and B-pictures in a Group of Pictures. (Annotations: an I-picture is the reference for a P-picture; I- and P-pictures are the references for a B-picture.)

2.1.2 MPEG-2 Video Coding

The purpose of video coding is to compress a video sequence such that the resulting video stream has the desired bit-rate.
Both MPEG-1 and MPEG-2 use a block-based compression technique that involves both motion-compensated prediction and discrete cosine transformation. Figure 2.2 shows the block diagram of a typical MPEG video encoding process. First, the sequence layer and GOP layer information are determined from a user-supplied parameter list and encoded into the output video stream. Also based on the user-supplied parameter list, the picture coding type of each picture is decided upon and the picture layer information is gathered. Each image is then divided into 16x16 pixel macroblocks. Slice layer information and macroblock layer information are determined and encoded. Motion-compensated prediction is performed on the macroblocks if the picture is a P- or B-picture. Each macroblock is then further divided into 8x8 blocks. The DCT is applied to each block, and the resulting DCT coefficients are subsequently quantized using both a quantizer scale and an 8x8 quantizer matrix. Finally, the quantized DCT coefficients are variable length encoded, and the resulting video stream is output.

The amount of compression resulting from coding depends on the bit-rate desired. In MPEG video coding, compression is achieved by quantizing the blocks of DCT coefficients. Except for the DC term of an intra-coded block, the quantization step-size used in quantizing a DCT coefficient is determined by both the quantizer scale and the corresponding weighting factor in the quantizer matrix. The quantization step-size of the DC term of an intra-coded block depends on the coding parameter intra_dc_precision, which is specified by the user. Since the weighting factors of a quantizer matrix are seldom changed during the coding process, bit-rate control is accomplished through changing the quantizer scale value. Two levels of bit-rate control, global and local, are performed in MPEG video coding.
Global bit-rate control assigns a target number of bits to each picture within a GOP. Based on the target number of bits for the picture and other information, a quantizer scale is determined. In local bit-rate control, this quantizer scale is refined for each macroblock in the picture so that the resulting number of bits used in coding the picture closely matches the target.

Figure 2.2: Block diagram of MPEG video encoding. (Blocks shown: video source and parameter list; sequence layer, GOP layer, and picture layer information gathering; division of the picture into macroblocks and slice layer information gathering; motion compensation prediction; discrete cosine transform; quantization; zigzag or alternate scan; variable length coding; inverse discrete cosine transform; frame store memory; coded video output.)

Two classes of video streams, CBR and VBR, can be generated by MPEG video coding. For a CBR video stream, the bit-rate is regulated by assigning the same number of bits to each GOP, regardless of the GOP's complexity measure, i.e., activity level. If the video stream is given a high bit-rate, some of the assigned bits are wasted when the pictures in the GOP are relatively less complex. On the other hand, if the video stream is given a low bit-rate, the pictures that are more complex suffer from poor picture quality because they have insufficient bits. VBR coding solves the inefficiency in bandwidth usage as well as the inconsistency in picture quality by assigning to each picture or each GOP the number of bits it requires to achieve acceptable quality. That is, high-complexity pictures are usually allocated more bits than low-complexity pictures. For the same level of QoS, the resulting VBR video stream has a lower average bit-rate requirement and more consistent picture quality than its CBR counterpart.
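The contrast between the two classes can be illustrated with a small numerical sketch. The proportional rule used for the VBR case below is an assumption chosen for illustration only, and the complexity values and bit budget are made up; the text does not prescribe this particular allocation rule.

```python
def cbr_gop_targets(total_bits, gop_complexities):
    """CBR: every GOP receives the same number of bits, whatever its complexity."""
    n = len(gop_complexities)
    return [total_bits / n] * n

def vbr_gop_targets(total_bits, gop_complexities):
    """VBR (illustrative assumption): bits are shared in proportion to complexity."""
    total_complexity = sum(gop_complexities)
    return [total_bits * c / total_complexity for c in gop_complexities]

def bits_per_unit_complexity(targets, gop_complexities):
    """A rough proxy for quality consistency: equal values mean even treatment."""
    return [t / c for t, c in zip(targets, gop_complexities)]

# Hypothetical GOP complexities and total bit budget for four GOPs.
complexities = [1.0, 4.0, 1.0, 2.0]
total_bits = 8_000_000

cbr = cbr_gop_targets(total_bits, complexities)
vbr = vbr_gop_targets(total_bits, complexities)
cbr_spread = bits_per_unit_complexity(cbr, complexities)
vbr_spread = bits_per_unit_complexity(vbr, complexities)
```

In this toy setting, the CBR assignment gives the simple GOPs four times as many bits per unit of complexity as the most complex GOP, while the proportional assignment treats all GOPs evenly for the same total budget.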
With a higher channel bandwidth, several video streams can be coded and multiplexed together for transmission. Each video stream uses up only a portion of the available channel bandwidth. In this situation, there usually exists an external mechanism that regulates the bit assignment for each video stream. The bit allocation is usually based on the characteristics of the video sources as well as the network conditions.

2.2 Related Works

Bit allocation is the strategy used by the existing MPEG-2 encoding algorithms for providing consistent picture quality. It is the process of determining a desired number of bits for a GOP and/or for a picture within a GOP. The desired number of bits for a GOP and for a picture is called a GOP target and a picture target, respectively [21]. The aim of bit allocation is to achieve picture quality consistency across all the pictures in a GOP. Such consistency can be obtained if each GOP target and each picture target reflect the activity and complexity level of the corresponding GOP and picture. In CBR coding, the number of bits per GOP is fixed. The existing algorithms try to distribute this fixed number of bits to the pictures within the GOP in such a way that each picture target reflects the activity level of the picture and the resulting picture quality is consistent. Unfortunately, an optimum distribution cannot be easily found since the bit-rate of a CBR video stream is tightly controlled. On the other hand, the number of bits required per GOP is not fixed in variable bit-rate coding. In the absence of a bit-rate constraint, the existing algorithms can easily determine an appropriate picture target for each picture. Two steps are taken in the bit distribution / bit allocation process. The first step is the determination of the complexity level of a picture. The other is the mapping of this complexity level to a picture target.
2.2.1 Video Coding of a Single Source

Test Model 5 (TM5) [4], which originated from the MPEG committee, defines picture complexity as the product of the average picture quantization factor and the number of bits used to encode the picture. Since bit allocation takes place before the actual encoding of the picture, both the average picture quantization factor and the number of bits used to encode the picture are estimated from the most recently encoded picture of the same picture coding type. The model also defines the picture target to be proportional to both the number of bits available for a GOP and the ratio of the picture complexity to a weighted sum of all picture complexities of the GOP.

In [5], Viscito and Gonzales define a coding difficulty factor for determining picture targets. The coding difficulty factor of a picture is the sum of the "mean absolute differences from DC" of the intra-coded macroblocks in the picture and the mean absolute prediction differences of the inter-coded macroblocks in the picture. The mean absolute difference from DC for an intra-coded macroblock is given by

\[
\Delta(r,c) = \frac{1}{4}\sum_{k=0}^{3} \Delta_k(r,c), \qquad
\Delta_k(r,c) = \frac{1}{64}\sum_{i=0}^{7}\sum_{j=0}^{7} \left| y_k(i,j) - dc_k \right| \tag{2.1}
\]

where r and c are the horizontal and vertical offsets counting from the top left corner of a macroblock; k is the index of a luminance block in a macroblock; \Delta_k is the mean absolute difference from DC of an intra-coded block; y_k(i,j) is the luminance value of the intra-coded block; and dc_k is the DC value of the intra-coded block. The mean absolute prediction difference for an inter-coded macroblock is defined as

\[
\mathrm{mad}(r,c) = \frac{1}{4}\sum_{k=0}^{3} \mathrm{mad}_k(r,c), \qquad
\mathrm{mad}_k(r,c) = \frac{1}{64}\sum_{i=0}^{7}\sum_{j=0}^{7} \left| e_k(i,j) \right| \tag{2.2}
\]

where e_k(i,j) is the prediction error and mad_k(r,c) is the mean absolute prediction difference of a block.
Finally, the coding difficulty factor of a picture is determined as

$$D = \sum_{\text{macroblock} \in \{\text{intra-coded}\}} \Delta(r,c) \; + \sum_{\text{macroblock} \in \{\text{inter-coded}\}} mad(r,c) \tag{2.3}$$

The picture targets for I-, P-, and B-pictures are determined by simultaneously satisfying the following three equations:

$$C_{GOP} = C_I + n_P C_P + n_B C_B, \qquad C_P = \frac{D_P - E_D}{D_I - E_D}\, C_I, \qquad C_B = w_B\, \frac{D_B - E'_D}{D_I - E'_D}\, C_I \tag{2.4}$$

where $D_I$, $D_P$, and $D_B$ are the coding difficulty factors of I-, P-, and B-pictures respectively; $n_P$ and $n_B$ are the number of P- and B-pictures respectively; $C_{GOP}$ is the given GOP target; $C_I$, $C_P$, and $C_B$ are the picture targets for I-, P-, and B-pictures; $E_D$ and $E'_D$ are the average mean absolute errors of the past and future decoded pictures to which the P- and B-pictures are referenced; and $w_B$ is a weighting factor. Since $D_P$ and $D_B$ are unknown when the picture targets are being computed, they are estimated from previously encoded P- and B-pictures.

In [6], the picture complexity is defined as the average variance of all 8x8 luminance blocks in a picture. To determine picture targets, the algorithm first separates the video sequence into segments of different scenes and classifies the scenes according to the picture complexity of the I-, P-, and B-pictures in the scene. Picture targets are then obtained using pre-computed experimental I-, P-, and B-picture bit counts for each class.

Instead of using a picture complexity measure, [7] uses a macroblock activity measure for picture bit allocation. The macroblock activity measure is defined as the average of the quality bit-count ratios over the quantizer scale index range 1, 2, ..., 31. For each quantizer scale, the quality bit-count ratio is determined from the bit count of the original macroblock and the bit count of the macroblock encoded with this quantizer scale value. A high ratio indicates that the macroblock is easy to encode and thus requires fewer bits.
[8] uses a feedback re-encoding method to determine picture complexities and picture targets. For each picture, two encodings are performed. During the first encoding, the picture complexity of a picture is estimated from previous I-, P-, or B-pictures in the same way as in TM5 [4], and a picture target is determined from the estimated picture complexity. The picture is then encoded. The corresponding average quantization factor Q and the number of bits R used in encoding the picture are obtained. The picture complexity is then updated, and the picture target is re-determined using the updated picture complexity.

Another re-encoding method is found in [9]. The algorithm defines macroblock complexity as the number of bits needed to encode a macroblock using a quantizer scale q. The picture complexity of a picture is defined as the sum of the macroblock complexities of all the macroblocks in the picture. The algorithm first encodes a picture using a single quantizer scale value. Using the resulting bit count of each macroblock, the picture complexity is determined. The algorithm then distributes a given GOP target to the I-, P-, and B-pictures in the GOP using two bit allocation ratios. The bit allocation ratios are derived from the picture complexities of the most recent I-, P-, and B-pictures.

All of the above bit allocation techniques address the picture quality consistency issue related to the coding of a single video source. The techniques described in [4], [5], and [6] use statistics from previously encoded pictures or from previously encoded training sequences to make assumptions about the current pictures. Past statistics are usually not good representatives of the activity and complexity levels of current pictures because scene changes occur rather quickly within a video sequence. Both [7] and [8] use re-encoding to overcome the problem of relying on past statistics.
By encoding a picture or a sequence more than once, more accurate complexity measures are obtained. However, re-encoding requires extra computation. It also introduces a long delay before the transmission of the video sequence.

2.2.2 Video Coding of Multiple Sources

Two classes of techniques, statistical multiplexing and joint bit-rate control, can be used in handling bit allocation for multiple program sources. The aim of these techniques is to assign an appropriate bit-rate to each video source in the multi-program environment so that its picture quality remains consistent. Generally, this goal is accomplished by encoding the video streams using VBR. Although both techniques support VBR compression in a constant bit-rate medium and make use of this knowledge in performing bit allocation, their bit allocation strategies are very different from each other.

2.2.2.1 Statistical Multiplexing

Statistical multiplexing is usually associated with packet switching or cell switching networks such as an ATM network. It finds applications in Direct Satellite Broadcast (DSB) more often than in terrestrial broadcasting since satellite transmission employs the ATM network protocols. In statistical multiplexing, VBR data from each source are split into fixed-size segments or "cells", and the cells are placed in a buffer. Immediately before transmission, the cells from different sources are extracted from the buffer, randomly interleaved, and multiplexed.

Three factors are found to have the greatest impact on picture quality when video streams are transmitted via an ATM network [10, 11]. They are the number of lost cells, the number of pixels in an impaired region and its shape, and the burstiness of the loss. In statistical multiplexing, the aim of bandwidth allocation is to minimize the probability of cell losses.
In an ATM network, the probability of cell loss can be minimized if the network is informed about the behaviour that can be anticipated from each individual source. Therefore, video traffic characteristics are modeled before the actual bandwidth allocation. After a traffic model is chosen, features or model parameters are extracted from each video source. Based on the values of these parameters, the required bandwidth is estimated.

For example, [12] uses a Markov chain to model video traffic. The mean number of cells generated per frame, μ, and the standard deviation of the number of cells generated per frame, σ, are the two parameters used in bandwidth estimation. A statistical model, found in [13], uses the average bit-rate, the bit-rate variance, and the peak bit-rate of a video source as parameters to characterize the video source. Using simulation results, [14] shows that a VBR video stream can be characterized using statistical measures such as the marginal distributions and the peak-to-average ratio of the bit-rates. A parametric model proposed in [15] uses nine fundamental indexes to represent a VBR video source: the average intensity level of each picture, the variance of the intensity levels in each picture, the entropy of the pixel values in each picture, the vertical entropy of the pixel values in each picture, the horizontal entropy of the pixel values in each picture, the pixel value difference between consecutive pictures, the motion index of each picture, the temporal entropy of each picture, and the temporal vertical entropy of each picture.

Statistical multiplexing is based on the "law of large numbers" [16]. It is very effective when the number of video sources to be multiplexed is large. Although statistical multiplexing has its merits, it also has several pitfalls.
Because statistical multiplexing is subject to packet loss, the channel cannot be used to its full capacity [17]. If a source provides too much data, causing the buffer to overflow, packets will be lost. Similarly, if data is queued in the buffer for too long, causing it to arrive late at the decoder, the data will also be considered lost. Thus, the performance of statistical multiplexing depends greatly on the statistical model used. Any deviation of the actual data from the model can have catastrophic effects on performance.

2.2.2.2 Joint Bit-rate Control

Joint bit-rate control is a multi-program rate control technique that can be used in various types of applications such as terrestrial broadcasting, satellite broadcasting, cable transmission, or even ATM transmission. Unlike statistical multiplexing, this technique is not associated with any particular type of network. The idea of joint bit-rate control is to allow the bit-rate of each individual video program to vary according to some video characteristic, such as the picture complexity, while the sum of all bit-rates remains constant. In this technique, each video stream is allocated a portion of the channel bandwidth at every instant. The aim of joint bit-rate control is to perform bit allocation in such a way that the picture quality of each video stream remains relatively consistent. Since video streams under joint bit-rate control are not subject to packet loss, the full channel capacity can be used.

Almost all of the existing joint bit-rate control techniques distribute the channel bandwidth according to some relative parametric measures of the video streams. For example, the technique in [18] determines a picture target for each video stream by defining picture complexity to be the same as in TM5 [4] and by using the same quantization scale Q for all video programs at each picture. The picture target of each video stream depends upon the ratio of the stream's picture complexity to the sum of the picture complexities of all video streams, i.e.,

$$R_i(k) \propto \frac{X_i(k)}{\sum_{j} X_j(k)}$$

Figure 3.8: PSNR-Q and R-Q curves of GARDEN. [Panels: (e) picture quality vs. Q for B-pictures; (f) bit counts vs. Q for B-pictures.]

Figure 3.9 and 3.10 compare the picture quality of two video streams (MICROSOFT ROBOT and GARDEN) that have been encoded with fixed quantizer scales 7, 8, and 10 and transcoded to the same variable bit-rates. Both video streams are 2 seconds in length and have a picture size of 720x480 pixels. The video streams have been encoded at 30 frames/sec, with a GOP size of 12 and a GOP pattern of IBBPBBPBBPBB. The variable bit-rate assignments are obtained by putting the video streams through our two-stage joint bit-rate coding system. In Figure 3.9 and 3.10, the rectangles outline the areas where artifacts cannot be seen when the video streams are quantized with a quantizer scale of 7, but become visible when the video streams are quantized with larger quantizer scales.

Figure 3.9: Picture quality comparisons among MICROSOFT ROBOT encoded at fixed quantizer scales 7, 8, and 10. [Panels (a) to (l): frames encoded at fixed quantizer scales 7, 8, and 10.]
Figure 3.10: Picture quality comparisons among GARDEN encoded at fixed quantizer scales 7, 8, and 10.

3.2.3 Complexity Files

A complexity file is a record of the picture and GOP complexities. Picture complexity is defined as the number of bits used to encode a picture during the first stage of our process. The picture bit counts from this stage reflect the complexity of the picture content because a single quantizer scale value is applied to the entire video stream. Therefore, if a picture is highly active, its spatial transform will have more non-zero coefficients than that of a picture with a low activity level, and thus more bits are needed to encode the picture. GOP complexity is defined as the sum of the picture complexities in a GOP, which is also the number of bits used to encode all pictures of a GOP. GOP complexities are used in joint bit-rate control to determine GOP targets. Picture complexities are used in the transcoding process to determine the picture bit distributions within a GOP.

Besides picture complexities and GOP complexities, the following parameters are also included in a complexity file:

N: the number of frames within a GOP;
M: (M-1) is the number of consecutive B frames between an I frame and the first P frame following it (or between two consecutive P frames);
frame_rate_code: a value that represents the frame rate used in the video stream;
max_rate: a user-defined bit-rate that represents the bit-rate needed by the video stream while the video content is most active;
min_rate: a user-defined bit-rate that represents the bit-rate needed by the video stream while the video content is least active;
number_of_frames: the number of frames the video stream comprises.

3.3 Joint Bit-Rate Controller

The second stage of our encoding system consists of a joint bit-rate controller that oversees the bit allocation operations of the video streams to be multiplexed in a single channel. This controller receives the encoded video streams and their corresponding complexity files, which were generated during the first encoding stage. Based on this information and the given channel bandwidth, the controller determines a GOP target for each video source on a GOP basis and sends these GOP targets to the set of transcoders. The controller assumes that all video streams have the same N, M, and frame_rate, or at least the same number of GOPs per second (#ofGOPs/sec = frame_rate / N). For a given bandwidth, the aim of our joint bit-rate controller is to offer appropriate bit allocation and consistent picture quality for each GOP of the video streams.

There are two functions performed by the joint bit-rate controller:
1. an admission test to accept or reject each video requesting to be transmitted and
2. a processing procedure for each GOP to determine the appropriate GOP target for each video source.
Figure 3.11 illustrates the dataflow diagram of the joint bit-rate control process.

3.3.1 Admission Test

The purpose of the admission test is to reject any new video stream requesting to be transmitted if its addition to the system would degrade the quality of the presently transmitted video programs to an unacceptable level. Our admission test uses the following simple procedure. It sums the min_rate of all streams currently present in the system and the min_rate of the new stream. If the sum is greater than the channel rate, the new video is not accepted for transmission; otherwise, the new stream is added to the system.
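The complexity file contents and the admission rule described above can be sketched as follows. This is a minimal illustration rather than the thesis implementation: the `ComplexityFile` class, its field layout, and the `admit` function are hypothetical names, and the file is assumed to store one bit count per picture in coding order.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ComplexityFile:
    """In-memory view of one stream's complexity file (assumed layout)."""
    n: int                  # N: number of frames within a GOP
    m: int                  # M: (M-1) consecutive B frames between anchors
    frame_rate_code: int    # value representing the frame rate
    max_rate: float         # bits/sec needed when content is most active
    min_rate: float         # bits/sec needed when content is least active
    number_of_frames: int
    picture_complexities: List[int] = field(default_factory=list)

    def gop_complexities(self) -> List[int]:
        """GOP complexity = sum of the picture complexities in each GOP."""
        pics = self.picture_complexities
        return [sum(pics[i:i + self.n]) for i in range(0, len(pics), self.n)]


def admit(current: List[ComplexityFile], new: ComplexityFile,
          channel_rate: float) -> bool:
    """Accept the new stream unless the summed min_rate exceeds the channel."""
    total_min = sum(s.min_rate for s in current) + new.min_rate
    return total_min <= channel_rate
```

Because the rule only compares the summed `min_rate` values against the channel rate, a stream is rejected exactly when even the least demanding segments of all streams could not fit in the channel together.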
If a more sophisticated admission test is required, this module can be easily replaced or modified without affecting the operation of the rest of the system.

Figure 3.11: Dataflow diagram of the joint bit-rate control process.

3.3.2 Processing Procedure for a Group of Pictures

For every GOP, the joint bit-rate controller performs the following five steps:
1. it computes the number of bits required to transmit the GOPs of all video streams, which we call aggregrate_GOP_target;
2. it reads the GOP complexity of the current GOP of each individual video stream;
3. it determines an initial GOP target for each video stream based on the square root of its complexity relative to the sum of the square roots of the complexities of all video streams;
4. it adjusts the value of the initial GOP target of each video stream based on the user-defined max_rate and min_rate;
5. it sends to a set of transcoders the adjusted GOP target at which each corresponding video stream must be transcoded.

While the admission test ensures that each video stream present in the system has at least the minimum bandwidth it requires, the computation of the aggregrate_GOP_target ensures that all video streams together do not use more than the necessary bandwidth. This means that the video streams do not always consume the entire channel bandwidth. The free bandwidth could be used to provide other services such as Internet and long-distance telephone.
The aggregrate_GOP_target is determined as

$$\text{necessary\_bandwidth} = \min\left(\text{channel\_bandwidth},\ \sum_{i=1}^{N_p} \text{max\_rate}_i\right), \qquad \text{aggregrate\_GOP\_target} = \frac{\text{necessary\_bandwidth}}{\#\text{ofGOPs}/\text{sec}} \tag{3.3}$$

where $N_p$ is the number of video streams present in the system; #ofGOPs/sec specifies the number of GOPs transmitted per second; and the sum of the max_rates indicates the maximum rate needed to transmit the current GOPs of all the video streams, assuming each of the current GOPs comprises the most active segment of its video stream. By using the minimum of this sum and the channel bandwidth, we eliminate the possibility of over-assigning bandwidth to the video streams.

The joint bit-rate controller allocates a fraction of the aggregrate_GOP_target to the current GOP of each video stream. The initial fraction assigned to each video stream is proportional to the relative GOP complexity of the video streams. That is,

$$\text{GOP\_target}_i = \frac{\sqrt{\text{GOP\_Complexity}_i}}{\sum_{k=1}^{N_p} \sqrt{\text{GOP\_Complexity}_k}} \times \text{aggregrate\_GOP\_target} \tag{3.4}$$

The square root function is applied to the GOP complexities because it compresses the complexity ratio between the high-complexity streams and the low-complexity streams into a reasonable range of values. When the complexities of some video streams reach extremely high levels, using the ratio $\text{GOP\_Complexity}_i / \sum_{k=1}^{N_p} \text{GOP\_Complexity}_k$ results in allocating a large proportion of the aggregrate_GOP_target to the high-complexity streams, leaving an insufficient number of bits for the low-complexity streams. Performance evaluations have shown that assigning GOP targets according to $\sqrt{\text{GOP\_Complexity}_i} / \sum_{k=1}^{N_p} \sqrt{\text{GOP\_Complexity}_k}$ improves the bit allocation between different sources.

A special case for the joint bit-rate control algorithm arises when there is only one video stream present in the system. In this situation, the joint bit-rate controller assigns the same GOP target to all GOPs. As a consequence, the resulting video stream after transcoding is a CBR video stream.

In some cases, the assigned GOP target may go beyond the max_target (= max_rate / (#ofGOPs/sec)) or below the min_target (= min_rate / (#ofGOPs/sec)) derived from the user-defined max_rate and min_rate. This is especially true when some video streams present in the system have extremely high GOP complexities while others have extremely low GOP complexities. The aggregrate_GOP_target is determined using the max_rate of each video stream and distributed among the video streams according to their present GOP complexities. If low-complexity segments of the video streams are encountered, the controller will assign to those video streams GOP targets lower than their corresponding max_target. The video streams currently having high-complexity segments, on the other hand, will be given GOP targets above their respective max_target. Therefore, some GOP target adjustment is necessary. Figure 3.12 shows the flow diagram of the GOP target adjustment algorithm we applied after the initial GOP target assignment.
Figure 3.12: Flow diagram of GOP target adjustment algorithm.

Basically, the GOP target adjustment algorithm compares the initial GOP target of each video with the video's min_target and max_target. One of three cases arises as the result of the comparison:
1. The initial GOP target is below min_target. The algorithm records the difference between min_target and the initial GOP target and adds the difference to an accumulator, AccBelow. The difference between min_target and the initial GOP target is the additional number of bits that the video stream must have for this GOP in order to achieve the minimum acceptable video quality.
2. The initial GOP target is greater than max_target. The algorithm records the difference between max_target and the initial GOP target and adds the difference to an accumulator, AccAbove. The difference between max_target and the initial GOP target represents the number of allocated bits that can be taken away from the GOP while still meeting the user's requirement.
3. The initial GOP target falls within min_target and max_target. The algorithm adds the GOP complexity of this GOP to an accumulator, Total_Complexity_In_Range, for later use.

Next, the bit adjustment algorithm compares AccBelow with AccAbove.
One of three cases arises:
1. AccAbove is greater than AccBelow and AccBelow is not zero. That is, the extra bits obtained from the video streams that have initial GOP targets greater than their max_targets can fully compensate for the extra bits required by the video streams that have initial GOP targets less than their min_targets. Thus, the algorithm distributes the extra bits among the videos with initial GOP targets below min_target. The extra bits are distributed as follows:

$$\text{extra\_bits}_i = \frac{\text{GOP\_Complexity}_i}{\sum_{l=1}^{N_{below}} \text{GOP\_Complexity}_l} \times \text{extra\_bits} \tag{3.5}$$

where $N_{below}$ is the number of videos that have their initial GOP target below their min_target.
2. AccAbove is greater than AccBelow and AccBelow is zero. That is, there are extra bits available, but no streams are starved for bits. In this case, the algorithm distributes the extra bits to the video streams with initial GOP targets falling within the (min_target, max_target) range, improving their video quality. The extra bits are distributed as follows:

$$\text{extra\_bits}_i = \frac{\text{GOP\_Complexity}_i}{\sum_{l=1}^{N_{in\_range}} \text{GOP\_Complexity}_l} \times \text{extra\_bits} \tag{3.6}$$

where $N_{in\_range}$ is the number of videos whose initial GOP target falls within the (min_target, max_target) range.
3. AccAbove is less than AccBelow. The extra bits obtained from the video streams with initial GOP targets greater than their max_targets cannot fully compensate for the shortfall of the video streams with initial GOP targets less than their min_targets. Thus, the algorithm needs to take away bits from the video streams with initial GOP targets in the (min_target, max_target) range. The taken-away bits, combined with the extra bits obtained from the videos with initial GOP targets greater than their max_target, are distributed to the video streams that require more bits.
The taken-away bits plus the extra bits are distributed as follows:

$$\text{extra\_bits}_i = \frac{\text{GOP\_Complexity}_i}{\sum_{l=1}^{N_{below}} \text{GOP\_Complexity}_l} \times (\text{extra\_bits} + \text{taken\_away\_bits}) \tag{3.7}$$

where $N_{below}$ is the number of videos that have their initial GOP target below their min_target.

Figure 3.13 and 3.14 show the results from examining the performance of the joint bit-rate controller. The first set of video sequences consists of GARDEN, BUS, BALLET, and MICROSOFT ROBOT. Each of these four sequences is 2 seconds in length and has a 720x480 resolution. The second set of video sequences consists of four segments from the trailer of MAN IN THE IRON MASK. Each segment is 10 seconds in length and has a 720x480 resolution. All video sequences from the two sets are encoded at 30 frames/sec with a fixed quantizer scale of 6, a GOP size of 12, and a GOP pattern of IBBPBBPBBPBB. Figure 3.13(a) and 3.14(a) illustrate the GOP complexity of each video stream. Figure 3.13(b) and 3.14(b) trace the GOP targets assigned to the two sets of video test sequences. Figure 3.13(c) and 3.14(c) show the sum of the GOP targets from each bit allocation decision. We observe that the GOP target assignments of each video stream closely match the stream's GOP complexities. In addition, each individual video sequence is assigned variable GOP targets, while the sum of the assigned GOP targets, the aggregrate_GOP_target, is constant at each instant.
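The GOP target adjustment of Figure 3.12 and Equations (3.5) to (3.7) can be sketched as follows. Several details are assumptions of this sketch rather than statements from the text: streams above max_target are clamped to max_target, the redistributed bits are added to (or subtracted from) the initial targets, the case AccAbove equal to AccBelow is handled like case 1, and GOP complexities are taken to be positive. The function name is hypothetical.

```python
from typing import List


def adjust_gop_targets(initial: List[float], complexity: List[float],
                       min_target: List[float],
                       max_target: List[float]) -> List[float]:
    """Redistribute bits so that out-of-range initial GOP targets are corrected."""
    n = len(initial)
    below = [i for i in range(n) if initial[i] < min_target[i]]
    above = [i for i in range(n) if initial[i] > max_target[i]]
    in_range = [i for i in range(n)
                if min_target[i] <= initial[i] <= max_target[i]]

    acc_below = sum(min_target[i] - initial[i] for i in below)   # bits missing
    acc_above = sum(initial[i] - max_target[i] for i in above)   # bits to spare

    adjusted = list(initial)
    for i in above:
        adjusted[i] = max_target[i]  # assumption: clamp to max_target

    def share(indices: List[int], bits: float) -> None:
        # Split `bits` among `indices` in proportion to GOP complexity.
        total = sum(complexity[i] for i in indices)
        if total > 0:
            for i in indices:
                adjusted[i] += complexity[i] / total * bits

    if acc_above >= acc_below:
        if below:
            share(below, acc_above)        # case 1, Equation (3.5)
        else:
            share(in_range, acc_above)     # case 2, Equation (3.6)
    else:
        in_range_total = sum(complexity[i] for i in in_range)
        # Assumption: nothing can be taken away if no stream is in range.
        taken_away = (acc_below - acc_above) if in_range_total > 0 else 0.0
        share(in_range, -taken_away)
        share(below, acc_above + taken_away)  # case 3, Equation (3.7)
    return adjusted
```

Under these assumptions the sum of the adjusted targets equals the sum of the initial targets in all three cases, which is consistent with the constant sum of GOP targets reported for Figure 3.13(c) and 3.14(c).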
Figure 3.13: GOP bit allocation performed by the joint bit-rate controller for MICROSOFT ROBOT, BALLET, BUS, and GARDEN.

Figure 3.14: GOP bit allocation performed by the joint bit-rate controller for SEGMENT 1, SEGMENT 2, SEGMENT 3, and SEGMENT 4 of the MAN IN THE IRON MASK trailer.

3.3.3 Parameters for Optimizing Transcoders' Performance

As shown earlier, our joint bit-rate controller uses the GOP complexities recorded in the complexity files to determine the GOP targets for each video stream. This information is then sent to the transcoders. Each transcoder, upon receiving its corresponding GOP target, distributes it to the pictures in the GOP. The joint bit-rate controller can help the transcoders optimize their performance by providing them with the picture complexities or with picture targets derived from the picture complexities.

3.3.3.1 First Parameter Set: Picture Complexities

Sending picture complexities as additional information to the transcoders is beneficial. It eliminates the need for the transcoder to estimate the picture complexities from previously coded pictures. Instead, the transcoders now have accurate measurements of the complexity of each picture without performing any analysis. By making use of these received picture complexities, the transcoders can determine for each picture a picture target that reflects the picture content.
3.3.3.2 Second Parameter Set: Target Picture Bit-Rates

By using the picture complexities recorded in the complexity files and the GOP target already determined, the joint bit-rate controller can itself perform the picture bit allocation for the transcoders. As a result, the complexity of a transcoder can be reduced and the encoding delay in the transcoding stage can be shortened. The picture target of a picture is given as follows:

$$\text{Picture\_Target}_i = \frac{\text{Picture\_Complexity}_i}{\sum_{k=1}^{N} \text{Picture\_Complexity}_k} \times \text{GOP\_Target} \tag{3.8}$$

where $N$ is the number of pictures in a GOP; Picture_Complexity is the picture complexity of each picture recorded in the complexity file; and GOP_Target is the GOP target determined by the joint bit-rate controller.

3.4 Transcoders

Transcoding is the final step of the joint bit-rate coding process. It is performed immediately before multiplexing and transmission of the video streams. The first half of the transcoding process partially decodes the video stream up to the stage where all DCT coefficients of the macroblocks are obtained. The latter half of the transcoding process re-quantizes the DCT coefficients and puts the video stream back together. Thus, our transcoding process involves variable length decoding, inverse scanning, inverse quantization, re-quantization, forward scanning, and variable length encoding of the incoming video stream. Figure 3.15 shows the block diagram of the transcoding process.

[Figure 3.15 blocks, in order: input, a high-quality VBR video stream retrieved from a video server; variable length decoding; inverse scanning; inverse quantization; quantization; scanning; variable length encoding; output, a VBR video stream whose bit-rate closely matches the bit-rates specified by the joint bit-rate controller.]

Figure 3.15: Block diagram of the transcoding process.

A transcoder, essentially, consists of a cascaded decoder and encoder [20].
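The inverse quantization and re-quantization steps at the centre of this process can be illustrated with a deliberately simplified sketch. It treats quantization as a uniform scalar operation controlled only by the quantizer scale, and it ignores the MPEG-2 quantization matrices, the special handling of intra DC coefficients, and mismatch control, so it is an assumption-laden illustration and not the thesis transcoder. All function names are hypothetical.

```python
from typing import List


def dequantize(levels: List[int], quantizer_scale: int) -> List[int]:
    """Simplified inverse quantization: scale each level back up."""
    return [level * quantizer_scale for level in levels]


def quantize(coefficients: List[int], quantizer_scale: int) -> List[int]:
    """Simplified quantization: round each magnitude to the nearest level."""
    levels = []
    for c in coefficients:
        magnitude = (abs(c) + quantizer_scale // 2) // quantizer_scale
        levels.append(magnitude if c >= 0 else -magnitude)
    return levels


def requantize(levels: List[int], scale_in: int, scale_out: int) -> List[int]:
    """Map levels coded with scale_in to levels coded with scale_out."""
    return quantize(dequantize(levels, scale_in), scale_out)


# Levels coded at quantizer scale 6, re-quantized with a coarser scale of 12.
coarser = requantize([10, -3, 1, 0], 6, 12)
```

A coarser output scale yields smaller level magnitudes, which in general take fewer bits to code after variable length encoding; this is how re-quantization lowers the bit-rate of a stream without repeating motion estimation or the DCT.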
The complexity of a transcoder can range from the most complex, where it comprises a complete decoder and a complete encoder, to the simplest, where it is just a re-quantizer. Our transcoder implementation takes on the simplest approach. We can do so because the objective for our transcoding process is to compress the video stream from a high bit-rate to a lower bit-rate suitable for transmission. That is, no other reformatting such as resampling is involved. Since re-quantization is the sole purpose of our transcoding process, changes on performing DCT, the picture types, carrying a new set of coding decision, or a re-estimation of motion vectors is not required. Therefore, all these information, already obtained during the first stage, can be used. By reusing the set of 61 Chapter 3 Two-Stage Joint Bit-Rate Coding coding decisions and the set of motion vectors, we reduce the transcoder's complexity and the processing delay. 3.4.1 The Transcoding Processing Two functions are performed by our transcoder: 1. an initialization procedure to obtain all the necessary information to be used during transcoding and 2. a GOP processing procedure to transcode the current GOP from the initial high bit-rate to the desired target GOP bit-rate. Figure 3.16 illustrates the dataflow diagram of the transcoding processing. 3.4.1.1 Initialization During initialization, the transcoder 1. receives information such as N , M , and the number of frames in the video stream from the joint bit-rate controller; 2. decodes Sequence Header and all other sequence header extensions to retrieve information about the video; 3. initializes sequence rate control. The initialization of sequence rate control consists of the initial estimation of picture complexity and the initial estimation of virtual buffer fullness for each picture coding type. 3.4.1.2 Transcoding Procedure for a Group of Pictures For every GOP, the transcoder performs the following steps: 1. 
it decodes the GOP Header;
2. it receives from the joint bit-rate controller a GOP target for the current GOP;
3. it initializes GOP rate control;
4. for each picture in the GOP,
   a. it decodes the Picture Header and all other picture extension headers;
   b. it determines the picture target for each picture;
   c. it initializes picture rate control;
   d. it determines the appropriate quantizer scale parameters to re-quantize the picture;
   e. it updates the picture rate control parameters.

[Figure 3.16: Dataflow diagram of the transcoding process. Sequence level: receive N, M, sequence bit-rate, and number_of_frames from the joint bit-rate controller; get and put back the Sequence Header and related information; initialize sequence rate control. GOP level: get and put back the GOP Header and related information; receive the GOP target from the joint bit-rate controller; initialize GOP rate control. Picture level: get and put back the Picture Header and related information; determine the picture target; initialize picture rate control; find appropriate quantizer scales and re-quantize the picture; update picture rate control.]

The GOP rate control initialization records the GOP target received from the joint bit-rate controller. The picture target for the next picture in the GOP is defined as in the TM5 picture target procedure [4]:

\[
\begin{aligned}
T_i &= \max\left\{ \frac{R}{1 + \dfrac{N_p X_p}{X_i K_p} + \dfrac{N_b X_b}{X_i K_b}},\; \frac{bit\_rate}{8 \times frame\_rate} \right\} \\
T_p &= \max\left\{ \frac{R}{N_p + \dfrac{N_b K_p X_b}{K_b X_p}},\; \frac{bit\_rate}{8 \times frame\_rate} \right\} \\
T_b &= \max\left\{ \frac{R}{N_b + \dfrac{N_p K_b X_p}{K_p X_b}},\; \frac{bit\_rate}{8 \times frame\_rate} \right\}
\end{aligned}
\qquad (3.9)
\]

where $K_p$ and $K_b$ are constants with default values of 1.0 and 1.4 respectively; $N_p$ and $N_b$ are the numbers of P- and B-pictures remaining in the current GOP in coding order; $X_i$, $X_p$, and $X_b$ are the estimates of the picture complexity of the next I-, P-, and B-pictures; and R is the remaining number of bits assigned to the GOP.
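As a rough illustration, Equation 3.9 can be written as a small Python function. This is a minimal sketch and not the thesis implementation: the function name `tm5_picture_target`, its argument names, and the dictionary layout used for the complexity estimates are assumptions made for this example. Only the formula itself is taken from Equation 3.9.

```python
def tm5_picture_target(ptype, R, X, N_p, N_b, bit_rate, frame_rate,
                       K_p=1.0, K_b=1.4):
    """Return the bit target for the next picture of type 'I', 'P' or 'B'.

    R          -- bits still available for the current GOP
    X          -- dict with complexity estimates, keys 'I', 'P', 'B'
    N_p, N_b   -- P- and B-pictures remaining in the current GOP
    bit_rate   -- bits per second; frame_rate -- pictures per second
    (All names are hypothetical; they mirror the symbols of Equation 3.9.)
    """
    X_i, X_p, X_b = X["I"], X["P"], X["B"]
    if ptype == "I":
        denom = 1.0 + N_p * X_p / (X_i * K_p) + N_b * X_b / (X_i * K_b)
    elif ptype == "P":
        denom = N_p + N_b * K_p * X_b / (K_b * X_p)
    elif ptype == "B":
        denom = N_b + N_p * K_b * X_p / (K_p * X_b)
    else:
        raise ValueError("ptype must be 'I', 'P' or 'B'")
    # Lower bound of Equation 3.9: never go below bit_rate / (8 * frame_rate).
    floor = bit_rate / (8.0 * frame_rate)
    return max(R / denom, floor)


# Illustrative values only: equal complexities and K_p = K_b = 1 make the
# I-picture target an even share of R over the 1 + 3 + 8 pictures.
X = {"I": 1.0, "P": 1.0, "B": 1.0}
t_i = tm5_picture_target("I", 1200.0, X, 3, 8, 240.0, 30.0, K_p=1.0, K_b=1.0)
```

Computing only the target for the requested picture type avoids evaluating a denominator that may be zero when no P- or B-pictures remain in the GOP.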
The picture complexity estimates, $X_i$, $X_p$, and $X_b$, are defined as follows [4]:

\[
X_i = S_i Q_i, \qquad X_p = S_p Q_p, \qquad X_b = S_b Q_b \qquad (3.10)
\]

where $S_i$, $S_p$, and $S_b$ are the numbers of bits used to code the previous I-, P-, and B-pictures, and $Q_i$, $Q_p$, and $Q_b$ are the average quantizer scale parameters used to encode all macroblocks of the previous I-, P-, and B-pictures.

During the initialization of picture rate control, the initial quantizer scale parameter for the picture is estimated. This quantizer scale parameter is then refined for each macroblock within the picture. The transcoder uses this refined quantizer scale parameter to re-quantize the macroblock. After the entire picture is re-quantized, the remaining number of bits, R, is updated.

3.4.2 Picture Bit-Rate Distribution

In Section 3.3.3, we introduced two sets of parameters, the picture complexities and the picture targets, which the joint bit-rate controller can send to the transcoder to help optimize picture bit distribution. In this section, we discuss the changes that must be made to the transcoder in order to take advantage of these two parameter sets.

3.4.2.1 Transcoding with Given Picture Complexities

The first parameter set consists of picture complexities. As discussed in Section 3.3.3.1, for every GOP in the video stream the joint bit-rate controller sends to the transcoder a GOP target for the current GOP and the picture complexities of the pictures in that GOP. The transcoder thus receives these two pieces of information from the joint bit-rate controller at the beginning of each GOP transcoding procedure. Step 4 of the transcoding procedure discussed in Section 3.4.1.2 becomes
4. for each picture in the GOP,
   a. it decodes the Picture Header and all other picture extension headers;
   b. it determines the picture target for each picture using the picture complexities received;
   c. it initializes picture rate control;
   d.
it determines the appropriate quantizer scale parameters to re-quantize the picture;
   e. it updates the picture rate control parameters.

The picture target for each picture coding type is also determined according to Equation 3.9. However, instead of using Equation 3.10 to estimate picture complexities, the received picture complexities are used.

3.4.2.2 Transcoding with Given Picture Bits Distribution

As discussed in Section 3.3.3.2, the second set of parameters is the picture target. In this case, the transcoder also receives two pieces of information from the joint bit-rate controller at the beginning of each GOP transcoding procedure:
1. the GOP target for the current GOP;
2. a set of picture targets for the pictures within this GOP.
The picture targets are computed using Equation 3.8. Therefore, step 4 of the transcoding procedure becomes
4. for each picture in the GOP,
   a. it decodes the Picture Header and all other picture extension headers;
   b. it initializes picture rate control;
   c. it determines the appropriate quantizer scale parameters to re-quantize the picture;
   d. it updates the picture rate control parameters.

Figure 3.17 and Table 3.2 compare the picture quality of the segments from the MAN IN THE IRON MASK trailer that were transcoded using GOP target information only with the quality of those transcoded using GOP targets as well as picture complexities. The variable bit-rate assigned to each video segment is determined by the joint bit-rate controller. Table 3.3 summarizes the bit-rates assigned to the video segments. From Table 3.2, we observe an average improvement of 0.2 dB in picture quality resulting from the use of picture complexities.

Figure 3.18 and Table 3.2 compare the picture quality of the MAN IN THE IRON MASK trailer segments transcoded using GOP targets only with those transcoded using both GOP targets and picture targets.
The same bit-rates shown in Table 3.3 are assigned to the video segments. The results from Table 3.2 show that the addition of picture targets gives an average improvement of 0.9 dB in picture quality.

Segment    Coding algorithm                                   Avg. PSNR (dB)  Std. Dev. PSNR (dB)  Max. PSNR (dB)  Min. PSNR (dB)
Segment 1  Joint using GOP targets only                       50.97           9.90                 71.60           36.10
Segment 1  Joint using GOP targets and picture complexities   50.94           9.54                 71.60           37.20
Segment 1  Joint using GOP and picture targets                51.79           9.37                 71.70           40.70
Segment 2  Joint using GOP targets only                       50.88           11.00                71.60           37.80
Segment 2  Joint using GOP targets and picture complexities   51.35           11.00                71.60           39.20
Segment 2  Joint using GOP and picture targets                52.17           10.30                71.70           39.40
Segment 3  Joint using GOP targets only                       43.16           5.76                 54.10           33.80
Segment 3  Joint using GOP targets and picture complexities   43.46           5.59                 64.40           32.80
Segment 3  Joint using GOP and picture targets                44.24           5.68                 65.40           35.50
Segment 4  Joint using GOP targets only                       47.23           9.54                 71.60           36.90
Segment 4  Joint using GOP targets and picture complexities   47.29           9.59                 71.60           36.90
Segment 4  Joint using GOP and picture targets                47.62           9.31                 71.60           37.80

Table 3.2: Comparison of PSNR values for video streams encoded with joint bit-rate coding using GOP targets only, with joint bit-rate coding using GOP targets and picture complexities, and with joint bit-rate coding using GOP and picture targets.

Segment    Average Bit-Rate (Mbps)  Maximum Bit-Rate (Mbps)  Minimum Bit-Rate (Mbps)
Segment 1  3.37                     4.65                     2.56
Segment 2  3.56                     4.94                     2.53
Segment 3  4.80                     5.99                     3.71
Segment 4  4.26                     5.81                     2.53

Table 3.3: Bit-rates of the four MAN IN THE IRON MASK video segments assigned by the joint bit-rate controller.
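The picture targets referred to in Table 3.2 follow Equation 3.8, which splits a GOP target among the pictures of the GOP in proportion to their recorded complexities. The following is a minimal sketch under stated assumptions: the function name `picture_targets` and the sample complexity values are hypothetical and are not taken from the thesis or its complexity files.

```python
def picture_targets(picture_complexities, gop_target):
    """Split gop_target (bits) among pictures as in Equation 3.8.

    picture_complexities -- one complexity value per picture in the GOP
    gop_target           -- bits assigned to the whole GOP
    """
    total = sum(picture_complexities)
    if total <= 0:
        raise ValueError("total picture complexity must be positive")
    # Each picture receives a share proportional to its complexity.
    return [c / total * gop_target for c in picture_complexities]


# Made-up complexities for a short GOP (one I-, one P-, two B-pictures).
example_targets = picture_targets([6.0, 3.0, 1.5, 1.5], 1_200_000)
```

Because the shares are normalized by the total complexity, the targets add up to the GOP target, so the transcoder does not need to re-estimate complexities itself.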
[Figure 3.17: The picture quality of the MAN IN THE IRON MASK trailer segments transcoded with GOP targets only and the picture quality of the same segments transcoded with GOP targets and picture complexities. Panels: (a) Segment 1, (b) Segment 2, (c) Segment 3, (d) Segment 4. Horizontal axis: frame number in decoding order.]

[Figure 3.18: The picture quality of the MAN IN THE IRON MASK trailer segments transcoded with GOP targets only and the picture quality of the same segments transcoded with GOP and picture targets. Panels: (a) Segment 1, (b) Segment 2, (c) Segment 3, (d) Segment 4. Horizontal axis: frame number in decoding order.]

3.5 Summary

In this chapter, we presented our two-stage joint bit-rate coding system that is intended for the video coding of multiple video sources for transmission in a constant bit-rate medium.
The system focuses on providing consistent picture quality to each video stream present in the system and on allocating to each video stream an appropriate portion of the given channel bandwidth, which depends on the picture complexities. In order to achieve these goals, we use a two-stage coding approach. The first stage analyzes the activity level of the pictures in each video stream and records the data in complexity files. It also facilitates the processing of the second stage by performing motion compensation predictions on the video sources. By using the results from the analyses performed during the first stage and the knowledge of the available channel bandwidth, the joint bit-rate controller determines the number of bits needed to encode a GOP of each video stream and sends these GOP targets to a set of transcoders. The transcoders carry out the bit allocation decisions by re-quantizing their corresponding video streams. As an enhancement to the system, the joint bit-rate controller also sends a set of picture complexities or a set of picture targets to each transcoder, facilitating the picture bit allocation process during the transcoding stage. In the simulation results discussed in the next chapter, we show that our two-stage joint bit-rate coding system significantly reduces the fluctuations in picture quality of all video streams. We also show that the bit allocation decisions performed by our system reflect the complexities of the video streams.

Chapter 4 Simulation Results and Discussions

In the previous chapter, we presented our two-stage joint bit-rate coding system for coding multiple video programs simultaneously. The system is designed to provide more efficient usage of the available bandwidth. It is also designed to offer each video stream more consistent picture quality than that obtained by coding each video stream individually at a constant bit-rate.
By using the two-stage approach, the system also reduces the coding delay introduced immediately before the multiplexing and transmission of the video streams. To test the performance of our system, we compare the bandwidth consumption of each individual video stream encoded using our coding system with the bandwidth consumption of the same video stream encoded using the Test Model 5 [4] algorithm. To illustrate the picture quality consistency provided by our system, we compare the standard deviations of the picture quality (PSNR) of the reconstructed images obtained from our coding system with the standard deviations of the picture quality of the images obtained by independent CBR coding. We also compare the CPU time used to encode a video stream using our system with that used by the TM5 algorithm [4].

In this chapter, we present the simulation results of the tests. In Section 4.1, we describe the setup of the tests. In Section 4.2, we show the bit-rates as well as the PSNR standard deviations of the test sequences obtained by using our system and by independent CBR coding. In Section 4.3, we present the results from the timing analysis.

4.1 Setup

Two sets of simulations were carried out using our joint bit-rate coding system. The first set involves six video sequences at a total bit-rate of 20 Mbps. The video sequences are segments extracted from the trailer of THE MAN IN THE IRON MASK. The second set involves five video sequences at a total bit-rate of 18 Mbps. The video sequences comprise extremely complex scenes and less complex scenes from various video clips. Each video sequence is 10 seconds in length and has a spatial resolution of 720x480. This resolution is approximately double the ones used in present broadcast systems. For example, DirecTV uses a spatial resolution of 545x480 for satellite transmission; GI uses 368x480 for cable transmission; TCI uses 352x480 for cable transmission.
Therefore, the bit-rates of the resulting video streams are expected to be higher than those encountered in present systems. All video sequences were interlaced and encoded at a frame rate of 30 frames/s with a color sampling ratio of 4:2:0. The GOP pattern used in each sequence is IBBPBBPBBPBB. With the option of providing additional information, the joint bit-rate controller is set to send both GOP targets and picture targets to the set of transcoders.

4.2 Simulation Results

THE MAN IN THE IRON MASK trailer is composed of various scenes from the movie. Between scenes, black frames have been inserted to signal the change of scenes. These black frames, when encoded, have extremely high PSNR values (above 55 dB). We have chosen to ignore these frames in the discussion of picture quality fluctuation since their inclusion in the analysis would bias the results.

Table 4.1(a) shows the bit-rates for the six THE MAN IN THE IRON MASK sequences encoded using joint bit-rate coding. Table 4.1(b) shows the average GOP complexity of the same six video sequences. Table 4.2(a) shows the bit-rates for the five video sequences from the second test set encoded using our joint bit-rate coding system. Table 4.2(b) shows the average GOP complexity of the same five sequences. It is evident that with joint bit-rate coding, each video sequence was encoded with a different number of bits depending upon its complexity. For example, among THE MAN IN THE IRON MASK test sequences, SEGMENT 3 was encoded using 30% more bits than SEGMENT 2. The square-rooted complexity ratio of SEGMENT 3 to SEGMENT 2, $\sqrt{Average\_Complexity_{Segment\,3} / Average\_Complexity_{Segment\,2}}$, is 1.38. That is, the square-rooted complexity of SEGMENT 3 is 38% higher than that of SEGMENT 2.
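The two ratios quoted for SEGMENT 3 and SEGMENT 2 can be reproduced from the average bit-rates in Table 4.1(a) and the average GOP complexities in Table 4.1(b). The short Python sketch below does only that arithmetic; the variable and function names are illustrative assumptions, and the numbers are copied from those two tables.

```python
import math

# Average bit-rates (Mbps) from Table 4.1(a).
avg_bitrate = {"SEGMENT 2": 3.03, "SEGMENT 3": 3.95}
# Average GOP complexities from Table 4.1(b).
avg_complexity = {"SEGMENT 2": 1022600, "SEGMENT 3": 1945100}


def bitrate_ratio(a, b):
    """Ratio of the average bit-rate of sequence a to that of sequence b."""
    return avg_bitrate[a] / avg_bitrate[b]


def sqrt_complexity_ratio(a, b):
    """Square root of the average-complexity ratio of sequence a to b."""
    return math.sqrt(avg_complexity[a] / avg_complexity[b])


rate_ratio = round(bitrate_ratio("SEGMENT 3", "SEGMENT 2"), 2)
complexity_ratio = round(sqrt_complexity_ratio("SEGMENT 3", "SEGMENT 2"), 2)
```

The rounded values are 1.30 for the bit-rate ratio and 1.38 for the square-rooted complexity ratio, matching the 30% and 38% figures in the text.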
Using the second set of video sequences as another example, the average bit-rate ratio of SEQUENCE 1 to SEQUENCE 2 is 1.02, and the square-rooted complexity ratio of SEQUENCE 1 to SEQUENCE 2 is 1.03. Therefore, the bit assignment and the relative complexity of a video sequence are closely related.

Figure 4.1 and Figure 4.2 show the GOP complexities and bit-rates assigned by the joint bit-rate controller for each of the six MAN IN THE IRON MASK sequences and each of the five video sequences from the second test set respectively. There is a high degree of resemblance between the GOP complexity plot and the bit-rate plot of each video sequence. It is evident that the bit-rate of each video sequence varies in time in accordance with the complexity of the GOP at that instant.

Sequence   Average (Mbps)  Standard Deviation (Mbps)  Maximum (Mbps)  Minimum (Mbps)
SEGMENT 1  2.90            0.33                       3.66            2.52
SEGMENT 2  3.03            0.42                       3.86            2.55
SEGMENT 3  3.95            0.58                       5.17            3.10
SEGMENT 4  3.56            0.72                       4.93            2.55
SEGMENT 5  3.08            0.53                       3.98            2.50
SEGMENT 6  3.47            0.36                       3.99            2.52

Table 4.1(a): Bit-rates of the MAN IN THE IRON MASK video sequences encoded using the joint bit-rate coding system.

Sequence   Average Complexity
SEGMENT 1  870560
SEGMENT 2  1022600
SEGMENT 3  1945100
SEGMENT 4  1491000
SEGMENT 5  1101000
SEGMENT 6  1671900

Table 4.1(b): Average GOP complexity of the MAN IN THE IRON MASK video sequences.

Sequence    Average (Mbps)  Standard Deviation (Mbps)  Maximum (Mbps)  Minimum (Mbps)
SEQUENCE 1  3.79            1.71                       8.02            1.88
SEQUENCE 2  3.73            1.52                       7.38            1.96
SEQUENCE 3  3.52            1.58                       7.27            2.06
SEQUENCE 4  3.23            1.61                       7.78            1.94
SEQUENCE 5  3.73            1.46                       7.76            2.39

Table 4.2(a): Bit-rates of the video sequences from the second test set encoded using the joint bit-rate coding system.
Sequence    Average Complexity
SEQUENCE 1  2974767
SEQUENCE 2  2778334
SEQUENCE 3  2720989
SEQUENCE 4  2290514
SEQUENCE 5  2818206

Table 4.2(b): Average GOP complexity of the video sequences from the second test set.

[Figure 4.1: GOP complexities and bit-rates of the MAN IN THE IRON MASK video sequences. Panels (a) to (f): SEGMENT 1 to SEGMENT 6. Horizontal axis: GOP number.]

[Figure 4.2: GOP complexities and bit-rates of the second set of video sequences. Panels (a) to (e): SEQUENCE 1 to SEQUENCE 5. Horizontal axis: GOP number.]

Almost all video sequences have periods of highly complex scenes as well as periods of less complex scenes. If independent CBR coding were used, the bit-rate of each video stream would have to be set high enough to guarantee that the picture quality of the video stream during its most active segment is similar to the picture quality of the same segment obtained using our system. Using the six video sequences from THE MAN IN THE IRON MASK as an example, if the video streams were to be encoded using CBR coding, the bit-rates of the six video streams would have to be set to 3.66 Mbps, 3.86 Mbps, 5.17 Mbps, 4.93 Mbps, 3.98 Mbps and 3.99 Mbps.
However, since CBR coding directly encodes the video streams while our joint bit-rate coding system re-quantizes them, a slightly lower bit-rate could be used for the CBR coding of each video stream. For the six video sequences, the constant bit-rates that give the most active segment of each video stream a picture quality similar to that obtained using our joint bit-rate coding system are 3.66 Mbps, 3.86 Mbps, 4.80 Mbps, 4.70 Mbps, 3.80 Mbps and 3.70 Mbps for SEGMENT 1, SEGMENT 2, SEGMENT 3, SEGMENT 4, SEGMENT 5, and SEGMENT 6 respectively. Therefore, the 20 Mbps channel would not be able to accommodate all six CBR video streams. Instead, only 4.9 video streams could be fitted into the 20 Mbps channel. For the second set of video sequences, the constant bit-rates that give the most active segment of each video stream a picture quality similar to that obtained using our system are 6.90 Mbps, 6.64 Mbps, 7.20 Mbps, 7.20 Mbps, and 6.50 Mbps for SEQUENCE 1, SEQUENCE 2, SEQUENCE 3, SEQUENCE 4, and SEQUENCE 5 respectively. The 18 Mbps channel could not accommodate all five CBR video streams. Instead, only 2.6 CBR video streams could be transmitted simultaneously in the 18 Mbps channel.

Table 4.3 and Table 4.4 summarize the standard deviations of the PSNR values obtained using our joint bit-rate coding system as well as those from CBR coding. It should be noted that for the MAN IN THE IRON MASK video sequences, the PSNR values of the black frames inserted between scenes are not included in this analysis. The lower PSNR standard deviations from joint bit-rate coding show that our joint bit-rate coding system significantly reduces the fluctuation in picture quality of the resulting video streams. For the first set of test sequences, an average 15% reduction in picture quality fluctuation is achieved by our system.
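The stream counts and the 15% figure quoted above can be reproduced with simple arithmetic. The sketch below rests on two assumptions that match the quoted numbers but are not stated explicitly in the text: the equivalent number of CBR streams is taken as the channel capacity divided by the mean CBR bit-rate, and the average reduction is taken as the mean of the per-sequence relative reductions in PSNR standard deviation. The function names are hypothetical; the data are the CBR rates listed above and the first-set values of Table 4.3.

```python
def equivalent_stream_count(channel_mbps, cbr_rates_mbps):
    """Channel capacity divided by the mean CBR rate (assumed reading)."""
    mean_rate = sum(cbr_rates_mbps) / len(cbr_rates_mbps)
    return channel_mbps / mean_rate


def mean_relative_reduction(joint_values, cbr_values):
    """Mean of (cbr - joint) / cbr over all sequences."""
    pairs = list(zip(joint_values, cbr_values))
    return sum((c - j) / c for j, c in pairs) / len(pairs)


# CBR bit-rates (Mbps) quoted in the text for the two test sets.
cbr_set1 = [3.66, 3.86, 4.80, 4.70, 3.80, 3.70]
cbr_set2 = [6.90, 6.64, 7.20, 7.20, 6.50]

# PSNR standard deviations for the first test set (Table 4.3).
joint_std_set1 = [2.68, 2.61, 3.47, 3.65, 2.81, 4.49]
cbr_std_set1 = [3.28, 3.37, 4.08, 3.93, 3.57, 4.73]

streams_set1 = round(equivalent_stream_count(20, cbr_set1), 1)
streams_set2 = round(equivalent_stream_count(18, cbr_set2), 1)
reduction_set1_pct = round(100 * mean_relative_reduction(joint_std_set1,
                                                         cbr_std_set1))
```

Under these assumptions the results are 4.9 and 2.6 streams and a 15% average reduction, in agreement with the text.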
For the second set of test sequences, our system lowers the picture quality variations by 21%.

Sequence   Coding method                                              Std. Dev. of PSNR (dB)
SEGMENT 1  Joint bit-rate using GOP complexities and picture targets  2.68
SEGMENT 1  TM5 CBR @ 3.66 Mbps                                        3.28
SEGMENT 2  Joint bit-rate using GOP complexities and picture targets  2.61
SEGMENT 2  TM5 CBR @ 3.86 Mbps                                        3.37
SEGMENT 3  Joint bit-rate using GOP complexities and picture targets  3.47
SEGMENT 3  TM5 CBR @ 4.80 Mbps                                        4.08
SEGMENT 4  Joint bit-rate using GOP complexities and picture targets  3.65
SEGMENT 4  TM5 CBR @ 4.70 Mbps                                        3.93
SEGMENT 5  Joint bit-rate using GOP complexities and picture targets  2.81
SEGMENT 5  TM5 CBR @ 3.80 Mbps                                        3.57
SEGMENT 6  Joint bit-rate using GOP complexities and picture targets  4.49
SEGMENT 6  TM5 CBR @ 3.70 Mbps                                        4.73

Table 4.3: PSNR standard deviations for the MAN IN THE IRON MASK video sequences encoded using our two-stage joint bit-rate coding system and encoded independently using the TM5 method.

Sequence    Coding method                                              Std. Dev. of PSNR (dB)
SEQUENCE 1  Joint bit-rate using GOP complexities and picture targets  5.17
SEQUENCE 1  TM5 CBR @ 6.90 Mbps                                        6.92
SEQUENCE 2  Joint bit-rate using GOP complexities and picture targets  4.93
SEQUENCE 2  TM5 CBR @ 6.64 Mbps                                        6.16
SEQUENCE 3  Joint bit-rate using GOP complexities and picture targets  5.57
SEQUENCE 3  TM5 CBR @ 7.20 Mbps                                        6.92
SEQUENCE 4  Joint bit-rate using GOP complexities and picture targets  5.18
SEQUENCE 4  TM5 CBR @ 7.20 Mbps                                        6.37
SEQUENCE 5  Joint bit-rate using GOP complexities and picture targets  4.66
SEQUENCE 5  TM5 CBR @ 6.50 Mbps                                        6.10

Table 4.4: PSNR standard deviations for the second set of video sequences encoded using our two-stage joint bit-rate coding system and encoded independently using the TM5 method.

4.3 Timing Analysis

Our joint bit-rate control system reduces the coding delay experienced before the video streams are multiplexed and transmitted.
Delay reduction is achieved by our system because motion compensation prediction is performed ahead of time. Therefore, instead of encoding a video stream completely, our system only needs to transcode the video streams to ones that have the bit-rates specified by the joint bit-rate controller. To illustrate the performance of our system in reducing coding delay, we compare the CPU time used by our transcoder in transcoding a video sequence with the CPU time used by a TM5 encoder in encoding the same video sequence. The TM5 encoder employs an exhaustive integer vector block matching algorithm for motion compensation prediction. The search window for the motion vectors of P-pictures is set to (11,11). The two sets of search windows for both the forward and the backward motion vectors of B-pictures are {(7,7), (3,3)} and {(3,3), (7,7)}.

Since we would like to emphasize the benefits of performing motion compensation predictions ahead of time, only the time used by our transcoder in performing variable length decoding, re-quantization, and variable length encoding on all macroblocks and the time used by the TM5 encoder in performing motion compensation prediction, discrete cosine transform, and variable length encoding on all macroblocks are recorded. Some I/O operations are involved in variable length coding. However, since only a few bytes are read or written in each I/O operation, the CPU time used in performing such I/O operations is assumed negligible.

Both sets of video sequences were analyzed using a Sun™ Ultra Sparc™ workstation with one Sparc™ floating point processor at 167 MHz and 128 Megabytes of RAM, running under Solaris™ 2.5. Table 4.5 and Table 4.6 show the CPU time recorded in transcoding and encoding the two sets of video sequences, each of which is 10 seconds in length.
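The Joint/CBR rows of Table 4.5 and Table 4.6 are the transcoding CPU time divided by the TM5 encoding CPU time for each sequence. As a small worked example, the sketch below recomputes these ratios and their mean for the first test set. The CPU times are copied from Table 4.5; the variable names are assumptions made for this illustration.

```python
# CPU times in seconds for SEGMENT 1 to SEGMENT 6, copied from Table 4.5.
joint_seconds = [210.660, 236.470, 333.910, 282.780, 247.340, 318.580]
cbr_seconds = [1865.740, 1921.410, 2101.440, 2010.230, 1890.990, 2059.210]

# Per-sequence ratio of transcoding time to full TM5 encoding time.
ratios = [j / c for j, c in zip(joint_seconds, cbr_seconds)]
rounded_ratios = [round(r, 3) for r in ratios]

# Average share of the encoding time that transcoding needs, in percent.
mean_ratio_pct = round(100 * sum(ratios) / len(ratios), 1)
```

For the first test set this gives per-sequence ratios between 0.113 and 0.159 and a mean of about 13.7%.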
Coding       SEGMENT 1 (sec)  SEGMENT 2 (sec)  SEGMENT 3 (sec)  SEGMENT 4 (sec)  SEGMENT 5 (sec)  SEGMENT 6 (sec)
Joint        210.660          236.470          333.910          282.780          247.340          318.580
CBR          1865.740         1921.410         2101.440         2010.230         1890.990         2059.210
Joint / CBR  0.113            0.123            0.159            0.141            0.131            0.155

Table 4.5: CPU time used in transcoding and encoding the MAN IN THE IRON MASK video sequences.

Coding       SEQUENCE 1 (sec)  SEQUENCE 2 (sec)  SEQUENCE 3 (sec)  SEQUENCE 4 (sec)  SEQUENCE 5 (sec)
Joint        384.350           368.020           372.050           356.420           396.710
CBR          2393.262          2120.865          2176.417          2284.886          1974.297
Joint / CBR  0.161             0.174             0.171             0.156             0.201

Table 4.6: CPU time used in transcoding and encoding the second set of video sequences.

The results in Table 4.5 and Table 4.6 show that the transcoding performed by our joint bit-rate coding system greatly shortens the coding delay. For the first set of test sequences, the time required for transcoding the video streams is, on average, about 13.7% of the time used in encoding the video streams completely. For the second set of test sequences, our system reduces the coding time by 82.7%.

4.4 Summary

The performance of our two-stage joint bit-rate coding system is presented in this chapter. It is shown that the bit allocation decisions performed by the joint bit-rate controller reflect the complexities of the video streams. More video streams can be supported in a channel if joint bit-rate coding is used instead of CBR coding. For our first set of test sequences, all 6 video streams encoded using our joint bit-rate coding system are transmitted in the 20 Mbps channel, while only 4.9 video streams encoded using CBR coding can be transmitted in the same channel. For the second set of test sequences, 5 instead of 2.6 video streams are transmitted in the 18 Mbps channel when our system is used. The joint bit-rate coding system is also able to reduce the picture quality variation in the video streams.
Average reductions of 15% and 21% in picture quality fluctuation are achieved for the first and the second set of test sequences respectively. Simulation results have also shown that by transcoding the video streams instead of encoding them in real time, our system saves about 80% of the coding time.

Chapter 5 Summary and Future Work

5.1 Summary

With the high bandwidth that is available in digital broadcasting, it is more efficient and cost-effective to multiplex several video sources together and transmit the multiplexed video stream via the fixed capacity medium. The two challenges in broadcasting multiple sources are using the available bandwidth efficiently and maintaining consistent picture quality in each of the multiplexed video streams. Currently, broadcasters either assign a fixed portion of the available channel bandwidth to each video stream or statistically multiplex the video streams for transmission. As discussed in Chapter 2, CBR coding suffers from significant fluctuations in picture quality. Since statistical multiplexing is subject to packet loss, the channel bandwidth is not efficiently used.

The goal of this thesis is to develop a multiple-source video coding system that can reduce the variations in picture quality and make efficient use of the channel bandwidth. We achieve our goals by developing a two-stage joint bit-rate coding system for simultaneous coding of multiple sources. This system can be easily implemented for commercial use in digital video broadcast applications.

The system uses a two-stage approach. During the first stage, the video sources are encoded with very high picture quality and the complexities of the video streams are recorded for later use. Knowing the complexities of the video streams, the system determines the number of bits needed to encode each video stream. A set of transcoders is implemented in the system to execute the bit allocation decisions.
This two-stage encoding system is intended for video that was archived and not for video programs to be transmitted live.

Compared to present broadcast systems, and for the same picture quality, our system greatly increases the number of video streams transmitted in each channel. Simulation results have shown that our two-stage joint bit-rate coding system increases the number of video streams supported from 4.9 to 6 in a 20 Mbps channel and from 2.6 to 5 in an 18 Mbps channel. These results correspond to improvements of 22% and 92% respectively. Such improvements are very significant since a large number of transponders can be freed up to carry real-time video programs or to provide other communication services.

By switching from tape storage to video server technology, playback systems are eliminated since video streams can be directly accessed via the video server. Presently, an encoder is needed for each video stream to be transmitted. However, since the first-stage video encoding of our system is an off-line process, fewer complete encoders are required for our system. The simpler structure of a transcoder makes the manufacturing of the transcoder hardware much easier than the manufacturing of the encoder hardware. Both of these properties translate to a lower cost at the headend.

In addition to the gain in bandwidth and the reduction in cost, simulation results have also shown that our system reduces picture quality fluctuation by 15% to 21% and that it reduces the coding time by 82% to 87%.

5.2 Future Work

Our two-stage joint bit-rate coding system is developed for applications that involve the broadcasting of pre-recorded video. We suggest the following modifications, which will increase the bandwidth savings. In our implementation, video sources are encoded into MPEG-2 video streams during the first stage. The picture complexities and GOP complexities of video sources are also recorded.
Compressed video streams along with their complexity files are stored in video servers to reduce the cost at the headend. However, the re-quantization of the pre-compressed streams reduces the quality of the transmitted video. The only way to avoid this degradation of quality or to improve bandwidth allocation is to eliminate the re-quantization process. In this case, we can consider the following two approaches:
1. During the first stage, we only store the complexity information. The transcoders are replaced by "complete" encoders, and VTRs are used for playing the original materials.
2. During the first stage, we store the complexity information as well as the motion estimation and motion compensation decisions. In this case, the transcoders have to be modified to accept this information so that no motion estimation is needed at the second stage.
Both of the above implementations yield the same picture quality and bandwidth savings. However, feasibility studies are needed to determine which approach is more cost-effective.

Chapter 6 Bibliography

[1] ISO/IEC 13818, "Information Technology - Generic coding of moving pictures and associated audio information." November 1994
[2] ISO/IEC 13818-2, "Information Technology - Generic coding of moving pictures and associated audio information - Part 2: Video." November 1994
[3] B. G. Haskell, A. Puri, and A. N. Netravali. Digital Video: An Introduction to MPEG-2. Chapman & Hall, New York. 1997
[4] ISO/IEC JTC1/SC29/WG11/93-400, "MPEG-2 Test Model 5 (TM5)." Test Model Editing Committee. April 1993
[5] Eric Viscito and Cesar Gonzales. "A Video Compression Algorithm with Adaptive Bit Allocation and Quantization," Proceedings of SPIE - The International Society for Optical Engineering: Visual Communications and Image Processing '91, vol. 1605, part 1. 1991. p.58-71
[6] Atul Puri and R. Aravind. "Motion-Compensated Video Coding with Adaptive Perceptual Quantization," IEEE Transactions on Circuits and Systems for Video Technology, vol. 1, no. 4, Dec 1991. p.351-361
[7] Fu-Heui Lin and Russell M. Mersereau. "A Quality Measure-Based Rate Control Strategy for MPEG Video Encoders," Proceedings of IEEE International Symposium on Circuits and Systems, vol. 2. 1996. p.782-783
[8] Wei Ding and Bede Liu. "Rate Control of MPEG Video Coding and Recording by Rate-Quantization Modeling," IEEE Trans. on Circuits and Systems for Video Technology, vol. 6, no. 1, Feb 1996. p.12-20
[9] King-Wai Chow and Bede Liu. "Complexity Based Rate Control for MPEG Encoder," IEEE International Conference on Image Processing, vol. 1. 1994. p.263-267
[10] Gunnar Karlsson. "Asynchronous Transfer of Video," IEEE Communications Magazine, vol. 34, no. 8. August 1996. p.118-126
[11] S. Iai and N. Kitawaki, "Effects of Cell Loss on Picture Quality in ATM Networks," Elect. and Commun. in Japan, part 1, vol. 75, no. 10. p.30-41
[12] Pramod Pancha and Magda El Zarki. "Bandwidth-Allocation Schemes for Variable-Bit-Rate MPEG Sources in ATM Networks," IEEE Trans. on Circuits and Systems for Video Technology, vol. 3, no. 3, June 1993. p.190-198
[13] W. Verbiest, L. Pinno, B. Veoten. "Statistical Multiplexing of Variable Bit Rate Video Sources in Asynchronous Transfer Mode Networks," IEEE GLOBECOM'88, vol. 1. 1988. p.208-213
[14] D. Reininger and D. Raychaudhuri. "Statistical Multiplexing of VBR MPEG Compressed Video on ATM Networks," IEEE INFOCOM'93, vol. 3. 1993. p.919-926
[15] R. M. Rodriguez-Dagnino, M. R. K. Khansari, and A. Leon-Garcia. "Prediction of Bit Rate Sequences of Encoded Video Signals," IEEE Journal on Selected Areas in Communications, vol. 9, no. 3, April 1991. p.305-314
[16] Ajanta Guha and Daniel J. Reininger. "Multichannel Joint Rate Control of VBR MPEG Encoded Video for DBS Applications," IEEE Trans. on Consumer Electronics, vol. 4, no. 3. 1994. p.616-623
[17] Gertjan Keesman. "Multi-Program Video Compression Using Joint Bit-Rate Control," Philips Journal of Research, vol. 50. 1996. p.21-45
[18] Limin Wang and Andre Vincent. "Joint Rate Control for Multi-Program Video Coding," IEEE Trans. on Consumer Electronics, vol. 42, no. 3. August 1996. p.300-305
[19] Sanghoon Lee, Seong Hwan Jang, and Jeong Su Lee. "Dynamic Bandwidth Allocation for Multiple VBR MPEG Video Sources," IEEE International Conference on Image Processing, vol. 1. 1994. p.268-272
[20] P. N. Tudor and O. H. Werner. "Real-Time Transcoding of MPEG-2 Video Bit Streams," IEE International Broadcasting Convention, no. 447. 1997. p.298-301
[21] Ekaterina Barzykina. "MPEG Video Coding with Adaptive Motion Compensation and Bit Allocation Based on Perception Criteria," Master's thesis. University of British Columbia, Canada. April 1998
[22] Y. Lee. "Rate-Computation Optimized Block Based Video Coding," Ph.D. thesis. University of British Columbia, Canada. January 1998
[23] ISO/IEC 11172, "Information Technology - Generic coding of moving pictures and associated audio information at 1.5 Mbps."
[24] "Key Dates in Satellite Television." http://www.shea.com/key dates.html (July 14, 1998)
[25] "Q&A: DirecTV vs. the Competition." http://www.directv.com/sales/answer competition.html#prime (July 15, 1998)
[26] Jerry Whitaker. DTV: The Revolution in Electronic Imaging. McGraw Hill, New York. 1998
[27] Markus Wasserschaff, Laurent Boch, Rainer Schafer, Heinz Fehlhammer, Alexander Schertz, and Manfred Kaul. "State of the Art on Video Transmission and Encoding," Distributed Video Production (DVP) Project A089. September 1996. http://viswiz.gmd.de/DVP/Public/deliv/deliv.211/delivstr.htm (Jun 29, 1998)
[28] Robert Brehl.
"Legal satellite dishes gaining upper hand on grey market," The Globe and Mail: Report on Business. July 16, 1998. p .B l&B4 [29] J. A. Flaherty. "Digital T V and HDTV." http://www.web-star.com/hdtvforum/DisitalTVandHDTV.htm (July 16, 1998) [30] ATSC Doc A/54, "Guide to the Use of the ATSC Digital Television Standard," Advanced Television Systems Committee. Oct 1995. http://www.atsc.org (Feb 20,1998) [31] Satoshi Kondo and Hideki Fukuda. " A Real-Time Variable Bit rate MPEG2 Video Coding Method for Digital Storage Media," IEEE Trans, on consumer Electronics, vol. 43, no. 3. 1997. p.537-543 [32] Pramod Pancha and Magda El Zarki. " M P E G Coding for Variable Bit Rate Video Transmission," IEEE Communications Magazine, vol. 32, n. 5. May 1994. p.54-66 [33] Jiro Katto and Mutsumi Ohta. "Mathematical Analysis of M P E G compression Capability and Its Application to Rate Control," IEEE Proceedings on Image Processing, vol. 2. 1995. p.555-558 [34] P. N . Tudor. "MPEG-2 Video Compression Tutorial," IEE Collloquium on MPEG-2 - What it is and What it isn't. 1995. p.2/1-2/8 [35] Robert J. Safranek, Charles R. Kalmanek Jr., and Rahul Garg. "Methods for Matching Compressed Video to A T M Networks," IEEE Proceedings on Image Processing, vol.1. 1995. p. 13-16 [36] Manuela Pereira and Andrew Lippman. "Re-Codable Video," IEEE Proceedings on Image Processing, vol. 2. 1994. p.952-956 [37] Wei Ding. "Joint Encoder and Channel Rate Control of V B R Video over A T M Networks," IEEE Trans, on circuits and Systems for Video Technology, vol. 7, no. 2. April 1997. p.266-278 [38] "Broadcast video servers are part of M T V Europe's digital future," HP Telecommunications News, issue 11. Feb 1997. http://www. tmo. hp. com/tmo/tcnews/9702/11tnfe2. htm (May 14, 1998) [39] "Testing the MPEG-2 transport stream in D V B networks," HP Telecommunications News, issue 11. Feb 1997. 
http://www.tmo.hp.com/tmo/tcnews/9702/11tnfe5.htm (May 14, 1998) [40] "NVOD makes the case for delivering enhanced pay-per-view services," HP Telecommunications News, issue 11. Feb 1997'. http://www.tmo.hp.com/tmo/tcnews/9702/lltnfe3.htm (May 14, 1998) [41] Craig Birkmaier. " A Visual Compositing Syntax for Ancillary Data Broadcasting." June 1997. http://pcube.com/dtv.html (June 21, 1998) 96 Chapter 6 Bibliography [42] Cragi Birkmaier. "Limited Vision: The Techno-Political War to Control the Future of Digital Mass Media" http://pcube. com/dtv.html (July 16, 1998) [43] "The High-Tech Behind Broadcasting DIRECTV," DIRECTV Hardware - A More Technical Explanation, http://www.directv.com/hardware/tech.html (May 15, 1998) [44] A l Kovalick. "Television in Transition: Insights and Solutions for the D T V Broadcaster," A White Paper on Digital TV. http://www, tmo. hp. com/tmo/literature/English/VID WhitePaper 005a. html (June 8, 1998) [45] Saif Zahir, Panos Nasiopoulos, and Victor C. M . Leung, " A High Quality Real Time V B R MPEG-2 System for Broadcasting Applications," Digest of Technical Papers from IEEE International Conference on Consumer Electronics, June 1997. p.182-183 97
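Both approaches proposed in Section 5.2 keep the stored complexity information, which suggests that the second stage uses it to share the channel among programs. The sketch below shows one simple way this could be done, assuming a purely complexity-proportional split of the bits available in one GOP period. This is a common form of joint bit-rate control and is not necessarily the exact rule used in this system; the function name `allocate_channel_bits` and the program names are hypothetical.

```python
from typing import Dict


def allocate_channel_bits(gop_complexities: Dict[str, float],
                          channel_bits: float) -> Dict[str, float]:
    """Split channel_bits among programs in proportion to GOP complexity.

    Assumption: a program with twice the complexity receives twice the bits.
    If every complexity is zero, the bits are split equally.
    """
    if not gop_complexities:
        return {}
    total = sum(gop_complexities.values())
    if total <= 0:
        equal = channel_bits / len(gop_complexities)
        return {name: equal for name in gop_complexities}
    return {name: channel_bits * x / total
            for name, x in gop_complexities.items()}


if __name__ == "__main__":
    # Made-up complexities for three programs sharing 12 Mbit in one GOP period.
    shares = allocate_channel_bits(
        {"news": 2.0, "sports": 6.0, "movie": 4.0}, 12_000_000.0)
    for name, bits in shares.items():
        print(f"{name}: {bits:.0f} bits")
```

A practical allocator would also need per-program minimum and maximum rates and decoder buffer constraints, which this sketch leaves out.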