Transport Issues of Real-time MPEG-2 Video Streams over IP Network

by

LIANG NORTON CAI

B.Eng. (E.E.), The HuaZhong University of Science and Technology, P.R. China, 1984

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF APPLIED SCIENCE in THE FACULTY OF GRADUATE STUDIES, DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA

April, 1999

© Liang Norton Cai, 1999

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Abstract

Networked multimedia is the point at which the computer, telecommunication and entertainment industries converge. With the deployment of gigabit and terabit routers in fast data networks and the increasing performance of personal computers, there is growing interest in providing network users with networked multimedia services. Digital video and audio services will more often be delivered over intranets, the Internet and other IP-related technologies. In contrast to research on the transport of MPEG-2 video over ATM, our project investigates the effects of network impairments at the IP layer. We have developed a real routed IP testbed network with a Continuous Media File Server, an IP network performance measurement program, an MPEG-2 video implementation and a quality measurement program.

One of the objectives of the project is to investigate the effects of network impairments, specifically IP packet loss and jitter under moderately to heavily loaded network conditions, on MPEG-2 video reconstruction. We studied MPEG-2 video stream transport errors, error propagation, error sources and the severity of the error effects on video quality. In consideration of the transport requirements of real-time MPEG-2 video, we examined network impairments and related them to our classification of MPEG-2 errors. Based on the IP network performance metrics derived from the stressed testbed network and the video quality obtained from the transported MPEG-2 streams, we examined quantitatively the empirical relationships between IP network performance and MPEG-2 video quality. We observed the behavior that digital video presents when transported over a lossy and bursty IP network. By quantifying the effects of IP packet loss on MPEG-2 video quality, our study produced four outcomes:

• Established the experimental system and collected quantitative data sets to investigate the empirical relationships between MPEG-2 video quality and IP packet loss.

• When MPEG-2 video is transported in IP packets, slice loss rather than picture loss is the dominant factor contributing to video artifacts within the range of fair-quality video.

• Spatial and temporal masking effects tend to reduce the visibility of artifacts introduced by packet losses.
• Low motion / low spatial activity MPEG-2 video clips are more susceptible to IP packet loss impairment than are high motion / high spatial activity clips.

• PSNR, as a numerical comparison method, was found to correlate linearly with packet loss in the high, medium, and low activity test streams.

The outcomes of the project provide useful information for a variety of potential applications. For example, the quantitative and statistical results and their implications are very important in designing effective and efficient Forward Error Correction.

Keywords: networked multimedia, IP, Internet, QoS, DiffServ, router, network performance, packet loss/delay/jitter, MPEG-2 codec, MC/DPCM/DCT, digital impairment, subjective & objective video quality, forward error correction.

Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgement

1 Introduction
1.1 Identifying the problem domain
1.2 Research contributions
1.3 Research objectives
1.4 Outlines of the thesis

2 Architecture of the routed IP testbed network
2.1 Testbed network hardware and operating system
2.2 The CMFS video server and client
2.3 Testbed network special program tools
2.4 The future differentiated services (DiffServ) Internet

3 IP network performance metrics and the fractal traffic model
3.1 Definition and methodology for One-way Packet Loss metric
3.2 Fractal model for Internet traffic
3.3 Project developments

4 MPEG-2 video technology and test stream implementation
4.1 Digital video coding standards
4.2 MPEG-2 coding model
4.2.1 Uncompressed data rates of digitized TV and video signals
4.2.2 MPEG digital video compression technologies
4.2.3 MPEG coding model
4.3 MPEG-2 video hierarchy and bit stream structure
4.3.1 MPEG video ES bit stream hierarchy
4.3.2 MPEG TS bit stream hierarchy
4.4 MPEG-2 VBV buffering scheme
4.5 Recent advancement in MPEG-2 video source modeling
4.6 MPEG-2 video application examples
4.7 MPEG-2 video CODEC and bit stream analysis tools
4.8 Project development

5 Digital video impairments measurement & quality assessment
5.1 Digital video system versus analog video system
5.2 Taxonomy of digital video impairments
5.3 Video quality measurement methodologies and metrics
5.3.1 Subjective video quality assessment standard
5.3.2 Objective video quality measurement standard
5.4 Project development
5.4.1 Application architecture and environment for video quality evaluation
5.4.2 Test model and methodology
5.4.3 Objective measurement
5.4.4 Subjective assessment
5.4.5 Temporal realignment

6 Transport issues on delivering MPEG-2 video over IP network
6.1 Protocol stacks for real-time MPEG-2 video transport
6.2 Taxonomy of transport errors
6.2.1 The impacts of error and impairment on the MPEG-2 encoded stream
6.2.1.1 Impacts of error on the Block layer
6.2.1.2 Impacts of error on the Macroblock layer
6.2.1.3 Impacts of error on the Slice layer
6.2.1.4 Impacts of error on the Picture layer
6.2.1.5 Impacts of error on the Group Of Picture layer
6.2.1.6 Impacts of error on the Sequence layer
6.2.1.7 Impacts of error on the MPEG-2 Scalability extension headers
6.2.1.8 Impacts of packet loss on the MPEG-2 encoded stream
6.2.1.9 Error tolerance of frames and error sensitivity of streams
6.2.1.10 Ambiguity on the MPEG-2 stream re-synchronizing point
6.2.1.11 Impacts of end-to-end packet delay on interactive applications
6.2.1.12 Impact of excessive packet delay on MPEG-2 quality
6.2.1.13 Impacts of network delay variance (jitter) on MPEG-2 decoding
6.2.1.14 Impacts of jitter on MPEG-2 system clock recovery
6.2.1.15 Impacts of jitter on video and audio presentation synchronization
6.2.1.16 Impacts of allocated bandwidth reduction on final video quality
6.2.2 Error and impairment sources - Network
6.2.2.1 IP packet loss
6.2.2.2 IP packet delay
6.2.2.3 Packet delay variance (jitter)
6.2.2.4 Bandwidth reduction
6.2.3 Error and impairment sources - Communication Protocol
6.3 Error detection, prevention and concealment
6.3.1 Error detection
6.3.1.1 External error detection
6.3.1.2 Internal error detection
6.3.2 Error prevention
6.3.2.1 MPEG-2 scalability
6.3.2.2 Priority nature of the MPEG-2 video stream
6.3.2.3 Forward Error Correction
6.3.2.4 Encoded slices randomizing
6.3.2.5 Multiple Description Coding
6.3.2.6 Robust entropy coding
6.3.2.7 Retransmission
6.3.3 Error concealment
6.3.3.1 Error concealment by exploiting spatial redundancy
6.3.3.2 Error concealment by exploiting temporal redundancy
6.3.3.3 Error concealment by exploiting space redundancy
6.3.3.4 Error concealment by exploiting frequency redundancy
6.3.4 Adaptive network control

7 Experimental results and numerical analysis
7.1 Objectives of the experiment
7.2 Description of the experiment
7.3 Numerical Analysis & experimental results
7.3.1 Quantitative relationship: MPEG-2 video quality vs. IP packet loss
7.3.2 Dominant contributor to video quality degradation
7.3.3 Packet loss tolerance
7.3.4 Correlation of PSNR and Packet Loss
7.3.5 Characteristics of MPEG-2 Data Loss vs. Packet Loss

8 Conclusions

Bibliography

List of Tables

Table 4-1 MPEG-2 Layer Function
Table 4-2 Video Formats of Digital Satellite Broadcast
Table 4-3 Test Stream Encoding Parameters
Table 4-4 Statistics of the Encoded Test Streams
Table 5-1 Mean Opinion Score (MOS) Subjective Assessment
Table 5-2 ANSI Objective Measurement Parameters
Table 5-3 Mean Opinion Score (MOS) Using Impairment Model
Table 6-1 MPEG Encoded Bitstream Error Sensitivity List
Table 7-1 Experimental Data Items
Table 7-2 Empirical Relationship Study Based on Packet Loss
Table 7-3 Empirical Relationship Study Based on MOS
Table 7-4 PSNR and Packet Loss Ranges for the Quality Boundaries
Table 7-5 Experimental Results for Correlation Coefficient, Rate & Intercept

List of Figures

Figure 1-1 IP Network Layer Impairments
Figure 2-1 Routed IP Testbed Network Topology
Figure 4-1 MPEG-2 Coding Model
Figure 4-2 MPEG-2 Example Encoder Diagram
Figure 4-3 The Layered MPEG-1 Video Structure
Figure 4-4 MPEG-2 Video Hierarchy and Extension Function
Figure 4-5 Transport Packet Format
Figure 4-6 Link Header Format for the Transport Packet
Figure 4-7 Fixed-length Component of the Adaptation Header Format
Figure 4-8 PES Packet Format
Figure 4-9 VBV Buffer Dynamics
Figure 5-1 MPEG-2 Layer Test Architecture
Figure 5-2 Test Model to Evaluate Network Impairments on Video Quality
Figure 5-3 Test Model for Network Impairments on Video Quality Using PSNR
Figure 5-4 Test Model for Network Impairments on Video Quality Using MOS
Figure 5-5 Temporal Misalignment Due to Loss of Frame
Figure 5-6 Temporal Re-alignment with Error Resolving
Figure 6-1 Protocol Stack to Transport MPEG-2 Video
Figure 6-2 Protocol Stack to Transport MPEG-2 Video over Testbed Network
Figure 6-3 Example of Error Propagation and Severity
Figure 6-4 Code Structure Diagram: Block Layer
Figure 6-5 Code Structure Diagram: Macroblock Layer
Figure 6-6 Code Structure: Slice Layer
Figure 6-7 Code Structure: Picture Layer
Figure 6-8 Code Structure: Group Of Picture Layer
Figure 6-9 Code Structure: Sequence Layer
Figure 6-10 A Diagram of Source Coding & Channel Access
Figure 6-11 Solutions to Errors
Figure 7-1 IP Network Layer Impairments
Figure 7-2 Routed IP Testbed Network Topology (Same as Figure 2-1)
Figure 7-3 Empirical Relationship between Video Quality and IP Packet Loss Rate
Figure 7-4 Video Quality Interpretation & Quality Boundary Definition
Figure 7-5 Effects of Slice Loss & Picture Loss on Video Quality
Figure 7-6 Joint Scatter-plot of MOS vs. Packet Loss for Video-H & Video-L
Figure 7-7 Empirical Relationships: Slice Loss & Picture Loss vs. Packet Loss

Acknowledgement

First and foremost, I would like to take this opportunity to express my sincerest gratitude to my supervisor, Professor Mabo R. Ito. Dr. Ito accepted me into the graduate program, taught me courses, brought me into this research project, offered me financial assistance, and provided me with valuable advice, guidance, insights, discussions and encouragement throughout the project. Despite maintaining an extremely busy schedule, Dr. Ito patiently reviewed and discussed with me all of my reports, presentations and the thesis. I will be forever thankful for having had such an excellent mentor as Dr. Ito at this important stage of my professional life. I am grateful to have had the opportunity to learn from Dr. Gerald W. Neufeld many research insights and much about current industrial activities in high-speed computer networks and protocols. Special thanks to Mr. Mark McCutcheon for his generous assistance and creative discussions.
He helped create an open academic atmosphere, making the Protocol Lab a highly professional environment that we all very much enjoy; his efforts greatly enhanced our capabilities to study, to learn and to research. I would also like to thank Mr. Daniel Chiu for his hard work and support throughout this project. Lastly and most importantly, I would like to dedicate this work to my wife Ying, for her love and constant encouragement, and to my son Bryan, for his second birthday. It would have been impossible for me to carry on my graduate study and the project research without her selfless sacrifices and unconditional support. This work is made possible by grants from the Canadian Institute for Telecommunication Research (CITR) and Hewlett-Packard Canada.

1 Introduction

Multimedia Support in a Differentiated Services Internet (MSDSI) is a collaborative research project led by Professor M. R. Ito of the Department of Electrical & Computer Engineering and Professor G. Neufeld of the Department of Computer Science. The project is jointly funded by CITR (the Canadian Institute for Telecommunications Research) and Hewlett-Packard Canada. It is also part of the Phase III CITR major project "Resource Management in High-Speed Networks".

There have been many recent significant advancements in information technology: the DiffServ (Differentiated Services) Internet service model to provide CoS (Class of Service) or QoS (Quality of Service) to the end user; the deployment of gigabit high-bandwidth IP routers in the Internet; the installation of OC-48 (2.5 Gbps) links in the US NGI (Next Generation Internet) and OC-192 (10 Gbps) links in Canada's CA*net3; the development of 64-bit systems with the IA-64 architecture for future workstations and PCs; the tremendous efforts by the telecom and cable communities to break down the "last mile to home" barrier; and so on. Taking advantage of these technology improvements, the information infrastructure is reaching a new performance plateau where networked multimedia service can become a reality. This scenario will generate tremendous opportunities and cultural impacts on our society. Consider the future of networked digital broadcast as an example. From the perspective of network and cable providers, it creates opportunities to provide Video on Demand (VoD) services using unicast, Pay Per View services using multicast, and more broadcast channels using digital compression algorithms. From the perspective of content providers, it creates opportunities to deliver their products to a gigantic global consumer market across geographical boundaries. From the perspective of end users, it creates opportunities to gain access to more sources and to have more choices.

1.1 Identifying the problem domain

With the rapid growth of the Internet, intranets and other IP-related technologies, we believe that a majority of video applications in the near future will be implemented on personal computers and inter-connected devices using traditional "best-effort" IP-based networking technologies. "Best-effort" QoS implies that the link bandwidth, packet loss ratio, end-to-end packet delay and jitter can vary widely depending on the network traffic conditions. An IP network provides heterogeneous connectivity to hosts and LANs on different platforms, and performs best in non-real-time data (such as Web pages) transport services.
An IP network is stateless, which reduces the burden on network routers to store and manage state information; it uses no per-flow signaling, which makes efficient use of available bandwidth; and it gives no QoS guarantee, which degrades or limits QoS-sensitive applications. One can characterize the IP network as noisy, lossy and bursty:

• Noisy: The network is exposed to interference, link breakdowns and node crashes.
• Lossy: Under network traffic congestion, packets are discarded and the end-to-end delay builds up.
• Bursty: Recent studies have shown that Long Range Dependency exists in many types of network traffic [22, 23].

IP networks are heterogeneous and carry a rapidly increasing number of services in addition to traditional data. On the other hand, MPEG-2 video [45] bit-streams can be characterized as thin, lean, and alive:

• Thin: Intra-picture spatial redundancy is removed.
• Lean: Inter-picture temporal redundancy is removed.
• Alive: MPEG-2 encoded data is a non-stationary real-time bit-stream demanding timely and continuous delivery.

Transporting real-time MPEG-2 video requires Quality of Service (QoS) assurance, i.e. link bandwidth, packet loss ratio, end-to-end delay and delay variation, from the underlying network. Providing QoS guarantees for real-time traffic in the Internet is an active area of research. The Long Range Dependence observed in VBR digital video and network traffic further complicates the search for statistical and stochastic solutions to traffic modeling, management, and network queuing / performance evaluation [48].

[Figure 1-1: IP Network Layer Impairments - time-stamped MPEG-2 frames leave the server host, pass through an IP router loaded by a traffic generator, and arrive at the client host corrupted by packet loss, delay and jitter at the IP layer.]

Although packet loss, delay and jitter in an IP network are not critical to most TCP-related applications, they present major hurdles to real-time DAV (Digital Audio and Video) delivery and are termed "impairments", as illustrated in Figure 1-1. It is obvious that the transport of a real-time MPEG-2 encoded, thin, lean and alive digital video stream across the noisy, lossy and bursty "best-effort" IP network presents a unique problem domain. When studying DAV delivery over an IP network, an obvious question is how the IP network performance relates to DAV, or how impairments affect the reconstructed MPEG-2 video quality. There are many issues that need to be addressed and studied. For instance: How do data encapsulation and packetizing affect the video quality? How do IP network impairments (i.e. packet loss / delay / jitter) contribute to video quality degradation? How does an error propagate from one coding layer to another, from one picture to another, and from one protocol layer to another? How does MPEG-2 digital video behave when transported through a lossy and bursty environment? It is equally important to study what changes to IP networks are required to allow high-quality MPEG-2 delivery, and what modifications to end-systems will reduce the effects of network impairments.

1.2 Research contributions

In order to gain a better understanding of the characteristics of MPEG-2 video traffic in a best-effort IP network, an experimental video-on-demand (VoD) routed IP testbed network has been developed. The system consists of UBC's Continuous-Media File Server (CMFS) [6] as a video server, a PC-based router, and several client machines, which are connected via switched Fast Ethernet.
Different load conditions for the IP-based network are generated using highly bursty background traffic sources. IP packet measurements include sender and receiver time measurements, inter-arrival times, and packet loss under moderately to heavily loaded network conditions. At the application level, video quality is measured objectively using the Peak Signal-to-Noise Ratio (PSNR) parameter and subjectively using the Mean Opinion Score (MOS). We have examined the correlation between IP packet loss and video quality. In our study, we show that MPEG-2 video quality may be expressed as a function of packet loss, allowing for statistical analysis.

Previous research activities have focused on transporting MPEG-2 video over ATM networks. An IP network differs from an ATM network in many ways: an IP network is stateless, without per-flow signaling or QoS guarantees in the "best-effort" service model, and with only limited CoS/QoS guarantees in the DiffServ (Differentiated Services) model. Variable-size IP packets (20-byte or greater header, maximum 65,535 bytes) can be considerably larger than fixed-size ATM cells (5-byte header, 53 bytes in total). Our project has intensively studied and gained experience with the following three unknown areas in order to achieve better solutions for transporting MPEG-2 encoded real-time video over impaired IP networks.

• What is the "look and feel" of MPEG-2 digital video over impaired IP networks? What parameters and tools should be used for evaluation? All of us are familiar with the linear relationship between analog video quality and the channel BER. In contrast, the relationship between MPEG-2 digital video quality and IP packet loss, as well as slice and picture loss, is not well studied or understood today.

• Where do transport errors occur? How does an error propagate from one coding layer to another, from one picture to another, and from one protocol layer to another? What is its quantitative severity?

• What are the quantitative and statistical relationships between MPEG-2 video quality and IP network impairments? This quantitative and statistical information is crucial in designing effective and efficient techniques, such as FEC, to combat video errors resulting from network impairment. Besides being ineffective and inefficient, a blindly implemented scheme might generate so much overhead that the situation is worsened.

Results and observations gained from this work have proven interesting and useful. We are able to describe the behavior shown by MPEG-2 digital video when it is transported across a lossy IP network. We now have a better understanding of the QoS requirements for delivering MPEG-2 video applications in an IP network. Our study also provides insights for the development of adaptive algorithms and applications that are more robust and usable as an adjunct to more robust network protocols to combat network impairments.

1.3 Research objectives

The objectives set for the MSDSI research project are:

• To understand in detail how competing traffic on an IP network affects delivery of high-quality MPEG-2 video.
• To develop test metrics and test procedures for diagnosing problems with MPEG-2 delivery over IP networks.
• To determine effective methods for delivering MPEG-2 video in impaired IP networks.
• To develop improved network delivery protocols based on the outcome of the study.
• To develop adaptive applications as an adjunct to more robust network protocols.
• To extend our understanding of IP over non-broadcast multi-access (NBMA) networks such as ATM, and heterogeneous IP networks in general.

Under the framework of the above research objectives, the goals of this thesis are focused on:

• To investigate and quantify the effects of IP packet loss on reconstructed MPEG-2 video in moderately to heavily loaded IP networks.
• To study MPEG-2 transport errors, error-affected video quality, error propagation and error sources, and to relate particular MPEG-2 errors to particular IP network layer impairments.
• To develop test metrics and test procedures for diagnosing problems with MPEG-2 video delivery over IP networks.
• To assist in developing adaptive applications, as an adjunct to more robust network protocols, which are able to gracefully handle data loss and degradation of throughput while attempting to satisfy user requirements.

1.4 Outlines of the thesis

The structure of the rest of this thesis is as follows. Chapter 2 introduces the topology and equipment of our routed IP testbed network, where the research experiments are conducted. The CMFS (Continuous Media File Server), the IP network performance measurement programs, the MPEG-2 CODEC and video quality assessment programs, as well as the router configuration used to study Per-Hop Behavior (PHB) for the DiffServ model, are also briefly discussed.

Chapter 3 discusses the IP network performance and impairment metrics proposed by the IETF IPPM Working Group. Packet loss impairment is the focus of our experiments and therefore deserves more detailed study. In Section 3.3, Project Developments, the measurement program and experimental procedures for network performance are introduced.

Chapter 4 navigates through the MPEG-2 technologies, the MPEG-2 coding model, the bit stream hierarchy, the VBV buffering scheme, source modeling and CODEC properties, to provide the technology background necessary for further discussions. As outlined in Section 4.8, Project Development, the focus of this chapter is to explain the processes of test stream selection, evaluation and implementation, as well as the underlying theoretical reasoning.

Chapter 5 addresses the issues of methodology and architecture for objective measurement and subjective assessment of video quality. The fundamental differences between digital and analog video systems, as well as the distinct digital video impairments, are also studied. Section 5.4, Project Development, introduces the detailed implementations.

Chapter 6 studies transport issues in delivering real-time MPEG-2 video over IP networks. The general protocol architectures for MPEG-2 video transport and the protocol stack applied in the project are outlined. Transport errors, error propagation, error sources, and the relationship of particular MPEG-2 errors to IP network layer impairments are studied. Lastly, a comprehensive survey of error detection, prevention and concealment techniques is provided.

Chapter 7 describes the experimental environment and procedures. Based on numerical analysis of the experimental data, interesting results and their implications are obtained and discussed.

Chapter 8 provides conclusions for the research results and a summary of the project.

The thesis is organized with the following underlying plan. This thesis contains 8 chapters. From Chapter 2 to Chapter 6, each chapter addresses a major technology building block of the research:

1. IP testbed network development.
2. IP network performance metrics.
3. MPEG-2 test stream selection and implementation.
4. Digital video impairment and quality assessment.
5. Transport issues on delivering MPEG-2 video over IP networks.

In each chapter, the current technologies and methodologies are studied and discussed. Based on these prior studies, the project research and implementation are described in the Project Development section. In Chapter 7, the studies and developments of the previous chapters are combined and the technology building blocks are assembled to carry out the system-level experiments.

2 Architecture of the routed IP testbed network

To study the issues outlined in the Project Objectives, a routed IP testbed network, as illustrated in Figure 2-1, was developed to transport real-time MPEG-2 video streams from a video server to a client via an IP router which is moderately to heavily loaded by competing network traffic. The reasons for preferring a real prototype testbed network to the use of network simulation tools are:

1. IP network and Internet modeling is still an active area of research; therefore, how well a specific network simulation model can represent real-world IP networks is not yet clear [22, 23, 24].
2. We need to evaluate the IP network performance metrics and the MPEG-2 video implementation tools in a real network.
3. We need to evaluate visual artifacts and degradation caused by IP network impairments.
4. The UBC Continuous Media File Server (CMFS), developed in the Phase II CITR project, is readily available to be deployed as a high-quality server for MPEG-2 video stream retrieval.

One might suggest that the experiments could be done by transporting encapsulated MPEG-2 video streams through the real Internet, encompassing multiple domains [4, 8]. The real Internet is a very dynamic and fluid IP network; besides the bursty (fractal) traffic characteristics of traditional data, the types of services it carries are increasing rapidly. In our study, in order to characterize the effects of IP network impairment on MPEG-2 video, it is necessary to develop an IP network where the competing traffic load and other network parameters are controllable. Considering the "tail drop" packet discard strategy implemented in the majority of current IP backbone routers, it is our view that, after augmenting the number of routing nodes in the testbed network, results obtained from the testbed will scale to large IP networks.

As depicted in Figure 2-1, all the computers are configured to form three IP subnets with three separate VLANs. Subnet 01 contains the CMFS video server, hosts 01 and 02 as competing IP traffic generator hosts, the CS Department LAN connection, and the subnet-to-IP-router connection, using five Fast Ethernet ports. Subnet 02 contains the video receiver client host, another traffic generator host 03, and the subnet-to-router connection, using three Fast Ethernet ports. Subnet 03 consists of a traffic sink host 04 and the subnet-to-router connection, using two Fast Ethernet ports. The IP router inter-connects the three IP subnets via 100 Mbps Fast Ethernet cards and forwards IP packets from sources to destinations according to its routing table. The paths of the MPEG-2 video traffic and the competing background traffic are illustrated using dashed arrow lines in Figure 2-1. MPEG-2 video is retrieved from the CMFS, packetized and encapsulated in IP packets, passes through the Subnet 01 switch to the IP router, and is then forwarded to the Subnet 02 video client via the Subnet 02 switch. Similarly, competing background traffic A or B leaves host 01 or 03 and is routed to the traffic sink host 04.
A stand-alone host 05 running Windows 95 is used to construct the MOS (Mean Opinion Score) evaluation center.

[Figure 2-1: Routed IP Testbed Network Topology - the CMFS video server and traffic generator hosts 01/02 on Subnet 01, the video client and traffic generator host 03 on Subnet 02, and the traffic sink host 04 on Subnet 03, inter-connected by switches and the IP router; MPEG-2 video traffic flows from the server to the client while background traffic A and B flow to host 04; the stand-alone host 05 forms the MOS evaluation center.]

2.1 Testbed network hardware and operating system

The CMFS video server is an HP Vectra 200 MHz Pentium Pro PC with a 2 GB SCSI-II Fast/Ultra hard drive. The video client is an HP Vectra 200 MHz Pentium MMX PC. The hosts are all HP Vectra 200 MHz Pentium PCs. All computers run the FreeBSD 3.0 Unix operating system. The router kernel has been modified to run K. Cho's "Alternate Queuing" strategies [5]. The 100 Mbps Fast Ethernet switch is a 3Com SuperStack 3000 10/100 Mbps unit. The switch supports VLANs, allowing one switch to serve multiple subnets for routing tests, and runs a proprietary operating system with SNMP capability. The MOS evaluation center is constructed using a Pentium PC with a SONY Trinitron Multiscan 17SE monitor. One of the peripheral slots holds a REALmagic Hollywood2 MPEG-2 real-time hardware decoder. This machine runs Windows 95 for ease of hardware card configuration.

2.2 The CMFS video server and client

The UBC Continuous Media File Server (CMFS) [6] video server and client were designed and implemented in the previous Phase II CITR project. MPEG-2 audio and video data streams are stored in continuous media files. Data access patterns, as well as the services provided to the client by the CMFS server, differ considerably from those of a conventional distributed file server such as NFS. A continuous media client typically transfers large volumes of sequential data. As well, the resource requirements of the network and the server itself differ considerably. In order to guarantee continuity, the allocation of network resources such as bandwidth must be guaranteed. Similarly, the availability of resources at the server, such as processor cycles, RAM, and disk bandwidth, must be guaranteed to properly service the client [6].

In the CMFS, MPEG-2 video stream retrieval is a multi-step interaction. The video stream is first stored as a presentation object, and each object has a unique object identifier. When requiring a video service, the client communicates with a database server to retrieve the identification, resource requirements and associated meta-data using TCP. The client then requests the CMFS to open a real-time stream connection using MT/UDP and to ready the object for display. The client also specifies the direction of data flow.

2.3 Testbed network special program tools

Various program tools were developed, compiled and installed on the testbed. Programs are written in C or Perl script.

Network performance metric tools: a data-logging video client; an implementation of the IETF RTP payload format for MPEG-1 / MPEG-2 video [76]; an IP packet loss and jitter measurement program.

MPEG-2 video implementation tools: a modified TM5 MPEG-2 software codec [46, 47]; MPEG-2 TS and ES bit stream analysis tools; MPEG-2 video error detection and concealment programs; an objective video quality measurement program.
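The objective quality tool compares each reconstructed picture against its original. As a minimal sketch of the kind of computation involved (not the project's actual program; the frame dimensions and buffer layout are illustrative assumptions), the PSNR of an 8-bit luminance picture follows directly from the mean squared error:

    #include <math.h>

    /* PSNR of an 8-bit picture: psnr = 10 * log10(255^2 / MSE).
     * 'ref' is the original picture, 'dec' the reconstructed one,
     * both stored as w*h luminance samples (illustrative layout). */
    double psnr_8bit(const unsigned char *ref, const unsigned char *dec,
                     int w, int h)
    {
        double mse = 0.0;
        for (int i = 0; i < w * h; i++) {
            double d = (double)ref[i] - (double)dec[i];
            mse += d * d;
        }
        mse /= (double)(w * h);
        if (mse == 0.0)
            return 99.0;               /* identical pictures: clamp */
        return 10.0 * log10(255.0 * 255.0 / mse);
    }

In practice such a tool must also handle temporal realignment (Section 5.4.5), since a lost frame shifts every subsequent comparison by one picture.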
2.4 The future differentiated services (DiffServ) Internet

Using RSVP [19, 20, 21] as a mechanism, the Integrated Services (IntServ) model provides, in addition to the traditional elastic services, real-time services, such as MPEG-2 video delivery, to users of the Internet [14]. The IntServ model uses per-flow resource reservation and management as its QoS provisioning methodology. The implication is that every node in the Internet implements the same packet handling strategy, which includes packet scheduling, admission control, traffic classification, setup configuration, allocation policies, security, monitoring and verification. Various doubts have been raised about the IntServ framework over the issues of complexity, scalability and its administrative model [15].

Differentiated Services (DiffServ) is a new service model recently proposed by the IETF [11, 12, 13]. DiffServ aims at enabling commercial service providers to offer various levels of QoS to customers with a highly scalable (allowing organic growth) but relatively simple solution based on the infrastructure of the current Internet. The provisioning of QoS is achieved using two mechanisms. Firstly, intra-domain CoS/QoS is secured by allocating adequate bandwidth to back up the levels of service. Edge routers are responsible for classifying and metering user traffic; a meter of the kind used at such edge nodes is sketched below. As a result, the complexities of flow classification, contracting, monitoring and policing are pushed to the ingress nodes, leaving the interior nodes free of keeping track of per-connection state. Secondly, inter-domain CoS/QoS control is achieved through bilateral agreements, with traffic metering at the domain boundary enforcing the established agreement.

To study DiffServ, the router kernel needs to be modified to incorporate the PHB (per-hop behavior) [13] strategies proposed by the IETF DiffServ Working Group. The IP router on our testbed network currently runs FreeBSD 2.2.5, which is strictly "best effort", utilizing FIFO output queues on each network interface. When aggregated traffic in excess of an output link's capacity results in queue saturation, a simple tail-drop policy is invoked to discard incoming packets. To implement DiffServ on the testbed network, the Alternate Queuing framework developed by Cho [5] will be employed to provide the more sophisticated queuing disciplines (e.g. class-based queuing) required by PHB. The implementation of DiffServ in the Internet also raises the issue of the importance of adaptive applications that can gracefully handle degradation in QoS.
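As an illustration of the metering function performed at DiffServ edge routers, the following is a minimal token-bucket sketch: packets within the contracted rate are passed as in-profile, the rest are marked out-of-profile (to be remarked or dropped per policy). The rate, bucket depth and clock source are illustrative assumptions, not parameters of the thesis testbed.

    #include <stdbool.h>

    /* Token-bucket meter: 'rate' tokens (bytes) accrue per second,
     * up to 'depth'. A packet is in-profile if enough tokens exist. */
    struct tb_meter {
        double tokens;    /* current bucket fill, bytes        */
        double rate;      /* contracted rate, bytes per second */
        double depth;     /* bucket depth (burst size), bytes  */
        double last;      /* time of last update, seconds      */
    };

    bool tb_packet_in_profile(struct tb_meter *m, int pkt_len, double now)
    {
        m->tokens += (now - m->last) * m->rate;   /* accrue tokens */
        if (m->tokens > m->depth)
            m->tokens = m->depth;                 /* cap at burst depth */
        m->last = now;

        if (m->tokens >= pkt_len) {               /* in-profile: consume */
            m->tokens -= pkt_len;
            return true;
        }
        return false;        /* out-of-profile: mark, remap or drop */
    }

The depth parameter is what tolerates bursts: a deep bucket admits short bursts above the contracted rate, while the long-term average remains bounded by the rate.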
3 IP network performance metrics and the fractal traffic model

A large heterogeneous IP network such as the Internet is complex, dynamic and diverse. The difficulties in measuring network performance metrics lie in the tremendously varying nature of the network and in the correctness of the measurement methodology. For instance, many variables exist in a large IP network: changes in host and link counts, increases in traffic and variance in traffic patterns, rapid increases in service types, the possibility of multiple routing paths and multicast traffic, various IP packet lengths and fragmented packets, and so on. On the other hand, attention needs to be paid to the measurement methodologies. For example, whenever possible, passive monitoring should be used, because an intrusive monitoring approach might interfere with the network, which in turn affects the test result. As another example, to overcome the extra delay introduced by measurement overhead competing for host resources, another host running a packet filter / sniffer program is recommended to carry out the measurement tasks.

Network modeling, analysis, engineering and management demand accurate measurement of the network performance and impairment metrics. The BMWG (Benchmarking Methodology Working Group), the IPPM (IP Performance Metrics Working Group) and the RTFM (Real-time Traffic Flow Measurement Working Group) are working to establish standardized measurement metrics and methodologies.

3.1 Definition and methodology for One-way Packet Loss metric

The IETF IPPM has defined several IP network performance and impairment metrics under the paradigm of the "Framework for IP Performance Metrics" (RFC 2330) [30]. Detailed definitions and methodologies for each metric are available in the following documents: One-way Packet Loss Metric for IPPM [31], One-way Delay Metric for IPPM [32], Instantaneous Packet Delay Variation Metric for IPPM [33], Empirical Bulk Transfer Capacity [34], and Round-trip Delay Metric for IPPM [35]. When congestion occurs, large amounts of IP packet loss can result in devastating degradation of video quality; therefore the One-way Packet Loss Metric is of the most interest for the project. According to the definition specified by the IPPM, one-way packet loss means that:

1. A packet is discarded (will never arrive).
2. A packet arrived but is corrupted.
3. A fragmented packet cannot be re-assembled at its destination.
4. A packet failed to arrive within a specified time threshold.

For real-time MPEG-2 video traffic, a packet arriving later than the decoding time of the associated frame is also treated as a loss. There are three sources of packet loss measurement error:

1. Synchronization between the clocks at source and destination.
2. The packet-loss threshold, which is related to the synchronization between clocks.
3. Resource limits in the network interface or software of the receiving instrument.

The first two sources are interrelated and could result in a packet arriving with a delay lower than the specified threshold being reported as a loss. Therefore, the loss threshold must be set accurately, and the clocks in the video server and client must be synchronized well enough, so that a packet that arrives on time is rarely counted as lost and discarded. In addition, the video server, switches and client in the path of the video packet stream should be checked to ensure that the probability of a packet arriving at the network interface but being lost due to congestion on the interface, or other resource exhaustion on those hosts, is low. A minimal receiver-side loss classifier along these lines is sketched below.
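To make the definition concrete, the following sketch classifies received packets against a sender sequence number and a one-way delay threshold, counting late arrivals as losses, as the metric requires for real-time MPEG-2 traffic. It assumes synchronized sender/receiver clocks and in-order sequence numbering (duplicates and reordering are ignored for brevity); all names are illustrative.

    /* Receiver-side one-way loss accounting (illustrative sketch). */
    struct loss_counter {
        unsigned next_seq;   /* next sequence number expected      */
        unsigned lost;       /* discarded or never-arrived packets */
        unsigned late;       /* arrived after the delay threshold  */
        unsigned received;   /* arrived on time                    */
    };

    void on_packet(struct loss_counter *c, unsigned seq,
                   double send_time, double recv_time, double threshold)
    {
        if (seq > c->next_seq)          /* gap: packets that never arrived */
            c->lost += seq - c->next_seq;
        c->next_seq = seq + 1;

        if (recv_time - send_time > threshold)
            c->late++;    /* later than the decode deadline: a loss */
        else
            c->received++;
    }

The threshold term is exactly where the clock-synchronization error sources above enter: if the receiver clock runs fast, on-time packets spill into the 'late' bucket.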
3.2 Fractal model for Internet traffic

The arrival statistics in traditional network modeling use the Poisson process. Two assumptions are made:

1. The numbers of arrivals that occur in disjoint time intervals are independent.
2. The number of arrivals in any interval of length t is Poisson distributed with parameter nt. The average number of arrivals within an interval t is also nt, where n is the arrival rate.

The model has a memoryless property, and therefore Markov chain theory applies. Recent studies and actual network measurements have discovered that:

1. Packet arrivals are not Poisson.
2. Most connection arrivals are not Poisson.
3. Connection sizes are log-normal, not exponential.
4. Packet arrivals are correlated.
5. Connection arrivals are correlated.

Poisson traffic theory has thus been proven to fail in modeling real-world traffic. According to the research in [22, 23, 24], the fractal model, which can be defined as a self-similar process with the long-range dependence characteristic, is more accurate. The process of proving the failure of the Poisson traffic model and establishing the fractal traffic model also emphasizes that measurements always need to be calibrated and self-consistent. The methodology applied must be unbiased and reflect the "wire" facts.

3.3 Project developments

After the testbed network had been fine-tuned, the procedures for network performance and impairment metric measurement and the test programs were specified and developed as follows:

• The CMFS server and client were integrated into the testbed network. MPEG-2 video test streams are encapsulated, prepared using the CMFS API [78] and stored on the server disk. (Refer to Section 7.1 for more details on packet encapsulation.)

• The Netperf benchmark [38] program was selected to generate bursty competing traffic in the testbed network. The program allows users to select TCP or UDP bursts, the burst packet count, the packet length and the gap length between bursts. (A minimal burst source illustrating this traffic pattern is sketched after this list.)

• In a high-speed networking environment, the metric measurement overhead is normally large enough to prevent real-time measurement. As a solution, at the server host, an IP packet sniffer program was created using the TCPDUMP Unix utility [39] to quickly capture packets and log the pertinent information. The filter of the program is set to catch the video IP traffic sent from the server to the client. This way, information on the original video IP packet stream is logged and time-stamped for later off-line analysis. Another advantage of this method is that the test is done passively, therefore avoiding interference with the DUT (Device Under Test).

• Another sniffer / filter program was created at the video client host to retain the information of the IP packets in the arriving video stream.

All programs and implementations were tested and verified to ensure the correctness of the measurements and processes.
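For illustration, a bursty UDP source of the kind Netperf was configured to produce - bursts of fixed-length packets separated by idle gaps - could look as follows. This is not Netperf itself; the socket setup, packet sizes and timings are placeholder assumptions, and error handling is omitted.

    #include <string.h>
    #include <unistd.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <sys/socket.h>

    /* Send 'nbursts' bursts of 'count' UDP packets of 'len' bytes
     * (len <= 1500 here), sleeping 'gap_us' microseconds between
     * bursts (illustrative traffic-generator sketch). */
    void udp_burst_source(const char *dst_ip, int dst_port,
                          int nbursts, int count, int len, int gap_us)
    {
        char buf[1500];
        struct sockaddr_in dst;
        int s = socket(AF_INET, SOCK_DGRAM, 0);

        memset(&dst, 0, sizeof dst);
        dst.sin_family = AF_INET;
        dst.sin_port = htons(dst_port);
        inet_pton(AF_INET, dst_ip, &dst.sin_addr);
        memset(buf, 0, sizeof buf);

        for (int b = 0; b < nbursts; b++) {
            for (int i = 0; i < count; i++)    /* back-to-back burst */
                sendto(s, buf, len, 0,
                       (struct sockaddr *)&dst, sizeof dst);
            usleep(gap_us);                    /* idle gap */
        }
        close(s);
    }

Varying the burst count and gap length is what moves the router between the moderately and heavily loaded regimes used in the experiments.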
Compliance of digital video with international standards based on ISO/MPEG specifications is very important for several reasons. Firstly, almost all of the local and 23 regional standard bodies, (e.g., D V B , FCC, JISC, ATSC, SCTE, D V D , and D A VIC,) require MPEG-2 system compliance. Secondly, Integrated Circuit development relies on a standardized mass market. Lastly, MPEG-2 standard complied applications enjoy the widest interoperability in digital video and audio applications and services. M P E G ISO/IEC/JTC1 SC29/WG11 E Moving Picture experts Group MPEG-1 ISO/TEC 11172 ISO/TEC 11172-1 ISO/IEC 11172-2 ISO/TEC 11172-3 ISO/IEC 11172-4 ISO/IEC 11172-5 Information technology-Coding of moving picture and associated audio for digital storage media up to about 1.5Mbit/s. Systems. A syntax for packet transporting and synchronizing video and audio streams. Video. Syntax and semantics. Audio. Syntax and semantics. Conformance and test guidelines. Software Simulation. MPEG-2 ISO/IEC 13818 ISO/TEC 13818-1 ISO/IEC 13818-2 ISO/TEC 13818-3 Information technology-Generic coding of moving picture and associated audio. Systems. A syntax for packet transporting and synchronizing video and audio streams. Video. Syntax and semantics. Audio. Syntax and semantics. 24 ISO/TEC 13818-4 ISO/TEC 13818-5 ISO/TEC 13818-6 ISO/TEC 13818-7 ISO/JEC 13818-8 ISO/TEC 13818-9 MPEG-4 ISO/TEC (11,98) Conformance and test guidelines. Software Simulation. Digital Storage Medium Command and Control (DMS-CC). Non-Backwards Compatible Audio (NBC). 10-bit Video Extension. Real-time Interface (RTI). "Generic coding for very low bit-rate video/audio application." JPEG ISO/TEC/JTCl SC29AVG10 Joint Photographic Experts Group JPEG-1 ISO/TEC 10918-1 ISO/TEC 10918-2 ISO/TEC 10918-3 Information technology-Digital compression and coding of continuous-tone still images: Requirements and guidline,1994. Compliance Testing. Extensions. ITU-T H.261 Video codec for audiovisual services at p x 64 Kbps. ITU-T H.263 Experts group on very low bit-rate video telephony. Digital H D T V Grand Alliance, FCC. 25 A properly assembled and fully MPEG-2 standard compliant stream contains all the information necessary to allow MPEG-2 compliant decoder to successfully recover the original video sequence. In this idealized situation, only constrained quantization and motion estimation error occurs. Normally, an MPEG-2 stream contains properly encoded ISO/TEC 13818-1 System information, ISO/TEC 13818-2 Video and ISO/TEC 13818-3 Audio information as well as the bit rate and buffer control mechanism. The System layer includes syntax for stream multiplexing and video and audio synchronizing. Real-time semantics of video signal are embedded in the syntax by means of SCR, PCR, DTS and PTS timestamps. Encoded MPEG-2 streams are "tightly" packed and are therefore very sensitive to error. They also suffer from the error propagation problems. 4.2 M P E G - 2 coding model 4.2.1 Uncompressed data rates of digitized T V and video signals There are many standards in digitizing T V and video signals. Resolutions and data rates between standards vary significantly. To avoid confusion, some of the most popular standards are listed in the following paragraphs. The data rate of each standard is also calculated. 1. The " D - l " or "CCIR 601" digital video: 270 Mbit/sec. Luminance (Y): 858 samples/line x 525 lines/frame x 30 frames/sec x 10 bits/sample ~= 135Mbit/sec. R - Y (Cr): 429 samples/line x 525 lines/frame x 30 frames/sec x 10 bits/sample ~= 68Mbit/sec. 
B-Y (Cb): 429 samples/line x 525 lines/frame x 30 frames/sec x 10 bits/sample ~= 68 Mbit/sec.
Total: 27 million samples/sec x 10 bits/sample ~= 270 Mbit/sec.

2. NTSC broadcast quality digital video: 250 Mbit/sec.
Luminance (Y): 720 samples/line x 525 lines/frame x 30 frames/sec x 10 bits/sample ~= 113 Mbit/sec.
R-Y (Cr): 429 samples/line x 525 lines/frame x 30 frames/sec x 10 bits/sample ~= 68 Mbit/sec.
B-Y (Cb): 429 samples/line x 525 lines/frame x 30 frames/sec x 10 bits/sample ~= 68 Mbit/sec.
Total: 25 million samples/sec x 10 bits/sample ~= 250 Mbit/sec.

3. MPEG-2 conformance digital video: 207 Mbit/sec.
Luminance (Y): 704 samples/line x 480 lines/frame x 30 frames/sec x 10 bits/sample ~= 104 Mbit/sec.
R-Y (Cr): 352 samples/line x 480 lines/frame x 30 frames/sec x 10 bits/sample ~= 52 Mbit/sec.
B-Y (Cb): 352 samples/line x 480 lines/frame x 30 frames/sec x 10 bits/sample ~= 52 Mbit/sec.
Total: 20.7 million samples/sec x 10 bits/sample ~= 207 Mbit/sec.

4. MPEG-2 Main Profile @ Main Level digital video with 4:2:0 chroma: 150 Mbit/sec.
Luminance (Y): 720 samples/line x 576 lines/frame x 30 frames/sec x 8 bits/sample ~= 100 Mbit/sec.
R-Y (Cr): 360 samples/line x 288 lines/frame x 30 frames/sec x 8 bits/sample ~= 25 Mbit/sec.
B-Y (Cb): 360 samples/line x 288 lines/frame x 30 frames/sec x 8 bits/sample ~= 25 Mbit/sec.
Total: 18.7 million samples/sec x 8 bits/sample ~= 150 Mbit/sec.

5. MPEG-2 CCIR-601 studio quality digital video with 4:2:2 chroma: 200 Mbit/sec.
Luminance (Y): 720 samples/line x 576 lines/frame x 30 frames/sec x 8 bits/sample ~= 100 Mbit/sec.
R-Y (Cr): 360 samples/line x 576 lines/frame x 30 frames/sec x 8 bits/sample ~= 50 Mbit/sec.
B-Y (Cb): 360 samples/line x 576 lines/frame x 30 frames/sec x 8 bits/sample ~= 50 Mbit/sec.
Total: 25 million samples/sec x 8 bits/sample ~= 200 Mbit/sec.

6. SIF (Source Input Format) digital video: 30 Mbit/sec.
Luminance (Y): 352 samples/line x 240 lines/frame x 30 frames/sec x 8 bits/sample ~= 20 Mbit/sec.
R-Y (Cr): 176 samples/line x 120 lines/frame x 30 frames/sec x 8 bits/sample ~= 5 Mbit/sec.
B-Y (Cb): 176 samples/line x 120 lines/frame x 30 frames/sec x 8 bits/sample ~= 5 Mbit/sec.
Total: 3.8 million samples/sec x 8 bits/sample ~= 30 Mbit/sec.

7. CIF (Common Intermediate Format) digital video: 37 Mbit/sec.
Luminance (Y): 352 samples/line x 288 lines/frame x 30 frames/sec x 8 bits/sample ~= 25 Mbit/sec.
R-Y (Cr): 176 samples/line x 144 lines/frame x 30 frames/sec x 8 bits/sample ~= 6 Mbit/sec.
B-Y (Cb): 176 samples/line x 144 lines/frame x 30 frames/sec x 8 bits/sample ~= 6 Mbit/sec.
Total: 4.6 million samples/sec x 8 bits/sample ~= 37 Mbit/sec.
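Each figure above is simply samples/line x lines/frame x frames/sec x bits/sample, summed over the Y, Cr and Cb components. A small routine reproducing, for example, the CCIR 601 figure (a sketch; the helper name is ours):

    #include <stdio.h>

    /* Uncompressed rate of one video component in Mbit/sec. */
    double component_mbps(int samples_per_line, int lines, int fps, int bits)
    {
        return (double)samples_per_line * lines * fps * bits / 1e6;
    }

    int main(void)
    {
        /* "D-1" / CCIR 601: 10-bit 4:2:2 sampling at 30 frames/sec. */
        double y  = component_mbps(858, 525, 30, 10);  /* ~135 Mbit/sec */
        double cr = component_mbps(429, 525, 30, 10);  /*  ~68 Mbit/sec */
        double cb = component_mbps(429, 525, 30, 10);  /*  ~68 Mbit/sec */
        printf("CCIR 601 total: %.0f Mbit/sec\n", y + cr + cb); /* ~270 */
        return 0;
    }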
Therefore, effective and efficient video data compression techniques are important in practical digital video applications. MPEG-2 video data compression techniques can be categorized into three categories: Predictive Coding, Transform Coding and HVS (Human Visual System) Weighting. Predictive Coding exploits redundancies in video data by analyzing the predictability, the randomness and the smoothness of the data. V L C (variable length code) entropy coding [50], D P C M (differential pulse code modulation) [50] and M C (motion compensation) [51] belong to this category. Transform Coding achieves data compression by transforming the image into another array such that the information of the signal/picture is packed into only a few coefficients. The 2D DCT (discrete cosine transform) [54] is an orthogonal transform with two important characteristics. First, the DCT coefficients have been proven to be relatively non-correlated, and this makes it possible to construct relatively simple algorithms for compressing the coefficient values. Second, DCT is a process for decomposing the data into underlying spatial frequencies which in turn allows the precision of the DCT coefficients to be reduced in a manner consistent with the properties of the human visual system. M P E G applies 2D DCT to 8x8 block of pixels to achieve de-correlation and 29 energy compaction. The DCT transform itself is a lossless process. It is the quantization process applied to the DCT coefficients that actually removes the spatial redundancy from the image. HVS Weighting studies human visual system characteristics. Because human are the final users of video services, studies have provided an understanding on how to selectively discard data and information which is insensitive to human eye or has minimum effects on subjective quality. Amongst various psychovisual characteristics of HVS, the understanding of "Resolving Spatial Detail" and "Perception of Motion" are the two most important areas that closely related to video data compression. With regard to the "Resolving Spatial Detail", researchers investigated such issues as differences in luminance and chrominance sensitivity of a human eye, varying precision requirements, frequency weighting and quantization, activity masking and facilitation. The studies in "Perception of Motion" unveiled such phenomenon as critical flicker frequency, spatial resolution of time-varying scenes, motion sensing, tracking of moving objects, and temporal masking. DCT coefficient quantization and vector quantization techniques in M P E G encoder take advantage of HVS phychovisual effects to reduce the data rate without sacrificing the subjective video quality. 4.2.3 M P E G coding model The predictive model and the DCT transform model are two coding models employed in M P E G video standard [50]. The predictive model is less complex but more sensitive to channel noise with low compression ratios. In contrast, the DCT transform model is more complex but less susceptible to error with high compression ratios. The two models can 30 coexist when encoding a picture thus forming a hybrid encoder and ultimately produce better performance. As illustrated in Figure 4-1, in the M P E G standard, video pictures are divided into three categories: I, P, and B pictures. I pictures are intra-coded without reference to neighboring pictures in the sequence. P pictures are coded with respect to the temporally closest preceding I or P picture. 
B pictures are coded with respect to the immediately adjacent I and P pictures. Group of Pictures 1 R R P It It P It R Frames 8 9 Forward predictive Bi-directional Prediction Figure 4-1 MPEG-2 Coding Model 31 Figure 4-2 depicts an MPEG-2 encoder example. The analog video sequence is sampled and converted to an appropriate format because MPEG-2 standard supports both progressive and interlaced formats. MPEG-2 encoder then compresses the digitized data to an MPEG-2 Elementary Stream according to the specified Scalability structure and Profile / Level option. In the encoding process, the bit rate control mechanism regulates the output bit stream to the bit rate that is offered by communication channel. The encoding procedure applies various image redundancy removal and vector quantization techniques to reduce both the intra-frame spatial redundancy and the inter-frame temporal redundancy. To further extract redundancy from the compressed data stream, the V L C entropy coding technique is utilized. The compression processes generate intra-coded I picture, inter-coded forward predicted P pictures and inter-coded bi-directional predicted B pictures. Source Data Pre-processing FDCT Qantizer MPEG-2 Compressed Stream Encoder Contro l ler Motion Estimator V L C Encoder Reconstmctioh Module Figure 4-2 MPEG-2 Example Encoder Diagram 32 4.3 MPEG-2 video hierarchy and bit stream structure The MPEG-2 video Elementary Stream (ES) is coded using a layered structure. Each layer consists of particular headers and flags. The layered structure provides flexibility in the encoder / decoder process where coding processes can be logically distinct and layers can be decoded systematically. MPEG-2_Transport_System (TS) bit stream is defined by the standard ISO/TEC 13818-1. Based on a 188 byte fixed-length packetizing approach, the Transport layer offers flexibility and advantages in multiplexing data that is related to several application to form a single bit stream. Dynamic capacity allocation, scalability, extensibility, robustness and cost effectiveness are other TS layer features available [45]. 4.3.1 M P E G video ES bit stream hierarchy The diagram in Figure 4-3 illustrates the MPEG-1 bit stream hierarchy and the diagram in Figure 4-4 illustrates the MPEG-2 bit stream structure based on MPEG-1 structure. Table 4-1 shows the functions of each coding layer. L A Y E R OF S Y N T A X FUNCTIONS Sequence layer Context: frame size rate, bit rate, Group of pictures layer Random access into the sequence Picture layer Frame coding unit: I,B,P type Slice layer Resynchronization unit Macroblock layer Motion compensation unit Block layer DCT unit Table 4-1 MPEG-2 Layer Function 33 Video Sequence Group of Picture-Slice Block 8X8 Pixels Macroblock Figure 4-3 The Layered MPEG-1 Video Structure 34 I Sequence Header GOP Header GOP l:\tci iMiin Profile/Level Indication Progressive and Interlaces Display Extension Scalable Extension Picture Header Pii lure I \tcnsion Slice Layer Macroblock Layer Block Layer Intra DC Precision Picture Structure Intra V L C Format Alternate Scan Quantization Matrix Picture Display Spatial Scalability Temporal Scalable Figure 4-4 MPEG-2 Video Hierarchy and Extension Function 35 4.3.2 M P E G TS bit stream hierarchy Figure 4-5,6,7,8 illustrate MPEG-2 TS data formats. MPEG-2 Transport Stream provides service multiplexing to multiple ES streams. No mechanism within the syntax can be used to ensure reliable delivery. It is not, itself an OSI Transport Layer. 
4.4 MPEG-2 VBV buffering scheme

The main assumption made in constructing the MPEG standard is that "the network is ideal, and each byte is transmitted with a constant delay." Under this assumption, the MPEG rate control mechanism uses an idealized video buffer verifier (VBV) decoding model to control the decoder buffer fullness, which in turn prevents the decoder buffer from overflowing and underflowing. An encoder buffer is filled in bursts as pictures are coded and emptied at a constant rate as the data are transmitted. A decoder buffer is filled at a constant rate as the transmitted data are received, and emptied in bursts as pictures are decoded. Under idealized conditions, an encoder buffer overflows when too much data is produced and underflows when not enough data is produced; a decoder buffer overflows when not enough data is removed and underflows when too much data is removed. Buffer occupancy has a different meaning in each case, but a full encoder buffer implies an empty decoder buffer.

Figure 4-9 is a typical diagram, used in many references, showing the dynamics of the decoder VBV buffer occupancy when receiving a CBR (constant bit rate) MPEG-2 video stream. The diagram plots decoder buffer occupancy against time; the ordinate is buffer occupancy and the abscissa is time, in units of encoder picture intervals. The connected constant-slope ramps show the filling of the buffer at a constant rate. The long delay between receipt of the first data and the start of decoding is needed in order to load most of the data for the first two pictures, in this case an I and a P picture; without this delay, a buffer underflow would occur. After being nearly emptied by the removal of the first I picture and P picture, the buffer gradually refills. At the start of the next GOP, the buffer occupancy is back to the level at the start of decoding. According to MPEG-2 [45],

    R(n) = D(n) / ((tau(n) - tau(n+1)) + (t(n+1) - t(n)))

where R(n) is the bit rate for the n-th picture, D(n) is the total bit count of the n-th picture, tau(n) is the vbv_delay parameter of the n-th picture, and t(n) is the time at which picture n is removed from the decoder VBV buffer.

Figure 4-9 VBV Buffer Dynamics (buffer occupancy versus time in picture intervals, from vbv_delay onward, bounded by vbv_buffer_size)

The data of a picture is removed from the VBV buffer at the pace specified by the standard. The vbv_delay parameter in every picture header is an indication of the buffer occupancy under the idealized network condition with constant packet delay. A bitstream that conforms to the MPEG-2 specification shall not cause the VBV buffer to overflow. When the low_delay parameter is zero, the bitstream shall not cause the VBV buffer to underflow; when low_delay is one, decoding a picture at the normally expected time might cause the VBV buffer to underflow.
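The idealized model can be sketched in a few lines: the decoder buffer fills at the constant channel rate and is drained in a burst at each picture removal time. The simulation below is our own toy illustration; the picture sizes, start-up delay and buffer bound are arbitrary example values, not normative figures.

    # Toy sketch of the idealized VBV model: constant-rate fill,
    # instantaneous drain at each picture removal time.
    R = 6_000_000                 # channel rate, bits/s
    frame_interval = 1.0 / 30.0   # NTSC picture interval
    picture_bits = [330_000, 110_000, 110_000, 285_000] * 4  # I B B P ...
    vbv_delay = 0.25              # start-up delay before first removal, s

    occupancy = R * vbv_delay     # bits buffered before decoding starts
    for n, bits in enumerate(picture_bits):
        if bits > occupancy:
            print(f"underflow at picture {n}")
            break
        occupancy -= bits                      # burst removal of picture n
        occupancy += R * frame_interval        # constant-rate refill
        occupancy = min(occupancy, 1_835_008)  # clamp at vbv_buffer_size
    print("final occupancy (bits):", int(occupancy))

Reducing vbv_delay or enlarging the picture sizes in this sketch quickly produces the underflow case that the conformance rules above are designed to exclude.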
In reality, IP and ATM networks have never been "ideal". Because of the statistical multiplexing queuing model and the networking protocols developed under this model, packet loss, excessive delay and delay jitter exist. The transport issue involves delivering and protecting MPEG-2 video over IP or ATM networks, and it is an area of active research. For example, in an IP network the IP packet traffic pattern, the decoder buffer size, the network delay jitter and the packet loss rate are inter-related; a complex stochastic model needs to be established in order to provide a theoretical foundation for modern network traffic engineering and management. Furthermore, to guarantee a continuous MPEG-2 video decoding process without interruption from buffer underflow or overflow, it is necessary for the decoder or the video client to de-jitter received streams.

4.5 Recent advancement in MPEG-2 video source modeling

Digital video source modeling involves analyzing and abstracting the statistical characteristics of the encoded video stream using non-stationary discrete stochastic processes. An accurate source model is important in estimating performance and allocating bandwidth for the underlying network. The source model can also be used to determine the parameter(s) which the network congestion control mechanism monitors. Conventional source modeling uses a layered approach: a digital video is abstracted into five layers: the bit stream layer, stripe layer, image layer, scene layer, and program layer. Models based on various stochastic processes are used to match each layer. For instance, MMPP (Markov Modulated Poisson Process) has been used to model the bit stream layer, DNSPP (Discrete Non-Stationary Periodic Process) the stripe layer, ARMA (Autoregressive Moving Average) the image layer, EMC (Embedded Markov Chain) the scene layer, and stationary statistics the program layer [53]. In general, conventional video source models are highly scene- and/or codec-specific.

Recent studies have moved away from the traditional approaches by exploiting more generic and "inherent" features from a wide variety of video sources. Based on an in-depth statistical study of a large collection of video sequences, Willinger et al. found that LRD (long-range dependence) is an inherent characteristic of VBR video traffic [48]. Pure Poisson or Poisson-related models such as Poisson-batch or Markov-modulated Poisson processes are SRD (short-range dependence) models. The ACF (autocorrelation function) of video traffic decays exponentially in SRD models, rho(k) = e^(-beta*k), but only by a much slower power law in LRD models, rho(k) = k^(-beta). This implies that the performance of queuing systems under an LRD model can be drastically different from the performance predicted by traditional SRD models. Aggregated LRD streams are found not to present the "smoothing" effect of aggregated SRD streams. Willinger et al. proposed an algorithm for generating synthetic traffic traces with the LRD property; its computational complexity is O(N^2).
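The practical difference between the two decay laws is easy to see numerically. The snippet below (our own illustration) evaluates both autocorrelation functions for the same beta:

    # Exponential (SRD) versus power-law (LRD) autocorrelation decay.
    import numpy as np

    beta = 0.5
    k = np.array([1, 10, 100, 1000])
    srd = np.exp(-beta * k)   # rho(k) = e^(-beta k)
    lrd = k ** (-beta)        # rho(k) = k^(-beta)
    for lag, a, b in zip(k, srd, lrd):
        print(f"lag {lag:5d}: SRD {a:.2e}   LRD {b:.2e}")
    # At lag 100 the SRD correlation is ~2e-22 while the LRD correlation
    # is still 0.1 -- the long tail that changes queueing predictions.

This heavy correlation tail is precisely why buffer dimensioning based on SRD models can be optimistic for video traffic.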
While researchers generally agree on the importance of traffic correlation, they disagree on how much of it should be incorporated in a traffic model. The LRD modeling advocates argue that the LRD phenomenon has a significant impact on network engineering and must be accounted for in dimensioning network resources. On the other side, the Markovian modeling supporters, while acknowledging the presence of LRD, argue that for networks with finite buffers it is sufficient to incorporate correlation up to some finite lag that is proportional to the buffer size. Krunz et al. [49] indicated that the F-ARIMA (fractional autoregressive integrated moving average) LRD model often underestimates the correlations up to some lag and overestimates them beyond that lag. Their proposed M/G/infinity input process SRD model is found to be more accurate in predicting actual queuing performance than the F-ARIMA model.

4.6 MPEG-2 video application examples

The MPEG-2 standard has many applications. Besides its digital video and audio compression technology, its popularity stems from two other factors. The first is its generic character. The second is that MPEG-2 syntax and signaling allow applications to develop their own private sets of syntax and signaling, so private needs can be satisfied. The following are typical MPEG-2 video applications:

• Digital television standard (DTV). In a 1996 announcement, the FCC adopted the digital television standard (DTV) as the next-generation broadcast television standard. DTV employs MPEG-2 MP@ML video stream syntax for video coding and AC-3 (the Digital Audio Compression standard) for audio coding. The picture resolution is 720 x 480 pixels for NTSC at 30 frames/second.

• High definition television (HDTV or DTV-HD). HDTV is based on the MPEG-2 Main Profile / High Level upper-bound parameters. DTV high-definition (HD) has a resolution of approximately three times and two times that of conventional television in the horizontal and vertical dimensions, respectively, and an aspect ratio of 16:9. A typical DTV HD format uses 1920 x 1080, 30 Hz progressive or 60 Hz interlaced scan. The encoded video data rate can reach 10-45 Mbps.

• Digital Video Broadcast (DVB). DVB is the European digital TV standard; it is similar to its U.S. DTV counterpart in MPEG-2 parameters.

• Digital Versatile Disk (DVD). The Digital Versatile Disk is a high-density storage medium using 630-670 nm red or 417-425 nm blue laser technology, and it employs the MPEG-2 MP@ML VBR (variable bit rate) compression technique. At a bit rate of 3.5 Mbps, DVD provides studio quality movies.

• Digital Set-Top-Box. The TV digital set-top-box is the near-term entry point into the new array of information and entertainment services for the home, and the set-top-box industry is a fast-growing sector of consumer electronics. A single signal standard has yet to be determined, but there is industry consensus that no matter what signaling method emerges, the MPEG-2 decoding block will remain an essential component inside the boxes.

• Digital Satellite Broadcast (DSB). Almost all DSB services use MPEG-2 as the video compression scheme.
    Service          Alphastar  DSS-DirecTV  DSS-USSB  Echostar  Primestar
    Compression      MPEG-2     MPEG-2       MPEG-2    MPEG-2    Digicipher-1
    Video channels   90         25           144       96        81
    Audio channels   30         --           --        --        --
    Dish size        30"        18"          18"       18"       30"-36"
    Installation     Self       Self         Self      Self      Professional

Table 4-2 Video Formats of Digital Satellite Broadcast

4.7 MPEG-2 video CODEC and bit stream analysis tools

To evaluate and analyze MPEG-2 video streams, we use the software CODEC from MSSG (the MPEG Software Simulation Group) for the offline encoding and decoding process, the REALmagic Hollywood-2 hardware decoder for real-time video stream decoding and playback, and HP's MPEGscope Plus for MPEG-2 TS and ES bit stream analysis.

The software MPEG-2 CODEC, Mpeg2encode/Mpeg2decode version 1.2 [46,47], is provided by MSSG. It converts uncompressed video streams into MPEG-1 or MPEG-2 bitstreams (encoding), and vice versa (decoding). The CODEC comes with source code for ease of modification and implementation. Mpeg2encode is an MPEG-2 video encoder with the following features: it generates MPEG-2 / MPEG-1 CBR streams; applies the MPEG-2 TM5 (Test Model 5) rev. 2 encoder model; supports both progressive and interlaced video; accepts YUV and PPM input formats; and provides trace and statistics support. Mpeg2decode is an MPEG-2 decoder with the following features: it decodes non-scalable, spatially scalable, SNR scalable and data-partitioned MPEG-2 video streams; supports Simple, Main, SNR scalable and Spatially scalable Profile streams at all levels; decodes MPEG-2 / MPEG-1 video streams; provides multiple output formats; outputs decoding information; and provides robustness against stream syntax errors. The MSSG MPEG-2 CODEC was compiled and integrated into the testbed network running the FreeBSD operating system. The decoder Mpeg2decode was also modified to work with our other software tools and quality measurement programs.

To subjectively assess video quality, it is necessary to decode and display MPEG-2 video in real time. We use the REALmagic Hollywood-2 MPEG-2 / DVD hardware decoding card for this task. The decoder accepts MPEG-2 video with bit rates ranging from 500 Kb/s to 15 Mb/s, and it accepts various video resolutions and formats, e.g. 720x480x30 in 24-bit color, 352x240x30 in 24-bit color, and the DVD and VCD formats.

The HP MPEGscope Plus is a comprehensive tool for MPEG-2, DVB, and ATSC system development and verification. In-depth video elementary stream analysis is one of its full line of analysis, measurement and 90 Mb/s real-time test features. The Video Elementary Stream Compression Analyzer is capable of testing all aspects of the MPEG video encoding process, including decomposing the video elementary stream into individual Macroblocks, testing syntax and semantic protocol conformance, calculating bit rate and VBV buffer statistics, and analyzing the Macroblocks and motion vectors in each frame.

4.8 Project development

Based on this understanding of MPEG-2 technology and the available CODECs and data analysis tools, we can proceed to evaluate, select, and implement appropriate MPEG-2 video test streams for experiments. We collected a wide variety of bit streams from various sources. It takes multiple steps to evaluate, process, and sort out the final test streams.

• All candidate video streams are decoded using Mpeg2decode and stored in YUV format. Each decoded frame contains about 1.5 MB of data.

• All of the uncompressed video frames are re-encoded by Mpeg2encode using the same encoding parameters, specified in Table 4-3.
• Each re-encoded stream is tested for MPEG-2 syntax and semantic protocol compliance using the MPEGscope ES analyzer.

• Each of the re-encoded and verified video streams is fed to the MPEG-2 hardware decoder for real-time visual inspection to ensure a uniform encoding quality.

• Due to the available time frame and resources, it was not possible to use all of the prepared video streams for experiments. To ensure the generality of the study under this circumstance, it is important that the test streams cover the characteristics of most real-world MPEG-2 encoded videos. The streams we selected present wide differences in spatial activity and motion. Cheer, Ballet, and Susi are the streams chosen to represent high, medium, and low levels of motion and spatial activity, respectively. Video-H, video-M and video-L are used in place of Cheer, Ballet, and Susi in the rest of the thesis.

• Another important issue in video stream selection is the total real-time length of the clip. According to the video source modeling discussion in section 4.5, digital video can be abstracted as five layers: the bit stream layer, stripe layer, image layer, scene layer and program layer. An MPEG-2 video can be viewed as a stream consisting of many short sub-periods (scenes) with varying levels of motion and spatial detail, and the overall subjective video quality may be characterized by the single worst sub-period. For this reason, we use relatively short test streams, 10 seconds long, to emulate scenes.

    Input file format       0 (YUV)         Bit rate (Mbit/s)     6
    Total frames            300             VBV buffer size       112 (x 16 Kbit)
    First frame #           0               Profile ID            4 (Main)
    First frame timecode    00:00:00        Level ID              8 (Main)
    Frames in GOP           15              Chroma format         1 (4:2:0)
    I/P frame distance      3               Video format          2 (NTSC)
    Horizontal size         704             Display H size        704
    Vertical size           480             Display V size        480
    Aspect ratio            2 (4:3)         Intra DC precision    0 (8 bit)
    Frame rate code         5 (30 frames/s) Q scale type          1 1 1 (I P B)

Table 4-3 Test Stream Encoding Parameters

As depicted in Table 4-3, the test streams are encoded using the MPEG-2 Main Profile / Main Level constrained format with 704x480 resolution and 30 frames/second. The GOP (Group of Pictures) is used as the bit allocation boundary in the CBR coding process. Normally there are 15 frames in each GOP, with an I/P distance of 3. There are a total of 300 frames, or 10 seconds of real-time length, in each test stream. The constant bit rate is 6 Mbit/second. As recommended, the VBV buffer size is set to 1.8 Mbit. According to the statistics in Table 4-4, on average I frames contain 330K bits, P frames contain 285K bits, and B frames contain 110K bits. This means that the VBV buffer is able to hold two I P B B sequences, which is only about 0.27 second of display.
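The 0.27-second figure can be checked directly from the table values; a quick arithmetic sketch (our own, using the average frame sizes quoted above):

    # Checking the buffering figure above (values from Tables 4-3 and 4-4).
    vbv_bits = 112 * 16 * 1024                  # 112 x 16 Kbit = 1,835,008 bits
    seq_bits = 330_000 + 285_000 + 2 * 110_000  # one I P B B group, average sizes
    groups = vbv_bits // seq_bits               # -> 2 groups fit in the buffer
    seconds = groups * 4 / 30.0                 # 8 frames at 30 frames/s
    print(groups, round(seconds, 2))            # 2, 0.27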
    Type    Frames  Slices  Bytes                           Packets
                            Video-H  Video-M  Video-L       Video-H  Video-M  Video-L
    I       21      630     866782   664211   864736        843      660      880
    P       80      2400    2458287  2983630  2854488       2374     2701     2472
    B       190     5970    2930873  2598643  2532044       3415     3114     3037
    Total   300     9000    6255942  6246484  6251268       6605     6475     6317

Table 4-4 Statistics of the Encoded Test Streams

Table 4-4 shows the data statistics for video-H, video-M, and video-L: the frame counts, slice counts, byte counts and packet counts for the three pre-encoded test streams.

5 Digital video impairments measurement & quality assessment

MPEG encoded video exhibits distinct digital video impairments. The large difference between digital and analog video characteristics requires new impairment measurement and quality assessment methods. Since the advent of digital video, there has been a large amount of research and standardization effort in this area. The objective is to establish a user-oriented, technology-independent, perception-based, in-service objective quality measurement for digital video, together with a system that implements it.

5.1 Digital video system versus analog video system

The MPEG digital video encoding scheme employs a hybrid algorithm that combines motion compensation, temporal differential pulse code modulation and the block discrete cosine transform with associated adaptive quantization and entropy coding algorithms. The spatial and temporal information content of the source plays a crucial role in determining the amount of compression that is possible and the severity of the compression artifacts. The input video source signal dictates the overall behavior of the encoding algorithms; the resulting video system is therefore time varying and video signal dependent. In an analog video system there is no coding algorithm variation, so the system can be treated as "constant" or video source independent. Traditional performance parameters have relied on the "constancy" of a video system's performance for different input scenes. Thus, one could inject a test pattern or test signal (e.g., a static multi-burst), measure some resulting system attribute (e.g., frequency response), and be relatively confident that the system would respond similarly for other video material (e.g., video with motion).

Because a digital video system is time varying and source dependent, the user-perceived quality is a dynamic function of both the transmission system and the input signal. The quality of digital video systems cannot be evaluated using the static test patterns and waveform reproduction measures traditionally used in assessing analog video systems. According to the research and experiments done by ITS (the Institute for Telecommunication Sciences) [69], attempts to use input scenes that differ from what is actually used "in-service" can produce erroneous and misleading results. Variations in subjective performance ratings as large as 3 quality units on a subjective quality scale that runs from 1 to 5 (1 = lowest rating, 5 = highest rating) have been noted in tests of commercially available systems. While quality dependencies on the input scene tend to become much more prevalent at higher compression ratios, they are also observed at lower compression ratios. For example, subjective test results for 45-Mb/s contribution-quality systems (i.e., systems now used by broadcasters to transmit over long-line digital networks) revealed one transmission system with multiple tandem codecs whose subjective performance varied from 2.16 to 4.64 quality units. A digital video transmission system that works fine for video teleconferencing might be inadequate for entertainment television. In summary, specifying the performance of a digital video system as a function of the video scene coding difficulty, based on the amount of spatial detail and motion, yields a much more complete description of system performance.
5.2 Taxonomy of digital video impairments

New digital video systems introduce fundamentally different impairments than those created by analog reproduction methods. The following is a summary of the digital artifacts and distortions.

1. Aliasing. Aliasing occurs when a signal being sampled contains frequencies that are too high to be successfully digitized at a given sampling frequency. When sampled, these high frequencies fold back on top of the lower frequencies, producing distortion. In most methods of video digitizing, this produces pronounced vertical lines in the picture.

2. Overload. Overload is related to the finite number of levels that the signal can take. If a signal that is too high in amplitude is digitized, the picture will appear bleached. For example, if the signal level of a gray scale image is too high for the conversion process to cope with, then all levels above the maximum will be converted to white, causing the washed-out appearance. Another possible result is known as wrap-around, where all out-of-range values are converted to the lowest value, i.e. black.

3. Blocking artifacts. Blocking artifacts appear as discontinuities at the boundaries of adjacent blocks in a reconstructed image. The discontinuities can be perceived either at the block edge or as errors inside the block. Coarse quantization of the DCT coefficients, error corruption, and mismatch in motion estimation all contribute to blocking artifacts.

4. Mosaic / tiling. When quantization of the DCT coefficients is unbalanced, the block is represented by only a limited set of DCT basis images (there are 64 of them). The resulting block shows mosaic patterns or tiling artifacts.

5. Blurring. Due to coarse quantization of the high-order DCT coefficients or low-pass image filtering, the spatial detail of the image may be removed, and the lost information is unrecoverable.

6. Color bleeding. This can be seen as the smearing of color between areas of strongly contrasting chrominance values. The cause is coarse quantization of the high-order coefficients.

7. Staircase artifact. The DCT basis is not well suited to representing diagonal edges. When a diagonal edge is present within a string of consecutive blocks, the result after coarse quantization is the representation of the diagonal edge as a number of horizontal or vertical steps, called the staircase artifact. Edge discontinuities also occur at block boundaries.

8. Ringing. The ringing artifact is visible along high-contrast edges and appears as ripples extending outward from the high-contrast edge up to the block boundary. The higher the contrast, the greater the level of the ringing peaks and troughs. Ringing is caused by high-frequency distortion.

9. Mismatch in motion predicted block. A motion compensation mismatch is seen as an ill-fitting block relative to one or more of the objects that it straddles. A chrominance mismatch appears as a misplaced macroblock with respect to its own color and the color of the surrounding area.

10. The Gibbs effect. One of the most common artifacts affecting both MPEG and JPEG compression is the Gibbs effect. It is most noticeable around artificial objects such as plain-colored large text and geometric shapes such as squares, and shows up as a blurring or haze around the object where a sudden transition is made from the artificial object to the background. It is caused by the discrete cosine transform used to compress chrominance and luminance information. The phenomenon is also apparent around more natural shapes such as a human figure: the area of background around the subject appears to shimmer as the subject moves slightly. This shimmering has been nicknamed "mosquitoes".
11. Jerky motion and repeating. Segments of the encoded video data can be lost or corrupted in transmission, resulting in unrecoverable frames at the decoder. Motion that was originally smooth and continuous is perceived as a series of distinct snapshots. To deal with video frame loss, a decoder might be forced to display the I or P frame it previously stored, which changes the time line and results in a repeated scene or motion.

12. MPEG-specific coding impairments. There are several difficult scenarios in MPEG compression: high motion, high detail, circular motion, reflections, high-frequency noise, alternating lines and multiple motions.

5.3 Video quality measurement methodologies and metrics

Subjective assessment and objective measurement of video quality are two important issues in the image research community and the video industry. Since humans are the final consumers of the video stream, the most appropriate video quality measurement is directly based on human viewer trials, or subjective quality assessment. The CCIR (International Radio Consultative Committee) Recommendation 500 [65] is the most commonly used standard of subjective video quality assessment based on quality judgement by a human viewing panel. Although subjective quality assessment is considered to be a true evaluation of video quality, there are some shortcomings and implementation difficulties associated with the standard. Firstly, an expensive subjective viewing laboratory that conforms to CCIR Recommendation 500-3 must be available. Secondly, the viewing and grading process involves many human interactions and much control of environmental factors. Thirdly, it is very difficult to establish academic reference data. Fourthly, the methodology cannot be used for in-service, in-system video quality evaluation, and it is therefore unsuitable for industrial and commercial applications.

Objective measurement methodology mitigates the shortcomings and difficulties of subjective assessment and has a wide range of academic and industrial applications. The ideal objective measurement should provide results that strongly correlate with subjective assessment scores; in other words, the test instrument should perceive and measure video impairments like a human being. Toward this goal, the research community and standards organizations are striving to find technology-independent, perception-based, in-service objective quality measures. The criterion of technology independence ensures that the objective measurement result does not depend on any specific video coding scheme or the underlying transport networks; given the rapid technological progress in digital video compression, storage and transmission, this helps to avoid the immediate obsolescence of the quality measuring methodology. Perception-based criteria require that the measurement correlate closely with the evaluation done by a human being and accurately gauge the user's satisfaction. Because of the time-variant and video-source-dependent nature of digital video systems, "in-service" usefulness is a necessity in achieving accurate quality measurement based on the end user's application. Recent rapid progress in digital video systems and devices has spurred much activity among national and international standards organizations to specify and standardize the methodology for video quality measurement. Following are two important standards.
5.3.1 Subjective video quality assessment standard

CCIR Recommendation 500-5 (ITU-R Rec. 500) [65], "Method for the subjective assessment of the quality of television pictures," is an established subjective video quality assessment standard. Subjective tests conducted in accordance with CCIR Recommendation 500 produce one subjective Mean Opinion Score (MOS) for each test clip. The MOS quality grading and the impairment scaling are provided in Table 5-1.

    Grade   Quality     Impairment
    5       Excellent   Imperceptible
    4       Good        Perceptible, but not annoying
    3       Fair        Slightly annoying
    2       Poor        Annoying
    1       Bad         Very annoying

Table 5-1 Mean Opinion Score (MOS) Subjective Assessment

CCIR Recommendation 500-3 specifies the viewing laboratory parameters and conditions, e.g. a viewing distance of 4 to 6 times the screen height, and specified wall color, background lighting, etc.

5.3.2 Objective video quality measurement standard

After extensive research and large-scale multi-lab experiments, three national objective video quality standards have emerged: ANSI T1.801.01-1995 [66], ANSI T1.801.02-1996 [67], and ANSI T1.801.03-1996 [68].

ANSI T1.801.01-1995, "American National Standard for Telecommunications - Digital Transport of Video Teleconferencing / Video Telephony Signals - Video Test Scenes for Subjective and Objective Performance Assessment," provides a set of video test scenes in digital format that can be used for subjective and objective testing of digital video systems. Having standardized test scenes gives users the ability to directly compare the performance of two or more systems.

ANSI T1.801.02-1996, "American National Standard for Telecommunications - Digital Transport of Video Teleconferencing / Video Telephony Signals - Performance Terms, Definitions, and Examples," provides a dictionary of digital video performance terms and impairments. This standard includes a video tape that illustrates common digital video impairments such as tiling, smearing, error blocks, and jerkiness; it thereby gives end-users and service providers a common language for discussing digital video quality.

ANSI T1.801.03-1996, "American National Standard for Telecommunications - Digital Transport of One-Way Video Signals - Parameters for Objective Performance Assessment," defines a new framework of objective parameters that are sensitive to distortions introduced by the encoder, the digital channel or the decoder. These parameters can be used to measure the quality of digital video systems.

The ANSI standard defines parameters based on three types of features: scalar features, vector features, and matrix features. Features are quantities of information extracted from video frames. The spatial information (SI) and temporal information (TI) features are examples of scalar features, where the information associated with a video frame is represented by a scalar. The spatial frequency features are vector features, where the extracted information is represented by a vector of data. The peak signal to noise ratio (PSNR) is a matrix feature, where the extracted information is represented as a matrix. Studies have indicated that the amount of reference information (features) required from the video input to perform meaningful quality measurements is much less than the entire video frame. This compression of reference information is a new concept; one application of the concept is to achieve long-term digital video system and network monitoring.
Because the output scalar features require only a small amount of disk storage space, it is possible to efficiently archive those features for future reference. Research aims to further refine the techniques of compressing video quality information in order to produce an "in-service" method for measuring video quality good enough to replace subjective experiments in many cases. This would make it possible to perform non-intrusive, in-service performance monitoring, which would be useful for applications such as fault detection, automatic quality monitoring, and dynamic optimization of limited network resources.

Based on studies of the above three types of extracted video features, ANSI defined the objective parameters listed in Table 5-2. In a comprehensive evaluation and statistical study by ITS of subjective assessment and objective measurement using the parameters defined in Table 5-2, it was found that the objective measures captured about 90% of the subjective information that could be captured, considering the level of measurement error present in the subjective and objective data. Each parameter in Table 5-2 is sensitive to some unique dimension of video quality or impairment type. This is useful to designers trying to optimize certain system attributes over others, and to network operators wanting to know not only when a system is failing but also where and how it is failing.

    Parameter   Method of Measurement
    711         Maximum added motion energy
    712         Maximum lost motion energy
    713         Average motion energy difference
    714         Average lost motion energy with noise removed
    715         Percent repeated frames
    716         Maximum added edge energy
    717         Maximum lost edge energy
    718         Average edge energy difference
    719         Maximum HV to non-HV edge energy difference
    719-60      Maximum HV to non-HV edge energy difference, 60 trd.
    719a        Minimum HV to non-HV edge energy difference
    719a-60     Minimum HV to non-HV edge energy difference, 60 trd.
    7110        Added edge energy frequencies
    7110a       Missing edge energy frequencies
    721         Maximum added spatial frequencies
    722         Maximum lost spatial frequencies
    732         Minimum peak signal to noise ratio
    733         Average peak signal to noise ratio
    Negsob      Negative Sobel difference
    Possob      Positive Sobel difference

Table 5-2 ANSI Objective Measurement Parameters

For example, quality parameters relate to several common digital video impairments as follows: blurring can be measured with the Lost Edge Energy parameter; block distortion (tiling) with the HV to non-HV Edge Energy Difference parameter; temporal edge noise with the Added Edge Energy Frequencies parameter; error blocks with the Added Motion Energy parameter; noise with the Motion Energy Difference parameter; and jerkiness is best captured by Lost Motion Energy and Percent Repeated Frames.

Parameter 733, peak signal to noise ratio (PSNR), is a matrix feature calculated from the error image, which is obtained by subtracting the output image from the input image. It is traditionally used as an objective parameter for video quality measurement. The ITS study also found that PSNR captures only about 21% of the subjective information and thus does not correlate well with subjective quality assessment.
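As a rough illustration of the scalar features, the sketch below computes SI- and TI-style quantities using SciPy's Sobel filter and a successive-frame difference. This is our own simplified reading of the feature definitions, not the normative ANSI computation, and the function names are ours.

    # Rough sketch of scalar features: SI from the spread of Sobel edge
    # energy in one frame, TI from the spread of frame-to-frame differences.
    import numpy as np
    from scipy import ndimage

    def spatial_information(frame: np.ndarray) -> float:
        gx = ndimage.sobel(frame.astype(float), axis=1)
        gy = ndimage.sobel(frame.astype(float), axis=0)
        return float(np.hypot(gx, gy).std())

    def temporal_information(prev: np.ndarray, cur: np.ndarray) -> float:
        return float((cur.astype(float) - prev.astype(float)).std())

    rng = np.random.default_rng(1)
    f0 = rng.integers(0, 256, (480, 704))   # stand-in luma frames
    f1 = np.roll(f0, 4, axis=1)             # simple horizontal motion
    print(spatial_information(f1), temporal_information(f0, f1))

A high-motion, high-detail clip such as video-H would score high on both quantities, which is exactly the property used to categorize the test streams.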
5.4 Project development

Section 4.8 described the process for test stream selection and evaluation: test video clips are categorized based on their spatial detail and motion. As discussed in section 5.1, this categorization method supports accurate measurement of a digital video system. To continue the project development flow, this section presents our application layer test architecture and implementation for MPEG-2 video transport over IP networks. The environment and methodology for subjective and objective quality evaluation are also examined.

5.4.1 Application layer architecture and environment for video quality evaluation

The MPEG-2 application layer test architecture and environment for video quality evaluation is designed and built on top of the IP testbed network architecture (Figure 2-1 in Chapter 2). Figure 5-1 illustrates the components and their associated functions. Based on this test architecture, the whole video quality assessment can be logically divided into three stages according to their functionality and sequence in the process.

Preparation Stage: Referring to Figure 5-1, each test stream is originally stored as YUV uncompressed image frames. As discussed in Chapter 4, we use the Mpeg2encode software encoder to generate the MPEG-2 video bit streams. All streams are encoded with the same encoding parameters, shown in Table 4-3, to ensure uniform coding quality. Each encoded video stream is verified for MPEG-2 standard conformance using the HP MPEGscope ES analyzer. The stream is also decoded and displayed in real time with the REALmagic-2 hardware decoder for visual inspection. Test streams are selected based on the criteria explained in sections 4.8 and 5.1. The final selected test streams are stored in our database in MPEG-2 video format. This stage consists primarily of off-line processes.

Transport Stage: In each experiment, a test stream is submitted and prepared for the CMFS video server as discussed in section 2.2. The video stream is transported from the video server to its IP destination host (running a video client) via the IP router. The IP router runs FreeBSD "best effort" packet forwarding with a tail-drop packet discard scheme. The router is stressed to saturation with bursty background traffic, which in turn overflows the output queue and generates IP packet losses. The video client running on the receiver host performs several tasks: it captures the incoming video stream, logs the extra information embedded within each packet, and runs TCPDUMP to timestamp each received packet for jitter calculation. In this stage, the transported test stream, together with the associated real-time information, is filtered and saved in the database. Stage 2 constitutes the real-time portion of these experiments.

Quality Evaluation Stage: The transported and received video stream is decoded using the Mpeg2decode software decoder and stored in uncompressed YUV format. Due to the effects of network impairments, the transported (corrupted) video contains many types of visible errors. To ensure accuracy in the subjective and objective quality evaluation, pre-processing steps such as frame re-synchronization and basic error resolution are necessary; these are discussed in later sections. Using the pre-processed corrupted stream and the reference (transported but not corrupted) stream, the objective quality measurement is computed by extracting features and parameters from the corrupted stream relative to the uncorrupted stream, based on the methodologies specified by ANSI T1.801.03-1996 [68]. The subjective quality assessment takes more steps: after pre-processing and error resolution, the corrupted stream is re-encoded using the Mpeg2encode encoder, because the hardware decoder we used for real-time playback requires MPEG format data input.
The subjective quality assessment is obtained using a human viewing panel as defined by CCIR Recommendation 500-5 (ITU-R Rec. 500) [65]. Stage 3 consists mainly of off-line processes, with the exception that the subjective quality assessment is done in a quasi-real-time fashion.

Figure 5-1 MPEG-2 Layer Test Architecture (original YUV frames, encoding, conformance test and visual inspection; CMFS video server, testbed IP router and CMFS video client; databases, temporal alignment, error resolving, calibration, and objective/subjective quality evaluation)

5.4.2 Test model and methodology

As described in the last subsection, video sequences are processed, and consequently affected, by multiple components on their paths from source to destination, or in translation from one format to another. Each intervening processing system (e.g., encoding, transmission, decoding, and display) introduces visible distortions into the final video output. To validate our study, it is very important to identify each impairment source and to de-couple its impact on the video from the combined distortions introduced by the other impairments. Our project is mainly interested in investigating the effects of IP packet loss on reconstructed MPEG-2 video; the specific encoder or decoder chosen for the experiment should not affect the study result. The test model we applied satisfies this requirement.

Figure 5-2 Test Model to Evaluate Network Impairments on Video Quality (video sequence A is encoded, transported with and without network packet loss, and decoded to produce video sequences B and C)

Figure 5-2 is a diagram of our test model. Videos B and C are outputs of video A transported across the testbed network. In our experiments, videos B and C passed through the same components and processes in their transport paths, except that video B was subjected to network packet loss while video C was not. If video C is used as the reference, computing the objective and subjective quality of video B with reference to video C eliminates the effects of the unwanted impairments introduced by the MPEG-2 CODEC and other elements. In this way, we can investigate clearly the relationship between video quality and network packet loss.

5.4.3 Objective measurement

We use PSNR (average peak signal to noise ratio, parameter 733 in ANSI T1.801.03-1996) [68] to measure the degree of difference between the corrupted stream and the reference stream. As discussed in subsection 5.3.2, PSNR as an objective video quality metric correlates poorly with human visual perception, i.e. subjective quality. Nevertheless, PSNR provides an accurate frame-to-frame pixel value comparison, and in our experiments it is mainly used to evaluate the degree of error corruption of the video data. The "real" video quality is assessed using the subjective process defined by CCIR Recommendation 500 [65]. The equations for computing the MSE (mean square error) and PSNR are as follows:

    MSE = (1 / (h * v)) * sum_{m=0}^{h-1} sum_{n=0}^{v-1} ( X(m,n) - X'(m,n) )^2

where h and v are the numbers of pixels in the horizontal and vertical directions respectively, and X and X' are the reference and measured frames.

    PSNR = -10 * log10 ( MSE / (2^n - 1)^2 )

where n is the number of bits used to represent each sample within a digitized frame.

Figure 5-3 is a diagram of the model we used to derive the PSNR parameter. It is based on the test model and methodology detailed in subsection 5.4.2. The functions of the Error Resolving block are explained in subsection 5.4.5.

Figure 5-3 Test Model for Network Impairments on Video Quality Using PSNR (encoding, network packet loss, decoding and error resolving feeding the PSNR computation)
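For reference, the computation can be expressed directly in code. The function below is our own helper (assuming 8-bit samples by default), implementing the two equations above with NumPy:

    # MSE / PSNR as defined above, for n-bit frames (n = 8 by default).
    import numpy as np

    def psnr(reference: np.ndarray, measured: np.ndarray, n_bits: int = 8) -> float:
        diff = reference.astype(float) - measured.astype(float)
        mse = np.mean(diff ** 2)            # average squared pixel error
        peak = (2 ** n_bits - 1) ** 2       # squared peak signal value
        return float('inf') if mse == 0 else -10.0 * np.log10(mse / peak)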
5.4.4 Subjective assessment

We use an informal process for subjective video quality assessment based on CCIR Recommendation 500-5. The visual quality evaluation center is built around a Pentium PC with a SONY Trinitron Multiscan 17 SE monitor and a REALmagic Hollywood-2 MPEG-2 real-time hardware decoder.

As illustrated in Figure 5-4, the subjective assessment is also based on the test model and methodology described in subsection 5.4.2. The 5-point MOS grading process is extended to incorporate the philosophy that the MOS scale is obtained by evaluating the corrupted video against the reference video; the reference video is therefore rated at point 5, and the quality of the corrupted video is always considered as "degradation" relative to the reference video. Table 5-3 shows the revised definitions of the MOS grades.

Figure 5-4 Test Model for Network Impairments on Video Quality Using MOS (encoding, network packet loss, decoding, error resolving and re-encoding feeding the MOS evaluation)

    Grade   Impairment
    5       Imperceptible degradation
    4       Perceptible degradation, but not annoying
    3       Slightly annoying degradation
    2       Annoying degradation
    1       Very annoying degradation

Table 5-3 Mean Opinion Score (MOS) Using Impairment Model

5.4.5 Temporal realignment

In a highly congested IP network, packet loss can reach a level where the data loss is measured in frames or multiple frames. This occurs when a group of consecutive packets containing the data for one or more video frames is lost. It also occurs when the lost packets contain MPEG-2 bitstream headers, resulting in undecodable or unrecoverable frames. The consequence is that comparison of video frames between the reference and transmitted streams becomes difficult because of the loss of temporal alignment. Figure 5-5 illustrates the temporal misalignment due to the loss of frame 3 in an example received stream.

Figure 5-5 Temporal Misalignment Due to Loss of Frame (reference frames 1-5 against measured frames 1, 2, 4, 5)

Temporal alignment, or frame-by-frame synchronization between the reference frames and the measured frames, is required to guarantee an accurate PSNR calculation. This raises the important issue of deciding how lost frames are to be handled: by leaving a blank, or by patching with error concealment techniques. A blank frame could be a black, gray or white frame, and various error concealment techniques are also available. Error resolution, or pre-processing, is required to realign the frames and provide this error handling. In our experience, error concealment techniques affect the measurement results so significantly that the real characteristics of the measured video streams are masked. Therefore, we chose to substitute a blank black frame for each lost frame. Based on this decision, we wrote a program to detect frame loss in the transmitted video stream and substitute a black frame for the lost frame. As demonstrated in Figure 5-6, the reference frames and the measured frames are thereby realigned.

Figure 5-6 Temporal Re-alignment with Error Resolving (measured frame 3 patched so that measured frames 1-5 align with reference frames 1-5)
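A minimal sketch of this substitution step, assuming each decoded frame can be associated with its source frame number (the data structures and names here are ours, not those of the actual program):

    # Realign a measured sequence against the reference by substituting a
    # black frame wherever a frame number failed to decode.
    import numpy as np

    def realign(measured: dict, total_frames: int, height: int, width: int):
        """measured maps frame number -> luma array for frames that decoded."""
        black = np.zeros((height, width), dtype=np.uint8)
        return [measured.get(i, black) for i in range(total_frames)]

The returned list has exactly total_frames entries, so frame i of the measured stream always compares against frame i of the reference in the PSNR computation.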
6 Transport issues in delivering MPEG-2 video over IP networks

We address the transport issues in delivering real-time MPEG-2 over IP networks by investigating the network protocol stack and studying transport errors, error propagation, error sources, and the relationship between particular errors and IP network impairments. We have also investigated various error correction and error handling techniques; a comprehensive survey of error detection, prevention, and concealment techniques is made available for future research and experiments.

6.1 Protocol stacks for real-time MPEG-2 video transport

From a purely networking protocol point of view, as illustrated in Figure 6-1, there are several levels of the protocol stack at which real-time video may be encapsulated. In practice, there are two standard implementations. The MPEG committee proposed the first; referring to Figure 6-1, its encapsulation scheme can be described as Elementary Stream (MPEG-2 ES) > Packetized ES (MPEG-2 PES) > Transport Stream (MPEG-2 TS) > AAL5 > ATM > Physical Layer. The IETF Audio/Video Transport Working Group (Transport Area) proposed the other; its encapsulation scheme can be described as MPEG-2 ES > RTP > UDP > IP > WAN/LAN > Physical Layer.

Figure 6-1 Protocol Stack to Transport MPEG-2 Video (MPEG-2 ES/PES/TS/PS over AAL5/ATM, and RTP/RTCP/RSVP over UDP/IP over WAN/LAN, down to the physical layer)

MPEG Systems is the part of the MPEG standard that specifies how to multiplex MPEG video and audio segments into a single stream. An MPEG Systems stream also contains timing information so that MPEG players (or decoders) can play back the synchronized video and audio. MPEG-2 Systems provides a two-layer multiplexing approach. The first layer is dedicated to ensuring tight synchronization between video and audio; it is the common way of presenting all the different materials that require synchronization (video, audio, and private data). This layer is called the Packetized Elementary Stream (MPEG-2 PES). The second layer depends on the intended communication medium: the specification for error-free environments such as local storage is called the MPEG-2 Program Stream (MPEG-2 PS), while the specification for error-prone environments is called the MPEG-2 Transport Stream (MPEG-2 TS). MPEG-2 TS is a service multiplex; no mechanism within the syntax exists to ensure reliable delivery of the transported data, and MPEG-2 transport relies on the underlying layers for such services. MPEG-2 TS requires the underlying protocol layer to identify the transport packets and to indicate in the transport packet header when a transport packet has been erroneously transmitted. The MPEG-2 Transport Stream is so named to signify that it is the input to the Transport layer in the OSI seven-layer network model; it is not, in itself, the Transport layer.
The Real-time Transport Protocol (RTP) is defined in IETF RFC 1889 [74]. RTP provides end-to-end network transport functions suitable for applications transmitting real-time data, such as audio, video or simulation data, over multicast or unicast network services. RTP does not address resource reservation and does not guarantee quality of service for real-time services. Instead, together with RTCP, RTP provides a QoS monitoring mechanism including payload type identification, sequence numbering, timestamping and delivery monitoring. Applications typically run RTP on top of UDP to make use of its multiplexing and checksum services, and both protocols contribute parts of the transport protocol functionality. By applying application level framing and integrated layer processing, RTP is designed to be malleable, providing the information required by a particular application, and it will often be integrated into the application processing rather than implemented as a separate protocol layer. Unlike conventional protocols, in which additional functions might be accommodated by making the protocol more general or by adding an option mechanism that would require parsing, RTP is intended to be tailored through modifications and/or additions to the headers as needed. Therefore, a complete specification of RTP for an application includes a profile specification and an RTP packet payload format. The profile for audio and video conferencing with minimal control is defined in RFC 1890. The RTP payload format for MPEG-1/MPEG-2 video is specified in RFC 2250. The underlying packetizing principle of RFC 2250 is to select the Slice as the minimum data unit and to create fragmentation rules ensuring that the beginning of the next Slice after one with a missing packet can be found without requiring the receiver to scan the packet contents.

The User Datagram Protocol (UDP) is defined in IETF RFC 768 [77]. UDP is a simple datagram protocol layered directly above the Internet Protocol (IP). UDP is also a transport protocol that extends the host-to-host delivery service of the underlying network into a process-to-process communication service. Applications may access UDP through the socket interface. The address formats of UDP are identical to those used in TCP: a port number and an IP address identify the end point of communication. The real-time characteristics of MPEG-2 video require the underlying protocol to be implemented with real-time semantics. Compared with TCP, UDP is more suitable for real-time applications because it has lower overhead and lower delay. Although UDP is an unreliable service that requires error handling at the application layer, it provides two important functions for real-time streams: transport-level de-multiplexing and a checksum for each packet.

Figure 6-2 Protocol Stack to Transport MPEG-2 Video over Testbed Network (MPEG-2 video ES over RTP, MT, UDP, IP, data link and physical layers on the CMFS server and client, with the IP router in between)

Figure 6-2 is a diagram of the protocol stack applied in the testbed network to transport real-time MPEG-2 video end to end. The following are the implementation steps in each layer for the CMFS video server. The MPEG-2 video Elementary Stream is parsed to locate the headers within the stream, and the ES stream is subsequently divided into Slices for RTP packet encapsulation. According to RFC 2250, "RTP Payload Format for MPEG1/MPEG2 Video" [76], the MPEG video ES stream is encapsulated using the following rules (a packetizer sketch follows the list):

1. The MPEG Video_Sequence_Header, when present, will always be at the beginning of an RTP payload.

2. An MPEG GOP_Header, when present, will always be at the beginning of the RTP payload, or will follow a Video_Sequence_Header.

3. An MPEG Picture_Header, when present, will always be at the beginning of an RTP payload, or will follow a GOP_Header.

4. The beginning of a Slice_Header must either be the first data in a packet (after any MPEG ES headers) or must follow after some integral number of slices in a packet.
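A simplified packetizer that honors the slice rule might look as follows. This is our own sketch, not the CMFS implementation: it assumes the ES has already been split into byte strings at slice start codes, and it omits the RTP headers and the RFC 2250 fragmentation rules for slices that are larger than one packet.

    # Sketch: pack whole slices into payloads so that every payload starts
    # at a slice (or higher-level) header, per the rules above.
    def packetize(slices, mtu_payload=1400):
        packets, current = [], b""
        for s in slices:
            if len(current) + len(s) <= mtu_payload:
                current += s              # whole slices share a packet
            else:
                if current:
                    packets.append(current)
                current = s               # new packet starts at a slice header
        if current:
            packets.append(current)
        return packets

With this scheme, the loss of any one packet costs at most the slices it carried; the receiver can resume decoding at the first slice header of the next packet without scanning the payload.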
The Media Transport (MT) [78] is a proprietary unreliable data stream transport protocol developed by the UBC CMFS project. MT is essentially an association between two UDP sockets, between which byte-sequenced packets of data can be sent in either direction. There is no re-transmission of lost data, but the amount of any lost data can be determined from gaps detected in the sequence space. To overcome the simplicity of UDP, MT provides functionality such as connection establishment, stream data transport, and connection tear-down, to accomplish the CMFS transport layer implementation.

The RTP packet is encapsulated into an MT packet payload, and then UDP- and IP-encapsulated; Ethernet packetizing is the data link layer encapsulation. In the IP layer, an IP packet may be fragmented in order to be delivered through networks with a low MTU packet size. To make efficient use of network bandwidth and avoid IP packet fragmentation, the MTUs for UDP and IP are set so that each IP packet is encapsulated in an Ethernet packet of about 1500 bytes. Figure 6-2 implies that the IP router needs to strip the Ethernet packet and use the IP destination address to forward each IP packet according to its routing table. Upon arrival at the IP destination, the video client at the receiver runs a peer process that inverts the steps done in the video server, and the MPEG video data is extracted from the multiply encapsulated packets.
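Loss accounting over such a byte-sequenced stream reduces to summing the gaps between the bytes expected and the bytes actually seen. A minimal sketch (ours, not the MT code):

    # Determine the amount of lost data from gaps in a byte-sequence space.
    def lost_bytes(received):
        """received: list of (seq_start, length) tuples in arrival order."""
        expected, lost = 0, 0
        for seq, length in sorted(received):
            if seq > expected:
                lost += seq - expected    # gap in the sequence space
            expected = max(expected, seq + length)
        return lost

    print(lost_bytes([(0, 1400), (2800, 1400), (4200, 1400)]))  # -> 1400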
6.2 Taxonomy of transport errors

In Chapter 5 we discussed in detail the distinct digital video impairments that MPEG encoded video inherits. The large differences between digital and analog video characteristics require user-oriented, technology-independent, perception-based, in-service objective quality measurement. Section 5.2 also outlined the digital artifacts and distortions that new digital video systems introduce.

As indicated in Figure 6-3, packet-loss-induced errors may propagate both intra-frame and inter-frame. An error can corrupt re-synchronizing boundaries and propagate through multiple layers, resulting in a very visible erroneous block in the corrupted frame. Such errors may persist through multiple frames if a corrupted frame happens to be an I or P frame. For example, a physical layer bit reversal error or a router queue overflow might result in a dropped IP packet at the network layer. This in turn could prevent re-assembly of a fragmented IP packet, resulting in a larger data loss in the MPEG-2 bitstream.

Figure 6-3 Example of Error Propagation and Severity

This section is devoted to the study of MPEG-2 transport errors, error-affected video quality, error propagation and error sources; particular MPEG-2 errors are also related to the IP network layer impairments. An understanding of transport errors is established by addressing the following three sets of questions.

Set one: Where are the error sources? How are errors introduced into the data stream?

Set two: How do errors affect the final video quality? What is the severity? In the bitstream hierarchy, how does the picture layer react to errors? How do I, P, and B pictures respond to errors? How does an error propagate from one layer to another, from one picture to another, and from one protocol layer to another?

Set three: How do we detect errors? How do we prevent errors from corrupting the bitstreams? How do we select and apply suitable pre- and post- error correction and concealment techniques to combat errors and losses?

To analyze errors, the first problem is to classify and characterize them. It is also important to determine what artifact or degradation a specific type of error can generate. There are complications due to combined error effects; for example, a bit error can cause both Macroblock and Slice layer problems, while a Macroblock problem might be caused by either a bit error or packet delay. To address this chaotic situation, a systematic analysis approach is necessary. First, in subsection 6.2.1, the impact that errors and other failures have on the MPEG-2 stream is studied from the MPEG-2 coding structure point of view. Second, in subsection 6.2.2, the error sources are identified from the network point of view. Third, in subsection 6.2.3, the error presentation and propagation within the protocol layers are analyzed from the communication protocol point of view. Fourth, section 6.3 presents a brief discussion of error detection, prevention and concealment techniques. This four-step approach demonstrates the multiple facets of error characteristics as well as their real-world presentations, and it also improves the efficiency of error prediction and simulation.

6.2.1 The impacts of error and impairment on MPEG-2 encoded stream

Errors and impairments which penetrate the MPEG-2 stream hierarchy do not just corrupt data in the original area of the frame; they also propagate both intra- and inter-frame, and cause serious disruption to the decoding.

6.2.1.1 Impacts of error on the Block layer

The MPEG-2 Block layer is the lowest layer in the hierarchy, and it is in this layer that the actual 8 x 8 DCT Blocks are coded. The encoder processes data depending on whether it is a luminance or chrominance Block and whether the Macroblock is specified as intra or inter coded. The coded Block layer contains the size of the DC coefficients (DCc), DC differences (DCd), AC coefficients (ACc) and EOB (end of block). There is no frame boundary marker (N-MK).

Figure 6-4 Code structure diagram: Block Layer (no boundary marker; VLC segment of DCc, DCd and ACc; FLC EOB segment; variable frame length)

1. Carried information: DC coefficients, DC differences, AC coefficients, EOB.

2. Layer structure and stream re-synchronizing boundary: The Block layer contains VLC (variable length code) data and FLC (fixed length code) EOB segments. The whole stream is Variable Length (VL) in nature, and there is no stream re-synchronizing boundary. The EOB is mainly used for the removal of multiple trailing zero coefficients.

3. Severity of error impact due to loss of information: Because of the VLC nature of this layer, a single bit reversal error or a burst of corrupted bits can cause the loss of all the data following the error (a toy demonstration follows this list). The result is a corrupted Block in the image.

4. Severity of error impact due to corrupted re-synchronizing boundaries: Since there is no re-synchronizing boundary between adjacent Blocks, an error in one Block is likely to cause an error in the next Block and to result in multiple corrupted Blocks in the image. This error propagation can only be stopped by a re-synchronizing point which features a distinctive re-start marking flag and prevents information from previous Blocks being used.
6.2.1.2 Impacts of error on the Macroblock layer

The MPEG-2 Macroblock layer is the most versatile coded layer. In the MPEG-2 MP@ML 4:2:0 format, a Macroblock constitutes a physical area of four 8 x 8 pel Blocks, or 16 x 16 pels, in the image. The encoded image information consists of four luminance components, one from each of the four Blocks, and two chrominance components from two sub-sampled 8 x 8 Blocks. The Macroblock layer contains Block and motion information. Based on the encoding choices of (1) forward MC, backward MC or interpolated MC, (2) intra or inter coding, (3) coded or not coded, and (4) quantized or not quantized, there are 15 types of Macroblocks for MPEG-2 encoded I, P, or B frames.

[Figure 6-5 Code structure diagram, Macroblock layer: N-MK | header: VLC segment + FLC segment | Blocks: VLC stream | N-MK; variable frame length]

1. Carried information: Macroblock address (location) increment (VLC); Macroblock type selection (VLC). There are a total of 20 Macroblock types for I, P and B pictures. Depending on the indicated Macroblock type, the information can include (1) the type of motion prediction and the motion vector (VLC), (2) a new quantizer scale (FLC), and (3) the Block coding information CBP for 4:2:0, 4:2:2 or 4:4:4 etc. (FLC), followed by the coded Block stream (VLC).

2. Layer structure and stream re-synchronizing boundary: The header segment consists of both VLC and FLC data. The bit-stream of the Macroblock layer is Variable Length (VL) in nature. There is no re-synchronizing boundary; the re-synchronizing point for the Macroblock address is at the Slice layer.

3. Severity of error impact due to loss of information: Because of the VLC nature of this layer, a single bit reversal error or a burst of corrupted bits can cause the loss of all the information following the error. The result is a corrupted Macroblock in the image.

4. Severity of error impact due to corrupted re-synchronizing boundaries: Since there is no re-synchronizing boundary between adjacent Macroblocks, an error in one Macroblock is likely to cause an error in the next Macroblock and result in multiple corrupted Macroblocks in the image. This error propagation can only be stopped by a re-synchronizing point which features a distinctive re-start marking flag. As mentioned earlier, there is no Macroblock address re-synchronizing point until the next Slice, which means errors will propagate and accumulate. This accumulated error will also appear in all inter-coded pictures.

5. Severity of error impact due to intra frame error propagation: Macroblock intra frame error propagation is caused by the dependency of the intra Macroblock address resolution and the dependency of the underlying Block layer DC coefficients. As discussed above, a Macroblock address error in an I frame will accumulate and propagate; the DC coefficient errors of the Macroblock will also propagate.

6. Severity of error impact due to inter-frame error propagation: All Macroblock layer errors will propagate from I or P reference frames to the B or P frames that are created with reference to the corrupted frames.

7. Severity of error impact due to inter-layer error propagation: The Macroblock re-synchronizing boundary corruption and intra frame error propagation are stopped at the Slice layer. According to the MPEG-2 standard, the header segment of the Macroblock layer determines the horizontal position at which a slice starts.
6.2.1.3 Impacts of error on the Slice layer

The Slice layer is important in error handling. The Slice start code provides a re-synchronizing point if data is corrupted. The number of Slices in a picture can range from one to over 175. The quantizer scale index can be changed in the next Slice.

[Figure 6-6 Code structure diagram, Slice layer: MK | header: FLC segment only | Macroblocks: VLC | N-MK; variable frame length]

1. Carried information: Slice start code (FLC); Slice vertical position (FLC); scalable extension and data partition (FLC); quantizer scale (FLC); coded Macroblock stream (VLC).

2. Layer structure and stream re-synchronizing boundary: The Slice header is coded using FLC, which is not as efficient as VLC but provides more error resistance for the header. Bit reversal errors can only corrupt a segment of data; they do not change the byte alignment or the FLC block boundaries. Because the Macroblocks are VLC coded, the Slice stream, which contains multiple Macroblocks, is Variable Length (VL) in nature. The Slice start code at the beginning of each Slice stream provides a re-synchronizing point when an error is present. Macroblocks are not skipped in I pictures, and for any type of picture the first and the last Macroblocks in a Slice may not be skipped. At the beginning of each Slice, the predictions for DC coefficients are reset to 128 x 8 and the quantizer scale is refreshed. The significance of these provisions is to ensure a complete re-start coding point.

3. Severity of error impact due to loss of information: If the Slice start code is corrupted by an error, the decoder will not be able to identify the boundary, thus causing damage to two adjacent Slices in the image. An error in the Slice vertical position data can result in two overlapped Slices if not corrected. An error in the scalable extension and data partition header can cause serious problems in scalability decoding. Errors in the quantizer scale parameter will cause the decoder to misinterpret the scale, which results in large distortion of the Slice image. Lastly, an error in the VLC coded Macroblocks of the Slice will corrupt all of the Macroblocks that follow it.

4. Severity of error impact due to corrupted re-synchronizing boundaries: According to the MPEG-2 standard and the above analysis, the Slice layer provides a re-synchronizing point that stops error propagation. If the Slice start code is corrupted, the result is two corrupted Slices, but the error will not propagate further.

5. Severity of error impact due to intra frame error propagation: There is no intra frame error propagation at the Slice layer, because inter-Slice dependency does not exist and errors are contained by the re-synchronizing code in a Slice.

6. Severity of error impact due to inter frame error propagation: All Slice layer errors will propagate from the I or P reference frame to the B or P frames which are created using the corrupted frames.

7. Severity of error impact due to inter layer error propagation: There is no error propagation from the Slice layer to the Picture layer.
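The resynchronization described in items 2 and 4 amounts to scanning for the next byte-aligned slice start code. A minimal sketch of that scan (MPEG-2 start codes are the byte-aligned prefix 0x00 0x00 0x01, and a following byte in the range 0x01-0xAF identifies a slice):

```python
# Sketch: resynchronizing a damaged bitstream at the next slice start code.

def next_slice_start(buf: bytes, pos: int) -> int:
    """Return the offset of the first slice start code at or after pos,
    or -1 if none is found. This is how a decoder skips corrupted VLC
    data until it can restart decoding."""
    i = pos
    while i + 3 < len(buf):
        if (buf[i] == 0 and buf[i + 1] == 0 and buf[i + 2] == 1
                and 0x01 <= buf[i + 3] <= 0xAF):
            return i
        i += 1
    return -1

# Example: the bytes before the start code are discarded (and two slices
# are lost if the start code itself was corrupted, as argued above).
stream = b"\x12\x34" + b"\x00\x00\x01\x05" + b"\xAA" * 8
print(next_slice_start(stream, 0))  # -> 2
```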
6.2.1.4 Impacts of error on the Picture layer

The Picture layer is where the picture type (i.e. I, B, P) is signaled. Information relevant to the forward and backward motion vectors is defined here (i.e. the scale and precision of the motion vector). The MPEG-2 extension data provides flexibility in configuring the Picture layer.

[Figure 6-7 Code structure diagram, Picture layer: MK | header: FLC segment only | Slices: VLC stream | N-MK; variable frame length]

1. Carried information: The Picture layer consists of three parts, all using FLC code: the normal Picture header, the MPEG-2 extension header and the Slice streams. The Picture header contains such information as the picture start code, picture temporal serial count, picture type, field / frame selection, I, P or B type, VBV delay, and motion vector scale and precision. The extension header carries the extension start code, motion vector scale and precision, I block DC precision, picture structure, frame / DCT type, concealment vectors, Q scale, intra VLC format, alternate scan, repeat first field, chroma type, progressive frame and composite display data.

2. Layer structure and stream re-synchronizing boundary: The Picture header is coded using FLCs that protect block alignment boundaries. Because Slices are VLC coded, the picture stream is Variable Length (VL) in nature. The picture start code at the beginning of each picture stream provides a re-synchronizing point at the Picture layer when an error is present.

3. Severity of error impact due to loss of information: Picture headers contain information crucial to initiating and configuring the decoder. Any error in the header can seriously disrupt the decoding process and result in the loss of the original picture and of all predicted pictures if the original picture is used as a reference. For example, if an error corrupts the Picture Type parameter, the decoder has no way of knowing whether an I, B or P picture is coming, and the decoding process is disrupted.

4. Severity of error impact due to corrupted re-synchronizing boundaries: According to the MPEG-2 standard and the above analysis, if an error corrupts the picture start code, it will result in two adjacent corrupted Pictures. More precisely, it corrupts the last Slice of the first Picture and the entire next Picture, but the error will not propagate further.

5. Severity of error impact due to intra frame error propagation: There is no intra frame error propagation at the Picture layer, because the Picture header only deals with configuration parameters at the Picture level.

6. Severity of error impact due to inter frame error propagation: The inter frame error impact at the Picture layer is dramatic. Besides the loss of the original corrupted Picture, all the predicted Pictures which use the corrupted one as a reference will also be lost.

7. Severity of error impact due to inter layer error propagation: Because all the Picture parameters, including the persisting Macroblock address error, are self-contained and refreshed in every Picture, no error propagates further into the GOP layer.
6.2.1.5 Impacts of error on the Group Of Pictures layer

The Group Of Pictures (GOP) layer is a collection of continuously displayed pictures. A time code refers to the first picture in the GOP. A closed GOP indicates that prediction is closed within the GOP only.

[Figure 6-8 Code structure diagram, Group Of Pictures layer: MK | header: FLC segment only | Pictures: VLC | N-MK; variable frame length]

1. Carried information: The GOP header is coded using FLC and contains the following information: group start code, time code, closed GOP flag, broken link flag, extension start code, user data start code, and the Picture VLC stream.

2. Layer structure and stream re-synchronizing boundary: The GOP header is coded using FLCs, which protect block alignment boundaries. Because the underlying Pictures are VLC coded, the whole stream is Variable Length (VL) in nature. The GOP start code at the beginning of each GOP stream provides a re-synchronizing point at the GOP layer when an error is present.

3. Severity of error impact due to loss of information: The loss of GOP header information is not as important as loss at the Picture layer. It has almost no effect on the video quality, but it creates confusion when user data are included in the GOP stream.

4. Severity of error impact due to corrupted re-synchronizing boundaries: A corrupted GOP start code will cause an error in the last Slice of the previous Picture and in the block alignment of the current GOP. The Pictures of the current GOP can re-synchronize themselves after the error.

5. Severity of error impact due to intra frame error propagation: There is no intra frame error propagation at the GOP layer, because all underlying pictures can recover from an error in the GOP layer.

6. Severity of error impact due to inter-frame error propagation: There is no inter frame error propagation.

7. Severity of error impact due to inter-layer error propagation: No error will propagate from the GOP layer to the Sequence layer.

6.2.1.6 Impacts of error on the Sequence layer

The Sequence layer delivers important system and computation environment information in its header. The starting point of an MPEG-2 stream is defined by the extension start code 0x000001B5 following the original MPEG-1 style video sequence header.

[Figure 6-9 Code structure diagram, Sequence layer: MK | header: FLC segment only | GOPs: VLC stream | MK; variable frame length]

1. Carried information: The Sequence header is coded using FLC and contains the following information: sequence extension start code, Profile / Level selection, progressive sequence, chroma format, horizontal and vertical size, aspect ratio, frame rate, bit rate, VBV buffer size, constrained parameter flag, intra quantizer matrix, non-intra quantizer matrix, and the GOPs (VLC).

2. Layer structure and stream re-synchronizing boundary: The Sequence header is coded using FLCs, which protect the block alignment boundaries. Because the underlying GOPs are VLC coded, the whole stream is Variable Length (VL) in nature. The Sequence extension start code at the beginning and the Sequence end code at the end of the Sequence stream provide stream framing and the re-synchronizing point at the Sequence layer when an error is present.

3. Severity of error impact due to loss of information: The MPEG-2 Sequence header information is very important in configuring the system. For example, a decoder can do nothing if the MPEG-2 Profile / Level information or the bit rate information is corrupted or altered. It is therefore of great interest to protect the Sequence header information from loss or corruption. Since the Sequence layer is the top layer of a video sequence, the error propagation problem does not apply here.
6.2.1.7 Impacts of error on the MPEG-2 Scalability extension headers

The MPEG-2 Scalability scheme provides flexibility in traffic shaping as well as better error resilience. The technique applies dual-priority layered coding, which generates base layer and enhancement layer bit streams from the original video signal. There are four methods of producing layered Scalability streams: Data Partitioning Scalability, SNR Scalability, Spatial Scalability and Temporal Scalability. An error-corrupted Scalable extension header can cause the loss of multiple Pictures, or even of the whole Sequence, until the next re-synchronizing point. The effects are similar to the Picture header and Sequence header problems discussed in 6.2.1.4 and 6.2.1.6. If the damaged Picture happens to be a reference I or P Picture, the error effects will also propagate to several frames.

6.2.1.8 Impacts of packet loss on the MPEG-2 encoded stream

In subsections 6.2.1.1 to 6.2.1.6, the effect of single or burst bit reversal errors on an MPEG-2 encoded stream has been studied and categorized based on the six-layer coding hierarchy. In packet-switched computer networks, correlated multiple packet or cell loss occurs more often. If ES data is truncated and encapsulated without awareness of priority or hierarchy, the loss of a packet may cause devastating quality degradation. Losing a large number of consecutive bits in an MPEG-2 encoded stream can cause the error situations analyzed in 6.2.1.1 to 6.2.1.6, and more. The previously analyzed errors occur inside a logical layer or between two adjacent logical entities; the loss of a data packet can destroy a large chunk of the bit-stream encompassing multiple hierarchical layers. Damaged Macroblock errors will propagate within the intra-coded I frame, and if errors happen to fall in an I or P frame, the frames predicted from it will also inherit these errors. In [84], Han et al. studied these errors and grouped them into three categories: (1) mismatched Macroblocks, (2) mismatched Slices, and (3) mismatched Pictures. All of them resulted in large errors. Their study focused only on the intra coded I frame; had it included inter frame error propagation, the resulting video degradation would have been even more serious. When the network experiences heavy congestion, more packets might be discarded. If this happens, another serious error problem, decoder buffer underflow, will occur and can cause complete disruption of the decoding process.
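How far a single lost packet reaches therefore depends mostly on which frame it hits. The sketch below counts damaged frames under a simplified dependency model; a 12-frame IBBPBBPBBPBB display order is assumed purely for illustration, and the trailing B frames' dependence on the next GOP's I frame is not modeled.

```python
# Sketch: frames damaged within one GOP when a single frame is corrupted.
# deps[i] lists the reference frames each frame is predicted from; the
# B frames use the surrounding I/P pair (an assumed, simplified model).

GOP  = list("IBBPBBPBBPBB")
deps = {0: [], 3: [0], 6: [3], 9: [6],
        1: [0, 3], 2: [0, 3], 4: [3, 6], 5: [3, 6],
        7: [6, 9], 8: [6, 9], 10: [9], 11: [9]}

def frames_affected(error_index):
    """Transitive closure: a frame is damaged if it, or anything it
    predicts from, is damaged."""
    damaged, changed = {error_index}, True
    while changed:
        changed = False
        for f, refs in deps.items():
            if f not in damaged and any(r in damaged for r in refs):
                damaged.add(f)
                changed = True
    return len(damaged)

print(frames_affected(0))  # corrupted I frame: all 12 frames damaged
print(frames_affected(3))  # first P frame: 11 frames
print(frames_affected(1))  # a B frame: only itself
```

The asymmetry in these counts is exactly the I/P/B sensitivity ordering measured experimentally in the next subsection.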
6.2.1.9 Error tolerance of frames and error sensitivity of streams

Knowing how well an individual I, P or B frame tolerates errors can benefit error-resilient coding and transport system designs. The study [81] conducted by Richardson et al. provides an interesting answer. In their experiments, the I frame portion of the bit stream was separated from the MPEG encoded stream. A Poisson distributed error process was applied to the I stream, which was then combined back into the main stream. The error-affected main stream was then fed into the decoder, and both subjective and objective evaluations were used to measure the decoded video quality. The same process was applied to the P and B streams. According to their experimental results, at an error arrival rate of 10^-5, the B frame was 10 PSNR points better than the P frame and 15 PSNR points better than the I frame.

To summarize the error study of 6.2.1.1 to 6.2.1.8, Table 6-1 is a sensitivity list that prioritizes the elements of the MPEG-2 video bit stream according to their vulnerability to errors. For example, the I frame Slice header (priority 4 in the table) is more sensitive to error than the headers listed after it, e.g. the B frame Picture header (priority 11). This is a coarse estimation of sensitivity; a more detailed study could provide a statistical, quantitative value for this parameter.

P   Layer     Header (H) / Data (D)   Pic.   Error Effect
1   Sequence  H                       -      Multiple groups of pictures, decoder configurations.
2   Picture   H                       I      Group of I, P, B pictures, decoder configurations.
3   Picture   H                       P      Group of P, B pictures, decoder configurations.
4   Slice     H                       I      Multiple slices in I, P, B pictures.
5   M-block   H                       I      Multiple slices in I, P, B pictures.
6   Block     D                       I      Multiple slices in I, P, B pictures.
7   Slice     H                       P      Multiple slices in P, B pictures.
8   M-block   H                       P      Multiple slices in P, B pictures.
9   Block     D                       P      Multiple slices in P, B pictures.
10  Picture   H                       B      B picture, decoder configuration.
11  Slice     H                       B      Slice in B picture.
12  M-block   H                       B      Slice in B picture.
13  Block     D                       B      Slice in B picture.
14  GOP       H                       -      Reference information only.

Table 6-1 MPEG Encoded Bitstream Error Sensitivity List
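Table 6-1 translates directly into a lookup that a priority-aware packetizer or a selective-retransmission scheme (see 6.3.2.7) could consult. The sketch below simply encodes the table; lower numbers mark more error-sensitive data deserving better protection:

```python
# Sketch: Table 6-1 as a transport-priority lookup. Keys are
# (layer, header-or-data, picture type); values are the table's priorities.

PRIORITY = {
    ("sequence", "H", None): 1,
    ("picture",  "H", "I"): 2,  ("picture",    "H", "P"): 3,
    ("slice",    "H", "I"): 4,  ("macroblock", "H", "I"): 5, ("block", "D", "I"): 6,
    ("slice",    "H", "P"): 7,  ("macroblock", "H", "P"): 8, ("block", "D", "P"): 9,
    ("picture",  "H", "B"): 10,
    ("slice",    "H", "B"): 11, ("macroblock", "H", "B"): 12, ("block", "D", "B"): 13,
    ("gop",      "H", None): 14,
}

def priority(layer, kind, pic=None):
    return PRIORITY[(layer, kind, pic)]

# e.g. an I-frame slice header outranks an entire B-picture header:
assert priority("slice", "H", "I") < priority("picture", "H", "B")
```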
6.2.1.10 Ambiguity about the MPEG-2 stream re-synchronizing point

Ambiguity exists in the literature regarding the MPEG-2 encoded stream re-synchronizing point. Most papers state that the Slice is the stream re-synchronizing point. This is only true when the corrupted or lost information falls below the Slice coding layer. According to our study, if the corruption happens to be in the Picture header or the Sequence header, a new Slice is no longer the re-synchronizing point. When transporting packets over lossy, noisy and bursty IP networks, packet loss is commonplace. Even a small packet loss rate can amount to a serious threat to the final video quality, because of the much higher probability of Picture or Sequence header corruption and the cross-layer mismatch errors discussed in subsection 6.2.1.7.

6.2.1.11 Impacts of end-to-end packet delay on interactive applications

Interactive applications such as MPEG encoded video teleconferencing are sensitive to end-to-end delay. This includes the overall network delay and the host display processing delay. Starting from voice conversation, according to ITU-T G.114, a one-way delay under 150 ms has little impact, but a delay over 400 ms has a serious impact. In order to maintain lip synchronization, video may precede or succeed the associated audio stream by up to 80 ms for one-way sections [60]. Assuming a voice stream has a delay of 150 ms, the video should then have a delay of less than 230 ms at the synchronizing point. An absolute end-to-end delay exceeding this limit will result in perceptual quality degradation.

6.2.1.12 Impact of excessive packet delay on MPEG-2 quality

In the MPEG-2 standard, video frames are sampled and displayed at an agreed-upon rate. This is the real-time nature of the MPEG-2 stream. If a packet arrives too late for its display time, it is treated as a lost packet. The analysis of the impact of lost packets on decoded video quality in subsection 6.2.1.8 therefore also applies here.

6.2.1.13 Impacts of network delay variance (jitter) on MPEG-2 decoding

Rate control is one of the most important aspects of an MPEG-2 implementation. In the first step, the encoder obtains an accurate bit rate estimate by applying final-video-quality versus bit-rate tradeoffs using variable quantization masking techniques. To achieve the constant rate offered by the network, bits are allocated to frames of different activity based on rate prediction. The goal of the second step is to prevent the decoder buffer from overflowing or underflowing for a given buffer size. The encoder achieves this goal by constructing a VBV decoding model. Assuming the fixed rate channel is omitted, the VBV is an equivalent decoder. By monitoring the VBV buffer occupancy, four parameters, Bit Rate, Picture Rate, VBV Buffer Size, and VBV Delay, are issued from encoder to decoder in order to ensure smooth decoding operation without decoder buffer overflow or underflow, either of which could seriously disrupt the decoding process.

This delicately balanced rate control mechanism takes multiple factors and variables into consideration and can easily be broken by network jitter. The mechanism relies on the assumption that the decoder buffer can be closely modeled at the encoder using the VBV if the fixed rate channel is omitted. In reality, due to network jitter, this assumption is faulty, especially when the jitter reaches the level where the decoder buffer starts to overflow or underflow. The relationship between jitter, decoder buffer size and cell loss was studied and demonstrated by Zhu et al. at Polytechnic University of Brooklyn. Most strategies for preventing decoder buffer overflow or underflow use rate control schemes without considering the ATM network jitter; the behavior of the decoder buffer differs from that of the encoder buffer, which may cause problems in decoder operation under network jitter. Their experiments proved that the jitter level significantly affects the packet loss ratio. Specifically, at a relatively small jitter variance, the packet loss ratio decreases as buffer size increases. When the jitter is sufficiently large (σ > 5E-6), increasing the buffer size does not affect the packet loss ratio. When the jitter is small, packet loss is mainly caused by buffer overflow. On the other hand, when the jitter is very large, late packets contribute the majority of lost packets; in other words, it is useless to increase the buffer size, since the late packets cannot meet the time constraint. Decoder buffer overflow or underflow interrupts the decoding process and causes serious video quality degradation.
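The late-packet half of this result is easy to reproduce with a toy model. In the sketch below the network delay is modeled as a 40 ms base plus Gaussian jitter; both the numbers and the distribution are illustrative assumptions, not values from the Zhu et al. study.

```python
# Sketch: beyond a certain jitter level, a fixed playout delay (i.e. a
# bigger decoder buffer) no longer prevents late-packet losses.

import random
random.seed(7)

def late_loss(jitter_std_ms, playout_delay_ms, n=10_000):
    """Fraction of packets missing their deadline when network delay is
    modeled as 40 ms plus zero-mean Gaussian jitter (an assumption)."""
    late = sum(1 for _ in range(n)
               if 40 + random.gauss(0, jitter_std_ms) > 40 + playout_delay_ms)
    return late / n

for std in (5, 20, 80):
    print(std, late_loss(std, playout_delay_ms=50))
# With small jitter essentially nothing is late; with very large jitter a
# substantial fraction of packets misses the deadline regardless of buffering.
```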
6.2.1.14 Impacts of jitter on MPEG-2 system clock recovery

The System Time Clock (STC) of the encoder must be regenerated at the decoder in order to obtain the same real time intervals at both ends. System clock synchronization plays an important role in the real-time implementation of the system, i.e. isochronal sampling rate recovery, video and audio synchronization and display, composite display signal generation, and network delay equalization. When a common reference clock, e.g. the global positioning system (GPS) or the network time protocol (NTP), is not available, the system clock can be recovered using a phase lock loop (PLL). The voltage controlled oscillator (VCO) free-runs at about 27 MHz until it is locked to the STC by the synchronizing signal. In the MPEG-2 Transport Stream, the Program Clock Reference (PCR) provides the synchronizing pulses for the PLL; the PCR also determines when the stream bytes should enter the System Target Decoder (STD). The low pass filter (LPF) in the PLL is capable of filtering out the high frequency components of the PCR jitter, but fails to prevent slowly varying drift (a first-order sketch of this filtering behavior is given at the end of this subsection). There always exists a convergence period during which a free running or unsynchronized PLL reaches the locked state. When congested network nodes add longer delays or even drop multiple consecutive packets, the PCR jitter can be large and correlated. This may be sufficient to cause the PLL to lose synchronization. Another possible situation is that the PLL constantly oscillates between the state of lost synchronization and the state of convergence. The subjective evaluation of the effects caused by an unsynchronized decoder system clock is a subject for future study.

6.2.1.15 Impacts of jitter on video and audio presentation synchronization

The synchronization of elementary video and audio streams is accomplished with the Presentation Time Stamps (PTS), which indicate when the decoded pictures and audio segments are to be presented. When the video is not in display order (i.e. B-pictures are present), the Decoding Time Stamp (DTS) may differ from the PTS. These time stamps let the STD operate without concern for buffer underflow or overflow, provided the encoder followed the MPEG-2 rules when encoding the video and audio information. Another synchronization process, immediately before display, arises when several video sequences emanating from different sources are displayed together. Only one of the received signals can be used to synchronize the display system; the other signals must be stretched or contracted to fit that time base [50]. As the MPEG-2 streams pass through the network, PTS jitter and DTS jitter are generated. Together with the effect of jitter on the recovered system clock, this jitter degrades the overall quality. As a solution, a jitter removal process is necessary at the receiver; the de-jittering can be done before or after the decoder.

6.2.1.16 Impacts of allocated bandwidth reduction on final video quality

Depending on how the network bandwidth is allocated, there are four operating modes for ATM networks: CBR (constant bit rate), VBR (variable bit rate), ABR (available bit rate) and UBR (unspecified bit rate). There are three service models for IP networks: Best Effort service, IntServ (Integrated Services), and DiffServ (Differentiated Services). With the exception of UBR and "best effort" services, all other implementations require a statistical bandwidth allocation. There is a possibility that the underlying network (ATM or IP) might fail to provide the agreed-upon bandwidth, for the reasons outlined in subsection 6.2.2.4. A reduction in bandwidth means an increase in packet loss rate, which can bring serious video quality degradation according to the analysis in subsection 6.2.1.8.
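To make the LPF remark in 6.2.1.14 concrete, the sketch below runs a first-order loop on PCR-derived frequency-error samples. The loop gain is an illustrative assumption; the point is only that fast alternating jitter averages out while a sustained offset, i.e. slow drift, is tracked and pulls the recovered clock along.

```python
# Sketch of the clock-recovery loop idea: a first-order low-pass on the
# frequency error removes fast PCR jitter but follows slow drift.

def recover_clock(pcr_errors_hz, alpha=0.05, f0=27_000_000.0):
    """Feed successive frequency-error samples (derived from received PCR
    values) into a first-order loop; return the VCO frequency after each."""
    f, out = f0, []
    for e in pcr_errors_hz:
        f += alpha * e          # small corrections: fast jitter averages out
        out.append(f)
    return out

# High-frequency jitter (alternating +/-100 Hz) barely moves the VCO,
# while a constant 100 Hz offset is tracked, pulling the clock with it.
print(recover_clock([+100, -100] * 4)[-1])  # 27,000,000.0 Hz -- unchanged
print(recover_clock([+100] * 8)[-1])        # 27,000,040.0 Hz -- drifted
```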
6.2.2 Error and impairment sources - Network

This section provides an analysis of error and impairment sources from a network point of view. ATM and IP internetworks can be characterized as asynchronous time-division multiplexing (ATDM) networks, where statistical multiplexing is the foundation. The network provides fair access to the transmission capacity, and the host is responsible for the quality of the transfer by means of retransmission and forward error correction. IntServ and DiffServ IP networks and ATM networks have crossed this boundary by providing hosts with statistical QoS guarantees, although they still bear such statistical multiplexing characteristics as dropping data packets when a router buffer is full, and exhibiting longer queuing delays and larger delay jitter when the network is congested.

Bit reversal errors
• Single bit reversal error
• Burst bit reversal error

Bit errors typically occur because of physical interference such as electro-magnetic interference, electro-static discharge, lightning, and power surges. For typical copper cable and optical fiber, the BER is about 10^-6 to 10^-7 and 10^-12 to 10^-14, respectively. Most bit errors can be detected with high probability and corrected. In the worst case, however, overwhelming errors are uncorrectable, causing packets to be discarded.

6.2.2.1 IP packet loss
• Packet loss due to unrecoverable error within the packet header
• Packet loss due to router queue overflow under congestion or bandwidth limitation
• Packet loss due to excessive delay
• Packet loss due to node / link crash and reboot
• Packet loss due to system state change or human intervention
• Packet loss due to mishandling or misrouting
• Packet loss due to FEC failure

There are generally three types of packet losses. Type one is caused by unrecoverable random errors in the IP packet; if errors happen to corrupt packet header information, the router may not know where to forward the packet, or may send it to the wrong destination. Type two is caused by network congestion or long delay. Type three is caused by network disturbances such as router crashes, area power failures, link resets, etc.

6.2.2.2 IP packet delay
• Packet delay due to packet transmitting
• Packet delay due to packet serializing
• Packet delay due to router multiplex queuing
• Packet delay due to router processing
• Packet delay due to segmenting and reassembling
• Packet delay due to protocol processing

To complete the list, three more delay types can be added if the end-to-end overall delay is of concern:
• End-to-end delay due to encoding and decoding
• End-to-end delay due to error control and FEC processing
• End-to-end delay due to acquisition and display

According to the studies by Karlsson [2], "protocol processing is a major cause of delay. It includes framing of information, calculation of checksums, and address lookups in hosts and switches (routers). UNIX process scheduling is a major problem that can be relieved by using operating system with real-time scheduling, or by taking the operating system out of the information transfer altogether. In general, protocols should be implemented to reduce maximum delay, not only to maximize throughput". Among all the listed types of delay, queuing delay is the most variable, due to the asynchronous statistical multiplexing of the packet switched network. The queuing delay at each node depends on the instantaneous load, the amount of buffer space, and the service discipline. The overall queuing delay of a specific routing path takes into account all the individual routing nodes and the multiplexing methods applied (i.e. deterministic or statistical multiplexing).
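Losses of types two and three arrive in bursts rather than independently. A standard way to reproduce such correlated loss in simulation, not a model taken from this thesis, is the two-state Gilbert model sketched below: in the good state packets survive, in the bad state they are dropped.

```python
# Sketch: a two-state Gilbert model for bursty, correlated packet loss.

import random
random.seed(1)

def gilbert_losses(n, p_gb=0.01, p_bg=0.30):
    """Return one loss indicator per packet. p_gb = P(good -> bad) starts
    a loss burst; p_bg = P(bad -> good) ends one. The long-run loss rate
    is p_gb / (p_gb + p_bg); the mean burst length is 1 / p_bg."""
    state_bad, losses = False, []
    for _ in range(n):
        state_bad = (random.random() < p_gb) if not state_bad \
                    else (random.random() >= p_bg)
        losses.append(state_bad)
    return losses

loss = gilbert_losses(100_000)
print(sum(loss) / len(loss))  # ~0.032 loss rate, in bursts of ~3.3 packets
```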
6.2.2.3 Packet delay variance (jitter)
• Delay jitter due to the dependence of queuing delay on the instantaneous load
• Delay jitter due to the dependence of queuing delay on buffer space
• Delay jitter due to router resource contention or bandwidth limitation in hardware processing
• Delay jitter due to router resource contention in operating system or software processing
• Delay jitter due to the routing algorithm
• Delay jitter due to unicast or multicast routing path changes
• Delay jitter due to node / link crash and reboot
• Delay jitter due to system state change or human intervention

All network jitter can be categorized into three types. Type one jitter is caused by the variation of queuing delay under different levels of load or buffer size. Type two jitter is contributed by both hardware and software resource contention; for example, routers using parallel processing techniques encounter situations where multiple incoming packets are to be routed to the same output link. Type three jitter is mostly introduced by path or system state changes; for example, if a forced path change sets up a new route with a different number of nodes, a delay variation occurs.

6.2.2.4 Bandwidth reduction
• Bandwidth reduction due to degradation in router performance
• Bandwidth reduction due to congestion (applies to VBR, ABR)
• Bandwidth reduction due to traffic enforcement or policing weakness
• Bandwidth reduction due to QoS degradation
• Bandwidth reduction due to QoS re-negotiation

Bandwidth reduction may cause link traffic congestion, which increases packet loss, packet delay and jitter.
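For measurement purposes, the jitter listed above is commonly summarized by the running interarrival-jitter estimator defined for RTP in RFC 1889, sketched below; D is the difference in relative transit time between consecutive packets.

```python
# Sketch: the RFC 1889 interarrival jitter estimator (exponentially
# smoothed with gain 1/16), applied to sender/receiver timestamp pairs.

def update_jitter(jitter, send_ts, recv_ts, prev_send_ts, prev_recv_ts):
    d = (recv_ts - prev_recv_ts) - (send_ts - prev_send_ts)
    return jitter + (abs(d) - jitter) / 16.0

# Example with timestamps in milliseconds:
j, prev = 0.0, (0, 100)
for send, recv in [(40, 145), (80, 190), (120, 260)]:
    j = update_jitter(j, send, recv, *prev)
    prev = (send, recv)
print(round(j, 2))  # 2.44 -- smoothed jitter estimate in ms
```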
6.2.3 Error and impairment sources - Communication protocol

In this section, error and impairment sources are studied from a communication protocol point of view. By analyzing errors and impairments according to their existence and propagation within protocol layers, error detection, isolation and elimination techniques can be developed to improve network performance in MPEG-2 video stream delivery.

IP network layer protocol errors and impairments:
• Due to the protocol encapsulation structure, the packet loss, packet delay and jitter of the underlying layer propagate to the IP network layer.
• The IP layer introduces extra packet loss, packet delay and jitter into the packet stream due to multiplexing.
• A packet is dropped due to TTL (time to live) timeout.
• A packet is dropped due to a detected checksum error in the packet header.
• A packet will be mis-routed if an error happens to fall in the destination IP address and goes undetected.
• The loss of a single fragment will cause all fragments of the packet to be dropped at the destination.
• Packets may arrive out of order because of multiple paths.
• A packet may be duplicated because of re-transmission.
• Packet loss due to routing loops and fluttering.
• Unreachable IP host address.
• The IP network "best-effort" service makes no guarantee of packet loss rate, of packet delay and jitter bounds, or of bandwidth.
• A receiver may be unable to reserve a path using RSVP due to resource contention.
• An Integrated Services IP network may be unable to provide the guaranteed or predictive services reserved using RSVP.
• An Integrated Services IP network might fail to monitor and regulate traffic.
• An Integrated Services IP network might fail to prevent congestion.
• A Differentiated Services IP network might fail to provide CoS/QoS along the path, since internal resource management and usage are maintained by each domain.
• A Differentiated Services IP network might fail to provide CoS/QoS due to inaccurate traffic classifying, marking and policing at the customer's egress node (or the provider's ingress node).
• A Differentiated Services IP network might fail to provide end-to-end CoS/QoS service due to inaccurate traffic metering and PHB (Per-Hop Behavior) mapping at inter-domain boundaries.

6.3 Error detection, prevention and concealment

As illustrated in Figure 6-10, from a system perspective, delivering real-time digital video over an IP network involves several main components: the video encoder / decoder, the video data stream packetizer, the synchronizing mechanism, error correction coding, and the data communication channel.

[Figure 6-10 A Diagram of Source Coding and Channel Access: the MPEG-2 encoder with rate control (VBV, Q scale), error correction encoder, packetizer and synchronizing timestamps feed the channel through network access control; at the receiver, a depacketizer, error correction decoder, clock recovery with synchronizing timestamps, and the MPEG-2 decoder sit behind terminal access control.]

Error handling techniques for data communication channels and packet video have been extensively studied for some time. In general, they can be categorized into three groups:

1. Error prevention techniques (sender centric): error control coding such as FEC (forward error correction).

2. Error concealment techniques (receiver centric): error concealment and post video signal processing.

3. Adaptive techniques: solutions involving the sender, the receiver, and the underlying communication network to provide adaptive, scalable and layered error-resilient approaches that gracefully handle video quality degradation to combat errors.

Effectiveness and efficiency are the two most important criteria in evaluating error-resolving techniques. Under many circumstances these two criteria are in conflict with one another and need to be carefully balanced. For example, a very effective error correction technique might need a large coding overhead, which could significantly increase the network traffic load and induce heavier congestion; the increased congestion will eventually reduce the overall effectiveness of the technique from a system point of view.

As demonstrated in Chapter 4 and section 6.2, due to the VLC (variable length coding) of the MPEG-2 standard, a random bit error can destroy the VLC data synchronization and cause all of the data following the error to become non-decodable until the next synchronizing start header. Since random bit errors and packet loss both cause multiple consecutive losses of data in the video stream, the results of the error studies apply to both. Error detection, prevention, and concealment at both the application layer and the IP layer provide quality improvements to the transported MPEG-2 encoded video. Figure 6-11 is a summary list of all the techniques.
Figure 6-11 Solutions to Errors
Error prevention: MPEG-2 scalability; forward error correction; coded slices randomizing; prioritized stream; multiple description; robust entropy coding; selective retransmission; network control adaptation; interleaving / interlacing.
Error concealment: spatial redundancy; temporal redundancy; frequency redundancy; space redundancy.

6.3.1 Error detection

6.3.1.1 External error detection

The channel decoder detects errors in the received transport packets with high reliability; a corrupted packet is indicated in the transport header. Another form of packet loss detection is to provide sequence numbering in the payload of packets or cells.

6.3.1.2 Internal error detection

The source decoder can detect errors by evaluating the syntax and semantics of a bitstream. For example, a decoder will signal a syntax error if it detects non-existent values in the FLCs or VLCs, or invalid flags and selections. A decoder will detect a semantic error if it cannot find the EOB in coded blocks.

6.3.2 Error prevention

6.3.2.1 MPEG-2 scalability

MPEG-2 layered scalability coding provides packet loss resilience as well as flexibility in traffic shaping. There are four scalability coding methods specified: Data Partitioning, Signal-to-Noise Ratio (SNR) Scalability, Spatial Scalability and Temporal Scalability. Scalability coding methods generate at least two bitstreams: the base layer and the enhancement layer. The base-layer bitstream can be decoded independently to generate a lower quality version of the encoded video. The main difference between the various techniques is in the content of the base layer. In the Data Partitioning technique, the base layer contains a reduced set of DCT coefficients. In SNR Scalability, the base layer consists of a coarsely quantized version of the video. In Spatial Scalability, the spatial resolution of the base layer is reduced. In Temporal Scalability, the base layer has a reduced temporal resolution. In all cases, the enhancement layer contains the information necessary to obtain the high-quality video [95].

In priority and quality sensitive networks such as ATM, the RSVP IP network, or the DiffServ IP network, the base layer can be transmitted over a higher quality channel with a lower packet loss rate and bounded delay and jitter. It is possible to use a constant bit rate for the base layer in order to allow the network to provide more tightly regulated QoS and bandwidth. The enhancement layer can use a variable bit rate to take advantage of the network multiplexing gain, which normally means a lower price for network users. The coding structure and traffic shaping also fit well with ABR-type network modes. In this coding scheme, packet loss has less effect on the main quality of the final video, which provides the error-resilient characteristic. In the experiments done by Reibman et al. [95], the final video quality of all scalability coded bitstreams outperforms the non-layered counterpart. Data Partitioning is so easy to implement that its use can be justified for many applications. SNR Scalability gives improved quality and more flexibility in traffic shaping at the expense of increased complexity, particularly at the encoder side. Spatial Scalability provides the best layered coding performance but also has high implementation complexity.
6.3.2.2 Priority nature of the MPEG-2 video stream

As demonstrated by the study and analysis in section 6.2, data in an MPEG-2 video stream exhibits different levels of sensitivity to error. If corrupted, higher priority data causes more severe quality degradation. It is desirable to transport high priority data, such as headers, using reliable protocols and higher quality channels. For example, TCP can be used to transport headers, and higher CoS/QoS channels in a DiffServ Internet can be used to transport I frames.

6.3.2.3 Forward Error Correction

Due to the real-time nature of the MPEG-2 bitstream, the round trip delay of re-transmitting a lost or corrupted packet is generally not acceptable. It appears highly beneficial to use preprocessed active recovery schemes such as forward error correction (FEC). The quantitative objective of a well-chosen (N,K) code should be the maximization of the rate R = K/N, the minimization of FEC coding / decoding delays, and the provisioning of adequate protection under packet loss. The rate R = K/N determines the fraction of the overall rate allocated to the source coding operation. Therefore, one would like to operate at a code rate as close to one as possible in order to minimize the throttling of the source coding rate and thus maximize reconstructed video quality under light losses. Using the class of Reed-Solomon (RS) codes, the process of code selection is formulated as a maximization of K/N while keeping the FEC coding delay and the decoded packet loss below specified thresholds [9].

Based on their research on the FEC transport coding scheme, Vastola et al. proposed an RS solution based on the following selection and application criteria: (1) it is necessary to use interlaced FEC coding in an ATM environment; (2) interleaving can improve FEC efficacy; (3) the selected code has a bounded, small processing delay of 0.8 ms against a 20 to 100 ms end-to-end overall delay; (4) codes are selected to maximize R under a bounded decoded loss probability and FEC coding delay. In conclusion, the selection of a single code for all operating rates is questionable; it is better to prestore a fixed group of good codes for a selected range of operating parameters. At low operating rates, the use of FEC may be accompanied by excessive throttling, which translates into an excessive sacrifice in coding quality to accommodate the FEC bandwidth expansion. In most cases, codes of small to moderate code length (< 63) performed very well as long as the code rates were properly chosen. The code selection strategy provided robust performance even under conditions of mismatch in the choice of channel parameters for which the codes were selected.
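The K/N trade-off above can be made concrete for packet-level FEC: a block of N packets carrying K data packets survives any pattern of up to N-K losses. The sketch below computes the residual (unrecoverable) block loss probability, assuming independent losses; this assumption understates the damage done by the bursty losses of section 6.2.2, which is one motivation for the interleaving mentioned above.

```python
# Sketch: residual loss probability of an (N, K) packet-level FEC block
# under an independent-loss assumption. The block fails when more than
# N - K of its N packets are lost.

from math import comb

def residual_loss(p, n, k):
    """Probability that a block is NOT recoverable at packet loss rate p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(n - k + 1, n + 1))

for n, k in [(63, 59), (63, 55)]:
    print(n, k, "rate", round(k / n, 3),
          "residual", f"{residual_loss(0.015, n, k):.1e}")
# (63, 59): rate 0.937, residual ~3e-03 at 1.5% loss
# (63, 55): rate 0.873, residual ~7e-07 -- more parity buys far lower
# residual loss at the cost of the bandwidth expansion discussed above
```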
6.3.2.4 Encoded slice randomizing

An MPEG-2 encoded slice represents a 16-pixel-high stripe of the final image. The coded slice is also a re-synchronizing unit in a picture. Under correlated bursty error corruption and packet losses, it is likely that multiple slices in close proximity to each other in a picture will be corrupted. The visibility of this error is worse than if the corrupted slices were randomly spread apart. Such an error also creates difficulty for spatial and motion vector concealment, since the remaining redundancy between neighboring slices has been removed. Given an MPEG-2 encoded bitstream, it is possible to randomize the slices within the I frames before packetizing and transporting. On the decoder side, after the packetized bitstream is recovered, the slices can be relocated to their correct positions according to the information provided by the Slice start code and the row location of the first Macroblock. Since pictures are displayed at a fixed interval, the cost of this implementation is the randomizing process delay on the encoder side and the slice relocating delay in the decoder. The effectiveness of the scheme will be evaluated in the future.

6.3.2.5 Multiple Description Coding

This technique assumes that multiple channels with uncorrelated error events exist between source and destination. Video sources are separated into multiple data streams (descriptions) and transmitted over different channels. At the destination, the MDC decoder combines the received descriptions to reassemble the stream. The criterion of MDC is that the quality of the video reconstructed from any one description is acceptable, and that incremental improvement is achievable with more descriptions.

6.3.2.6 Robust entropy coding

To mitigate the disadvantages of VLC codes and prevent error propagation, several techniques have been proposed. (1) Self-synchronizing entropy coding: a distinct synchronizing code word is inserted into the stream to enable the decoder to regain synchronization. (2) Error-resilient entropy coding, as proposed by Kingsbury [93], in which variable length bit streams from blocks are distributed into slots of equal size; a predefined offset sequence is used to search for empty slots to place any remaining bits of blocks that are larger than the slot size, until all bits are packed into one of the slots. This way, the decoder can regain synchronization at the start of each block, which also stops error propagation. (3) Reversible variable length codes (RVLC): RVLC is implemented in the MPEG-4 standard. It is designed so that data can be recovered backwards from the next detected synchronizing code word up to the first decodable code word after the error.

6.3.2.7 Retransmission

If the application is non-interactive with relaxed delay bounds, retransmission of lost packets can also be a solution to packet loss errors. As retransmission imposes longer delays and a larger bandwidth overhead, FEC techniques are used in practice to fix the error of a single packet loss; retransmission is used only to deal with burst packet losses where the long retransmission delay is acceptable to the application. Selectively retransmitting high priority data in the MPEG-2 video stream using Table 6-1 can reduce the large bandwidth overhead, since some errors in B frames might not be perceptible to the human eye.

6.3.3 Error concealment

6.3.3.1 Error concealment by exploiting spatial redundancy

In this implementation, the redundant information in the surrounding Macroblock pixels is used to interpolate the values of the pixels in a corrupted Macroblock. The interpolation is done by either linear or polynomial combining. The resulting Macroblock normally has a block artifact; to reduce its visibility, a multidirectional edge smoothing technique is applied to the Macroblock. Luo et al. proposed an artifact reduction technique for low bit rate DCT-based image compression [89] using three processes. First, DC calibration adjusts the reference level of a block as a whole, which results in a noticeable image quality improvement. Second, an improved Huber-Markov Random Field (HMRF) model is developed to differentiate the artifact from image details. Third, an Iterative Conditional Mode (ICM) implementation is computationally effective and avoids undesired large-scale effects in enforcing the localized image model.
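A minimal sketch of the interpolation core of this idea: the rows of a damaged slice are estimated by linearly blending the last good pixel rows above and below it. Real concealment would add the edge smoothing described above; this shows only the interpolation step.

```python
# Sketch: linear spatial interpolation across a lost slice. Plain lists
# are used so the example runs without NumPy.

def conceal_rows(frame, top, bottom):
    """Replace the frame rows strictly between `top` and `bottom` by a
    linear blend of row `top` and row `bottom` (the last good rows
    around the damaged slice)."""
    span = bottom - top
    for r in range(top + 1, bottom):
        w = (r - top) / span
        frame[r] = [round((1 - w) * a + w * b)
                    for a, b in zip(frame[top], frame[bottom])]
    return frame

# A 6x4 toy luminance patch with rows 1..4 damaged (e.g. a lost slice):
patch = [[10] * 4, None, None, None, None, [60] * 4]
print(conceal_rows(patch, 0, 5))  # damaged rows become 20, 30, 40, 50
```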
6.3.3.2 Error concealment by exploiting temporal redundancy

The temporal redundancy between consecutive pictures is used to recover missing Macroblocks. In the simplest implementation, the previous anchor picture is simply copied into the erroneous predicted picture; the artifact will be large if there is movement between the two pictures. A better method is to use motion vectors interpolated from the nearest Macroblocks in the current picture, which is named motion compensated temporal error concealment. In MPEG-2 encoded bitstreams an error damages the whole slice; therefore, the reference motion vector normally comes from the Macroblock above or below the damaged one.

6.3.3.3 Error concealment by exploiting space redundancy

This approach uses the I frame motion vectors available in the MPEG-2 standard to transmit space redundancy information. When the original Macroblock information is lost, the motion vector is used to locate the most similar Macroblock in the neighboring area and use it as the recovered Macroblock. Although this method requires the encoder to code the concealment vectors for I frames and also moderately increases the bit rate (0.7%), the test results show that the subjective picture quality is noticeably superior to conventional temporal replacement [45].

6.3.3.4 Error concealment by exploiting frequency redundancy

Between two neighboring blocks, the DCT coefficients are correlated (especially the low frequency ones). This is the redundancy exploited by frequency error concealment. The technique performs a linear or polynomial interpolation of adjacent DCT coefficients. The efficiency and complexity of the algorithm used determine the quality of the reconstructed block. The implementation of this technique requires accuracy and certainty in the coefficient estimates; an estimation error can consequently cause visible artifacts. The low frequency coefficients are used because of their higher values and therefore lower estimation errors.

6.3.4 Adaptive network control

Most error prevention techniques add redundancy to the raw stream and require varying degrees of protocol support. Currently, there is no standard protocol framework for requesting retransmission of streaming media. In order to provide streaming media services such as MPEG-2 VoD, more and more network bandwidth will be used to carry the retransmission of lost packets and the additional control traffic that requests retransmission. The result is an increase in network congestion, and when the network becomes severely congested, all FEC and error handling techniques fail. This is why streaming media applications need to consider both effective error handling techniques and adaptive network congestion control protocols. The future of transporting MPEG-2 video over a DiffServ Internet therefore lies in developing adaptive applications as an adjunct to more robust network protocols that gracefully handle data loss and degradation of throughput while attempting to satisfy user requirements.
7 Experimental results and numerical analysis

The final stage of the project is to conduct well-planned and meaningful experiments in an integrated system environment. The experiment procedures integrate the research results of all technical elements of the project, including network architecture and implementation, IP network performance metrics, MPEG-2 video implementation, and digital video quality evaluation. Through extensive experimentation, we are able to investigate in detail the behavior of MPEG-2 video when transported in a "best effort" IP networking environment.

7.1 Objectives of the experiment

As illustrated in Figure 7-1, delay, jitter, and loss of IP packets are the well known impairments of the "best effort" IP network. Among them, IP packet loss affects reconstructed MPEG-2 video quality most significantly. Based on the analysis results described in Chapters 2 to 6, and the research objectives outlined in section 1.3, the objective of the experiment is to investigate and quantify the effects of IP packet loss on MPEG-2 video reconstruction in moderately to heavily loaded IP networks. The outcomes of the experiments are valuable in three areas:

1. They allow a quantitative understanding of the quality of service (QoS) requirements to be provided by the network layer in order to transport MPEG-2 real-time video.

2. They assist in the development of adaptive applications that are more robust and able to combat network impairments.

3. Using the experiment data, a network provider is able to predict perceived video quality starting from IP performance metrics.

[Figure 7-1 IP Network Layer Impairments: a traffic generator loads the IP layer so that MPEG-2 packets from the host suffer delay, jitter and packet loss; the client stores the time-stamped, possibly corrupted MPEG-2 frames in a database.]

7.2 Description of the experiment

The technical elements of research and development were introduced in Chapters 2 to 6 of the dissertation. For clarity, a brief summary of the technical elements is given first, followed by the detailed experiment procedures.

A brief summary of the technical elements involved in the experiment:

1. Designing, constructing and configuring the routed IP testbed network shown in Figure 7-2. The network consists of an IP router, a background traffic generator and traffic sink, the CMFS video server, the video client, and switches. To study DiffServ, the router kernel can be modified to incorporate the PHB (Per-Hop Behavior) strategies proposed by the IETF DiffServ Working Group, using more sophisticated queuing disciplines, e.g. class based queuing.

2. Designing the video client, and implementing the network background traffic generator and the packet loss measurement.

3. Implementing the MPEG-2 video Elementary Stream. Test streams were carefully selected, re-encoded, tested, verified, and prepared for the experiments. Coding parameters for the selected test streams are specified in Table 4-3; detailed statistics of the encoded MPEG-2 test streams are available in Table 4-4.

4. Studying digital video impairments and implementing objective and subjective video quality assessments. The test architecture and methodology deployed for the application layer (video implementation) is established and outlined in Figure 5-1. Techniques for objective measurement and subjective assessment of the video quality are specified. Programs that provide temporal realignment and simple error handling for the test clips were also created.
5. The fifth technical element focuses on transport issues, e.g. the transport protocol and transport-error-affected MPEG-2 video. Our studies of the following three issues helped set a clearer focus for the experiments and established common ground on error issues: (1) a detailed analysis of MPEG-2 transport errors, error-affected video quality, error propagation, and error sources, which exposes the error characteristics of MPEG-2 video; (2) the study of error sources, which relates particular MPEG-2 errors to the IP network layer impairments; and (3) a survey of recent error detection, prevention and concealment techniques, which provides multiple viable error solutions under varying circumstances.

[Figure 7-2 Routed IP Testbed Network Topology (same as Figure 2-1): the MOS evaluation center (Host 05) and hosts on subnets 01, 02 and 03 connected through the IP router under test.]

Figure 7-2 is the testbed network topology diagram introduced in Chapter 2. It is used here again to help illustrate the experiment procedures.

Experiment procedures:

1. The IP router is stressed by competing traffic until its buffer overflows. As a result, the IP router performs a simple "tail drop" IP packet discard strategy when congested (the fractal characteristics of network traffic are a topic of current academic study and are briefly discussed in Chapter 3 of the dissertation). The NetPerf IP network performance benchmark is selected as the background IP traffic source. NetPerf allows the user to specify the IP packet type (TCP or UDP), packet size, packet count, and burst interval to create a highly bursty but controllable IP traffic load on the network.

2. The competing IP traffic paths are shown as traffic A and B in Figure 7-2. The maximum link bandwidth between the Fast Ethernet switch and the IP router is about 100 Mbit/s. If the aggregated traffic load on this link exceeds the bandwidth, the switch becomes congested and starts dropping packets. Since we are only interested in studying the packet loss statistics of the IP router, two paths, A and B, are necessary to carry an aggregated input of over 100 Mbit/s of background traffic to the IP router and thereby avoid false packet loss at the Fast Ethernet switch.

3. The previously selected MPEG-2 test streams are encapsulated in RTP packets and deposited on the CMFS server. The bitstream is encoded as CBR and requires 6 Mbit/s (plus protocol packet header overhead) of network bandwidth. In each experiment session, the test stream is transported from the CMFS server to the destination video client via the IP router.

4. Specifying the NetPerf parameters in each experiment controls the level of competing traffic load. A higher traffic load (congestion) produces a higher rate of packet loss. Based on our experience, video quality degradation is significant when the IP packet loss rate exceeds 2%, so all experiments are done with the packet loss controlled at a rate lower than 1.5%.

5. Network performance statistics, e.g. the packet loss rate and the delay jitter, and the transported video flow IP packet streams are captured and time-stamped.

6. Based on our video layer test architecture and the test model and methodology established in Chapter 5, the corrupted video stream is resynchronized and calibrated for quality measurement against the original stream. For each stream, both an objective measurement and a subjective assessment of video quality are derived and recorded. The correlation analysis between network packet loss and video quality finally becomes a reality.
7. In this project, we conducted 178 experiments that transported a total of 52,400 video frames. In each test run, sixteen experimental data items are measured or calculated from the test results. Referring to Table 7-1, twelve of the data items are relevant to video quality: PSNR, MOS Score, MSE, Unmatched Frames, Corrupted GOP count, Corrupted Picture count, Frame Loss rate, total Frames Sent, total Frame Error count, Slice Header Loss rate, Picture Header Loss rate, and GOP Header Loss rate. The other four measurements, Network Load, IP Packet Loss rate, total IP Packets Sent and total Data Loss (bytes), are network performance statistics.

Table 7-1 Experimental Data Items

  Video quality data items:        PSNR, MOS Score, MSE, Unmatched Frames,
                                   Corrupted GOP, Corrupted PIC, Frame Loss %,
                                   Frame Sent, Frame Error, Slice Hdr. Loss,
                                   PIC Hdr. Loss, GOP Hdr. Loss
  Network performance data items:  Network Load, IP Packet Loss, IP Packet Sent,
                                   Data Loss

7.3 Numerical Analysis & Experimental Results

7.3.1 Quantitative relationship between MPEG-2 video quality and IP packet loss

Despite significant advantages over its analog counterpart, digital video such as MPEG-2 video presents unique digital impairments. There are many digital impairment sources from video capture to display, e.g. the encoding, multiplexing, server, transport, and decoding processes. Different impairment sources may induce very different and highly visible artifacts, e.g. block artifacts, picture hesitation, motion blurring, loss of synchronization, and color bleed. Impairment sources and their impacts on video quality can be decoupled from each other and investigated separately. IP network packet loss errors normally introduce severe quality degradations to MPEG-2 video, such as blocking, repeated and jerky movement, and flickering artifacts.

There has been a tremendous amount of research activity and industry interest centering on the delivery of MPEG digital video over wide area computer data networks. Interestingly, our extensive search of academic publications and technical literature indicated that our project appears to be among the first to study the transport issues using a systems approach. We take into consideration the MPEG video characteristics, the IP network behaviors, the video server and client, as well as their interactions. The experimental data collected from the testbed network provides information allowing for quantitative analysis of these interactions. The quantitative information, which will be discussed and extended later in this chapter, can only be obtained through extensive experiments in a real setting.

By studying quantitatively the empirical relationship between MPEG-2 video quality and IP packet loss, we are able to meaningfully interpret the collected data with respect to the QoS requirements for delivering MPEG-2 video streams in IP packets. This in turn will aid IP network traffic engineering, traffic shaping and management, i.e. the Internet Differentiated Services model of CoS/QoS provisioning. Implementations such as the design of a video server or an FEC algorithm can also benefit from the data set, since it presents the real characteristics of video behavior in a lossy IP network.

As indicated in the experimental procedure, the video quality (PSNR and MOS) and the packet loss rate are obtained while the IP router is subjected to various levels of competing traffic. The video quality and packet loss data pairs are bivariate in nature.
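To make the objective quality items in Table 7-1 concrete, the sketch below computes per-frame MSE and PSNR between a reference frame and the corresponding decoded frame. This is a minimal illustration, not the project's measurement program: the frame arrays, the 8-bit peak value of 255, and the sequence-level averaging convention are assumptions.

```python
import numpy as np

def frame_mse(reference: np.ndarray, decoded: np.ndarray) -> float:
    """Mean squared error between two 8-bit luminance frames."""
    diff = reference.astype(np.float64) - decoded.astype(np.float64)
    return float(np.mean(diff ** 2))

def frame_psnr(reference: np.ndarray, decoded: np.ndarray) -> float:
    """Per-frame PSNR in dB, using 255 as the peak signal value."""
    mse = frame_mse(reference, decoded)
    if mse == 0.0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(255.0 ** 2 / mse)

def sequence_psnr(ref_frames, dec_frames) -> float:
    """Sequence-level PSNR: average the per-frame MSEs first, then convert
    to dB, so that a few badly corrupted frames are not hidden by averaging
    in the logarithmic domain."""
    mses = [frame_mse(r, d) for r, d in zip(ref_frames, dec_frames)]
    return 10.0 * np.log10(255.0 ** 2 / max(np.mean(mses), 1e-12))
```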
Scatter-plots are suitable for graphically revealing the empirical relationships between such bivariate data pairs. As shown in Figure 7-3, we use the ordinate to represent the video quality (PSNR or MOS) and the abscissa to represent the packet loss rate.

[Figure 7-3: Empirical Relationship between Video Quality and IP Packet Loss Rate. Six scatter-plot panels: (a)-(c) PSNR vs. packet loss % and (d)-(f) MOS vs. packet loss % for Video-H, Video-M and Video-L.]

As defined in Section 4.8, in Figure 7-3 Video-H, Video-M and Video-L are used as the titles of the sub-graphs in place of the test video clips Cheer, Ballet and Susi, representing high, medium and low levels of motion and spatial activity, respectively. Figures 7-3 (a-c) graphically show the empirical relationships between PSNR and IP packet loss for Video-H, Video-M, and Video-L. Figures 7-3 (d-f) show the empirical relationship between MOS and IP packet loss for each test stream. In all scatter-plots, the reference curves are obtained using least-squares regression estimation.

Having the experimental data graphically plotted and displayed in Figure 7-3, a more important task is to analyze the data and investigate the underlying empirical relationships. Based on Figure 7-3, Figure 7-4 is used to define and study the properties that reside within the experimental data. To study the video quality issues, we must first decide how to interpret the video quality using the experimental data. Video quality boundaries, which are defined below and shown in Figure 7-4, are also useful in identifying the degree of subjective quality degradation. Since the human viewer is the recipient of the video service, the MOS subjective video quality measurement is considered to be the true video quality. Objective video quality measurements such as PSNR need the subjective measurement to correct their inaccuracy. Referring to Figure 7-4 (d-f), we define the area with MOS scores between 5 and 3 as the Low Degradation Zone (LDZ), the area with MOS scores between 3 and 2 as the Annoying Degradation Zone (ADZ), and the area between 2 and 0 as the Very-annoying Degradation Zone (VDZ). As illustrated in each sub-picture of Figure 7-4 (d-f), to separate the three video quality zones we use the Low Degradation Point (LDP) at MOS score 3 and the Annoying Degradation Point (ADP) at MOS score 2 to define the two quality boundaries.

[Figure 7-4: Video Quality Interpretation & Quality Boundary Definition. Dashed arrow lines indicate that the interpretations of the video quality are based on the subjective video quality measurements.]

Based on the video quality boundaries that are defined in Figure 7-4 (d-f) using MOS measurements, we are able to locate the same boundaries in the PSNR measurements in Figure 7-4 (a-c). As illustrated in Figure 7-4, the LDP and the ADP in the PSNR sub-graphs (Figure 7-4 a,b,c) are located by referring to their corresponding LDP and ADP in the MOS sub-graphs (Figure 7-4 d,e,f respectively) under the same level of packet loss. The dashed arrow lines extending from the LDP and ADP in the MOS sub-pictures to the LDP and ADP in the PSNR sub-pictures indicate this process; a small numerical sketch of the same mapping follows.
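The mapping can be reproduced numerically: fit the MOS reference curve, solve it for the loss rate at which MOS crosses a boundary score, then evaluate the PSNR reference curve at that loss rate. The sketch below is illustrative only; it uses the five Video-H sample points tabulated in Table 7-2 below and a 3rd-order fit (the full analysis fits all experimental runs, with a 6th-order polynomial for MOS).

```python
import numpy as np

# Representative Video-H samples (packet loss %, MOS, PSNR) from Table 7-2.
loss = np.array([0.1, 0.3, 0.5, 1.0, 1.5])
mos  = np.array([4.6, 3.7, 3.2, 1.6, 0.5])
psnr = np.array([36.4, 31.8, 27.2, 15.7, 4.2])

psnr_fit = np.polyfit(loss, psnr, 1)   # linear reference curve for PSNR
mos_fit  = np.polyfit(loss, mos, 3)    # low-order reference curve for MOS

def loss_at_mos(target: float) -> float:
    """Solve the fitted MOS curve for the loss rate giving a target score."""
    coeffs = mos_fit.copy()
    coeffs[-1] -= target                 # roots of mos_curve(x) - target = 0
    roots = np.roots(coeffs)
    real = roots[np.isreal(roots)].real
    valid = real[(real >= loss.min()) & (real <= loss.max())]
    return float(valid.min())

ldp_loss = loss_at_mos(3.0)                  # loss rate at the LDP (MOS = 3)
ldp_psnr = np.polyval(psnr_fit, ldp_loss)    # the same boundary on the PSNR axis
print(f"LDP: {ldp_loss:.2f}% loss -> {ldp_psnr:.1f} dB")
```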
In this way, we establish video quality properties and boundaries with respect to both the objective and the subjective measurement data. Watching a video with quality above the LDP (in the LDZ), the viewer will occasionally observe corrupted slices. With video quality between the LDP and ADP (in the ADZ), the viewer will observe a large number of corrupted slices and occasional flickering due to the loss of one or more frames. With video quality below the ADP (in the VDZ), the viewer will observe strong flickering due to multiple damaged frames.

Figures 7-3 and 7-4 are scatter-plots presenting graphically the underlying empirical relationship between video quality and IP packet loss. In the following paragraphs, by analyzing the experimental data quantitatively with statistical methods, we establish accurately quantified relationships between the various parameters.

Table 7-2 provides the quantitative results for the empirical relationships based on Packet Loss. As listed in the upper portion of Table 7-2, Packet Loss values and their corresponding PSNR values for Video-H, M and L are given. For a given Packet Loss value, the corresponding PSNR value can be found in the table together with its 95% Confidence Interval (95% CI) range. The 95% Confidence Interval is estimated using linear regression. For example, suppose Video-M is transported across an IP network that is experiencing 0.5% Packet Loss. We may predict the PSNR of the corrupted Video-M by first finding "0.5" under Packet Loss in the upper portion of Table 7-2. We can then locate the PSNR value (29.1 dB) at the intersection of the 0.5% Packet Loss row and the Video-M column. The 95% Confidence Interval range (28.3 dB ~ 29.9 dB) of the PSNR value is found in the adjacent cell of the same row.

Table 7-2 Empirical Relationship Study Based on Packet Loss Using Statistical Analysis. The upper table presents the Packet Loss to PSNR relationship with 95% Confidence Interval estimation. The lower table presents the Packet Loss to MOS relationship with DELTA range.

  Packet     PSNR (dB)
  Loss (%)   Video-H  95% CI       Video-M  95% CI       Video-L  95% CI
  0.1        36.4     34.8 ~ 38.1  36.8     35.3 ~ 38.0  36.8     35.5 ~ 38.2
  0.3        31.8     30.5 ~ 33.1  32.9     31.9 ~ 33.9  32.7     31.3 ~ 33.8
  0.5        27.2     26.1 ~ 28.3  29.1     28.3 ~ 29.9  28.5     27.6 ~ 29.4
  1.0        15.7     14.6 ~ 16.7  19.4     18.6 ~ 20.3  18.0     17.0 ~ 19.1
  1.5        4.2      2.4 ~ 5.9    9.8      8.3 ~ 11.3   7.6      5.9 ~ 9.3

  Packet     MOS
  Loss (%)   Video-H  DELTA  Video-M  DELTA  Video-L  DELTA
  0.1        4.6      0.43   3.8      0.46   3.6      0.49
  0.3        3.7      0.43   3.1      0.46   3.1      0.49
  0.5        3.2      0.43   2.8      0.45   2.8      0.49
  1.0        1.6      0.42   1.2      0.46   1.6      0.50
  1.5        0.5      0.45   0.5      0.47   0.6      0.52

Similarly, the quantitative relationship between Packet Loss values and the corresponding MOS values is provided in the lower portion of Table 7-2. The main difference between the Packet Loss / PSNR table and the Packet Loss / MOS table is that the Packet Loss / MOS relationship is non-linear; the MOS value is therefore estimated using non-linear regression with a 6th-order polynomial. Given a Packet Loss value, the estimated MOS value ± the DELTA value should contain at least 50% of the predictions. For example, given a Packet Loss value of 1%, the estimated MOS score for Video-H is 1.6. The accuracy of this estimate is such that MOS 1.6 ± 0.42, i.e. 1.18 ~ 2.02, should contain at least 50% of the observations.
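As an illustration of how such interval estimates can be produced, the sketch below fits the linear Packet Loss / PSNR relationship and returns a prediction with an approximate 95% confidence interval for the mean response. It is a generic textbook construction, not necessarily the thesis' exact procedure; the t-quantile of 2.0 and the Video-H sample data are assumptions.

```python
import numpy as np

def predict_psnr_with_ci(loss, psnr, loss0, t_crit=2.0):
    """Predict PSNR at loss rate loss0 from a linear fit, with an
    approximate 95% CI for the mean response (t_crit ~ 2.0 stands in
    for the Student-t quantile at moderate sample sizes)."""
    x, y = np.asarray(loss, float), np.asarray(psnr, float)
    n = len(x)
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    s = np.sqrt(np.sum(residuals ** 2) / (n - 2))   # residual std. error
    sxx = np.sum((x - x.mean()) ** 2)
    se = s * np.sqrt(1.0 / n + (loss0 - x.mean()) ** 2 / sxx)
    y0 = slope * loss0 + intercept
    return y0, (y0 - t_crit * se, y0 + t_crit * se)

# Video-H samples from Table 7-2; predict PSNR at 0.5% packet loss.
loss = [0.1, 0.3, 0.5, 1.0, 1.5]
psnr = [36.4, 31.8, 27.2, 15.7, 4.2]
estimate, (ci_low, ci_high) = predict_psnr_with_ci(loss, psnr, 0.5)
```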
Table 7-3 Empirical Relationship Study Based on MOS Using Statistical Analysis. The upper table presents the MOS to PSNR relationship with DELTA range. The lower table presents the MOS to Packet Loss relationship with DELTA range.

  MOS   PSNR (dB)
        Video-H  DELTA  Video-M  DELTA  Video-L  DELTA
  5     39.08    3.64   40.07    4.62   39.79    5.10
  4     35.47    3.47   37.76    3.53   37.19    3.95
  3     23.06    3.37   31.93    3.36   31.17    3.69
  2     16.67    3.47   22.64    3.38   21.63    3.73
  1     12.20    3.43   18.32    3.42   11.76    3.77

  MOS   Packet Loss (%)
        Video-H  DELTA  Video-M  DELTA  Video-L  DELTA
  5     0.06     0.14   0.04     0.24   0.00     0.29
  4     0.24     0.13   0.10     0.18   0.05     0.23
  3     0.66     0.13   0.36     0.17   0.28     0.21
  2     0.89     0.13   0.77     0.17   0.76     0.21
  1     1.09     0.13   1.08     0.17   1.23     0.21

In practice, a consumer or service provider might want to monitor the network QoS passively, predicting the network packet loss from the video quality observed at the receiving host. To investigate the experimental data from this viewpoint, Table 7-3 provides the quantitative results based on the subjective video quality MOS score. Similar to the lower portion of Table 7-2, the upper portion of Table 7-3 presents the MOS scores and their corresponding PSNR values for Video-H, M and L. Given a MOS score, the corresponding PSNR together with its DELTA range for a specific type of video (Video-H, M, or L) can be found. The quantitative relationship between MOS scores and their corresponding Packet Loss values is provided in the lower portion of Table 7-3. The predicted Packet Loss value ± the DELTA value should contain 50% of the predictions. Usage example: if the subjective quality MOS score is 3, the estimated PSNR values for Video-H, Video-M and Video-L are 23.06 dB, 31.93 dB and 31.17 dB, respectively. Under the same video quality, the estimated Packet Loss values for Video-H, Video-M and Video-L are 0.66%, 0.36% and 0.28%, respectively.

As shown in Figure 7-4, the video quality boundaries are specified based on the MOS subjective quality of the video. Since measurements are normally taken using PSNR for video quality and Packet Loss for network quality, it is necessary to associate the video quality boundaries with the PSNR and Packet Loss measurements. Table 7-4 is a mapping of the video quality boundaries to PSNR and Packet Loss. As a usage example, the VDZ zone for Video-M means that the video was transported through an IP network that experienced greater than 0.8% packet loss.

Table 7-4 PSNR and Packet Loss Ranges for the Quality Boundaries

  Quality         PSNR (dB)                         Packet Loss (%)
  Boundary        Video-H    Video-M    Video-L     Video-H    Video-M    Video-L
  LDZ (MOS 5~3)   39.1-23.1  40.1-31.9  39.9-31.2   0.0-0.66   0.0-0.36   0.0-0.28
  ADZ (MOS 3~2)   23.1-16.7  31.9-22.6  31.2-21.6   0.66-0.9   0.36-0.8   0.28-0.8
  VDZ (MOS 2~0)   < 16.7     < 22.6     < 21.6      > 0.9      > 0.8      > 0.8

7.3.2 Dominant contributor to video quality degradation

An MPEG-2 video bit stream is a prioritized stream of coded data units. As summarized in Table 6-1, some headers, such as the P-frame Picture header, are more important than others, such as the I-frame Slice header or the B-frame Picture header. The prioritized nature even extends to each bit in a data block: a bit in an earlier position of the stream is more important than one later on. This is one of the features that distinguish digital video from analog video signals. On the other hand, packet loss protection techniques normally add redundancy to the original bit stream, increasing the demand on bandwidth.
Without knowing how highly prioritized MPEG-2 video bit streams react to packet loss, blindly engineered error protection techniques may be an inefficient means of dealing with video quality degradation; they may increase the bandwidth enough to render the compression useless. Therefore, prior to determining how to protect the video stream from IP packet loss, a legitimate question is: what should be protected in the MPEG-2 bit stream?

[Figure 7-5: Effects of Slice Loss & Picture Loss on Video Quality. The PSNR and MOS variation in the slice loss related results (a) and (b) is much smaller than that in the picture loss related results (c) and (d).]

In Figure 7-5, for Video-H, sub-picture (a) shows the empirical relationship between PSNR and slice loss. Sub-picture (c) shows the empirical relationship between PSNR and picture loss. Sub-picture (b) shows the empirical relationship between MOS and slice loss. Sub-picture (d) shows the empirical relationship between MOS and picture loss. By comparing the slice loss related results (Figure 7-5 a,b) against the picture loss related results (Figure 7-5 c,d), it is clear that the variance of the video quality (PSNR and MOS) due to picture loss is much greater than that due to slice loss. For example, the MOS score differences in Figure 7-5 (d) for a given picture loss value can be as large as 3. In contrast, the MOS differences in Figure 7-5 (b) for a given slice loss value are normally less than 1. This phenomenon leads to a rather interesting result: the video quality degradation is closely related to the slice loss ratio but not to the picture loss ratio.

This result explains the following observation: in an MPEG-2 video, a slice loss normally presents as a corrupted horizontal bar, a block artifact or a localized jerky motion, while a picture loss is seen as video hesitation, repetition or flickering. In our experiments, when network congestion increases, the video quality degrades along with the increase in occurrences of corrupted bars or block artifacts. By the time viewers perceive flickering or frame repeating, which are the typical symptoms of picture loss, the video quality is already so badly damaged by corrupted slices that it is not worth watching.

Our study concludes that when MPEG-2 video is transported in IP packets, within the range of fair video quality, the loss of slices rather than the loss of pictures is the dominant factor contributing to the degradation of the video quality. In addition, our study concludes that in the packet loss range below 1.5%, single packet losses account for the majority of the losses. Therefore, to provide effective and efficient loss protection, one should first address the problem of slice losses caused by single packet losses. It is not effective to protect picture-level data and parameters without first reducing slice losses.

7.3.3 Packet loss tolerance

As defined in Section 7.3.1, the Low Degradation Zone (MOS score 5 to 3) is the most interesting quality area because it is in the range of video quality generally acceptable to viewers. Our study of the LDZs in Figure 7-4 (d-f) indicates that the slope in Figure 7-4 (d) is less steep than the one in Figure 7-4 (f). To obtain a better visual demonstration, a joint scatter-plot of Video-H and Video-L is created: Figure 7-6 is generated by overlaying Figure 7-4 (d) and (f) without changing the units on either axis. A minimal sketch of how such an overlay can be produced is shown below.
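The following sketch reproduces the spirit of Figure 7-6 with matplotlib. The five (loss, MOS) pairs per stream are the representative values from Table 7-2 rather than the full scatter data, and the 3rd-order reference fit is an assumption; the figure in the dissertation is drawn from all experimental runs.

```python
import numpy as np
import matplotlib.pyplot as plt

# Representative (packet loss %, MOS) pairs from Table 7-2.
loss  = np.array([0.1, 0.3, 0.5, 1.0, 1.5])
mos_h = np.array([4.6, 3.7, 3.2, 1.6, 0.5])   # Video-H (high activity)
mos_l = np.array([3.6, 3.1, 2.8, 1.6, 0.6])   # Video-L (low activity)

grid = np.linspace(loss.min(), loss.max(), 200)
for label, mos, marker in (("Video-H", mos_h, "o"), ("Video-L", mos_l, "x")):
    plt.scatter(loss, mos, marker=marker, label=label)
    plt.plot(grid, np.polyval(np.polyfit(loss, mos, 3), grid))  # reference curve

plt.axhline(3, linestyle="--", linewidth=0.8)  # LDP boundary (MOS = 3)
plt.axhline(2, linestyle="--", linewidth=0.8)  # ADP boundary (MOS = 2)
plt.xlabel("packet loss %")
plt.ylabel("MOS")
plt.legend()
plt.show()
```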
As illustrated in Figure 7-6, the least-squares estimation curve for Video-H remains higher than the one for Video-L until the MOS score drops below 1.5. This phenomenon is most visible in the LDZ (MOS score 3 to 5). A lower curve in the MOS / Packet Loss plane indicates a higher rate of video quality degradation. Based on this observation, we conclude that spatial and temporal masking effects tend to reduce the visibility of packet-loss-induced artifacts in more active areas.

We can interpret the observation both literally and quantitatively. Literal description: in the acceptable video quality range (MOS between 5 and 3), the lower motion / lower spatial activity MPEG-2 video stream is more susceptible to IP packet loss impairment. Quantitative description: Video-H reaches the LDP at a packet loss rate of 0.66%, while Video-M does so at 0.36% and Video-L at 0.28%, so Video-H can tolerate more packet loss than Video-L. We can also derive the PSNR value at the LDP: 23.06 dB for Video-H, 31.93 dB for Video-M and 31.17 dB for Video-L.

[Figure 7-6: Joint Scatter-plot of MOS vs. Packet Loss for Video-H & Video-L. The MOS score for Video-H is better than that of Video-L in the range from MOS 5 to MOS 1.5.]

It is therefore suitable to examine only low motion / low spatial activity MPEG-2 streams for worst-case video quality evaluation of streams transported over a lossy IP network. A network provider can predict the packet loss effects on an MPEG-2 encoded video by investigating only the low motion / low spatial activity portions of the video.

7.3.4 Correlation of PSNR and Packet Loss

Studies [71] have indicated that PSNR as an objective measurement of video quality does not correlate well with the true subjective video quality. This conclusion is quantitatively verified by our experiments. As shown in Table 7-3, a MOS value of 3 corresponds to 23.06 dB PSNR for Video-H but 31.17 dB PSNR for Video-L, roughly a 30% difference in value. On the other hand, the PSNR of the three video streams shows a strong linear correlation with packet loss, as illustrated in Figure 7-4 (a-c). Based on the data collected and displayed in Figures 7-4 (a), (b) and (c), Table 7-5 lists the correlation coefficient between PSNR and Packet Loss for Video-H, M and L, as well as the intercept and rate coefficients calculated using linear regression (the two coefficient columns are labeled here according to the fitted line PSNR = intercept + rate × loss).

Table 7-5 Experimental Results for Correlation Coefficient, Intercept & Rate

             Correlation Coef.   Intercept (dB)   Rate (dB per 1% loss)
  Video-H    -0.956              38.9             -23.0
  Video-M    -0.950              38.7             -20.2
  Video-L    -0.946              38.9             -20.9

The similarity of the Intercept and Rate coefficients means that the empirical relationships between PSNR and Packet Loss of the three streams are almost identical. The correlation coefficients for Video-H, M and L are -0.956, -0.950 and -0.946 respectively; their magnitudes are close to 1.00, which demonstrates the strong linear correlation between PSNR and Packet Loss for the three test streams. Quantitative proof can be found in Table 7-2: given a packet loss ratio of 0.3%, the PSNR values are 31.8, 32.9 and 32.7 dB for Video-H, Video-M and Video-L respectively, only a 2.7% difference in value. Although it does not correlate well with subjective video quality, the PSNR metric, as a reliable numerical comparison method, is suitable for estimating IP packet loss. This result holds for PSNR calculated from MPEG-2 video streams with different levels of motion and spatial activity.
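Given the near-identical linear fits in Table 7-5, a receiving host could invert the fitted line to infer the packet loss level from a measured PSNR. The sketch below does exactly that; the linearity is only established within the measured range (roughly 0.1% to 1.5% loss), so the clamping at zero and the per-stream coefficients taken from Table 7-5 are this sketch's assumptions.

```python
# Linear fits from Table 7-5: PSNR ~ intercept - rate * loss (loss in %).
FITS = {
    "Video-H": (38.9, 23.0),
    "Video-M": (38.7, 20.2),
    "Video-L": (38.9, 20.9),
}

def estimate_loss_from_psnr(psnr_db: float, stream: str) -> float:
    """Invert the fitted line to estimate IP packet loss (%) from PSNR."""
    intercept, rate = FITS[stream]
    return max((intercept - psnr_db) / rate, 0.0)

# e.g. a measured 31.8 dB on Video-H maps back to roughly 0.3% packet loss,
# matching the corresponding row of Table 7-2.
loss_pct = estimate_loss_from_psnr(31.8, "Video-H")
```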
In practice, the PSNR derived from a transported MPEG-2 stream can thus be used to passively estimate the level of IP packet loss.

7.3.5 Characteristics of MPEG-2 Data Loss vs. Packet Loss

The MPEG-2 ES data is packetized according to the encapsulation scheme specified in IETF RFC 2250. The slice is the error re-synchronization point in MPEG-2. Because an IP packet can be as large as 64 KB, RFC 2250 selects the slice as the minimum data unit. RFC 2250 also specifies fragmentation rules ensuring that the beginning of the slice following a missing packet can be found without requiring the receiver to scan the packet contents.

[Figure 7-7: Empirical Relationship Between Slice Loss & Picture Loss and Packet Loss. Two scatter-plot panels: (a) slice loss vs. packet loss % and (b) picture loss vs. packet loss %.]

Figure 7-7 (a) is a scatter-plot showing the empirical relationship between slice loss and packet loss for Video-H. Because of the use of the RFC 2250 RTP payload format in packetizing, slice loss agrees linearly with packet loss. Loss of an IP packet means loss of at least one slice. It also means that the data loss granularity is much coarser than a single bit or even a short burst of bits. A suitable FEC scheme is needed to recover data at the IP packet level of loss.

Figure 7-7 (b) is a scatter-plot showing the empirical relationship between picture loss and packet loss. The variations of the picture loss counts are quite large. Compared with Figure 7-4 (d), the LDZ, ADZ, and VDZ correspond to picture loss levels of 0-0.2%, 0.2-2% and 2-4% respectively. According to our experiments, picture loss rates higher than 2% produce very annoying flickering. In all test streams, I- and P-frames account for 33.2% of the total frames. On average, a 2% frame loss will result in about two I- or P-frame losses in our 300-frame test sequence.

8 Conclusions

With the Differentiated Services model (DiffServ), the Internet is evolving to provide CoS / QoS to end-users. This evolution will create tremendous opportunities for real-time MPEG-2 digital video applications such as VoD using unicast, Pay Per View using multicast, and DTV using broadcast. There are uncharted waters and challenges in the area of transporting real-time MPEG-2 video over large IP networks. To characterize the network impairments on video quality, and to design effective and efficient error resolution techniques, it is necessary to study the relationship between MPEG-2 video quality and the IP packet loss rate with high quantitative and statistical accuracy. The MSDSI (Multimedia Support in a Differentiated Services Internet) project responded to this challenge with research efforts and extensive experiments encompassing three main areas: the routed IP testbed network construction, the IP network performance metric measurement, and the MPEG-2 digital video implementation and quality assessment. The technologies, methodologies and experimental procedures applied in the project have been introduced in detail throughout this dissertation. The conclusions resulting from the MSDSI project research effort are as follows.

• Based on the data collected from the experiments, the empirical relationships between MPEG-2 video quality and IP packet loss were established. With such quantitative information, network service providers may be able to predict perceived video quality starting from IP packet loss rates.
• Slice loss rather than picture loss was the dominant factor contributing to video quality degradation, within the studied packet loss range, when MPEG-2 video is transported in IP packets. By the time a viewer perceives flickering or frame repeating, the video quality is already badly damaged by corrupted slices. Together with our observation, and various research results [3,24], that the majority of IP packet losses are single packets, this conclusion provides direction for designing effective and efficient techniques and protocols to combat MPEG video errors under IP network packet loss impairment.

• We identified block artifacts, hesitating movement and flickering as the most visible quality degradations in a lossy IP network environment.

• Through our experiments, we observed that low motion / low spatial activity MPEG-2 video clips were more susceptible to IP packet loss. Therefore, they are suitable for worst-case quality evaluation when studying the effects of packet loss.

• We showed quantitatively that PSNR, as a numerical comparison method, correlates linearly with packet loss in the high, medium, and low activity test streams. The PSNR value derived from a transported MPEG-2 stream can be used to estimate the level of IP packet loss passively. PSNR as an objective measurement of video quality was also examined quantitatively by the project: it correlates poorly with the subjective assessment.

• Compared with ATM cells, IP packets are normally considerably larger. Small IP packet loss rates therefore translate into much higher frame error rates, and the traditional FEC schemes designed for single-bit or short burst errors may no longer be effective or efficient. New schemes are needed to provide data protection and recovery.

• The results provide a quantitative understanding of the QoS requirements to be supported by the network in order to transport MPEG-2 real-time video. Using the experimental data, a network provider may predict perceived video quality starting from IP performance metrics.

There are also several limitations to this study. The first limitation is the lack of a tool to accurately measure the amount of motion and spatial detail within a test stream. This limitation can be mitigated using synthetic test streams in future studies. The second limitation is that the network traffic generated using NetPerf might not accurately emulate the traffic of the real Internet, although Internet traffic characterization is still an active area of academic study. Due to the limited time and available resources, we could use only three video clips in our experiments. In the future, we can conduct more experiments using video clips that represent a wide range of motion and spatial activity to achieve more accurate experimental results and a larger data set.

Through over one year of intensive study and research activity, the following objectives (also outlined in Chapter 1) were also met.

• To investigate and quantify the effects of IP packet loss on MPEG-2 video reconstruction in moderately to heavily loaded IP networks.

• To study MPEG-2 transport errors, error-affected video quality, error propagation and error sources, and to relate particular MPEG-2 errors to IP network layer impairments.

• To develop test metrics and test procedures for diagnosing problems with MPEG-2 video delivery over IP networks.
• To assist in developing adaptive applications, as an adjunct to more robust network protocols, that gracefully handle data loss and degradation of throughput while attempting to satisfy user requirements.

Our study is a first step into the uncharted waters of MPEG-2 video transport over IP networks. The next steps may take multiple directions. For instance, we can transport MPEG-2 video streams over the real Internet to verify the results obtained from the testbed network. We can investigate effective and efficient error handling techniques, dealing first with single IP packet losses. We can also exploit MPEG-2 scalability and take advantage of the DiffServ CoS / QoS services. According to our experiments, networked multimedia with high-quality MPEG video is not yet feasible for every household, but it will become commonplace in the near future.

Bibliography

System

1. M. McCutcheon, M.R. Ito, G.W. Neufeld. "Video and Audio Streams Over an IP/ATM Wide Area Network." Technical Report, CS Department, UBC, June 1997.

2. G. Karlsson. "Asynchronous Transfer of Video." IEEE Communications Magazine, pp. 118-126, August 1996.

3. L. Orozco-Barbosa, E. Mellaney, G. Gagnon, L. Wang. "Experimental study of video distribution over an ATM metropolitan network." IEEE Trans. Circuits and Systems for Video Technology.

4. E. Mellaney, L. Orozco-Barbosa, G. Gagnon. "Study of MPEG-2 video traffic in a multimedia LAN/ATM internetwork system." IEEE Trans. Circuits and Systems for Video Technology, vol. 7, no. 4, pp. 663-674, August 1997.

5. K. Cho. "A Framework for Alternate Queuing: Towards Traffic Management by PC-UNIX Based Routers." USENIX Annual Technical Conference, June 1998.

6. G. Neufeld, D. Makaroff, N. Hutchinson. "Design of a variable bit rate continuous media file server for an ATM network." IST/SPIE Multimedia Computing and Networking, January 1996.

7. S. Varma. "MPEG-2 over ATM: System Design Issues." IEEE Proceedings of COMPCON, pp. 26-31, 1996.

8. N. Chaddha, A. Gupta. "A framework for live multicast of video streams over the Internet." ICIP, pp. 1-4, IEEE, 1996.

9. V. Parthasarathy, J. Modestino, K. Vastola. "Design of a transport coding scheme for high-quality video over ATM networks." IEEE Trans. Circuits and Systems for Video Technology, vol. 7, no. 2, pp. 358-376, April 1997.

10. M. Celidonio, G. Santella. "On the correlation between transmission quality-of-service (QoS) parameters and image quality of digitally transmitted video in radio terrestrial broadcasting." ICIP, vol. 3, pp. 327-330, IEEE, 1996.

Networks, Internet, Traffic Models

11. Y. Bernet, et al. "A framework for differentiated services." <draft-ietf-diffserv-framework-00.txt>, May 1998.

12. S. Blake, et al. "An architecture for differentiated services." <draft-ietf-diffserv-arch-01.txt>, August 1998.

13. M. Borden, C. White. "Management of PHBs." <draft-ietf-diffserv-phb-mgmt-00.txt>, August 1998.

14. B. Braden, et al. "Integrated services in the Internet Architecture: an overview." RFC 1633, June 1994.

15. J. Wroclawski. "Integrated and differentiated services in the Internet - tutorial M1." ACM Sigcomm 98, Vancouver, August 1998.

16. C. Partridge. "Designing and building gigabit and terabit Internet routers - tutorial T2." ACM Sigcomm 98, Vancouver, August 1998.

17. S. Shenker, et al. "Specification of guaranteed Quality of Service." RFC 2212, September 1997.

18. S. Shenker, et al. "General characterization parameters for Integrated Services network elements."
RFC 2215, September 1997.

19. B. Braden, et al. "Resource ReSerVation Protocol (RSVP) - version 1 functional specification." RFC 2205, September 1997.

20. B. Braden, et al. "Resource ReSerVation Protocol (RSVP) - version 1 message processing rules." RFC 2209, September 1997.

21. A. Mankin, et al. "Resource ReSerVation Protocol (RSVP) - version 1 applicability statement on deployment." RFC 2208, September 1997.

22. W. Willinger, et al. "On the self-similar nature of Ethernet traffic (extended version)." IEEE/ACM Trans. on Networking, vol. 2, no. 1, pp. 202-213, February 1994.

23. P. Abry, D. Veitch. "Wavelet analysis of long range dependent traffic." IEEE Transactions on Information Theory, pp. 2-15, January 1998.

24. K. Thompson, et al. "Wide-Area Internet Traffic Patterns and Characteristics." IEEE Network, pp. 10-23, December 1997.

25. L. Peterson, B.S. Davie. "Computer Networks: A Systems Approach." Morgan Kaufmann Publishers.

26. D. Bertsekas, R. Gallager. "Data Networks." Prentice Hall.

27. J. Walrand, P. Varaiya. "High-Performance Communication Networks." Morgan Kaufmann Publishers.

28. Kumar. "Broadband Communications." McGraw-Hill.

29. G. Kesidis. "ATM Network Performance." Kluwer Academic Publishers, 1996.

IP Network Performance, IPPM Metrics

30. V. Paxson, et al. "Framework for IP performance metrics." RFC 2330, May 1998.

31. G. Almes, et al. "A one-way packet loss metric for IPPM." <draft-ietf-ippm-loss-05.txt>, November 1998.

32. G. Almes, et al. "A one-way delay metric for IPPM." <draft-ietf-ippm-delay-05.txt>, November 1998.

33. C. Demichelis. "Instantaneous packet delay variation metric for IPPM." <draft-ietf-ippm-ipdv-02.txt>, November 1998.

34. M. Mathis, et al. "Empirical bulk transfer capacity." <draft-ietf-ippm-btc-framework-00.txt>, November 1998.

35. G. Almes, et al. "A round-trip delay metric for IPPM." <draft-ietf-ippm-rt-delay-00.txt>, November 1998.

36. V. Paxson. "An introduction to Internet measurement and modeling - tutorial M2." ACM Sigcomm 98, Vancouver, August 1998.

37. V. Paxson. "Measurement and analysis of end-to-end Internet dynamics." Ph.D. dissertation, April 1997.

38. R. Jones, et al. "Netperf." http://www.cup.hp.com/netperf/netperfPage.html, 1995.

39. V. Jacobson, et al. "Tcpdump." Lawrence Berkeley Laboratory, University of California, Berkeley.

40. M. Muuss. "Netspec." http://www.tisl.ukans.edu/Projects/AAI/products/netspec/, 1997.

41. UCB, LBNL, VINT. "NS - Network Simulator." http://www-mash.cs.berkeley.edu/ns

42. A. Helmy, D. Estrin. "'Stress' testing applied to a multicast routing protocol."

43. R. Kreula, H. Haapasalo. "Transfer delay at ATM LAN emulation and classical IP over ATM." ICC, pp. 478-482, IEEE, 1997.

44. R. Melle, C. Williamson, T. Harrison. "Diagnosing a TCP/ATM performance problem: a case study." Globecom, vol. 3, pp. 1825-1831, IEEE, 1997.

MPEG-2 Technology

45. International Standard ISO/IEC 13818-2 (MPEG-2 Video). "Information technology - Generic coding of moving pictures and associated audio information: Video, syntax and semantics," November 1994.

46. MPEG Software Simulation Group. "Mpegencode - MPEG-2 encoder, version 1.2," July 19, 1996. http://www.mpeg.org/MSSG

47. MPEG Software Simulation Group. "Mpegdecode - MPEG-2 decoder, version 1.2," July 19, 1996. http://www.mpeg.org/MSSG

48. W. Willinger, et al. "Long-range dependence in variable-bit-rate video traffic." IEEE Transactions on Communications, vol. 43, no. 2/3/4, pp. 1566-1579, February/March/April 1995.

49. M. Krunz, A. Makowski.
"A source model for VBR video traffic based on M/G/∞ input processes." Infocom, vol. 3, pp. 1441-1448, IEEE, 1998.

50. J. Mitchell, W. Pennebaker, C. Fogg, D. LeGall. "MPEG Video Compression Standard." Chapman & Hall.

51. K.R. Rao, J.J. Hwang. "Techniques & Standards for Image, Video & Audio Coding." Prentice Hall.

52. L. Torres, M. Kunt. "Video Coding - The Second Generation Approach." Kluwer Academic Publishers.

53. J. Leduc. "Digital Moving Pictures - Coding and Transmission on ATM Networks." Advances in Image Communication 3, Elsevier.

54. A. Jain. "Fundamentals of Digital Image Processing." Prentice Hall Information and System Sciences Series.

55. McKinney, R. Hopkins. "Guide to the Use of the ATSC Digital Television Standard."

56. T. Sikora. "The MPEG-4 video standard verification model." IEEE Trans. Circuits and Systems for Video Technology, vol. 7, no. 1, pp. 19-31, February 1997.

57. L. Chiariglione. "MPEG and multimedia communications." IEEE Trans. Circuits and Systems for Video Technology, vol. 7, no. 1, pp. 5-8, February 1997.

58. J. Boyce, R. Gaglianello. "Packet loss effects on MPEG video sent over the public Internet." ACM Multimedia 98, electronic proceedings.

59. N. Maxemchuck, S. Lo. "Measurement and interpretation of video traffic on the Internet." ICC, pp. 500-507, IEEE, 1997.

60. H. Todd, J. Meditch. "Encapsulation protocols for MPEG video in ATM networks." Infocom, vol. 3, pp. 1072-1079, IEEE, 1996.

61. G. Andreotti, G. Michieletto, L. Mori, A. Profumo. "Clock recovery and reconstruction of PAL pictures for MPEG coded streams transported over ATM networks." IEEE Trans. Circuits and Systems for Video Technology, vol. 5, no. 6, pp. 508-514, December 1995.

62. D. Kucera. "MPEG-2 design and testing." EE Times, issue 978, 1997.

63. F. Lin, R. Mersereau. "Quality measure based approaches to MPEG encoding." ICIP, pp. 323-326, IEEE, 1996.

64. M. Pickering, J. Arnold. "A perceptually efficient VBR rate control algorithm." IEEE Trans. Image Processing, vol. 3, no. 5, pp. 507-511, September 1994.

Digital Video Impairment and Quality Assessment

65. CCIR Recommendation 500-5 (ITU-R Rec. 500). "Method for the Subjective Assessment of the Quality of Television Pictures." Recommendations and Reports of the CCIR, 1992.

66. ANSI T1.801.01-1995. American National Standard for Telecommunications - Digital Transport of Video Teleconferencing / Video Telephony Signals - Video Test Scenes for Subjective and Objective Performance Assessment. Alliance for Telecommunications Industry Solutions, 1200 G Street, NW, Suite 500, Washington DC.

67. ANSI T1.801.02-1996. American National Standard for Telecommunications - Digital Transport of Video Teleconferencing / Video Telephony Signals - Performance Terms, Definitions, and Examples. Alliance for Telecommunications Industry Solutions, 1200 G Street, NW, Suite 500, Washington DC.

68. ANSI T1.801.03-1996. American National Standard for Telecommunications - Digital Transport of One-Way Video Telephony Signals - Parameters for Objective Performance Assessment. Alliance for Telecommunications Industry Solutions, 1200 G Street, NW, Suite 500, Washington DC.

69. S. Wolf, et al. "User-oriented measures of telecommunication quality." IEEE Communications Magazine, January 1994.

70. N. Jayant. "High quality networking of audio-visual information." IEEE Communications Magazine, vol. 31, no. 9, pp. 84-95, September 1993.

71. M. Celidonio, G. Santella. "On the correlation between transmission QoS parameters and image quality of digitally transmitted video in radio terrestrial broadcasting." ICIP, vol. 3, pp. 327-330, IEEE, 1996.

72. K.
Nahrstedt, et al. "An integrated metric for video QoS." ACM Proceedings, IMC, pp. 371-380, 1996.

73. J. Lubin, et al. "Vision model-based assessment of distortion magnitudes in digital video." http://www.mpeg.org/MPEG/JND

Transport Issues - protocol stack

74. H. Schulzrinne, et al. "RTP: A transport protocol for real-time applications." RFC 1889, January 1996.

75. H. Schulzrinne. "RTP profile for audio and video conferences with minimal control." RFC 1890, January 1996.

76. D. Hoffman, et al. "RTP payload format for MPEG1/MPEG2 video." RFC 2250, January 1998.

77. J. Postel. "User Datagram Protocol (UDP)." RFC 768, 28 August 1980.

78. R. Mechler, et al. "Media Transport API (MT)." Technical document, CS Department, UBC, 1997.

Transport issues - error detection, correction and concealment

79. S.M. Lei. "Forward Error Correction Codes for MPEG-2 over ATM." IEEE Trans. on Circuits and Systems for Video Technology, vol. 4, April 1994.

80. D. Raychaudhuri, H. Sun, R.S. Girons. "ATM transport and cell-loss concealment techniques for MPEG video." ICASSP, vol. 1, pp. 117-120, IEEE, 1993.

81. I.E.G. Richardson, M.J. Riley. "MPEG coding for error-resilient transmission." Image Processing and its Applications, July 1995, Conference Publication No. 401, pp. 559-563, IEE, 1995.

82. E. Ayanoglu, P. Pancha, Reibman, Salwar. "Forward error control for MPEG-2 video transport in a wireless ATM LAN." ICIP, vol. 2, pp. 833-836, IEEE, 1996.

83. A. Tsai, J. Wilder. "MPEG video error concealment for ATM networks." ICASSP Proceedings, vol. 4, pp. 2002-2004, IEEE, 1996.

84. T. Han, L. Orozco-Barbosa. "Performance requirements for the transport of MPEG video streams over ATM networks." ICC, vol. 3, pp. 221-225, IEEE, 1996.

85. J. Feng, K.T. Lo, H. Mehrpour, A.E. Karbowiak. "Loss recovery techniques for transmission of MPEG video over ATM networks." ICC, vol. 3, pp. 1406-1410, IEEE, 1996.

86. S. Aign. "Error concealment enhancement by using the reliability output of a SOVA in MPEG-2 video decoder." ISSSE Proceedings, pp. 59-62, IEEE, 1995.

87. T. Johnson, A. Zhang. "A framework for supporting quality-based presentation of continuous multimedia streams." ICMCS, IEEE.

88. S. Hemami, R. Gray. "Subband-coded image reconstruction for lossy packet networks." IEEE Trans. Image Processing, vol. 6, no. 4, pp. 523-539, April 1997.

89. J. Luo, W. Chen, K. Parker, T. Huang. "Artifact reduction in low bit rate DCT-based image compression." IEEE Trans. Image Processing, vol. 5, no. 9, pp. 1363-1368, September 1996.

90. J. Bolot, T. Turletti. "Adaptive error control for packet video in the Internet." ICIP, vol. 1, pp. 25-28, IEEE, 1996.

91. I. Richardson, M. Riley. "IEE colloquium on multi-media communication systems: ATM cell loss effects on traffic." Globecom, pp. 1533-1538, IEEE, 1997.

92. N. Matoba, Y. Kondo, T. Tanaka. "Low delay, error resilient error control for mobile video communication." Globecom, vol. 1, pp. 1032-1036, IEEE, 1997.

93. Y. Wang, Q. Zhu. "Error control and concealment for video communication: A review." Proceedings of the IEEE, vol. 86, no. 5, pp. 974-997, May 1998.

94. S. Aign. "Error concealment, early re-synchronization and iterative decoding for MPEG-2." ICC, vol. 3, pp. 1654-1658, IEEE, 1997.

95. R. Aravind, M. Civanlar, A. Reibman. "Packet loss resilience of MPEG-2 scalable video coding algorithms." IEEE Trans. Circuits and Systems for Video Technology, vol. 6, no. 5, pp. 426-435, October 1996.

96. P. Cuenca, L. Orozco-Barbosa, F. Quiles, T. Olivares. "A survey of error concealment schemes for MPEG-2 video communications over ATM networks." CCECE'97 Proceedings, pp. 118-121, IEEE, 1997.

97. M.J. Riley, I.E.G. Richardson.
"Adaptive M P E G video traffic to avoid network congestion." 'Telecommunications' 26-29 march 1995, conference publication No 404 IEE, 1995 98. T Lakshman, P Mishra, K ramakrishnan. "Transporting compressed video over A T M networks with explicit rate feedback control." Infocomm, p38-47, v l , IEEE 1997 152 
