UBC Theses and Dissertations

Integrating DMIF and Internet standard protocols for QoS-aware delivery of MPEG-4 (Pourmohammadi Fallah, Yaser, 2002)

INTEGRATING DMIF AND INTERNET STANDARD PROTOCOLS FOR QoS-AWARE DELIVERY OF MPEG-4

by

Yaser Pourmohammadi Fallah
B.Sc., Sharif University of Technology, 1998

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF APPLIED SCIENCE in THE FACULTY OF GRADUATE STUDIES, Department of Electrical and Computer Engineering

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
November 2001
© Yaser Pourmohammadi Fallah, 2001

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Electrical and Computer Engineering
The University of British Columbia
Vancouver, Canada
Date: Dec. 20, 2001

Abstract

Delivering high quality multimedia presentations over the Internet spurred the development of numerous encoding and delivery standards. MPEG-4 is the first true object-based multimedia standard that seeks to address the issues concerning the delivery of multimedia content over the Internet, in particular the QoS-Internet. MPEG-4 defines a generic framework, the Delivery Multimedia Integration Framework (DMIF), for this purpose. The DMIF standard, however, only describes the semantics of the delivery platform. Incorporating the tools and protocols provided by the QoS-Internet into this framework remains an issue to be solved by system developers. This thesis presents a QoS-aware system architecture for MPEG-4 streaming over the QoS-Internet.
This novel architecture benefits from an object-based design that fits the requirements of the MPEG-4 standard very well, and it addresses the shortcomings of the DMIF framework in providing a complete solution for the delivery of MPEG-4. Defining a practical syntax for the DMIF semantics is one aspect of the presented design; integrating the services available through existing transport protocols such as TCP, UDP and RTP forms another. MPEG-4 uses a very generic approach to specifying QoS constraints and does not specify how QoS signalling methods are exploited in DMIF. Tackling the QoS issues of MPEG-4 delivery therefore forms an important part of this thesis, which presents a method for integrating RSVP into DMIF and proposes a QoS-aware streaming system design. To verify the validity of the proposed architecture, and as a proof of concept for the DMIF standard, a version of the presented design has been implemented for the best-effort Internet. The implementation provides a complete realization of the proposed streaming system for the best-effort Internet, as well as a partial but functional realization for the QoS-Internet. In addition to standard conformity, the implementation features a novel fast-start rate controller and multi-client support. This implementation of the DMIF instance for remote retrieval and of an MPEG-4 Streaming Server (also considered the first open-source implementation of this part of the DMIF standard) has become part of the IM1 software, the reference implementation of MPEG-4. This research work has also resulted in the publication of three conference papers.

Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgements
Dedication

Chapter 1. Introduction
  1.1 Problem and Motivation
  1.2 Summary of Contributions
  1.3 Organization of The Thesis

Chapter 2. MPEG-4 and Streaming Background
  2.1 MPEG-4 Standard
    2.1.1 Coding, Composition and Streaming of Media Objects
    2.1.2 Delivery of Streaming Data
    2.1.3 Interaction With Media Objects
  2.2 DMIF Overview
    2.2.1 Why DMIF?
    2.2.2 The DMIF Communications Architecture
    2.2.3 The DMIF Computational Model
  2.3 Streaming: Control Plane, Data Plane

Chapter 3. Proposed MPEG-4 Streaming System Architecture
  3.1 Previous Work: MPEG-4 Reference Software
  3.2 Proposed System Architecture: Object Based, DMIF Based
  3.3 Server Architecture vs. Client Architecture
    3.3.1 Client Architecture
      Protocols for Message Transport
      Control Plane: DAI and DNI implementation
      Data Plane
    3.3.2 Server
      Server Architecture: Server Application Layer
      Server Architecture: The Server DMIF (Delivery) Layer
      Multi-Threaded Operation
      Rate Control

Chapter 4. Control Plane vs. Data Plane
  4.1 Control Plane Issues
    4.1.1 Protocol Selection: DMIF and/or RTSP
    4.1.2 Design & Implementation Issues
      DMIF concepts realization
      Control Plane message transport
      Multi-threaded approach
  4.2 Data Plane Issues
    4.2.1 Data Plane Structure
    4.2.2 Data Transport
    4.2.3 RTP for The Transport of MPEG-4 over IP
      Alternatives for carrying MPEG-4 over IP
      RTP-related design considerations
      Implementation
      RTCP feedbacks
      Some experiments

Chapter 5. Streaming Rate Control
  5.1 Jitter Compensation Buffer
    5.1.1 Jitter
    5.1.2 Jitter Compensation and Start-up Delay
  5.2 Stream Rate Control and Start-up Delay
    5.2.1 Live Media
    5.2.2 Pre-Recorded Media: Fast Start
      Safe transient in the critical period
      Fast start with some probability of packet loss in critical period
    5.2.3 Transmission Time Functions for Fast Start
    5.2.4 Comparing Different Fast-Start Timing Functions
  5.3 Some Experiments
  5.4 Jitter-Adaptive Rate Control
  5.5 Summary of Rate Control Subject

Chapter 6. QoS-Aware MPEG-4 Streaming System
  6.1 QoS: Requirements and Models
    6.1.1 QoS-Internet Models
    6.1.2 QoS Requirements and MPEG-4 Traffic Spec's
  6.2 MPEG-4 QoS Notion
    6.2.1 DMIF QoS Model
  6.3 Proposed QoS-Aware System Architecture
    6.3.1 System Architecture
    6.3.2 Control Plane
      Control Plane structural design
      Control Plane end-to-end QoS signaling protocol
    6.3.3 Data Plane
      Data transport, quality feedback
      Multiplexing and scheduling (server side)
      Rate control and traffic shaping
  6.4 Incorporating RSVP in DMIF
    6.4.1 Introduction to RSVP
      RSVP application interface
      RSVP messaging operation
    6.4.2 RSVP in Co-operation with DDSP
    6.4.3 Mapping DMIF and RSVP QoS Descriptors
      IntServ QoS notion (TSpec, RSpec and ADSPEC)
      Token bucket description of traffic
      Mapping DMIF and RSVP parameters
    6.4.4 QoS Renegotiation
      Definition of DAI_ChannelReneg(): A Proposal
    6.4.5 Implementation and Evaluation

Chapter 7. Summary and Conclusions
  7.1 Summary of Contributions
  7.2 Future Research Directions
  7.3 Concluding Remarks

Bibliography
APPENDIX A. DMIF Application Interface
APPENDIX B. Features of the Implemented Software
APPENDIX C. DMIF Instance for Remote Retrieval, the Implemented DNI Syntax
List of Tables

Table 1 DMIF Functions and RTSP Methods
Table 2 Database Objects
Table 3 DMIF QoS_QualifierTags
Table 4 DMIF Defined Channel_QualifierTags
Table 5 Modes of Multiplexing
Table 6 DMIF RSVP QoS Notation Mapping for Multiplexed Streams
Table 7 DMIF RSVP QoS Notation Mapping
Table 8 MPEG-4 DMIF Defined QoS Modes
Table 9 MPEG-4 DMIF Defined QoS Status

List of Figures

Figure 2-1 An MPEG-4 Scene Example
Figure 2-2 MPEG-4 Systems Principle (Courtesy of Olivier Avaro)
Figure 2-3 The MPEG-4 System Layer Model
Figure 2-4 MPEG-4 Terminal Architecture
Figure 2-5 DMIF Communication Architecture
Figure 2-6 DMIF Computational Model
Figure 2-7 Service Creation & Channel Addition in a Remote Interactive DMIF
Figure 3-1 Implemented Modules: Shaded Area
Figure 3-2 MPEG-4 Streaming System Architecture
Figure 3-3 Server Architecture
Figure 3-4 The Architecture of Client DMIF Remote Instance
Figure 3-5 Data and Control Planes in the Client Architecture
Figure 3-6 Server Procedural Diagram
Figure 4-1 A Layered Depiction of the Control Plane
Figure 4-2 Organization of DMIF Multimedia Communication Concepts (1)
Figure 4-3 Organization of DMIF Multimedia Communication Concepts (2)
Figure 4-4 UML Depiction of DNA Operation (UDP)
Figure 4-5 UML Depiction of DNA Operation (TCP)
Figure 4-6 RTP Payload
Figure 4-7 Bitrate of a Typical Presentation, Used for RTP Experiments
Figure 4-8 Adaptive Rate Controller Operation (Jitter Controlled)
Figure 4-9 Effect of Packet Drop on Video Quality
Figure 5-1 Absolute Jitter vs. Relative Jitter
Figure 5-2 Jitter Compensation Buffer (Courtesy of Prof. Jeffay, UNC)
Figure 5-3 Jitter Buffer and Start-up Delay
Figure 5-4 Typical Delay Distribution Function
Figure 5-5 Fast Start: Critical and Normal Periods
Figure 5-6 Aggressive Fast Start
Figure 5-7 Smooth Fast Start (2), Aggressive Fast Start (1)
Figure 5-8 Shaping May Neutralize the Fast Start
Figure 5-9 Effect of Token Bucket Shaping on Fast Start
Figure 5-10 Fast Start Experiment in a Low Jitter Environment
Figure 5-11 Fast Start Experiment in a High Jitter Environment
Figure 5-12 A Closer Look at Figure 5-11
Figure 5-13 No Jitter Buffer Experiment
Figure 5-14 Audio Stream in Low Jitter Environment
Figure 5-15 Audio Stream in High Jitter Environment
Figure 6-1 A High Level View of the New System Architecture
Figure 6-2 New QoS-Aware Server Architecture (Control Plane Emphasized)
Figure 6-3 Data Plane in the Server Architecture
Figure 6-4 Typical View of the Data Plane, Hybrid Mode (Scenario 1)
Figure 6-5 Omitting the FlexMux/Scheduler (Scenario 2)
Figure 6-6 Maximal Use of FlexMux (Scenario 3)
Figure 6-7 RSVP Operation
Figure 6-8 Accommodating RSVP Signaling in DDSP
Figure 6-9 Token-Bucket Traffic Shaping and Modeling
Figure 6-10 Fragmenting I Frame to Reduce the Required Token Bucket Size
Figure 6-11 Packet Loss for Best Effort Traffic
Figure 6-12 No Packet Loss for Guaranteed Service Traffic
Figure 6-13 High Jitter, Missed Deadlines for Best Effort Traffic
Figure 6-14 No Deadline Missed, Low Jitter for Guaranteed Service
Figure A-1 QoS Monitoring Events
Figure B-1 Server Output Snapshot

Acknowledgements

I came to realize, throughout this research project, how fortunate I was to be surrounded by support from my family, my friends and classmates, and my research supervisor.
Sincere thanks are conveyed to all of them for making this thesis possible. And above all is God, who has given me the chance to be with these people, so thanks be to Him foremost. Thanks be to Him who has greatly blessed me with countless bounties, especially His great gift to me, my wonderful family. I would like to express my gratitude to my beloved family, especially my father Ali and my mother Ziba, for their unconditional love and support. They have been there for me, even when far away, since the very first day of my life. Thanks to them for giving me a solid foundation filled with so many opportunities. Without the support of my parents, brothers and sister, I would not have been able to accomplish this research, nor would I ever be able to thrive in other aspects of my life. Many thanks to Dr. Hussein Alnuweiri for his kind supervision, and for being so friendly that my study at UBC became a pleasure, not an obligation. His kindness and continuous support were always my peace of mind. I thank my friends for never hesitating to help me. Many thanks to Kambiz for his participation in this great project we accomplished together, for always being helpful from the very first day we met, and for our friendship, may it last forever. Many thanks to Shahin, Shahram, Khosro, and other friends, who will forgive me for missing the opportunity to name them here. The sincere help of these people made my life more easy-going. I would also like to acknowledge the sincere help of my research colleagues and friends: Amr, Haitham and Ayman. Their knowledge and resource exchange greatly contributed to this research work.

To Maman and Baba

Chapter 1. Introduction

The rapid growth of Internet technology has facilitated many new applications that could only be dreamed of in the past. Multimedia applications were very quick to appear on the Internet and introduce themselves to the networked world.
The special needs of multimedia applications obliged the development of new standards and protocols, for both the end-system and the underlying network. MPEG, the Moving Picture Experts Group (a committee of ISO/IEC) that developed the popular MPEG-1 and MPEG-2 standards, introduced a new standard for multimedia applications. The new standard, MPEG-4, is a true multimedia standard that addresses many aspects of a complex multimedia application. MPEG-4 is an object-oriented multimedia technology that defines a wide variety of tools for the encoding, storage, delivery and management of audio-visual information. Although the MPEG-4 design does not target any specific delivery and access technology, the Internet has always been considered the main platform on which MPEG-4 applications are supposed to run. Internet technology, however, was not initially designed to support the requirements of multimedia applications. In response to the increasing demand for better services, the concept of Quality of Service (QoS) has been introduced into Internet technology. The QoS-Internet, which is gradually being realized, will be the future transport platform for multimedia applications. Some special problems associated with Internet-based multimedia applications have drawn extensive attention in recent years. This thesis focuses on the issues pertaining to the streaming of MPEG-4 content over the Internet, in particular the QoS-Internet.

1.1 Problem and Motivation

The MPEG-4 standard, unlike its predecessors, does not target any specific delivery technology; instead it defines a framework that enables seamless utilization of available data transport services. This delivery framework, called the "Delivery Multimedia Integration Framework" or "DMIF", provides rules and guidelines for the design of an MPEG-4 streaming and delivery system. There exist many streaming tools that are currently used for non-MPEG-4 multimedia applications.
One can consider using those existing systems for MPEG-4 applications too, but many new problems are associated with the object-based nature of MPEG-4; in fact, these new problems motivated the design of DMIF. The special delivery requirements of MPEG-4 prevent traditional systems from providing satisfactory services. DMIF, however, is no more than an integration framework that incorporates and integrates other protocols and tools for multimedia delivery purposes; most of its services and tools are described only semantically. Therefore the system design, the definition of an exact syntax that matches real-world requirements, and the mapping and incorporation of other protocols into the integrating framework are all left to system developers. This is, nonetheless, natural, as DMIF is only meant to provide a general integration framework. Given the key role of DMIF in MPEG-4 streaming applications, the problem of MPEG-4 streaming over the Internet reduces to designing a DMIF-based system that incorporates standard Internet services (IETF protocols) for this purpose. Choosing the protocols and tools available in the best-effort and QoS Internet, mapping them to DMIF operation, and designing a system that can make use of all these services are the main goals of this research work.

At the time this research began, the MPEG-4 community had already developed the reference software for MPEG-4. The software included some parts of DMIF but lacked the part that would enable MPEG-4 content retrieval over the Internet. This was an additional inspiration to study the DMIF standard and to design and implement the missing part.

1.2 Summary of Contributions

To address the shortcomings of the existing systems and realize MPEG-4 streaming over the Internet, a system has been designed for MPEG-4 streaming over IP networks. This system, which uses a client/server architecture, has also been implemented and successfully tested. In the design of this system a novel technique for streaming media data over the Internet has also been developed, which is extensively described in this thesis. This technique is called fast-start rate control. The abovementioned system was designed and implemented for the best-effort Internet. For the next step of my research I focused on the QoS aspects of MPEG-4 streaming over the Internet and designed a system for QoS-aware delivery of MPEG-4 over the QoS-Internet. This new design is in fact an evolution of the first version that was implemented for the best-effort Internet. The following describes the major steps toward this design and explains the contributions that were made to this research topic:

Designing a client/server system for MPEG-4 streaming over the Internet: This design includes an object-based architecture for the realization of the DMIF instance for remote retrieval, and an object-based server architecture including the DMIF and application layer architectures. This design, an important part of this research, provides a flexible architecture that eases the implementation effort. The design utilizes MPEG-4 and DMIF concepts.

Implementation of the DMIF instance for remote retrieval at the client side and an MPEG-4 streaming server: This implementation became a part of IM-1, the reference software of MPEG-4, and at the time of this writing was on the path to standardization. The implementation is an exact realization of the aforementioned architecture and fully conforms to the DMIF specifications.

Design and implementation of a fast-start rate controller for media streaming: A fast-start rate controller has been designed and added to the implementation. The rate controller is not required by the MPEG-4 specs, but it is necessary for any streaming system.
Development and implementation of a solution for MPEG-4 delivery over RTP: A method for incorporating RTP in DMIF for the delivery of MPEG-4 content over the Internet was developed and implemented; this task was mainly done in the control part of the system. At the time of this writing there was still no standard solution for the delivery of MPEG-4 using RTP.

Designing a client/server architecture and a complete solution for QoS-aware streaming of MPEG-4, as well as a partial implementation of the QoS-aware system: By revising the first (implemented) design, a QoS-aware MPEG-4 streaming system was developed. This design includes an object-based client architecture very similar to the implemented one, but the server architecture is a new object-based architecture that enables QoS-awareness. Guidelines for incorporating RSVP in DMIF and the mapping of their parameters are also given in the proposal. A partial implementation has also been done and its results verified.

The contributions recounted above are all described in detail in this document. A few research papers have also been published regarding these contributions [1][2][3].

1.3 Organization of The Thesis

This document is composed of 7 chapters. The current chapter, the introduction, briefly discussed the motivations behind my research and the contributions that were made. The background required for the rest of the dissertation is not included in the introduction but in chapter 2, which gives an overview of MPEG-4 technology and discusses the basics of its streaming over the Internet. Chapter 3 introduces the designed and implemented streaming system (also known as the first version) and studies its structural design, one of the major contributions of this research. More details about this design can be found in chapter 4, which concerns the Control Plane and Data Plane of the first version of the system.
Streaming Rate Control, another contribution of this research, is discussed in chapter 5; some experiments that were conducted using the implemented software are also presented in that chapter. Based on the experience of implementing the MPEG-4 streaming system described in chapters 3, 4 and 5, a proposal for introducing QoS-awareness into the streaming system is given in chapter 6. This proposal includes the structural design and the method of exploiting RSVP in DMIF. Chapter 6 also presents the results of some experiments using a partial implementation of the QoS-aware proposal. Chapter 7 concludes the thesis and gives an overview of the topics covered.

Chapter 2. MPEG-4 and Streaming Background

This chapter reviews the basics of the MPEG-4 standard and its delivery facilities. Since the focus of the thesis is on the delivery of MPEG-4 content over the Internet, we pay special attention to the issues of the delivery layer of MPEG-4, in particular the remote interactive scenario.

2.1 MPEG-4 Standard

The MPEG-4 standard is a novel multimedia technology whose scope encompasses a wide range of tools and technologies. Unlike its predecessors, the MPEG-4 standard does not focus only on the media compression part of multimedia technology; it also considers all the tools that future multimedia applications require. Such tools are provided to satisfy the needs of authors, service providers and end users alike. The following description of MPEG-4 tools is taken from [6] and summarizes what MPEG-4 provides:

• For authors, MPEG-4 enables the production of content that has far greater reusability and flexibility than is possible today with individual technologies such as digital television, animated graphics, WWW pages and their extensions. Also, it is now possible to better manage and protect content owner rights.
• For network service providers, MPEG-4 offers transparent information, which can be interpreted and translated into the appropriate native signaling messages of each network with the help of relevant standards bodies. The foregoing, however, excludes Quality of Service considerations, for which MPEG-4 provides a generic QoS descriptor for different MPEG-4 media. The exact translations from the QoS parameters set for each medium to the network QoS are beyond the scope of MPEG-4 and are left to network providers. Signaling the MPEG-4 media QoS descriptors end-to-end enables transport optimization in heterogeneous networks. (Chapter 6 of this document is dedicated to the QoS issues of MPEG-4 delivery; the mapping of MPEG-4 QoS qualifiers to the RSVP (and IntServ) QoS parameters is also discussed there.)

• For end users, MPEG-4 brings higher levels of interaction with content, within the limits set by the author. It also brings multimedia to new networks, including those employing relatively low bitrates, and mobile ones. The object-based paradigm of MPEG-4 opens new fields of application development.

These goals are achieved by exploiting standardized ways to represent a scene or multimedia presentation. An MPEG-4 presentation is composed of units of aural, visual or audiovisual content, called "media objects". These objects can be of natural or synthetic origin (they could be recorded with a camera or microphone, or generated with a computer). Object-based representation of a scene obliges the use of a scene descriptor that describes how these media objects are composed to form audiovisual scenes. Another functionality that MPEG-4 seeks to provide is interaction with the audiovisual scene and its objects. Delivery of media data is another aspect of multimedia communication that the MPEG-4 standard covers. This part of the MPEG-4 standard (part 6, DMIF) is of special interest in this document and is extensively discussed.
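The object-based scene representation just described can be pictured as a tree whose leaves are primitive media objects. The sketch below is purely illustrative (it is not BIFS or the normative MPEG-4 scene description syntax); all class and object names are invented for the example:

```python
# Illustrative sketch (not MPEG-4 scene description syntax): an audiovisual
# scene as a hierarchy of media objects. A compound object (e.g. a talking
# person) groups primitive objects (the person's visual and voice).

class MediaObject:
    """A primitive media object, carried in its own elementary stream."""
    def __init__(self, name, kind):
        self.name, self.kind = name, kind

    def leaves(self):
        yield self

class CompoundObject(MediaObject):
    """A compound object groups sub-objects into a sub-tree of the scene."""
    def __init__(self, name, children):
        super().__init__(name, "compound")
        self.children = children

    def leaves(self):
        for child in self.children:
            yield from child.leaves()

scene = CompoundObject("scene", [
    MediaObject("background", "still image"),
    CompoundObject("talking person", [
        MediaObject("person video", "video"),
        MediaObject("person voice", "audio"),
    ]),
])

# Each leaf of the hierarchy corresponds to one elementary stream to deliver.
streams = [obj.name for obj in scene.leaves()]
print(streams)  # ['background', 'person video', 'person voice']
```

The point of the sketch is only that the scene description, not the individual streams, holds the association between objects, which is exactly the property the following sections rely on.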
Figure 2-1 depicts an example of an MPEG-4 audiovisual scene. The following sections illustrate the MPEG-4 functionalities described above, using this scene.

2.1.1 Coding, Composition and Streaming of Media Objects

Media objects in an MPEG-4 audiovisual scene can be of several types, such as: still images (e.g. a fixed background), video objects (e.g. a talking person, without the background), audio objects (e.g. the voice associated with that person), etc. In addition to the objects shown in Figure 2-1 and mentioned above, MPEG-4 defines the coded representation of objects such as text and graphics, talking synthetic heads with associated text used to synthesize the speech and animate the head, and synthetic sound. Figure 2-1 and Figure 2-2 illustrate the hierarchical way in which an audiovisual scene is described as a composition of individual objects. The leaves of the hierarchy correspond to primitive media objects, the most basic elements of a scene. A combination of two or more primitive objects forms a compound media object. A talking person is an example of a compound media object; its primitive media objects are a voice and a visual object. In the hierarchy of the scene, compound objects encompass sub-trees.

Figure 2-1 An MPEG-4 Scene Example

The primitive media objects in MPEG-4 are delivered in Elementary Streams. The association between these primitive objects is kept in the scene description and not in the individual elementary streams. All streams that are associated with one media object are identified by the same object descriptor. This association can be used for synchronization, intellectual property management
Figure 2-2 gives more insight into the principle of MPEG-4 systems and how elementary streams are associated with objects and other streams. More information regarding MPEG-4 systems can be found in [4].  Figure 2-2 MPEG-4 Systems Principle (Courtesy of Olivier Avaro)  A necessary task in composing a scene from its primitive objects is synchronizing their corresponding streams. This task is done in the Synchronization Layer (SyncLayer) of MPEG4. This layer is responsible for time-stamping access units inside an elementary stream.  2.1.2 Delivery of Streaming Data To unify the methods of access to elementary streams and to make an MPEG-4 application independent from the underlying data access technology, the MPEG-4 standard defined a generic framework for the delivery of its content. This framework is called Delivery Multimedia Integration Framework or "DMIF". This thesis comprehensively studies DMIF.  8  M P E G - 4 delivery and synchronization layers facilitate the synchronized delivery of streaming information from source to destination. Figure 2-3 shows how these layers exploit a two-layer multiplexer in order to provide such services.  Elementary Streams  3  H  SL SL SL SL SL  X es <D JS  FlexMux  FlexMux  «3  Sync Layer  SL SL  DMIF Application Interface  DMIF Layer  FlexMux FlexMux Streams  a  S  S L Packetized  Elementary Streams Interface  DMIF Network Interface  H  File  Broad- Intercast active  ;RTP) (PES) AAL2 UDP M P E G 2 ATM TS IP  H223 PSTN  TransMux Layer  DAB • * • M u x (not specified in MPEG-4) i  a  *>, u  >  •i-H  Q  Figure 2-3 The MPEG-4 System Layer Model Multiplexing of elementary streams may be done for several reasons; for example, to group elementary streams with similar QoS requirements, or to reduce the number of network connections. 
The first multiplexing layer, shown in Figure 2-3, is managed according to the DMIF specification, part 6 of the MPEG-4 standard (DMIF stands for Delivery Multimedia Integration Framework and is comprehensively discussed in this thesis). This multiplexer, also represented by the MPEG FlexMux tool, allows grouping of Elementary Streams (ESs) with low multiplexing overhead. The second layer of multiplexing is the TransMux (Transport Multiplexing) layer (Figure 2-3). It models the layer that offers transport services matching the requested QoS. Only the interface to this layer is specified by MPEG-4; the mapping to native network transport functions is left to system developers and other standards bodies. Any suitable existing transport protocol stack, such as (RTP)/UDP/IP, (AAL5)/ATM, or MPEG-2's Transport Stream over a suitable link layer, may become a specific TransMux instance. At the time of this writing, no protocol stack had been chosen by the standards bodies (IETF-AVT and ISO/IEC) for MPEG-4 delivery over IP.

Use of the FlexMux multiplexing tool is optional; this layer may be empty if the underlying TransMux instance provides all the required functionality. This issue is discussed in
by navigation through a scene; (2) drag objects in the scene to a different position; (3) trigger a cascade of events by clicking on a specific object, e.g. starting or stopping a video stream; (4) select the desired language when multiple language tracks are available.  2.2 DMIF Overview The Generic architecture of an MPEG-4 terminal is depicted in Figure 2-4. It consists of three layers: The Compression Layer, the Synchronization Layer (SL) and the Delivery Layer. The Compression Layer of MPEG-4 terminal performs media encoding and decoding of Elementary  Streams. It is specified in ISO/TEC 14496-2 and ISO/IEC 14496-3 standard documents. The Synchronization Layer that is also referred to as Sync Layer is responsible for managing  Elementary Streams and their synchronization and hierarchical relations. Sync Layer is specified in ISO/TEC 14496-1 (the MPEG-4 Systems document). The Delivery Layer, which is of special  interest in this thesis, ensures transparent access to content irrespective of delivery technology. ISO/IEC 14496-6 specifies the MPEG-4 delivery layer. The Delivery Layer provides a means of retrieving MPEG-4 elementary streams. This layer provides an abstraction layer between the core MPEG-4 systems components and the retrieval method. Unlike its predecessors, MPEG-4 does not target any specific delivery technology but instead defines a framework for seamless utilization of these technologies. This framework is referred to as the Delivery Multimedia Integration Framework (DMIF) [5]. This framework addresses the issues of local file access, broadcast media access and remote interactive content access over any kind of network. DMIF is sometimes viewed as a session layer protocol. The following sections explain DMIF in more detail. 
Figure 2-4 MPEG-4 terminal architecture: the media-aware, delivery-unaware Compression Layer (ISO/IEC 14496-2 Visual, ISO/IEC 14496-3 Audio); the media-unaware, delivery-unaware Sync Layer (ISO/IEC 14496-1 Systems); and the media-unaware, delivery-aware Delivery Layer (ISO/IEC 14496-6 DMIF). The layers are separated by the Elementary Stream Interface (ESI) and the DMIF Application Interface (DAI).

2.2.1 Why DMIF?

Why do we need to define yet another new framework for the delivery of MPEG-4, instead of using existing protocols and methods? This is a fundamental question that may arise. The answer is given by reviewing the goals and objectives of DMIF, listed below:

•  hiding the delivery technology details from the DMIF User

•  managing real time, QoS sensitive channels

•  allowing service providers to log resources per session for usage accounting

•  ensuring interoperability between end-systems

There is currently no single method or protocol that provides all of the services mentioned above. As a result, a set of tools and protocols must co-operate in order to achieve these goals, and these tools need to be integrated in a unifying framework. This is the role that DMIF plays. To achieve these goals, DMIF defines a communication architecture that hides the details of the delivery technologies below an interface. This architecture conceptually facilitates the above-mentioned features. The interface is called the DMIF-Application Interface (DAI). The DAI separates the delivery aware and delivery unaware layers of the MPEG-4 terminal architecture (Figure 2-4).

DMIF takes into consideration the QoS management aspects; it also considers the requirement of allowing service providers to log resources per session for usage accounting. Usage accounting is supported in order to facilitate the implementation of appropriate billing policies. All these functions are performed by using available protocols and methods and integrating them in DMIF.
The DAI is an important part of the DMIF standard as it implicitly specifies what services are available to the DMIF users. The DAI is specified semantically and its syntax is not defined, as it would be dependent on the programming language and operating system. More information about the DAI can be found in APPENDIX A.

2.2.2 The DMIF Communications Architecture

DMIF supports three major technologies: interactive network technology, broadcast technology and storage technology. The delivery technology that this thesis focuses on is the interactive network technology. The architecture that integrates these technologies is depicted in Figure 2-5. The shaded boxes clarify the boundaries of a DMIF implementation. The standard document [5] normatively defines the behaviour of DMIF at the boundaries of the shaded area, while the additional modeling and definitions, which apply to elements internal to the shaded boxes, have only informative value.

Figure 2-5 DMIF Communication Architecture: an Originating Application above the DAI, with a DMIF filter directing requests to the Originating DMIF instances for broadcast, local files and remote service; the remote instance maps signalling through the DNI to a Target DMIF and Target Application. Flows between independent systems are normative; flows internal to a single system are either informative or out of DMIF scope.

From an application's perspective, data is accessed through the DMIF-Application Interface, irrespective of whether such data comes from a broadcast source, from a local storage device or from a remote server. DMIF also allows the concurrent presence of more than one DMIF Instance; therefore it facilitates the use of multiple sources for a single presentation. DMIF services are requested by applications through the DAI. The application supplies the delivery layer (DMIF) with a URL that points to the location of the media data. Based on the URL, the delivery layer then determines the appropriate DMIF Instance to activate.
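The instance-selection step can be pictured as a small dispatcher that inspects the URL scheme and hands the request to a matching DMIF instance, all instances answering the same DAI-like interface. The sketch below is illustrative only; the class and function names are invented for this example and do not come from the DMIF standard or the IM1 code.

```cpp
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical DAI-like interface: every DMIF instance (broadcast, local
// file, remote) answers the same service primitives.
struct DmifInstance {
    virtual ~DmifInstance() = default;
    virtual bool serviceAttach(const std::string& url) = 0;
};

// A trivial stub instance that accepts every service request.
struct NullInstance : DmifInstance {
    bool serviceAttach(const std::string&) override { return true; }
};

// A DMIF-filter-like dispatcher: picks the instance by URL scheme.
class DmifFilter {
public:
    void registerInstance(const std::string& scheme, std::shared_ptr<DmifInstance> inst) {
        instances_[scheme] = std::move(inst);
    }
    DmifInstance& select(const std::string& url) {
        const auto pos = url.find("://");
        if (pos == std::string::npos) throw std::invalid_argument("no scheme in URL");
        const auto it = instances_.find(url.substr(0, pos));
        if (it == instances_.end()) throw std::out_of_range("no instance for scheme");
        return *it->second;  // this instance serves all further requests for the URL
    }
private:
    std::map<std::string, std::shared_ptr<DmifInstance>> instances_;
};
```

A `file://` URL would then reach the local-file instance, while a remote-retrieval scheme (whatever scheme name the implementation chooses) would activate the remote instance.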
The operation of identifying the appropriate DMIF instance and activating it is normally performed by the DMIF client filter. However, the depicted architecture is informative, and another approach may be taken to redirect requests to the suitable DMIF instance. The activated DMIF Instance serves the application for further requests associated with the same URL. The DMIF instance translates the originating application's requests into specific actions to be taken with respect to the technology through which data is accessed (e.g., network access technology). Similarly, data entering the terminal (from remote servers, broadcast networks or local files) is uniformly delivered to the Originating Application through the DAI.

DMIF instances have some common features that are implemented similarly for all of them. These features are usually associated with the DAI implementation. The other parts of an implementation deal with the details of the delivery technology. In the case of interactive networks, DMIF specifies a logical interface (the DMIF-Network Interface, DNI) between a hypothetical module implementing these common features and the network specific modules. Note that the DNI primitives are only specified for information purposes, and a DNI interface need not be present in an actual implementation. Though the DNI function is not seen by the application, it provides the possibility of code reuse and easy adoption of new similar technologies for new DMIF instances. The next chapter discusses this matter (for UDP and TCP). The network specific modules specify the exact mapping of the DNI primitives into native network functions. In order to provide access to the network resources, DNI primitives need to be adapted to the native network signalling mechanisms. To ensure interoperability with other implementations, the format of the messages that are sent using the native network transport service is fully specified by the DMIF standard.
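Mapping a DNI primitive onto a native-transport message amounts to serializing its parameters into a binary record that both peers understand. The layout below (one-byte message type, two-byte session id, two-byte payload length, big-endian) is a made-up illustration of such serialization, not the normative DDSP syntax, which is defined in [5].

```cpp
#include <cstdint>
#include <vector>

// Illustrative binary session message: 1-byte message type, 2-byte session id,
// 2-byte payload length (big-endian), then the payload. This layout is an
// assumption for illustration and is NOT the normative DDSP wire syntax.
std::vector<uint8_t> pack_message(uint8_t type, uint16_t session_id,
                                  const std::vector<uint8_t>& payload) {
    std::vector<uint8_t> msg;
    msg.push_back(type);
    msg.push_back(static_cast<uint8_t>(session_id >> 8));
    msg.push_back(static_cast<uint8_t>(session_id & 0xFF));
    msg.push_back(static_cast<uint8_t>(payload.size() >> 8));
    msg.push_back(static_cast<uint8_t>(payload.size() & 0xFF));
    msg.insert(msg.end(), payload.begin(), payload.end());
    return msg;
}
```

Because the wire format (rather than any API) is what the standard fixes, two independent implementations interoperate as long as they emit and parse these records identically.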
The DMIF standard defines a straightforward mapping of DNI primitives into signalling messages; this mapping is named the DMIF Default Signalling Protocol (DDSP). DDSP is in fact a session level protocol for the management of multimedia streaming over generic delivery technologies. It has some similarities to other session layer and streaming control protocols such as RTSP and SIP. DDSP comprises primitives to set up and tear down sessions as well as individual data channels. The protocol uses a binary format for its messages.

As mentioned, the DNI is defined only for the remote interactive case; in the Broadcast and Local Storage scenarios the model is simpler and no internal interface has been specified. It is obvious from Figure 2-5 that the DMIF Instance for the Broadcast and Local Storage scenarios can indeed be implemented in one module. All the control messages that are exchanged between peer applications in an interactive scenario are in this case terminated in the DMIF Instance. (Even for the remote access case, a valid implementation may not implement the DNI primitives as described (informatively) in DMIF.) The DAI and DNI functions are described in APPENDIX A of this document. The complete definition of these functions can also be found in the DMIF document [5].

2.2.3 The DMIF Computational Model

This section explains the DMIF messaging and operation model, known as the DMIF computational model. When requesting the activation of a service, an application uses the Service primitives of the DAI and creates a service session; the DMIF layer then contacts its corresponding peer (either a remote peer or a local emulated peer) and creates a network session with it. Network sessions have network-wide significance; service sessions instead have local meaning; the DMIF Layer maintains the association between them.
A high level view of the service activation process of DMIF is shown in Figure 2-6; the following describes the steps of this process (a more detailed view is depicted in Figure 2-7):

1. The Originating Application requests the activation of a service from its local DMIF Layer; a path in the control plane is created for message exchange between the local application and the DMIF layer.

2. The Originating DMIF peer establishes a network session with the Target DMIF peer; a communication path between the Originating DMIF peer and the Target DMIF peer is established in the control plane.

3. The Target DMIF peer forwards the service activation request to the appropriate Target Application; a communication path between the Target DMIF peer and the Target Application is established in the control plane.

4. Using the pre-established control plane (paths 1, 2, 3), the peer Applications create data channels in the user plane to carry the actual media data.

Figure 2-6 DMIF Computational Model

DMIF is involved in all four steps above. Figure 2-7 shows a more detailed view of the message exchange process between DMIF peers (client and server). The two sets of functions seen in this figure represent the DAI and DNI interfaces described in the previous section. DMIF uses a binary format for building the message packets used in the remote retrieval case; this format is specified in the DMIF standard and, along with the signaling procedure at the session layer level, forms the DMIF Default Signaling Protocol (DDSP). This protocol is explained in [5] and in APPENDIX A, where the DNI primitives are discussed. Figure 2-7 also depicts, to a large extent, the operation of this protocol (follow the message exchange process in the network, past the DNI).
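The association that the DMIF layer maintains between service sessions and network sessions can be sketched as simple bookkeeping: when a service is attached, the layer reuses an existing network session to the same peer if one exists, and only otherwise sets a new one up. The class and member names below are invented for this illustration.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical bookkeeping: one network session per target peer, grouping
// any number of service sessions attached to services on that peer.
class SessionManager {
public:
    // Returns the network session id used for this service; creates a new
    // network session only when the target peer has none yet.
    int attachService(const std::string& peerAddress, const std::string& serviceName) {
        auto it = networkSessions_.find(peerAddress);
        if (it == networkSessions_.end())
            it = networkSessions_.emplace(peerAddress, nextNetworkId_++).first;
        serviceToNetwork_[serviceName] = it->second;  // local association
        return it->second;
    }
    std::size_t networkSessionCount() const { return networkSessions_.size(); }
private:
    int nextNetworkId_ = 1;
    std::map<std::string, int> networkSessions_;   // peer -> network session id
    std::map<std::string, int> serviceToNetwork_;  // service -> network session id
};
```

Two presentations fetched from the same server would thus share one network session, while their service sessions remain locally distinct.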
Figure 2-7 Service Creation and Channel Addition in a Remote Interactive DMIF scenario. On the client (Originating DMIF Terminal) side the application initiates the service with DA_ServiceAttach; the client DMIF layer determines whether a new network session is needed, issues DN_SessionSetup and DN_ServiceAttach across the network, and the server (Target DMIF Terminal) connects to the application running the service, which replies (returning the IOD). Having received the IOD, the client requests the addition of channels (SD and OD streams in the first round, media streams afterwards): DA_ChannelAdd leads to DN_TransMuxSetup (if a new network connection is needed) and DN_ChannelAdded across the network, and the server application is notified through DA_ChannelAdd and replies.

2.3 Streaming: Control Plane, Data Plane

Streaming applications share a number of common features and functionalities. In many cases, streaming systems adopt out-of-band control signalling. In other words, they consist of two information transport planes, known as the Data (User) Plane and the Control Plane; this way the media data and control data are conveyed through separate channels. The data plane is utilized for the delivery of media data such as encoded audio and video, while the control plane conveys the control signalling for the presentation. Other architectures for streaming applications exist, but the focus of the thesis is restricted to the structure described above. In general it may be hard to identify the boundaries of the Data Plane and the Control Plane, in particular when QoS issues are involved. For the first part of this thesis, however, the frontiers of these planes are fairly discernible. The Control Plane of a streaming system may use any kind of streaming protocol; RTSP and DMIF Signaling are two examples.
DMIF addresses the Control Plane issues extensively, though it deliberately stops short of describing the Data Plane to the same extent. For this reason, the last few chapters of the thesis focus more on the Data Plane issues of an MPEG-4 streaming system. The issues that usually concern the data plane include Rate Control, Multiplexing/Scheduling and QoS.

Chapter 3. Proposed MPEG-4 Streaming System Architecture

This chapter proposes a new architecture for MPEG-4 streaming. This architecture has been used for our implementation of an MPEG-4 streaming system. The focus of this thesis, as explained in previous chapters, is on the remote interactive scenario of MPEG-4. The implementation is also the realization of this scenario, with the Internet being the selected delivery technology. Therefore the primary choices as delivery methods are UDP and TCP, the basic Internet transport services. The realization of the remote retrieval scenario is by far the most complicated among the MPEG-4 streaming scenarios.

Previous chapters reviewed DMIF characteristics and specifications. The DMIF specifications deliberately do not address implementation and transport service related issues, to leave room for seamless adoption of different transport technologies. Therefore, it is the job of system developers to work out the required architectural design for each specific case. The specific case discussed in this thesis is remote retrieval over the Internet. The novel architecture presented in this thesis provides the necessary foundation for the realization of MPEG-4 streaming over the Internet. It considers all the requirements of MPEG-4 and an object based design, and eases the implementation effort significantly (as we experienced). This architecture is one of the main contributions of this effort in the course of developing an MPEG-4 streaming system.
It is obvious that this architecture and its implementation, though the only published and open source solution, are not the only possible ways of realizing the remote retrieval scenario of MPEG-4. The architecture design is noticeably the most important part of this implementation.

3.1 Previous Work: MPEG-4 Reference Software

In order to verify the specifications of the MPEG-4 standard, an implementation effort was initiated by the MPEG community to produce the reference software for MPEG-4. This reference software is called IM1 (our implementation has later become part of IM1). IM1 implements the compression and systems layers of MPEG-4, but the only DMIF instance that it supports (excluding our contribution to IM1) is the local file access instance. IM1 also implements the DMIF client filter described in the previous chapter. Our additions to IM1 include the implementation of a DMIF instance for remote retrieval and an MPEG-4 streaming Server that fully supports DMIF. Figure 3-1 depicts these modules in shaded boxes. Although similar work to produce the DMIF instance for remote retrieval has also been done in other institutions, the results of those efforts have not been contributed to MPEG and are not available to the public or the MPEG community for verification [8]. Also, there has not been any publication on the implementation details of those efforts yet. Our implementation, therefore, is considered to be the first open source code for this part of the DMIF standard.

Figure 3-1 Implemented Modules (shaded area): the Originating App and the Originating DMIF instances for broadcast, local files and remote service, with the remote instance (DAI, signalling map, DNI) communicating with the Target DMIF and Target Application; the shaded, implemented modules are the remote-retrieval instance and the server side. Flows between independent systems are normative; flows internal to a single system are either informative or out of DMIF scope.

One of the advantages of a layered and modular design is the transparency of operation between layers. Our implementation takes advantage of this idea, and utilizes the DMIF Application Interface to communicate with the application layer of IM1 at the client and with the Server Application
Control Plane  Data/Session  o> —  ES  1  Vi : •  -3  fa ,3  DDSP  Control  a  (Sen ive Provider)  signaling MPEG4 Appl.o.t.on  MPEG4  « £  1  )  M  l  r  "  DMIF  MPEG-4 Client  Q  Application  MPEG-4 Server  Figure 3-2 MPEG-4 Streaming System Architecture S i n c e the system has been designed to be perfectly object-based, m a p p i n g system objects to C + + objects  was fairly  straightforward. T h e system  w a s therefore  implemented in a  completely  o b j e c t - o r i e n t e d m a n n e r . It m u s t b e n o t e d t h a t w e d o n o t c a l l t h i s d e s i g n " o b j e c t b a s e d " b e c a u s e i t w a s i m p l e m e n t e d u s i n g a n object o r i e n t e d l a n g u a g e , b u t b e c a u s e i t treats s t r e a m s a n d s o m e o t h e r M P E G - 4 concepts i n separate objects s i m i l a r to what M P E G - 4 itself defines.  20  Being object based has many benefits such as great flexibility and scalability in operation, easy development and extension of software and many other advantages common to modular and object based systems. Since the implemented software is also completely multi-threaded, its implementation would become extremely hard if it did not use an object-oriented method. It is, however, obvious that the object-oriented method imposes a large run-time overhead on the system. The imposed overhead is not concerned in this effort as the software was meant to be a verification of DMIF functionality and not a commercial product. It is, however, noteworthy that the amount of overhead and its effect on the system were not significant and visible in our experiments. Our implementation is based on the client/server architecture and uses the multimedia communication concepts that MPEG-4 offers. It also adds some new concepts that are specific to this implementation and system design. There are some concepts taken from Evil structural design too. 
These concepts are listed below and help in understanding the rest of this chapter:

DMIF defined concepts:

DMIF Instance: an implementation of the Delivery layer for a specific delivery technology.

Network Session: an association between two DMIF peers providing the capability to group together the resources needed for an instance of a service. The Network Session is identified by a network-wide unique ID. A Network Session can group one or more Service Sessions.

Service: an entity identified by a Service Name (opaque to DMIF) that responds to DAI primitives. For example, an MPEG-4 presentation in the application layer can be seen as a Service that interacts with the DMIF layer through DAI primitives.

Service Session: a local association between the local DMIF Instance and a particular service.

DMIF-Application Interface: the DAI is the interface between an application (DMIF User) and the Delivery layer.

DMIF-Network Interface: the DNI is a semantic API that abstracts the signalling between DMIF peers irrespective of the delivery support.

IM1 defined concepts:

DPI: the DMIF Plug-in Instance Interface provides a DAI-like interface between the DMIF client filter and DMIF plug-in instances such as the Remote Instance or the File Access instance.

Concepts defined for the implementation:

DMIF Service layer (DS): the higher sub-layer of DMIF that manages the Service Session requests made by the application layer. This layer is network and transport unaware. It uses the DNI to communicate with the DNA layer (see below). DDSP signaling invocation is performed at this layer.

DMIF Network Access layer (DNA): the lower sub-layer of DMIF that handles network access issues and hides all the transport and network access details from the higher layers. This layer is in direct contact with the transport services. Access to this layer by the DS layer is performed through the DNI. DDSP functionality is provided in this layer.

DN Daemon: a module in the Server implementation, below the DNA layer.
It runs a listening thread and conveys client requests to the appropriate servicing object in the DNA layer.

Server Application Layer: as its name implies, this layer is the highest layer in the server, providing end-to-end services to the clients.

Service Provider Layer: (only at the server) a part of the Server Application layer, responsible for invoking DMIF functions and communicating with the DS layer through the DAI.

DMIF Service Object: an object in the DMIF Service layer that handles the requests for a DMIF service. Each single object services one end-to-end DMIF service.

Network Session Object: represents the network session concept of DMIF and implements network access primitives. Each single network session object services only one end-to-end network session.

Service Session Object: implemented in the Server Service Provider layer, each object represents a single end-to-end service. In our implementation each service session corresponds to a single presentation.

The server and client architectures designed for the implementation are depicted in Figure 3-3 and Figure 3-4 respectively. The use of the concepts mentioned above is observed in these architectures. The following sections describe these architectures in more detail. The architectures shown in this chapter depict a client/server view of the design, and data plane and control plane matters are not separately discussed. To provide more insight into the specific issues of the Data and Control Planes, the implementation is also reviewed from that perspective in Chapter 4.

3.3 Server Architecture vs. Client Architecture

Our implementation of the remote interactive scenario of MPEG-4 exploits a client-server architecture; it is also possible to develop a peer-to-peer version of this implementation by replicating the functionalities that are specific to the client or the server.
Since the client in our system is based on the IM1 software, and considering the fact that IM1 does not support (define) peer-to-peer type applications, only the client/server model was used for the implementation.

Figure 3-3 Server Architecture: the Server Application layer (an Application Service Manager with Service Session objects, above the SMPI and the ES providers, i.e. an MP4 file reader or a live real-time MPEG-4 encoder) and the DMIF layer (a DMIF Service Manager with DMIF Service objects accessed through the DAI, a Network Session Manager with Network Session objects accessed through the DNI, and the DN Daemon with its listening and secondary threads); Data Plane channels leave the server below the DNA layer.

3.3.1 Client

The client implementation is based on the architecture depicted in Figure 3-4, which represents the instance involved in handling remote media access. The resulting software module, which supports remote access of MPEG-4 content, implements the recommended DMIF Application Interface (DAI). This module interacts with the application through the DMIF client filter.

Figure 3-4 The Architecture of the Client DMIF Remote Instance: the DMIF client filter above the DPI (DAI), the DMIF Service Layer Manager with DMIF Service objects, and, below the DNI, the DMIF Network Access Layer Manager with Network Session objects.

In the implementation, the DMIF instance for remote retrieval (also referred to as the Remote Instance) is a Dynamically Linked Library; however, it could be linked in any other practical form such as a static library. This library works as a plug-in module for the MPEG-4 reference software, IM1. Interaction between this instance and the DMIF client filter passes through an interface called the DMIF-Plug-in Interface (DPI).
This interface is basically a replication of the DAI and offers no new concept.

Architecture

The Remote Instance is composed of two main layers, as shown in Figure 3-4. The upper layer, the DMIF Service layer (DS), interacts with the DMIF filter and provides the services requested by the application. The lower layer, the DMIF Network Access layer (DNA), handles the network control messaging between peers and implements the DMIF Default Signalling Protocol (DDSP). It is, however, the DS layer that invokes the DDSP messaging in the DNA layer. The DNA layer is accessed by the DS layer through the DMIF Network Interface (DNI).

Media data is transported across native network transport channels that are referred to as TransMux channels. Creating TransMux channels and managing the network sessions between DMIF peers are done using the functionality provided by Network Session objects, working in the DMIF Network Access layer. The implemented DNA layer presents its functionality through the DNI primitives regardless of the protocol used for the transportation of the DDSP messages. The main functionality expected from the DMIF Service layer is to create and manage DMIF services and hide the technology used to transport control messages and elementary streams. This is done using DMIF Service objects created in this layer.

Protocols for Message Transport

The separation of the DS and DNA layers facilitates employing a variety of transport technologies while minimizing the need to modify the DMIF layer implementation. In our implementation of the Control Plane, messages are transported by either TCP or UDP. Since DDSP does not provide error recovery facilities, lost UDP datagrams can halt the system. In order to prevent this problem, an error recovery scheme is required (though not implemented). For reliable connections, nevertheless, UDP suffices as a transport service. When using TCP as the transport service for DDSP messages, no error recovery is required.
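The DS/DNA separation amounts to programming the DS layer against an abstract message transport, with concrete TCP and UDP variants plugged in below the DNI. The sketch below shows the shape of such an abstraction; the names are invented for illustration and the actual socket calls are stubbed out.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical transport abstraction behind the DNI: the DS layer never
// sees which concrete transport carries the DDSP messages.
struct MessageTransport {
    virtual ~MessageTransport() = default;
    virtual std::string name() const = 0;
    virtual void send(const std::vector<unsigned char>& ddspMessage) = 0;
};

struct TcpTransport : MessageTransport {   // one connection per network session
    std::string name() const override { return "tcp"; }
    void send(const std::vector<unsigned char>&) override { /* write to socket */ }
};

struct UdpTransport : MessageTransport {   // would need error recovery on lossy links
    std::string name() const override { return "udp"; }
    void send(const std::vector<unsigned char>&) override { /* sendto */ }
};

// The choice can be driven by the request URL, leaving the DS layer unchanged.
std::unique_ptr<MessageTransport> makeTransport(bool useTcp) {
    if (useTcp) return std::make_unique<TcpTransport>();
    return std::make_unique<UdpTransport>();
}
```

Adding a third transport (for instance over ATM signalling) would then only require a new subclass, not a change to the DS layer.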
However, TCP requires a connection set-up phase prior to sending the first DDSP message. The reliability inherently provided by TCP outweighs the undesirability of this initial delay. For each network session, one TCP connection is established at session-setup time and later DMIF messages are transported over this dedicated connection; therefore only the session-setup message suffers from the TCP initial delay. Chapter 4 discusses the control plane issues in more detail; a brief description from the client standpoint is, however, given in the next section.

Control Plane: DAI and DNI Implementation

The DMIF standard only describes the semantics of the DAI and DNI; defining the syntax is left to system developers. The syntax we have used for the implementation of the DAI at the client is mostly based on IM1's definition. IM1 uses the syntax that was informatively suggested in the latest version of the DMIF standard. Nevertheless, it is up to the system developer to use this syntax or design another. We have used the same syntax as IM1's for the DAI, but for the DNI we designed a different syntax that is in fact more similar to DMIF's semantic notation of the DNI. The parameters exchanged by the DNI functions are all implemented with utmost exactness to assure full conformance to the standard. This syntax is presented in APPENDIX C.

In the client/server implementation of the remote interactive scenario, it is the client that always initiates requests; therefore, there is no need to implement most of the callback functions for the client. The only callback functions implemented for the client are the DA_DataReceive and DN_DataReceive functions (the other callback functions exist in the implementation, but are not operative). For this reason, the only DNI primitive implemented in the DS layer is DN_DataReceiveCallback; the rest of the DNI primitives are realized in the DNA layer.
The same story is repeated for the DAI; all DAI primitives, except the DN_DataReceive_Ind, are implemented in the DS layer. This simplifies the implementation of the client greatly.

Figure 3-5 Data and Control Planes in the Client Architecture: Control Plane calls travel from the application through the DAI/DPI (_req, _cnf, _Ind) into the DS layer and, via DNI functions, to the Network Session object interacting with the server; in the Data Plane, a Listener object receives the data stream from the server and delivers it through the Data Channel to the application.

Figure 3-5 shows a more detailed version of the client structural design. Note that the DPI in this figure is in effect a replica of the DAI, and the reader can substitute the DAI for it. Control Plane calls from the application layer are inspected by the DMIF client filter and directed to the remote instance if they are intended for a remote server. These calls are made by calling the DPI "_req" primitives in the DS layer. When the DS layer finishes its service to a specific call, it returns the results by calling the DPI _cnf primitive in the DMIF client filter (or, in effect, the DAI _cnf in the application layer). Figure 3-5 shows this operation visually. Note that the DAI (DPI) function called in the DS layer is a member of the DMIF Service object created to serve that specific DMIF service session. The DS layer DAI functions call the DNI primitives in the DNA layer (with the exception of the DN_DataReceive). These calls are blocking calls and will not return until the request is either granted or rejected.

Data Plane

The structure used in our implementation, according to the standard, implies that the Data Plane can use any available delivery technology regardless of the delivery scheme used by the Control Plane. In the current implementation (which has been contributed to IM1), UDP is used for the delivery of MPEG-4 content. TCP is not a good candidate for the transport of time-critical data due to its preference for reliable transmission over timely delivery.
The Real-time Transport Protocol (RTP) would provide additional benefits and is the only choice for more complex applications. For an experimental version of our system, I have utilized RTP as the major transport protocol; the results of this experiment are given in Section 4.2.3.

Data Plane channels are created in the TransMux channels that have been established between the client and the server. For each TransMux channel, a client listening thread is dispatched to receive the packetized elementary stream. At the server, packets from different data channels are multiplexed into one (or more) TransMux channel(s); therefore the packets containing multiplexed elementary streams need to be de-multiplexed in the DMIF Instance before being delivered to the Sync Layer (the Data Channel object in Figure 3-5 belongs to the Sync Layer). Following the presentation, the client closes the data and TransMux channels before terminating its session with the server. Section 4.2 further discusses the data plane issues and implementation considerations.

3.3.2 Server

Figure 3-3 depicts the architecture that was designed for the implementation of the MPEG-4 streaming server. This architecture and implementation are only applicable to the remote retrieval scenario that is the focus of this thesis. This novel design, which has been implemented and successfully tested, is one of the major contributions of this research. Great flexibility in exploiting different transport technologies is one of the advantages this architecture offers. The Server implementation consists of two layers: an application layer (service provider layer) and a DMIF (delivery) layer. The application layer of the server corresponds to the compression and Sync Layers of the client. Quite naturally, the server DMIF (delivery) layer acts as the counterpart of the client DMIF (delivery) layer.

Server Architecture: Server Application Layer

The Server Application consists of two distinct layers.
One layer is in essence the Service Provider layer for the client requests. The other layer consists of one or more media resource (MPEG-4) pumps. The Service Provider layer includes the realization of the MPEG-4 Sync Layer and some media streaming control functions. Elementary streams are supplied to this layer by the MPEG-4 pump; the MPEG-4 pump may be a real-time MPEG-4 encoder or an MP4 file reader (or both). The elementary streams are packetized in the Service Provider layer considering the transport protocol used by the Data Plane. The interface between the two layers of the Server application is realized through the Server Media Pump Interface sub-layer (SMPI). The SMPI layer is basically an interface and a simple abstraction layer. Its functionality is very similar to that of the DMIF client filter; therefore, we used the implementation of the DMIF client filter as the basis for implementing this interface. For simplicity, the interface between this layer and the Service Provider layer is called by the same name, SMPI. Media pumps can be added to the implementation as new DLLs, and this addition will not require any changes in the Server Application layer. This flexibility and independence of layers is facilitated by SMPI. At the time of this writing, only the MP4 file reader instance was exploited as an MPEG-4 pump.

Server Architecture: The Server DMIF (Delivery) Layer

The Server DMIF layer is divided into two parts, as is the case with the client DMIF layer. The DNA layer maps DMIF requests to native network signaling; the other layer, DS, abstracts DMIF operation, manages DDSP function invocation and handles DMIF service management. The DMIF layer of the Remote Streaming Server differs from the client-side DMIF layer in that every client request forces the creation of an individual and independent thread. This thread traverses all the DMIF layers as well as the application layer, utilizing the appropriate servicing objects.
Figure 3-3 and Figure 3-6 depict this operation.

[Figure 3-6: Server Procedural Diagram]

The implementation of the DNI and DAI primitives at the Server adopts the (informative) syntax that the DMIF document suggests. Based on this syntax, each pair of DNI or DAI functions (the function and its callback) is implemented in four smaller functions: request/confirm and indication/response. Figure 3-6 shows in which layer each function is implemented.

The server DMIF layer contains two distinct network access instances for Control Plane messaging: a UDP instance and a TCP instance. It must be noted that the media transmission method used in the Data Plane is not dependent on the transport protocol adopted by the Control Plane. As mentioned before, the disadvantages of using TCP, namely its slow start and initial delay, may be outweighed by the reliability it provides. Since reliable signaling is thus already available through the TCP instance, applying an error-recovery capability to the UDP instance would be an unnecessary overhead (and is therefore not implemented). The flexibility of our architecture facilitated the adoption of both transport methods; the user has the ability to choose between them (by specifying it in the request URL).

Multi-Threaded Operation

The special multi-threaded approach of this design allows for low-overhead operation of the Server during normal operations, when clients don't have new requests. In other words, the server only has threads processing the media data for its clients and does not have a thread dedicated to each session awaiting client requests. In fact, the server runs only one listening thread waiting for requests from all clients.
A dedicated thread is dispatched upon arrival of a new request. This thread is responsible for serving the requesting client only for its new request. Once the client request has been addressed and a response has been sent, the dedicated thread is terminated and the server remains lightly loaded; the servicing objects, however, will still be available for future requests. For example, in a scenario where 100 streams are being sent to 50 clients, our server would need an average of slightly over 101 threads to serve the clients (one data thread per stream plus the shared listening thread), whereas in the case where each session requires a managing thread, that number would be 150. When TCP is used in the control plane, the argument above is somewhat invalid, as each client requires a separate listening thread. This problem can be solved by taking a different approach and dedicating only one secondary listening thread to all clients. This thread listens to a set of TCP ports (sockets) and serves only one of them at any time. This approach increases the efficiency as described in the previous paragraphs. However, in our implementation, we did not use this approach because of its technical difficulties. Figure 3-6 illustrates the multi-threaded approach employed for processing clients' requests. Each layer in the server architecture consists of a Manager object and a number of Service objects, as seen in Figure 3-3. The DNA layer objects, the Network Session objects, handle the DDSP signalling between the two peers. All the requests associated with a specific DMIF service are handled by its Service object in the DS layer. Each pair of Network Session and DMIF Service objects is linked to a Service Session object in the Application (Sync) layer. Thus, each request to the server, as it traverses the layers, is served by a Network Session object, a DMIF Service object, and a Service Session object. The thread that has been created to service the request is a member of the first servicing object, the Network Session.
As the thread traverses the upper layers, it invokes member functions of its associated service objects. This makes the implementation very scalable and enforces separation between clients, since their object handlers are disparate. It is helpful to recall that the servicing objects in fact correspond to the multimedia communication concepts that DMIF offers, described in section 3.2.

Rate Control

Rate control is one of the features that every streaming system must implement. This implementation uses a novel rate controller that also features the fast-start technique. Chapter 5 describes this technique in detail. From a client/server perspective, the only noticeable point is that the rate control is performed at the server. It must be noted that the rate control concept, as discussed in this thesis, is not a media-aware rate control mechanism, meaning that it can be applied to any kind of media as long as timing information is available. There exist some encoding-aware or media-aware rate controllers that are not of interest in this thesis [41]; their implementation involves compression layer matters that are out of the scope of this research.

Chapter 4. Control Plane vs. Data Plane

Chapter 3 presented the design and implementation of the streaming system from a client/server perspective. The control and data plane issues of the design were also discussed in that context. This chapter reviews the structure and operation of each of these planes in more detail and separately from the other. The protocols and functions that the control plane must support are studied in section 4.1, and Data Plane issues are examined in section 4.2.

4.1 Control Plane Issues

As has been mentioned before, DMIF is the key component of our system. In the design of DMIF, more attention was paid to control plane issues; therefore, data plane matters are somewhat out of the scope of DMIF.
For this reason, this section, which focuses on the control plane, is naturally a review of the utilization of DMIF in the control plane of a streaming system. According to the layered architecture of the system, two levels of control signaling are required: one for high level stream control functions such as play, pause, and stop; the other concerning network related signaling such as resource reservation and channel setup and release. Figure 4-1 gives more insight into the layered architecture of the system. This layered architecture suggests that the delivery layer must provide the set of functions required for high level signaling to the application layer, while it must implement the low level signaling inside itself. The former is realized in the form of DAI primitives that are implemented in the higher sub-layer of the delivery layer (DMIF Service layer), and the latter is realized by implementing the DDSP functions in the lower sub-layer (DMIF Network Access layer).

[Figure 4-1: A Layered Depiction of the Control Plane]

The higher signalling level logically includes some lower level functions. In other words, higher level messages have to pass through the lower layers of the system; therefore some lower level functions must exist to support them. The session setup function is an example of such a function. The following clarifies this matter (compare the two sets of functions):

1. High level signaling
   a. Session Setup and Release
   b. Stream Channel Addition and Deletion
   c. Stream Control Signalling (Play, Pause, etc.)

2. Low level signaling
   a. Session Setup and Release
   b. Transport/Multiplex channel addition, deletion and configuration
   c. Transport of high level signaling messages

DMIF specifies most of these functions; for those functions not provided by DMIF, we have to use other protocols. The next section examines this matter.

4.1.1 Protocol Selection: DMIF and/or RTSP

The first set of signaling functions mentioned above is implemented in the DAI primitives; the second set is implemented in the DNI primitives. These interfaces, however, do not specify the whole required signaling functionality. The shortcomings are specifically visible in the stream control operation. For example, DMIF does not define any particular function for play, pause or other playback control functions; instead, it carries these commands using the UserCommand primitive. The syntax for defining these functions remains an unresolved matter. The Real Time Streaming Protocol (RTSP) can play a role in this case and provide us with the required syntax. In our implementation, however, we used IM-1's stream control syntax, which in turn was taken from the User-to-User Interface of the DSM-CC standard [16]. For the second set of functions, DDSP (DMIF) almost satisfies the needs of our streaming system, and there is no need to employ other protocols.

Using RTSP instead of or in conjunction with DMIF

The Real Time Streaming Protocol is the standard streaming control protocol currently used in many streaming systems. It has been suggested that RTSP be used in place of DMIF for the remote interactive scenario, and there is on-going discussion in this regard at the time of this writing. We did not choose RTSP for our system because RTSP has not been designed for object-based media, and many of the benefits of the MPEG-4 standard may be lost due to the shortcomings of RTSP. Moreover, the goal of this research was not exploring RTSP and its shortcomings for MPEG-4 streaming, but providing a standard-compliant solution for MPEG-4 streaming.
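Since DMIF leaves the syntax of playback commands open and merely carries them as opaque payload in the UserCommand primitive, one way to picture this is a small packing scheme for such a payload. The command codes and byte layout below are invented for illustration; they do not reproduce the DSM-CC User-to-User syntax actually used in the implementation.

```cpp
#include <cstdint>
#include <vector>

// Illustrative command codes; the real DSM-CC U-U syntax differs.
enum class StreamCommand : uint8_t { Play = 1, Pause = 2, Stop = 3 };

// Pack a stream-control command into an opaque byte payload suitable
// for a UserCommand primitive: 1 byte of command code, then 4 bytes of
// NPT (normal play time) position in milliseconds, big-endian.
std::vector<uint8_t> packUserCommand(StreamCommand cmd, uint32_t nptMs) {
    std::vector<uint8_t> payload;
    payload.push_back(static_cast<uint8_t>(cmd));
    for (int shift = 24; shift >= 0; shift -= 8)
        payload.push_back(static_cast<uint8_t>((nptMs >> shift) & 0xFF));
    return payload;
}

// Decode side, as the peer's DMIF layer might do before handing the
// command to the service session. Returns false on malformed payloads.
bool unpackUserCommand(const std::vector<uint8_t>& payload,
                       StreamCommand& cmd, uint32_t& nptMs) {
    if (payload.size() != 5) return false;
    cmd = static_cast<StreamCommand>(payload[0]);
    nptMs = 0;
    for (int i = 1; i <= 4; ++i) nptMs = (nptMs << 8) | payload[i];
    return true;
}
```

Because DMIF treats the payload as opaque, any such scheme works as long as both peers agree on it; this is exactly why the implementation could borrow the stream control syntax from IM-1 without changes to DMIF itself.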
Despite the similarity between RTSP and DMIF in many cases, there is a fundamental difference between them that prevents us from comparing them directly. Any attempt to compare RTSP and DMIF must consider the fact that RTSP is a protocol for streaming control (remote retrieval) while DMIF is a framework for integrating multimedia delivery over different delivery technologies. Not considering this fact simply results in confusion and invalid conclusions. The comparison that has been done in this thesis is only between RTSP and the parts of DMIF that operate similarly to RTSP. Some Internet drafts have suggested the use of RTSP alongside DMIF [33]. It seems, however, that except for streaming control (playback control), RTSP and DMIF are very much redundant, and using them alongside each other will produce some unnecessary overhead. Hence it is not recommended to use the full implementations of DMIF and RTSP in one system except for inter-operability purposes. Table 1 compares RTSP and DMIF, and helps in understanding the possible points of cooperation and the level of redundancy. One must note that despite similarity in some cases, DMIF and RTSP are inherently different: RTSP is a protocol while DMIF is an integration framework.

Table 1 DMIF functions and RTSP Methods

DMIF primitive                          | RTSP Method        | Function
DAI Service Attach, DNI Session Setup   | SETUP              | Session initiation
DAI Service Detach, DNI Session Release | TEARDOWN           | Session release
DAI & DNI Channel Add                   | SETUP              | Addition of media channels
DAI & DNI Channel Delete                | TEARDOWN           | Deletion of media channels
DAI & DNI UserCommand                   | PLAY/PAUSE/RECORD  | Stream control commands
DNI Transmux Setup                      | Not specified      | Setting up transport/multiplex channels in the network
DNI Transmux Release                    | Not specified      | Releasing network channels
DNI Transmux Configure                  | ANNOUNCE?          | Changing trans/mux channel configuration
(Done in MPEG-4 Systems, not in DMIF)   | OPTIONS            | Getting available methods
(Done in MPEG-4 Systems, not in DMIF)   | REDIRECT           | Redirecting the request to another server
(Done in MPEG-4 Systems, not in DMIF)   | SET-PARAMETER      | Setting media device parameters

Based on the argument above and on our experience of implementing an MPEG-4 streaming system, it is clear that for our purpose we can use either of RTSP and DMIF separately or in conjunction. One can learn more about RTSP from [33]. It is noteworthy that although the implementation did not use RTSP, it was perfectly functional, and there was no practical justification for using any protocol (i.e., RTSP) in addition to DMIF.

4.1.2 Design & Implementation Issues

The structural design of the system was described in Chapter 3. This section gives more insight into the details of the control plane architecture and explains how the DMIF communication model is realized. Organizing and implementing DMIF components such as the CAT (Channel Association Tag) and the TAT (Transmux Association Tag) are explained here. The implementation details are not described completely and at the lowest level, as they are available through the code itself. This section does not get involved in code writing matters; for more information in this regard refer to [45] or [15].

DMIF concepts realization

The DMIF control plane specifies some communication tags, identifiers and variables to organize the data plane concepts. Some of the communication concepts that are used by DMIF and are of most interest to us are listed below:

Network Session: an association between two DMIF peers providing the capability to group together the resources needed for an instance of a service. The Network Session is identified by a network-wide unique ID. A Network Session may group one or more Service Sessions.
Service: an entity identified by a Service Name (opaque to DMIF) that responds to DAI primitives or calls them from the layers above DMIF.

Service Session: a local association between the local DMIF Instance and a particular service.

Transmux Channel: a native network transport channel; it includes the transport protocol stack that is directly supported by the signalling means of the network. It provides the basic transport stack element that DMIF may complement with additional multiplexing or protection tools.

Data Channel: a session layer transport channel that is only known to the end-systems. DMIF has knowledge of these channels and multiplexes them into Transmux channels.

In the implementation, some of these concepts are realized in the form of objects, while others only exist implicitly. The mapping of these identifiers to each other and their relationships and associations form the organization of the implementation. In the DMIF specification these concepts are identified by their IDs or Association Tags, i.e., Network Session ID, Service Session ID, DMIF Service ID (Service ID), Channel Association Tag (CAT), and Transmux Association Tag (TAT). Figure 4-2 and Figure 4-3 explain how these identifiers are related to each other.

[Figure 4-2: Organization of DMIF Multimedia Communication Concepts (1)]

For the application layer, only the service session is meaningful; hence the Service Session identifier is generated in this layer. It is, however, used in the DMIF layer. The DMIF layer maps this service session to a unique DMIF Service. In other words, there exists a one-to-one relationship between Service Sessions and DMIF Services. The DMIF specification allows a Service Session to use multiple DMIF services, but from the DMIF perspective each DMIF service only serves one service session.
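The relationships just described (one service session per DMIF service, one network session shared by several services, and several CATs mapped onto fewer TATs) can be sketched as plain data structures. The type and field names below are illustrative only; they are not the classes used in the implementation.

```cpp
#include <cstdint>
#include <map>
#include <vector>

using ChannelAssociationTag  = uint32_t;  // CAT
using TransmuxAssociationTag = uint32_t;  // TAT

// One DMIF Service serves exactly one Service Session, uses one
// Network Session, and maps each of its data channels (CAT) onto a
// Transmux channel (TAT); several CATs may share one TAT.
struct DmifService {
    uint32_t serviceSessionId;  // equal to the DMIF Service ID here
    uint32_t networkSessionId;  // the Network Session this service uses
    std::map<ChannelAssociationTag, TransmuxAssociationTag> catToTat;
};

// A Network Session groups one or more DMIF Services.
struct NetworkSession {
    uint32_t networkSessionId;
    std::vector<uint32_t> serviceIds;
};
```

Because the CAT-to-TAT map lives inside each service, CAT values can be reused across services without ambiguity, which mirrors the scoping rule described below.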
Since IM-1 does not support the use of multiple DMIF services in one Service Session, the DMIF Service ID can simply be set equal to the Service Session ID.

[Figure 4-3: Organization of DMIF Multimedia Communication Concepts (2)]

Network Sessions that are created between DMIF peers may be used by multiple DMIF Services; therefore, each DMIF Service object must keep track of its associated network session (using its Network Session Id). Data Channels are first created in the application layer and then extended to the other layers; thus their association tag (CAT) is set in the application layer. Each Service Session (and therefore DMIF Service) contains multiple data channels. In our implementation these CATs are only meaningful inside the scope of Service Sessions, so the reuse of their values between different services will not cause any confusion. Transmux channels are created in the DNA layer of DMIF and are mapped to CATs inside the DMIF layer. Since several data channels may be multiplexed into one Transmux channel, the number of TATs assigned to each DMIF Service will be no greater than the number of CATs. The figures above clarify this idea. There are many possible situations where several Service Sessions, DMIF Services and Network Sessions co-exist in one system; the reader can simply generalize the scenario depicted in the figures above to other cases.

How to keep track of objects and their mapping information

In the implemented software, each layer contains a few permanent objects that keep track of the sessions, services, Data channels, and Transmux channels. There are also private variables in each object that help in identifying the corresponding objects in other layers. These private variables are in most cases a pointer to the corresponding object in another layer.
The important database objects that exist in the implementation are summarized in Table 2.

Table 2 Database objects

Service Record List (SRCList):
  Status: independent and permanent object in the DS layer
  Description: database of all active services in the DMIF Instance
  Important info. in each record: Application Layer Service object pointer; Network Session Id; CAT List database; TAT List database

Network Session List (NetSessionList):
  Status: independent and permanent object in the DNA layer
  Description: database of all active network sessions in the DMIF Instance
  Important info. in each record: socket, IP address and port information for both the Data and Control Planes; TAT List

TAT List database:
  Status: a record of the SRCList database
  Description: database of all Transmux channels of each DMIF Service
  Important info. in each record: list of TATs

CAT List database:
  Status: a record of the SRCList database
  Description: database of all data channels in each DMIF Service
  Important info. in each record: list of CATs and their associated TAT

Most of these databases are realized in the form of linked lists. Further information about their implementation can be found in the documentation of the code [45].

Control Plane message transport

Control plane messages are transported using the native network services; TCP and UDP are such services. Our implementation supports both of these services and gives the user the ability to choose between them. At the client side, the user specifies the transport protocol in the requested DMIF URL, and the remote retrieval instance exploits either TCP or UDP accordingly. The URL starts with dtcp:// (x-dtcp for experimental implementations) when TCP is used, and with dudp:// (or x-dudp) when UDP is used. The format of the DMIF URL and an example are given below:

dtcp://(username):(password)@(IP address):(port)/(filename)
dtcp://a:b@.../filename.mp4

At the server, there are two listening threads that are always listening to two specific ports, one for UDP and one for TCP.
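A minimal parser for the URL scheme just described might look as follows: only the scheme prefix needs to be inspected to choose the control-plane transport. The helper name and enum are illustrative, not taken from the implementation.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

enum class ControlTransport { TCP, UDP, Unknown };

// Choose the control-plane transport from a DMIF URL such as
// "dtcp://user:pass@host:port/file.mp4". The experimental
// "x-dtcp" / "x-dudp" prefixes are accepted as well; matching is
// case-insensitive.
ControlTransport transportFromUrl(std::string url) {
    std::transform(url.begin(), url.end(), url.begin(),
                   [](unsigned char c) { return std::tolower(c); });
    if (url.rfind("dtcp://", 0) == 0 || url.rfind("x-dtcp://", 0) == 0)
        return ControlTransport::TCP;
    if (url.rfind("dudp://", 0) == 0 || url.rfind("x-dudp://", 0) == 0)
        return ControlTransport::UDP;
    return ControlTransport::Unknown;
}
```

The DS layer would call something like this once, when the request for a new network session arrives, and construct the matching (UDP or TCP) network session object.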
These ports should be well-known to clients; in our implementation we set them to 5000 for both protocols. The separation between the TCP and UDP services is done in the DNA layer; the DS layer is not aware of the difference between them and calls the DN primitives from the DNA layer without considering the transport protocol. In fact, the separation between these services (signaling over UDP or TCP) is done when creating the servicing object in the DNA layer for the requested network session. The network session object is created according to the requested URL and supports only one of the two transport protocols (UDP or TCP); therefore, subsequent requests are automatically answered by the appropriate object. The DNI interface for both the UDP and TCP services is the same and conforms to the DNI specification, but their implementations are slightly different; most parts of the implementation, however, are very similar. When a request for the creation of a new network session arrives, the DS layer inspects the URL and constructs a UDP or TCP network session object accordingly. Since these objects are quite similar, neither the DNA nor the DS layer differentiates between them in future interactions, and all the differences are implemented internally in the network session objects. The separation of the UDP and TCP instances in the DNA layer saves us the implementation of two separate DMIF instances (one for UDP and one for TCP).

Loss of control plane messages

If control plane messages are lost for any reason and not delivered to either the client or the server, the whole operation of an end-to-end session may halt. To overcome this difficulty, we have to utilize a reliable link or an error recovery scheme. TCP inherently provides a reliable transport service and naturally solves the abovementioned problem. On the other hand, TCP has its own disadvantages, such as the initial connection setup delay.
The initial delay problem, however, should not be exaggerated, as it only affects the very first message that is exchanged between the client and server; subsequent messages are transported through the established connection. Moreover, the reliability that TCP provides outweighs the undesirability of the initial delay. Using UDP as the transport protocol has the advantage of removing the initial delay, but the disadvantage of unreliability. This makes the use of UDP as a transport means for control plane messaging somewhat risky; in practice, UDP should only be used in very reliable environments. An error recovery scheme may be introduced to the system to enable it to work properly in unreliable environments.

Multi-threaded approach

As explained in the previous chapter, the implementation benefits from a multi-threaded approach. This section explains some of the implementation issues that concern this approach.

Listening Threads

The server listening threads introduced earlier do not serve the incoming requests completely; in fact, they are only responsible for redirecting requests to the appropriate service objects and invoking a secondary thread to finish the service job. This way, the listening thread quickly frees itself from serving the last request and resumes its listening job. Figure 4-4 and Figure 4-5 illustrate the operation of these listening threads. Figure 4-4 gives a UML-like depiction (UML stands for Unified Modeling Language) of the DNA layer operation when UDP is used as the transport protocol of the control plane. As can be seen in the figure, at the client side, calls to DNI functions are in the form of blocking calls: the client thread that makes a DNI request is blocked until it receives a response from the server. At the server, the listening thread receives the request and dispatches a new thread to serve the newly arrived request.
[Figure 4-4: UML Depiction of DNA Operation (UDP)]

When TCP is used in the control plane, the operation at the server side is a bit more complicated. Figure 4-5 shows the UML depiction of the DNA operation for this case. The main listening thread for TCP is only responsible for receiving the very first request (Session Setup) and establishing the connection. When the connection setup is complete, a new listening thread is dispatched to listen for future requests. The operation of this new listening thread is quite similar to what the UDP listening thread does: the new thread is responsible for dispatching secondary threads to serve the DNI requests. All the DNI requests that belong to the current client are received by this new listening thread (other requests are received by other listening threads), and the main listening thread is free to wait for new clients to request session setup.

[Figure 4-5: UML Depiction of DNA Operation (TCP)]

From the figures above it is clear that when UDP is used there is only one listening thread running at any time for any number of clients, but when TCP is used there are N+1 listening threads running simultaneously when N clients are being served. The listening threads are normally blocked on receive functions defined by the socket library and do not put much overhead on the system operation. It is also possible to use only one connection listening thread that listens to a set of TCP ports/sockets.
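The single-listening-thread idea just mentioned (one thread watching a whole set of sockets) is essentially what select() provides on POSIX systems. The sketch below was not part of the implementation; it simply illustrates how one thread could wait on many control connections at once. The pipe helper exists only so the technique can be exercised without real network sockets.

```cpp
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>
#include <vector>

// Wait on a set of descriptors with select() and return the first one
// that becomes readable, or -1 on timeout/error. One thread calling
// this in a loop can serve any number of client control connections.
int waitForReadable(const std::vector<int>& sockets, int timeoutSec) {
    fd_set readSet;
    FD_ZERO(&readSet);
    int maxFd = -1;
    for (int fd : sockets) {
        FD_SET(fd, &readSet);
        if (fd > maxFd) maxFd = fd;
    }
    timeval tv{timeoutSec, 0};
    if (select(maxFd + 1, &readSet, nullptr, nullptr, &tv) <= 0)
        return -1;
    for (int fd : sockets)
        if (FD_ISSET(fd, &readSet)) return fd;
    return -1;
}

// Demo helper: a pipe stands in for a connected TCP socket so the
// technique can be exercised locally.
bool makeDemoPipe(int& readFd, int& writeFd) {
    int fds[2];
    if (pipe(fds) != 0) return false;
    readFd = fds[0];
    writeFd = fds[1];
    return true;
}
```

With this pattern, the thread count for the TCP case would indeed stay constant regardless of the number of clients, at the cost of the bookkeeping that the thesis cites as the reason the approach was not implemented.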
This way, the number of threads for the TCP case drops to 2 (this approach is not implemented; see the Multi-Threaded Operation discussion in section 3.3.2).

Thread-Safe Dynamic Buffer Allocation

Communication between the listening and secondary threads is performed using a technique that greatly simplifies the implementation. This technique, which was specifically designed for this implementation, provides multi-threaded programs with a safe dynamic buffer allocation method. It is primarily designed as an alternative to static circular (cyclic) buffers: it is far less complicated than circular buffers and requires no global parameters for buffer management. It uses standard C++ memory management and lets the operating system undertake the buffer management. Moreover, there is no danger of buffer overflow. This technique is useful for multi-threaded programs, such as communication servers that may receive several requests simultaneously. The server (main thread) passes the received message buffer to an appropriate function (a newly dispatched thread) and then forgets about it. Although the main thread forgets the buffer location in memory, the buffer is not deleted and remains in memory as allocated storage. The newly spawned (forked) thread is responsible for deleting the message (de-allocating the memory). Consequently, after spawning the new thread and passing the message pointer to it, the main thread returns to its normal state and waits for more messages. The child thread is responsible for handling the message and finally deleting it.
The following pseudo-code clarifies how this technique is implemented (note that, because the buffer is allocated with new[], the forked thread must release it with delete[]):

    // The main thread:
    char* buf;
    char* mesg;
    loop {
        // Forget the previous buffer and create a new one
        buf = new char[100];
        // Point the mesg pointer to this newly created buffer
        mesg = buf;
        WaitForNewMessage(buf);            // fill the buffer with a new message
        _beginthread(function1, 0, mesg);
        // Dispatch the appropriate function for handling the received
        // message. Note that mesg (i.e., buf) is passed to the function
        // while it is not deleted in the main thread.
    } // end of loop

    void function1(void* arg)
    {
        char* mesg = (char*)arg;
        // do something ...
        delete[] mesg;
    }

Note that in the main thread, when a new buffer is created, it cannot overwrite an old buffer that has not yet been deleted by its forked thread (a buffer is deleted only once its message is no longer needed). In this way, buffers are dynamically allocated and deleted. There is no overwriting hazard, because the buffer space is theoretically infinite and practically equal to all available memory. Furthermore, the main and forked threads do not need to manipulate global parameters for buffer management.

4.2 Data Plane Issues

The Data Plane is the transportation medium for multimedia data; its structure can be seen in Figure 3-2. Chapter 3 explained the structural design of the data plane; this chapter describes its implementation from a different perspective and elaborates on some important aspects such as data transport protocols. Details of code writing are not included in this chapter, as they can be found in the code itself (or at [45]). Chapter 6, which discusses QoS issues, further explains some of the Data Plane functions from a QoS standpoint.
Since rate control requires extensive discussion and study, the issues pertaining to streaming rate control are not discussed in this chapter but in a separate chapter dedicated to this topic (Chapter 5).

4.2.1 Data Plane Structure

"Data channels", which are used for the transport of elementary streams, are the main components of the data plane from the application layer perspective. Data Channels are in fact the virtual communication paths for individual elementary streams between the server and the client. Data channels are sometimes multiplexed before being mapped to network channels. The communication channels that are created in the underlying network (the real communication paths for data channels) are called Transmux channels; each Transmux channel may carry several data channels. During data channel setup, the first step is the creation of a Transmux channel to transport application data from the server to the client. The Transmux channel has a one-to-one association with a single network communication channel (e.g., an RTP session or a UDP port). The overall procedure of creating Data and Transmux channels is described in the DMIF document [5]; Figure 2-7 depicts this process (see the part after the completion of the service attach process). When a client's request to initiate the media sequence (i.e., the "Play" command) arrives at the server, a dedicated thread is dispatched per data channel to send the packetized elementary streams to the DMIF Network Access layer. The DNA layer multiplexes and transmits the packetized elementary streams across the Transmux channel. The Data Plane in the implementation uses the simple mode of the FlexMux recommendations to multiplex packetized elementary streams for transmission. The FlexMux tool is explained in the MPEG-4 Systems document. However, it can
The Data Plane at the client needs to de-multiplex the packetized elementary streams and route the packets to their associated data channels. The Sync Layer reassembles the media packets into their primary form of access units. Packetization and reassembly of SL packets are out of the scope of DMIF and are performed in the Synchronization Layer. The functionality of the Data Plane at the server differs somewhat from that of the client. The ES providers, shown in Figure 3-3, supply the service provider (application) layer with media access units. The Sync Layer, which is part of the service provider layer, receives and packetizes the access units. Using the information that is provided to the Sync Layer, the SL headers are generated. SL packets are then multiplexed and passed to the DMIF layer to be transported over Transmux channels. The rate controller is also implemented in the server Sync Layer, where the timing information is available. Chapter 5 describes the design and implementation of the streaming rate controller.

4.2.2 Data Transport

Data transport is the very first function expected from the Data Plane, and selecting the appropriate transport protocol stack is very important. The object-based nature of MPEG-4 requires us to exploit more than one transport protocol within a single streaming system. For simple applications in which timely delivery is more important than reliable delivery, UDP is used rather than TCP. The disadvantages of TCP for the delivery of multimedia and real-time streams are extensively described in the literature and can be found in most multimedia networking books and papers [18]. MPEG-4 presentations usually contain a few streams (Scene Description and Object Descriptor streams) that require reliable delivery, for which TCP is recommended as the transport protocol. For more complex applications that have additional requirements such as inter-media synchronization or quality feedback, the use of RTP is recommended.
The Data Plane in the main version of our implementation (the reference implementation) supports only UDP as its transport protocol. Some experiments using this system are presented in Section 5.3. To experiment with the use of RTP as the transport protocol, a version of the software that uses Lucent's RTP library has also been developed and tested. This experiment is explained in the next section.

4.2.3 RTP for the Transport of MPEG-4 over IP

This section studies the pros and cons of using RTP rather than UDP for the transport of MPEG-4 over IP. It also describes the experimental implementation and its new features. Information about RTP operation and its methods can be found in [27].

Alternatives for carrying MPEG-4 over IP

There are a few alternatives for carrying MPEG-4 content over IP using UDP or RTP; the following examines some of the most reasonable choices.

MPEG-4 over UDP

Considering the fact that the MPEG-4 SL defines several transport-related functions, such as timing and sequence numbering, choosing UDP as the transport protocol seems to be the most straightforward alternative. Despite the advantage of simplicity, there are some problems with this approach. The problems originate from the fact that no other multimedia data stream (including those carried with RTP) can be synchronized with MPEG-4 data carried directly over UDP. Furthermore, the dynamic scene and session control concepts cannot be extended to non-MPEG-4 data. Even if coordination with non-MPEG-4 data is overlooked, carrying MPEG-4 data over UDP has the following additional shortcomings:

I. Mechanisms need to be defined to protect sensitive parts of MPEG-4 data. Some of these (like FEC) are already defined for RTP.

II. There is no defined technique for synchronizing MPEG-4 streams from different servers in the variable-delay environment of the Internet.

III.
MPEG-4 streams originating from two servers may collide (their sources may become irresolvable at the destination) in a multicast session.

IV. An MPEG-4 back channel needs to be defined for quality feedback similar to that provided by RTCP.

V. RTP mixers and translators cannot be used.

RTP header followed by full MPEG-4 headers

This alternative may be implemented using the send time or the composition time coming from the reference clock as the RTP timestamp. This way no new feedback protocol needs to be defined for MPEG-4's back channel (a channel created to send information back to the sender), although RTCP may not be sufficient for MPEG-4's feedback requirements. Additionally, due to the duplication of header information, such as sequence numbers and time stamps, this alternative causes an unnecessary increase in overhead.

MPEG-4 ESs over RTP with individual payload types

This is the most suitable alternative for coordination with the existing Internet multimedia transport techniques, and it does not use MPEG-4 Systems at all. Its complete implementation requires the definition of potentially many payload types, as already proposed for audio and video payloads [29], and might lead to constructing new session and scene description mechanisms. Considering the size of the work involved, which essentially reconstructs MPEG-4 Systems, this may only be a long-term alternative if no other solution can be found.

RTP header followed by a reduced SL header

The inefficiency of the first RTP solution suggested above (RTP header followed by full MPEG-4 headers) could be fixed by using a reduced SL header, following the RTP header, that does not carry duplicate information. This approach attracted considerable attention from the academic community; at the time of this writing, some Internet drafts suggesting this solution were on the path of promotion to RFC level.
RTP-related design considerations

For this experiment, only the common parts of the suggested Internet drafts were used (from drafts [28][29][30][31]). Although it seemed that most of the implementation effort would be in the Data Plane, it turned out that adopting RTP in the Data Plane requires many modifications to the Control Plane implementation. Despite this fact, the experiment and implementation are discussed in the current chapter (Data Plane), because the goal of these modifications is to support RTP in the Data Plane.

RTP payload format

The RTP payload format used in this experiment contains the full MPEG-4 SL packet in its payload field. At the time of this writing there was no standard or RFC describing the RTP payload format for MPEG-4; instead there were a number of Internet drafts with different ideas [28][29][30][31]. Many of the drafts published by the time of this writing had expired or been abandoned. For this reason we have used only the parts of those drafts that were widely agreed upon; this way we can easily modify the implementation if one of those drafts is promoted to RFC status. As can be seen in Figure 4-6, there may exist redundant fields in the RTP and SL headers. There is controversy over how (and whether) to remove this redundancy. Since the RTP fixed header cannot be modified, the SL header must be defined in a way that lets us remove the unnecessary information. The RTP payload therefore will be the modified SL packet, and the payload format will be defined by the characteristics of the new SL header.

Figure 4-6 RTP Payload (the RTP packet consists of the RTP fixed header followed by the RTP payload, i.e., the SL header and elementary stream data; the 16-bit sequence number and 32-bit time stamp of the RTP fixed header are potentially duplicated by the m-bit sequence number and n-bit time stamp of the SL header)

The timing information in the SL header is associated with the SL payload, while the timing information in the RTP header is associated with the RTP payload.
If we encapsulate one SL packet in one RTP packet, both time stamps refer to the same timing information. On the other hand, if we encapsulate multiple SL packets in one RTP packet (a FlexMux packet in one RTP packet), the RTP time stamp cannot refer to any of the multiplexed SL packets. In this case we cannot remove the timing information from the SL headers, and the RTP time stamp becomes pure overhead.

For the implementation, it was decided to keep the SL header timing information so that we could use both modes: one SL packet in one RTP packet, and multiple SL packets in one RTP packet. These modes correspond to the following scenarios: one RTP session per elementary stream (one SL packet per RTP packet), and one RTP session per MPEG-4 presentation (multiple SL packets per RTP packet). It must be noted that the overhead of having timing information in both RTP and SL packets is not significant. The author of this thesis believes that this problem is exaggerated in many Internet drafts, and that there are many other problems that need to be addressed before trying to remove a few redundant bytes.

Implementation

For this implementation we have used the RTP library provided by Lucent Technologies. This library is implemented in C (the source code is available) and is not thread safe. The library was primarily developed for the Unix environment, but it also works on Windows NT platforms (though not on Windows 98/95). The details of integrating this library into the DMIF implementation are given in the following sections.

How to Use the RTP Library

The RTP library provides a high-level interface to the applications that make use of it. The following paragraphs describe how this library must be used.

Initialization

A number of calls must be made to initialize the library. The first of these is RTPCreate(), which establishes a context. A context is an identifier used by the library to determine which RTP session a function call is to be associated with.
This context corresponds to the Transmux concept in the DMIF implementation. Once RTPCreate() has been called to initialize the session, the addresses for the session must be set. There are two sets of addresses used by the library. The first is the send set: a list of unicast and/or multicast addresses and port numbers. The library sends data packets to the addresses listed in the send set. The library also has a receive set. This set contains only one address/port pair, which can be multicast or unicast; it is the address on which the library expects to receive packets.

Sending and Receiving Packets

Sending packets is fairly straightforward. The RTPSend() function is used to tell the library to send an RTP packet. It requires the user to pass a pointer to a buffer, a length, a value for the marker field in the RTP header, an increment for the timestamp, and the context. The library will take the buffer, add the RTP header, perform any required operations, and send the packet. The library will automatically send RTCP packets. The initial timestamp and sequence number are chosen randomly.

Receiving packets is more complex. In order to know whether a packet is available for reading, a process can block, poll, or use any other kind of mechanism. Since the library does not dictate this policy, it is up to the user to determine when data is available for reading. To do this, the library allows the user to access the receive sockets. There are two: one for RTP and one for RTCP. The functions RTPSessionGetRTPSocket and RTPSessionGetRTCPSocket are used to do this. They take as input the context and a pointer to a socket; when they return, the socket has been filled in. The user may then check for the presence of an RTP packet on these sockets in any fashion; our implementation, for example, uses the "select" Winsock function. When a packet is present on either socket, the application should call the function RTPReceive().
This function takes the context, the socket on which data is present, a pointer to a buffer, and a pointer to a (maximum) length value. The library will read and process the RTP or RTCP packet. For RTCP, it will perform all statistics collection and parsing.

Closing the Connection and Session

When the application decides to terminate and/or leave the RTP session, it should do two things. First, the function RTPCloseConnection should be called. This will close all active receive sockets, send a BYE packet, and then close all send sockets. It does not, however, delete internal storage or statistics that have been collected. RTPCloseConnection accepts a reason string, which is sent in the BYE packet.

How DMIF Uses the Library

The DMIF standard does not specify how a specific delivery protocol is used in the Data Plane. Instead it defines a set of primitives in the Control Plane to facilitate exploiting various delivery technologies. Mapping the implementation of those primitives onto a specific transport protocol is left to the developers; for example, creating a Transmux channel must be mapped to RTP session initiation. The DMIF standard defines some identifiers for using UDP and TCP in its Data Plane, but nothing was defined for RTP at the time of this writing; therefore we have used some user-defined identifiers in order to use RTP in the Data Plane. The DMIF standard itself is evolving, and the required identifiers for RTP will probably be included in the standard. The RTP library has several shortcomings that forced us to change the server architecture. The library is written in C and is not object-oriented, while our implementation is profoundly object-oriented. This makes the use of the library in object-oriented programs somewhat difficult and not scalable.
The library is also not thread safe, while our implementation is multithreaded; therefore we had to use synchronization primitives such as semaphores to guard against thread synchronization problems. The library further requires the user to implement a "schedule" function to handle its timely events, such as RTCP packet transmission. In our implementation we dispatch a new thread for each scheduling request from the library. The reason for dispatching a new thread is that the scheduling usually requires the called function (RTPSchedule) to suspend itself for a while; this way only the newly spawned thread is blocked, not the main thread.

The Control Plane primitives affected by the use of RTP in the Data Plane

Figure 2-7 illustrates how a Data Plane channel is created in an MPEG-4 session. In order to use RTP, we modified the implementation of the primitives involved in the creation of those channels. These primitives were in both the DNI and DAI layers. The primitives that we had to modify (re-implement) in the Control Plane were DA_ChannelAdd, DA_ChannelDelete, DA_ChannelAdded, DN_TransmuxSetup and DN_TransmuxRelease; in the Data Plane, DA_DataReceive and DA_DataSend were modified.

Elementary streams are conveyed through data channels. To add a channel, the DMIF function DA_ChannelAdd is called. DMIF then has to decide which transport protocol must be used for this elementary stream. If a Transmux channel does not yet exist to carry the data channel information, DMIF calls the DNI primitive DN_TransmuxSetup to create a new Transmux channel. This new Transmux channel is an RTP session in our MPEG-4 over RTP implementation. The implementation of DN_TransmuxSetup includes the initiation of an RTP session, the creation of RTP listening objects, and the dispatching of listening threads for them. DN_TransmuxRelease implements the functionality required for destroying the aforementioned RTP session and its associated objects and threads.
In the Data Plane, the elementary stream packetizer/multiplexer encapsulates FlexMux packets in RTP packets and passes them to DMIF. Since the timing information in the RTP packet is not useful in this case, we do not manipulate this value in the RTP header.

RTCP feedbacks

Our previous implementation used UDP in the Data Plane, and there was no feedback of any kind in the Data Plane. The Data Plane was therefore unidirectional, and the server did not need to listen to the clients in the Data Plane (it does have to in the Control Plane). With RTP, RTCP reports are sent in both directions; therefore we had to add listening facilities to the server Data Plane, which made the server implementation even more complex. For each RTP session, the server creates a listening object and dispatches a thread to receive RTCP packets. These RTCP packets are then analyzed (using library functions) and their information can be used in the streaming control process. The RTCP RR packets received at the server contain the following information:

• SSRC identifier: Shows from which client the report is received.

• Fraction lost: The fraction of RTP data packets from source SSRC_n lost since the previous SR or RR packet was sent. This value helps in detecting packet loss.

• Cumulative packets lost: The total number of RTP data packets from source SSRC_n that have been lost since the beginning of reception.

• Highest sequence number received: The low 16 bits contain the highest sequence number received in an RTP data packet from source SSRC_n, and the most significant 16 bits extend that sequence number with the corresponding count of sequence number cycles. This value can be used for flow control and error recovery.

• Inter-arrival jitter: An estimate of the statistical variance of the RTP data packet inter-arrival time, measured in timestamp units; it can be used for rate control purposes.
• Last SR timestamp: Identifies the last server report that the client received.

• Delay since last SR: The delay, expressed in units of 1/65536 seconds, between receiving the last SR packet from the server and sending this reception report block.

Using this information, the server can determine whether a client is being flooded. By observing the highest sequence number, the server knows how many packets are still in transit. Using the packet loss information in the report, the server knows about the quality of the data delivered to the client. The inter-arrival jitter field provides a second, short-term measure of network congestion: packet loss tracks persistent congestion, while the jitter measure tracks transient congestion and may indicate congestion before it leads to packet loss.

Computing the round-trip time: The round-trip time (RTT) is computable using the timing information available through RTCP reports. RFC 1889 explains how to calculate it [27]. We can use the RTT to detect network problems such as congestion.

How to detect congestion: RTCP reports provide an estimate of the round-trip time and of the inter-arrival jitter of RTP packets at the client. These two parameters can be monitored continuously; when they exceed a threshold level, we can assume that congestion has occurred and the transmission rate needs to be reduced. Another parameter that can be used to detect congestion is the packet loss reported in RTCP RR messages; in fact, packet loss may be a better measure for determining network problems. The implementation uses the values of jitter and RTT to detect probable congestion or network problems. How the server should react to these problems is an issue outside the scope of this chapter, but as a simple solution and experiment we have chosen to use a packet-dropping rate control scheme at the server. This packet dropper reduces the rate of transmission when there is trouble in the network.
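The RFC 1889 round-trip time computation subtracts the "last SR timestamp" (LSR) and "delay since last SR" (DLSR) fields of the reception report from the report's arrival time, all expressed in units of 1/65536 seconds (the middle 32 bits of an NTP timestamp). A minimal sketch of this arithmetic; the timestamp values in the comment are made up for illustration:

```cpp
#include <cassert>
#include <cstdint>

// RFC 1889: RTT = arrival time of the RR - LSR - DLSR, all in 1/65536 s
// units. Unsigned wrap-around arithmetic handles timestamp wrap correctly.
uint32_t rttUnits(uint32_t arrival, uint32_t lsr, uint32_t dlsr) {
    return arrival - lsr - dlsr;
}

double rttSeconds(uint32_t arrival, uint32_t lsr, uint32_t dlsr) {
    return rttUnits(arrival, lsr, dlsr) / 65536.0;
}

// Example: arrival = 0x00050000, lsr = 0x00030000, dlsr = 0x00010000
// gives 0x00010000 units, i.e., exactly one second of round-trip time.
```

The server runs this calculation on every RR it receives and uses the resulting RTT, together with the reported jitter, as its congestion indicators.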
The packet dropper acts as a rate controller; however, it can only adjust (reduce) a transmission rate that is already controlled by the streaming rate controller.

Some experiments

A few experiments have been carried out with the version of the software that uses RTP in its Data Plane. The observations and results of these experiments are given in this section. In the following figures, the transmission rate is the parameter of interest. First, with source-based normal rate control and no packet drop, the video stream was transmitted at the full natural rate of the video; Figure 4-7 shows how the transmission rate changes during the presentation. The video object used in the MPEG-4 presentation was an encoded H.263 stream. As mentioned earlier, we used the value of jitter to detect possible congestion problems. We then adopted a rate control scheme that reduced the transmission rate whenever a congestion problem was detected. The following figures are all taken from the server monitoring output and show the adaptive rate controller's performance.

Figure 4-7 Bitrate of a Typical Presentation, Used for RTP Experiments (server monitoring output showing the reported RTCP fields, including inter-arrival jitter and round-trip time, alongside the transmission rate)

Since we conducted all our experiments in a LAN environment, it was not possible to experience congestion in the network; therefore we had to simulate it. To simulate congestion, we ran the server and client on machines with a heavy processing load imposed by other programs, so that the programs were sometimes unable to meet their deadlines and the value of jitter increased during those periods. The packet dropper of this implementation, as can be seen in Figure 4-8, reacts to the increase in jitter by reducing the rate of transmission.
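The threshold-based reaction described above can be sketched as follows. The threshold values and the rate-halving policy are assumptions chosen for demonstration, not the tuned values of the implementation:

```cpp
#include <cassert>

// Illustrative congestion check: if the reported inter-arrival jitter or the
// computed RTT exceeds its threshold, the packet dropper scales down the
// transmission rate already set by the streaming rate controller.
struct PacketDropper {
    double jitterThreshold;  // timestamp units (assumed tuning value)
    double rttThreshold;     // seconds (assumed tuning value)

    bool congested(double jitter, double rtt) const {
        return jitter > jitterThreshold || rtt > rttThreshold;
    }

    // Returns the rate actually used for the next interval. The dropper can
    // only reduce, never raise, the controller's rate; halving is one simple
    // (assumed) reduction policy.
    double adjustRate(double controlledRate, double jitter, double rtt) const {
        return congested(jitter, rtt) ? controlledRate * 0.5 : controlledRate;
    }
};
```

With thresholds of, say, 200 timestamp units of jitter and 0.5 s of RTT, the dropper leaves the rate untouched during normal operation and halves it whenever an RR pushes either measure past its threshold.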
Figure 4-8 Adaptive Rate Controller Operation (Jitter Controlled) (server monitoring output; the rate-versus-time plot shows the transmission rate, around 50 Kbps, dropping during the simulated congestion periods)

To see the effect of packet drop on the quality of the presentation, some snapshots of the rendered video are given below. The audio of these presentations became noisy and choppy as the rate controller dropped some of the audio stream packets.

Figure 4-9 Effect of Packet Drop on Video Quality (snapshots of the original video and of the same video under 50% random packet drop)

Chapter 5. Streaming Rate Control

Rate control is one of the necessary features that every streaming system must implement. Employed in the data plane, rate control helps adjust the rate of transmission to the rendering rate. Rate control may be implemented in any layer in which the required timing information is available. For the local access scenario of MPEG-4, according to the IM1 implementation, rate control is implicitly realized in the compression layer (decoders and presenter). Sync Layer packets that are read by the file reader are passed to the compression layer through blocking function calls. The compression layer blocks these calls (or does the decoding, etc.) until it needs another SL packet, i.e., when it has consumed all the available access units. This way the rate control functionality is implicitly achieved. For the remote retrieval scenario, however, it is not possible to block the server file reader's calls from the client side (remotely).
Therefore it is necessary to utilize a rate control technique that is mainly based on the local information of the stream rather than on the status of the client compression layer. In our implementation, this goal is achieved by applying the rate control technique in the Sync Layer, just before passing the read access units (AUs) to the Sync Layer packetizer. The behaviour of the rate controller greatly affects the quality of the delivered presentation; therefore we must pay special attention to the selection of an appropriate scheme. The most important challenge for a rate controller is to accommodate network jitter, i.e., the changes in network latency. This chapter studies the rate control scheme developed for the implementation of our MPEG-4 streaming system (the system that runs on the best-effort Internet). It must be noted that rate control in this chapter is discussed from the viewpoint of an individual stream. Section 5.1 discusses the jitter problem and its solution. Section 5.2 elaborates on the enhancements to the jitter compensation scheme. Section 5.3 presents the results of a series of experiments conducted using the implemented rate controller, and finally Sections 5.4 and 5.5 conclude the discussion of the rate control technique and suggest some improvements to the idea of rate control.

5.1 Jitter Compensation Buffer

In streaming applications, the client is usually equipped with a buffer for jitter compensation. This section briefly explains why such a buffer is needed and how it improves the performance of a streaming system.

5.1.1 Jitter

Delivery of packets over a network is naturally accompanied by delay. Many factors contribute to the amount of delay imposed on each packet. Delay jitter, also referred to simply as jitter, is the variation in the amount of delay experienced by different packets.

Figure 5-1 Absolute Jitter vs.
Relative Jitter (the figure contrasts the absolute jitter of a rate-controlled stream crossing a gateway and the Internet with the relative jitter measured against the expected arrival time)

Although there are several definitions of jitter (some of them shown in Figure 5-1), in this document we use a general definition that considers only the total variation in delay. The design of our system requires computing the expected arrival time and measuring the jitter based on the arrival time of the first packet; the value of jitter that is important to us, however, is the maximum jitter magnitude, similar to the "relative jitter" defined in Figure 5-1. This value, simply called jitter in this document, and the mean delay are also shown in Figure 5-4.

5.1.2 Jitter Compensation and Start-up Delay

Streaming applications are real-time applications; to play back a media stream, the presenter needs to receive each frame before its expected decoding time. Late frames are not only of no use, they are also harmful to the system, as they waste resources. In other words, for real-time applications a lost packet is better than a late packet. For most networks, eliminating delay jitter is not possible; therefore most applications utilize a technique to reduce its negative effect. This technique is based on buffering, delaying, and re-arranging packets at the receiver. Figure 5-2 illustrates this technique.

Figure 5-2 Jitter Compensation Buffer (courtesy of Prof. Jeffay, UNC)

Buffering and re-arrangement of packets delays the presentation. The first packet that is freed from the receiver jitter buffer and sent to the presenter usually starts the presentation and consequently sets the deadlines by which the other packets must arrive. The start-up delay therefore consists of two components: the channel or network latency, and the buffering delay. The component of the start-up delay that is of interest in this chapter is the buffering delay; the network latency is presumed constant and equal to the mean delay. The jitter effect is accounted for in the buffering delay.
To simplify the discussion of the start-up delay topic, we refer to the buffering delay component of the start-up delay simply as the "start-up delay" in the rest of this chapter; the reader will note that this value does not include the constant mean delay or network latency.

Many of the existing streaming applications use very large jitter compensation buffers; usually the delay introduced by the jitter compensation buffer is much longer than the network latency. For example, the RealPlayer G2 application buffers the incoming data for approximately 10 seconds before starting to play it out.

5.2 Stream Rate Control and Start-up Delay

To deliver a better service to the client, the server employs techniques to reduce the start-up delay. This section considers two instances of media streaming: live media streaming, and pre-recorded (stored) media streaming. For live media playback, it is not possible to completely remove the delay introduced by the jitter compensation buffer. For stored or pre-recorded media, however, a technique called "fast start" makes it possible to eliminate this delay completely. After calculating the start-up delay mathematically, this section discusses the approaches taken to improve the system performance in each of the abovementioned cases.

Calculating the start-up delay introduced by the jitter buffer

Assuming no extra jitter compensation and buffering in the compression layer, the intra-media synchronization operation starts when the compression layer (decoder) receives the first data frame. The decoder then expects the next frame to arrive at or before a certain time. For instance, consider a video stream with frames sent every 80 ms; this is the case for a 12.5 fps video presentation. If the decoding time stamp of the first decoded packet is t, the next frame will have the time stamp t + 80 ms. The decoder expects to receive this frame within an 80 ms period.
If the frame arrives at the decoder after 80 ms, the presenter will not be able to play it back and the frame is considered lost. The jitter buffer may delay the release of the first frame to ensure that the following frames will arrive on time. For this reason, the duration of time between the arrival of the first frame in the jitter buffer and its release to the decoder can be viewed as the jitter buffer delay. We call this value d_s and refer to it as the start-up delay. Figure 5-3 illustrates these concepts.

If the delay jitter is at most J_max, the minimum amount of required buffering (time-wise) is exactly J_max. When we store frames for up to J_max seconds, the next expected packet is always in the buffer when its playback time matures. Assuming that the frame inter-arrival time is I (= 1 / frame rate), the number of frames that need to be buffered in the jitter buffer is

⌈J_max / I⌉ + 1

This number times the inter-arrival time gives the size of the buffer in time, i.e., the amount of delay a packet may wait in the jitter buffer if it arrives on time. If the packet suffers jitter and arrives later than expected, it still meets its deadline, because the presentation has already been delayed for a time longer than the jitter. It is very likely that the value of J_max is much larger than J_avg, while the occurrence of the maximum jitter is very rare. In this case, a very small portion of the packets, those that experience large jitter, force the whole presentation to be delayed for J_max seconds. If the stream decoding and QoS requirements allow some packet loss, we can use J_avg instead of J_max and reduce both the buffer size and the start-up delay.
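As a worked instance of the buffering formula above, take the 12.5 fps stream discussed earlier (I = 80 ms) and a jitter bound of 200 ms: ⌈200/80⌉ + 1 = 4 frames must be held in the jitter buffer, consistent with the four buffered frames shown in Figure 5-3. A minimal sketch of this sizing rule (the numeric values are illustrative):

```cpp
#include <cassert>

// Number of frames the jitter buffer must hold: ceil(J / I) + 1, where J is
// the jitter bound and I the frame inter-arrival time, both in milliseconds.
int jitterBufferFrames(int jitterMs, int interArrivalMs) {
    return (jitterMs + interArrivalMs - 1) / interArrivalMs + 1;  // integer ceiling
}
```

Multiplying the result by I gives the buffer size in time, i.e., the maximum delay an on-time packet spends in the jitter buffer.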
Figure 5-3 Jitter Buffer and Start-up Delay (the figure plots arrival times against decoding times; the ideal arrival time equals the transmission time t_t when the network delay is ignored, the actual arrival time t_a is the transmission time plus the jitter, the presentation start and the decoding times are delayed by d_s, and four frames are buffered)

In DMIF signaling, the information about the tolerable packet loss is usually delivered through DAI parameters at the time of channel creation (ChannelAdd). If the jitter is statistically distributed with distribution function F_J, the probability of packet loss (due to missing the deadline) caused by using J_avg instead of J_max in operation is given by

P(loss) = 1 − F_J(d_s)

where d_s is the start-up delay, which is in fact the jitter buffer size in seconds (if fast start is not used). The smaller the value of d_s, the more likely packet loss becomes. In the rest of this document we use the symbol J to refer to the J_avg described above; if packet loss due to underestimating the jitter is not allowed, then J refers to J_max. Figure 5-4 elaborates on the statistical distribution of delay and jitter.

To choose the best value for d_s, we need to know the statistical specifications of the jitter. There have been comprehensive studies of jitter characteristics for different types of networks, and many related documents can easily be found on the web [35][37]. The study of jitter specifications, however, is out of the scope of this document and will not be discussed here.
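To make P(loss) = 1 − F_J(d_s) concrete, suppose, purely for illustration, that the jitter follows an exponential distribution with a given mean; then F_J(d) = 1 − e^(−d/mean), and the loss probability decays exponentially with the start-up delay:

```cpp
#include <cassert>
#include <cmath>

// P(loss) = 1 - F_J(d_s). Under the illustrative assumption of exponentially
// distributed jitter with mean meanJitter, F_J(d) = 1 - exp(-d / meanJitter),
// so the probability of a packet missing its deadline is exp(-ds / meanJitter).
double lossProbabilityExp(double ds, double meanJitter) {
    return std::exp(-ds / meanJitter);
}
```

For example, with a mean jitter of 50 ms, a start-up delay of 100 ms leaves a loss probability of e^(−2) ≈ 0.135, while a 500 ms start-up delay drives it below 0.1%; the real distribution must of course be measured, but the trade-off between d_s and loss has this general shape.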
Nevertheless, in the next few sections the general delay distribution characteristics shown in Figure 5-4 are used when attributing jitter.

Figure 5-4 Typical Delay Distribution Function

Any technique that can reduce the delay jitter is helpful for reducing the startup delay. The next two sections study the start-up delay and jitter buffer issues for two important scenarios: live media streaming and stored media streaming.

5.2.1 Live Media

For live media, the only option is to minimize the jitter buffer size in order to reduce the startup delay. Minimizing the jitter buffer size only becomes possible when the amount of jitter itself is reduced. Therefore, employing a transport service that lessens the imposed delay jitter in the first place is the best approach. Neither of the QoS services defined for IntServ networks (Guaranteed and Controlled Load, see section 6.1.1) addresses jitter constraints [19][21][22]; thus no bound on jitter can be explicitly set. It is, however, possible to map a jitter constraint to a delay bound constraint; in fact, the maximum delay always sets an upper bound on the maximum jitter magnitude. The DiffServ model (refer to 6.1.1) provides some facilities to bound the amount of jitter; Expedited Forwarding (EF) is the service of choice for connections that require low jitter [24][23].

Schedulers in the path of a stream directly contribute to the formation of the end-to-end delay and jitter. Although some scheduling techniques such as Weighted Fair Queuing (WFQ) are suitable for providing bounded-delay services, satisfying jitter constraints may not be an easy task for them. In fact, providing a bounded-jitter service is best done by non-work-conserving algorithms.
Work-conserving algorithms, such as WFQ, cannot provide a lower bound on delay; therefore they are not primary choices for jitter-sensitive applications.

5.2.2 Pre-Recorded Media: Fast Start

For stored media it is possible to completely eliminate the start-up delay. We have designed a technique, called "fast start", to achieve this goal. Operating in fast start mode, the sender transmits more frames than required to the client at the beginning of a stream. As a result, the client receives a few frames ahead of their decoding time and buffers them until their playback time. The buffer used for storing these early arrivals is in fact the same jitter buffer. It must be noted that applying the fast start technique is only possible for pre-recorded presentations, since only then can frames be read ahead of their decoding time.

The fast start technique uses the same jitter compensation technique described earlier in this chapter; it only applies a scheme for filling the client buffer faster at the beginning of a presentation. When the jitter buffer receives enough frames to guarantee jitter compensation, it can release the first frame to the decoder and start the presentation. In some cases of fast start it is even possible to release the first packet immediately and eliminate the start-up delay completely.

An analytical discussion of the fast start technique is given in this chapter. To lay the basis of this analysis, a few parameters used in the argument are defined here:

n: frame number in a stream (n >= 0)
d_s: start-up delay
T_a: arrival time (including jitter, and assuming a mean delay of zero to simplify the equations)
T_d: decoding time, or rate-controlled ideal arrival time; when start-up delay is used, the actual decoding time becomes T_d + d_s
T_t: time of frame transmission (not the serializing time, but the moment a frame is transmitted)
J: jitter as described earlier in this chapter (J_avg or J_max); note that this value is the magnitude of jitter, and is therefore always positive
j: an instance of jitter; may be negative, but its magnitude (from the most negative to the most positive) is less than J
f: frame rate (12.5 for a 12.5 fps video, for example)
I: inter-frame time; the time between two successive frames = 1/f (e.g., 80 ms for a 12.5 fps video)

From the definitions above the following equations are directly extracted:

T_a(n) = T_t(n) + j(n)   (Eqn 5.1)

Arrival time is the sum of the transmission time and the jitter; the mean delay is assumed to be zero, as it has no effect on this analysis.

T_d(n) = n · (1/f) = n · I

This is the decoding time of a frame, assuming that T_d(0) = 0. The time origin of the calculations given here is the ideal decoding time of the first packet (frame); considering this time as time 0, T_d(0) = 0. The arrival of the first frame (T_a(0)) is in fact the triggering event for the system, but for analytical purposes we consider the time baseline to be defined as the initial arrival time of a jitter-less system. T_t is also defined in this manner and does not involve the jitter. This means that the initial frame, sent at time T_t(0), arrives at the receiver at time T_t(0) + mean delay; and since the mean delay is assumed to be zero (it is not important to our analysis), the jitter-less arrival is at time T_t(0). This time is considered to be time 0, the ideal decoding time; thus T_t(0) = 0.

If the first frame (n = 0) experiences jitter, in practice the time baseline of the whole process shifts accordingly; the jitter for the following frames is therefore calculated with an offset, and the first frame is assumed jitter-less. Although this phenomenon changes the parameter j, it has no effect on the magnitude of jitter, J. Based on this argument we can assign j(0) = 0, and from (Eqn 5.1) we have T_a(0) = T_t(0) + j(0) = 0.

The playback or actual decoding time for a frame is T_d(n) + d_s.

In the rest of this chapter, time-dependent variables (e.g., T_a(n)) are sometimes cited without the '(n)'. This is only to make the equations more readable; both symbols refer to the same variable.

For a presentation not to lose a packet due to late arrival, the following inequality must hold:

T_a <= T_d + d_s   (Eqn 5.2)

That is, a frame must arrive before its playback (actual decoding) time. Since the value of jitter is not known for every frame, we have to consider the worst-case scenario, in which the first frame arrives with the most negative jitter. In this case the time baseline is shifted to the negative side and some packets may experience jitter as large as J. To compensate for this jitter it is necessary to have a jitter buffer of size J; therefore, in the rest of this chapter we use J instead of j(n). From (Eqn 5.1) we have:

T_t + J <= T_d + d_s   or   d_s >= J + T_t - T_d   (Eqn 5.3)

If the fast start technique is not used and frames are transmitted according to their time stamps (similar to live media streaming), then T_t = T_d. As a result, expression (Eqn 5.3) reduces to d_s >= J; that is, the start-up delay must be at least equal to the magnitude of jitter, J:

T_t = T_d (no fast start)   →   d_s >= J

By using the fast start technique it is possible to remove the start-up delay under some circumstances, or to reduce it significantly. In the long run the system must operate like one without fast start; that is, it must provide a jitter buffer of size at least equal to the amount of jitter (J).
Mathematically one can write (n being the frame number): for n >> 1 we have

T_t <= T_d - J + d_s   or   T_t <= n · I - J + d_s   (Eqn 5.4)

In the optimal case the equality holds: T_t = n · I - J + d_s.

When there is no start-up delay, i.e., d_s = 0, we have T_t <= n · I - J; using the equality part of the expression above results in:

T_t = n · I - J   (Eqn 5.5)

A fast start system finally reaches a state in which it operates normally (sending packets at the natural rate), but before reaching this state it spends a duration of time in a state that we call the "critical period". The period of time after the critical period is called the "normal period". System performance in the critical period, in terms of the probability of packet loss due to jitter, depends on the characteristics of the chosen fast start technique. Figure 5-5 shows these periods of time.

Figure 5-5 Fast Start: Critical and Normal Periods

The following section analyzes the hazards that fast start may introduce to the system in its critical period.

Safe transient in the critical period

The amount of start-up delay employed by a fast-start technique may range from zero to J. A delay of J does not justify the use of fast start, as it then acts quite similarly to normal operation with start-up delay. The whole idea of fast start comes from the fact that critical periods are usually very short; therefore, some level of packet-loss probability is quite acceptable in exchange for zero or very low start-up delay. The parameters that are adjustable for different designs are the start-up delay and the transmission time sequence (or formula). Figure 5-5 illustrates these concepts.
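As a quick numeric check of (Eqn 5.4) and (Eqn 5.5): with T_t(n) = n · I - J and no start-up delay, even a frame hit by the full worst-case jitter J arrives exactly at its decoding deadline. The values below (I = 80 ms, J = 40 ms) are illustrative, not taken from the text:

```python
I, J = 80.0, 40.0                 # ms; illustrative inter-frame time and jitter
for n in range(1, 6):
    Tt = n * I - J                # transmission time from (Eqn 5.5)
    Ta_worst = Tt + J             # worst-case arrival: full jitter J added
    Td = n * I                    # decoding time
    assert Ta_worst <= Td         # deadline met with d_s = 0
```

The equality Ta_worst = Td shows why (Eqn 5.5) is the boundary case: sending any later than n · I - J risks a late arrival once d_s = 0.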
The most forceful type of fast-start technique is the one that transmits packets one after another, without any delay between them, until it reaches the normal period. The following discusses this method mathematically.

We know that T_t(0) = T_d(0) = 0 is the time basis. The second frame that is sent (n = 1) may suffer a jitter of J (the worst case); for it to be received on time we must have:

T_t(1) <= I - J + d_s   (Eqn 5.6)

Let l denote the frame length in time units (the serializing time). Since the second packet has to wait for the first one to finish its transmission, we obviously have T_t(1) >= l. Given that the second frame is transmitted immediately after the first one, T_t(1) = l, and (Eqn 5.6) can be rewritten as d_s >= J - (I - l).

For the next frames we have T_t(n) <= n · I - J + d_s, and with any function T_t(n) that transmits frame n+1 no later than I seconds after frame n, this inequality holds. This suggests that the fast start technique can at least reduce the startup delay by (I - l) and still keep the critical period safe. In other words, the second frame is the most demanding one. By applying the reasonable restriction

T_t(n+1) - T_t(n) <= I   (Eqn 5.7)

it is guaranteed that the critical and normal periods will be safe if the second frame always arrives on time despite the presence of jitter. It must again be noted that using a start-up delay may be inevitable for securing the on-time arrival of the second frame. If

J <= I - l   (Eqn 5.8)

holds, the inequality (Eqn 5.6) results in a range for d_s that includes the value zero; this suggests that J <= I - l is the necessary condition for a "safe" fast start without startup delay (d_s = 0).

The general conditions for a safe transient in the critical period are:

d_s >= J - (I - l)   and   T_t(n+1) - T_t(n) <= I,   for all n in the critical period

The following table outlines the safe transient conditions with regard to the value of jitter, assuming that the general conditions are satisfied.

If J <= I - l : no startup delay needed for a safe critical period (d_s = 0)
If J > I - l : a startup delay is needed for a safe critical period (d_s != 0)

Equation (Eqn 5.6) is derived for a situation in which we do not want to take any risk by removing the start-up delay. By accepting some level of risk, it is possible to remove the start-up delay completely; this alternative is discussed in the next section.

Fast start with some probability of packet loss in the critical period

If the application is allowed to risk the loss of some packets in the critical period, it is possible to reduce the start-up delay and even nullify it. Our implementation of an MPEG-4 streaming server utilizes such a scheme and applies no start-up delay.

The length of the critical period, Cp, is calculated using the following formula:

Cp = Σ (j = 0 to N) I_j

where N is the length of the critical period in terms of the number of frames sent from the beginning of the presentation, and I_j = T_t(j+1) - T_t(j) >= l is the interval between two subsequent transmissions in the critical period (in the normal period, I_j = I). The value of N is derived from:

N - (number of consumed frames) >= jitter buffer size   →   N - ⌈Cp / I⌉ >= ⌈J / I⌉ + 1

The above equations yield the desired value of N; it is minimal when the equality part of the expression holds. An integer value of N for which the equality holds may not exist, in which case we take the smallest integer for which the inequality holds. This value of N can still be called the optimal value.
The following recursive algorithm can be used for deriving the optimal N:

(Alg.1)
1) Initially, N = number of frames needed to be buffered in the jitter buffer; N = ⌈J/I⌉ + 1.
2) Cp = Σ (j = 0 to N) I_j
3) Z = ⌊(Cp - d_s) / I⌋ = number of frames consumed by the application during the critical period.
4) If N - Z < ⌈J/I⌉ + 1, then N = N + 1; jump to step 2.
5) Cp and N are computed correctly; exit.

Jitter is not usually controllable, and the interval I is also pre-set and not adjustable in many cases. The only parameter that remains under the control of the delivery layer is the inter-transmission time I_j (or T_t(n), in fact). To minimize the length of the critical period, we should minimize all the instances of I_j (squeeze T_t(n)).

Probability of packet loss

Jitter is viewed as the variation of delay; in the worst-case scenario presumed here, its magnitude is measured by subtracting the lowest possible delay from the amount of delay (see Figure 5-4), not the mean delay. Therefore the jitter distribution function curve is simply a shifted version of the delay distribution function (the lowest-delay point is mapped to zero jitter for the worst-case consideration; the mean delay is mapped to zero jitter in normal cases). The statistical distribution of jitter (and delay) is therefore presumed to be similar to the curve shown in Figure 5-4; we call this function f(j).

The probability that a frame misses its deadline is calculated based on the amount of data available in the jitter buffer. We call this amount of data the jitter margin (M). M starts at 0 for the initial frame and increases to J when the jitter buffer is in its normal state (normal period) and the fast-start (critical) period is over.
M is given by the equation:

M = d_s + T_d - T_t

If there is no start-up delay, the equation reduces to M = T_d - T_t.

The probability that a frame misses its deadline = P(jitter > M for that frame)
= 1 - F(M) = 1 - F(T_d(n) - T_t(n) + d_s)   (Eqn 5.8)

When M is almost equal to J, the probability of losing a packet due to jitter is almost zero, since F(T_d(n) - T_t(n) + d_s) = 1. This supports the previous argument that the critical period is the only period of time during which fast start puts the streaming process at risk.

The length of the critical period is usually very short; therefore, computing the probability of packet loss over both the critical and normal periods will usually result in a very low probability. For this reason it is not possible to come up with an absolute formulation that demonstrates the level of risk that removing the start-up delay may impose on the whole stream. Instead, the probability formula found above can be used to determine whether there is a need for start-up delay. Candidates for T_t(n) can also be examined with this formula. The next section uses this formula and a case study of jitter to examine some transmission timing functions.

5.2.3 Transmission Time Functions for Fast Start

In this section a few transmission time functions (T_t(n)) are examined for the fast start scheme. T_t(n) is the only parameter that is usually controlled by the designer.

Fast start with minimum length of critical period (aggressive fast start)

To minimize the length of the critical period, the fast start technique must use the maximum available capacity of the transport service. The maximum transport capacity is reached when frames are sent one after another with no delay between them. Therefore, in the formula for the Cp calculation [Alg.1], the inter-transmission time is I_j = l, where l is the frame length (serializing time). Figure 5-6 illustrates this technique visually. We can call this method aggressive fast start.
Figure 5-6 Aggressive Fast Start

The following formalizes this method and presents the T_t(n) function that must be used in realizing the rate controller. In the following formulae, N is the number of frames sent during the critical period to fill the jitter buffer with ⌈J/I⌉ + 1 frames; it can be derived from [Alg.1]:

T_t(n) = n · l                  for 0 <= n < N
T_t(n) = (n - N) · I + N · l    for n >= N

The condition that must be met is: for n >= N, T_t(n) <= n · I - J, which holds with T_t defined above.

Fast start for systems with bounded BW

In some cases the amount of bandwidth available to each stream is bounded. An example of such a case is when we reserve a certain amount of BW for a stream. By accepting some risk, it is possible to transmit packets faster than what the reservation specifies; we do not discuss such a situation, and focus on the case where the system is committed to limit its output bitrate to a certain peak rate. To simplify the analysis, assume that the peak rate is defined on a packet basis, so that Pr is the "peak packet rate". Limiting the peak packet rate puts a restriction on the time interval between subsequent packets; remember that this interval was already bounded below by l, the frame length. With Pr being the peak packet rate, the minimum interval between successive packet transmissions (from the beginning of one packet to the beginning of the next) becomes max(l, 1/Pr); we call this value 1/p to make it resemble 1/Pr for readability. To find N, the length of the critical period, we must use [Alg.1] with I_j set to 1/p. When N is found, the transmission time function is described as follows:

T_t(n) = n · (1/p)                  for 0 <= n < N
T_t(n) = (n - N) · I + N · (1/p)    for n >= N

The condition that must be met is: for n >= N, T_t(n) <= n · I - J, which holds with T_t defined above.

The transmission curve for this case is quite similar to the one shown in Figure 5-6; the only difference is the length of the critical period.
The term 4·l is also replaced by 4·(1/p).

The smooth fast start (implemented for our MPEG-4 streaming system)

For the implementation of our MPEG-4 streaming system, a different T_t function has been used. The two above-mentioned functions are suitable for minimizing the length of the critical period, while the technique used in the implementation tries to make the transition smoother. This scheme does not send packets at its maximum possible speed for the whole critical period: it starts sending packets at full speed and then gradually reduces the speed to the natural rate of the media. Figure 5-7 illustrates this method.

Figure 5-7 Smooth Fast Start (2), Aggressive Fast Start (1)

The T_t function used to realize the smooth fast start is as follows:

T_t(n) = n(n + 1) · A                  for 0 <= n < N
T_t(n) = (n - N) · I + N(N + 1) · A    for n >= N

This function satisfies the general jitter buffer requirement: for n >= N, T_t(n) <= n · I - J.

In the above expression, l is the frame length (serializing time), A is a constant value set by the designer, and N is the number of frames sent during the critical period to fill the jitter buffer with ⌈J/I⌉ frames. N can be derived from [Alg.1], but it is in fact simpler to use the following formula:

N(N + 1) · A = N · I - J

Resolving the above equation requires knowledge of N or A; it is up to the system developer to determine an appropriate value for A and then obtain N. It is obvious that A must be greater than l, and N · A < I. In our design the interval between successive packets increases by A as each packet is sent; the accumulation of this increase leads to the T_t function described above.
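The three timing functions and [Alg.1] can be sketched together as follows. This is a sketch under my own reading of the bracket notation (ceiling for the buffer target, floor for consumed frames); the function and parameter names are mine, and times are in milliseconds:

```python
import math

def buffer_target(J, I):
    # frames the jitter buffer must hold: ceil(J/I) + 1
    return math.ceil(J / I) + 1

def optimal_N(J, I, Ij, d_s=0.0):
    # Alg.1: grow N until the frames left after consumption fill the buffer
    N = buffer_target(J, I)
    while True:
        Cp = sum(Ij(j) for j in range(N))          # critical-period length
        Z = max(0, math.floor((Cp - d_s) / I))     # frames consumed meanwhile
        if N - Z >= buffer_target(J, I):
            return N
        N += 1

def Tt_aggressive(n, N, I, l):
    # back-to-back at the serializing time l, then the natural rate I
    return n * l if n < N else (n - N) * I + N * l

def Tt_peak_limited(n, N, I, l, Pr):
    # spacing bounded below by max(l, 1/Pr)
    p_inv = max(l, 1.0 / Pr)
    return n * p_inv if n < N else (n - N) * I + N * p_inv

def Tt_smooth(n, N, I, A):
    # intervals grow with A: T_t(n) = n(n+1)A during the critical period
    return n * (n + 1) * A if n < N else (n - N) * I + N * (N + 1) * A

# Example: J = 150 ms, I = 80 ms, l = 2 ms -> aggressive critical period
N = optimal_N(150, 80, lambda j: 2)                 # frames sent back-to-back
assert Tt_aggressive(N, N, 80, 2) <= N * 80 - 150   # normal-period condition
```

Passing a different `Ij` lambda reproduces the bounded-BW and smooth variants of [Alg.1]; the final assertion checks the T_t(n) <= n · I - J requirement at the boundary of the normal period.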
5.2.4 Comparing Different Fast-Start Timing Functions

From Figure 5-7 it can be seen that the critical period for the smooth fast start is not much longer than the aggressive fast start's, and the increase in the risk of losing packets due to missed deadlines is not significant. Although the smooth fast start reaches the normal period later than the aggressive one, the number of packets needed to fill up the jitter buffer is not very high and does not put the streaming in significant danger.

Hypothetically, there is no solid advantage to using the smooth fast start instead of the previously discussed methods (considering only fast-start theory). Technically, however, this technique has some advantages; some of them are listed here:

1. The period of packet transmission at full speed is shorter, which helps prevent congestion.
2. Transmission at higher rates causes queue build-up in intermediate devices; by gradually reducing the rate it is possible to slow down the queue build-up and reduce both the queue size needed for the flow and the amount of delay incurred.

The smooth change in the transmission speed helps keep intermediate devices from being overwhelmed and therefore from causing unwanted jitter. For the first two T_t functions described above, the receiving times will not resemble the transmission times, due to the same effect. The reason for this phenomenon is discussed below.

The average bit-rate and the peak rate of a flow are usually specified during the BW reservation procedure. In most QoS networks an intermediate device can always guarantee that the specified average bit-rate is dedicated to the flow; the peak rate, however, is not guaranteed to be available all the time. For this reason, the application must commit itself to minimizing the time during which it transmits packets at full speed. Transmitting a flow at full speed may cause a queue build-up for the flow in the intermediate devices.
The queuing delay then makes the fast-start assumptions at the sender meaningless. For this reason it is always better to avoid long bursts. This is what the implemented fast start does: it reduces the transmission rate gradually and lowers the chance of the traffic's being policed.

The effect of token bucket shaping on fast start

A series of simulations has been carried out to study the performance of the fast start technique when its traffic passes through token bucket shapers (this is very likely in IntServ networks). Token bucket shaping is a fairly well-known concept in data communications; readers who are not familiar with it are encouraged to refer to Chapter 6 for a brief description. The simulation described in this section compares the token-bucket-shaped traffic for two fast start techniques: the aggressive fast start for minimum critical period, and the smooth fast start. It is important to note that although the values used for the simulation are typical values for a sample presentation, the absolute values of these parameters are not important and do not change the results of the simulation. Simulation results are given below.

In Figure 5-8 the two lower curves (both marked as T_t) show the input traffic to the shaper for both fast-start techniques; as can be seen, the output traffic is the same for both input streams (the two T_a curves are not distinguishable). This means that changing T_t did not make any difference, as everything was limited by the token bucket shaper.

Figure 5-8 Shaping May Neutralize the Fast Start

The parameters for this simulation were: token rate = 5000 Bytes/s, stream bitrate = 4062 Bytes/s, bucket size = 1500 Bytes, max AU size = 1000 Bytes. As can be seen, the token rate is slightly above the stream rate to allow the stream to pass, and the bucket size is larger than the AU size to allow all packets to pass. It is not usually reasonable to use a token rate much higher than the stream rate.

The parameter that is adjustable here is the token bucket size. For the second experiment I changed this parameter to 2500 while keeping the rest intact, but the result was the same (the T_a curves completely overlap). The experiment was repeated for different sets of parameters, and for parameters in the same range the result remained the same; i.e., the token bucket shaper limited the fast-start rate controller significantly and the fast start was not very successful. After conducting some other simulations, it turned out that for this special case the token bucket size must be at least 5 times the probable normal burst size of the media to allow a successful fast start using the aggressive method (the token rate was set almost equal to the media rate). In fact, the output rate for the aggressive fast start then resembles the input and output rate of the implemented fast start (Figure 5-9). Though this is a very specific case, it shows that a very aggressive fast start will not perform much better than a smooth one.
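The shaping behaviour in these simulations can be reproduced with a minimal token-bucket model. This is my own sketch, not the simulator used in the thesis; the FIFO assumption and parameter names are mine, and times are in milliseconds:

```python
def token_bucket_departures(arrivals, sizes, rate, bucket):
    """Departure times through a token-bucket shaper with a FIFO queue.

    arrivals: packet arrival times (ms), sizes: bytes,
    rate: token fill rate (bytes/ms), bucket: bucket depth (bytes).
    """
    tokens, clock, out = bucket, 0.0, []
    for t, size in zip(arrivals, sizes):
        ready = max(t, clock)                         # wait for queue ahead
        tokens = min(bucket, tokens + (ready - clock) * rate)
        clock = ready
        if tokens < size:                             # stall for missing tokens
            clock += (size - tokens) / rate
            tokens = size
        tokens -= size
        out.append(clock)
    return out

# Aggressive fast start: five 1000-byte packets offered at t = 0 against a
# 5 bytes/ms (5000 B/s) bucket of depth 1500 B -> the burst is flattened
# to the token rate once the bucket drains:
dep = token_bucket_departures([0.0] * 5, [1000] * 5, 5.0, 1500.0)
# dep == [0.0, 100.0, 300.0, 500.0, 700.0]
```

With a 7000-byte bucket, as in Figure 5-9, the same five-packet burst leaves the shaper immediately, matching the observation that a sufficiently deep bucket lets the fast start through.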
Figure 5-9 shows the result of a simulation with these parameters: token rate = 5000 B/s, stream bitrate = 4062 B/s, bucket size = 7000 B, max AU size = 1000 B.

Figure 5-9 Effect of Token Bucket Shaping on Fast Start

Based on this argument and the simplicity of implementation, the implemented fast start uses the last described transmission function, i.e., the smooth fast start. To implement a fast start scheme, the amount of jitter must be known. This is not possible in many cases, however, and an estimate of the jitter is usually sufficient for our purpose. Another parameter that needs to be set is A. For our implementation we use a value a few times greater than l; for a 100 Mbps connection (C = 10^8 b/s) and an average frame size of 1000 Bytes, l is 0.08 ms, or approximately 0.1 ms. The implemented fast start did not use any start-up delay; that is, d_s = 0. The next section discusses the results of some experiments that were carried out using the implemented fast-start rate controller.

5.3 Some Experiments

The implemented rate controller uses the smooth fast start technique with no start-up delay (d_s = 0); this holds true for all the experiments explained below. In all the figures of this section the vertical axis represents the frame arrival time, and the horizontal axis denotes the experiment time.

Low jitter environment experiment

The first experiment was conducted in a low jitter environment; measurements showed that jitter in this environment was in the range of 20-40 ms. To show how the fast start works, a very large jitter buffer was used (≈ 2 sec). Figure 5-10 depicts the results. The presentation used for the experiment was composed of a few H.263 video streams and a single G.723 audio stream.
Using the value of A calculated in the previous section and the measured value of jitter, we have the following parameters:

A = 1 ms;  I = 80 ms for 12.5 fps H.263 video, 30 ms for G.723 audio, ...;  J = 40 ms;  N = 2 for both video and audio.

Figure 5-10 Fast Start Experiment in a Low Jitter Environment

Figure 5-11 Fast Start Experiment in a High Jitter Environment
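The parameter choices listed above can be cross-checked numerically; the ceiling reading of the buffer formula and the link-speed arithmetic are my own:

```python
import math

# Jitter-buffer occupancy for the video stream: J = 40 ms, I = 80 ms
n_video = math.ceil(40 / 80) + 1        # = 2, matching the N above for video

# Serializing time of a 1000-byte frame on a 100 Mb/s link, used to pick A:
l_ms = (1000 * 8) / 1e8 * 1e3           # = 0.08 ms, so A = 1 ms >> l
```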
The same amount of jitter that happened later was not dangerous as system was in its normal period. To generate a high jitter environment for this experiment, I have reduced server's processing power and network resources by running multiple heavy applications on the same machine that consumed the network and processing resources. I 8000 T  Td  Ta  I  :  •  1  Time or DTS (ms) Figure 5-12 A Closer Look at <Figure 5-ll>  77  No jitter buffer experiment  In the next experiment no jitter buffer was used, as it can be simply guessed, the resulted presentation was a very interruptive one with drastically low quality. The stream that its decoding and arrival times are shown in the Figure 5-13 was an audio stream. For this experiment, even a very small jitter caused the play back to be totally interrupted. In this special case 80% of the time nothing could be heard.  Td  DTS  Ta Arvl Time  o  o  o  o  o  o  o  o  o  o  o  o  o  o  o  O T - T - c o c o T r - ^ r m m c o t D r ^ r ^ c o i - C M C O T j - m c D h - O O C n O T - C M C O - *  Time (SeqNo or DTS)  Figure 5-13 No Jitter Buffer Experiment  Audio streams in low and high jitter environments  Another experiment with the jitter buffer size of 2 seconds has been conducted. The results were observed for one of the streams of this presentation, a voice stream. The first figure below (Figure 5-14), shows how the system performs in a low jitter environment; the second figure (Figure 5-15) depicts the results of the same streaming experiment in a high-jitter environment. For the first one, the played back audio was smooth and perfect. For the second one there were moments of interruption. These moments of interruption are the direct result of jitter compensation failure. Jitter compensation failure causes some packets to miss their deadline.  
78  3 5 0 0 0 -r 30000 «T 2 5 0 0 0 |  20000 -  r—  ra 1 5 0 0 0 -  > <  10000 5000 0 o  o o o o o o o o o o o o o o o o o i - CM n t m (D r ^ o o c o o - i w n t m e g co N t T - C O m C M O l f f l C O r C O W W O l C O ' t ' T - c o m c o c o O T - c o m r ^ c o o c M c o i o r - ^ c o i  -  T  -  T  -  ^  i  -  r  -  W  «  W  W  W  W  Time or DTS (ms)  Figure 5-14 Audio Stream in Low Jitter Environment  25000  CM CO Tt  CO CM CO CM  CO CO Tt  LO CO LO  CO  o CO  co CO  CO CM CM  o  CO CO CO  o lO CO  T— T—  CM  CO  o CO  CO CO LO  Time or DTS ( ms)  Tt  CO  cn CO  LO LO  o CM  Figure 5-15 Audio Stream in High Jitter Environment  79  5.4 Jitter-Adaptive Rate Control In some situations, the pattern of jitter changes as time elapses. If the amount of maximum jitter (J) increases, the previously filled up jitter buffer may not be adequate for a smooth streaming. In this case, the transmission can speed up for a while to enlarge the jitter buffer. Once the Jitter buffer size is adequate for compensating the network jitter, it will be safe for the rate controller to adjust the transmission speed to the natural rate of media. This scheme is called Jitter-Adaptive Rate Control. Such a scheme requires the system to continuously monitor the Jitter. In a version of the implemented streaming system that uses RTP as its transport protocol, we used RTCP reports to measure the jitter. Based on the amount of measured jitter, the system decides whether to speed up the streaming, to continue with the natural rate or to reduce the transmission rate. In that experimental implementation (section 4.2.3), the increase in jitter was translated to network problem and congestion, therefore the transmission rate was reduced. This is in contrast to what the abovementioned jitter-adaptive rate control technique suggests. In fact, the way the jitter increase problem is dealt with depends on how jitter increase is interpreted. 
This issue has to be resolved according to the particular circumstances of an implementation.

5.5 Summary of Rate Control

From the analysis and experimental results presented in this chapter, it can be seen that the fast-start technique can eliminate the need for startup delay (at least in cases where bandwidth is not scarce). Apart from fast start, rate control is a simple task, achieved by sending out frames according to their decoding time stamps. The ideas of fast start and the jitter buffer can further enhance the design of a rate controller. Techniques based on extra information about the network latency and jitter pattern will certainly be useful in implementing an efficient rate controller. The jitter-adaptive rate controller is an example of such an improvement, inspired by the "fast start" idea.

Chapter 6. QoS-Aware MPEG-4 Streaming System

The introduction of the Quality of Service concept to Internet technology in recent years has spurred the development of several new standards. Any application that uses the quality services provided by the QoS-Internet has to comply with these standards. Interacting with the QoS-Internet is an important task from an end-system standpoint; providing some degree of QoS inside the end-system is another task the end system has to undertake. This chapter discusses some of the QoS issues that concern MPEG-4 streaming over the QoS-Internet, viewed from an end-system position. After some basic discussion and an introduction to the problem, a new design is proposed, along with guidelines for its implementation. A partial implementation of the design and its results are also discussed.

6.1 QoS: Requirements and Models

Quality of Service may have a different meaning in each standard or application. This section first describes the quality services that are most likely to be provided by the QoS-Internet to end-systems, and then studies the special QoS requirements of MPEG-4 traffic by examining its traffic specification.
6.1.1 QoS-Internet Models

The notion of end-to-end Quality of Service in this document refers to the QoS observed by end systems, in particular clients. The server, by definition, does not receive any service from the clients. It must be noted that QoS issues pertaining to the delivery of flows inside a network are of a different nature than those of concern to the end systems. This chapter assumes that a QoS-enabled Internet has already been implemented and does not discuss QoS issues that concern intermediate devices in a QoS-Internet. Therefore, the only subjects that must be discussed are the types of services provided by the QoS-Internet, the methods of utilizing these services, and the requirements that end-systems must fulfill. Since the types of services that a QoS-enabled Internet must provide are still a matter of discussion in several IETF groups (at the time of this writing), the QoS parameters and services chosen here are the most common ones.

IntServ [19] is the QoS network model used in this chapter. The reason for choosing this model is that IntServ is expected to become the dominant access network technology. The types of service that the IETF has defined for IntServ are given below:

•  Guaranteed Service:
   o deterministic delay guarantee
   o token bucket used to specify traffic and QoS
•  Controlled-Load Service:
   o network provides service close to that provided by a best-effort network under lightly loaded conditions
   o token bucket used to specify traffic
•  Best-Effort Service:
   o no guarantees

In the IntServ model, Guaranteed Service means guaranteed end-to-end delay, guaranteed bandwidth, and no packet loss. Controlled-Load Service does not give solid guarantees on these parameters, but promises that the end system's packets will not experience a congested network and will always see a lightly loaded network while traversing it.
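Both Guaranteed and Controlled-Load requests describe a flow's traffic with a token bucket (a rate and a bucket depth). A minimal sketch of a token-bucket conformance check is shown below; the parameter values are illustrative, not taken from any IntServ profile in this thesis.

```python
class TokenBucket:
    """Token bucket (rate in bytes/s, depth in bytes), the traffic
    descriptor used by IntServ service requests."""

    def __init__(self, rate, depth):
        self.rate = rate          # token refill rate, bytes per second
        self.depth = depth        # maximum burst size, bytes
        self.tokens = depth       # bucket starts full
        self.last = 0.0           # time of the previous update, seconds

    def conforms(self, size, now):
        """True if a packet of `size` bytes sent at time `now` conforms
        to the declared traffic specification."""
        elapsed = now - self.last
        self.tokens = min(self.depth, self.tokens + elapsed * self.rate)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False
```

A flow whose packets always conform to its declared bucket can receive the requested treatment; non-conforming packets are typically demoted to best-effort handling.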
Best-Effort service is similar to the current Internet service: there is no delivery guarantee, but the network does its best to deliver packets.

The IETF has also defined another type of QoS network, called DiffServ [23]. DiffServ treats traffic in aggregate form, while IntServ is meant to achieve per-flow guarantees. The former scales well with a large number of flows, while the latter does not. For this reason, the most likely scenario for a complete QoS solution is that the core network uses the DiffServ model, which scales well, and the access network uses the IntServ model to provide per-flow guarantees to end users. This approach results in a hybrid network model. From an end-system perspective, however, this distinction is unimportant, and we treat the hybrid network as an IntServ network, since the end host only needs to interact with the IntServ portion.

6.1.2 QoS Requirements and MPEG-4 Traffic Specifications

The QoS concept was initially developed for individual flows or classes of flows rather than for applications. Therefore, the service levels described above are all meaningful when we deal with individual elementary streams; the story is different, however, for an MPEG-4 application that wants to receive QoS service that is meaningful to the whole application and all its streams. To realize a QoS-enabled application, more attention must be paid to the adoption of a QoS signaling method and to the design of an architecture that combines the signaling tools with other QoS-related modules. This sub-section describes the new, MPEG-4-related problems that are not addressed by conventional QoS models.

The MPEG-4 standard supports the use of many previously developed tools and encoding techniques. For example, MPEG-2 or H.263 video streams may be used in an MPEG-4 presentation alongside MPEG-4's own encoded video, each as a separate object. This diversity in the media types used by MPEG-4 complicates the function of the delivery layer.
MPEG-4 traffic is an aggregation of the traffic of its elementary streams; in other words, MPEG-4 traffic consists of the traffic of its objects. Since an MPEG-4 presentation is very likely to comprise a large number of objects of different media types (audio, video, animation, images, etc.), it is almost impossible to find a straightforward format for MPEG-4 traffic. This diversity in data types, together with the possibility of adding or removing objects from a presentation during rendering, makes MPEG-4 traffic highly time-variable. Even if all the media used in a presentation are of the same type, their addition and deletion during the presentation may cause the resulting traffic to vary greatly over time.

User interaction is one of the most important facilities made possible by the object-based nature of MPEG-4. The user can interact with the scene, add objects, remove them, or alter their traffic and rendering specifications. Adding and removing objects changes the traffic pattern and may render previous bandwidth reservations useless. All these complexities, produced by the object-based nature of MPEG-4, the diversity of media types, and user interaction, make it impossible to define a simple and straightforward model for MPEG-4 traffic. It is worth mentioning that for almost all video encoding techniques, scene changes and movement in the video greatly affect the produced stream traffic. This is in particular the case for the encoding techniques supported by MPEG-4. Obviously, this feature adds to the complications mentioned above.

Although the object-based nature of MPEG-4 introduces considerable difficulty to the presentation delivery function, it also offers benefits in terms of high compression efficiency, presentation management, and even delivery of MPEG-4 traffic. A summary of the MPEG-4 traffic specification is given here:

1. It is object-based
   a.
It may be composed of a large number of objects, thus requiring a massive amount of management and scheduling effort.
   b. Each object may be of a different media type (e.g., audio, video, image) and have different QoS requirements.
   c. Objects may be added or deleted during the presentation, causing the traffic to be greatly time-variable even with a limited set of media types.
2. The user is allowed to interact with the scene, adding to the problems mentioned in 1:
   a. Addition and deletion of objects (streams) is possible.
   b. The traffic/rendering specification of an object may change due to user interaction.

To address these difficulties, the MPEG-4 standard has defined tools and frameworks that are used in the design of the proposed system.

6.2 MPEG-4 QoS Notion

The MPEG-4 standard seeks to tackle the specific QoS-aware delivery problems mentioned in the previous section by defining a set of QoS functions and parameters for DMIF. Since DMIF is only an integration framework, these problems are addressed only in a semantic manner. As a result, mapping the DMIF QoS model to a network QoS architecture is a complicated challenge that must be examined separately for each specific network QoS technology. This section gives an overview of the DMIF QoS model.
On one hand, there are proactive networks that allocate resources during connection set-up. On the other hand, reactive networks that react according to the available resources during the lifetime of a connection.  In order to  accommodate both types of networks there is a need of three types of DAI primitives: •  REQUEST OF QOS o These primitives are already specified in DMIF Ver.l (i.e., DAI_ChannelAdd). The QOS parameters are specified inside the ChannelDescriptor in terms of: Priority (qualitative QOS), Profile (quantitative QOS in terms of delay and loss), and Traffic (amount of resources in terms of bit rate) These features are now fully implemented in our software package.  •  MONITORING OF QOS o These primitives are specified in DMIF Ver.2 (i.e., DAI_ChannelMonitor, DAI_ChannelEvent). They provide feedback mechanisms from the DMIF layer to the Application to inform about the underlying available resources.  •  RENEGOTIATION OF QOS o These primitives are not included in DMIF, and may (shall) be added in a future amendment of DMIF (e.g., DAI_ChannelReneg) to allow changing the ChannelDescriptor of an already established channel in a DMIF session. My proposal for this primitive is discussed in section  85  The ChannelAdd primitive operation has been described in section 2.2.3, Figure 2-7. The semantics of the DAI_ChannelAdd, DAI_ ChannelMonitor, and DAI_ChannelEvent primitives are given in APPENDIX A. A more detailed description is found in [5]. DMIF specifies the QoS traffic parameters for a given stream at the DAI (e.g., bitrate, maximum packet size, etc.). QoS performance requirements (e.g., delay, loss probability, etc.) are also delivered to DMIF through the DAI. Channel creation operation in DMIF uses this information to create Transmux channels. It also delivers this information to the DMIF peer as DMIF parameters. This part of standard, however, does not specify how these parameters are translated to native network QoS parameters. 
As a result, the DMIF QoS parameters are very generic. The next section discusses the QoS-related parameters that the current DMIF uses as well as the aforementioned primitives (i.e., DAI_ChannelAdd, DAI_ChannelMonitor and DAI_ChannelEvent). DMIF QoS Parameters: To facilitate QoS information conveyance from the Application layer to DMIF sub-layers, "ChannelDescriptor" and "qosDescriptor" are defined and used by DMIF. ChannelDescriptor is the parent of qosDescriptor as it contains all the information that qosDescriptor has. ChannelDescriptor is generated outside of DMIF layer, where QoS traffic specification and performance requirements are available. ChannelDescriptor at the DMIFApplication Interface carries all the information required for channel configuration. This includes in particular Quality of Service parameters for both profile (e.g., loss, delay, priority, etc.) and bandwidth requirements (e.g., peak bit rate, average bit rate, etc.) for an individual Elementary Stream. The qosDescriptor parameter is extracted from the ChannelDescriptor parameter to be used in the internal operation of DMIF at the DNI level. In another word, qosDescriptor is used for communication between DMIF peers as well as DMIF sub-layers (through the DNI). In DMIF operation, when a channel is going to be added to the presentation, All the information about channel configuration and stream QoS is put in the ChannelDescriptor and delivered to DMIF through DAI_ChannelAdd primitive. The information carried through the DAI in the ChannelDescriptor can be further delivered across the DNI (DNI_ChannelAdd[ed]) and be mapped in the DMIF Default Signaling Protocol (DDSP). 86  Inside the DMIF layer ChannelDescriptor is processed and qOsDescriptor is generated. DNI_TransmuxSetup is the (only) DMIF primitive that uses this descriptor to deliver information through the DNI to DMIF network access sub-layer and the DMIF peer. 
DMIF enables the aggregation of multiple Elementary Streams, which share the same QoS profile, into a single TransMux, and computes the aggregate bandwidth. In this case qosDescriptor contains aggregate traffic QoS description. DMIF then maps the QoS requirements (i.e., QoS profile and aggregated bandwidth requirements) for a particular TransMux into specific network QoS (e.g., traffic contract). Table 3 DMIF QoS_QualifierTags  QoS_QualifierTag  Semantic Description  x  PRIORITY  Priority for the stream  MAX_PDU_SIZE  Maximum size of a PDU, as delivered over the Transmux  AVG_BITRATE  Average bit rate measured over an observation time window  MAX_B ITR ATE  Maximum bit rate measured over an observation time window  MAX_DELAY  Maximum delay experienced by any PDU of an Elementary Stream measured over an observation time window  AVG_DELAY  Average delay experienced by the PDUs of an Elementary Stream measured over an observation time window  LOSS_PROBABILITY  Allowable probability of loss of any single PDU of an Elementary stream measured over an observation time window  JITTER.TOLERANCE  Maximum delay variation experienced by any PDU of an Elementary Stream measured over an observation time  The qosDescriptor shall be able to carry a number of QoS metrics. The set of metrics currently defined and their semantics is provided in Table 3. The values that must be used to identify the Qualifier tags is given in [5]. The ChannelDescriptor shall be able to carry a number of channel parameters, enough to create QoS-aware channels. These include all the QoS metrics defined in Table 3 above; as well as the set of parameters for other purposes. The set of parameters currently defined in addition to the QoS metrics is provided in Table 4. SYNC_GROUP and SL_CONFIG_HEADER are the only  87  two parameters that the current DMIF (Ver.2) adds to qosDescriptor set to make the channel Descriptor. Further information about these channel qualifiers can be found in [4] and [5]. 
In Table 4, two user-defined fields can be seen that are explicitly defined for the RTP and UDP delivery cases. In our implementation of the "MPEG-4 over RTP" system, we had to define and use these user-defined fields. This addition could also be considered in future amendments of DMIF.

Table 4 DMIF defined Channel_QualifierTags

Channel_QualifierTag               | Value     | Semantic Description
-----------------------------------+-----------+----------------------------------------------------------
Reserved                           | 0x00-0x70 | As for QoS metrics
SYNC_GROUP                         | 0x71      | Identifies the timeline associated to a channel
SL_CONFIG_HEADER                   | 0x72      | Describes the MPEG-4 Sync Layer packet header format
Reserved                           | 0x73-0x7f | ISO/IEC 14496-6 reserved
User defined: CHANNEL_UDP_DPLANE   | 0x87      | User private (used in our implementation): requests the use of UDP in the data plane
User defined: CHANNEL_RTP_DPLANE   | 0x88      | User private (used in our implementation): requests the use of RTP in the data plane
User defined                       | 0x80-0xff | User private

6.3 Proposed QoS-Aware System Architecture

This section describes a proposal for the architecture of a QoS-aware MPEG-4 streaming system. This design seeks to tackle the shortcomings of DMIF in presenting a complete framework for QoS-aware streaming of MPEG-4. Naturally, other designs could fulfill the needs of a QoS-aware streaming system, but the design proposed in this dissertation benefits from some unique features that ease the implementation effort. This claim is supported by the fact that the new design is based on our implementation of an MPEG-4 streaming system, described earlier in this thesis. The proposed system architecture has been named QoS-aware for reasons that are explained here.
An example of such a system is an IntServ network that can provide a flow with a bounded delay delivery service. The term "QoS-aware" is usually applied to a system that takes advantage of the quality services provided by a QoS-enabled system. A QoS-aware system must be able to communicate with the QoS-enabled system in order to benefit from the provided QoS. There, however, does not exist a solid definition for QoS-enabled and -aware systems in literature, and a system that is both capable of providing some levels of QoS and able to take advantage of other systems' QoS may be called QoS-aware. The Server Architecture that is proposed in this thesis is not claimed to be QoS-enabled as it does not address the file access and processor sharing matters; nevertheless, from the network access perspective it is fully QoS-enabled and QoS-aware. Therefore, I cautiously chose the term QoS-aware to refer to the proposed system, despite its capability to perform some operations to guarantee QoS in the server itself.  6.3.1 System Architecture Chapter 3 discussed the system architecture that has been used in our implementation. That architecture included all the necessary objects and modules to support a standard-compliant DMIF implementation. For this reason the previous architecture, shown in Figure 3-2, is still valid for the new design to some extent. The new design is an evolution of the previous architecture; the overall structure is almost kept intact. The simple relations between modules are modified and enhanced. The internal functionality of most of the modules is also modified to reflect QoS awareness or capability. To have a better understanding of the new architecture, it is recommended that the reader refer to Figure 3-2 prior to examining the new design. The proposed architecture includes a new design for DMIF implementation and some additional parts in the delivery layer that are not part of DMIF. 
Figure 6-1 depicts some parts of the new architecture in detail (do not compare the old and new architectures using the depicted figures as 89  they emphasize on different parts of the system). It must be noted that this figure is a simplified depiction of the system architecture and does not reflect all the details. Sync Layer Data Plane  FlexMux Scheduler  TxMux Scheduler (WFQ)  H  Data Plane Manager RSVP . Daemon  RTP Daemon  RTP Daemon  ii i  II .  D.MIl Sij-naliiij;: DDSI' (JoS  Monitoring  DDSP  Control Plane  \-  ES Data Channel  Rate Control or Primary Traffic Shaping  RSVP Daemon  me II  D M I KS i g n u l i n g : DDSI' QoS Monitoring  MPEG-4 Server Application  Figure 6-1 A High Level View of the New System Architecture  The new system architecture emphasizes more on the Data plane rather than the Control plane. This is mainly because of the complexity that is introduced to the data plane in a QoS-aware environment. The control plane is very similar to the one discussed in Chapter 3 and Chapter 4; as it has been emphasized at the beginning of this section, the old design modules are still functional in the new design. To facilitate QoS-awareness, some signaling protocols such as RSVP must be adopted in the control (and data) plane. By using RTP for the data transport, some QoS monitoring reports are available through RTCP. These reports are necessary for realization of DMIF QoS monitoring functionality. In contrast to the former architecture, in this new design it is not easy to separate control plane and data plane completely, therefore the boundaries of these planes shown in the Figure 6-1 may not represent the boundary of objects or modules in an implementation. 90  The proposed system architectures (Standard and QoS-aware) are in fact client/server architectures. From the previous chapters it has become obvious that the server implementation is more complex than the client's (the delivery layers are compared). 
Therefore, most of this chapter discusses the matters related to the Server design and implementation. The next two sections describe the architectural design of the Control and Data planes of the proposed system.  6.3.2 Control Plane The control plane of the new design is very similar to that of the old one. In fact the old design is almost completely preserved in the new architecture and only some new parts are added to it. Figure 6-2 depicts the new design for the server with emphasis on the control plane part. Although the view of this design is not in general very different from the previous one, each module of it is redesigned to facilitate QoS-awareness. SS4  •D > O  Application Service Manager  O —I  '£  :  i  SS3  SS2  !§1§?811  CD  to  1 Service Session Object Id= SPI  DAI IService4.  c o  Service3  TO  DMIF Service Manager  Service2 FlexMux Scheduler  llllill  DMIF Service Object Idl  DNI  m  Network Session Manager  o o <  2  NS3  NS2  Network Session Object (ns id 1)  QoS control and event monitoring module  S.  o -  1  TransMux Scheduler ;PN,Daemon: Listening fsecpndary«threads¥""  •5  F.S prouder:  LS  MP4 hie reader »  pro\ider JLMcdia . Real-Tirrj  o  Q. Q. < 01  e CO  MPEG4 Encoder  iii  • RTP daemon^RTCR? listenineithreadft^:  .RSVP'daemon?  Figure 6-2 New QoS-Aware Server Architecture (Control Plane emphasized)  The control plane in the client architecture is not different from the previous design depicted in Figure 3-4. The only difference is again in the design of each module or object that has to accommodate QoS-awareness. For example, the Network Session Objects have to implement the required functionality to communicate with the RSVP daemon that is presumably available in the system (e.g., by the operating system). 91  In the control plane, QoS-awareness is introduced by adopting a few new modules such as RSVP daemon and QoS monitoring/control modules. 
It must be noted that the RSVP daemon (also known as RSVP process) may exist separately from the rest of the system. For example, some operating systems provide this module and present its functions to other applications through an interface. In this case the RSVP daemon does not need to be included in the architecture. Schedulers and Traffic shaper modules are data plane modules that must be adopted in order to provide QoS capability in the data plane. These modules are not part of the control plane; however, some of them are shown in Figure 6-2 to emphasize their position in the architecture. It must be noted that Figure 6-2 does not show all the necessary modules for a real implementation, in fact the emphasis of this figure is the control plane modules and their relationship. The few next sections discuss other issues that the control plane has to attend to. Control Plane structural design The control plane of the proposed system basically uses the same architecture the previous version does. This section reviews this architecture from QoS perspective, mostly for the server. Each layer of this architecture is explained in this section. Server Application Layer The Server application layer is the core module of the system. It is clear from Figure 6-2 that this layer is not a part of DMIF and in fact lays on top of DMIF and provides the final processing for user requests received by DMIF. Since this layer is not delivery aware, there is no notion of transport related QoS (RSVP processing). Similar to the first (standard) design, service sessions are represented by dedicated objects that operate in the same manner as their predecessors. Data Channels in this layer correspond to the individual elementary streams and are fed by the elementary stream pump (e.g., file reader) with media data. The implementation of data channel must include MPEG-4 Sync Layer functionality. Packetization of access units into SL packets is implemented in these objects. 
Rate Control is another functionality that Data Channels provide. The Rate Control module must be implemented in the same way that Chapter 5 introduced. DMIF Service Layer The DMIF Service layer is mainly an abstraction layer; it also acts as DDSP manager when it invokes DDSP primitives in its underlying layer. From the control plane point of view, the DMIF 92  Service objects in this layer are similar to those in the old design. The implementation of DMIF primitives, however, must address the QoS monitoring and signaling features that the new design requires. From the data plane perspective there are also some improvements. Introducing a simple scheduler helps solving possible contentions that may happen while a FlexMux channel is being accessed by more than one stream. This matter is explained in the next section. FlexMux is implemented in the DMIF layer, therefore it is the DMIF layer that decides about multiplexing elementary streams into FlexMux channels, reserving new resources for an existing FlexMux channel or not using FlexMux for a specific stream, etc. All these functionalities must be implemented in the DMIF service Layer. However most of the information that is needed to perform this operation is available only by interacting with the lower layer, DMIF Network Access Layer. Matters associated with the FlexMux are later discussed in section DMIF Network Access Layer Providing QoS awareness and capabilities in the DMIF Network Access Layer (DNA layer) is by far the most complicated job in the effort to design a QoS-aware MPEG-4 server. Network Session objects in this layer are responsible for providing all the functionalities necessary for providing QoS-aware services; in addition they are responsible for normal DMIF signaling process. Utilizing RSVP (and RTP) in this layer in addition to previously used protocols, TCP and UDP, requires additional management effort. Issues associated with the usage of RSVP in the DNA layer are discussed in section 6.4. 
Since this layer is in direct contact with the network transport layer, it will be the last place for the server to influence the delivery of streams to the client. The function calls to send packets end in this layer; therefore for blocking calls the scheduling technique used in this layer may affect other layers' performance. Traffic shaping is also done in this layer for each individual Transmux channel. This matter is discussed in section Control Plane end-to-end QoS signaling protocol The very first need of a QoS aware system is QoS signaling. For the proposed system RSVP is used as the signaling protocol. This would be the very natural choice since IntServ is the presumed access network. Mapping DMIF QoS to RSVP notations and operation is of special importance and is discussed in a separate section, section 6.4. Adopting RSVP as the end-to-end  93  QoS signaling protocol obliges the use of RSVP host model. The following explains how this model is included in the proposed architecture. Including the RSVP host model in the proposed architecture In addition to the RSVP protocol process, the RSVP host model includes several QoS related modules such as Admission Control, Policy Control, Packet Classifier and Packet Scheduler. The RSVP process is assumed to be available to the system through the operating system or other non-DMIF means, thus it does not fit in the DMIF layer. Packet Schedulers are explicitly shown in the proposed architecture, while packet classifier is presumably implemented in the network interface layer, below DMIF layer, in the operating system. During reservation setup, an RSVP QoS request is passed to two local decision modules, "admission control" and "policy control". Admission control determines whether the node has sufficient available resources to supply the requested QoS. Policy control determines whether the user has administrative permission to make the reservation. 
Admission and Policy Control modules are not explicitly shown in Figure 6-2 or Figure 6-1. In fact the "QoS Monitoring and Control" module of the DMIF layer can act as the admission and policy control unit. The information that these two decision modules require may not be available in the DMIF layer only. Therefore their implementation may require some additions to the DAI interface or even addition of some other modules to the other layers. Assuming that the delivery constraints are the only factors to affect the admissibility of a client reservation, the "QoS Monitoring and Control" unit is the recommended module for realizing admission and policy control functions.  6.3.3 D a t a P l a n e  The data plane of the proposed system, as seen in Figure 6-1, spans several layers. A better visualization of the data plane in the server architecture is given in Figure 6-3. This depiction, however, is not very precise; it is only used for illustrating the relationship between data and control plane modules. In the proposed system, for each user one instance of the data plane is created (in each layer, the corresponding object). A typical, more precise view of the data plane is depicted in Figure 6-4. This figure shows data plane structure in a case of two users. The reader can easily generalize this depiction of the architecture. 
[Figure 6-3: Data Plane in the Server Architecture — server application (service session with Sync Layer data channels), DMIF Service layer (DMIF Service, FlexMux, FlexMux scheduler), DMIF Network Access layer (network session, TransMux channels, TxMux scheduler resolving contention between different TransMuxes, RTP and RSVP daemons), and the network transport]

This section describes the data plane design by dividing the data plane functions into three sets:

•  Data Transport and Quality Feedback
•  Multiplexing and Scheduling
•  Traffic Shaping and Rate Control

Data transport, quality feedback

The proposed system utilizes RTP as its primary transport protocol for delivering media data over the Internet (the method of incorporating RTP into the system was discussed in Chapter 4). It is, however, recommended that TCP be supported too, as some streams may prefer reliable delivery of data over timely delivery; this is particularly the case for scene and object descriptor streams. In simple presentations with minimal delivery requirements (for example, presentations that do not require inter-media synchronization or QoS monitoring), RTP can be replaced with UDP (as it is in our first implementation).
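A data plane supporting several stacks needs a per-channel dispatch. In our system, the choice is signaled through the user-defined ChannelDescriptor qualifiers of Table 4 (0x87 for UDP, 0x88 for RTP); the dispatch function below is an illustrative sketch of that selection, with TCP assumed as the reliable default.

```python
CHANNEL_UDP_DPLANE = 0x87  # user-defined tag: use UDP in the data plane
CHANNEL_RTP_DPLANE = 0x88  # user-defined tag: use RTP in the data plane

def select_transport(channel_qualifiers):
    """Map the qualifier tags present in a ChannelDescriptor to the
    transport stack the data plane should instantiate."""
    if CHANNEL_RTP_DPLANE in channel_qualifiers:
        return "RTP/UDP"   # RTP data channel, RTCP feedback available
    if CHANNEL_UDP_DPLANE in channel_qualifiers:
        return "UDP"       # plain UDP, no RTCP reports
    return "TCP"           # default: reliable delivery (e.g., descriptor streams)
```

The data plane would call this once per channel, at channel-add time, and instantiate the matching daemon or socket.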
In order to avoid congestion, and to help recover from it, we employed a simple rate control scheme that responds to increases in jitter and delay by reducing the transmission rate of the streams that are resilient to rate distortion.

Choosing the protocol stack

DMIF does not specify how data is transported over the network or which transport protocol is used. It is up to system developers to implement a scheme that chooses the appropriate protocol stack for media data transport and exploits it for data delivery. MPEG-4 does not specify how the application layer informs the delivery (DMIF) layer of its intended transport protocol. This was one of the points of ambiguity we encountered when implementing a version of the data plane that could use both UDP and RTP as its data transport protocols. We had to use some of the user-defined parameters in ChannelDescriptor to tell the data plane which protocol shall be used for the associated channel (see section 4.2.3 to learn how this feature is implemented). Table 4 includes our modifications to the DMIF Channel_Qualifier_Tags.

Multiplexing and scheduling (server side)

A QoS-aware system has to treat multiplexing and scheduling issues with more care. Multiplexing and scheduling may be performed in several ways; Figure 6-4 shows a generic solution. It must be noted that the client has fewer tasks to perform, and most of the work is done at the server. This section views the multiplexing/scheduling issue from the server's perspective.

Shared Resources

Each MPEG-4 service session corresponds to a single presentation that may in turn be made up of several objects. These objects, delivered in a streaming manner, share the server processing power, some buffers/shapers, and the network connection. It is necessary to adopt scheduling techniques in order to share these resources efficiently and fairly.
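Returning to the protocol-stack selection above: the actual user-defined Channel_Qualifier_Tag values used in our implementation are listed in Table 4 and are not reproduced here, so the tag number, value encoding, and names in this sketch are invented purely to illustrate the idea:

```python
# Illustration only: a hypothetical user-private Channel_Qualifier_Tag that
# lets the application tell the data plane which transport to use for a
# channel. The tag number (0x80) and value encoding are NOT the ones from
# Table 4; they are placeholders for this sketch.

TRANSPORT_TAG = 0x80  # hypothetical user-defined qualifier tag

TRANSPORT_BY_VALUE = {
    0: "UDP",  # simple presentations: no inter-media sync or QoS monitoring
    1: "RTP",  # default: timing, sequencing, RTCP quality feedback
    2: "TCP",  # reliable delivery, e.g. scene/object descriptor streams
}

def select_transport(channel_qualifiers):
    """Pick the transport for a channel from its qualifier-tag dictionary,
    defaulting to RTP when no preference is present."""
    return TRANSPORT_BY_VALUE.get(channel_qualifiers.get(TRANSPORT_TAG, 1), "RTP")
```

The point of the sketch is only that the choice can ride inside the existing ChannelDescriptor, so no new DAI primitive is needed.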
In this document we only discuss the problems and solutions for scheduling access to the server's network connection and shared buffers; the processing power is assumed to be ample and well distributed amongst the objects that need it.

Figure 6-4 Typical View of the Data Plane, Hybrid Mode (scenario 1)

Layers of Multiplexing/Scheduling

Schedulers are always implemented alongside multiplexers. As can be seen from Figure 6-3 and Figure 6-4, two sets of mux/schedulers are considered for the proposed design. The reason for having two sets of multiplexer/schedulers is that MPEG-4 presentations usually have a (large) number of objects, and it is necessary to divide the load of mux/scheduling between two layers. Multiplexing elementary streams reduces the system overhead significantly.

FLEXMUX/SCHEDULER

The first set of multiplexer/schedulers, called the FlexMux/Scheduler, uses the optional MPEG-4 multiplexing tool, the FlexMux; it exploits a simple scheduling technique as well. The job of this multiplexer/scheduler is to multiplex elementary streams that belong to the same presentation into FlexMux channels. Therefore the number of mux/schedulers in this layer will be equal to the number of presentations being served at the same time (Figure 6-4).

TRANSMUX/SCHEDULER

Outgoing streams have to share the same connection at the network connection point. This is where the TransMux multiplexer plays its role. A scheduler is associated with this multiplexer to help utilize the shared resource, i.e., the network connection. The TransMux channels that carry the (FlexMuxed) elementary streams are multiplexed into the network connection at this multiplexer. These TransMux channels may belong to different presentations, though some of them could have come from the same presentation.
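The FlexMux layer described above can use the tool's simple mode, which frames each SL packet with a one-byte FlexMux channel index and a one-byte length (the 8-bit length field limits a simple-mode payload to 255 bytes). A sketch of this framing, with helper names of our own choosing:

```python
import struct

# Sketch of MPEG-4 FlexMux "simple mode" framing: each FlexMux packet is a
# 1-byte channel index, a 1-byte payload length, then the payload. SL packet
# sizes are assumed to be configured to fit the 255-byte limit.

def flexmux_pack(index, sl_packet):
    """Wrap one SL packet for the given FlexMux channel index."""
    if not (0 <= index <= 255 and len(sl_packet) <= 255):
        raise ValueError("simple mode: index and payload length must fit in one byte")
    return struct.pack("BB", index, len(sl_packet)) + sl_packet

def flexmux_unpack(stream):
    """Demultiplex a concatenation of simple-mode FlexMux packets into
    (channel index, payload) pairs."""
    pos, out = 0, []
    while pos < len(stream):
        index, length = stream[pos], stream[pos + 1]
        out.append((index, stream[pos + 2:pos + 2 + length]))
        pos += 2 + length
    return out
```

Because all streams of one presentation share one FlexMux channel number space, the index alone suffices to demultiplex on the client side.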
Regardless of the number of presentations, only one scheduler is needed at this level; Figure 6-4 illustrates this. Since we assumed that the QoS access network would be an IntServ network, the last scheduler, which schedules several TransMux (transport) channels for transmission over the network, must implement a scheduling algorithm that provides per-flow guarantees. A Weighted Fair Queuing (WFQ) scheduler provides the per-flow guarantee required by the streams at this level. The FlexMux scheduler, however, does not need to be as powerful as the TransMux scheduler and can implement a simpler scheduling technique such as Priority Queuing (PQ).

Multiplexing/Scheduling Scenarios:

It is possible to remove the FlexMux scheduler if we do not use the FlexMux feature of DMIF. This in effect moves the load of the omitted scheduler to the TransMux scheduler (scenario 2); Figure 6-5 depicts this method. Another alternative is to multiplex all elementary streams of each presentation into one TransMux channel. In this case we still need both schedulers, but the TransMux scheduler faces its minimum possible load. Figure 6-6 shows the typical view of the data plane for this case. Selecting the multiplexing mode can be done dynamically, which allows the highest efficiency in the system. Multiplexing elementary streams of different types (image and audio, for example) is not a good idea: the traffic specs of their aggregate stream may make resource reservation impossible, even though the individual streams might have been able to proceed with resource reservation separately. This suggests that multiplexing must be limited to same-class traffic, and performed only when RSVP allows it and the aggregate traffic does not violate the previously set commitments.
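A minimal sketch of the WFQ discipline suggested above for the TransMux scheduler: each packet receives a virtual finish time F = max(V, F_prev) + size/weight and packets leave in F order. Real WFQ derives the system virtual time V from the fluid-flow reference model; here V is crudely approximated by the finish time of the last packet sent, so this is an illustration of the per-flow weighting, not a deployable scheduler:

```python
import heapq

class WFQ:
    """Toy weighted-fair-queuing scheduler (finish-time approximation)."""

    def __init__(self):
        self.heap = []          # entries: (finish_time, seq, flow, size)
        self.last_finish = {}   # per-flow F_prev
        self.V = 0.0            # approximated system virtual time
        self.seq = 0            # FIFO tie-breaker for equal finish times

    def enqueue(self, flow, size, weight):
        # F = max(V, F_prev(flow)) + size / weight
        start = max(self.V, self.last_finish.get(flow, 0.0))
        finish = start + size / weight
        self.last_finish[flow] = finish
        heapq.heappush(self.heap, (finish, self.seq, flow, size))
        self.seq += 1

    def dequeue(self):
        finish, _, flow, size = heapq.heappop(self.heap)
        self.V = finish
        return flow, size
```

With equal packet sizes, a flow of weight 2 drains roughly twice as fast as a flow of weight 1, which is the per-flow proportional guarantee the IntServ access link needs.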
Figure 6-5 Omitting the FlexMux/Scheduler (Scenario 2)

Figure 6-6 Maximal Use of FlexMux (scenario 3)

When is multiplexing allowed?

Multiplexing elementary streams into FlexMux channels is one of the optional features that the MPEG-4 standard offers. In a QoS-aware system, multiplexing multiple streams into one channel may cause some problems. If flows with different kinds of QoS requirements are multiplexed, the QoS requirements of the aggregate traffic will be maximized in each aspect to fulfill the requirements of all flows. Even if the QoS requirements of all flows are similar, multiplexing different kinds of traffic may still cause problems: if the specifications of the aggregate traffic, in terms of traffic shape, do not meet the requirements of the set of provided QoS services, the traffic may be policed and not receive the required treatment. For this reason, multiplexing elementary streams into one TransMux channel is only allowed for streams of the same type or traffic shape. In some cases, even the addition of new streams of the same type to an existing TransMux channel may violate its bandwidth commitments. These concerns suggest keeping the use of the FlexMux feature to a minimum. Nevertheless, if QoS renegotiation facilities are available, it may be possible to accommodate new channels in existing TransMux channels.

Table 5 Modes of Multiplexing

•  Using FlexMux, one channel for each presentation: easy session management; however, QoS requirement changes are not well addressed, QoS management is hard, and QoS renegotiation is required.
•  Using FlexMux, each class uses its own FlexMux channel (same-class aggregation): better QoS management but harder session management; allows integration of traffic, reducing the number of flows; the addition of new channels to a class MAY change its QoS requirements, hence requiring QoS renegotiation.
•  Using FlexMux, creating new FlexMux channels for new streams of an existing class: does not require QoS renegotiation for the existing channels, but adds network overhead.
•  No FlexMux, each stream has its own TransMux channel: high network and server overhead, but a better chance for each flow to get its required QoS.

Another problem usually associated with the use of multiplexing, and in general with the aggregate traffic of a fairly small number of streams, is that modeling and predicting the traffic shape is not easy. On the other hand, by not using a multiplexer, there may be too many end-to-end TransMux connections, which introduce large network overhead and may even cause a shortage of ports at the server. (It must be noted that the Filter Spec used in resource reservation protocols usually uses the port number as part of the flow identifier; therefore it is not easy to share ports amongst several TransMux channels at the server.) Table 5 summarizes how and when the FlexMux may be used for multiplexing elementary streams, along with the pros and cons of using the FlexMux tool.

Scheduling in the first version of the design (best-effort version): Blocking Calls

According to the old design (first version, no QoS), the data channel objects of the server application layer are only relaying modules. The ES resource provider calls their data plane primitives and provides them with SL packets that are later delivered to the DMIF layers using blocking calls. The use of blocking calls simplified the implementation, and there was no need for buffering packets in any layer; scheduling is also done automatically by the operating system. Obviously this approach cannot provide any guarantees and is totally QoS-unaware. The use of blocking calls is therefore limited in the new design and is only allowed through the DMIF service layer.
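Returning to the aggregation constraint argued above, that a new stream may join an existing same-class TransMux channel only if the aggregate still fits the channel's reserved resources, a conservative admission check might look like the following sketch; the dictionary-based TSpec representation is ours:

```python
# Sketch of a same-class aggregation check. Per-stream specs follow the
# IntServ token-bucket convention: token rate 'r', bucket depth 'b', peak
# rate 'p'. Summing r, b and p over the streams gives a safe (conservative)
# bound on the aggregate, so the check never over-admits.

def fits_reservation(existing, new, reserved):
    """existing: list of per-stream specs already in the TransMux channel;
    new: the candidate stream's spec; reserved: the channel's committed spec.
    Returns True if the aggregate still fits the reservation."""
    agg = {k: sum(s[k] for s in existing) + new[k] for k in ("r", "b", "p")}
    return all(agg[k] <= reserved[k] for k in ("r", "b", "p"))
```

When the check fails, the choices are exactly those in Table 5: open a new channel for the stream, or trigger QoS renegotiation for the existing one.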
There are two sets of buffers before each scheduler to eliminate the need for blocking calls: one in the DMIF Network Access layer and another in the DMIF service layer (or in the server application layer, depending on where the FlexMux is implemented).

A bad feature of DMIF: sharing network sessions between service sessions

Sharing network sessions among service sessions is an optional feature of DMIF that is also supported by our implementation; this feature, however, is harmful for scheduling and per-flow QoS control purposes. Since it has almost no practical value and only adds to the system's complexity, the author suggests that implementations of DMIF not use it. This way, implementing per-flow (per-TransMux) QoS control schemes will be easier.

Rate control and traffic shaping

Rate control and traffic shaping are two other issues that the data plane must deal with. The effects of these two functions on the outgoing traffic sometimes overlap: the rate controller adjusts the overall rate of transmission (frames per second), while the traffic shaper handles the small-scale characteristics of the traffic, such as its burst size. Rate control is a mandatory task for the system, while traffic shaping is optional. These issues are discussed below.

Rate Control

Rate control is a necessary element of any streaming system. The rate controller of our system uses the fast-start technique described in section 5.2.2. This rate controller is not in essence different from the one already implemented in the best-effort version of our software. The only consideration is that some measures must be taken to prevent the fast start from violating the previously set QoS commitments; for example, the peak rate of the data should not exceed the previously announced peak rate (in the TSpec). The traffic shaper, however, can adjust the rate and make sure that no QoS violation happens. This is discussed in the next section.
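The fast-start safeguard just described, clipping the boosted start-up rate at the TSpec peak rate, can be sketched as follows; the function name and the 1.5x boost factor are illustrative, not taken from the implementation:

```python
# Sketch of fast-start rate control with a TSpec safeguard: transmission
# starts above the nominal media rate to fill the client buffer quickly,
# but is clipped so it never exceeds the peak rate announced in the TSpec.
# The boost factor of 1.5 is an assumption for illustration.

def sending_rate(nominal_rate, peak_rate, buffered_sec, target_sec, boost=1.5):
    """Return the rate to send at: boosted while the client buffer holds
    less than target_sec of media, never above the advertised peak rate."""
    rate = nominal_rate * boost if buffered_sec < target_sec else nominal_rate
    return min(rate, peak_rate)
```

Without the `min(..., peak_rate)` clip, a fast start could be policed by the network for exceeding the reserved traffic envelope.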
In our implementation the rate controller was placed in the server application layer; for the proposed system it can still be implemented in the same layer.

Traffic Shaping

Although traffic shaping is optional for the system, we recommend the use of a token bucket traffic shaper in the DMIF layer, placed either before or after the last scheduler (the TransMux scheduler). To simplify the scheduling unit and achieve higher scheduling performance, it is suggested that traffic shaping be performed before scheduling. However, owing to an implementation consideration, namely the large overhead due to the large number of shapers required, we may not be able to do so; in that case a shaper must be placed after the scheduler. The shaper is not explicitly shown in Figure 6-4 and is hidden in the buffers before or after the TransMux/scheduler. Because IntServ uses the token bucket model to characterize traffic, we have chosen a token bucket shaper as the traffic shaper in our system. The next section, which examines the use of RSVP in DMIF, describes how DMIF QoS qualifiers are mapped to token bucket model parameters.

6.4 Incorporating RSVP in DMIF

DMIF, acting as the integrating framework, must incorporate RSVP to enable standard QoS signaling with the network. The reason for choosing RSVP as the QoS signaling protocol for this architecture is that RSVP is expected to become the dominant end-to-end QoS signalling protocol and is already the most popular one. IntServ (the Integrated Services QoS model), the likeliest QoS-capable service to be supported in the access network, also uses RSVP as its means of interaction with end-systems. This section first gives an overview of RSVP and then discusses the incorporation of RSVP in DMIF.
The overview of RSVP in this section is excerpted from the RSVP RFC and some other reviews of RSVP (available at

6.4.1 Introduction to RSVP

The RSVP RFC (RFC 2205) describes itself as a resource reservation setup protocol designed for an integrated services Internet (IntServ) [19]. It operates on top of IPv4 or IPv6, but is not considered a transport protocol; it is rather an Internet control protocol like ICMP, IGMP, or the routing protocols. The RSVP protocol is used by a host to request specific qualities of service from the network for particular application data streams or flows. Routers may also use RSVP to deliver QoS requests to all nodes along the path(s) of the flows and to establish and maintain state to provide the requested service. RSVP operates on simplex flows only, i.e., data channels created by RSVP are unidirectional. This is a natural choice for most multimedia and streaming applications, where media data is only sent in one direction. RSVP has been designed primarily for multicast applications, but since unicast is a particular case of multicast, RSVP can obviously be used for unicast sessions too. It is recommended that RSVP-enabled hosts and routers implement a set of mechanisms collectively called "traffic control" in order to be able to provide QoS. These mechanisms include (1) packet classification, (2) admission control, and (3) "packet scheduling" or some other link-layer-dependent mechanism to determine when particular packets are forwarded (this matter was discussed in section 6.3).
Admission control determines whether the node has sufficient available resources to supply the requested QoS. If admission control process accepts the reservation, parameters are set in the packet classifier and in the link layer interface (e.g., in the packet scheduler) to obtain the desired QoS. RSVP is a receiver-oriented protocol, i.e., the receiver of a data flow initiates and maintains the resource reservation used for that flow; senders are responsible for advertising the available flows. Further discussion on the objectives and general justification for RSVP design are presented in [19] and [20]. RSVP Sessions For each data flow with a particular destination and transport layer protocol, RSVP defines a concept that is called "session".  An RSVP session is defined by the triplet: (DestAddress,  Protocolld [, DstPort]). The "[ ]" means that DstPort is optional. DestAddress indicates the IP address that data packets must be sent to; it may be a unicast or multicast address. Protocolld is the IP protocol ID for the transport protocol that carries the stream. The optional DstPort parameter is a "generalized destination port"; it could be defined by a UDP/TCP destination port field, by an equivalent field in another transport protocol, or by some application-specific information. In our case where the TCP/IP protocol stack is used, this field could be TCP or UDP port. In the case where RTP is used, one of the UDP ports that correspond to the RTP session may be used. RSVP application interface A set of interfaces has been defined for RSVP. From these interfaces defined in pseudo-code format in RFC 2205, RSVP interface to application layer is the one we are interested in. How the proposed system must use this interface is one of the issues that is described in this thesis. The details of this interface are given in RFC 2205, i.e., the RSVP RFC. However, basic knowledge of RSVP suffices for understanding the semantics of the interface primitive. 
The terminology used to describe the use of the interface primitives is comprehensible to readers who are familiar with RSVP.

RSVP messaging operation

RSVP signaling, in conjunction with DDSP, is used by the proposed system for resource reservation. A very short description of RSVP messaging is given here to provide enough background for readers who are not already familiar with RSVP operation. To learn more about RSVP operation and messaging, readers are recommended to refer to RFCs 2205 and 2210. Figure 6-7 shows the typical RSVP messaging that takes place during the resource reservation process. Since the only scenario of interest is the unicast scenario, we discuss the RSVP messaging process from the unicast viewpoint. As depicted in Figure 6-7, senders start sending PATH messages toward the client as soon as they have information about a client that is willing to receive a specific flow from the sender. The process of obtaining client information is not shown in this figure, as it is not part of the RSVP signaling. When the client receives a PATH message, it responds with a RESV message that, according to RSVP, traverses the same path the PATH message has just taken. The actual reservation in each node happens when the RESV message passes through that node.

Figure 6-7 RSVP Operation

The information in the PATH message that is of interest to hosts is as follows:
•  The Sender Template describes the format of the data packets that the sender will generate. This template is in the form of a filter spec that could be used to select this sender's packets from others in the same session on the same link. The sender IP address and an optional UDP/TCP port number are the two information fields that sender templates usually deliver.
The TSpec does not include any QoS notion and only describes the traffic in bandwidth and bit-rate terms. The TSpec is used by traffic control to prevent over-reservation, and perhaps unnecessary admission control failures. A receiver uses the TSpec to decide whether it wants to receive such a stream or not. Sometimes the TSpec is delivered to the receiver even before the arrival of PATH messages, by some means other than RSVP (e.g., in DMIF messages), but it is still required to be included in the PATH message.
•  The ADSpec is a package of advertising information [20][21]. The ADSpec is used for collecting information about an established path and for end-to-end QoS verification purposes.

The information in the RESV message basically consists of flow descriptors that specify which flows of a certain session are requested. A flow descriptor in turn includes a Filter Spec and a Flow Spec, used respectively for sender selection and QoS requirement specification. A Flow Spec is composed of a TSpec (traffic specs) and an RSpec (reservation specs) that specify the requested resources and level of service (QoS service). The RESV TSpec and RSpec are described in more detail in section 6.4.3.

6.4.2 RSVP in Co-operation with DDSP

DDSP signaling and operation were described in Chapter 3 and Chapter 4. In this section the integration of RSVP messaging in DDSP is discussed. As seen in Figure 6-2, the server application layer is not aware of RSVP and only talks to DMIF for all its delivery inquiries. Therefore the only layer involved in the RSVP process is DMIF. As mentioned earlier in this chapter, when the MPEG-4 application requests the addition of a new channel from the DMIF layer, it specifies the QoS requirements in the channel descriptor parameter (QoS Descriptor). It is then the DMIF layer that contacts the RSVP process for resource reservation. Mapping application requests (made to DMIF) to DMIF requests (made to the RSVP process) is an issue that needs extensive discussion.
In this section only the overall messaging process is discussed; the technical and detailed study of parameter translation between DMIF and RSVP is given in section 6.4.3.

Before advancing in our study of the DDSP and RSVP messaging, it is necessary to repeat this point: control plane signaling is performed using only one duplex connection, and this document does not consider applying QoS techniques to that connection; it only concerns the data plane issues when QoS is involved. The sequence of operations that takes place to start a presentation is:
1. Acquiring information about the server and presentation URLs.
2. Performing DMIF control plane operations, service and network session setup, to establish a control plane session between client and server.
3. Creating data channels (in the data plane) for the scene composition description and object descriptor streams. Signaling is done in the control plane.
4. Creating data channels (in the data plane), based on the scene content, for the delivery of media data. Signaling is done in the control plane.

From previous chapters we know that DMIF opts for an out-of-band control plane for control signaling and DDSP messages. Since we only consider applying QoS matters to the data plane, creation of the DMIF control plane and its signaling connections (the first two stages) is not affected by the QoS-aware approach. But the part of signaling that concerns creation and manipulation of data plane channels (stages 3 and 4) needs to be revised to accommodate the use of RSVP and QoS awareness. It turns out, however, that the exchanged DDSP messages may not need to be modified; the modifications that must be applied to the channel creation procedure lie in specifying when, how, and by whom the RSVP process must be called for resource reservation (compare Figure 2-7 and Figure 6-8). From the DDSP signaling viewpoint, the procedure depicted in Figure 2-7 remains intact.
There are, however, profound changes to the implementation of some DDSP primitives inside each DMIF instance. The RSVP messaging that is used during channel creation must be placed somewhere in this procedure. Figure 6-8 is a revised version of Figure 2-7 that includes RSVP messaging. The channel deletion and termination process is very similar to this process, and the DMIF guidelines suffice for its realization; therefore it is not discussed here.

Figure 6-8 Accommodating RSVP Signaling in DDSP

Who originates the channel creation process?

According to DMIF, the channel creation process can be invoked by either peer in the DDSP communication (the DMIF peers, i.e., client and server). Because ours is a client/server approach, we have only considered the scenario in which clients originate the channel creation procedure. It would not, however, make much difference if the server originated the process; the only difference would be in the implementation effort, which becomes more complicated in that case. Since RSVP signaling is usually hidden from the application layer, all DMIF cares about is interacting with the RSVP process interface in order to initiate and co-ordinate the QoS signalling. Figure 6-8 elaborates on this course of actions. As explained, it is possible for channel creation to be started by the server instead of the client. This is also the case for the Transmux creation procedure.
Originating Transmux creation process from the server does not make any difference, and RSVP signaling still has to take place in the order shown in Figure 6-8, that is: after the completion of Transmux setup process and before starting the ChannelAdd process in the DNA layer. RSVP session registration: initiating PATH message transmission  This subject deals with pre-RSVP messaging period when RSVP process has not sent any PATH message yet. To trigger the RSVP process to advertise its data flows by sending out PATH messages, one need to register a session with RSVP process and specify the receiver address and some other information. The availability of this information is the key factor in determining when and to where PATH messages could be sent. The very first factor that must be known before RSVP can take any action is the destination (client) address. This address in the case of our system includes: client IP address, client port number (for data reception) and the transport protocol. These are not available to the sender until it receives the TransmuxSetup message from the client. When all the required information is at hand, the DMIF layer can register the session by calling "SESSION" primitive of the RSVP application interface (see RFC 2205 for further information [20]). Post-Registration events: Implementation guidelines for DMIF-RSVP  operation  When the session is registered with the RSVP process, the next step would be sending PATH messages back to the client (remember that we only consider the unicast case here). We know that PATH message is required to carry Sender_Template and Sender_TSpec. Sender template, described before, includes Server IP address and a TCP or UDP port number that the media data is supposed to be sent from. This suggests that a PATH message, of a specific flow, cannot be sent until the transport protocol port that is supposed to transmit flow packets is identified to the RSVP process. 
Since the sender itself can determine this port number, there is no further delay due to the Sender_Template. The Sender_TSpec is another field of information that PATH messages are required to carry. The information required for building the Sender TSpec is delivered to the sender by the pending ChannelAdd call (in the DMIF QoS descriptor), or it is available locally at the sender. Therefore a PATH message can be built just after receiving a ChannelAdd call, but before its completion. It is necessary that PATH messages be sent after the Transmux setup has been completed and both server and client are aware of the recently created connection. Figure 6-8 explains the sequence of events. To initiate the transmission of PATH messages, the DMIF instance must call the "SENDER" primitive of the RSVP-Application interface. Translating the DMIF QoS descriptor to a TSpec is studied in the next section. Upon receiving the response of the TransmuxSetup call, the client knows that the channel has been created and sends the pending DA_ChannelAdd call through the DN_ChannelAdded call. It then waits for the PATH message from the server. The sender does not reply to the DN_ChannelAdded immediately and instead starts sending PATH messages. When the RSVP process at the receiver receives a PATH message, it informs the client DMIF instance by performing an up-call to the "PATH_EVENT" primitive. The client can then issue a RESV message. Both client and server remain blocked while the resource reservation process proceeds, until a success acknowledgment or an error message is received. This means that the server will not respond to the DN_ChannelAdded request until it receives a "RESV_EVENT" up-call; it then acts based on the failure or success of the admission control check at the server and confirms or rejects the ChannelAdd process. At the client, the call to DN_ChannelAdded remains on hold until the RSVP process invokes the "RESV_CONFIRM" or "RESV_ERR" up-call. The client then acts based on this call (releasing the Transmux channel if RESV_ERR is called and failure in resource reservation is not tolerated) and on the response to the ChannelAdd request. Sending a RESV message from the client is initiated by a call to the "RESERVE" primitive. The information carried in the RESV message is derived from the DMIF QoS descriptors and includes the TSpec and RSpec in particular. The next section studies this conversion procedure in detail. According to the argument of this subsection, the only DMIF functions that need to be modified are the ChannelAdd(ed) and TransmuxSetup functions. The former must be modified in both the DMIF Service and DMIF Network Access layers; the latter only exists in the DMIF Network Access layer.

What if we don't need to create a new Transmux channel?

If for any reason the party calling channel add decides to exploit the FlexMux tool for a specific channel and use an existing Transmux channel, the procedure described above will be
It then acts based on this call (releases the Transmux channel if RESV_ERR is called but failure in resource reservation is not tolerated) and the response to ChannelAdd request. Sending a RESV message from client is initiated by a call to the "RESERVE" primitive. Information that is carried in the RESV message is derived from DMIF QoS descriptors. This information includes TSpec and RSpec in particular. Next section studies this conversion procedure in details. According to the argument of this subsection, the only functions of DMIF that need to be modified are ChannelAdd(ed) and TransmuxSetup functions. The former must be modified in both DMIF Service and DMIF Network Access layers; the latter only exists in the DMIF Network Access layer. What if we don't need to create a new Transmux channel? If for any reason the party that is calling channel add decides to exploit flexmux tool for a specific channel and use an existing Transmux channel, the procedure described above will be 110  repeated for the TransmuxConfig primitive instead of TransmuxSetup. If QoS renegotiation is not required and the previously reserved resources can accommodate the new channel the whole process of calling a Transmux primitive is removed. How are reserved resources released?  When a ChannelDelete request from the application arrives in the DMIF layer, the DMIF layer decides whether to release the transmux channel that is associated with it or to change its configuration. If the latter is chosen, the QoS renegotiation procedure has to take place, this process is described in section 6.4.4. If the former is chosen the TransmuxRelease function of the DNI will be called. This function on its turn has to release the reserved resources. A call to the "RELEASE" primitive of RSVP interface releases the reserved resources. Filter Spec: The selected RSVP Reservation Style  One of the parameters that RSVP process requires from an MPEG-4 application is the Filter spec. 
The format of RSVP filter specs depends on the reservation style. RSVP defines a reservation request as a set of options that are collectively called the reservation "style". The proposed system adopts the reservation style that suits its requirements: the Fixed-Filter (FF) style. The FF style creates distinct reservations for flows from different senders and secures them from other senders' interference; the Filter Spec in this case has to explicitly specify the sender's template. The FF style is also the natural choice for unicast applications that cannot share resources with other clients and servers. The reservation style is set by the receiving application that makes the reservation request. In practice this is done by calling the RESERVE primitive of the RSVP-Application interface. This primitive is described in RFC 2205, i.e., [20]; the parameters related to setting the reservation style are 'style' and 'style-dependent-parms'. The 'style' parameter indicates the reservation style, and the structure of the 'style-dependent-parms' parameter, as its name implies, depends on the chosen style; generally these will be the appropriate flowspecs and filter specs. To create the filter spec, we only need to set the style-dependent-parms of the RESERVE primitive to the sender(s) IP address(es) and port number(s). This information is available through the previous DMIF interaction (Transmux Setup).

6.4.3 Mapping DMIF and RSVP QoS Descriptors

DMIF QoS descriptors are set by the MPEG-4 application that requests the creation of a new channel from the DMIF layer. This parameter was described in section 6.2 and has a format different from the TSpec and RSpec defined by IntServ (RFC 2210). In this section, after a brief overview of the IntServ QoS descriptor parameters, the conversion of these parameters to the DMIF QoS notation is discussed.
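The objects that an FF-style RESERVE call carries, per RFC 2205 and RFC 2210, can be modeled as below. The class and field names are descriptive stand-ins for exposition, not the RFC wire format, and the addresses used in the test are examples:

```python
from dataclasses import dataclass

# Illustrative model of an FF-style reservation request: the session triple,
# one filter spec per selected sender, and a flowspec (TSpec + RSpec).

@dataclass(frozen=True)
class Session:
    dest_addr: str        # DestAddress (client, for unicast)
    protocol_id: int      # e.g. 17 for UDP
    dst_port: int         # generalized destination port

@dataclass(frozen=True)
class FilterSpec:         # FF style: identifies one specific sender
    sender_addr: str
    sender_port: int

@dataclass
class FlowSpec:
    tspec: dict           # token bucket terms: r, b, p, min policed unit, MTU
    rspec: dict           # service choice and, for Guaranteed, R and slack

def ff_reserve(session, sender, flowspec):
    """Build one distinct reservation per sender, as FF style requires."""
    return {"session": session, "style": "FF",
            "filter_spec": sender, "flowspec": flowspec}
```

In the proposed system the DMIF layer fills the filter spec from the TransmuxSetup exchange and derives the flowspec from the DMIF QoS descriptor, which is exactly the mapping this section develops.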
IntServ QoS notation (TSpec, RSpec and ADSPEC)

RSVP delivers QoS information in several of its data objects as opaque data. The format and syntax of these objects are defined in RFC 2210, which describes the use of RSVP in IntServ networks. The following summarizes the semantics of these objects; for a detailed description the reader should refer to the original RFC:

• The RSVP SENDER_TSPEC object carries the traffic specification (sender TSpec) generated by each data source within an RSVP session. The information delivered through this object remains unchanged while being transported in the network and is delivered to both end hosts and intermediate devices. The fields of this object that are of interest to us include:
o Peak data rate
o Minimum policed unit
o Maximum packet size
o Token bucket size
o Token bucket rate

• The RSVP ADSPEC object carries information from either data sources or intermediate network elements and flows downstream towards receivers. Elements inside the network and on the path of the stream may use and update it. Based on the type of service supported for a specific flow, the ADSPEC may include different sets of data fields; the default general format, used for both types of service, includes:
o IntServ hop count
o Path bandwidth estimate
o Minimum path latency
o Composed MTU
o Flags -> global break bit (if set, all information in the ADSPEC may be invalid)

The only ADSPEC data specific to the Controlled-Load service is the Controlled-Load break bit; therefore the Controlled-Load service requires no extra data field to operate properly.
The Guaranteed Service, however, has a complete set of extra information to be delivered by the ADSPEC:
o Ctot: end-to-end composed C (the rate-dependent error term)
o Dtot: end-to-end composed D (the rate-independent delay error term)
o Csum: since-last-reshaping-point composed C
o Dsum: since-last-reshaping-point composed D

The RSVP FLOWSPEC object (Receiver TSpec and RSpec), delivered in RESV messages, carries the reservation request information generated by data receivers. The information in the FLOWSPEC flows upstream towards data sources and is used in intermediate devices for reservation purposes. The format of the two objects of the FLOWSPEC depends on the chosen service. The chosen service is specified in the RSpec parameter, and based on the value of the RSpec the TSpec is set:
o For an RSpec indicating either the Controlled-Load service or the Guaranteed Service, the TSpec includes the following fields:
• Peak data rate
• Minimum policed unit
• Maximum packet size (MTU)
• Token bucket size
• Token bucket rate
o For an RSpec indicating the Controlled-Load service, there are no more data fields in the RSpec, but for the Guaranteed Service the RSpec contains two extra fields (described in RFC 2212):
• Rate (or bandwidth): the required guaranteed rate (R).
• Slack term: the amount by which the end-to-end delay bound will be below the end-to-end delay required by the application, assuming each router along the path reserves bandwidth R.

The service-specific values defined for the ADSPEC may be included in the RSpec too. In this case the RSpec will carry the Ctot, Csum, Dtot and Dsum data fields for the Guaranteed Service.

It is noticeable that the traffic specification (TSpec) used in RSVP messaging includes some "token bucket" parameters. For readers who are not familiar with the token bucket concept, the following section provides a brief overview.

Token bucket description of traffic

The token bucket concept was adopted by the IETF for traffic description in QoS systems (for QoS models and services).
Figure 6-9 shows the architecture of a token bucket shaper. A token bucket shaper, also referred to simply as a "token bucket", is usually used for regulating bursty traffic while still allowing some level of burst to pass (i.e., it does not eliminate the burstiness completely). Packets that arrive in the system can only leave it if there are enough tokens in the token bucket. If there are enough tokens, packets are sent immediately; otherwise they are buffered in the data buffer and wait for the token bucket to be replenished. This approach enables the shaper to pass some bursts of traffic. An extensive study of token buckets is out of the scope of this document; the keen reader may refer to [25].

[Figure: a regulator fed by a data buffer and a token bucket; tokens are placed in the bucket at rate ρ, the bucket capacity is β, and λpeak > ρ > λavg ensures stability and bandwidth utilization.]

Figure 6-9 Token-Bucket Traffic Shaping and Modeling

The token bucket parameters seen in the figure above are:
λ: data rate
ρ: token rate
β: bucket size

Now that all required parameters from DMIF and RSVP have been explained, the next section discusses how the DMIF and RSVP (IntServ) QoS parameters are mapped to each other.

Mapping DMIF and RSVP parameters

From the set of specification objects provided by the IntServ model, the TSpec and RSpec are of direct use and interest to DMIF. The ADSPEC contains important information that can be used by DMIF QoS monitoring functions. Since the channel addition procedure described before is of primary interest to us, we concentrate on mapping the TSpec and RSpec to the DMIF descriptors.

TSpec: Peak Data Rate

The peak data rate is simply mapped to the MAX_BITRATE parameter present in the DMIF QoS descriptor (QoS_QualifierTag). It is usually set to the maximum throughput of the link or the channel capacity (bitrate) "C".

TSpec: Minimum Policed Unit

The size of packets transmitted to an IntServ network must not be less than what this parameter specifies.
This parameter is used for overhead calculations in the IntServ network. It is important to note that flows with smaller packets are considered to have more overhead; therefore their reservation request is more likely to be rejected [21]. This parameter does not have a counterpart in the DMIF qualifier tag set, so it is the responsibility of the DMIF layer to assign a reasonable value to this IntServ parameter. For example, for an audio stream with a fixed packet size of 120 bytes delivered using RTP, the minimum policed unit will be 120 + 20 (IP header) + 8 (UDP header) + 12 (RTP header) = 160 bytes.

TSpec: Maximum Packet Size

The maximum packet size indicates the largest packet size that the traffic source generates. A reservation request is rejected if its maximum packet size is greater than the path MTU. The path MTU, the maximum transmission unit, is a link-layer restriction on the maximum number of bytes of data in a single transmission. In other words, it specifies the upper bound on the size of packets that can be delivered without IP fragmentation. Neither of the IntServ services (Guaranteed and Controlled-Load) allows packet fragmentation. A DMIF qualifier tag, MAX_PDU_SIZE, exactly represents this factor and can be directly mapped to the maximum packet size. Clearly, to make a successful reservation request one should set this value to be less than the path MTU. For this purpose, the end host must obtain the MTU somehow; RFC 2210 explains this matter comprehensively and offers the required methodology. A QoS-aware server shall then use the MTU to guide its MPEG-4 Sync Layer to perform the packetization in an acceptable manner and generate packets of size not exceeding the MTU.

TSpec: Token Bucket Rate

The token rate in a token bucket shaper determines the average rate of data passing through the shaper; therefore the token bucket rate cannot be less than the average data rate.
If the rate of transmission is more than the average data rate, some packets may have to wait in the data buffer, but there is no need for waiting if there are enough tokens in the token bucket. The following formalizes this argument:

Token Bucket Rate = Average Data Rate = AVG_BITRATE

TSpec: Token Bucket Size

Assigning the token bucket size is not trivial since there is no such parameter in the DMIF QoS notation. The token bucket size usually denotes the burst size that the shaper accepts; bursts of data greater than this value will be reshaped. For our streamed data, the maximum burst size is the maximum access unit size (MAX_AU_SIZE). Access units are packetized into SL packets, and SL packets that belong to the same access unit are sent consecutively. If the token bucket size does not allow all of these SL packets to pass successively, some of them will be delayed until enough tokens have gathered to let them pass; in other words, the burst is reshaped. So the token bucket size can be set to a value ranging from MAX_SL_SIZE (the maximum size of Sync Layer packets) to MAX_AU_SIZE. It is possible to use values greater than MAX_AU_SIZE, but this is not recommended: a smaller bucket size is less demanding, and the reservation request has more chance of being accepted. The above argument is summarized here:

Token bucket size for no-delay shaping (more demanding): >= MAX_AU_SIZE
Token bucket size for optimum shaping (less demanding): as small as MAX_SL_SIZE

The values of MAX_AU_SIZE or MAX_SL_SIZE can be delivered through the MAX_PDU_SIZE field of the DMIF QoS_QualifierTags. The smaller the token bucket size, the greater the chance of success in QoS admission control. If introducing some delay into the presentation is acceptable, a smaller token bucket size can be achieved for the stream by dispersing the burst of traffic in time.
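The shaping behaviour discussed above can be made concrete with a small simulation. This is a sketch under the section's assumptions (fluid token replenishment, negligible transmission time); the function name and units are illustrative only.

```python
def token_bucket_departures(arrivals, rho, beta):
    """Simulate a token bucket shaper: rho = token rate (bytes/s),
    beta = bucket size (bytes). 'arrivals' is a list of (time, size)
    packets in arrival order; returns each packet's departure time.
    A packet leaves as soon as the bucket holds 'size' tokens."""
    tokens = beta                 # bucket starts full
    t = 0.0                       # current simulation time
    departures = []
    for arrival, size in arrivals:
        # Replenish tokens for the time elapsed since the last event.
        tokens = min(beta, tokens + rho * (max(arrival, t) - t))
        t = max(arrival, t)
        if tokens < size:         # wait until enough tokens gather
            t += (size - tokens) / rho
            tokens = size
        tokens -= size
        departures.append(t)
    return departures

# An I frame split into m = 3 fragments of K = 1000 bytes, shaped with
# token rate 8000 bytes/s and bucket size beta = K.
deps = token_bucket_departures([(0.0, 1000), (0.0, 1000), (0.0, 1000)],
                               rho=8000.0, beta=1000.0)
```

Here the fragments leave at 0, 0.125 s and 0.25 s: the burst has been dispersed in time, at the cost of a 0.25 s shaping delay on the last fragment.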
The rest of this section describes how this is done. The argument given below is for a typical video stream, and can easily be extended to other kinds of streams. The assumptions in this argument are as follows:
1. The video stream is composed of a sequence of frames; each sequence is made up of one I frame followed by "n" P frames. The sizes of these frames are assumed to be "I" and "P" bytes respectively.
2. The interval between two successive frames is constant, and denoted by T.
3. The serializing (or transmission) time is assumed to be negligible to simplify the analysis (this assumption holds in most cases).
4. The token bucket shaper that shapes the video stream has a token rate "R" equal to the stream rate.

If we assume that the I frame is split into "m" smaller packets of size "K" (where K > P) and the token bucket size is also set to "K", then the token bucket shaper imposes a delay "D" on the stream delivery. Figure 6-10 illustrates where this delay comes from: in fact, the time needed for the reconstruction of the initial I frame from the "K" packets introduces this delay. If the token bucket size were equal to "I", as it was when "I" was not fragmented, this delay would have been zero. To calculate "D", we first have to compute the time needed to fill a token bucket of size "K" a few times. Remember that the token rate is equal to R, and from our assumptions about the video traffic pattern we have:

R = (I + n·P) / ((n + 1)·T)

Recalling that "I" was fragmented into m packets of size "K", we should calculate the time for filling the token bucket "m − 1" times. Assume that when the first "K" packet arrives the bucket is full, so there remain "m − 1" packets to get past the shaper.
Based on this reasoning we have:

D = (K/R)·(m − 1) = ((m − 1)·K·(n + 1)·T) / (I + n·P)

or  Token Bucket Size = K = ((I + n·P)·D) / ((m − 1)·(n + 1)·T)

When the acceptable delay is known for a particular stream, the equation above helps in determining the minimum acceptable value for the token bucket size (K). An extensive study of token bucket modeling is found in the research paper [26]. The token bucket rate and size, their relation to the other parameters that describe the traffic, as well as optimum values for them in some special situations are described in that paper. The method for calculating the optimum value of the token bucket size given in this section suffices for the purpose of this research, though a more accurate approach can be found in [26].

[Figure: the upper timeline shows one I frame followed by n = 4 P frames at intervals T; the lower timeline shows the I frame fragmented into m = 3 packets of size K, with the delay introduced by traffic shaping marked.]

Figure 6-10 Fragmenting the I Frame to Reduce the Required Token Bucket Size

RSpec: Rate and Delay Slack Term

The rate and delay slack term carried in the RSpec specify the bandwidth and delay requirements of the stream. The value of Rate can simply be mapped to DMIF's AVG_BITRATE, but for the slack term the mapping is a bit more complex. The slack term is a parameter that may be used by intermediate nodes to compute a lower level of resource reservation. It is the difference between the delay guarantee needed by the application and the delay bound that can be provided by reserving bandwidth R. If this difference is positive (i.e., there is slack), then intermediate nodes may use some or all of it. The end-to-end delay required by the application is delivered to the DMIF layer in the MAX_DELAY parameter. One can simply set the slack term to zero, but to benefit from relaxing QoS requests it is better to use the slack term.
The slack term can be derived from the following equation (a more precise argument is found in RFC 2212):

Slack term = Max{ (MAX_DELAY − (TokenBucketSize + Ctot)/R − Dtot), 0 }    Equation (6.1)

In the above equation, R is the reserved rate for the flow, i.e., the token rate. Ctot and Dtot are available through the ADSPEC messages.

Multiplexed Streams: The mapping above was for elementary streams that are not multiplexed with other streams. For multiplexed streams the story is a bit different, in that the QoS qualifiers used by DMIF are extracted from the individual qualifiers. Table 6 explains how the new QoS qualifier is produced for the multiplexed stream.

Table 6 DMIF/RSVP QoS Notation Mapping for Multiplexed Streams

QoS_Qualifier        Multiplexed QoS_Qualifier Value
AVG_DELAY            min(AVG_DELAY_i)
MAX_DELAY            min(MAX_DELAY_i)
LOSS_PROBABILITY     min(LOSS_PROBABILITY_i)
JITTER_TOLERANCE     min(JITTER_TOLERANCE_i)
MAX_PDU_SIZE         max(MAX_PDU_SIZE_i)
AVG_BITRATE          Σ AVG_BITRATE_i
MAX_BITRATE          Σ MAX_BITRATE_i

Summary

Table 7 summarizes the mapping that has been presented in this section.

Table 7 DMIF/RSVP QoS Notation Mapping

TSpec / RSpec          DMIF QoS Qualifier
Token Rate / Rate      AVG_BITRATE
Token Bucket Size      MAX_SL_SIZE to MAX_AU_SIZE
Peak Rate              MAX_BITRATE
Minimum Policed Unit   default: 128 (no DMIF counterpart)
Maximum Packet Size    MAX_PDU_SIZE
Delay Slack Term       Max{ 0, (MAX_DELAY − (TokenBucketSize + Ctot)/R − Dtot) }
Loss Probability       LOSS_PROBABILITY
Tolerable Jitter       JITTER_TOLERANCE
Latency                AVG_DELAY

6.4.4 QoS Renegotiation

QoS renegotiation and dynamic resource management (reserving resources during the lifetime of a presentation) are necessary capabilities for a QoS-aware MPEG-4 streaming system. The need for such capabilities is in fact raised by the object-based nature of MPEG-4, which allows addition and deletion of streams in an ongoing presentation.
The unpredictability of interactive applications such as MPEG-4 prevents us from adopting a static QoS provisioning method. RSVP allows receivers to change their reservations and senders to change their traffic descriptors dynamically [20][21]. From our system's standpoint, QoS renegotiation is done by the client calling the RSVP 'RESERVE' primitive with modified parameters. The server may also call the 'SENDER' primitive of RSVP to modify the traffic characteristics, but it is the client that finally invokes the reservation process. To provide the client with the necessary information, 'PATH' messages may be used, but this does not seem to be sufficient for our purposes and we might have to use the DMIF primitive for Transmux configuration to convey the necessary information (DN_TransmuxConfig). A new primitive may also be added to DMIF to enable modification of the QoS descriptors of an existing DMIF channel; "DAI_ChannelReneg" is such a primitive. The next section briefly examines the addition of this primitive to DMIF and suggests what its semantics should be.

Definition of DAI_ChannelReneg(): A Proposal

The process of QoS renegotiation and its implementation using currently available DMIF and RSVP functions has been explained above. It was also said that the procedure must somehow start with a DMIF request or a call to the DN_TransMuxConfig primitive. The semantic definition of this function is seen below:

DN_TransMuxConfig[Callback] (IN: networkSessionId, loop(TAT, ddDataIn()); OUT: loop(response))

This primitive can be used to reconfigure an established network channel and invoke the process of QoS renegotiation; therefore the need for another channel renegotiation primitive may seem to diminish. Nevertheless, by only using TransmuxConfig it will not be possible to modify the data channel attributes above the DMIF layer, nor will it be possible to explicitly ask the DMIF layer to modify a channel.
Therefore we need to have a DAI primitive that behaves similarly to DNI_TransmuxConfig. This primitive could be DAI_ChannelReneg. The following is my suggestion for this primitive's semantic form:

DAI_ChannelReneg[Callback] (IN: networkSessionId, serviceId, loop(CAT, ChannelDescriptor [includes qosDesc], ddDataIn()); OUT: loop(response, TAT, ddDataOut()))

The TAT field in the response data structure of the function is used to indicate whether a new Transmux channel has been created for the data channel or the old one is still valid. ddDataIn and ddDataOut are used to inform the end-system (usually the server) about how the multiplexing must be done and what kind of measures should be taken in the DMIF layer (FlexMux) and above in order to get the requested result. The other parameters present in this function are required for the normal operation of DMIF.

6.4.5 Implementation and Evaluation

A simplified version of the system proposed in this chapter has been implemented to examine the benefits of QoS-awareness in a streaming system. This version of the system takes advantage of the QoS functionality provided by the Windows 2000 operating system. Windows 2000 provides some QoS services according to the IETF model (Guaranteed, Controlled-Load, etc.). To use these services, the implementation utilized a generic QoS interface (GQoS) to interact with the Windows socket-programming library (Winsock2). The library invokes RSVP signaling internally and reserves network and system networking resources for the requesting application. The major modification to the implementation came in the DNA layer of the server and client, where the DN_TransmuxSetup function is implemented. By modifying this function, it became possible to create QoS-enabled Transmux channels using Winsock2 primitives. The overall procedure of channel creation has therefore changed to reflect the enhancements depicted in Figure 6-8. In this sense the implementation can be called QoS-aware.
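The flowspec filling performed in the modified DN_TransmuxSetup can be sketched as follows, applying the mapping of section 6.4.3. The field names mirror the Winsock2 GQoS FLOWSPEC structure; representing the DMIF qualifiers as a plain dictionary and defaulting the minimum policed size to 128 are assumptions of this sketch, not part of DMIF or GQoS.

```python
def dmif_qualifiers_to_flowspec(q):
    """Fill a Winsock2-style FLOWSPEC from DMIF QoS qualifier tags,
    following the TSpec/RSpec mapping of section 6.4.3."""
    return {
        "ServiceType":        "SERVICETYPE_GUARANTEED",
        "TokenRate":          q["AVG_BITRATE"],    # token bucket rate
        "PeakBandwidth":      q["MAX_BITRATE"],    # peak data rate
        "TokenBucketSize":    q["MAX_PDU_SIZE"],   # MAX_SL_SIZE..MAX_AU_SIZE
        "MaxSduSize":         q["MAX_PDU_SIZE"],   # must not exceed path MTU
        "MinimumPolicedSize": 128,                 # no DMIF counterpart; default
        "Latency":            q.get("AVG_DELAY", 0),
        "DelayVariation":     q.get("JITTER_TOLERANCE", 0),
    }

fs = dmif_qualifiers_to_flowspec(
    {"AVG_BITRATE": 64000, "MAX_BITRATE": 128000, "MAX_PDU_SIZE": 1400})
```

In the actual implementation this structure would be handed to the Winsock2 QoS machinery (e.g., through WSAConnect or the SIO_SET_QOS ioctl), which then performs the RSVP signaling internally as described above.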
To examine some of the advantages of using QoS services, a few experiments were conducted using the QoS-aware implementation. In these experiments a simple MPEG-4 presentation (4 streams) was delivered over a congested network. In one experiment there was no resource reservation and the traffic was treated as a best-effort flow (Figure 6-11). In another experiment the Guaranteed Service level was used for delivering the presentation (Figure 6-12). Quality degradation for the first case was discernible.

The results of these experiments are depicted in Figure 6-11 and Figure 6-12. These figures present the arrival time of packets versus experiment time (which corresponds to the sequence numbers of a perfectly transmitted stream). The arrival time for missing (lost) packets was set to zero; therefore the vertical lines in the figures indicate the moments at which packet loss occurred. As can be seen, for best-effort traffic there was some packet loss during the congestion period, while for Guaranteed Service traffic there was no packet loss.

[Figure: stream arrival time versus sequence number for the best-effort case; vertical drops to zero mark lost packets.]

Figure 6-11 Packet Loss for Best Effort Traffic

[Figure: arrival time versus sequence number for the Guaranteed QoS stream; no losses occur.]

Figure 6-12 No Packet Loss for Guaranteed Service Traffic

Another series of observations was made to measure the jitter margin for packets. Figure 6-13 and Figure 6-14 depict the results of this experiment (the experiment was done in a slightly different situation; the results are not comparable to Figure 6-11 and Figure 6-12). The vertical axis shows the decoding time stamp minus the arrival time. This value indicates the jitter margin, or the time by which the packet arrived before its deadline; therefore negative values translate to missed deadlines.
As can be seen in Figure 6-13, network congestion causes some packets of the best-effort traffic to miss their deadlines. The reaction to the congestion period is also greater for the best-effort traffic and can be seen in the form of increased jitter that eventually caused deadline misses. On the other hand, congestion could not harm the Guaranteed Service traffic: there was less jitter and no missed deadline (Figure 6-14). The experiments described above were done in an Ethernet network. To emulate a congested network, a few transmitter/receiver applications were used to transmit large amounts of data at a high bit rate over the network. Some of these applications were placed on the same machine on which the server was running (both inside and outside the server application); this way, internal network-access congestion could be experienced too.

[Figure: jitter margin versus sequence number for best-effort traffic; during the congestion period the margin drops sharply and goes negative.]

Figure 6-13 High Jitter, Missed Deadlines for Best Effort Traffic

[Figure: jitter margin versus sequence number for Guaranteed Service traffic; the margin stays positive throughout the congestion period.]

Figure 6-14 No Deadline Missed, Low Jitter for Guaranteed Service

Chapter 7.
Summary and Conclusions

Advancements in multimedia and signal processing technology, as well as the rapid growth of the Internet as a communication means, led to the development of a new subject known as Multimedia Communications and Networking, a subject that can be viewed as the convergence point of multimedia and communications technologies. MPEG-4 is considered to be the multimedia technology of our time; hence it became of undoubted importance to provide a solution for the delivery of MPEG-4 content over the most common communication means, i.e., the Internet. This thesis examined the issues pertaining to the design and implementation of a QoS-aware MPEG-4 streaming system.

7.1 Summary of Contributions

This thesis described my contributions to the field of Multimedia Communications and Networking. The major contributions of this dissertation are summarized below:

(1) Designing a novel object-based system architecture for MPEG-4 streaming over the best-effort Internet: The object-based structural design of the client/server streaming system has many novel aspects. The unique feature of this system in comparison to other open-source or released systems (at the time of this writing) is its focus on DMIF, which plays the major role. The object-based architecture facilitates implementation by making the design extremely flexible and, to some extent, scalable. The flexibility of this system is enabled by the careful use of objects for representing communication concepts.

(2) Implementing the proposed design and providing a platform for real experimentation on DMIF by contributing the source code to IM1: The implementation of the design for the best-effort Internet was the first open-source implementation of this part of DMIF. The implementation effort was eased by the careful design done in the first stage.
The implemented modules include the DMIF instance for remote retrieval, used as a plug-in DLL to the MPEG-4 reference software IM1, and the complete implementation of the server. The implementations are entirely DMIF compliant. Both modules of our implementation are now part of the IM1 software, and the client part (the DMIF instance for remote retrieval) is on the path to standardization at the time of this writing. This implementation is one of the most important contributions of this research.

(3) Designing and implementing a novel rate control technique: The fast-start rate-control technique designed and implemented for the MPEG-4 streaming system is a novel technique; it was not derived from previous works in this field, as little information had been published about this topic. The rate controller was extensively described in this thesis. Although some commercial streaming products already use a technique with the same name, the design in this research is new work with some unique features and considerations.

(4) Proposing and implementing a method for the utilization of RTP in DMIF: A version of the system that used RTP in the data plane was implemented and tested. This implementation was done to explore the probable problems in employing RTP in the data plane. Revisions to the implementation have mainly been made in the control plane; the data plane simply utilizes the most general RTP payload format defined for MPEG-4. This way it is possible to evaluate different suggested payload formats.

(5) Proposing and partially implementing a system architecture for QoS-aware MPEG-4 streaming, as well as a method for incorporating RSVP in DMIF: Using the results and experience of the first design and implementation, I have designed a new streaming system that considers QoS issues. This design adopts some modifications toward QoS awareness and, in some cases, QoS capability. The use of schedulers and traffic shapers in the server ensures that the end-system's part in providing network-related QoS is carried out. By exploiting
The use of schedulers and traffic shapers in the server ensures that end-system's task in providing network-related QoS is surely done. By exploiting 125  RSVP, end-systems, clients and servers, are able to interact with the QoS-Internet and get quality services. In fact "QoS-awareness" is denoted when the client and server are RSVP-enabled. Scheduling and Traffic shaping are some additional features that help in providing end-to-end QoS. Mapping RSVP's signaling and QoS specification to DMIF's was the major task in this proposal. A partial implementation of this design was done to evaluate the advantages of QoSawareness for the streaming system.  7.2 Future Research Directions The topic "MPEG-4 streaming over the Internet" is gradually getting mature as more and more research is conducted on it. But to perfect this work there is still a long way to go. Some open issues and future development possibilities are enlisted and explained below. Finalizing and standardizing the methods for MPEG-4 delivery over RTP Transporting MPEG-4 over RTP is still a research topic and its prospect does not suggest that a quick resolution is achievable. There are practical solutions right now but there are still other options that haven't been explored. Our implementation of the RTP-enabled streaming system provides the necessary platform for exploring these options and other possibilities. Implementation of a platform-independent QoS-aware streaming system Implementing the QoS aware streaming system that was proposed in the last part of this thesis would be the ultimate effort in evaluating this design and its practicality. However a partial implementation using MS-Windows 2000 QoS facilities has been done, but it is better to implement a platform-independent system that uses its own scheduler and QoS signalling methods. This way, the system will have more control on its modules. 
Also, there will certainly be a need for some modifications to this proposal that will not be exposed without a complete implementation. RSVP is the QoS signaling protocol used in the proposal. Currently (at the time of this writing) there are some new proposals for the QoS network architecture that do not give RSVP the role of end-to-end signaling protocol. If RSVP loses its role in the QoS-Internet, the proposed QoS-aware system will have to adopt the replacement protocol or method. Nevertheless, the token-bucket modeling and parameter mapping presented in this thesis will probably remain intact, as they are the models used in IntServ, and the IntServ model is expected to survive at least for a while. The proposed architecture is also RSVP-independent and is expected to remain intact in case RSVP becomes obsolete. The architecture benefits from a generic QoS-aware design with DMIF as its key component.

Simplifying DMIF and developing a lightweight DMIF

Our implementation of DMIF showed that DMIF has many signaling features and parameters that may never be used in practice. These features also made the implementation very complex. The DMIF data structures, in the author's opinion, are more complex than needed. Reducing the complexity of DMIF and adjusting it for lightweight implementation could be a research area worth exploring.

7.3 Concluding Remarks

Fortunately, the results of our research in the field of MPEG-4 streaming over IP networks have received significant attention from the research community. Our further explorations in this field, on topics such as QoS and rate control, added to our previous work on the implementation and design of an MPEG-4 streaming system and formed this thesis. I tried to give each part of this thesis a share in proportion to the amount of work it took.
To avoid lengthiness, I did not document the low-level details of the implementation in this thesis, as they are platform-dependent and always under revision. The implementation is well documented [1][2][3][15], and our research group's website [45] contains up-to-date information in this regard. I hope this thesis helps the efforts toward the realization of a complete multimedia application for the QoS Internet.

Bibliography

[1] Y. Pourmohammadi, K. Asrar-Haghighi, A. Kaheel, H. Alnuweiri, S.T. Vuong, "On the design of a QoS aware MPEG-4 multimedia server", IEEE International Symposium on Telecommunications (IST2001), pp. 149-153, Sep. 2001.

[2] Y. Pourmohammadi, K. Asrar-Haghighi, A. Mohamed, H. Alnuweiri, "Streaming MPEG-4 over IP and Broadcast Networks: DMIF based architectures", Packet Video 2001 Proc., pp. 218-227, May 2001.

[3] K. Asrar-Haghighi, Y. Pourmohammadi, H. Alnuweiri, "Realizing MPEG-4 streaming over the Internet: A client/server architecture using DMIF", ITCC2001, Apr. 2001.

[4] Coding of Audio-Visual Objects - Part 1: Systems, ISO/IEC 14496-1 International Standard, ISO/IEC JTC1/SC29/WG11 N2501, March 2000.

[5] Coding of Audio-Visual Objects - Part 6: Delivery Multimedia Integration Framework (DMIF), ISO/IEC 14496-6 International Standard, ISO/IEC JTC1/SC29/WG11 N2501, March 2000.

[6] "MPEG-4 Overview", ISO/IEC JTC1/SC29/WG11 N4030, March 2001, (also available at

[7] A. Puri, T. Chen, "Multimedia Systems, Standards, and Networks", Marcel Dekker Inc., New York, Basel, 1999.

[8] G. Franceschini, "The Delivery Layer in MPEG-4", Signal Processing: Image Communication, vol. 15, pp. 347-363.

[9] H. Kalva, L. Tang, J. Huard, G. Tselikis, J. Zamora, L. Cheok, and A. Eleftheriadis, "Implementing Multiplexing, Streaming, and Server Interaction for MPEG-4", IEEE Transactions on Circuits and Systems for Video Technology, vol. 9, no. 8, Dec. 1999.

[10] T. Ebrahimi, C.
Horne, "MPEG-4 natural video coding," Signal Processing: Image Communication 15 (2000) 365-385.

[11] R. Koenen, "Profiles and levels in MPEG-4: Approach and overview," Signal Processing: Image Communication 15 (2000) 463-478.

[12] H. Kalva, "Delivering MPEG-4 Based Audio-Visual Services," Kluwer Academic Publishers, Boston/Dordrecht/London.

[13] J. Huard, G. Tselikis, "An Overview of the Delivery Multimedia Integration Framework for Broadband Networks," IEEE Communications Surveys, vol. 2, no. 4, Fourth quarter 1999.

[14] O. Avaro, A. Eleftheriadis, C. Herpel, G. Rajan, L. Ward, "MPEG-4 Systems: Overview," Signal Processing: Image Communication 15 (2000) 281-298.

[15] K. Asrar-Haghighi, "MPEG-4 Delivery: DMIF based Unicast and Multicast Systems," Master's Thesis, University of British Columbia, October 2001.

[16] "Information Technology - Generic Coding of Moving Pictures and Associated Audio: Digital Storage Media Command and Control," ISO/IEC 13818-6 International Standard, ISO/IEC JTC1/SC29/WG11 MPEG96/N1300p1, July 1996.

[17] M. Handley, V. Jacobson, "SDP: Session Description Protocol," RFC 2327.

[18] J. Crowcroft, M. Handley, I. Wakeman, "Internetworking Multimedia," Morgan Kaufmann Publishers Inc., November 1999, ISBN 1558605843.

[19] R. Braden, D. Clark, and S. Shenker, "Integrated Services in the Internet Architecture: an Overview," RFC 1633, ISI, MIT, and PARC, June 1994.

[20] R. Braden, L. Zhang, S. Berson, S. Herzog, S. Jamin, "Resource ReSerVation Protocol (RSVP) - Version 1 Functional Specification," RFC 2205, Sep. 1997.

[21] J. Wroclawski, "The Use of RSVP with IETF Integrated Services," RFC 2210, September 1997.

[22] S. Shenker, C. Partridge, R. Guerin, "Specification of Guaranteed Quality of Service," RFC 2212, Sep. 1997.

[23] K. Nichols, S. Blake, F. Baker, D. Black, "Definition of the Differentiated Services Field (DS Field) in the IPv4 and IPv6 Headers," RFC 2474, Dec. 1998.

[24] S.
Blake et al., "An Architecture for Differentiated Services," RFC 2475, Dec. 1998.

[25] S. Keshav, "An Engineering Approach to Computer Networking," Addison Wesley Longman, May 1997, ISBN 0201634422.

[26] P. Tang, T. Tai, "Network traffic characterization using token bucket model," IEEE INFOCOM'99, pp. 51-62, New York, Apr. 1999.

[27] H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications," RFC 1889, Internet Engineering Task Force, January 1996.

[28] Civanlar, Casner, Herpel, "RTP Payload Format for MPEG-4 Streams," Internet Draft draft-ietf-avt-rtp-mpeg4-03.txt.

[29] Y. Kikuchi, T. Nomura, S. Fukunaga, Y. Matsui, H. Kimata, "RTP payload format for MPEG-4 Audio/Visual streams," Internet Draft draft-ietf-avt-rtp-mpeg4-es-04.txt.

[30] P. Gentric et al., "RTP Payload Format for MPEG-4 Streams," Internet Draft draft-gentric-avt-rtp-mpeg4-00.txt, November 2000.

[31] Balabanian (Nortel), "The role of DMIF in support of RTP MPEG-4 Payloads," Internet Draft draft-balabanian-rtp-mpeg4-dmif-00.txt, September 16, 1998.

[32] D. Wu, Y. Hou, W. Zhu, H. Lee, T. Chiang, Y. Zhang, H. Chao, "On End-to-End Architecture for Transporting MPEG-4 Video over the Internet," IEEE Trans. on Circuits and Syst. for Video Tech., Vol. 10, No. 6, September 2000, pp. 923-941.

[33] Balabanian (Nortel), "The Role of DMIF with RTSP and MPEG-4," Internet Draft draft-balabanian-rtsp-mpeg4-dmif-00.txt, Sept. 22, 1998.

[34] G. Franceschini, "Improvements to the SL," Proposal to MPEG committee, November 2000.

[35] M. Karam, F. Tobagi, "Analysis of the Delay and Jitter of Voice Traffic over the Internet."

[36] P. Tang and T. Tai, "Network Traffic Characterization Using Token Bucket Model," Infocom'99.

[37] R. Landry, I. Stavrakakis, "Study of Delay Jitter With and Without Peak Rate Enforcement."

[38] M. Alam, M. Atiquzzaman, M.
Karim, "Efficient MPEG video traffic shaping for the next generation internet," Globecom'99, pp. 364-368.

[39] V. Jacobson, K. Nichols, K. Poduri, "The 'virtual wire' per-domain behaviour," Internet Draft draft-ietf-diffserv-pdb-vw-00.txt.

[40] H. Lee, T. Chiang, and Y. Zhang, "Scalable Rate Control for MPEG-4 Video," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 10, No. 6, September 2000, pp. 878-894.

[41] A. Vetro, H. Sun, and Y. Wang, "MPEG-4 Rate Control for Multiple Video Objects," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 9, February 1999, pp. 186-199.

[42] W. Ding and B. Liu, "Rate Control of MPEG Video Coding and Recording by Rate-Quantization Modeling," IEEE Trans. on Circuits and Syst. for Video Tech., Vol. 6, February 1996, pp. 12-20.

[43] Z. Zhang, S. Nelakuditi, R. Aggarwal, and R. Tsang, "Efficient Server Selective Frame Discard Algorithms for Stored Video Delivery over Resource Constrained Networks," IEEE INFOCOM, March 1999, pp. 472-479.

[44] C. Herpel, A. Eleftheriadis, "MPEG-4 Systems: Elementary stream management," Signal Processing: Image Communication 15 (2000) 299-320.

[45] UBC MPEG-4 Streaming System, website:

APPENDIX A. DMIF Application Interface

This appendix (excerpted from the DMIF document, ISO/IEC 14496-6) lists the primitives specified for the DMIF-Application Interface and describes the parameters used. The only parameter with a fixed syntax is the URL defined in Annex C of the DMIF document, also given below. The C++-like formalism used to describe the primitives aims only at capturing their semantic meaning. The IN and OUT keywords clearly distinguish the parameters provided to the other side of the interface from those returned by it; by no means are they meant to force a synchronous implementation. The loop() construct allows an array of elements to be represented concisely.
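As an informal illustration, the loop() construct maps naturally onto an array of per-channel parameter tuples. The following C++ sketch (all type and field names are assumptions of this sketch, not taken from the standard) shows one way the IN and OUT loops of DA_ChannelAdd could be represented:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Illustrative types only; DMIF leaves the concrete encoding to the
// implementation, so every name here is an assumption.
enum Direction { UPSTREAM, DOWNSTREAM };

struct ChannelAddRequest {              // one iteration of the IN loop(...)
    std::string channelDescriptor;      // QoS and application-specific descriptors
    Direction direction;
    std::vector<uint8_t> uuDataIn;      // opaque upper-layer data
};

struct ChannelAddResponse {             // one iteration of the OUT loop(...)
    int response;                       // 0 = positive response
    uint32_t channelHandle;
    std::vector<uint8_t> uuDataOut;
};

// DA_ChannelAdd as an array-in, array-out call: the DMIF Instance assigns a
// channelHandle to every positively answered request.
std::vector<ChannelAddResponse>
DA_ChannelAdd(uint32_t serviceSessionId,
              const std::vector<ChannelAddRequest>& requests) {
    (void)serviceSessionId;
    std::vector<ChannelAddResponse> replies;
    uint32_t nextHandle = 1;            // allocation policy is implementation-defined
    for (const ChannelAddRequest& req : requests) {
        (void)req;                      // a real instance would open a channel here
        replies.push_back({0, nextHandle++, {}});
    }
    return replies;
}
```

The point of the sketch is only that one primitive invocation carries a whole array of per-channel tuples, and the reply carries a parallel array, as the loop() notation expresses.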
The following describes the semantics of the parameters used in the DAI (the primitives are described later):

appDataBuffer: is the data conveying application information. In the case of MPEG-4 this data shall be used to convey SYNC layer information.

appDataLen: is the length of the appDataBuffer field.

channelDescriptor: is a parameter set by the DMIF User containing the complete description of the Quality of Service requested for a particular channel, as well as possibly application-specific descriptors. Its semantic definition is given in the QoS chapter of the thesis.

channelHandle: is a local identifier that uniquely identifies a channel in the application space, no matter how many services the application attaches to or how many DMIF Instances it is using. In this interface specification the channelHandle parameter is set by the DMIF Instance; however, it would also be acceptable for it to be set by the DMIF User. The algorithm used by the DMIF Instance (or User) to set the channelHandle is a matter that does not affect this interface.

direction: indicates the direction of the channel, either UPSTREAM (i.e., from the receiver to the sender) or DOWNSTREAM (i.e., from the sender to the receiver).

errorFlag: is a flag that indicates whether an error has been detected (but not corrected) on the streamDataBuffer.

mode: this parameter, also contained in the qosMode structure, indicates the type of QoS mode. The set of modes is summarized in the QoS chapter of the thesis.

parentServiceSessionId: is a local identifier that uniquely identifies the Service Session whose URL may be used to expand the relative URL of a newly requested Service Session.

qosMode: is a parameter set by the DMIF User containing the requested QoS monitoring mode for a particular stream (i.e., channelHandle), as well as the commands associated with the monitoring process (i.e., start, stop, single report).
qosReport: is a parameter set by the DMIF Layer containing the QoS report for a particular stream (i.e., channelHandle). This parameter shall contain the QoS profile parameters contained in the qosDescriptor parameter sent in the addition of the channel (i.e., delay and loss), as well as additional statistical parameters to be defined in the future.

reason: a code identifying the reason.

response: a code identifying the response.

serviceName: at the target DMIF peer it identifies the actual service.

serviceSessionId: is a local identifier that uniquely identifies a Service Session in the application space, no matter how many services the application attaches to or how many DMIF Instances it is using. In this interface specification the serviceSessionId parameter is set by the DMIF Instance; however, it would also be acceptable for it to be set by the DMIF User. The algorithm used by the DMIF Instance (or User) to set the serviceSessionId is a matter that does not affect this interface.

streamDataBuffer: is the actual Data Unit generated by the DMIF User.

streamDataLen: is the length of the streamDataBuffer field.

URL: within DMIF, a string that identifies a Service. Refer to Annex C of the DMIF document for more information on the usage of URLs in DMIF, including the list of allowed URL schemes. The following shows the format used by the UBC MPEG-4 Streaming System: x-dtcp://a:b@

uuDataInBuffer: is an opaque structure providing upper-layer information; it is transparently transported through DMIF from the local peer to the remote peer.

uuDataInLen: is the length of the uuDataInBuffer field.

uuDataOutBuffer: is an opaque structure providing upper-layer information; it is transparently transported through DMIF from the remote peer to the local peer.

uuDataOutLen: is the length of the uuDataOutBuffer field.
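The handle semantics above (channelHandle and serviceSessionId unique across the whole application space, with the allocation algorithm left open) can be illustrated with a small registry; this is a hypothetical sketch, not part of the interface specification:

```cpp
#include <cstdint>
#include <map>

// Hypothetical registry sketching the handle semantics above: handles stay
// unique across every service session (and DMIF Instance) the application
// uses. The allocation algorithm is deliberately left open by the DAI,
// so a monotonic counter is sufficient.
class ChannelHandleRegistry {
public:
    // Called by a DMIF Instance when a channel is added.
    uint32_t allocate(uint32_t serviceSessionId) {
        uint32_t handle = next_++;
        owner_[handle] = serviceSessionId;
        return handle;
    }
    // Resolve a handle back to the service session that owns it.
    bool lookup(uint32_t handle, uint32_t& sessionOut) const {
        std::map<uint32_t, uint32_t>::const_iterator it = owner_.find(handle);
        if (it == owner_.end()) return false;
        sessionOut = it->second;
        return true;
    }
    // Invalidate the handle, e.g. after the channel is deleted.
    void release(uint32_t handle) { owner_.erase(handle); }
private:
    uint32_t next_ = 1;
    std::map<uint32_t, uint32_t> owner_;   // handle -> serviceSessionId
};
```

Because the DAI states that the allocation algorithm "does not affect this interface", any scheme producing application-wide unique identifiers would do equally well.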
DMIF-Application Interface Semantics

DA_ServiceAttach ()

DA_ServiceAttach (IN: parentServiceSessionId, URL, uuDataInBuffer, uuDataInLen; OUT: response, serviceSessionId, uuDataOutBuffer, uuDataOutLen)

This primitive is issued by a DMIF User to request the initialization of a Service Session. The service is unambiguously identified by its URL, which conveys information identifying the delivery technology being used (i.e., the protocol), the address of the target DMIF peer (this may have different meanings in the different scenarios), and the name of the service inside the domain managed by the target DMIF peer, which is then referred to as serviceName. The URL, if relative, is expanded using the URL associated with parentServiceSessionId as the base URL. The DMIF User might provide additional information, such as client credentials, in uuDataIn: this additional information is opaque to the Delivery layer and is only consumed by the target DMIF User (which is locally emulated in Broadcast and Local Storage scenarios). The target DMIF User might in turn provide additional information in uuDataOut: this additional information is opaque to the Delivery layer and is only consumed by the local DMIF User. In an ISO/IEC 14496 application the uuDataOut shall return a single Object Descriptor or Initial Object Descriptor if required by the context of this call. In case of a positive response, the serviceSessionId parameter contains the Service Session identifier that the DMIF User should refer to in subsequent interaction through the DAI regarding this Service Session.
DA_ServiceAttachCallback ()

DA_ServiceAttachCallback (IN: serviceSessionId, serviceName, uuDataInBuffer, uuDataInLen; OUT: response, uuDataOutBuffer, uuDataOutLen)

This primitive is issued by the target DMIF Instance to the appropriate target DMIF User as identified through the serviceName field. The steps involved in identifying the appropriate DMIF User and delivering the primitive to it are out of the scope of this specification. The target DMIF Instance also provides the serviceSessionId parameter, which contains the Service Session identifier that the target DMIF User should refer to in subsequent interaction through the DAI regarding this Service Session. The target DMIF User (the Application Executive running the service) might also receive additional information (e.g., client credentials) through the uuDataIn field. The target DMIF User might in turn provide additional information in uuDataOut: this additional information is opaque to the Delivery layer and is only consumed by the local DMIF User. In an ISO/IEC 14496 application the uuDataOut shall return a single Object Descriptor or Initial Object Descriptor if required by the context of this call. In case of a negative response, the serviceSessionId becomes invalid at the target DMIF Instance. Real implementations of this primitive are likely to support an additional parameter identifying the calling peer.

DA_ServiceDetach ()

DA_ServiceDetach (IN: serviceSessionId, reason; OUT: response)

This primitive is issued by a DMIF User to request the termination of the service identified by serviceSessionId; a reason should be specified. The DMIF Instance returns a response.
DA_ServiceDetachCallback ()

DA_ServiceDetachCallback (IN: serviceSessionId, reason; OUT: response)

This primitive is issued by the target DMIF Instance to inform the target DMIF User that the service identified by serviceSessionId has been terminated due to the reason reason. The target DMIF User returns a response.

DA_ChannelAdd ()

DA_ChannelAdd (IN: serviceSessionId, loop(channelDescriptor, direction, uuDataInBuffer, uuDataInLen); OUT: loop(response, channelHandle, uuDataOutBuffer, uuDataOutLen))

This primitive is issued by a DMIF User to request the addition of one or more end-to-end channels in the context of a particular Service Session identified by serviceSessionId. Each channel is requested by providing an (optional) channelDescriptor and a direction. The local DMIF User might provide additional information for each requested channel in uuDataIn. This additional information is opaque to the Delivery layer and is only consumed by the target DMIF User. In the case of an ISO/IEC 14496 application the uuDataIn shall always be present. For each requested channel, in case of a positive response, the channelHandle parameter contains the channel identifier that the DMIF User should refer to in subsequent interaction through the DAI involving this channel.

DA_ChannelAddCallback ()

DA_ChannelAddCallback (IN: serviceSessionId, loop(channelHandle, channelDescriptor, direction, uuDataInBuffer, uuDataInLen); OUT: loop(response, uuDataOutBuffer, uuDataOutLen))

This primitive is issued by the target DMIF Instance to the appropriate target DMIF User as identified through the serviceSessionId field, to inform the target DMIF User that the addition of channels is requested. For each requested channel, the target DMIF Instance provides the direction to the target DMIF User. It also provides the channelHandle parameter, which contains the channel identifier that the target DMIF User should refer to in subsequent interaction through the DAI involving this channel.
For each requested channel, the target DMIF User returns a response. In case of a negative response, the channelHandle becomes invalid at the target DMIF Instance.

DA_ChannelDelete ()

DA_ChannelDelete (IN: loop(channelHandle, reason); OUT: loop(response))

This primitive is issued by a DMIF User to delete one or more channels as identified by channelHandle; a reason should be specified. The channels need not all be part of a single Service Session. The DMIF Instance returns a response.

DA_ChannelDeleteCallback ()

DA_ChannelDeleteCallback (IN: loop(channelHandle, reason); OUT: loop(response))

This primitive is issued by the target DMIF Instance to inform the target DMIF User that the channels identified by channelHandle have been closed due to the reason reason. The target DMIF User returns a response.

DA_UserCommand ()

DA_UserCommand (IN: uuDataInBuffer, uuDataInLen, loop(channelHandle))

This primitive is issued by a DMIF User to send uuData that refers to channels identified by channelHandle. This primitive is intended to support the delivery of control information in the upstream direction.

DA_UserCommandCallback ()

DA_UserCommandCallback (IN: uuDataInBuffer, uuDataInLen, loop(channelHandle))

This primitive is issued by the target DMIF Instance to inform the target DMIF User that there is uuData relative to channels identified by channelHandle.

DA_Data ()

DA_Data (IN: channelHandle, streamDataBuffer, streamDataLen)
DA_Data (IN: channelHandle, streamDataBuffer, streamDataLen, appDataBuffer, appDataLen)

This primitive is issued by a DMIF User to send streamData in the channel identified by channelHandle. The second form of this primitive sends streamData in the channel identified by channelHandle along with application-specific appData describing the streamData. In the case of ISO/IEC 14496-1:1999 based applications, appData would carry sync layer information associated with the streamData.
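The two forms of DA_Data are naturally expressed as an overload pair. The sketch below is illustrative only: the capture buffer stands in for the transport channel, and framing the appData ahead of the payload is an assumption of this sketch, not a rule mandated by DMIF.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Capture buffer standing in for the transport channel; in a real DMIF
// Instance the bytes would go to the channel mapped to channelHandle.
static std::vector<uint8_t> lastPacket;

// First form: stream data only.
void DA_Data(uint32_t channelHandle,
             const uint8_t* streamDataBuffer, std::size_t streamDataLen) {
    (void)channelHandle;
    lastPacket.assign(streamDataBuffer, streamDataBuffer + streamDataLen);
}

// Second form: stream data plus application data (for ISO/IEC 14496-1 this
// would be the sync-layer information describing the access unit).
// Prepending the appData to the payload is an assumption of this sketch.
void DA_Data(uint32_t channelHandle,
             const uint8_t* streamDataBuffer, std::size_t streamDataLen,
             const uint8_t* appDataBuffer, std::size_t appDataLen) {
    (void)channelHandle;
    lastPacket.assign(appDataBuffer, appDataBuffer + appDataLen);
    lastPacket.insert(lastPacket.end(),
                      streamDataBuffer, streamDataBuffer + streamDataLen);
}
```

The overload keeps the data path identical for both forms; only the presence of the descriptive appData distinguishes them, mirroring the DAI definition.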
DA_DataCallback ()

DA_DataCallback (IN: channelHandle, streamDataBuffer, streamDataLen, errorFlag)
DA_DataCallback (IN: channelHandle, streamDataBuffer, streamDataLen, appDataBuffer, appDataLen, errorFlag)

This primitive is issued by the DMIF Instance to the appropriate DMIF User (identified through the channelHandle) and provides the streamData along with an errorFlag for that channel. The second form of this primitive delivers streamData from the channel identified by channelHandle along with application-specific appData describing the streamData. In the case of ISO/IEC 14496-1:1999 based applications, appData would carry sync layer information associated with the streamData.

DA_UserCommandAck ()

DA_UserCommandAck (IN: uuDataInBuffer, uuDataInLen, loop(channelHandle); OUT: response, uuDataOutBuffer, uuDataOutLen)

This primitive is issued by a DMIF User to send uuDataIn that refers to channels identified by channelHandle. This primitive is intended to support the delivery of control information with acknowledgement (and possibly additional information in uuDataOut).

DA_UserCommandAckCallback ()

DA_UserCommandAckCallback (IN: uuDataInBuffer, uuDataInLen, loop(channelHandle); OUT: response, uuDataOutBuffer, uuDataOutLen)

This primitive is issued by the target DMIF Instance to inform the target DMIF User that there is uuDataIn relative to channels identified by channelHandle; the DMIF User shall acknowledge the receipt of the command, possibly providing additional information in uuDataOut.

DA_ChannelMonitor ()

DA_ChannelMonitor (IN: channelHandle, qosMode; OUT: response)

This primitive is issued by a user to request the initialization, termination, or single report of a QoS monitoring process by the DMIF layer. The channelHandle identifies the stream to be monitored by the DMIF layer. The mode of the QoS monitoring is set in the qosMode parameter.
The qosMode may contain additional information about the type of QoS report the DMIF user is interested in getting from the DMIF layer. The response identifies an acknowledgment (i.e., positive or negative) from the DMIF layer to a QoS monitoring request from the DMIF user.

DA_ChannelEvent ()

DA_ChannelEvent (IN: channelHandle, mode, qosReport)

This primitive is issued by the DMIF layer to inform the DMIF user of the measured QoS for the data channel specified by the channelHandle. The mode identifies the type of QoS monitoring event. The qosReport parameter contains the QoS profile (e.g., delay and loss parameters) measured for the channel.

DMIF-Network Interface Semantics

The DMIF Network Interface (DNI) is an optional interface in the realization of DMIF. The complete description of this informative interface is found in the DMIF standard document [5]. For the reader's reference, however, the DNI primitives are listed below; note that the callback functions are the target DMIF side counterparts of the DNI primitives.

DN_SessionSetup [Callback] (IN: networkSessionId, calledAddress, callingAddress, compatibilityDescriptorIn; OUT: response, compatibilityDescriptorOut)

This function is issued by the Originating DMIF to establish a Network Session with the Target DMIF. This is the first action performed in establishing a relation between two peers.

DN_SessionRelease [Callback] (IN: networkSessionId, reason; OUT: response)

DN_SessionRelease() is issued by a DMIF peer to close all relations with the other peer.

DN_ServiceAttach [Callback] (IN: networkSessionId, serviceId, serviceName, ddDataIn(); OUT: response, ddDataOut())

DN_ServiceAttach() is issued by the Originating DMIF to establish a Service Session with the Target DMIF. This Service Session is established inside a previously established Network Session (created by DN_SessionSetup).
DN_ServiceDetach [Callback] (IN: networkSessionId, serviceId, reason; OUT: response)

DN_ServiceDetach() is issued by the Originating DMIF to detach a Service Session previously established with the Target DMIF.

DN_TransMuxSetup [Callback] (IN: networkSessionId, loop(TAT, direction, qosDescriptor, resources()); OUT: loop(response, resources()))

DN_TransMuxSetup() is issued by the Originating DMIF to establish one or more TransMux Channels inside a Network Session previously established with the Target DMIF. From the set of parameters, we are especially interested in the qosDescriptor, which is set based on the information contained in the qosDescriptors passed in the DA_ChannelAdd() and related to the Elementary Streams being carried in the TransMux Channel.

DN_TransMuxRelease [Callback] (IN: networkSessionId, loop(TAT); OUT: loop(response))

DN_TransMuxRelease() is issued by a DMIF peer to close all logical channels making use of the one or more indicated TransMux Channels.

DN_ChannelAdd [Callback] (IN: networkSessionId, serviceId, loop(CAT, direction, channelDescriptor, ddDataIn()); OUT: loop(response, TAT, ddDataOut()))

DN_ChannelAdd() is issued by the Originating DMIF to open one or more logical channels inside a Service Session. For each logical channel to be established, a tuple of parameters is provided, some of which are derived from parameters passed in the DA_ChannelAdd() and related to the Elementary Stream being carried in the logical channel. For example, the channelDescriptor contains the complete description of the Quality of Service requested for a particular channel, as well as possibly application-specific descriptors. It is set based on the information contained in the related channelDescriptor passed in the DA_ChannelAdd().
DN_ChannelAdded [Callback] (IN: networkSessionId, serviceId, loop(CAT, direction, channelDescriptor, TAT, ddDataIn()); OUT: loop(response, ddDataOut()))

DN_ChannelAdded() is issued by the Originating DMIF to notify the Target DMIF peer that one or more logical channels inside a Service Session were added. It is basically similar to DN_ChannelAdd, but is issued by the originating DMIF peer (e.g., the client). In our implementation we only needed to implement this function rather than DN_ChannelAdd.

DN_ChannelDelete [Callback] (IN: networkSessionId, loop(CAT, reason); OUT: loop(response))

DN_ChannelDelete() is issued by the Originating DMIF to close one or more logical channels previously established inside a Network Session.

DN_TransMuxConfig [Callback] (IN: networkSessionId, loop(TAT, ddDataIn()); OUT: loop(response))

DN_TransMuxConfig() is issued by the Originating DMIF to reconfigure one or more TransMux Channels previously established inside a Network Session.

DN_UserCommand [Callback] (IN: networkSessionId, ddDataIn(), loop(CAT))

DN_UserCommand() is issued by the Originating DMIF to pass user data, referring to specific channels, to the corresponding peer.

DN_UserCommandAck [Callback] (IN: networkSessionId, ddDataIn(), loop(CAT); OUT: response, ddDataOut())

DN_UserCommandAck() is issued by the Originating DMIF to pass user data, referring to specific channels, to the corresponding peer, with acknowledgement.

DMIF QoS Monitoring Primitives and Operation

DMIF addresses QoS monitoring by defining two primitives. The following sections discuss these primitives and their associated parameters.

DA_ChannelMonitor ()

DA_ChannelMonitor (IN: channelHandle, qosMode; OUT: response)

This primitive is issued by a user to request the initialization, termination, or single report of a QoS monitoring process by the DMIF layer. The parameters passed to the DMIF layer are channelHandle and qosMode.
The channelHandle identifies the stream to be monitored by the DMIF layer (channelHandle is mapped to internal channel indexes in each layer). The parameter qosMode specifies the operation mode of QoS monitoring, and may contain additional information about the type of QoS report the DMIF user is interested in. The only parameter that the primitive returns to the calling (application) layer is response, which identifies an acknowledgment (i.e., positive or negative) from the DMIF layer to a QoS monitoring request issued by the DMIF user.

DA_ChannelEvent ()

DA_ChannelEvent (IN: channelHandle, mode, qosReport)

Using this primitive, the DMIF layer informs the application layer about the measured QoS for the data channel specified by the channelHandle. The mode identifies the type of QoS monitoring event (see next section). The qosReport parameter contains the QoS profile (e.g., delay and loss parameters) measured for the channel.

Monitoring events

Figure A-1 depicts the DMIF QoS monitoring operation. DMIF uses three data structures to facilitate event monitoring, such as QoS monitoring and QoS renegotiation of established data channels in a session. A QoS monitoring event can be initiated at any moment after the establishment of a channel (section 1 in Figure A-1). The monitoring process ends either when the channel is deleted or when the application explicitly asks the DMIF layer to do so (i.e., through a stop command). DMIF defines three types of QoS monitoring events: QOS_MONITOR, or periodic monitoring, depicted in section 2 of Figure A-1; QOS_VIOLATION, or notification upon violation of the requested QoS by the Application, depicted in section 3 of Figure A-1; and QOS_REQUEST, or notification upon a single request from the Application, depicted in section 4 of Figure A-1.
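The three event types can be modelled as a small per-channel dispatcher. The following C++ sketch is an assumption-laden illustration (class and field names, the enum values, and the delay-only violation check are all inventions of this sketch), not the DMIF layer's actual design:

```cpp
#include <functional>

// Modes and status values named after Tables 8 and 9; numeric values and
// all class/field names are assumptions of this sketch.
enum QosMode   { QOS_MONITOR, QOS_VIOLATION, QOS_REQUEST };
enum QosStatus { START, STOP, SINGLE };

struct QosReport { double delayMs; double lossRatio; };

// Per-channel monitor driving DA_ChannelEvent-style callbacks.
class ChannelMonitor {
public:
    using EventFn = std::function<void(QosMode, const QosReport&)>;

    ChannelMonitor(EventFn onEvent, double maxDelayMs)
        : onEvent_(std::move(onEvent)), maxDelayMs_(maxDelayMs) {}

    // DA_ChannelMonitor: START/STOP toggle periodic reporting,
    // SINGLE emits one report immediately.
    void control(QosStatus status) {
        if (status == START)      periodic_ = true;
        else if (status == STOP)  periodic_ = false;
        else /* SINGLE */         onEvent_(QOS_REQUEST, last_);
    }

    // Invoked for every new measurement taken by the delivery layer.
    void onMeasurement(const QosReport& r) {
        last_ = r;
        if (periodic_)               onEvent_(QOS_MONITOR, r);    // section 2
        if (r.delayMs > maxDelayMs_) onEvent_(QOS_VIOLATION, r);  // section 3
    }

private:
    EventFn   onEvent_;
    double    maxDelayMs_;       // requested QoS profile (delay bound only here)
    bool      periodic_ = false;
    QosReport last_{0.0, 0.0};
};
```

A real monitor would compare against the full qosDescriptor profile (delay and loss) and pace periodic reports by qosMonitorPeriod; the sketch only shows how the three modes route through a single event callback.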
[Figure A-1 is a message sequence chart between the DMIF User and the DMIF Layer: (1) the application requests a QoS monitoring process with DAI_ChannelMonitor (IN: channelHandle, qosMode) and the DMIF Layer replies (OUT: response); (2) the DMIF Layer sends periodic DAI_ChannelEvent reports (mode = QOS_MONITOR); (3) the DMIF Layer detects a QoS violation and sends a DAI_ChannelEvent (mode = QOS_VIOLATION); (4) the application requests and receives a single QoS report (mode = QOS_REQUEST).]

Figure A-1 QoS Monitoring Events

Monitoring Parameter: qosMode

To configure the QoS monitoring module in the DMIF layer, the qosMode data structure is used. The syntax of this parameter is defined in [5]. Information in this structure specifies the mode of operation for the monitoring module. Table 8 illustrates the set of modes currently defined.

Table 8 MPEG-4 DMIF defined QoS modes.

QOS_MONITOR: The DMIF layer periodically monitors the QoS of a specific data channel and reports the measured QoS to the application every qosMonitorPeriod milliseconds (refer to the DMIF document for more information).

QOS_VIOLATION: The DMIF layer reports a QoS violation for a specific data channel. A QoS violation occurs when the monitored QoS does not fulfil the requested QoS profile (i.e., delay and loss parameters) specified in the qosDescriptor parameter.

The frequency of QoS reports generated by the DMIF layer is adjustable through the qosMode structure. To indicate when a monitoring process shall start or terminate, a parameter called status is defined in the qosMode structure.
Table 9 summarizes the set of status values currently defined for qosMode.

Table 9 MPEG-4 DMIF defined QoS status.

START: The DMIF layer starts the QoS monitoring process.

STOP: The DMIF layer terminates the QoS monitoring process.

SINGLE: The DMIF layer issues a single QoS report upon request of the application.

Monitoring Parameter: qosReport

The qosReport should be able to carry a number of QoS metrics. The set of metrics currently defined are all found in the qosDescriptor data structure previously introduced in Table 3.

APPENDIX B. Features of the Implemented Software

The implemented software modules include the DMIF instance for remote retrieval (also called the remote instance), which works as a plug-in module with Evil software, and the MPEG-4 streaming server, called "Apadana Server". The remote instance is a standard-compliant implementation of one part of DMIF and appears in the form of a DLL. It is invoked at the client side when the user calls a remote URL. The operation of this DLL is hidden from the user: remote retrieval is done seamlessly, so that the user does not see any difference compared to local access or other scenarios. From the user interface perspective, therefore, there is nothing the remote instance can add to the system GUI.

[Figure B-1 shows the server's monitoring window, with controls for the refresh period, maximum display bitrate, packet drop ratio (% of full rate), number of samples displayed, and client buffer size.]

Figure B-1 Server Output Snapshot

At the server the story is different, as it implements a whole application of which DMIF is just one part (the delivery part). The user interface provided at the server lets the server administrator adjust its operation. Figure B-1 gives a snapshot of the server. As seen in the figure, the server features a monitoring screen with adjustable parameters that can be set through a window.
These parameters allow the server administrator to view the details of the outgoing traffic as well as its averaged characteristics. Another adjustable parameter is the client jitter buffer size, which adjusts the fast-start rate controller; this topic was explained comprehensively in Chapter 6. The Packet Dropper is another module in the server that can be activated through the parameter setting window. The packet dropper is able to reduce the transmission rate of all presentations by a ratio declared in the setting window. In this way all streams are affected except the scene and object descriptor streams. Although not implemented, it is very easy to activate the packet dropper for individual streams or presentations without affecting the others. If packet loss for some streams is accepted, adaptive rate control is achievable. As mentioned in Chapter 5, the MPEG-4 over RTP experiment utilized the packet dropper for traffic control by adaptively changing the transmission rate. Further, up-to-date information about the server and its features can be found on our website, as this implementation is constantly being improved.

APPENDIX C. DMIF Instance for Remote Retrieval, the Implemented DNI Syntax

This appendix portrays the syntax we have developed for the implementation of the DNI interface. The DNI primitives were defined in a C++ class. This class was used for the creation of DMIF Network Access Objects, or, as they are called in our architecture, NetworkSession objects.
class DN_CLASS {
public:
    virtual BOOL DN_SessionSetup(long double networkSessionId, LPCSTR calledAddress, LPCSTR callingAddress,
        __int8 *compatibilityDescriptorIn, __int16 *response, __int8 *compatibilityDescriptorOut) = 0;

    virtual BOOL DN_SessionRelease(long double networkSessionId, __int16 reason,
        __int16 *response /*out*/) = 0;

    virtual BOOL DN_ServiceAttach(long double networkSessionId, __int16 serviceId, const char *serviceName,
        ddDataClass *ddDataIn, __int16 *response, ddDataClass *ddDataOut) = 0;

    virtual BOOL DN_ServiceDetach(long double networkSessionId, __int16 serviceId, __int16 reason,
        __int16 *response) = 0;

    virtual BOOL DN_ChannelAdd(long double networkSessionId, __int16 serviceId,
        GenericMsgLoop *ReqLoop, GenericMsgLoop *RespLoop) = 0;

    virtual BOOL DN_TransMuxSetup(long double networkSessionId,
        GenericMsgLoop *TMReqLoop, GenericMsgLoop *TMConfLoop) = 0;

    virtual BOOL DN_UserCommand(long double networkSessionId, ddDataClass *ddDataIn,
        GenericMsgLoop *CATLoop) = 0;

    virtual BOOL DN_UserCommandAck(long double networkSessionId, ddDataClass *ddData,
        GenericMsgLoop *CATLoop, __int16 *response, ddDataClass *ddDataOut) = 0;

    virtual BOOL DN_ChannelAdded(long double networkSessionId, __int16 serviceId,
        GenericMsgLoop *ReqLoop /* loop(CAT, qosChannelDescriptor, direction, TAT, ddDataIn()) */,
        GenericMsgLoop *CnfLoop /* OUT: loop(response, ddDataOut()) */) = 0;

    virtual BOOL DN_ChannelDelete(long double networkSessionId,
        GenericMsgLoop *CATReasonLoop, GenericMsgLoop *RespLoop) = 0;

    virtual BOOL DN_TransMuxConfig(long double networkSessionId,
        GenericMsgLoop *TMReqLoop, GenericMsgLoop *TMCnfLoop) = 0;

    virtual BOOL DN_TransMuxRelease(long double networkSessionId,
        GenericMsgLoop *TATLoop, GenericMsgLoop *RespLoop) = 0;
};
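The DNI descriptions above imply an ordering: a Network Session must exist before Service Sessions can be attached, and a Service Session must exist before channels are added. As an informal illustration (all names are hypothetical; the real NetworkSession objects carry far more state), that ordering can be captured by a small guard:

```cpp
#include <cstdint>

// Hypothetical guard capturing the DNI call ordering implied above.
// Each method corresponds to the similarly named DN_ primitive.
class DniSessionState {
public:
    bool sessionSetup() {                     // DN_SessionSetup
        if (state_ != IDLE) return false;     // only one network session here
        state_ = SESSION;
        return true;
    }
    bool serviceAttach() {                    // DN_ServiceAttach
        if (state_ == IDLE) return false;     // needs a network session first
        state_ = SERVICE;
        ++services_;
        return true;
    }
    bool channelAdd() const {                 // DN_ChannelAdd / DN_ChannelAdded
        return state_ == SERVICE;             // needs an attached service
    }
    bool serviceDetach() {                    // DN_ServiceDetach
        if (services_ == 0) return false;
        if (--services_ == 0) state_ = SESSION;
        return true;
    }
    bool sessionRelease() {                   // DN_SessionRelease
        if (state_ == IDLE) return false;
        state_ = IDLE;                        // closes all relations with the peer
        services_ = 0;
        return true;
    }
private:
    enum State { IDLE, SESSION, SERVICE };
    State state_ = IDLE;
    uint32_t services_ = 0;
};
```

A DN_CLASS implementation would typically enforce this ordering internally before issuing any signaling messages, returning a negative response for out-of-order requests.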

