Data Transmission Schemes for a New Generation of Interactive Digital Television

by

Mehran Azimi

B.Sc., Amir-Kabir (Polytechnic) University of Technology, Tehran, 1994
M.Sc., Sharif University of Technology, Tehran, 1998

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF Doctor of Philosophy in THE FACULTY OF GRADUATE STUDIES (Department of Electrical and Computer Engineering)

We accept this thesis as conforming to the required standard

The University of British Columbia
March 2004
© Mehran Azimi, 2004

Abstract

Interactive television (ITV) is an attractive technology, which changes the way TV viewers experience home entertainment. In this thesis, we design an interactive television system which truly gives TV viewers the freedom to control and individualize the presentation of TV program content. In this context, we present methods that add extra video and audio streams (called incidental streams) containing interactive content to the transmission line of a digital TV system. The addition of these extra streams does not result in increasing the transmission bandwidth, nor in degrading the picture or sound quality of the main TV program content. Our design consists of two major transmission mechanisms for transmitting the incidental data, called the deterministic and stochastic service classes. The deterministic service class is designed such that no incidental stream data packet is lost during transmission. On the other hand, the stochastic service class is designed such that some incidental data loss is possible; however, the data loss rate is bounded. We present a strategy based on scalable video coding which, in conjunction with the deterministic and stochastic service classes, achieves the best possible picture and sound quality for the incidental streams under the constraint of available bandwidth. We also present data transmission methods for the deterministic and the stochastic service classes. In the context of the deterministic service class, we employ a deterministic traffic model for modelling the traffic of main streams, and then design a data transmission scheme based on the 'Network Calculus' theory. In the context of the stochastic service class, we employ Hidden Semi-Markov Models (HSMMs) for modelling the traffic of main streams. We then design a data transmission scheme based on the 'Effective Bandwidth' theory. Furthermore, we design a data multiplexing system for the transmitter head-end of the proposed interactive TV system. This includes a novel scheduling algorithm for controlling the multiplexing of incidental and main stream data packets. We present numerical results of simulation experiments, which support the theoretical methods presented in this thesis.

Contents

Abstract
Contents
List of Tables
List of Figures
Acknowledgements
Dedication

1 Introduction
  1.0.1 A New Vision of Interactivity for Television
  1.1 Challenges
  1.2 Thesis Scope
  1.3 Our Approach
    1.3.1 Framework
    1.3.2 Maximum Waiting Time
  1.4 Structure and Mechanisms of the System
    1.4.1 Admission Control
    1.4.2 Traffic Characterization
    1.4.3 Service Classes
    1.4.4 Scalable Coding
    1.4.5 Data Multiplexing
  1.5 Thesis Contribution and Structure

2 Deterministic Service Class
  2.1 Introduction
  2.2 Traffic Characterization
    2.2.1 Deterministic Traffic Characterization
    2.2.2 Parameterized Traffic Characterization Models
  2.3 Fitting the (σ̄, ρ̄) Model to a Source
    2.3.1 Overview of Current Methods for Fitting the (σ̄, ρ̄) Model to a Traffic Source
    2.3.2 Our Approach to Constructing the Empirical Envelope
    2.3.3 Obtaining the (σ̄, ρ̄) Parameters from the Empirical Envelope
  2.4 Admission Control
    2.4.1 Network Calculus Basics
    2.4.2 Waiting-Time Bound in the Proposed ITV Application
    2.4.3 Our Algorithm to Find Tw
  2.5 Results
    2.5.1 Numerical Results of 'Model Fitting' Methods
    2.5.2 Numerical Results of the Admission Control Mechanism
  2.6 Conclusion

3 Stochastic Service Class
  3.1 Introduction
  3.2 Effective Bandwidth
    3.2.1 Large Deviation Principle
    3.2.2 Effective Bandwidth Definition
    3.2.3 Physical Interpretation of Effective Bandwidth
  3.3 Quantifying Data Loss in Communication Networks Using Effective Bandwidth
    3.3.1 Data Loss Probability in a Queuing Model with Priority
  3.4 Admission Control for the Stochastic Service Class
  3.5 Numerical Estimation of Effective Bandwidth
  3.6 Conclusion

4 Video Traffic Modelling
  4.1 Introduction
  4.2 Existing Stochastic Video Traffic Models
    4.2.1 Stochastic Models for Full-Screen Broadcast Video
  4.3 Mathematical Background of General Markovian Models
    4.3.1 Markov Chain
    4.3.2 Hidden Markov Models
    4.3.3 Limitation of HMM Models
    4.3.4 Selecting Between HMM and HSMM Modelling Approach for the Proposed ITV Application
  4.4 General Background on Semi-Markovian Signal Models
    4.4.1 Semi-Markov Chains
    4.4.2 Hidden Semi-Markov Models
    4.4.3 General Background on Identification of HSMMs
  4.5 New Formulation of HSMMs
    4.5.1 Hidden (or Modulating) Layer Modelling
    4.5.2 Observation Layer Modelling
    4.5.3 Model Parameterizations
  4.6 A New Algorithm for Off-line Identification of HSMMs
    4.6.1 Implementation Issues
  4.7 Online Identification of HSMMs
    4.7.1 General RPE Technique
    4.7.2 Online Identification of HSMMs Using the RPE Method
    4.7.3 Model Parameterizations
    4.7.4 Estimation of the Hidden Layer Variables
    4.7.5 Gradient Vector Update Equations
    4.7.6 Parameter Update Equations
    4.7.7 Updating the Hessian Matrix Estimate, RT
    4.7.8 Implementation Issues
  4.8 Effective Bandwidth of a Hidden Semi-Markov Model
    4.8.1 Effective Bandwidth of Hidden Markov Models
    4.8.2 Reformulating a Semi-Markov Chain as a Markov Chain
    4.8.3 Effective Bandwidth of an HSMM by Reformulating as an HMM
  4.9 Numerical Results
    4.9.1 Numerical Results for Synthetic Data
    4.9.2 Numerical Results for Empirical TV Traffic Traces
  4.10 Chapter Conclusion

5 Broadcast Head-End Architecture
  5.1 Introduction
  5.2 Multiplexing System Structure of a Standard Digital TV System
  5.3 Multiplexing System for Our Interactive TV System
    5.3.1 TS Packetizer
    5.3.2 Traffic Shaping Unit
    5.3.3 Multiplex Buffer
  5.4 Scheduling
    5.4.1 Scheduling Unit Objective
    5.4.2 Prioritizing Policy
    5.4.3 Limitations Imposed on Scheduling by the Digital TV Standard
    5.4.4 Scheduling Algorithm
    5.4.5 Packet Drop-Off
  5.5 Chapter Conclusion

6 Thesis Summary
  6.1 Thesis Contributions Summary
  6.2 Future Research

Appendix A Proof of Theorem 1 in Section 2.4

Appendix B Q-Q Plot

Appendix C Likelihood Ratio Test
  C.1 General Likelihood Test
  C.2 Likelihood Ratio Test for Testing the HMM Model Against the HSMM Model

Bibliography

List of Tables

2.1 Execution time in seconds for calculating E(t) for 1 <= t <= 10000. Simulations were run on a PC with a Pentium IV processor at 1.7 GHz, using a Matlab implementation.
2.2 Comparison of our method with the method in [1] for finding the (σ, ρ) parameters. E(t) constructed for all 1 <= t <= N.
2.3 The model parameters constructed from the entire envelope (second column) and from E(t) for 1 <= t <= r and samples of E(t) at t = iδ for 1 <= i <= n; r = 200, nδ = N.
2.4 Video sequences used in our study.
2.5 Encoding parameters for the video streams used in our study.
2.6 Numerical values of (σ, ρ) model parameters for the main video sequences used in our simulation.
2.7 Simulation parameters.
4.1 Results of the likelihood ratio test for different typical TV video sequences. The number of states in the HSMM and HMM models is N = 3.
4.2 Actual and initial values of the parameters of HSMM models used in simulating our off-line identification algorithm.
4.3 Actual and initial values of the parameters of HSMM models used in simulating our online identification algorithm.
4.4 Estimated parameter values of the HSMM model for empirical traffic traces of typical TV programs.
4.5 Simulation parameters.

List of Figures

1.1 Screen shot of an Electronic Program Guide (EPG). EPG lets you search, navigate, and find out what's on your TV, all while watching TV.
1.2 Screen shot of an Enhanced TV. Enhanced TV lets you view extra HTML-based information about the current TV program, or surf the Internet, while watching TV.
1.3 A transmission time-line, which illustrates the effect of Tw; the gray boxes show when the data of the ith data unit are actually transmitted.
1.4 Structure of the proposed transmission system.
2.1 E(t) for a CBR and a typical VBR source.
2.2 Traffic constraint function for different traffic models.
2.3 A*(t) for the (σ̄, ρ̄) model for m = 1, 2 and 3.
2.4 E(t) for two typical TV programs. The traffic is normalized to its maximum rate, i.e., the maximum rate of these sources is 1.
2.5 E(t) obtained from the direct approach, largest subadditive closure extrapolation, repetition extrapolation for r = 400, and sampling with δ = 400. The source is a VBR MPEG-2 sequence, selected from the motion picture 'Mission Impossible' with 720 x 480 resolution, frame rate 30 and maximum bitrate of 5 Mbps.
2.6 Pseudo-code of our algorithm for obtaining the (σ̄, ρ̄) model parameters from E(t).
2.7 Divide-and-conquer approach in our algorithm. (a) First step: a single (σ1, ρ1) is fitted to E(t). (b) Second step: since A*(t) in step one was not a satisfactory estimate of E(t) for [T1, T2], the (σ2, ρ2) is fitted to E(t) for t in [T1, T2]. (c) Third step: the (σ3, ρ3) is fitted to E(t) for t in [T1, T2].
2.8 The general system S.
2.9 The arrival and departure functions: (a) continuous time, where Ain and Aout are defined for all t > 0; (b) discrete time, where Ain and Aout are only defined at discrete times denoted by dots.
2.10 (a) The transmitter model with N high-priority (main) streams. (b) The equivalent model, where the N high-priority input streams are replaced with one equivalent stream.
2.11 Pseudo-code of our algorithm to find Tw.
2.12 βL(t) and A*L(t).
2.13 Utilization ratio, U(A*, d).
2.14 A*(t) and E(t) of video sequences of typical TV programs: (a) Mission Impossible, (b) News, (c) Talk show, (d) Documentary.
2.15 A*(t) and E(t) of video sequences of typical TV programs: (a) Court show, (b) Muppets, (c) Soap opera, (d) Cartoon.
2.16 R versus Tw for an incidental stream. Tw is obtained for each R using the admission control algorithm presented in Section 2.4.3.
3.1 A sample effective bandwidth curve α(θ), together with the inverse of the effective bandwidth function I(c) = α⁻¹(θ). The average rate of the source is 0.448.
3.2 A simple buffer of size B, filled at variable rate yt and emptied at constant rate c.
3.3 A simple multiplexer model with prioritized inputs.
3.4 A simple buffer filled at constant rate R and emptied at variable rate c. If the buffer workload is WL, then we know that the first data unit has waited WL/R seconds in the buffer.
3.5 A simple multiplexer model with N high-priority inputs and one low-priority input.
4.1 Normalized bitrate and 'state transition path' for two typical TV programs.
4.2 (a) Histogram of state (or cluster) duration for one of the states in the News sequence; (b) geometric probability mass function; (c) discretized Gamma probability distribution function; (d) Q-Q plot for the geometric distribution; (e) Q-Q plot for the Gamma distribution.
4.3 State transition paths that should be considered for computing αt(j) in the HMM and HSMM cases.
4.4 Comparison of the signal generation process in our model with that of existing signal models for HSMMs.
4.5 'E' and 'M' steps of our algorithm for off-line identification of HSMMs.
4.6 Parameter update equations performed in each step of a general RPE algorithm.
4.7 Recursive Prediction Error scheme for estimating HSMM parameters.
4.8 (a-c) Parameter estimates for the first model versus the iteration number. The dotted lines show the actual values of the parameters. (d) The log-likelihood function, log P(y1, y2, ..., yt | θ), versus the iteration number. As shown, log P(y1, y2, ..., yt | θ) increases in each iteration.
4.9 (a-c) Parameter estimates for the second model versus the iteration number. The dotted lines show the actual values of the parameters. (d) The log-likelihood function, log P(y1, y2, ..., yt | θ), versus the iteration number. As shown, log P(y1, y2, ..., yt | θ) increases in each iteration.
4.10 Online estimation of a 3-state HSMM: (a) state transition probability a12; (b) observation mean for state 1, u1; (c) state duration mean for state 1. The dotted line shows the actual value of the parameter. The parameter estimate converges to the actual value of the parameter.
4.11 Online estimation of a 3-state HSMM, where the actual parameter changes at t = 5000: (a) state transition probability a12; (b) observation mean for state 1, u1. The dotted lines show the actual values of the parameters. The parameter estimates follow the temporal changes in the actual value of the parameter.
4.12 Online estimation of HSMM parameters for the 'Talk show' sequence.
4.13 Online estimation of HSMM parameters for the Documentary (Best Places in Canada) sequence.
4.14 Online estimation of HSMM parameters for the 'Mission Impossible' sequence.
4.15 Effective bandwidth curves of video sequences of typical TV programs.
4.16 Tw versus R for an incidental stream that uses the stochastic service class.
4.17 Probability of loss p versus rate R, where Tw is constant. The solid lines illustrate the loss probabilities obtained by the admission control scheme, and '+' signs illustrate the actual loss probabilities obtained from simulating the multiplexing system. As shown, the actual loss probabilities obtained from simulation are close to those obtained from the admission control.
5.1 Conceptual diagram of a Transport Multiplexer.
5.2 Diagram of the multiplexing system.
5.3 The scheduling unit decides which TS packet should be sent at the next transmission time-slot.
5.4 Packet drop-off in the multiplex buffers of the stochastic and best-effort service classes. When a new packet arrives to a full buffer, it is pushed into the buffer.
B.1 Q-Q plot.

Acknowledgements

This thesis is the result of five years of work wherein I have been accompanied and supported by many people. It is with pleasure that I now take the opportunity to express my gratitude to all of them. First and foremost, I would like to thank my supervisors, Dr. Rabab K. Ward and Dr. Panos Nasiopoulos, who have always been extremely kind and supportive. I would never have been able to finish my thesis without my supervisors' insights, criticisms and thoughtful suggestions. I am most grateful to Rabab for her continuous moral support, and for showing me the proper research methodology; and I am most grateful to Panos for his academic advice, and for being a mentor to me. I am thankful to all my colleagues at the 'Image Processing Lab' of the Department of Electrical and Computer Engineering at UBC. Especially, I am grateful to the following individuals for their academic help or moral support during my studies, and for being my friends: Alex Paquet, Kemal Ugur, Parvin Mousavi, Vanessa Mirzaee, Yasser Pourmohammadi, Alen Docef, Parvaneh Saeedi, Farhad Ghasemi, and Purang Abolmaesoumi. I am also grateful to Dr. Vikram Krishnamurthy for his comments on one of my research papers, and to Kirsty Barclay for all her editorial help during my studies. I am grateful to my wife, Paria, for her love and patience, and for her inspirational and moral support during my studies. She is the love of my life, and I dedicate this thesis to her. Last but certainly not least, I would like to thank my parents and my sisters for their continuous love and support. I would not have accomplished this much in my life without their sacrifices for me throughout my life.

MEHRAN AZIMI
The University of British Columbia
March 2004

To my beloved wife, Paria.

Chapter 1

Introduction

Transport of the mails, transport of the human voice, transport of flickering pictures - in this century as in others our highest accomplishments still have the single aim of bringing men together. - Antoine de Saint-Exupery (1939)

TELEVISION systems are migrating to digital technology. More Digital Television (DTV) systems are being deployed around the world every day. This change is creating an incredible technological revolution in the entertainment industry.
The new DTV technology not only delivers crystal clear picture and superior sound quality, but also allows new services to be added to TV programs. These add-in services enhance the TV viewer's experience by adding extra features or content to TV programs. For example, the presently available Electronic Program Guide (EPG) (see Figure 1.1) is an add-in service for DTV systems. EPG lists the programs that are or will be available on each TV channel, plus a short summary or a commentary for each program.

Interactivity is the most attractive enhancement promised to be added to digital TV systems. However, the concept of 'interactivity' for TV is not clearly defined, and the term 'Interactive TV' has been used for many different TV systems with many different features. For example, Video-On-Demand systems and TV systems with VCR-like functionality are occasionally referred to as 'Interactive TV.' In Video-On-Demand systems, viewers select a movie or TV show from a library, and that movie is played back on their TV. In TV systems with VCR-like functionality, TV viewers can pause or rewind a live TV show, or save a TV show to watch later. This system is commercially available now.

[Figure 1.1: Screen shot of an Electronic Program Guide (EPG). EPG lets you search, navigate, and find out what's on your TV, all while watching TV.]

Today, 'Interactive TV' mostly refers to the so-called 'Enhanced TV' system defined by the Advanced Television Enhancement Forum (ATVEF) [2]. ATVEF is an industry alliance of many major companies. The goal of this forum is to standardize HTML-enabled TV systems. With the Enhanced TV technology, a web page is displayed alongside a TV picture (see Figure 1.2) on the TV screen. Viewers surf this web page using their remote controls to get more information about the program, do e-shopping, and so forth. There are two scenarios for sending the web page content. In the first scenario, the web page contents are seamlessly inserted into the broadcast TV signal. In this scenario, no Internet connection is required. However, the size of the inserted web data is limited, and users are limited to the inserted web pages. In the second scenario, an Internet link is inserted into the TV signal. This link is used by the receiver set-top box to download the web page content via an Internet connection. The 'Enhanced TV' system is now commercially available (e.g., Microsoft WebTV), and many TV shows are currently using this service.

[Figure 1.2: Screen shot of an Enhanced TV. Enhanced TV lets you view extra HTML-based information about the current TV program, or surf the Internet, while watching TV.]

Thus, current interactive TV systems are limited to providing World Wide Web content. This content is mostly composed of text or still pictures. Besides, user interactions in current interactive TV systems are limited to web-surfing-like actions, such as menu selection, typing a URL, and so on. In practice, users actually interact with the web page content, and not with the TV program content.
It is evident that current interactive systems for digital TV remain limited. This limitation lies in the fact that none of the current interactive TV systems provide any control over the video or audio content of the TV program.

1.0.1 A New Vision of Interactivity for Television

Our vision of interactivity in television is a system which allows TV viewers to control the final presentation of the TV program content. Such interactivity has been successfully implemented in DVD, which is a non-broadcast interactive medium. Therefore, this interactive TV system could be considered as a technology that offers DVD-like interactivity in broadcast digital TV systems. That is, the new interactive TV system would enable TV viewers to control the final presentation of the video or audio content of a TV program as in a DVD player system. Important applications would include the following:

Multi-lingual audio, where viewers select the language of choice from an available set of languages.

Parental management, where viewers select the content rating of a TV program. If the viewer selects the 'non-adult' version of a program, the receiver then seamlessly replaces the adult-rated scenes of the program with a non-adult-rated video sequence.

Multi-angle video, where viewers can view a scene from one or more spatially different angles. For example, during a soccer program, viewers can select to view the important scenes of the game from various angles.

Video-in-Video, where viewers may select to open a small window in the corner of their displays. This small window shows a separate video that enhances the viewer's experience. For example, in a soccer program, the small window may display an important incident on the other side of the field.

In the proposed system, viewers use their remote controls, as if using a DVD-player remote control, to select choices regarding a TV program from a menu overlaid on top of the main video. This can include, for example, selecting a different language audio track, or switching between adult and non-adult versions of a movie. These choices are conveyed to the receiver and are used to select the appropriate video and audio sequences to be displayed. Hence, the viewer's experience of a TV program is customized based on his or her own choices.

Thus, our proposed interactive TV system requires the transmission of extra video or audio streams alongside the main TV program stream. We call these extra streams 'incidental streams.' Incidental streams carry the extra video or audio sequences required for an interactive TV application. For example, the audio stream of another language track in the multi-lingual audio application, or the video stream of a secondary camera angle in the multi-angle video application, are incidental streams. Depending on the application, an incidental stream may carry a video or audio sequence with a limited and specified length, or an unspecified length. For instance, in the parental management application for a movie, the main stream carries the regular version of the movie, which we assume has a few 'offending' scenes. For each offending scene, a non-offending version of the same scene is transmitted as an incidental stream. In this case, each incidental stream has a limited length, equal to its corresponding scene length. On the other hand, in the case of the multi-lingual application, the incidental stream carries the audio stream of the other language track for the entire program.
In this case, the incidental stream is a continuous stream whose length is equal to the length of the program.

In order to receive and display a digital TV signal, the TV receiver must be equipped with a digital set-top box. A set-top box designed and programmed specifically for the proposed ITV application will be able to display the incidental streams. Conventional set-top boxes simply ignore the incidental streams and only display main streams. This makes the system backwards compatible with the presently available digital TV set-top boxes. Therefore, adding the incidental streams to a TV program does not affect the compatibility of the broadcast TV signal with conventional digital receivers.

1.1 Challenges

In designing the proposed interactive TV system, several issues and challenges must be addressed.

Limited Transmission Bandwidth The channel bandwidth available for transmitting a digital TV signal is limited. In most transmission media, such as cable, terrestrial and satellite, a fixed channel is shared among a number of TV programs, where each program uses a fixed share of the transmission bandwidth. In the proposed interactive TV system, incidental stream data must be accommodated in the same transmission channel as the main streams. However, reserving part of the transmission channel bandwidth for incidental streams is not an attractive approach, for two reasons. First, the contents of incidental streams are not as important as the main program contents. This is because incidental streams usually carry enhancement content for a TV program; hence, incidental streams are expected to be viewed by much smaller TV audiences than the main program content. Therefore, it is not cost-effective to reserve bandwidth for incidental data, which are of secondary importance. Second, depending on the application, incidental streams may carry video or audio clips, each with a limited length. For example, in the parental management application for a movie, an incidental stream carries the non-offending version of the movie only when there is an offending scene; otherwise, no incidental stream is broadcast. Therefore, it is wasteful to reserve a fixed portion of bandwidth for an incidental stream, which may be utilized only during a few time intervals.

Keeping the Quality of Main Streams Intact As noted, main streams carry more important content than incidental streams do. Therefore, adding incidental streams to a TV program should not degrade the quality of the main streams.

Compatibility with Standard TV Receivers Adding incidental streams to a TV program should not make the broadcast signal incompatible with conventional TV receivers. Thus, it is necessary that our system design be compliant with the digital TV standards.

Same Channel Transmission The incidental streams should be accommodated within the same transmission channel used for the main streams. It is not an attractive option to use other transmission mechanisms for sending incidental data. For example, consider a possible scenario for implementing the proposed interactive TV system by adding Internet links to TV programs. These links would point to incidental video and audio streams located on the Internet. The set-top box receivers would then use these links to download the incidental streams via a fast Internet connection, and display them on the TV screen. However, the problem with this approach is that each TV receiver would be required to have a fast Internet connection.
In fact, this approach has been taken and implemented by a few manufacturers, but it has failed, as its concept has been rejected by both TV broadcasters and consumers.

1.2 Thesis Scope

As noted, the most important challenge in designing the proposed interactive TV system arises from accommodating the incidental streams within the fixed bandwidth allocated for the main streams in the transmission line of a digital television broadcast system. In this thesis, we address this challenging problem. We present novel solutions for adding incidental streams without introducing any degradation whatsoever in the picture quality of the main streams, and without increasing the transmission bandwidth. This is possible since the rate of the main streams varies with time and does not occupy the entire allocated bandwidth at all times. The incidental streams are transmitted using the available bandwidth. Our method strives to make the most productive use of the available bandwidth, and delivers incidental video and audio content with the best possible picture and sound quality.

Unlike current interactive TV technologies, the proposed system is a one-way system. That is, no return path from TV viewers to the transmitter and no Internet connection are required. Furthermore, adding incidental streams to a TV program using our method does not affect the compatibility of the broadcast signal with conventional digital TV receivers. In other words, a broadcast signal that carries both incidental and main streams is receivable by both conventional digital TV receivers and by receivers specifically programmed for the proposed ITV application. Conventional receivers will display the main streams, while receivers programmed for the proposed ITV application will be capable of displaying both incidental and main streams. These features make the proposed interactive TV system even more attractive to both consumers and TV broadcast companies.

1.3 Our Approach

1.3.1 Framework

We encode the main video sequences with constant picture quality. Therefore, the main video streams are encoded at a variable bitrate (VBR). It is well known that simple and slow-activity video scenes require a smaller number of encoding bits than complex video scenes do; the bitrate for complex scenes may reach the maximum allowed bitrate. Digital TV transmission media (e.g., cable or terrestrial) allow a fixed reserved bandwidth for each TV channel equal to the source maximum rate. Therefore, during simple scenes the allowed bandwidth is under-utilized. We propose to use these unused portions of the bandwidth for transmitting incidental stream data.

Each data unit of an incidental stream contains time-sensitive data. This means each data unit should be transmitted before a certain transmission deadline, so that it is available at the receiver at a certain time for decoding and presentation. Incidental stream data units are only transmitted at opportune moments, when the transmission bandwidth is not fully utilized by the main streams. In order to transmit incidental data units at these opportune moments, we propose that the transmitter receives each incidental data unit ahead of its transmission deadline, say by tw seconds. Each data unit is first buffered at the transmitter and then transmitted whenever some free bandwidth becomes available. It is vital to choose tw large enough so that the incidental data are transmitted and received by the receiver by the time they are to be decoded and presented to the viewer. Since decoders may receive the incidental data units prior to their presentation time, these data have to be buffered at the decoder until their decoding time.
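The sketch below illustrates this buffering idea with a toy simulation; it is not part of the thesis, and the trace values, slot length and function name are invented for the illustration. Incidental data units become available tw seconds before their deadlines, wait in the transmitter buffer, and are drained only with whatever bandwidth the main streams leave free in each time slot; a unit whose deadline passes shows that tw was chosen too small for this trace.

```python
# Illustrative sketch (not from the thesis): incidental data units are buffered
# at the transmitter and sent only with the bandwidth left over by the main
# streams. All numbers below are invented for the example.

def simulate_waiting_times(unit_sizes, deadlines, tw, free_bw_per_slot, slot_len=0.5):
    """Return the waiting time of each incidental data unit (None if its
    transmission deadline is missed). free_bw_per_slot[k] is the number of
    bits the main streams leave unused during slot k."""
    # Each unit becomes available at the transmitter tw seconds before its deadline.
    backlog = [[d - tw, size, d, i] for i, (size, d) in enumerate(zip(unit_sizes, deadlines))]
    waits = [None] * len(unit_sizes)
    for k, free in enumerate(free_bw_per_slot):
        now = k * slot_len
        for unit in backlog:
            arrival, remaining, deadline, idx = unit
            if arrival > now or remaining == 0 or free == 0:
                continue
            sent = min(remaining, free)   # use whatever free bandwidth is left this slot
            unit[1] -= sent
            free -= sent
            if unit[1] == 0 and now + slot_len <= deadline:
                waits[idx] = now + slot_len - arrival   # time spent in the buffer
    return waits


# Toy example: three data units, deadlines 0.5 s apart, and a fluctuating
# amount of leftover bandwidth (bits per 0.5 s slot).
if __name__ == "__main__":
    sizes = [4000, 6000, 3000]
    deadlines = [2.0, 2.5, 3.0]
    leftover = [1000, 0, 2500, 4000, 500, 6000, 3000]
    print(simulate_waiting_times(sizes, deadlines, tw=1.5, free_bw_per_slot=leftover))
```

Running the toy example shows the second unit missing its deadline, which is exactly the situation the maximum waiting time Tw of the next subsection is designed to rule out.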
1.3.2 Maximum Waiting Time

An important question arises here: "for an incidental stream with a given bitrate R, what is the minimum tw?" We will denote the minimum tw by Tw. Therefore, Tw is defined as the maximum time that the data units of an incidental stream with rate R might wait in the transmitter buffer before being transmitted. Once Tw is found, the transmitter should then receive the incidental data units tw >= Tw seconds before their transmission deadline. This ensures that all incidental data units are transmitted on time and made available at the decoder prior to or by their decoding time. We discuss our approach to finding Tw in the next section.

A small Tw is extremely desirable for three reasons. First, a small Tw reduces the inescapable delay in starting the presentation of an incidental stream in a live program. Suppose that the first data bit of an incidental stream is delivered to the transmitter system for transmission at time t. If we ignore the constant delays caused by multiplexing, transmission and demultiplexing, then the decoding of this incidental stream can start at the receivers at t + Tw. Hence, it is very attractive to have a small Tw, so that the playback of incidental streams in live TV programs can start very shortly after they have actually been captured. Second, since the receiver buffers must be capable of storing Tw seconds of an incidental stream's data, a smaller Tw requires a smaller buffer size at the receivers. Third, a small Tw is advantageous when viewers change from one TV channel to another: with a smaller Tw, viewers experience a shorter 'random access delay' for the incidental streams while they switch channels.
In the first system, the receiver completely receives the ith data unit of the incidental stream, while in the second system, the receiver misses this data unit. Therefore, the presentation of the incidental stream in the first system starts sooner (i.e., from the ith data unit) than in the second system. -uJ--IV! rival •4' decode 1 ^ac w2 decode 9 1.4 Structure and Mechanisms of the System In this section, we introduce the mechanisms that we employ in the proposed system. Our objective here is to introduce the concept of each mechanism, and describe their roles. The details of each mechanism and our implementation approach are discussed in detail throughout the thesis. We refer to the different video or audio streams as 'traffic sources' and to the actual data as 'traffic' from here on. This is because each video and audio stream could be considered as a data generating source. Figure 1.4 illustrates the basic building blocks of the proposed transmis sion system. As shown, this system consists of the following units: admission control, traffic characterization, scalable coder, service classes, and data mul tiplexer. 1.4.1 Admission Control Before an incidental stream is added to a TV program, and actually starts sub mitting data to the transmitter for transmission, it is necessary to determine the rate R and the waiting-time tw for this stream. We refer to this mechanism as 'admission control.' The admission control mechanism determines whether or not a certain incidental stream is allowed to be transmitted. The admission control relies on some bandwidth provisioning mechanisms, which forecast the free bandwidth in the system in the future. The admission control mechanism is initiated by sending a connection request from the TV production studio to the transmitter. The connection request is sent in advance of the actual data, and conveys to the transmitter that an incidental stream is going to be added to the program in the near future. The connection request also conveys a set of minimum service param eters for the incidental streams, which are the minimum bitrate Rmm and the largest waiting-time t™ax that can be selected for this incidental stream. Then, the admission control must determine whether or not it can assign a bitrate R and a waiting-time tw to the incidental stream, such that R > Rmin and Tw <tw< t™ax. If the admission control can find such an (R,tw) parameter pair, then the incidental stream is accepted by the admission control. If not, then the incidental stream is rejected; that is, the incidental stream will not be added to the program. The above procedure is referred to as the 'admission 10 Main Audio Stream Main Video Stream Traffic characterization Connection Request J Traffic Descriptors Admission control Incidental Stream (R ,tw)'s pairs base layer Scalable encoder 1" enhancement layer 21"1 enhancement layer t 's teterministic service o stochastic service best effort service D service classes M U X Transport Stream Figure 1.4: Structure of the proposed Transmission System. test.' Once the transmitter receives the actual data of an accepted incidental stream, it re-encodes the incidental stream with rate R; and makes the encoded data units available for transmission at exactly tw seconds before their decoding time. This will be described in more details in the next sections. 1.4.2 Traffic Characterization The traffic characterization unit assigns a traffic descriptor to each main video source, and conveys them to the admission control unit. 
1.4.2 Traffic Characterization

The traffic characterization unit assigns a traffic descriptor to each main video source and conveys it to the admission control unit. The traffic descriptor is used in forecasting the bandwidth required by the main streams, from which we can deduce the bandwidth available for the incidental streams. Therefore, the admission control unit uses the traffic descriptors in its bandwidth provisioning mechanism to determine how much bandwidth will be available to the incidental streams. A traffic descriptor is composed of a set of parameters which contain useful and important characteristic information about the traffic shape of the source. Specifically, a traffic descriptor carries information about the traffic 'burstiness' of the sources. A 'traffic burst' refers to a state where a traffic source generates traffic at a rate higher than its average for a long period of time. We say a traffic source is bursty if it frequently generates traffic bursts. As will be discussed later, extracting the traffic descriptors directly from traffic is not a straightforward process. For this reason, we employ a modelling approach, where we use a parameterized model for modelling the traffic; we then find the traffic descriptors from the model parameters. We refer to these parameterized models as traffic models. For pre-recorded TV programs, traffic models and traffic descriptors are obtained by using off-line algorithms. For live TV programs, the traffic models and traffic descriptors are obtained by monitoring traffic and by using online methods.

1.4.3 Service Classes

An incidental stream can be transmitted using one of three different service classes. These service classes are defined below.

Deterministic Service Class: When an incidental stream is transmitted using the deterministic service class, the transmitter guarantees to send all the incidental data units on time and without any deadline violation or loss. The advantage of this approach is that the incidental stream experiences no data loss; hence the playback of incidental streams at the receivers will have no undesirable visual artifacts such as blocks or picture freezing. The disadvantage of this approach is that Tw (or R) is determined by the admission control process based on the most pessimistic bandwidth provisioning for incidental streams. This results in a very large Tw (or small R), which is not desirable.

Stochastic Service Class: When an incidental stream is transmitted using the stochastic service class, some of the incidental data units may be dropped (i.e., not transmitted); however, the data loss probability is guaranteed to be less than a certain threshold, say p%. The advantage of this approach is that Tw (or R) is determined using a more relaxed bandwidth provisioning for incidental streams. This results in a much smaller Tw (or larger R) than that of the deterministic service class.
More precisely, if an incidental stream is transmitted with a deterministic ser vice class, then Tw should be selected large enough (or alternately, R is selected small enough) such that we always have enough bandwidth to send all the inci dental data on time. This means that Tw is selected such that even during the worst-case conditions, we find sufficient transmission opportunities for inci dental data. This worst-case condition happens when the bandwidth available to incidental streams is at its minimum. In this case, the admission control is performed based on a pessimistic bandwidth provisioning. Therefore, the bandwidth provisioning mechanism deviates far from a usual state of system. This means that on average, incidental stream data units will wait much less than Tw seconds in the transmitter buffer. This results in poor bandwidth utilization. Conversely, the bandwidth provisioning in the stochastic service class is based on a more relaxed approach. This results in much higher band width utilization. However, data loss is probable with this approach, which in some instances results in unattractive visual discrepancies in TV picture such as green blocks, picture freezing and so on. A solution, which offers a trade-off between visual quality and bandwidth utilization, is discussed below. 1.4.4 Scalable Coding In order to achieve a compromise between high bandwidth utilization and the smooth playback of incidental streams, we use the scalable coding technique for encoding the incidental streams [3-5], In this technique, a video or audio sequence is encoded to more than one bitstream. The first stream is called the base layer stream and usually has a low bitrate. The other bitstreams are called 13 the enhancement layer streams, and carry a better picture or sound quality than that of the base layer alone. Unlike the base layer stream, the decoding of enhancement layer streams is not stand-alone, and requires the base layer stream during the decoding process. We propose to send the base layer stream using the deterministic service class, and the first enhancement layer stream using the stochastic service class. The second enhancement layer is transmitted using the best effort service class. This approach guarantees that the base layer stream data are delivered to receivers without any data loss. Therefore, the incidental stream is guaranteed to play back with the minimum quality offered by the base layer stream. Meanwhile, the first enhancement layer data will be transmitted by taking advantage of the bandwidth that is not utilized by the main and base layer incidental streams. Since the data loss is bounded, most of the enhancement layer stream data are expected to be transmitted on time. The second enhancement layer data are transmitted using the best effort service. Therefore, any bandwidth left over by other streams is utilized by the second enhancement layer. This approach results in an incidental stream playback with minimum picture or sound quality determined by the base layer stream, an average quality determined by the first enhancement layer, and a best quality determined by the second enhancement layer. Therefore, for an incidental stream that is to be re-encoded using scal able coding and transmitted with different service classes, two admission con trol processes must be performed. The first admission control process deter mines the rate Rbase and waiting-time t^se for the base layer stream. 
The second admission control process determines the rate R_enh and the waiting time tw_enh at a given data loss rate, say p%, for the first enhancement layer stream. Since no service guarantee is offered in the best effort service class, no admission control process is necessary for the second enhancement layer stream.

1.4.5 Data Multiplexing

The multiplexing unit is responsible for multiplexing the main and incidental streams together. This unit handles the data units of each traffic source according to their priority. The main streams have the highest priority, followed by the incidental streams with the deterministic service class. This, in turn, is followed by the incidental streams that use the stochastic service class. The incidental streams with the best effort service class have the lowest priority.
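To make this priority ordering concrete, here is a small hypothetical sketch (not the scheduling algorithm of Chapter 5) of a multiplexer that always serves the highest-priority non-empty queue. The queue names and the packet representation are invented for the example.

```python
from collections import deque

# Hypothetical illustration of the priority ordering used by the multiplexer:
# main streams first, then deterministic, stochastic and best-effort incidental
# streams. The actual scheduling algorithm is presented in Chapter 5.
PRIORITY_ORDER = ["main", "deterministic", "stochastic", "best_effort"]

def next_packet(queues):
    """Return the packet to transmit in the next time-slot: the head of the
    highest-priority non-empty queue, or None if all queues are empty."""
    for cls in PRIORITY_ORDER:
        queue = queues.get(cls)
        if queue:
            return queue.popleft()
    return None

# Toy usage: each queue holds opaque packet labels.
queues = {cls: deque() for cls in PRIORITY_ORDER}
queues["stochastic"].append("enhancement-layer packet")
queues["main"].append("main video packet")
print(next_packet(queues))  # -> 'main video packet'
print(next_packet(queues))  # -> 'enhancement-layer packet'
```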
1.5 Thesis Contribution and Structure

This thesis consists of 6 chapters and 3 appendices. Chapters 2-4 describe how the deterministic and stochastic service classes are designed. This includes the selection of the traffic model, the design of efficient model fitting methods, the selection of a traffic descriptor, and finally the admission control mechanism.

In Chapter 2, we address the admission control problem for the deterministic service class, and develop methods for the deterministic service class. We use the so-called (σ̄, ρ̄) model to model the traffic of main streams, and the so-called 'traffic constraint function' as the traffic descriptor. This approach is based on the 'Network Calculus' theory, which studies deterministic service guarantees in a communication network. We develop efficient algorithms for fitting the (σ̄, ρ̄) model to a traffic source. These algorithms are useful in any application that employs the (σ̄, ρ̄) model, and are part of the novel contributions of this chapter. Finally, we design an admission control scheme for the deterministic service class. This admission control scheme is the most important novel contribution of this chapter.

In Chapter 3, we design the stochastic service class. The approach taken is based on the recently introduced theory of 'Effective Bandwidth.' We develop a new physical interpretation of the effective bandwidth concept based on a data buffering model and large deviation principles. Then, we design an efficient algorithm for admission control of the stochastic service class using the effective bandwidth concept. The two important contributions of this chapter are the physical interpretation of the effective bandwidth concept, and the admission control scheme for the stochastic service class.

In Chapter 4, we address the traffic modelling problem for the stochastic service class. We examine different traffic modelling approaches for stochastic modelling of main streams, and select the family of Markovian signal models for modelling the data traffic in the proposed ITV application. We show that Hidden Semi-Markov Models capture the characteristics of digital TV traffic better than other Markovian models; hence, we employ this model. This line of development is one of the contributions of this chapter. Then, we present a new signal model for hidden semi-Markov models, and present novel methods for parameter estimation of this new signal model for both the off-line and online cases. This new signal model and its parameter identification algorithms are the most important contributions of this chapter, and are useful in other applications which employ hidden semi-Markov models. Finally, we show how the effective bandwidth of a source is obtained from the parameters of a hidden semi-Markov model. This line of development is also part of the contributions of this chapter.

We design a conceptual transmitting system for the proposed interactive TV system in Chapter 5. We discuss the role and importance of the 'packet scheduling policy,' and present a scheduling algorithm for multiplexing main and incidental stream data. Even though this chapter does not include any major contributions, it shows how the deterministic and stochastic service class concepts are implemented and integrated together in an actual system.

Finally, in Chapter 6, we present the thesis conclusion. We highlight the thesis contributions, and discuss future research directions in this field as well.

Chapter 2

Deterministic Service Class

Things which matter most should never be at the mercy of things which matter least. - Goethe

Overview: The scheme needed for the deterministic service class is presented. This includes traffic modelling for main streams, traffic model fitting, the traffic descriptor and an admission control scheme. Using the methods presented in this chapter, one can determine the maximum waiting time for an incidental stream with a given rate.

2.1 Introduction

In this chapter, we present the scheme needed for implementing the deterministic service class. For that, we need to forecast the minimum amount of bandwidth not occupied by the main streams. The approach taken is based on using a model to forecast the maximum data flow of each main video source. This model is referred to as the 'traffic model', and the parameters which describe the maximum data flow of the source are referred to as 'traffic descriptors.' These traffic descriptors are then used in the admission control mechanism to obtain the rate and maximum waiting time of incidental streams.

Our approach to this service class studies the transmission system in a worst-case scenario. In this scenario, all the main sources send the maximum possible traffic to the transmission system for long periods of time; thus, during these times the bandwidth available for incidental streams is minimal. Using this minimum bandwidth knowledge, the rate and the maximum waiting time of incidental streams are then determined by the admission control mechanism. This ensures that even in worst-case conditions, there exists sufficient bandwidth to transmit the incidental data.

The traffic descriptor of the main streams is used to specify the maximum traffic that the main sources can generate in any time period of length t. This traffic descriptor is known as the 'traffic constraint function' in the literature, and is defined in Section 2.2. This approach also requires that the traffic model capture the worst-case burstiness of a traffic source. That is, traffic models which capture the statistical properties of traffic are not needed here.

The rest of this chapter is organized as follows. In Section 2.2, we discuss the current deterministic traffic models, and select the most suitable model for our application. The model we employ is called the (σ̄, ρ̄) model. We also show how the traffic descriptor is obtained from this model. In Section 2.3, we address the issues that arise in fitting the (σ̄, ρ̄) model to empirical traffic traces, and present efficient solutions for these issues. In Section 2.4, we present an admission control scheme.
In Section 2.5, we present numerical results of applying the methods presented in this chapter to actual traffic sources.

2.2 Traffic Characterization

In this section we find a mathematical model for characterizing the traffic of TV video sources (i.e., the main streams). As described earlier, the term 'traffic of a video source' refers to the amount of data bits generated by the video encoder. The mathematical model that we seek should provide a deterministic bound on the amount of traffic a video source generates in any time interval. The important features of a good traffic characterization model are thus 1) accuracy in characterizing the traffic, 2) simplicity in implementation, and 3) the ability to capture the useful characteristics of traffic at different time scales. For example, though the peak-to-average ratio of the bitrate of a source can roughly show what the burstiness of the source looks like on a large time scale, it does not incorporate any information about the burstiness of the source on short time scales. Therefore, it cannot be used in the design of an efficient admission test.

Although there is a great deal of related work on traffic characterization, much of this work cannot be applied to our problem. Most of these methods characterize the video sources using sophisticated stochastic models, such as Markov models [6-8], autoregressive [9,10], self-similar [11,12] and S-BIND [13,14] models. These approaches are all stochastic and do not provide a deterministic bound on the traffic. The problem of deterministic characterization of video has been studied for other applications using different approaches [6,15-18]. These approaches study video bitrate variability at the frame level. Since none of these approaches address the rate variability of full-screen video of TV programs at the scene level, the results of these studies are not valid for the proposed ITV application.

In this section, we show how a deterministic traffic characterization is defined. Then we discuss the current parameterized deterministic traffic models, evaluate these models, and select a suitable model for the proposed ITV application.

2.2.1 Deterministic Traffic Characterization

We describe the traffic by means of a cumulative function defined as the amount of data (e.g., number of bits or packets) generated by the source in the time interval [0, t]. This function is called the cumulative traffic function, and is defined as

A(t) = ∫_0^t y(τ) dτ    (2.1)

where y(τ) is the bitrate of the source at time τ. We use discrete time, where the time parameter t corresponds to an integer number representing the GOP number in the video sequence. (A Group Of Pictures (GOP) in MPEG terminology is the group of frames between two consecutive I frames. The GOP length in most NTSC TV programs is 15; therefore, with a frame rate of 30 fps, each GOP corresponds to 15/30 = 0.5 seconds.) Therefore, in our application y(τ) denotes the number of bits generated by a video source during the τth GOP (i.e., y(τ) is the size of the τth GOP). In this case, (2.1) becomes

A(t) = Σ_{τ=0}^{t} y(τ),   ∀t ≥ 0    (2.2)

The traffic characterization of a source is obtained by defining the traffic constraint function A*(t), which defines an upper bound on the amount of traffic generated over any time interval of length t. Thus

A*(t) ≥ A(s + t) − A(s),   ∀s ≥ 0    (2.3)

Note that the traffic constraint function A*(t) does not depend on s and hence provides a time-invariant bound for the function A. The traffic constraint function is always wide-sense increasing (i.e., A*(t) ≤ A*(t + τ) for τ ≥ 0). As discussed in [19] and [20], A*(t) defines a meaningful constraint only if it is subadditive, which means that A*(t + s) ≤ A*(t) + A*(s) for all s, t ≥ 0. If A*(t) is not subadditive, it can be replaced by its 'subadditive closure' [21]. The subadditive closure of a function f(t) is the function f'(t) defined by the recursive equation

f'(0) = 0,   f'(t) = min[ f(t), min_{0<s<t} { f'(s) + f'(t − s) } ],   t > 0.    (2.4)

Empirical Envelope

The empirical envelope is the tightest traffic constraint function of a source and is defined as

E(t) = max_s { A(s + t) − A(s) },   ∀s ≥ 0, ∀t ≥ 0    (2.5)

The empirical envelope indicates the maximum burst size that a source generates in any time interval of length t. The shape of the empirical envelope function carries important information about the burstiness of the source in worst-case conditions. For example, Figure 2.1 shows E(t) for a constant bitrate (CBR) and a typical variable bitrate (VBR) source. For the CBR source with rate R, the empirical envelope is a linear function of t with slope R, i.e., E(t) = Rt. For a VBR source with maximum bitrate R, E(t) is a concave non-decreasing function. For a small t, E(t) of a VBR source is typically very close to Rt.

[Figure 2.1: E(t) for a CBR and a typical VBR source.]
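As a quick illustration of definitions (2.2) and (2.5), the sketch below computes the cumulative traffic function and the empirical envelope directly from a trace of GOP sizes by brute force. It is only meant to make the definitions concrete; the thesis develops far more efficient constructions in Section 2.3, and the example trace is invented.

```python
# Brute-force illustration of equations (2.2) and (2.5): A(t) is the cumulative
# traffic and E(t) is the largest amount of traffic seen in any window of
# length t. This is O(N^2) and only for illustration; Section 2.3 discusses
# efficient ways of constructing the envelope.
from itertools import accumulate

def empirical_envelope(gop_sizes):
    """Return E(t) for t = 0..N, where gop_sizes[k] is the number of bits in GOP k."""
    n = len(gop_sizes)
    cumulative = [0] + list(accumulate(gop_sizes))       # A(t), with A(0) = 0
    envelope = [0] * (n + 1)
    for t in range(1, n + 1):                             # window length in GOPs
        envelope[t] = max(cumulative[s + t] - cumulative[s] for s in range(n - t + 1))
    return envelope

# Toy trace of GOP sizes (bits); a real trace would come from an MPEG-2 encoder.
trace = [4000, 9000, 8500, 3000, 2500, 9500, 4000, 3000]
E = empirical_envelope(trace)
print(E[1], E[2], E[3])   # max traffic over any 1, 2 or 3 consecutive GOPs
```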
The traffic constraint function is always wide-sense increasing (i.e., A*(t) < A*(t + r) for r > 0 ). As discussed in [19] and [20], A*(t) defines a meaningful constraint only if it is subadditive, which means that A*(t + s) < A*(t) + A*(s) for all s, t > 0. If A*(t) is not subadditive, it can be replaced by its 'subadditive closure' [21]. The subadditive closure of a function f(t) is the function f'(t) defined with the following recursive equation /'(0) = 0, f'(t) = min [/(*), min [f'(s) + f'(t - s)}} , t > 0. (2.4) 0<.s<t Empirical Envelope The empirical envelope is the tightest traffic constraint function of a source and is defined as E(t) = m&x{A(s + t) - A{s)} Vs>0,Vi>0 (2.5) s The empirical envelope indicates the maximum burst size that a source gen erates in any time interval of length t. The shape of the empirical envelope function carries important information about the burstiness of the source in the worst-case conditions. For example, Figure 2.1 shows E(t) for a constant bitrate (CBR) and a typical variable bitrate (VBR) source. For the CBR source with rate R, the empirical envelope is a linear function of t with slope R, i.e., E(t) — Rt. For a VBR source with maximum bitrate R, E(t) is a con cave non-decreasing function. For a small t, E(t) of a VBR source is typically very close to Rt. Traffic characterization of multiplexed sources Consider an ideal multiplexer with N input video sources. An ideal multiplexer does not delay the incoming traffic and generates a multiplexed stream of all 20 Figure 2.1: E(t) for a CBR and a typical VBR source. the input streams. The instantaneous rate of the multiplexed stream is the aggregate instantaneous rate of the input streams. If the N input streams are each characterized by the traffic constraint functions A*(i), i = 1, 2,.... N, then the multiplexed stream has the traffic constraint function A*mux such that 2.2.2 Parameterized Traffic Characterization Models In order to use the concept of the traffic constraint function in a practical system, it is necessary to represent the function A*(t) with a parameterized model. Such a parameterized model would significantly facilitate the design of the admission tests of the incidental streams. Besides, by using a parame terized model, the sources can efficiently specify their traffic characteristics to the system, as only a few parameters need to be conveyed. As discussed above, the criteria used to evaluate a traffic characteriza tion model are accuracy, simplicity and efficiency of the model in capturing meaningful information about the burstiness of the sources. From the perspec tive of bandwidth provisioning, the model should be accurate. This means that A*(t) should be as tight as possible, so that we do not overestimate the traf fic of the source. Since the empirical envelope is the tightest bound for the traffic of a source, it is used as a benchmark for the accuracy of a traffic con straint function. While in general a model with more parameters can achieve a more accurate traffic constraint function, the additional parameterizations cause an increase in the complexity of the traffic model. Thus, the selection N (2.6) i=i 21 of an appropriate model must involve a compromise between accuracy and simplicity. There are five important parameterized deterministic traffic models studied in the literature, known as the peak rate model, the (r,T) model, the the (a, p) model, the (a, p) model and D-BIND model. 
The peak-rate model

The peak-rate model is the simplest and the most widely used of all traffic models. In this model, the traffic is described with only one parameter, the peak rate R_max. The traffic constraint function for this model is given by A*(t) = R_max · t for all t. The model is usually enforced for video sources by the rate-control section of the source encoder. Note that the peak-rate model is appropriate for specifying CBR traffic, but will overestimate the traffic of VBR sources. This is illustrated in Figure 2.2-a, where the empirical envelope, E(t), and the peak-rate traffic constraint function, A*(t), are shown for a VBR source. As shown, A*(t) is not a good model for E(t) and overestimates the traffic over long time intervals (i.e., for large t's).

The (r, T) model

The (r, T) model describes the traffic with a rate parameter r and a framing interval T. In this model, time is partitioned into frames of length T and the traffic generated during each frame interval is limited to rT bits. Thus, this model enforces an average rate r, while allowing for moderate bursts. The traffic constraint function for this model is given by A*(t) = (\lceil t/T \rceil + 1)\, rT for all t ≥ 0, which is illustrated in Figure 2.2-b. This model is most suitable for VBR sources with small fluctuations in their bitrate. If a VBR source has large fluctuations in its bitrate, then in order to capture all changes in the traffic a large framing interval T must be selected. However, selecting a large T usually results in overestimation of the traffic.

The (σ, ρ) model

The (σ, ρ) model describes the traffic with a burst parameter σ and a rate parameter ρ [15,16]. In this model, the traffic constraint function is A*(t) = σ + ρt. Hence, this model enforces a rate ρ, while allowing some burstiness up to σ. Figure 2.2-c shows the traffic constraint function A*(t) for this model. Though this model is very simple, it has been used successfully to efficiently characterize a large class of traffic sources. This model can easily be implemented by using a leaky-bucket traffic shaper². Because of these attractive features, this model has been widely used in many traffic engineering applications.
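As a small illustration of how a (σ, ρ) pair acts as a constraint, and how several pairs combine in the generalization described next, the following sketch (illustrative Python; the parameter values are made up) evaluates A*(t) = min_i(σ_i + ρ_i t) and tests a trace against it, reusing the envelope routine shown earlier.

    import numpy as np

    def constraint(t, pairs):
        """A*(t) = min_i (sigma_i + rho_i * t) for a list of (sigma, rho) pairs.

        A single pair gives the plain (sigma, rho) model; several pairs give
        the piece-wise linear multi-pair model discussed below.
        """
        return min(sigma + rho * t for sigma, rho in pairs)

    def conforms(gop_bits, pairs):
        """True if every window of the trace respects the (sigma, rho) bound."""
        y = np.asarray(gop_bits, dtype=float)
        A = np.concatenate(([0.0], np.cumsum(y)))
        for t in range(1, len(y) + 1):
            if (A[t:] - A[:-t]).max() > constraint(t, pairs):
                return False
        return True

    # Hypothetical model with rates normalized to the source peak, as in
    # Section 2.5: a peak-rate pair (0, 1.0) plus one long-term pair.
    example_pairs = [(0.0, 1.0), (50.0, 0.45)]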
The traffic constraint function is given as follows: A*(t) = Ri\~ (t - Ii) + IUU for all h_x < t < h (2.8) The traffic constraint function of the D-BIND model thus consists of n piece-wise linear segments as shown in Figure 2.2-e. Note that the (a, p) model can be viewed as a special case of the D-BIND model since the (a, p) model defines an n segment concave piece-wise linear constraint function. It should be noted that the traffic constraint function of the D-BIND model in some instances may not be subadditive. 2Efficient implementation of the leaky bucket mechanism is discussed in [22,23]. 23 'h h h n (e) D-BIND model Figure 2.2: Traffic constraint function for different traffic models. 24 (c) m — 3 Figure 2.3: A*(t) for (B,p) model for m = 1,2 and 3. Several studies have evaluated these deterministic traffic models for modelling the video traffic3 with respect to accuracy and simplicity criteria (see [26,27] and [17]). In these studies, the simplicity of the models are eval uated based on the complexity of implementation of the admission control tests and the traffic monitoring and policing4 mechanisms. Meanwhile, it is shown in [27] how the parameters of each model can be expressed in terms of the other models. This enables a direct comparison of these models. All 3In all of these studies, the traffic video source is considered at frame level or ATM cell level. 4 In traffic monitoring, the traffic of the source is monitored in real time to make sure that it complies with its traffic characterization model. If the real traffic does not comply with the model, the traffic shape is enforced to follow the model by a mechanism called traffic policing. 25 these studies indicate that the (a, p) model is superior to the peak-rate and (r, T) models. However, they also show that the use of a single (cr, p) model cannot usually achieve an acceptable accuracy for most of the applications. It is shown in [17] that the (cr, p) model which employs multiple (cr, p) models can accurately characterize the VBR video. We employ the (cr, p) model, as it is known to be simple and accurate for modelling VBR video [17,26,27]. In order to evaluate the (5, p) model with respect to its ability to capture useful characteristics of traffic for our application, we studied the empirical envelopes of several typical TV programs. Figure 2.4 shows E(t) for two typical TV programs, a movie and a news program. The results show that E(t) is a concave increasing function, with two expected characteristics. 1) For very small i's, Eit) is almost linear, that is E(t) w Rmaxt, where Rmax is the maximum rate of the source. 2) For very large t's, E(t) is also almost linear with the slope dE(t)/dt sa Ravg, where Ravg is the average rate of the source. For other t's, E{t) is a concave decreasing function. In order to capture these two important characteristics of the E(t), we select the first (a,p) pair of the model as 0\ = 0 and p\ = R,nax. In addition, the rate parameter of the last (o~,p) pair is set to p„L — Ravg. This selection captures the two important characteristics of E(t) in our model. The other (cr, p) pairs' parameters should be found so that A*(t) models the concave section of E(t). 2.3 Fitting the (<r, p) Model to a Source We here address how to construct an accurate (a, p) model for a traffic source, specifically a video source, using a few (cr, p) pairs and a reasonable amount of computational effort. 
Finding an efficient and practical way of constructing a model, which offers the right trade off between accuracy, size and compu tational effort, is a real challenge. Such a model should be accurate in order to achieve high bandwidth utilization, and should include as few (cr, p) pairs as possible, so that it can be used in practical admission control schemes [27]. Computation time is very important for online traffic sources, where the (a, p) model should be constructed by monitoring samples from an online traffic source. There are many methods for selecting the (a. p) model parameters whose design objective is not to strive for high bandwidth utilization, but 26 E(t) E(t) 200 600, 400 500 300 100 (b) "Mission Impossible" movie 100 200 300 400 500 600 700 800 900 t Figure 2.4: E{t) for two typical TV programs. The traffic is normalized to its maximum rate, i.e., the maximum rate of these sources is 1. instead to select the traffic parameters according to bandwidth availability. These methods include Dual leaky bucket, fixed burst [28], concave hull [17], product [29] and maximum distance [17]. Characterization of a VBR traffic source with the goal of achieving high bandwidth utilization and use of the (a,p) model is studied in [1,17,30]. The benchmark in these methods for evaluating the accuracy of a traffic source is the empirical envelope of the source. Since E(i) is the most accurate traffic constraint function for a source, the approach of these methods is to first construct E(t), and then construct A*(t) as an approximation of E(t). However, construction of E(t) for all t is extremely time consuming and is not practical in real time applications. Existing methods use extrapolation techniques to reduce the computational load in finding E(t) for large time intervals (i.e., large t's), a process that in turn results in rough estimates of E(t). Moreover, finding the model parameters from the empirical envelope is also a challenge. Existing methods are based on a 'brute force' search approach, which is extremely time consuming. In this section, we present a new algorithm for constructing the em pirical envelope of a source. We show that our method results in a better approximation of E(t), when compared to existing methods. Moreover, due to 27 its speed and iterative design, our method can easily be employed for on-line traffic sources, where the source traffic is not known a-priori and the speed of the traffic characterization algorithm is important. We also present a unique and robust algorithm for obtaining the op timum model parameters from E(t). Since E(t) is the most accurate con straint function for a source, we select the (a, p) model parameters so that A*(t) > E(t) for all t, where A*(t) = minifa + pit) and A*(t) is as close to E(t) as possible. We use the 'divide-and-conquer' approach in our algorithm and set up the problem such that a powerful optimization method, called Sequential Quadratic Programming, can be applied to the problem. Unlike other methods in the literature, our method finds the model parameters di rectly from the true or sub-sampled E(t). In addition, our method is faster and more robust than the current methods and results in a near optimum model. In Section 2.3.1, we review previous works related to fitting a (5,p) model to a traffic source. We present our method for constructing the empirical envelope in Section 2.3.2. In Section 2.3.3, we present our method for obtaining the (a, p) model parameters from the empirical envelope. 
2.3.1 Overview of Current Methods for Fitting the (a, p) Model to a Traffic Source As mentioned above, all current methods are based on the two counterparts. First, constructing the empirical envelope, E(t), and second, finding the (5,p) parameters from E(t). We now discuss current methods for each part. Constructing E(t) from the traffic •In [17], the empirical envelope E(t) is obtained by running an exhaustive search and finding the maximum burst size in the entire stream. More precisely, if the instantaneous traffic rate of a source is given by y(i) and the total length of the source is N, then E[t) is constructed by calculating: The drawback of this approach is its extensive computational complexity. In order to compute E(t) for 1 < t < N, 0(N2) operations is required. In k+t-l (2.9) 28 addition, N is generally very large, specially for video traffic sources (e.g., N « 105 for a 2 hour movie when the traffic is considered at the frame level). Hence, for the majority of video sources, it is not practical to compute E[t) using equation 2.9. There are methods in the literature that strive to reduce this computational complexity [1,30]. In [1,30] E(t) is approximated using extrapolation. That is, E(t) for 1 < t < r is computed using equation 2.9, and E(t) for t > T, denoted by ET(t), is extrapolated from E(t), 1 < t < T. The extrapolation method used is either the "largest subadditive closure" with the computational complexity of 0(N2) (which is the same complexity if E(t) was constructed using equation 2.9), where ET{t)= mm {E(k) + E(t-k)} for t > T (2.10) or the "repetition extrapolation" with the computational complexity of 0(TN), where E(t) for t > r is obtained by simply repeating the first r values in the envelope (i.e., E(t), 1 < t < r) ET{t) = l-\E(T) + E(t-[-\T) for t>r (2.11) T r The parameter r is experimentally selected for each application. The disad vantage of this extrapolation approach is that it results in high utilization only for the small maximum waiting times Tw. For large Tw's, the traffic character ization based on ET(t) results in a poor network utilization. This is because for large t's, ET(t) is not a good approximation of E(t). This is illustrated in Figure 2.5, where E(t) and ET(t) for a typical video source are shown. We observe that ET(t) is not a good approximation for E(t) for large t's. Finding the model parameters from E(t) Once E(t) (or ET(t)) is found, A*(t) is constructed as a piece-wise linear ap proximation of E(t). The current approach includes two steps. First, since E(t) is not necessarily concave and sub-additive, it is replaced with the con cave hull of E(t), denoted by H(E) [17,18]. H(E) is the smallest piece-wise linear concave function larger than E [31]. Theoretically, H(E) can be used as A*. However, the number of piece-wise linear segments in Ti(E) is usually very large, resulting in an impractical large model size. For this reason, Ti{E) is replaced with another piece-wise linear function that has only a few linear seg ments. The idea behind this approach is to use a cost function to measure the 29 Figure-2.5: E(t) obtained from the direct approach, largest subadditive closure extrapolation, repetition extrapolation for r = 400 and sampling with 5 = 400. The source is a VBR MPEG-2 sequence, selected from the motion picture 'Mission Impossible' with 720 x 480 resolution, frame rate 30 and maximum bitrate of 5 Mbps. difference between H(E) and the traffic constraint function of the new model (say A*).where the model size n is small enough. 
Then an algorithm with a heuristic approach is used to find the (cr, p) parameters of the A*n model. In this algorithm, each (o~i,Pi) pair is updated in each iteration through an exhaustive search through all the possible values for each (ai, p^ pair to find the one that minimizes the cost function (see [1,30]). The major drawbacks of this method are: 1) the computation of the convex hull of E and the heuristic approach to reduce the model size are both computationally expensive, 2) the heuristic method to reduce the model size is not always guaranteed to converge to a result, and 3) this method might converge to the local maximum of the cost function, thus it does not guarantee that the optimum parameters are found. 30 2.3.2 Our Approach to Constructing the Empirical En velope Here, we present a method that finds the exact values of E(t) for all 1 < t < r. For t > T, we find E(t) at equally 5 spaced samples t = 5, 25, ...,n5 (see Figure 2.5). The sampling interval 5 is a positive integer. If 5 < r, then the samples at t = id for which iS < r are repeating and need not to be computed again. For simplicity, we present our algorithm for constructing E(t) for t < T and for samples of E(t) separately in sections 2.3.2 and 2.3.2, respectively. The number of operations to construct E(t) for 1 < t < r and the equally spaced samples of E(t) are 0(TN) and 0(nN) respectively. Construction of E(t) for t < r Assume that the total number of bits generated by the source, y[i) for i = 1,2,... ,N are given, where N is the total length of the source. In our case, N is the total number of GOP's in the whole video sequence. We like to find E(t) for t = 1, 2,..., T, where r < N. E{t) is given by [17] The objective is to construct sk for A; = 1,2,..., AT. The empirical envelope E(i) is computed as the max(sfc('i)). Sfc is easily constructed from sfc_i by shifting elements of Sfc_i down and adding y(k) to the result. Our algorithm consists of the following steps: 1. Let k = 1 and initialize sx and e as follows k+t-l (2.12) We define vector sfc of size 1 x T as Sfc = y(k) ... Etfc-i E,t, (2.13) [y(i) o ••• < TXl (2.14) (2.15) e = Si 31 2. Let k = k + 1. Find Sfc and e using Sfc — (shift elements of Sk-i down by one) + [y(k) 0 ••• 0]T (2.16) (2.17) e — max(e, Sfc) 3. Repeat step 2 until the last input is reached, i.e, k — N. When the algorithm finishes, e is the empirical envelope of the source e(i) = E(i) for 1 < i < T. The computational complexity of our algorithm is only (D(TN), which is considerably less than 0(TN2) for a brute force approach using equation 2.12. Due to the iterative structure of our method, it can easily be adopted in on-line applications, where the source traffic y[i) is not known a priori. Construction of samples of E(t) for t — 5,25,..., n5 Given the traffic source y(i), we like to compute E(t) for t — 5,25,... ,n5, where n is the number of samples to be computed. Let Then using equation 2.12, we have E(i5) = maxfc Afe(i). Our goal here is to construct Afc(i) in an efficient way rather than computing Yl^k-iS+i 2/(0 f°r all 1 < k < N and 1 < % < n. Our algorithm iteratively constructs Ak, k — 1,2,... ,N and 1 < i < n. The key idea of our algorithm is that for each new k, we efficiently re-use some -pre-computed terms to construct A^ii). By doing so, our algorithm reduces drastically the number of operations required to construct each Ak{i). First, we define the vector Ak of size n x 1,. where Afe(i) is the sum of i6 consecutive input ending by y(k) as defined in equation 2.18. 
It can be easily shown that fc (2.18) l=k-iS+\ fc-(i-l)(5 Ak(t) = Ak{i -1) + y(0 ;=fc-i<5+l = Ak{i - 1) + Afc_(i_1)5(l) for i > 1 (2.19) 32 Our algorithm relies on equation 2.19 to iteratively construct A*, for k — 1, 2,..., N. Equation 2.19 requires Afc(l), which is already computed in previous iterations. Hence, we save the first element of A*, in each iteration for future use. For this purpose, we use a vector A of size ((n — 1)5 + 1) x 1, where Afc(l) is pushed into A in each iteration. In kth iteration, we have A(i) — Afc_i+1(l) and equation 2.19 becomes Afc(i) = Afc(i-l) + A((i-l)<5+l) for i > 1 (2.20) The empirical envelope samples, i.e., E(t) at t — iS, are easily obtained as maxfc Afc(i). The algorithm is summarized as follows. 1. Let k — 1. Initialize n x 1 vector A, n x 1 vector E$ and ((n— 1)5+1) x 1 vector A as : Es = A=[y{l) y(l) ... y(l)]T (2.21) f A(l) for i = 1 , AU) = { K ' 2.22 w I 0 for 1< % < (n - 1)5 + 1 2. Let k = k + 1. Update A(l) as A(l) = A(l) + y(k)-y(k-5) (2.23) Note that if this algorithm is executed in parallel with the algorithm presented in 2.3.2 and 5 < r, then we have A(l) = sk(5). Hence, this step of the algorithm can be ignored. 3. Update vector A f A(l) for i = 1 A(i) = \ W 2.24 W I A{i - 1) for 1< i < (n - 1)5 + 1 4. Update A(i) for i > 1 using equation 2.20. 5. Update Es — max(Eg, A). Steps 2 to 5 of the algorithm are repeated for k — 2, 3,..., Ar. When the algorithm finishes, we have E(i5) = Es(i), 1 < i < n. 33 Algorithm 1 Find the (a, p) model parameters. INPUTS: E(t) for t = 1, 2, . . . , N; a criteria to end the algorithm (i.e., the size of the (a, p) model n, or the maximum acceptable error in the model). OUTPUT: ai and p* for 1 < i < n. 1: Fit a single (a, p) to E(t) for t = 1, 2, . . . , N 2: A*(t) = a + pt 3: while A* (t) does not satisfy the accuracy criteria or the model has not reached its maximum size do 4: Find [Ti,T2] such that for all t G [Ti,T2], A* (t) overestimates E(t) more than a threshold 5: Fit a single (cr, p) to E(t) for t G [TI, T2] 6: Add the new (a, p) to the model 7: A* (t) = mini((Ti + pit) 8: end while Figure 2.6: Pseudo-code of our algorithm for obtaining the (a, p) model pa rameters from E(t). 2.3.3 Obtaining the (cf, p) Parameters from the Empiri cal Envelope Our method for finding the (a, p) model parameters from E(t) follows a divide-and-conquer approach [31]. In this approach, the problem is broken into sub-problems which are similar to the original problem but smaller in size, the subproblems are solved, and then the results are combined to create the solu tion to the original problem. Following this technique, we divide our problem to the subproblems of fitting a single (a, p) to the E(t) for a specific range of t, let us say t G [7i,T2]. This subproblem has a smaller size than the original problem and is easier to solve. In Section 2.3.3 we describe how we solve this subproblem, and how all the results are combined to obtain the final (a,p). Suppose E(t), for t — 1, 2,..., N, is given. Our algorithm first fits a single (cr, p) model to the whole input data. That is, we find a and p so that a + pt > E(t), and a + pt is a good approximation for E(t), for t = 1, 2,..., N. Then we proceed by reducing our problem to a subproblem of smaller size. For this, we first select the interval [TX,T2] such that A*(t) is not a satisfactory estimate for E(t) for all t G [Ti,T2]. For example, we find a [Ti,T2] such that 34 {A*(t) - E(t))/E(t) is greater than a threshold for all t € [T1;T2]. 
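Putting the pieces of Figure 2.6 together, the fitting procedure can be sketched as follows. This is an illustrative Python rendering, not the thesis implementation: it uses SciPy's SLSQP routine (one implementation of Sequential Quadratic Programming) for the single-pair subproblem, whose objective anticipates the error measure derived in the next subsection, and the outer loop adds pairs where the current model overestimates E(t) by more than a threshold.

    import numpy as np
    from scipy.optimize import minimize

    def fit_single_pair(E, T1, T2):
        """Fit one (sigma, rho) pair: minimize sigma + rho*(T1+T2)/2 subject to
        sigma + rho*t >= E(t) for every t, so the bound never underestimates
        the envelope (written for clarity, not speed)."""
        t = np.arange(1, len(E) + 1)
        x0 = np.array([0.0, np.max(E / t)])          # feasible start: pure peak rate
        res = minimize(
            lambda x: x[0] + 0.5 * (T1 + T2) * x[1],
            x0,
            method="SLSQP",
            constraints=[{"type": "ineq", "fun": lambda x: x[0] + x[1] * t - E}],
            bounds=[(0, None), (0, None)],
        )
        return float(res.x[0]), float(res.x[1])

    def fit_model(E, max_pairs=5, tol=0.05):
        """Divide-and-conquer fit in the spirit of Figure 2.6."""
        t = np.arange(1, len(E) + 1)
        pairs = [fit_single_pair(E, 1, len(E))]      # first pair fits the whole range
        while len(pairs) < max_pairs:
            A_star = np.min([s + r * t for s, r in pairs], axis=0)
            rel_err = (A_star - E) / np.maximum(E, 1e-12)
            if rel_err.max() <= tol:                 # accuracy criterion met
                break
            # indices where the overestimation exceeds the threshold; the thesis
            # picks the single worst interval, this sketch simply spans them all
            bad = np.flatnonzero(rel_err > tol)
            pairs.append(fit_single_pair(E, bad[0] + 1, bad[-1] + 1))
        return pairs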
Returning to the interval selection: if more than one such interval exists, we select the one where A*(t) is the worst estimate of E(t). The same approach is then used, and a single (σ, ρ) pair is fitted to E(t) for t ∈ [T1, T2]. This adds a new (σ, ρ) pair to our model. This procedure of adding a single (σ, ρ) pair to the model is repeated until a criterion for ending the algorithm is met. This criterion depends on the application and is either the maximum number of (σ, ρ) pairs in the model or an accuracy criterion. For example, in some applications practical considerations may require that the model size does not exceed a certain size, say n. In this case, the algorithm ends when n (σ, ρ) pairs have been added to the constructed model. On the other hand, some applications may require that a certain level of accuracy is preserved in the model, e.g., that A*(t) does not overestimate E(t) by more than a threshold, say p%. In this case, we add (σ, ρ) pairs to the model until A*(t) satisfies this accuracy criterion.

Our sub-problem: fitting a single (σ, ρ) to E(t)

In each iteration of our algorithm, we need to solve the sub-problem of fitting a single (σ, ρ) pair to a part of E(t). That is, given E(t), T1, and T2, we should find σ and ρ such that: 1) σ + ρt ≥ E(t) for all 1 ≤ t ≤ N; this constraint ensures that the constructed model is concave and does not underestimate E(t) for any t, and 2) σ + ρt is an optimum approximation of E(t) for t ∈ [T1, T2]. In order to measure the closeness between A*(t) and E(t) for t ∈ [T1, T2] we use the error function defined as

error(\sigma, \rho) = \sum_{t=T_1}^{T_2} \big( \sigma + \rho t - E(t) \big) = (T_2 - T_1 + 1)\Big( \sigma + \frac{T_1 + T_2}{2}\rho \Big) - \sum_{t=T_1}^{T_2} E(t)    (2.25)

The terms \sum_{t=T_1}^{T_2} E(t) and T_2 − T_1 + 1 do not depend on σ or ρ. Hence, we only need to minimize the function

error(\sigma, \rho) = \sigma + \frac{T_1 + T_2}{2}\rho    (2.26)

subject to the constraint σ + ρt ≥ E(t) for all 1 ≤ t ≤ N. This is a classic optimization problem and there are many standard approaches available in the literature to solve such a problem [32]. We choose to employ the Sequential Quadratic Programming (SQP) method to solve this problem in our application. SQP is a robust and state-of-the-art technique for solving optimization problems. For implementation details of this technique see [32,33]. Using the SQP technique, we easily minimize the function defined in equation 2.26, and find the optimum σ and ρ for our problem within a few iterations.

Figure 2.7: Divide-and-conquer approach in our algorithm. (a) First step: a single (σ1, ρ1) is fitted to E(t). (b) Second step: since A*(t) in step one was not a satisfactory estimate of E(t) for [T1, T2], (σ2, ρ2) is fitted to E(t) for t ∈ [T1, T2]. (c) Third step: (σ3, ρ3) is fitted to E(t) for t ∈ [T1, T2].

Note that there are other approaches to finding the parameters of a single (σ, ρ) model, such as the 'product method' [29], 'maximum distance' [34] and 'fixed burst' [28]. These methods use different optimality criteria for selecting the parameters. We should point out that, instead of using SQP, any of the above-mentioned approaches can be used to solve the subproblem of fitting a single (σ, ρ).

Figure 2.8: The general system S.

2.4 Admission Control

In this section we present a method which finds a bound on the waiting time of the incidental streams' data. This method is based on the 'Network Calculus' theory [19,35,36]. In Section 2.4.1 we discuss the basics of Network Calculus and describe how the bound on the waiting time is obtained.
Then, in Section 2.4.2 we present an algorithm that employs this method and finds J- W • 2.4.1 Network Calculus Basics Consider a general system <S which is viewed as a black box; S receives the input data at the variable rate yin(t) and delivers the data after a variable waiting time at the variable rate y0ut{t)- In the proposed ITV application, S is the multiplexer on the transmitter side, that multiplexes the main and incidental streams. We define the cumulative function of the amount of data input and output to S as Vr > 0,Vi > 0 Vr>0,Vt>0 (2.27) Ain{t) and Aout(t) are called the 'arrival' and 'departure' functions, re spectively [35,36]. The arrival and departure functions for a sample system are T=t Ain{t) = YlVin(T^ T=t Aout(t) = ^Voutir), T=0 37 (a) Continuous time. (b) Discrete time. Figure 2.9: The arrival and departure functions: (a) Continuous time, where Ain and Aout are defined for all t > 0, (b) Discrete time, where Ain and Aout are only defined at discrete times denoted by dots. illustrated in Figure 2.9. From the arrival and departure functions we derive the following two quantities: Backlog: The backlog at time t is the amount of data waiting in the system S at time t and is given by Ain(t) — Aout(t). As shown in Figure 2.9, the backlog at ti is simply the vertical distance between Am{tx) and Aout(t2)-Waiting time (or delay): The waiting time at time t is the time that the incoming data at time t will wait in the system S before being served. The waiting time for the data that is input to the system at time t is given by: d(t) = min{r > 0 : Ain(t) < Aout{t + r)} (2.28) The waiting time at t0 is illustrated by d(t0) in Figure 2.9. If the traffic is continuous, then Ain(t0) = Aout(t0 + d(t0)), which means that all the input data to the system up to the time to are served by the time t0 + d(t0). As shown, for continuous traffic the waiting time is simply the horizontal distance between Ain and Aout. Bounds on Waiting Time and Backlog Network Calculus gives computational rules for bounding the waiting time and backlog. Before discussing how these bounds are obtained, we need to define the service curve and the horizontal deviation concepts: 38 Service Curve: Assume Ain(t) and Aout(t) are given functions. We say that the system S offers the input a service curve /? if and only if for all t > 0, there exists some s, 0 < s < t such that Aout(t) - Am(t -s)> B{s) (2.29) where (3{t) > 0 for all t > 0. The service curve is an abstract concept, and indicates the capacity of sys tem in accommodating traffic during a time interval of length s. Roughly speaking, f3(s) is a lower bound on the amount of traffic that can depart from the system during any time interval of length s, that is Aout(t) — Aout(t — s) > P(s). To better understand the physical meaning of f3(s) we write the equa tion 2.29 as Aout{t) > Am{s) + (3(t - 5) (2.30) Then, a more precise physical interpretation of [3 is that if s is the beginning of a busy period, that is the backlog at s is zero (Aout(s) — Ain(s) = 0) and there are always some data waiting in the system in [s,t], then the system will send at least (3(t — s) data units in [s,t]. Horizontal deviation: The horizontal deviation between the arrival and de parture functions denoted by h(Ain, Aonl) is defined as the maximum of all the waiting time values d(t), and mathematically is defined as h(Ain, Aout) — maxd(i) = max{min{r > 0 : Ain(t) < Aoat(t + r)}} (2.31) Now assume the input traffic to the system is characterized by the traffic constraint function A*n(t). 
This means that for alii > s > 0 (see 2.2) Ain(t) - Ain(s) < Al(t - s) (2.32) Two theorems in the Network Calculus state that the backlog and waiting time in a system are bounded respectively by the vertical and horizontal deviations between the traffic constraint function of the input A*n[t) and the service curve of the system (3(t) (see [19,20,35,36]). Since we are interested in the bound on the waiting time, we only state the theorem that defines a bound on the waiting time: 39 High proirity Stream #1 A* (£) High proirity Stream #2 A* (t| High proirity Stream #N ^jy (*> Low priority stream Low priority buffer (a) Equivalent high priority stream Low priority stream High priority buffer rateC Low priority buffer (b) Figure 2.10: a) The transmitter model with N high priority (main) streams, b) The equivalent model, where the ./V high priority input streams are replaced with one equivalent stream. Theorem 1 Assume a traffic source constrained by A*n(t) traverses a system S that offers the service curve 8(t). The waiting time d(t) for all t satisfies: d(t)<h(A*n,0) [35,36]. For a proof of this theorem, see Appendix 1. 2.4.2 Waiting-Time Bound in the proposed ITV appli cation. We model the multiplexer in our interactive TV application with the model shown in Figure 2.10-a. In this model, the inputs to the system are N main 40 streams and one incidental stream. The main streams have transmission prior ity over the incidental stream, i.e., the multiplexer serves the incidental stream only when the main stream's buffers are empty. The rate of the outgoing chan nel is constant, equal to C packets per second. We assume the traffic of the ith main stream is characterized by the traffic constraint function A*(t). As discussed in 2.2.1, all the main streams can be replaced with one equivalent high-priority stream, which is constrained by A*H(t) where N A*H(t) = J2AW (2-33) This is shown in Figure 2.10-b. We denote the arrival and departure functions for the low-priority input (i.e., the incidental stream) by Ain^(t) and Aout<i(t). Our motivation here is to first find the service curve for the low-priority input. Then, using theorem 1 we will find Tw. Consider an arbitrary time t. Call s < t the beginning of a busy period for the low-priority input, i.e., the backlog for low-priority input at s is zero (Airiii(s) — Aoui]£,(s)) and there is always some low-priority data waiting in the system during [s,t]. During [s,t] the high priority inputs can send up to A*H(t — s) packets to the system. Hence the system will send at least C(t — s) — A*H(t — s) packets of the low-priority input in [s,t]: Aout,L(t) - AouttL{s) > C{t - s) - A*H(t - s) (2.34) Since the backlog at s is zero then we have Aout,L{t) - Ain,L{s) > C(t - s) - A*„{t - s) (2.35) It follows from this equation that the service curve for the low-priority input is 0L(t) = Ct-A*H{t). The traffic constraint function for a constant bitrate incidental stream is given by where R is the rate of the stream. We use theorem lto find the maximum waiting time Tw in the proposed ITV application. That is, we consider a hypothetical system where the arrival function is A*nL(t) and the departure function is 0L(t). Theorem 1 states that Tw, defined as the maximum of d(t), is given by the horizontal deviation between A*nL and Pi, i.e., Tw = h(A*nL, Bi), where and A* L(t) are given in equations 2.35 and 2.36. 41 Algorithm 1 Find Tw INPUTS: (ui,pi) pairs for i = 1,2, ...,m; the incidental stream rate R; the incidental stream duration T and the channel rate C. 
OUTPUT: Tw
1: for i = 1 to m do
2:   t_i = (σ_i − σ_{i−1}) / (ρ_{i−1} − ρ_i)   {t_i is the abscissa of the intersection of the i-th and (i−1)-th line segments of βL}
3:   d_i = ((R − C + ρ_i) t_i + σ_i) / R   {d_i is the horizontal distance between R·t and βL(t) at t = t_i}
4: end for
5: if T ≠ ∞ then
6:   Find T' such that βL(T') = R·T
7:   d_T = T' − T
8: else
9:   if max_i{C − ρ_i} < R then
10:     d_T = ∞
11:   else
12:     d_T = 0
13:   end if
14: end if
15: Tw = max{d_T, max_i{d_i}}
16: RETURN Tw

Figure 2.11: Pseudo-code of our algorithm to find Tw.

2.4.3 Our Algorithm to Find Tw

Our algorithm is presented in Figure 2.11; it uses the method presented in the previous section and finds Tw. The inputs to the algorithm are the m parameter pairs (σ_i, ρ_i) of the main-stream constraint function A*_H(t), the channel output rate C, the duration of the incidental stream T and the rate of the incidental stream R. We have

\beta_L(t) = Ct - A^*_H(t) = Ct - \min_i \{ \sigma_i + \rho_i t \}    (2.36)

\beta_L(t) = \max_i \{ -\sigma_i + (C - \rho_i) t \}    (2.37)

As shown in Figure 2.12, βL(t) is a convex, piece-wise linear and non-decreasing function. First, the algorithm finds t_i for i = 1, 2, ..., m, where t_i is the abscissa of the intersection of the i-th and (i−1)-th line segments of βL. In the next step, the algorithm computes the horizontal distance d_i between A*_{in,L} and βL(t) at t = t_i, for i = 1, 2, ..., m (see Figure 2.12):

d_i = t_i - \frac{\beta_L(t_i)}{R} = \frac{(R - C + \rho_i) t_i + \sigma_i}{R}    (2.38)

Figure 2.12: βL(t) and A*_{in,L}(t).

If the incidental stream is a video or audio sequence of length T, then the horizontal deviation at t = T is computed and denoted by d_T. Otherwise, that is, if the incidental stream is a video or audio sequence of unlimited duration, the algorithm checks whether βL(t) stays above Rt for very large time intervals (i.e., for t → ∞), which amounts to checking whether max_i{C − ρ_i} ≥ R. If this condition is not met, the system cannot guarantee any maximum waiting time and d_T is set to ∞. Finally, Tw is found as the maximum of all the d_i's and d_T.

2.5 Results

In this section, we present the numerical results of implementing the methods presented in this chapter. First, we evaluate the performance and accuracy of our model-fitting methods presented in Sections 2.3.2 and 2.3.3 by comparing them with the current methods in the literature [1,17]. Then, we present the results of implementing our admission control scheme using empirical traffic traces from video sequences of typical TV programs. These results demonstrate how the rate R and the maximum waiting time Tw depend on each other in a typical digital TV system.

2.5.1 Numerical Results of 'Model Fitting' Methods

In our first experiment, we evaluated the performance and accuracy of our model-fitting methods presented in Sections 2.3.2 and 2.3.3 by comparing them with the current methods in the literature [1,17]. The traffic source used in our experiment is an MPEG-2 video stream from the motion picture "Mission Impossible". This video stream was encoded with constant picture quality, with picture resolution 720 x 480, frame rate 30 and maximum bitrate of 4.5 Mbps. The length of this video was about one hour, which corresponds to N ≈ 10^5 frames. In our simulation, the time parameter is an integer t ∈ N = {0, 1, 2, ...} that represents the GOP number in the MPEG video stream. Since the frame rate of the video stream is 30 fps and the GOP size is 15, each GOP is 0.5 seconds. Thus, the 25-minute sequence represents 3000 GOPs. The traffic in our simulation is also discrete and represents the number of packets. We use constant-size packets of 184 bytes. This conforms to the digital TV and MPEG-2 standards⁵.
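The conversion from GOP sizes to fixed-size transport packets, and the normalization used throughout these results, can be written down directly. A small illustrative sketch (the constants follow the setup above; the function names are ours):

    import math

    PACKET_PAYLOAD_BYTES = 184      # MPEG-2 TS payload per packet
    GOP_SECONDS = 0.5               # 15 frames at 30 fps

    def gop_bits_to_packets(gop_bits):
        """Number of 184-byte packets needed for one GOP's worth of bits."""
        return math.ceil(gop_bits / (PACKET_PAYLOAD_BYTES * 8))

    def normalize(packets_per_gop, max_bitrate_bps):
        """Express the trace as a fraction of the source's maximum rate."""
        max_packets = math.ceil(max_bitrate_bps * GOP_SECONDS / (PACKET_PAYLOAD_BYTES * 8))
        return [p / max_packets for p in packets_per_gop]

    # e.g. a 2 Mbit GOP from a 4.5 Mbps source:
    # gop_bits_to_packets(2e6) -> 1359 packets; normalize([1359], 4.5e6) -> [~0.89]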
For example, if a source generates 2 Mbits in the time interval [0, 0.5] (i.e., in the first GOP), then the discrete traffic is represented by y(1) = ⌈(2 × 10^6)/(184 × 8)⌉ = 1359 packets. In order to make the results independent of the maximum bitrate of the source, we normalize the traffic to 1 by dividing y by its maximum. Continuing the previous example, if the maximum bitrate of the source is 4.5 Mbps and the GOP time is 0.5 seconds, then we have y(1) = 1359/1529 ≈ 0.89.

⁵ The American and the European digital TV standards employ the MPEG-2 transport stream (TS) syntax for the transmission stream. Each packet in MPEG-2 TS syntax contains 184 bytes of data payload plus 4 bytes of header.

In our first experiment, we construct the empirical envelope E(t), and compare the speed of our algorithm with other methods in the literature. We construct E(t) using the direct method [17], the largest sub-additive extrapolation method [1], the repetition extrapolation method [1], and our method presented in Section 2.3.2. Table 2.1 summarizes the computation time of each method. As shown, our algorithm's speed is almost the same as that of the repetition extrapolation approach, and both are considerably faster than the direct method. However, the extrapolation approaches do not estimate E(t) for t > τ closely, while our method finds the exact samples of E(t) for t > τ (see Figure 2.5).

Table 2.1: Execution time in seconds for calculating E(t) for 1 ≤ t ≤ 10000. Simulations were run on a PC with a Pentium IV processor at 1.7 GHz, using a Matlab implementation.

  Computation method                          Execution time (seconds)
  Direct approach                             161.27
  Repetition extrapolation, τ = 200           17.28
  Sub-additive extrapolation, τ = 200         150.48
  Our sampling method, τ = 200, δ = 50        23.60
  Our sampling method, τ = 200, δ = 100       16.24
  Our sampling method, τ = 200, δ = 200       7.89

In our second experiment, we fit a (σ̄, ρ̄) model to the whole empirical envelope E(t) using our method presented in Section 2.3.3 and the heuristic method presented in [1]. We computed the error function Σ_{t=1}^{N} (A*(t) − E(t))/E(t) as a metric for the accuracy of each model. Table 2.2 summarizes the results. As shown, our method results in a more accurate model for the source than the method in [1].

Table 2.2: Comparison of our method with the method in [1] for finding the (σ̄, ρ̄) parameters. E(t) constructed for all 1 ≤ t ≤ N.

  (σ_i, ρ_i)      our method               method in [1]
                  (0,        1.0000)       (0,        1.0000)
                  (53.4927,  0.3805)       (50.1352,  0.3063)
                  (11.4585,  0.4622)       (13.4325,  0.3293)
                  (2.9396,   0.5837)       (3.5398,   0.5644)
                  (1.7966,   0.6573)       (1.2903,   0.6888)
  Error           65.47                    93.96

In order to evaluate the effect of our sampling approach on the accuracy of the constructed (σ̄, ρ̄) model, we also fit a (σ̄, ρ̄) model to the empirical envelope constructed by our sampling approach. Table 2.3 summarizes the results. As shown, the model parameters and the error function for the models constructed from the samples of E(t) are fairly close to those of the models constructed from the whole E(t).

In our next experiment, we evaluate the accuracy of our method with respect to achieving high bandwidth utilization. For this purpose, we use a metric that determines how closely a particular model A*(t) approximates E(t) with respect to bandwidth utilization [1]. We consider a single FCFS multiplexer with a switch that operates at r = 155 Mbps. We assume that all the input traffic sources connected to this switch are from the same source, and are all characterized by A*(t). We also assume that all the sources have an identical delay bound d. Assume n sources are connected to this switch.
Table 2.3: The model parameters constructed from the entire envelope (second column) and from E(t) for 1 ≤ t ≤ τ plus samples of E(t) at t = iδ for 1 ≤ i ≤ n; τ = 200, nδ = N.

  (σ_i, ρ_i)      entire envelope          δ = 100                  δ = 400
                  (0,        1.0000)       (0,        1.0000)       (0,        1.0000)
                  (53.4927,  0.3805)       (53.4511,  0.3805)       (90.5083,  0.3567)
                  (11.4585,  0.4622)       (12.6243,  0.4546)       (40.9453,  0.3961)
                  (2.9396,   0.5837)       (4.4318,   0.5515)       (13.9549,  0.4475)
                  (1.7966,   0.6573)       (2.3172,   0.6117)       (6.6479,   0.5189)
  Error           65.47                    76.74                    84.12

Then, as discussed in [15], these sources are supported by this FCFS multiplexer without maximum-waiting-time violation if and only if

d \geq \sum_{i=1}^{n} A^*(t) - t \cdot r, \quad \forall t > 0    (2.39)

We define the 'utilization ratio', U(A*, d), as the number of admissible sources using A*(t) divided by the number of admissible sources using the empirical envelope E(t) at maximum waiting time d. Specifically, U(A*, d) is the maximum n that satisfies equation 2.39, divided by the maximum m that satisfies d ≥ Σ_{i=1}^{m} E(t) − t·r for all t > 0. U(A*, d) shows how closely A*(t) approximates E(t) with respect to bandwidth utilization. An ideal model, which admits the same number of streams as the empirical envelope, results in the constant U(A*, d) = 1. Figure 2.13 shows U(A*, d) for a (σ̄, ρ̄) model of size 4, constructed using our method and the method presented in [1]. As shown, our method results in a higher utilization ratio than that of the method in [1]. Figure 2.13 also shows the utilization ratio for a (σ̄, ρ̄) model constructed
Compression Standard MPEG-2 Resolution 720 x 480 Rate 4.50 Mbps Frame rate 30 fps GOP size 15 Number of P frames in each GOP 4 Number of B frames in each GOP 10 Table 2.5: Encoding parameters for the video streams used in our study. video streams (see table 2.4) were selected from typical TV programs. These video sequences were encoded with constant picture quality, and with a maxi mum bitrate of 4.5 Mbps. The picture quality of these videos was subjectively selected to be at a satisfactory level for TV applications. The length of every sequence used was 25 minutes. Table 2.5 summarizes the encoding parameters of the video sequences used in our simulation. By studying the empirical envelopes of the main video sequences, we observed that with m. = 5 (CT, p) pairs one can accurately model the video of most TV programs. Therefore, we select m = 5 in our application. Figures 2.14 and 2.15 show the empirical envelopes and the fitted (a.p) models for the traffic sources. Table 2.6 shows the numerical value of the (<?, p) model parameters for each source. As shown, E(t) is an increasing function where E(t)/t « 1 for small £'s, and dE[t)/dt is approximately the average rate of the source for large fs. E(t) drops faster for large fs in video sources with simple content than video sources with active and complex content. The fitted (a, p) 48 model with five (cr, p) pairs is also shown in these figures. The (a, p) pairs are selected such that (cri,pi) — (0,1) and the rate parameter of the last (a, p) pair (i.e., p$) is the average rate of the source. As shown, the model can approximate the E(t) very well with only a few (cr,p) pairs in the (&,p) model. 49 50 Sequence Name (a, p) parameters 1 Mission Impossible "60.5602 0.6376" 20.1277 0.7025 11.3654 0.7332 7.8706 0.7615 0 1.0000_ 2 News 100.9816 0.5018" 30.4283 0.5727 12.9072 0.6938 2.1920 0.8130 0 1.0000 3 Talk Show "27.7782 0.4642" 11.0175 0.4860 4.9481 0.5433 1.7709 0.6584 0 1.0000 4 Documentary 103.1472 0.6157 53.8328 0.6629 20.7147 0.7503 4.5189 0.8782 0 1.0000 5 Court Show "16.7626 0.4403" 5.8024 0.6030 2.3032 0.7112 0.6417 0.8298 0 1.0000_ 6 Muppets show "40.4100 0.8688" 36.5059 0.8703 34.3194 0.8740 3.8070 0.9445 0 1.0000_ 7 Soap opera "53.4927 0.3805" 11.4585 0.4622 2.9396 0.5837 1.7966 0.6573 0 1.0000 8 Cartoon "23.4084 0.8482" 8.3675 0.8680 4.6805 0.8811 0.6067 0.9556 0 1.0000_ Table 2.6: Numerical values of (<r, p) model parameters for the main video sequences used in our simulation. 51 Simulation parameter set I (cable medium) Simulation parameter set II (terrestrial medium) Transmission rate 19.8 Mbps 39.8 Mbps Number of TV programs sharing the channel, N 4 8 Maximum bitrate assigned to each main video stream 4.5 Mbps 4.5 Mbps Transmission capacity reserved for video streams 18 Mbps 36 Mbps Transmission capacity reserved for audio streams and other ancillary data 1.8 Mbps 3.8 Mbps Main video stream sources 1. Mission Impossible 2. News 3. Talk Show 4. Documentary 1. Mission Impossible 2. News 3. Talk Show 4. Documentary 5. Court Show 6. Muppets show 7. Soap opera 8. Cartoon Table 2.7: Simulation parameters. After fitting a (a, p) model to each video sequence, we conducted an other set of experiments where we considered a system similar to Figure 2.10, which consisted of N main (high-priority) streams and 1 incidental (low-priority) stream. This system simulates the head-end of a digital TV trans mission system. We conducted two experiments using two different simulation parameters, as shown in Table 2.7. 
The first set of parameters are selected to simulate cable transmission medium, while the second set simulates a terres trial medium6. As noted, no portion of the transmission capacity is reserved for incidental streams. In order to illustrate the relation between R and Tw for the incidental stream, we plotted R versus Tw as shown in Figure 2.16. This graph is inter esting as it provides exemplary numerical values for the rate of an incidental stream in a typical digital TV transmission system. As we expected, R is an 6A 6 MHz channel in the cable medium is capable of delivering digital data at the 19.8 Mbps rate. This capacity is usually shared by 4 or 5 TV programs. In terrestrial medium, a 6 MHz channel is capable of delivering at the 39.8 Mbps rate, which is usually shared by 8 or 9 TV programs. 52 increasing function of TW. This means that by allowing a larger waiting time in the multiplexing system, the system can accept higher rate incidental streams. However, for very large TW, R becomes constant and equal to C — RaVg, where C is the transmission rate reserved for the main video streams and R.AVG is the total average rate of all the main streams. This is due to the fact that even by increasing TW, R cannot become larger than the channel rate minus the main streams average rate. In next experiment, we tested the accuracy of our admission control scheme via simulation by observing the waiting time of the incidental streams data units during multiplexing. The results showed that if an incidental stream with rate R and maximum waiting time TW is accepted by the admission test, then the waiting time of its data units in the system is always less than TW. However, for the incidental streams which were rejected by the admission test, the waiting time of some data units was more than TW seconds. 2.6 Conclusion In this chapter, we presented methods for implementing the deterministic ser vice class. We employed a model for the traffic of main video sources. We used the concept of traffic constraint function, and the empirical envelope as the tightest traffic constraint function. After discussing the current approaches to deterministic traffic modelling, we selected the (a, p) model as the traffic model for our application. We showed that the (a, p) model can accurately model the empirical envelope of main video sources. Then, we presented efficient methods for fitting the (a, p) model to a traffic source. We showed that our model fitting methods result in a more efficient and more accurate model parameters than other methods in the lit erature. Next, we adapted the newly developed 'Network Calculus' theory, and designed an admission control mechanism for the deterministic service class of the proposed 1TV application. Our admission control mechanism finds the maximum waiting-time TW for an incidental stream with rate R. Our simulation results provided some exemplary numerical values for the maximum waiting-time TW and the rate R of an incidental stream in a typical digital TV system. The deterministic admission control scheme presented in this chapter 53 relies on the traffic constraint function of the main video streams, which is a worst-case estimate of the traffic a main video source can generate. Therefore, the obtained Tw is based on the most pessimistic forecast of the system. This approach is attractive since it ensures that no incidental data packet will be lost. However, it does not result in high utilization of available bandwidth. 
In the next chapter, we will discuss the stochastic service class, where Tw is found such that some data loss is possible, however, this data loss is limited. Our design of the stochastic service class is fundamentally different from the deterministic service class, and is based on a different type of traffic models. 55 Chapter 3 Stochastic Service Class Chance favors only the prepared mind. -Louis Pasteur, A New Kind of Country, 1978. Overview This Cha,pter presents a scheme for implementing the stochastic service class based on the 'effective bandwidth' theory. The effective bandwidth characteris tics are exploited. We also show how the effective bandwidth is used to design an admission control scheme for the stochastic service class of the proposed ITV application. Using the methods presented in this chapter, one can find Tw for an incidental stream with given rate R and data loss probability p. 3.1 Introduction In this chapter, we develop a method for implementing the 'Stochastic Service Class.' As discussed in Chapter 1, when an incidental stream is added to a TV program using the stochastic service class, the transmitter does not guarantee to send all the incidental data units on time (i.e., before their transmission deadline). The data units which are not transmitted on time are considered lost data. Therefore, some incidental data loss is probable in the stochastic service class. However, the rate and waiting time of an incidental stream should be selected so that the data loss probability is less than a threshold. Therefore, an incidental stream that is to be transmitted using the stochastic service class should be first accepted by an admission control mechanism. This admission control mechanism verifies that the transmitter can send this incidental stream 56 with rate R and maximum waiting time tw and with a data loss probability less than a given threshold, say p%. Our ultimate goal in this chapter is to present a scheme, by which the admission control finds the maximum waiting-time Tw, given the rate R and the loss probability p% for an incidental stream. A key issue in implementing the stochastic service class is to find a suitable traffic descriptor, which encapsulates the stochastic properties of the main streams traffic. Then an accurate admission control using the selected traffic descriptor should be designed. Our approach here is based on the theory of 'effective bandwidth'. The main motivation behind this theory is to provide a measure of bandwidth usage by a traffic source in a communication network, which can adequately represent the statistical characteristics of the source. In this theory, each traffic source is described with a traffic descriptor called 'effective bandwidth curve'. This theory then provides mechanisms to find a level of statistical service guarantee for usual network operations, such as multiplexing, buffering, etc. We use this theory to design the admission control mechanisms of our application problem. The rest of this chapter is organized as follows. In Section 3.2, the effective bandwidth is defined and its characteristics are described. In Section 3.3, we show how the effective bandwidth is used to bound the data loss in general network operations. Based on this theory, we design an admission control mechanism for the stochastic service class in Section 3.4. In Section 3.5, we discuss the current approaches to the numerical estimation of the effective bandwidth curve. 
3.2 Effective Bandwidth The theory of effective bandwidth was first introduced in the early 1990's by Gibbens and Hunt [37], Kelly [38], and Guerin et al [39]. Since then, this the ory has attracted much attention from both the mathematics and engineering communities, and emerged as a powerful but complicated mathematical the ory. Currently a great effort is in progress to expand the effective bandwidth theory and its applications. The associated mathematical theory of the effective bandwidth concept is built upon the theory of Large Deviation Principle, LDP, which studies the tail properties of probability distributions. The effective bandwidth of a source is closely related to the moment generating function of the arrival process of 57 the source. The moment generating function of a random variable contains more information about the stochastic characteristics of a process than its mean. Hence, traffic characterization methods based on the effective band width function are more accurate than the widely used traffic characterization methods based on 'Poisson processes', which rely on the average rate. A useful interpretation of the effective bandwidth concept is that the effective bandwidth theory gives the probability of a traffic source generating traffic at a rate higher than its average for a long period of time. More precisely, let us denote the instantaneous rate of a traffic source by y(t) and the average rate of the source by ji. Then we expect Y?T=i V^) to be close to at for large t's. Effective bandwidth theory bounds the probability that X)t=i 2/(0 ~ ia> where a > a. Hence, under some mild conditions, the effective bandwidth theory gives the probability that a variable rate source generates traffic that is equal to a constant rate source with rate a during a long period. The probability that the source follows the effective bandwidth model is incorporated into the model through a parameter named 'scale factor1, 9. Therefore, the effective bandwidth is a function of the scale factor, usually denoted as a(6). The effective bandwidth value lies always between the average and the peak rate of the source. Higher levels of certainty result in a larger 9 and an effective bandwidth that is closer to the peak rate, e.g., a certainty value equal to one corresponds to 9 = oo and a(oo) is the peak rate of the source. The effective bandwidth concept can be viewed as a compromise be tween two alternative bandwidth allocation schemes, a pessimistic outlook and an optimistic one. In the pessimistic case, one uses a strict approach to bandwidth allocation, where the bandwidth allocation is based on the sources peak rate. This approach seeks to eliminate data loss. In the optimistic case, one uses a lenient approach to bandwidth allocation, where the bandwidth is allocated based on the source's average rate. This approach seeks to gain high bandwidth utilization. The effective bandwidth a(9) gives a spectrum between these two approaches, where the scale factor 0 < 9 < oo determines how lenient or strict this approach is. In next Section, we first briefly review the large deviation principle concept. Then, we present a precise definition of the effective bandwidth in Section 3.2.2. 3.2.1 Large Deviation Principle As mentioned, the theory of effective bandwidth relies on the 'Large Deviation Principle', LDP. Large deviation principle is a theory that studies the tail properties of probability distributions. 
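Although the formal definition is only given in Section 3.2.2, it may help to see how an effective bandwidth curve can be approximated in practice from a rate trace: replace the expectation in its defining limit by an average over blocks of length t. The sketch below is illustrative Python under that standard block-averaging assumption; it is not the estimator discussed in Section 3.5, and the trace and parameters are synthetic.

    import numpy as np
    from scipy.special import logsumexp

    def effective_bandwidth(rate_trace, theta, block_len):
        """Block-average approximation of alpha(theta):
        alpha(theta) ~ 1/(theta*t) * log E[exp(theta * Y(t))],
        with the expectation replaced by an average over blocks of length t."""
        y = np.asarray(rate_trace, dtype=float)
        n_blocks = len(y) // block_len
        Y = y[: n_blocks * block_len].reshape(n_blocks, block_len).sum(axis=1)
        log_mgf = logsumexp(theta * Y) - np.log(n_blocks)   # stable log-mean-exp
        return log_mgf / (theta * block_len)

    if __name__ == "__main__":
        trace = np.random.default_rng(1).uniform(0.2, 1.0, size=20_000)
        for theta in (0.01, 0.1, 1.0, 10.0):
            a = effective_bandwidth(trace, theta, block_len=50)
            print(theta, round(a, 3))   # non-decreasing in theta, between mean and peak

As the scale factor grows, the printed values move from the trace's average rate toward its peak rate, which is the spectrum between the optimistic and pessimistic allocation schemes described above.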
This theory refers to a collection of techniques used for estimating properties of rare events, such as the frequency of their occurrence, or the most likely manner of their occurrence. Large deviations do not apply to any event that has a very low proba bility of occurrence. Roughly speaking, a large deviation event is caused by a large number of unlikely things occurring together, rather than a single event of small probabilities. For example, winning a lottery cannot be studied with large deviations, since it is a single event composed of a single trial and cannot be broken into more than one sub-event. However, the probability that the average grade of a class in an easy exam becomes very low can be considered a rare event, since it can be decomposed to the improbable sub-events that each individual student gets a very low mark. In the proposed ITV applica tion, a large burst of traffic is generated by a source when the source starts sending traffic at a rate higher than its average, and continues to do so for a long period of time. Therefore, the occurrence of a large burst of traffic can be broken into many low-probability sub-events. One can consider LDP as a tool to turn the probability problems into deterministic optimization problems. Loosely speaking, to calculate the prob ability of a rare event, one assigns a cost to each sample path that would cause an event to occur. In the example of having a very low average grade for a big class in an easy exam, a path is that all the students get a low mark, and an alternate path is that many students get zero and only a few get very good marks. Then one finds the cheapest (or the most probable) path in that set of sample paths. The probability of event is then estimated by: F(event) = e-nxconst (3.1) where n is an asymptotic parameter, usually the length of time over which we observe the process. Therefore, one can think about the rare events in terms of sample paths and costs, and to find the probability of a rare event, one can simply consider the cheapest way that the event can happen. 59 General Definition Of LDP. Let y(2),... be a sequence of a random processes. Let Y(t) = £T=i y(r)-We say that Y(1),Y(2),... satisfies the large deviation principle with the rate function / if (see [40]) 1. For every closed set C C M. we have lim sup - log F(Y(n) € C) < - inf /(a) (3.2) ii—too Tl a£C 2. For every open set G C R, we have lim inf - logP(y(n) £(?)>- inf /(a) (3.3) It is shown in the LDP theory that if the random variable Y has a finite moment generating function E(eeY) for all 6, where E denotes expected value, then Y satisfies the large deviation principle with the rate function I = A* where: A(0) = lim -logE(eeyw) (3.4) t-*oo t A*(a) =sup[0a- A(0)] (3.5) e A((9) is called the 'Gartner-Ellis' limit. A*(a) is called the 'Legendre Transform' of A(9). Also, it is shown that A* (a) has the following characteristics: 1. A*(a) is non-negative, i.e., A*(a) > 0 2. A*(a) is strictly increasing in a. 3: A*(a) attains its minimum at a — E(y), i.e., at the average of y. 4. A*(E(y)) = 0 5. A*(a) is convex, i.e., A*(aax + (1 - a)a2) < aA*(al) + (1 - a)A*(a2) 6. If A*(a) is the Legendre transform of A((9), then A(r9) is the Legendre transform of A*(a) as well, i.e., (A*)*(0) = A(f9). For this reason, the two functions A* and A are called Legendre transform pairs. 60 3.2.2 Effective Bandwidth Definition Let y(t) denote the instantaneous rate of a traffic source. 
3.2.2 Effective Bandwidth Definition

Let $y(t)$ denote the instantaneous rate of a traffic source. The arrival process is defined as $Y(t) = \sum_{\tau=0}^{t} y(\tau)$, which is the total traffic generated in $[0,t]$. The effective bandwidth of this source is then defined as (see [41])
$$\alpha(\theta) = \lim_{t\to\infty} \frac{1}{\theta t} \log E\big(e^{\theta Y(t)}\big), \qquad \forall \theta \in \mathbb{R}^{+} \qquad (3.6)$$
The parameter $\theta$ is called the scale factor. By definition, $\psi_Y(\theta) = E(e^{\theta Y(t)})$ is the moment generating function of the random process $Y(t)$. Therefore, the effective bandwidth of a traffic source is closely related to the moment generating function of the arrival process of the source. Figure 3.1 shows a typical effective bandwidth curve for a traffic source with bounded peak rate. It has been shown that [41]

1. $\alpha(\theta)$ is a non-decreasing function of $\theta$.
2. $\alpha(\theta)$ lies between the average and peak rate of the source, i.e., $E(y) \le \alpha(\theta) \le \bar{y}$, where $\bar{y}$ denotes the peak rate of the source $y$.
3. If we have $N$ sources, each with arrival process $Y_i(t)$, and if $Y(t)$ is the multiplexed stream of these sources, i.e., $Y(t) = \sum_i Y_i(t)$, then
$$\alpha(\theta) = \sum_i \alpha_i(\theta) \qquad (3.7)$$
4. The shape of $\alpha(\theta)$ around $\theta = 0$ primarily depends on the mean, the variance and the higher moments of $Y(t)$, while the shape of $\alpha(\theta)$ for large $\theta$ is primarily influenced by the tail distribution of $Y(t)$ around its maximum. This can be justified using the Taylor expansions of $\alpha(\theta)$ at $\theta = 0$ and at $1/\theta = 0$:
$$\alpha(\theta) = E(y) + \tfrac{1}{2}\operatorname{var}(y)\,\theta + O(\theta^{2}) \qquad (3.8)$$
$$\alpha(\theta) = \bar{y} - \frac{\kappa}{\theta} + O(1/\theta^{2}) \qquad (3.9)$$
where $\bar{y}$ is the peak rate of $y$ and the constant $\kappa$ is governed by the behaviour of the traffic near its peak; it is related to the average length of the periods during which $y$ is equal to its maximum, that is, the average length of burst periods [41].

Figure 3.1: A sample effective bandwidth curve $\alpha(\theta)$, together with $\Lambda(\theta)$, $\Lambda^*(\theta)$ and the inverse of the effective bandwidth function $I(c) = \alpha^{-1}(c)$. The average rate of the source is 0.448.

3.2.3 Physical Interpretation of Effective Bandwidth

One useful physical interpretation of $\alpha(\theta)$ is that of using the effective bandwidth concept to approximate the behavior of a VBR source with a constant rate. Suppose $y(t)$ has a finite mean value $E(y) = \mu$. We know from the law of large numbers that, with probability 1, $Y(t)/t \to \mu$ as $t \to \infty$. Thus the probability that $Y(t)/t$ is away from $\mu$ goes to 0 as $t$ increases. In the theory of the large deviation principle, it is shown that this convergence to 0 occurs exponentially fast; that is, for $a > \mu$,
$$\lim_{t\to\infty} \frac{1}{t} \log \mathbb{P}\big(Y(t) \ge ta\big) = -\Lambda^*(a) \qquad (3.10)$$
where $\Lambda^*(a)$ is known as the rate function. Roughly speaking, equation 3.10 states that
$$\mathbb{P}\big(Y(t)/t \approx a\big) \approx e^{-t\Lambda^*(a)} \qquad (3.11)$$
Therefore, the value of $\Lambda^*(a)$ indicates how difficult it is for $Y(t)/t$ to be close to the constant rate $a$. $\Lambda^*(a)$ is related to $\alpha(\theta)$ through $\Lambda^*(a) = \sup_{\theta}\,[\theta a - \theta\alpha(\theta)]$; that is, $\Lambda^*(a)$ is the Legendre transform of $\theta\alpha(\theta)$.
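The probabilistic meaning of equation 3.11 can be checked by simulation. The sketch below (illustrative parameters only, reusing the hypothetical two-level i.i.d. source from earlier) compares the empirical probability that the long-run rate exceeds a target $a$ with the crude large-deviation estimate $e^{-t\Lambda^*(a)}$; the two agree only to first order in the exponent, since the estimate ignores a sub-exponential prefactor.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-level i.i.d. source (illustrative values).
rates = np.array([1.0, 5.0])
probs = np.array([0.8, 0.2])

def Lambda(theta):
    # Gartner-Ellis limit; for i.i.d. slots it is log E[exp(theta*y)] per slot.
    return np.log(probs @ np.exp(np.outer(theta, rates))).ravel()

def rate_function(a, thetas=np.linspace(0.0, 10.0, 10001)):
    # Lambda*(a) = sup_theta [theta*a - Lambda(theta)]
    return np.max(thetas * a - Lambda(thetas))

t, a, n_runs = 100, 2.2, 200_000          # horizon, target rate, Monte Carlo size
samples = rng.choice(rates, size=(n_runs, t), p=probs)
empirical = np.mean(samples.mean(axis=1) >= a)
ldp_estimate = np.exp(-t * rate_function(a))
# The LDP estimate captures the exponential decay rate in t; the remaining gap
# is a slowly varying (polynomial) prefactor that the theory does not track.
print(f"empirical P(Y(t)/t >= {a}) ~ {empirical:.2e},  exp(-t*Lambda*(a)) ~ {ldp_estimate:.2e}")
```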
3.3 Quantifying Data Loss in Communication Networks Using Effective Bandwidth

In this section, we describe how the effective bandwidth concept is used for quantifying the probability of loss in communication networks. Let us first consider a simple buffering model, where a single buffer of size $B$ is filled by data from a variable rate source with rate $y(t)$, and is emptied at constant rate $c$ (see Figure 3.2). Let $W(t)$ denote the buffer workload at time $t$. The rate $c$ is selected to be greater than the average input rate, but less than the maximum input rate. Therefore, we expect some data workload to build up in the buffer occasionally, and also that the buffer becomes empty regularly.

Figure 3.2: A simple buffer of size $B$, filled at variable rate $y(t)$ and emptied at constant rate $c$.

It is shown that the probability of buffer overflow, i.e., the probability that, starting with an empty buffer, the workload exceeds the buffer size before the buffer becomes empty again, is bounded as
$$\mathbb{P}\big(W(t) > B\big) \le p \qquad (3.12)$$
where $p = e^{-B\theta^*}$, $\theta^*$ is selected such that $\alpha(\theta^*) = c$, and $t$ is large.¹ Equation 3.12 also gives a bound on the buffering delay, defined as $d(t) = W(t)/c$:
$$\mathbb{P}\big(d(t) > D\big) \le p = e^{-B\theta^*} \qquad (3.13)$$
where $D = B/c$ is the maximum acceptable buffering delay.

The validity of equations 3.12 and 3.13 has been proven [41-43]. An intuitive proof of these equations can be given based on equation 3.11 as follows. Note that a buffer overflow in this simple buffering model happens if the input has rate $a > c$ for at least $B/(a-c)$ time units. According to equation 3.11, the probability that the input behaves in that way is approximately
$$\exp\Big\{-\frac{B}{a-c}\,\Lambda^*(a)\Big\} \qquad (3.14)$$
where the parameter $a$ can be any value larger than $c$. Let $\theta^*$ be selected such that $\alpha(\theta^*) = c$. Then the probability that the buffer occupancy reaches $B$ is approximately given by
$$p \approx \sum_{a > c} \exp\Big\{-\frac{B}{a-c}\,\Lambda^*(a)\Big\} \qquad (3.15)$$
Now we approximate this sum of exponentials by the exponential with the largest exponent, that is, by
$$p \approx \exp\Big\{-\frac{B}{a^* - \alpha(\theta^*)}\,\Lambda^*(a^*)\Big\} \qquad (3.16)$$
where $a^*$ is such that
$$\frac{\Lambda^*(a^*)}{a^* - \alpha(\theta^*)} = \min_{a > \alpha(\theta^*)} \frac{\Lambda^*(a)}{a - \alpha(\theta^*)} \qquad (3.17)$$
However, from the definition of $\Lambda^*$ in equation 3.5, it can be noted that $\Lambda^*(a) \ge \theta a - \theta\alpha(\theta)$ for all $\theta$. Hence we have $\min_{a > \alpha(\theta^*)} \Lambda^*(a)/\big(a - \alpha(\theta^*)\big) \ge \theta^*$, and
$$p \approx e^{-B\theta^*} \qquad (3.18)$$
Therefore, one can say that $\alpha(\theta)$ is the rate at which the buffer must be emptied so that the buffer overflow probability decays exponentially (in the buffer size) with rate $\theta$.

¹ Note that the buffer workload random process $W(t)$ converges to a marginal distribution as $t$ becomes large enough, usually denoted by $W_\infty$. Therefore, $\mathbb{P}(W(t) > B)$ actually denotes the tail properties of this distribution.
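As a concrete sketch of how this bound might be evaluated in practice (all parameters are invented for illustration), the following Python code finds $\theta^*$ by bisection, using the fact that $\alpha(\theta)$ is non-decreasing, computes the bound $p = e^{-B\theta^*}$, and cross-checks it against a simulated single-server queue.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical i.i.d. two-level source again (illustrative values).
rates = np.array([1.0, 5.0])
probs = np.array([0.8, 0.2])

def alpha(theta):
    """Effective bandwidth of the i.i.d. source: (1/theta) log E[exp(theta*y)]."""
    return np.log(probs @ np.exp(theta * rates)) / theta

def solve_theta_star(c, lo=1e-6, hi=50.0, iters=100):
    """Find theta* with alpha(theta*) = c by bisection (alpha is non-decreasing)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if alpha(mid) < c else (lo, mid)
    return 0.5 * (lo + hi)

c, B = 2.5, 15.0                     # service rate and buffer size (illustrative)
theta_star = solve_theta_star(c)
p_bound = np.exp(-B * theta_star)
print(f"theta* = {theta_star:.4f},  overflow bound p = {p_bound:.3e},  "
      f"delay bound D = B/c = {B/c:.1f} slots")

# Sanity check with a simulated queue (Lindley recursion W <- max(0, W + y - c)).
y = rng.choice(rates, size=1_000_000, p=probs)
W, overflows = 0.0, 0
for x in y:
    W = max(0.0, W + x - c)
    overflows += W > B
print(f"simulated fraction of slots with workload > B: {overflows/len(y):.3e}")
```

The simulated exceedance fraction stays below the analytical bound, as expected, since $e^{-B\theta^*}$ is an upper estimate of the tail of $W_\infty$.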
3.3.1 Data Loss Probability in a Queuing Model with Priority

We consider a simple system with two prioritized inputs, as shown in Figure 3.3. The inputs are buffered in two different buffers of size $B_H$ and $B_L$, and are then multiplexed onto one output with constant rate $c$. The high-priority input has preemptive priority over the low-priority input, which means that high-priority packets will never be impeded by low-priority packets. Let $y_H(t)$ and $y_L(t)$ be the rates of the high-priority and low-priority inputs at time $t$, respectively. The cumulative arrival functions for these inputs are denoted by $Y_H$ and $Y_L$, where
$$Y_H(t) = \sum_{\tau=0}^{t} y_H(\tau), \qquad Y_L(t) = \sum_{\tau=0}^{t} y_L(\tau), \qquad \forall t \ge 0 \qquad (3.19)$$
Also, let $\bar{y}_H$ and $\bar{y}_L$ denote the maximum rates of the input sources. The output rate $c$ is such that $\bar{y}_H < c < \bar{y}_H + \bar{y}_L$. Therefore, the high-priority buffer will never overflow, while it is possible that the low-priority buffer overflows. Also, $c$ is greater than the average of $y_H + y_L$, which means that the low-priority buffer becomes empty regularly. Let $p$ denote the probability that the low-priority buffer overflows, and let $W_L(t)$ be the low-priority buffer workload at time $t$.

Figure 3.3: A simple multiplexer model with prioritized inputs: (a) a multiplexer with prioritized inputs; (b) the equivalent hypothetical buffer for determining the low-priority buffer overflow probability.

The low-priority buffer overflows if $W_L(t) > B_L$. During $[0,t]$ the high-priority source sends $Y_H(t)$ bits to the system, and the system will transmit $c \times t - Y_H(t)$ bits from the low-priority buffer. Therefore, we have
$$W_L(t) = Y_L(t) - \big(c \times t - Y_H(t)\big) = \big(Y_L(t) + Y_H(t)\big) - c \times t \qquad (3.20)$$
and thus
$$\mathbb{P}\big(W_L(t) > B_L\big) = \mathbb{P}\Big(\big(Y_L(t) + Y_H(t)\big) - c \times t > B_L\Big) \qquad (3.21)$$
The right hand side of this equation can be interpreted as the probability of buffer overflow in a hypothetical single buffer of size $B_L$, filled at rate $y_L(t) + y_H(t)$ and emptied at rate $c$. Therefore, in order to study the data loss probability of the low-priority source, we can replace the prioritized model with another model, which has a single hypothetical buffer of size $B_L$, filled at rate $y_L(t) + y_H(t)$ and emptied at rate $c$, as shown in Figure 3.3-b. Then, as discussed in Section 3.3, if $\theta^*$ is selected such that $\alpha_H(\theta^*) + \alpha_L(\theta^*) = c$, we will have $p = e^{-B_L\theta^*}$, or $B_L = -\log(p)/\theta^*$.

If the low-priority input source has a constant input rate $R$ (i.e., $y_L(t) = R$ for all $t$), then we can find a bound on the time that the low-priority data units wait in the buffer. Suppose that $W_L$ data units of the low-priority input are waiting in the buffer, as shown in Figure 3.4. Let $d_1(t)$ denote the time that data unit #1 has been waiting in the buffer. Then, since the buffer fill-up rate is constant, we have $d_1(t) = W_L(t)/R$. Therefore, $\mathbb{P}(d_1(t) > d) = \mathbb{P}(W_L(t) > R \times d)$. Note that if the low-priority source is a variable rate source rather than a constant rate source, then this method is not applicable. When the low-priority source has constant bitrate, we have
$$\mathbb{P}\big(W_L(t) > B_L\big) = \mathbb{P}\Big(\big(Y_L(t) + Y_H(t)\big) - c \times t > B_L\Big) \qquad (3.22)$$
$$= \mathbb{P}\Big(\big(R \times t + Y_H(t) - c \times t\big) > B_L\Big) \qquad (3.23)$$
$$= \mathbb{P}\Big(Y_H(t) - (c - R) \times t > B_L\Big) \qquad (3.24)$$
which means that $\mathbb{P}(W_L(t) > B_L)$ is equal to the probability of loss in a buffer of size $B_L$, filled with the high-priority source and emptied at constant rate $c - R$.

Figure 3.4: A simple buffer filled at constant rate $R$ and emptied at variable rate $c$. If the buffer workload is $W_L$, then we know that the first data unit has waited $W_L/R$ seconds in the buffer.

Now we extend the simple model in Figure 3.3-a to a system with $N$ high-priority inputs and one low-priority input, as shown in Figure 3.5-a. In this model, $N$ high-priority sources are multiplexed with one low-priority source. The $i$th high-priority input has instantaneous rate $y_{H,i}$, maximum rate $\bar{y}_{H,i}$, and effective bandwidth $\alpha_i(\theta)$. The maximum rate of the multiplexed stream is $c$, and $c$ is selected such that $\sum_{i=1}^{N} \bar{y}_{H,i} < c < \bar{y}_{L} + \sum_{i=1}^{N} \bar{y}_{H,i}$; therefore the high-priority buffers never overflow, but it is possible for the low-priority buffer to overflow. It is easily noted that the probability of the low-priority buffer overflow is given by $\mathbb{P}(W_L(t) > B_L) = \mathbb{P}\big((Y_L(t) + \sum_{i=1}^{N} Y_{H,i}(t)) - c \times t > B_L\big)$, where $Y_{H,i}(t)$ is the arrival process of the $i$th high-priority input. As stated in property 3 in Section 3.2.2, the effective bandwidth of a multiplexed source is simply the sum of the effective bandwidths of all the sources. Therefore, the probability of the low-priority buffer overflow is bounded by $e^{-B_L\theta^*}$, where $\theta^*$ is selected such that $\alpha_L(\theta^*) + \sum_{i=1}^{N} \alpha_i(\theta^*) = c$. If the low-priority source has a constant rate, then $\alpha_L(\theta) = R$ for all $\theta$, and $\theta^*$ is selected such that $\sum_{i=1}^{N} \alpha_i(\theta^*) = c - R$.
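The sketch below illustrates how the resulting waiting-time bound for a constant-rate low-priority stream could be evaluated; this is essentially the computation used by the admission test of the next section. All source models and numbers are hypothetical stand-ins (real main streams would be characterized by their measured effective bandwidth curves).

```python
import numpy as np

# Illustrative high-priority sources: each is an i.i.d. two-level source with
# its own (rate_low, rate_high, burst probability).
sources = [(2.0, 6.0, 0.25), (1.0, 4.0, 0.10), (3.0, 8.0, 0.15)]   # Mbit/s levels

def alpha_i(theta, lo, hi, p_hi):
    return np.log((1 - p_hi) * np.exp(theta * lo) + p_hi * np.exp(theta * hi)) / theta

def total_alpha(theta):
    return sum(alpha_i(theta, *s) for s in sources)

def solve_theta_star(target, lo=1e-6, hi=20.0, iters=200):
    # total_alpha is non-decreasing, so bisection solves total_alpha(theta*) = target.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if total_alpha(mid) < target else (lo, mid)
    return 0.5 * (lo + hi)

c, R, p = 12.0, 1.5, 1e-4     # line rate, low-priority (incidental) rate, loss target
theta_star = solve_theta_star(c - R)
B_L = -np.log(p) / theta_star           # required hypothetical buffer size (Mbit)
d_max = B_L / R                         # waiting-time bound for the constant-rate stream
print(f"theta* = {theta_star:.4f},  B_L = {B_L:.1f} Mbit,  max waiting time d = {d_max:.2f} s")
```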
3.4 Admission Control for the Stochastic Service Class

In this section, we describe how the effective bandwidth concept and the queuing model presented in Section 3.3.1 are adapted to design an admission control test for our interactive TV system. Suppose $N$ television programs are sharing one transmission line with constant capacity $c$. We assume that each main video stream is characterized by a known effective bandwidth curve, say $\alpha_i(\theta)$ for the $i$th main video. Now suppose an incidental stream with constant rate $R$, maximum waiting-time $d$, and probability $p$ of violating the maximum waiting-time constraint is requested to be added to the system using the stochastic service class. The multiplexer system can be modelled with the buffering system shown in Figure 3.5. As discussed in Section 3.3.1, the following equations hold:
$$\sum_{i=1}^{N} \alpha_i(\theta^*) = c - R, \qquad p = e^{-B\theta^*}, \qquad B = R \times d \qquad (3.25)$$
Hence, given two out of the three parameters $R$, $d$ and $p$, one can use equation 3.25 to determine the third, unknown parameter. If the triple $(R, d, p)$ conforms to the quality requirements of the connection request, then the connection request is accepted.

Figure 3.5: A simple multiplexer model with $N$ high-priority and one low-priority inputs: (a) a multiplexer with $N$ high-priority and one low-priority inputs; (b) the equivalent hypothetical buffer for determining the low-priority buffer overflow probability.

3.5 Numerical Estimation of Effective Bandwidth

In this section, we describe the current approaches to estimating the effective bandwidth of a source. Before doing so, it is noted from the definition of effective bandwidth (equation 3.6) that $\alpha(\theta)$ closely depends on the moment generating function² of the arrival process of the source. Therefore, to estimate the effective bandwidth, one should estimate the moment generating function, or all the moments³, of the arrival process of the source, which is a very complicated task. Also, note that in many applications, such as the proposed ITV application, the whole effective bandwidth curve, i.e., $\alpha(\theta)$ for all $\theta$, should be estimated. This makes the estimation process even harder.

² The moment generating function of a random variable $X$ is defined as $\psi(\theta) = E(e^{\theta X})$ (3.26).
³ The $n$th moment of a random variable $X$ is defined as $E(X^n)$. All the moments of $X$ can be obtained successively by differentiating $\psi(\theta)$ with respect to $\theta$ and then evaluating at $\theta = 0$, that is, $\psi^{(n)}(0) = E(X^n)$, $n \ge 1$.

Current approaches to the numerical estimation of effective bandwidth are as follows.

1. Direct Approach. Recall that the effective bandwidth $\alpha$ is defined by
$$\alpha(\theta) = \lim_{t\to\infty} \frac{1}{\theta t} \log E\big(e^{\theta Y(t)}\big)$$
Thus, by monitoring the traffic one can use
$$\hat{\alpha}_{k\times m}(\theta) = \frac{1}{\theta k} \log\Big(\frac{1}{m}\sum_{i=1}^{m} e^{\theta\,(Y_{ik} - Y_{(i-1)k})}\Big) \qquad (3.27)$$
as an estimator for $\alpha$ [43-46]. This approach is attractive since it circumvents modelling the traffic source. However, it takes a very long time to converge to an accurate result. According to [43], an accurate estimate of $\alpha(\theta)$ requires that both $k$ and $m$ be large, so the monitoring time $k \times m$ may in fact be very lengthy. In [45], a technique called re-sampling or bootstrap is proposed for this problem. In these approaches some synthetic data are generated, which are similar to the original data in 'some sense' [47-49]. These synthetic data are used along with the original data, as a data set with a larger sample size, to estimate $\alpha(\theta)$ using equation 3.27. However, this approach introduces two important and problematic issues. First, we should determine how the synthetic data are generated. This depends on what kind of similarity concept is suitable for our data and application. In the original bootstrap method, which was developed for i.i.d. random variables, the empirical distribution of the original samples, $F$, is estimated.
Then new samples are drawn from this distribution F. However, this method is not suitable for dependant data, fn a method proposed for dependant data, called 'moving block bootstrap', the data is partitioned to blocks of size b, and the same algorithm is then performed on these blocks. However, the disadvantage of this method is that it destroys all the de pendency of lags larger than b, hence its is suitable only for short term dependant data. Furthermore, this approach introduces another com plication, which is selection of an appropriate block size b. The second issue in using the bootstrap method is to determine how many synthetic data are enough to find an accurate estimate of a(6). This question is usually addressed by either using a large sample size, which is thought to be sufficiently large in advance, or by continuing to generate synthetic data sets until the final estimate of a(9) does not change by adding a new set of synthetic data. For example, in each iteration we generate a new set of synthetic data and estimate the a(9). If this a(9) is close to the a{9) estimated in the previous iteration by less than a threshold, then this estimate of a{9) is accepted. Otherwise, more synthetic data are generated. In general, this approach is not appropriate for the proposed ITV appli cation, since the sample sizes from the traffic of a typical TV program are not large enough to result in an accurate estimate of a{9). Besides, using the bootstrap method is not appropriate, since the current approaches to generate synthetic data will destroy many important characteristic features in our original data. Virtual Buffer Approach In this approach, a virtual buffer of size b is considered, which is filled with the source and is emptied at a constant rate c, such that c is between the average rate and peak rate of the source (see [42,44,50,51]). Let a_1(c) be the inverse of the a(9) function (i.e., a_1(c) = 9 a{9) = c). As mentioned in Section 3.3, the probability of buffer overflow can be estimated as exp(—ba~l(c)). Therefore, the buffer workload is monitored in simulation and the empirical distribution of 70 data loss, n(b), is obtained. Then, a~l(c) is chosen so that the distance between ix(b) and p(b) = exp(—ba~l(c)) is minimized [44]. The distance measure employed is usually the "Kullback-Leibler distance"4 measure [43,52]. The estimate for a_1(c) is given by: a-\c) = log(l + Z^n{b) B -) Alternately, if a sufficient size of data is available, one can plot the logarithm of the loss probability versus b. As shown in [44], this plot has a straight part with slope a~l(c). Hence the slope of this part of this plot can be used as an estimate for a~l(c). The disadvantage of this method is that the buffer occupancy should be simulated for a long time to obtain an accurate estimate for a. This means that a large sample size is required to simulate the buffer occu pancy. Furthermore, this procedure should be repeated for each c to obtain the effective bandwidth curve, i.e., a(9) for all 9. On the other hand, this approach is attractive in that it does not assume any signal model for the traffic, which circumvents modelling the traffic. Model Fitting Approach This is a two step approach. First, an appro priate parametric model for the source is selected, and its parameters are estimated. Second, the effective bandwidth is numerically computed or obtained from the model parameters. This approach has been carried out successfully in many different applications. 
An important advantage of this approach is that one can estimate the model parameters in real time, and use the current estimate of model parameters to update the effective bandwidth estimate online. Model fitting itself consists of several steps, including: 1) Model selec tion: a mathematical parameterized signal model should be selected for the traffic. 2) Model order selection: depending on the selected model, it is usually necessary to select the model size. There are some systematic 4Kullback-Leibler distance or the relative entropy of two discrete distribution p and q is defined by d = £>log2(^) (3.28) <?fc 71 approaches to this problem, like Akaike information criterion. Selecting a larger model size usually increases the complexity of the model. 3) Estimating the model parameters. For off-line traffic sources, traffic is known a priori and off-line parameter estimation methods can be used. This is usually easier than the online case, where the traffic itself is not know a priori. In online parameter estimation methods, the parame ter estimate is usually updated after observing each traffic sample, such that our estimate converges to the actual value of parameters gradually. The online methods are usually able to track the changes in the model parameters. We employ the traffic modelling approach in the proposed ITV appli cation to find the numerical value of the effective bandwidth curve of a traffic source. This modelling approach comprises three important issues. First, we should select a suitable traffic model, which can capture the important charac teristics of TV video traffic. Second, we should find model parameter identifi cation (i.e., model fitting) methods, which can estimate the model parameters from the actual traffic. Finally, we should design a method for obtaining the effective bandwidth from the model parameters. We address these three issues in the next chapter. 3.6 Conclusion In this chapter, we presented a scheme for the admission control of stochastic service call. Our approach is based on the effective bandwidth theory. We defined the effective bandwidth, and exploited the important characteristics of this concept. Then, we showed how the effective bandwidth concept is used to design an admission control scheme for the stochastic service class. Using the methods presented in this chapter, one can find the maximum-waiting time for an incidental stream whose bitrate is R and data loss probability is p%. We also discussed current approaches for estimating the numerical value of the effective bandwidth curve from traffic samples of a source. We' selected a modelling approach, where a traffic source is modelled using a stochastic model, and effective bandwidth is then obtained from the model parameters. However, we did not indicate which stochastic model shall be used for the video sources in the proposed ITV application. In the next chapter, we address this 72 important problem. That is, we select a stochastic model for modelling the traffic generated by full-screen video sequences of TV programs. We justify our model selection by some evidence from traffic samples of video sequences of actual TV programs. We present methods for estimating model parameters from the traffic samples. We also show how the effective bandwidth curve is obtained from the parameters of the selected stochastic model. Using the methods presented in the next chapter, one can estimate the numerical value of the effective bandwidth curve for a main video source. 
Hence, the methods presented in this chapter, in conjunction with the methods presented in next chapter, complete the big picture of the admission control mechanism for the stochastic service class. 73 Chapter 4 Video Traffic Modelling All Models are wrong, but some of them are useful. Modelling should never be an end to itself, but an aid to understand our complex world. -George Box Overview Stochastic modelling of main video streams traffic is considered, and General hidden Markovian models are discussed for these streams. Specifically, Hid den Semi-Markov Models, HSMMs, are selected for modelling the main video streams traffic. A new formulation for HSMMs is presented, which significantly improves the computational efficiency of HSMM parameter identification al gorithms. Based on our new formulation of HSMMs, efficient algorithms for off-line and online identification of an HSMM parameters are presented. Then, it is shown how effective bandwidth curve is obtained from HSMM parameters. Using the methods presented in this chapter, one can estimate the numerical value of the effective bandwidth curve of a video source. The estimated effective bandwidth curve is then utilized in the admission control of incidental streams in the stochastic service class, as discussed in Chapter 3. 4.1 Introduction In this chapter we seek a stochastic model to characterize the traffic of the main video streams in a TV network. This model should accurately capture the important stochastic characteristics of the traffic. Our motivation is to 74 use this traffic model to find the numerical value of the effective bandwidth curve of the source. Hence, our objectives in this chapter are (1) to select an appropriate stochastic model for video traffic sources, (2) to present a method for estimating the model parameters from traffic (parameter identification), and (3) to obtain the numerical value of the effective bandwidth from the model parameters. As mentioned earlier, this traffic model is then used by the admission control unit to determine how much bandwidth will be available to the incidental streams that use the stochastic service class. Ideally, a good traffic model for our application should be (a) accurate enough to characterize the important statistical properties of the traffic (b) computationally efficient, and (c) could be used for obtaining the effective bandwidth curve. Note that video traffic demonstrates different characteristics at different time scales. Many of the current modelling efforts strive to capture the rate variability of traffic at the frame level. In those approaches, periodic pattern in frame sizes plays an important role in the traffic shape. This periodic pattern is caused by the frame coding pattern of DCT based video compression standards (e.g., IBBPBB frame coding pattern in each GOP of the MPEG-2 standard). For the proposed ITV application, the traffic model should be able to capture the video rate variability at the GOP level (or scene level) rather than at the frame level. This is because the waiting-times in buffers in the proposed ITV application are much larger than the video frame rate, and hence, rate variabilities due to frame patterns are ruled out by buffering. The rest of this chapter is organized as follows. In Section 4.2, we review current approaches to video traffic modelling, fn Section 4.3, we present our modelling approach, which is based on 'Hidden Markov Models', HMMs. 
In so doing, we discuss the concept of 'state-duration modelling' using the notion of Semi-Markov signal models. We show that Hidden Semi-Markov Models, HSMMs, are a better model choice for the proposed ITV application rather than those based on the previously used HMMs. In Section 4.4, we present the theoretical background of HSMMs. As will be discussed, the challenging issue involved in employing HSMMs is that of 'parameter identification'. We address this problem in sections 4.5-4.7. For that effect, we first present a novel signal model for HSMMs in Section 4.5. Based on our new model, we present novel methods for parameter identification of HSMMs for the off-line and online cases in sections 4.6 and 4.7. In Section 4.8, we show how the numerical value of the effective bandwidth curve is obtained from the parameters of an HSMM model. 75 Finally, we present experimental results of applying the methods developed in this chapter to empirical traffic samples of typical TV programs in Section 4.9. 4.2 Existing Stochastic Video Traffic Models Current modelling approaches for video traffic modelling (also known as 'source modelling' in literature) can be divided into five main classes: Renewal Traffic Models Renewal models are mathematically simple and have a long history. These models describe the packets generation at certain points in time. Let An be the time between generation of nth and ('ft + 1)"' packets, then An in a renewal model is identically dis tributed, but its distribution function is allowed to be general. Poisson and Bernoulli processes are the most popular cases of renewal models in continuous and discrete-time cases respectively. In a Poisson process. A,L is described by an exponential distribution F(An < r) = 1 - e"Ar, where A is the average number of packets generated per time unit. These models have the severe drawback that the autocorrelation function of An for any non zero lag is equal to zero. Therefore, these models do not capture any dependency among the An's in the past nor in the future. Autoregressive processes Autoregressive (AR) models estimate the next signal value in a stochastic process as a function of previous signal values. The most important class of these models is the linear autoregressive models, with the form where yt is the random variable, a.;'s are real constants, and et is an i.i.d. random variable with zero mean. More complicated autoregressive models such as MA (Moving Average), ARM A (Auto Regressive Mov ing Average) and ARIMA (Auto Regressive Integrated Moving Average) have also being considered for modelling video traffic. The drawback of these models is that they cannot successfully model the marginal distri bution of the video traffic [53]. N (4.1) 76 TES models The Transform-Expand-Sample (TES) modelling approach strives to simultaneously model both the marginal distribution and the auto correlation function of an empirical sample set. This means that TES models can capture the first and second order statistical characteristics of a stochastic time-series at the same time. TES models consist of two classes, called TES+ and TES-. where the plus or minus superscript distinguishes between the cases where the model gives rise to processes with positive or negative lag-1 autocorrela tions respectively. TES models consist of two stochastic processes called background and foreground sequences [53-55]. Background sequences have the form: tc = { where U0 is uniformly distributed on [0,1), {Vn}^tl is a sequence of i.i.d. 
random variables, called innovation sequence, and the operator () denotes fractional part (also known as 'modulo-1 operator'). It is shown that the marginal distributions of £/+ and U~ are both uniformly dis tributed over [0,1), regardless of the distribution of Vn. The foreground sequence is of the form Xn+= £>(£/+) X~ = D(UZ) (4.4) where D is a transformation from [0,1) to the real numbers 5ft, called distortion. Given an empirical data sample set, TES models select the innovation sequence Vn and the distortion function D, so that both the autocorrela tion function and the density function of the foreground process match the autocorrelation and density functions of the empirical data. The most important family of distortion functions consists of a com pound distortion function of the from [54,55] D(x) = Hy\Si(x)), XG[0,1) (4.5) (£/+_, +14), n>0 [ - 1 U+, n even 1 - U~, n odd 1 ' ' 77 where the inner transform, S$(x), is a 'smoothing' transform called the stitching transform, and is parameterized by 0 < £ < 1. and is given by Q ( \ J yl^ 0 < y < £ *(y) = \(i-y)/(i-fl, *<v<i (46) The outer transformation, Hy1, is the inverse of the empirical distribu tion function (i.e., histogram) computed from empirical samples {Yn}^=1. The rationale for TES modelling approach is based on the following facts. First, all TES background processes are stationary, Markovian, and their marginal distributions is uniform in [0,1), regardless of distributions of innovation process Vn [54]. Second, the inversion method from elemen tary statistics, allows any uniform variate U on [0,1) to be transformed to a desired marginal distribution F by applying F~l(U). In particular, TES models use F = Hy. Third, for 0 < £ < 1 the stitching transform S^(Un) preserves the uniformity of Un; that is, S^(U^) is also uniformly distributed in [0,1). It is shown that the stitching transform has only a 'smoothing' effect. That is, a sequence {S^ (£/+)} is more 'continuous looking' than an underlying sequence {U+}. It is shown that this trans form preserves uniformity, that is S^(U^) is also uniformly distributed in [0,1). Hence, it follows that any foreground TES process of the form Xn = Hy1(St(Un)) (4.7) obtained from the background process {Un} is always guaranteed to have the histogram distribution Hy, regardless of the selected innovation se quence Vn and the stitching parameter £. The choice of the density of the innovation sequence, fy, determines the autocorrelation function of the foreground process. Thus, TES modelling decouples the fitting of the empirical distribution from the fitting of the empirical autocorrela tion function. Since the former is guaranteed, one can concentrate on the latter. In practice, model fitting is carried out by a heuristic search for pairs (£, fy). The search is considered successful if the corresponding TES sequence gives rise to an autocorrelation function that adequately approximates its empirical counterpart [56-58]. Though TES models have been recently used for modelling video traffic with acceptable accuracy, their implementation is too computationally 78 complicated for practical applications. In practice, the heuristic search for (£, fy) relies on a brute-force computation and on selecting the best n combinations of (£,/y) pairs, in the sense that the resulting TES model autocorrelation function minimizes the mean square error with respect to the empirical autocorrelation function. 
Then, another algorithm is employed to select among the n candidate models the one whose sample path bears the 'most resemblance' to the empirical samples. Further more, it is still not known how a TES model can be used for obtaining performance guarantees in a communication network. Specifically, there is no method available for obtaining the effective bandwidth of a traffic source from its TES model Self-similar or Fractal models These models are based on self-similar pro cess models. The self-similarity concept implies that samples of a pro cess demonstrate similar statistical characteristics when considered at different time scales. The key characterizing parameter is the so-called Hurst parameter, H, which captures the degree of self-similarity in a given empirical signal. Recently, many researches have used self-similar process models for modelling video traffic [12,53,59-63]. Most of these researches deal only with the statistical analysis of data sets, including the estimation of the Hurst parameter. These studies provide only lim ited information about traffic characteristics. Furthermore, very little research has been done on the analysis of communication networks (or queues) with self-similar sources. Markovian Signal Models Markov signal models have been successfully used for modelling video traffic [6,9,53,64]. One of the popular Marko vian models is the two state Markov chain (also called ON/OFF model), where one state represents the peak rate and the other represents the minimum rate. Though this model is simple an'd is useful for modelling video traffic in some cases, it is not accurate enough for modelling full screen video sources. The other commonly used model is the Markov Modulated Process, including the Markov Modulated Poisson Process. Most of the previous research on video modelling study the statistical characteristics of 'video conference' sources and not 'broadcast video' sources. Broadcast video traffic , such as TV video, has different characteristics from 79 those found in video conference applications. Usually, video conference se quences are encoded at a very low bitrate and consist of head and shoulder pictures with little or no camera movements. Hence video conference sequences differ from full-screen television video in two ways: first they consist of very rare scene changes and object movements in the picture; and second the bitrate of video conference streams has little fluctuations when compared to bitrate of full-screen video. 4.2.1 Stochastic Models for Full-Screen Broadcast Video The usual first step in modelling a real world's stochastic process is through analysis of the system generating the signal. The system that generates video traffic in digital TV applications is an MPEG-2 encoder. We are interested in modelling the amount of traffic that this encoder generates. MPEG-2 en coders use some important encoding parameters, such as number of slices, GOP pattern, GOP length, quantization scale and so on. These parameters affect the quality of the coded video, and are selected according to the appli cation needs. Unfortunately, given the input video sequence and the coding parameters, there is no procedure to obtain the statistical parameters of traf fic generated by the encoder. Therefore, it is not useful to incorporate the video coding parameters into the traffic model. However, analysis of MPEG-2 encoding techniques can provide valuable insights into the video traffic shape. 
One important characteristic of MPEG video traffic is the periodic frame pat tern in each GOP. However, in the proposed ITV application, buffering delays for incidental streams (i.e., waiting-time in buffers) are much larger than video frame rates. This means that incidental buffer sizes are much larger than an average incidental video frame, and the periodic frame pattern in each GOP is filtered out during the buffering process. Thus, rate variability caused by the periodic frame pattern does not play any role in the proposed ITV appli cation. Since the buffering delay for an incidental stream in the proposed ITV application is usually more than a couple of seconds, we consider the video source rate at the GOP level rather than the frame level. Note that in most video streams in digital TV application, each GOP contains 15 frames, which results in a GOP length of .5 seconds. Therefore, we seek a traffic model which can capture the correlation between consecutive GOP sizes in a video stream. It is well known that each GOP size depends on the visual content of 80 its corresponding scene. Two features of a video scene that affect its corre sponding GOP size are 1) visual details in each frame, and 2) video activities, such as object movements, background movements (e.g., camera panning and zooming), scene changes, etc. In general, simple and non-active scenes result in a small GOP size, while complex scenes result in larger GOP sizes. This inspires us to select a traffic model which can capture the trend in video traf fic caused by the underlying scene changes in video, while capturing the rate fluctuations caused by other parameters. 'Hidden Markov Models' (HMMs) are known to match this intuitional requirement. These models consist of a hidden layer and an observable layer. The hidden layer is a Markov chain process, which determines the state of the signal. The state of the signal is not observable, and is considered 'hidden'. One can only observe the output of the observation layer. State of the signal at time t determines the spectral characteristics of the observable layer of the model. That is, statistical char acteristics of the observed signal at time t depend on the hidden state of the signal at time t. Other research studies have confirmed that HMMs are relatively suc cessful in capturing the video traffic characteristics of video conference and broadcast video sources [65-70]. HMMs have also been widely used in many other engineering applications such as speech processing, signal estimation, and queuing networks [71,72]. The underlying mathematical theory for these models is well established, and efficient algorithms are available for their im plementation. However, HMMs have a limitation. Before we exploit this limitation, we will present a few mathematical preliminaries on the Markov models in the next section. We will then discuss the limitation of HMMs, and use a new model, which can alleviate this limitation. 4.3 Mathematical Background of General Marko-vian Models 4.3.1 Markov Chain The simplest form of Markovian signal models is a 'Markov chain'. Consider the discrete-time stochastic process {st}, t = 1,2,..., which takes its values 81 from the set {1, 2,N}. The space {1, 2,.... A/}, is called the state space. If st — i, then the process is said to be in state i at time t. st is a Markov chain if whenever the process is in state i, then the probability that it enters state j in next time-unit is constant. 
This property is the essence of Markovian signal models and is called the Markovian property; it simply states that the probability of transition from one state to another does not depend on the earlier states of the process, that is,
$$\mathbb{P}(s_t = j \mid s_{t-1} = i, s_{t-2} = k, \ldots) = \mathbb{P}(s_t = j \mid s_{t-1} = i) \qquad (4.8)$$
Let $a_{ij}$ represent the probability of going from state $i$ to state $j$, that is,
$$a_{ij} = \mathbb{P}(s_t = j \mid s_{t-1} = i) \qquad (4.9)$$
Note that for all $i$ and $j$, $a_{ij}$ is constant and time-invariant. The matrix $A = [a_{ij}]$ is called the state transition probabilities matrix, or state transition matrix. Since the $a_{ij}$'s are probabilities and since the process $s_t$ must make a transition to some state at each time instance, we have
$$a_{ij} \ge 0, \quad 1 \le i,j \le N; \qquad \sum_{j=1}^{N} a_{ij} = 1, \quad i = 1, \ldots, N \qquad (4.10)$$
Similarly, the $n$-step transition matrix $A^n$ is defined, where $A^n_{ij}$ represents the probability of being in state $j$ after $n$ state transitions, starting in state $i$.

4.3.2 Hidden Markov Models

Hidden Markov Models, HMMs (also called Markov Modulated Processes, MMPs), are a powerful and widely used class of Markovian signal models. These models are 'doubly stochastic', and consist of a hidden layer and an observable layer. The hidden layer is a Markov chain process $s_t$, which follows equations 4.8-4.10 and determines the state of the signal at time $t$. The state of the signal is not observable, and is considered 'hidden'. We observe the observation process $y_t$. The spectral characteristics of the observed signal at time $t$ are determined by the state of the signal at $t$. A common model choice for the observable layer is a parametric probability density function (pdf), where the pdf parameters are determined by the current state of the signal. This is denoted by
$$\mathbb{P}(y_t \mid s_t = i) = b_i(y_t) \qquad (4.11)$$
where $b_i(x)$ is a parameterized density function. For example, if $b_i(x)$ is a Normal distribution with mean $\mu_i$ and variance $\sigma_i^2$, then the probability of observing $y_t$, given that the state process is in state $i$ at $t$, is
$$\mathbb{P}(y_t \mid s_t = i) = \frac{1}{\sqrt{2\pi}\,\sigma_i}\, e^{-\frac{(y_t - \mu_i)^2}{2\sigma_i^2}} \qquad (4.12)$$
which depends only on $i$. Note that rather than probability density functions, other stochastic models, such as Poisson processes or TES models, have also been considered for modelling the observation layer of HMMs [57].

4.3.3 Limitation of HMM Models

One disadvantage of HMMs lies in their limited ability to model the 'state duration' densities. State duration is defined as the time that the state process of an HMM spends in each state before making a transition to another state. Note that in a Markov chain, the probability of remaining in a state is constant; that is, $\mathbb{P}(s_{t+1} = i \mid s_t = i) = a_{ii}$, where $a_{ii}$ is constant. Therefore, the probability of staying exactly $d$ time units in state $i$ is given by
$$p_i(d) = a_{ii}^{\,d-1}\,(1 - a_{ii}) \qquad (4.13)$$
Hence, state duration densities in HMMs are limited to the form given in equation 4.13, which is known in the literature as the 'Geometric' discrete distribution. This implies that the probability of staying $d$ time units in each state goes to zero exponentially as $d$ increases. However, the Geometric distribution is not appropriate for the state duration modelling of many physical signals. In order to remove this limitation, a more sophisticated model, called a 'Semi-Markov chain', is used, where state duration densities are modelled with some non-Geometric distribution. A hidden Markov model that uses a semi-Markov chain for modelling the hidden state process is called a 'Hidden Semi-Markov Model', HSMM. Generally speaking, HSMMs are more powerful in modelling physical signals than HMMs. However, HSMMs are more complicated.
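The geometric limitation of equation 4.13 is easy to observe empirically. The following sketch (all HMM parameters are invented for illustration) samples a three-state Gaussian HMM, extracts the per-visit durations of one state, and compares their empirical distribution with the geometric law.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative 3-state HMM (think GOP-level bitrate with low/medium/high
# complexity scenes); all parameter values are invented for this sketch.
A = np.array([[0.90, 0.08, 0.02],
              [0.10, 0.80, 0.10],
              [0.05, 0.15, 0.80]])
mu, sigma = np.array([2.0, 4.0, 7.0]), np.array([0.3, 0.5, 0.8])

def sample_hmm(T, s0=0):
    states, obs = np.empty(T, dtype=int), np.empty(T)
    s = s0
    for t in range(T):
        states[t] = s
        obs[t] = rng.normal(mu[s], sigma[s])     # observation layer, eq. (4.12)
        s = rng.choice(3, p=A[s])                # hidden layer, eq. (4.9)
    return states, obs

states, obs = sample_hmm(100_000)

# Per-visit durations of state 0: they must follow the geometric law
# p(d) = a00^(d-1) * (1 - a00) of eq. (4.13).
runs = np.diff(np.flatnonzero(np.diff(np.r_[-1, states, -1]) != 0))
durations0 = runs[states[np.cumsum(runs) - 1] == 0]
for d in range(1, 5):
    emp = np.mean(durations0 == d)
    theo = A[0, 0] ** (d - 1) * (1 - A[0, 0])
    print(f"d={d}:  empirical {emp:.3f}   geometric {theo:.3f}")
```

Whatever the transition probabilities, the duration histogram is forced to decay monotonically from $d = 1$, which is exactly the restriction the semi-Markov formulation removes.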
4.3.4 Selecting Between the HMM and HSMM Modelling Approaches for the Proposed ITV Application

Is an HSMM a better model choice for TV video traffic than an HMM? The answer relies on whether the Geometric distribution is a good model choice for the state durations of video traffic processes or not. In order to gain insight into this matter, we conducted two different experiments using the empirical bitrate of a few typical TV programs, as discussed below.

The first experiment was based on an empirical approach to estimating the state sequence from the empirical bitrate traces of typical TV programs. In this experiment we first partitioned the video sequences into different scenes. The partitioning method used is similar to that presented in [73], where sudden jumps in the empirical bitrate of the video are used to detect scene changes. A manual comparison of detected scene changes and video content showed that this method successfully captures the scene changes from the empirical bitrate record. Then, all scenes were partitioned into three clusters (low, medium and high bitrate) according to the average encoded GOP size of each scene. The clustering algorithm used is presented in [57]. The approach of this clustering algorithm is to partition the given data set so that elements in each cluster are 'closer' to each other than to elements in all other clusters. We assume that each cluster represents a hidden state of the Markov chain model. According to this classification, we constructed a 'state transition sequence' for each video sequence (see Figure 4.1). Next, we extracted the empirical state-duration traces for each state from the state transition sequence of each video, and calculated the histograms of these state-duration traces. In order to determine whether the Geometric distribution is a good fit for these empirical state-duration traces, we used both visual histogram comparison and the 'Q-Q plot technique' (see Appendix B). Figure 4.2-a shows the histogram of one of the state durations for a 'News' video sequence, and Figure 4.2-b shows the typical shapes of a Geometric and a Gamma distribution. It is noted that the empirical data histogram reveals a curve shape more similar to the Gamma distribution than to the Geometric distribution. Figures 4.2-c and d show the Q-Q plots for the Geometric and Gamma distributions. As discussed in Appendix B, if the statistical properties of the empirical data match the selected parametric distribution, then the Q-Q plot is expected to be approximately linear with an intercept of 0 and slope 1. It is noted that the Q-Q plot for the Gamma distribution is closer to a line with slope 1 than the Q-Q plot for the Geometric distribution. Hence, we conclude that a discretized Gamma distribution is a better choice for modelling the state durations of TV video traffic.

Figure 4.1: Normalized bitrate and state transition path for two typical TV programs: (a) normalized bitrate of the 'News' sequence; (b) normalized bitrate of the 'Mission Impossible' sequence; (c) state transition path for the 'News' sequence; (d) state transition path for the 'Mission Impossible' sequence.
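A simple way to quantify the comparison described above is to fit both candidate distributions to a duration trace by maximum likelihood and compare the fits. The sketch below uses a synthetic stand-in for an empirical duration trace (the real traces come from the clustered scenes); the fitting calls are standard SciPy routines.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Stand-in for an empirical state-duration trace (in GOPs).  Here we synthesize
# one; in the experiment above these durations come from the clustered scenes.
durations = np.maximum(1, np.round(rng.gamma(shape=4.0, scale=3.0, size=500))).astype(int)

# Geometric fit: the MLE of the success probability is 1 / mean(d).
q = 1.0 / durations.mean()
ll_geom = np.sum(stats.geom.logpmf(durations, q))

# Gamma fit (continuous approximation of the discretized Gamma), location fixed at 0.
shape, _, scale = stats.gamma.fit(durations, floc=0)
ll_gamma = np.sum(stats.gamma.logpdf(durations, shape, loc=0, scale=scale))

print(f"geometric: q={q:.3f},  log-likelihood={ll_geom:.1f}")
print(f"gamma:     shape={shape:.2f}, scale={scale:.2f},  log-likelihood={ll_gamma:.1f}")
# For duration data whose histogram has an interior mode (like typical scene
# lengths), the Gamma fit attains a clearly higher likelihood than the
# monotonically decreasing geometric pmf.
```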
The second experiment was motivated by the results of the first experiment. In this experiment, we used the 'likelihood ratio test' hypothesis testing method (see Appendix C) to determine whether an HMM model is a better choice for digital TV traffic sources than an HSMM model with Gamma state-duration densities. We let $y_T = \{y_1, y_2, \ldots, y_T\}$ denote the empirical bitrate trace of the video sequence under study. Then, we test the null hypothesis "$H_0$: $y_T$ is an HMM" against "$H_1$: $y_T$ is an HSMM". Details of adapting the general likelihood ratio test method for conducting this hypothesis test are given in Appendix C. In this test, the likelihood ratio $\Lambda$ is computed as the ratio of the likelihood of $y_T$ being generated by an HMM model to the likelihood of $y_T$ being generated by an HSMM model. It is shown that $-2\ln(\Lambda)$ has a $\chi^2$ distribution (see Appendix C). Hence, $-2\ln(\Lambda)$ is compared to the $100(1-\alpha)$ percentile point of a $\chi^2$ distribution, where $\alpha$ is the significance level of the test. Table 4.1 summarizes the results of this test for the empirical bitrate of a couple of TV programs. The significance level of the test is $\alpha = 5\%$. As shown, for all the sources $-2\ln(\Lambda)$ is greater than the $100(1-\alpha)$ percentile point, which means that the null hypothesis (i.e., '$y_T$ is an HMM') should be rejected.

Table 4.1: Results of the likelihood ratio test for different typical TV video sequences. The number of states in the HSMM and HMM models is N = 3.

  Sequence                -2 ln(Λ)    95% percentile point of the χ² distribution (ν = 3)    Test result
  Mission Impossible II    21.29       7.815                                                   Reject H0
  News                     38.93       7.815                                                   Reject H0
  Soap Opera               43.17       7.815                                                   Reject H0

Hence, we choose hidden semi-Markov models for modelling the TV video traffic in the proposed ITV application, where the state durations are modelled with a Gamma distribution. In the next section, we present the mathematical formulation of HSMMs, and address the important issues raised in employing HSMMs. In Section 4.9, we present the results of fitting an HSMM model to empirical bitrate traces of typical TV programs.

Figure 4.2: (a) Histogram of state (or cluster) duration for one of the states in the 'News' sequence; (b) Geometric probability mass function; (c) discretized Gamma probability distribution function; (d) Q-Q plot for the Geometric distribution; (e) Q-Q plot for the Gamma distribution.
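The decision rule behind Table 4.1 can be sketched in a few lines. The log-likelihood values below are placeholders; in the experiment they are obtained by fitting an HMM and a Gamma-duration HSMM to the same bitrate trace and evaluating $\mathbb{P}(y_T \mid \text{model})$ under each.

```python
import numpy as np
from scipy.stats import chi2

# Placeholder fitted log-likelihoods (hypothetical values for illustration).
loglik_hmm = -1520.4
loglik_hsmm = -1499.7
dof = 3                      # extra duration parameters, as in Table 4.1
significance = 0.05

test_statistic = -2.0 * (loglik_hmm - loglik_hsmm)     # -2 ln(Lambda)
critical_value = chi2.ppf(1.0 - significance, dof)     # 100(1-alpha) percentile point
print(f"-2 ln(Lambda) = {test_statistic:.2f}, critical value = {critical_value:.3f}")
print("Reject H0 (prefer HSMM)" if test_statistic > critical_value else "Do not reject H0")
```

With three degrees of freedom the 95% critical value is 7.815, which is the threshold used for every row of Table 4.1.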
4.4 General Background on Semi-Markovian Signal Models

4.4.1 Semi-Markov Chains

The discrete-time stochastic process $\{s_t\}$, $t = 1, 2, \ldots$, which takes its values from the state space $\{1, 2, \ldots, N\}$, is a semi-Markov chain if the next state of the signal depends only on the current state and the amount of time that the signal has spent in the current state. Formally, this condition is stated as
$$\mathbb{P}(s_t = j \mid s_{t-1} = s_{t-2} = \cdots = s_{t-d} = i,\; s_{t-d-1} = k, \ldots) = \mathbb{P}(s_t = j \mid s_{t-1} = i,\, d) \qquad (4.14)$$
where $i \ne k$. $A = [a_{ij}]$ is known as the state transition probabilities matrix, where $a_{ij}$ is the probability of going to state $j$ from state $i$, knowing that the signal is leaving state $i$:
$$a_{ij} = \mathbb{P}(s_t = j \mid s_{t-1} = i,\, s_t \ne i) \qquad (4.15)$$
Note that the $a_{ij}$'s are all constant, and are constrained to
$$a_{ij} \ge 0, \quad 1 \le i,j \le N; \qquad \sum_{j=1}^{N} a_{ij} = 1, \quad i = 1, \ldots, N \qquad (4.16)$$
All $a_{ii}$'s are zero. In a semi-Markov chain, the state duration densities are modelled in some non-Geometric form, unlike that of equation 4.13. This non-Geometric form is usually denoted by a state duration probability mass function $\varphi_i(d)$, where $\varphi_i(d)$ is the probability that the signal stays exactly $d$ time units in state $i$ during its visits to this state. $\varphi_i(d)$ may be selected to be one of the known parameterized probability mass functions, such as the Poisson or Binomial distributions. Alternately, $\varphi_i(d)$ may be selected as a discretized probability distribution function, such as a Normal or Gamma pdf.

The signal generation process of a semi-Markov chain $s_t$ can be summarized as follows.

1. Start with a given initial state, e.g., $s_t = i$ for $t = 1$.
2. Select a duration $d$ according to the state-duration density function of the $i$th state, $\varphi_i(d)$.
3. For the next $d$ time units, stay in the same state $i$.
4. Select the next state according to the constant state transition matrix $A = [a_{ij}]$, with the constraint that the signal should leave the current state (i.e., $a_{ii} = 0$).
5. Go back to step 2.

4.4.2 Hidden Semi-Markov Models

A Hidden Semi-Markov Model, HSMM (also known as a Semi-Markov Modulated Process, SMMP), is similar to an HMM except that the hidden state process is a semi-Markov chain. Generally speaking, HSMMs are a generalization of HMMs and are more powerful in modelling physical signals. However, HSMMs are more complicated than HMMs.

The most important problem that arises in using HSMMs is the identification of the model parameters. This identification problem is studied in either the off-line or the online case. In the off-line case, given a set of observations from an HSMM signal, $y_T = \{y_1, y_2, \ldots, y_T\}$, one should find the parameters of the HSMM model, denoted by $\theta$. In the online case, one observes a signal sample $y_t$ at time $t$, and should update the current estimate of the model parameters $\hat\theta_t$ such that $\hat\theta_t$ gradually converges to the actual model parameters. In the rest of this chapter, we address the identification of HSMMs in both the off-line and online cases, as they are both necessary in the implementation of the proposed interactive TV system. Off-line identification methods are necessary for estimating the model parameters of pre-recorded video streams. Online identification methods are employed for online model parameter estimation from live video sources. Ultimately, the estimated model parameters are used to find the effective bandwidth curve of the sources.

4.4.3 General Background on Identification of HSMMs

Identification of HSMMs is conceptually similar to identification of HMMs, and current approaches to the identification of HSMMs constitute a generalization of the identification methods for HMMs. There are powerful methods available for identification of HMMs. Off-line identification of HMMs is based on an iterative algorithm, known as the Baum-Welch¹ or Expectation Maximization (EM) algorithm. This algorithm finds the maximum likelihood estimate of the model parameters $\theta$. That is, $\theta$ is estimated so that $\mathbb{P}(y_T \mid \theta)$ is maximized. It is shown that the maximum likelihood estimate converges to the true parameter value as the sample size $T$ increases.

¹ The EM method refers to a general class of approaches. The Baum-Welch algorithm is a variant of the EM algorithm for estimating the HMM parameters.

The EM algorithm consists of two steps in each iteration: the E or Expectation step, and the M or Maximization step. The algorithm starts with an initial estimate of the model parameters $\theta$. Note that a part of an HMM model is hidden from us, namely the state transition path of the underlying Markov chain, denoted by $S_T = \{s_1, s_2, \ldots, s_T\}$. In the E step, the "optimal" state transition path $S_T$ is estimated from the samples $y_T$ and the current parameter estimates $\theta$. This is done by estimating a set of so-called forward and backward variables, the $\alpha_t$'s and $\beta_t$'s. The forward variables, $\alpha_t$'s, are computed by induction from $\alpha_{t-1}$. Similarly, the backward variables, $\beta_t$'s, are computed from $\beta_{t+1}$. In the M step, the model parameters are re-estimated by finding the MLE estimate of $\theta$ from the computed forward-backward variables and $y_T$. The E and M steps are iterated until $\theta$ converges.
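To make the E step concrete, the following sketch implements the standard (scaled) forward recursion for an HMM with Gaussian observation densities; the parameters and the toy observation sequence are invented. The HSMM counterpart, discussed next, replaces the single-step induction with a sum over admissible state durations, which is the source of the extra computational cost.

```python
import numpy as np

def forward_pass(obs, A, pi, mu, sigma):
    """Standard HMM forward recursion used in the E step (scaled to avoid underflow).

    alpha_t(j) is proportional to the probability of the observations up to time t
    and being in state j at t; it is normalized at every step, and the sequence
    log-likelihood is accumulated from the normalization constants.
    """
    N, T = len(pi), len(obs)
    b = np.exp(-0.5 * ((obs[:, None] - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)
    alpha = np.empty((T, N))
    alpha[0] = pi * b[0]
    loglik = 0.0
    for t in range(T):
        if t > 0:
            alpha[t] = (alpha[t - 1] @ A) * b[t]   # single-step induction
        norm = alpha[t].sum()
        alpha[t] /= norm
        loglik += np.log(norm)
    return alpha, loglik

# Tiny illustrative run with made-up parameters and data.
A = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.5, 0.5])
mu, sigma = np.array([2.0, 6.0]), np.array([1.0, 1.0])
obs = np.array([1.8, 2.2, 5.9, 6.3, 6.1, 2.1])
alpha, loglik = forward_pass(obs, A, pi, mu, sigma)
print("filtered state probabilities:\n", np.round(alpha, 3), "\nlog-likelihood:", round(loglik, 3))
```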
The EM algorithm has been extended to the context of HSMMs by using 'explicit state duration modelling' [72,74-76] or 'parametric state duration modelling' [77]. The first approach estimates the density of the state durations explicitly for all possible values, while the second approach uses a parametric distribution to model the state duration, and estimates only the parameters of that distribution. However, current methods for identification of HSMMs, which are based on these two approaches, have the major drawback of a greatly increased computational load compared to the HMM case. This increase in the computational load lies in the estimation formulas for the forward-backward variables in the E step, where $\alpha_t$ ($\beta_t$) must be computed from $\{\alpha_{t-1}, \alpha_{t-2}, \ldots, \alpha_{t-D}\}$ (and from $\{\beta_{t+1}, \beta_{t+2}, \ldots, \beta_{t+D}\}$), instead of from $\alpha_{t-1}$ ($\beta_{t+1}$) as in HMMs, where $D$ is the maximum allowable state duration. More precisely, in the EM algorithm for identification of HMMs, the forward variables $\alpha_t(j)$ are computed by induction from the $\alpha_{t-1}$'s using
$$\alpha_t(j) = \Big[\sum_{i=1}^{N} \alpha_{t-1}(i)\, a_{ij}\Big]\, b_j(y_t), \qquad 1 < t \le T, \quad 1 \le j \le N \qquad (4.17)$$
In this equation $\alpha_t(j)$ is obtained by adding up the probabilities that $s_t$ has traversed one of the state transition paths shown in Figure 4.3-a. In identification of HSMMs, by contrast, $\alpha_t(j)$ is computed as
$$\alpha_t(j) = \sum_{d=1}^{D} \sum_{i \ne j} \alpha_{t-d}(i)\, a_{ij}\, \varphi_j(d) \prod_{\tau=t-d+1}^{t} b_j(y_\tau), \qquad 1 < t \le T, \quad 1 \le j \le N \qquad (4.18)$$
That is, the probability of being in state $j$ at $t$ is computed by summing the probabilities of the paths shown for $d = 1, 2, 3, \ldots, D$ in Figure 4.3-b. Similar equations are used for computing the $\beta_t$'s.

Figure 4.3: State transition paths that should be considered for computing $\alpha_t(j)$ in the HMM and HSMM cases.

It is shown that current approaches to identification of HSMMs using the EM algorithm increase the memory usage by a factor of $D$ and the computational load by a factor of $D^2/2$, when compared to the EM algorithm for HMMs. Since $D$ is usually large in many applications (e.g., $D = 25$ in most speech processing applications), these algorithms become impractical. In [78], an HSMM with $N$ states and maximum state duration $D$ is reformulated as an HMM with $ND$ states, and the standard EM algorithm is then used to estimate the model parameters. There are other approaches, which are based on 'state duration dependent transition probabilities' [79]. In these approaches, the state transition matrix is time-varying or is replaced with a tensor. The drawback of the methods presented in [78,79] is the addition of a large number of parameters to the model. These extra parameters must be estimated in addition to the usual HMM parameters, and this requires a large sample size in order to obtain an accurate estimate.

Online identification of HSMMs is conceptually harder than the off-line case. Currently, there are no methods available for online identification of HSMMs in the literature. However, the online identification of HMMs has been studied in [71,80-83]. These approaches are based on either the 'recursive maximum likelihood' (RML) or the 'recursive prediction error' (RPE) technique. In the RML approach, the current estimate of the model parameter vector, $\hat\theta_t$, is updated at each time instance in such a direction that the 'Kullback-Leibler' (KL) information measure is maximized. It is shown that this results in the maximum likelihood estimate of the model parameters as $t \to \infty$. The RPE method is a class of general numerical parameter estimation methods. RPE algorithms are, in essence, 'Extended Kalman filters' (EKF).
More precisely, RPE methods are a special class of Extended Kalman filters for the case where the unknown constant parameters of the model are viewed, and estimated, as states. In this approach a norm $V(\theta)$ that measures the prediction error of the model is defined. The model parameters are updated according to
$$\hat\theta_{t+1} = \hat\theta_t + \lambda \cdot f(\hat\theta_t) \qquad (4.19)$$
where $f$ is a search direction based on some information about $V(\theta)$ acquired at previous iterations, and $\lambda$ is a positive step size. There are different algorithms based on equation 4.19 which use different choices for the search direction $f$. The most important class of these approaches is the quasi-Newton methods, where the direction function $f$ is
$$f(\hat\theta_t) = -\big[V''(\hat\theta_t)\big]^{-1} \cdot V'(\hat\theta_t) \qquad (4.20)$$
The mathematical theory for the general RPE technique is presented in [84]. In [71,82], a formulation of HMMs is presented which allows the general RPE scheme to be applied to online identification of HMMs. The norm function used in this method is a weighted quadratic prediction error criterion:
$$V_t(\theta) = \frac{1}{\gamma_t} \sum_{k=1}^{t} \beta(t,k)\, e^{2}(k,\theta) \qquad (4.21)$$
where $e(k,\theta) = \hat{y}(k,\theta) - y(k)$ is the prediction error, $\gamma_t$ is a normalizing factor, and $\beta(t,k)$ is a weighting sequence introduced to increase the effect of recent observations on the estimate. In [83], an algorithm based on the RPE technique is presented which finds the MLE of the model parameters, and hence can be considered an RML algorithm.

In summary, the existing off-line parameter identification methods for HSMMs are computationally very demanding, and no online parameter identification method has been suggested. In the next sections of this chapter, we present new methods for identification of HSMMs for both the off-line and online cases. Our methods are based on a new formulation of HSMMs. In our new signal model, we introduce a new variable into the traditional HSMM, named the 'state duration' variable, $d_t$. $d_t$ is actually a vector, where $d_t(i)$ denotes the time that the signal has spent in state $i$, given that the state at time $t$ is $i$. We then use state duration dependent transition probabilities, where the probability of transition from state $i$ to $j$ is not constant and depends on $d_t(i)$. Hence, in our model, the probability of being in any state at time $t$ depends on the state at $t-1$ as well as on $d_{t-1}$. We model the state duration densities with parameterized probability density functions.

We then present a novel version of the EM algorithm for off-line identification of HSMMs based on our model. Our algorithm finds the local maximum likelihood estimate of the model parameters. The major advantage of our method over current ones is that it has almost the same computational complexity as the EM algorithm for HMMs, and hence is useful for a larger set of practical applications. We also present a sophisticated method for online identification of HSMMs, which is based on our proposed signal model. Our approach adaptively updates the parameter estimates so that the log-likelihood of the model parameters is maximized. We use an estimate of the Newton-Raphson direction as the update direction for our model parameters. This choice results in a near optimum convergence rate for our algorithm. In this regard, our method is similar to the online identification algorithm for HMMs in [82].

4.5 New Formulation of HSMMs

4.5.1 Hidden (or Modulating) Layer Modelling

We consider a signal model where the state of the signal at time $t$, $s_t$, $t \in \mathbb{N}$, is determined by a finite-state discrete-time semi-Markov chain. We assume the initial state $s_1$ is given or its distribution is known.
The state space has N distinct states. Without loss of generality, we assume st takes its values from the set {ei, e2, • • • , e^}, where et is a N x 1 vector with unity as the ith element and zeros elsewhere. If st = e*, we say the signal is in state i at time t. The semi-Markov property of the model implies that the probability of a transition from state ej to e.;, j ^ i, at time t depends on the duration spent in state e.j prior to time t. This can be written as F(st+1 = ei\st = st-i = ek, • • • ,8i = et) = F(st+1 = ei\st = ey,dt(j)) (4.22) where dt(j) is the duration spent in the jth state prior to time t, that is dt{J) = id\st-i = ej, • • • , st_d+1 = ej, st_d ^ ej} (4.23) For each time t, we define the 'state duration' vector dt of size N x 1 as f d,(j) if st — e,-dt= \, .f \ 3 (4-24) [1 if st ^ ej dt(j) is easily constructed from dt-i(j) as dt(j) = st{j) x dt-\(j) + 1, which can be written in vector format as dt = st © dt_i + 1 (4.25) where 0 denotes element-by-element product. We model the state duration densities with a parametric probability mass function, pmf, 4>i(d). This means the probability that st stays exactly for d time units in state i is given by (f>i(d). 4>i(d) should be selected so that it adequately captures the properties of the signal under study. Hence, selec tion of <f>i(d) should be justified by some evidence from samples of the signal. Though HMM state durations are inherently discrete, it is noted in many stud ies that continuous parametric density functions are also suitable for modelling 94 state durations in many applications, including speech processing [76,77]. In this approach state durations are modelled with the best fitting parametric probability density function, pdf, and then the discrete counterpart of this density function is taken as the best pmf. That is, if 4>i{x) is the continuous pdf for state duration of ith state, then the probability that the signal stays fd in state i for exactly d time units is given by / <pi(x)dx. Since negative Jd-l state durations are not physically meaningful, it is usually more appropriate to select 4>i(x) from the family of exponential distributions [76]. Specifically, the family of Gamma distributions are considered in [77] for speech processing applications. In this paper, we assume that <f>i(x) is a Gamma distribution function with shape parameter and scale parameter rji, that is <fc(d) = -^—d^-le'^d (0 < d < co) (4.26) where V is the gamma function. The mean and variance of <f>i are Vi/rn and Vi/rtf respectively (see [85]). Note that our signal model presented here is applicable with minor changes, if a pdf other than Gamma is selected. Fur thermore, let <f>i(x) denote the cumulative probability distribution function of <j>i(x), i.e., <&i(d) = F(st stays in state i for at most d time units) (4.27) = / <j>i{x)dx (4.28We construct our model for HSMMs using state duration dependant transition probabilities. We define the state transition matrix Adt, as .Adt = [aij(dt)} where ai:j'(dt) = F(st+1 = ej\st = ei,dt(i)). Clearly, a;j(dt)'s are not constant and change in time; however, we will denote aij(dt) with a^- for notational simplicity. The diagonal elements of Adt, a^s, are the probability of staying in state i knowing that st has been in state i for dt(i) time units. an = F(st+1 = ei\st = eh dt(i)) = F(st+i = ei\st = eu st_, = eu ... 
S(_dt(i)+1 = ei; st_d((l) ^ e^) _ F(st+1 = eu st = ej,..., st-dt{i)+2 = e-i\St-dt{i)+i — ei} st-dt{i) ei) F(st - ei, st-i = ei,..., sf_dt(i)+1 = ei\st-dt{i)+i — eu st-dt(i) et) (4.29) 95 Meanwhile, the denominator of equation 4.29 can be written as HkLl ^(St+k ei,st+k-l = ei; . . . , St-dt(i)+2 = ei\st-dt(i) + \ = ei,St-dt(i) 7^ ei) or 2~lT=i ^(st stays in state i for exactly dt(i) — 1 + k time units), which is 1 — <&i{dt{i) — 1). Therefore, equation 4.29 can be written as °«(dt(t)) = i - ^(0 - i) (4-30) The probability that the state process stays in the ith state during its visit to this state for exactly d time units is given by (1 — au(d)) • Yl'kZi au(k). By substituting a„ from 4.30, it is easily shown that the probability density function of the state space durations is actually equal to the selected model <f>i(d). For i 7^ j, a,ij is the probability of leaving state i and entering state j, and is given by atj = ¥{st+i ^ ei\st = eudt(i)) x P(st+i = eAst = et,i ^ j) = (1 - a.a) • a°- (4.31) where a°- = P(st+i = ej\st = ei,i ^ j) is defined as the probability of transition from state i to state j, knowing that the signal leaves state i. We write the matrix Adt in terms of a diagonal matrix P{dt) representing the recurrent state transition probabilities, and a constant matrix A° representing the non-recurrent state transition probabilities. Adt=P(dt) + {I-P{dt))A° (4.32) 0 PM) := { 1 - Hdt(i)) = (4.33) 1 - *4(dt(i) - 1) ' J , \ ,\ (4-34) 0 ,i = j \?{st+i = j\st = i) Note that a° are constrained to Yl!j=\a1j = Since P(dt) is a diagonal matrix, one can show that Ejli aij{dt) = 1 f°r ah t-96 Hence, the hidden state process st evolves in time based on the following equations: st+i = Adl • st + vl+i (4.35) Adt = P(dt) + (I - P(dt)) • A° dt+1 = st+1 Qdt + 1 where vt+\ is a martingale increment; that is, E(t;t+1|si, S2, • • • , st) = 0. REMARK. Our modelling scenario can be encapsulated as a time homogeneous first-order Markov model. Consider the 2-vector process St as St — (st, dt). st takes its values from the finite set {ei\i = 1, 2, • • • , N}, and dt takes its values from the infinite set {f?\i = 1, 2, • • • ,N;d = 1, 2, • • • }, where ff is a N x 1 vector with d as the ith element and unity elsewhere. According to equation 4.25, dt+\ depends only on st+\, st and dt, and according to equation 4.22, st+i depends only on st and dt. Therefore, P(5t+1|S,t) is independent of t, and hence St is a homogeneous first-order infinite states Markov chain. 4.5.2 Observation Layer Modelling The state process st is hidden and is not observed. We observe the observation process yt, where the probabilistic distribution of yt is determined by state at time t, i.e., st. In this paper, we assume that for each state i, yL has a normal distribution. That is, if st — et then F(yt\st = e^) = N(yt, fJ-i,o~2), where Pi and of are the mean and standard deviation of the observation process yt for state i. We denote the probability of observing yt in state i with bi(yt) throughout this paper, that is k(ijt) = F{yt\st = (4.36) Therefore, yt may be written as yt = (At, st) + (Va2, st)wt (4.37) where fj, — /i2, • • • , P-N]', °"2 = [°~h ah ''' > aAf] > (•> •) denotes inner prod uct and wt is Gaussian white noise with zero mean and variance 1 . Equations 4.35 and 4.37 define the signal generation process in our sig nal model. Figure 4.4 summarizes and compares the signal generation process in our model with that of other HSMM signal models. 97 1. Start with an initial state, e.g., st — ei for t = 1. 2. 
Initialize dt = [l 1 ... l]' 3. Generate yt based on the density function of the current state. 4. Compute Adl (equation 4.32), and choose a new state st+i based on Adt-5. Update the state duration variable, dt, using equation 4.25. 6. Go to step 3. a) Our HSMM model. 1. Start with an initial state, e.g., St — ei for t = 1. 2. Select a duration d according to the ith state duration density function. 3. For the next d time units, stay at state st, and generate yt based on the densities of state Sj. 4. Select the next state according to a constant state transition matrix A, with the constraint that the signal should leave the current state. 5. Go to step 2. b) Traditional HSMM model. Figure 4.4: Comparison of the signal generation process in our model with that of existing signal models for HSMMs. 4.5.3 Model Parameterizations There are iV2 + 3N parameters that define an HSMM signal using our model. These parameters are N2 — non-recurrent transition probabilities a°-, the mean and variance of the observation process Hi and of for 1 < i < N, and the Vi and % parameters of the gamma distribution of the state-durations for 98 1 < i < N. We define 6 as a vector containing all the model parameters „0 ) uN-l,Ni & — (Mil M2, • • ' , PN, °2i " ' ' J aN> a12> a13> ' ' ' i a' (4-38) ^1,^2, • • • ,VN,VI,V2, ••• ,VN) 4.6 A New Algorithm for Off-line Identifica tion of HSMMs In this section, we present a new algorithm for off-line identification of HSMMs. Our algorithm is based on the signal generation model presented in Section 4.5, and requires less computational effort when compared to presently existing methods. Given a set of observations from an HSMM signal, = {yi,v2, • • • > yr}, we like to find 6, the parameters of the HSMM model. The algorithm we use is a variant of the EM algorithm. We first initialize 6 to an initial guess. Similar to the EM algorithm for identification of HMMs, in the 'E' step of our algorithm we define a set of probabilistic measures which describe the evolution of the hidden state variable st. We define 'forward variables' at(i) as the conditional probability of observing the partial sequence yi,y2, • • • ,yt and being in state i at time t, given the model parameters 0. That is, at(i)=F(8t = ei,y1y2...yt\0) (4.39) Let dt dt{l) dt(2) ••• dt(N) where dt(i) = E(dt{i)\8t = i,0,y1,y2,...,yt) (4.40) is our estimate of the state-duration variable for state i at time t. dt is initial ized to [l 1 • • • ' l] for t = 1. We reconstruct dt^i(i) iteratively as dt+1{i) = l+E(st(i)\ym...yt,e)-dt(i), l<i<N = l + -pQ--dt(i), l<i<N (4.41) i=l The state transition matrix Adt is updated for each t as Adt = P(dt) + (I - P(dt))A° (4.42) 99 where P is given in equation 4.33. The forward variable at(i) for t = 1 is initialized to the given initial state, that is, = for 1 < i < N. The other forward variables are constructed iteratively as N l < t < T, l < j < N (4.43) Similarly, the backward variables 6t{i) are defined as the probability of observing the partial sequence j/t+1,7/t+2,... ,yr, given the model parameters 6 and that the state at time t is = W(yt+1yt+2... yT\st = eu 6) Bt's are computed by initializing BT{i) = 1, for 1 < i < N and constructing the other variables iteratively using JV Bt{i) = £Pt+i(j) • ai3 • b3{yl+l) 1 < t < T, 1 < j < N (4.44) We define jt(i) as the probability of being in state i at time t, given the observation sequence VT and the model parameters 0. 
it{i) = nst = i\yT,o) jt(i) is expressed in terms of cc's and /?'s as N (4.45) (4.46) i=l Also, we define £t(i,j) as the conditional probability of being in state i at time t, and state j at time t + 1 6(*,j)=P(St =2,^+1=^,0) j) is given in terms of a's and B's as at(i)ayt)j(yt+1)A+i(j) iV N (4.47) (4.48) ^ ®t(i)a<ijbj{yt+i)Bt+i{i) i=l j = l 100 The variables 7 and £ are useful in interpreting the number of transitions between states. That is, Y^t=i 7t W ^s *he expected number of transitions from state i to any other state in an<f YlJ=i €t(.hj) IS the expected number of transitions from state i to state j in J-V-In step M of the algorithm, the model parameters 6 are updated to the maximum likelihood estimate of the model parameters computed from the forward-backward variables in step E. There are different approaches to obtaining the updating equations, all of which result in the same equations. These approaches are: 1) One can find 0 such that the 'Baum's auxiliary function,' Q, is maximized. Baum's auxiliary function is defined as the ex pectation of \og¥(yT, S\8)). It can be shown that maximizing the Baum's auxiliary function is analogous to maximizing the likelihood P(Vr|0). Hence, one can solve dQ/dO — 0 to obtain the update equations for the model param eters. 2) The problem can be set-up as a constrained maximization problem and the Lagrange-multiplier approach can be used to maximize the auxiliary function Q. 3) One can use the filtration variables, at's and f3t's, to count the expected number of transitions and use the concept of counting the event occurrence to obtain the update equations. We use the latter approach here, though the result is analogous if other approaches were used. Based on the definitions, Ylt=i 7t W*s ^ne expected number of transitions from state i in 3^T and Ylt=i Ct{i,j) IS the expected number of transitions from state i to state j in VT. Then, the transition probabilities are estimated as 0 expected number of transitions from state i to state j in VT expected number of transitions from state i in VT 5>(*,J) _ t=i ELY iS) T-1 ^2 <xt(i)aijbj(yt+i)6t+i{j) _ t=i ~ T-1 t=i 101 (4.49) (4.50) Similarly, the mean and variance of the observation process are estimated as T-l ^lt(i)yt = T=i (4'51) t=i T-l J2it(i)(yt - Hi)2 ^ = (4-52) Let /j,iiS and a2s be the mean and variance of the state-duration for state i respectively. fj,is, is estimated as T-l = (4-53) £>(st+i ^,^ = ^13^,0) We have P(St+i 7^ i, St = Z^T, #) = 1t(i) - £t(i, Hence, in terms of a's and /3's is given by T-l ^at(i) (0t{i) ~ aiibi(yt+1)Pt+i{i)) dt(i) _ 4=1 T-l (4.54) Y^aS) Wt{i) - aiibi(yt+i)Pt+i{i)) t=i Similarly, the variance of state-duration distribution is given by T-l £at0Q (8t(i) - aii6i(3/t+i)A+i(*)) _ ^M)2 <t = *=1 r_t ' (4-55) ^ai(i) (A(«) - aabi(yt+i)Pt+i{i)) t=i 102 • E Step: 1. Forward Filtering: Compute ctt(i) and dt(i) for 1 < i < N and 1 < t < T — 1 using equations 4.41 and 4.43. 2. Backward Filtering: Compute (3t(i) for 1 < i < JV and 1 < £ < T using equation 4.44. 3. Find the State Probabilities: Compute £,t(i,j) and ji.(i) using equations 4.48 and 4.46. • M Step: 1. Update the model parameters, 8: Use equations 4.49, 4.51, 4.52, 4.54 and 4.55 to update the model param eters. Figure 4.5: 'E' and 'M' steps of our algorithm for off-line identification of HSMMs. The parameters of the state duration distributions, Vi and m, are given in Figure 4.5 summarizes our algorithm. The algorithm stops when the 9 converges to a constant vector. 
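To make the recursions concrete, the following sketch implements one scaled forward pass of the E step (equations 4.41 and 4.43, with the duration-dependent transition matrix of equation 4.42), under this chapter's assumptions of Gaussian observation densities and Gamma duration models. It is only an illustration: the function and variable names are ours, the per-step normalization anticipates the scaling discussed in Section 4.6.1, and the backward pass and M-step updates would be coded analogously.

```python
# A minimal sketch of the E-step forward recursion for the proposed HSMM model.
# Names are illustrative and not part of the thesis.
import numpy as np
from scipy.stats import norm, gamma

def forward_pass(y, A0, mu, sigma2, nu, eta, init_state=0):
    """One scaled forward pass.

    y          : observation sequence, shape (T,)
    A0         : non-recurrent transition probabilities a0_ij (zero diagonal, rows sum to 1)
    mu, sigma2 : per-state observation mean and variance
    nu, eta    : per-state Gamma shape and rate parameters of the state durations
    """
    T, N = len(y), len(mu)
    alpha = np.zeros((T, N))
    dhat = np.ones((T, N))                                   # state-duration estimates d_t

    Phi = lambda d: gamma.cdf(d, a=nu, scale=1.0 / eta)      # cumulative duration distribution
    b = lambda t: norm.pdf(y[t], loc=mu, scale=np.sqrt(sigma2))   # b_i(y_t)

    alpha[0, init_state] = b(0)[init_state]                  # alpha_1 initialized at the given state
    alpha[0] /= alpha[0].sum()                               # scaling (see Section 4.6.1)

    for t in range(1, T):
        # duration-dependent transition matrix A_d (equations 4.32-4.33, 4.42)
        p_stay = (1.0 - Phi(dhat[t - 1])) / (1.0 - Phi(dhat[t - 1] - 1.0))
        A = np.diag(p_stay) + np.diag(1.0 - p_stay) @ A0
        # forward recursion (equation 4.43), scaled
        alpha[t] = (alpha[t - 1] @ A) * b(t)
        alpha[t] /= alpha[t].sum()
        # expected state-duration update (equation 4.41)
        dhat[t] = 1.0 + (alpha[t - 1] / alpha[t - 1].sum()) * dhat[t - 1]
    return alpha, dhat
```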
The forward-backward algorithm has the computational complexity of 0(N2T) per pass and requires a memory of 3NT because all the forward-backward variables and the estimate's of the state duration variables need to be stored. 4.6.1 Implementation Issues Choice of Initial Estimates Since the EM algorithm finds the local max imum of the likelihood function, it is important to start the algorithm with suitable initial values. Though there is no straightforward method for select ing proper initial values, there are a few techniques in the literature which can assist in selecting the initial values. One of these methods uses segmen tation of the observations into states, and averaging the observations between terms of and of s as (4.56) 103 the segmented states. Segmentation can be performed manually, using max imum likelihood segmentation or the ''segmental k-means' method. For more information on these techniques, please refer to [86]. Scaling From equation 4.43 it becomes clear that as t increases, at's become small very fast, and can quickly fall out of the numerical range of any computer. The same argument applies to /?t's, when t decreases. To avoid this, we suggest using a scaling scheme similar to the scheme used in [72] and [86] for the HMM case, where at's are scaled to sum up to 1 for all t. More precisely, let at denote the unsealed variable, at(i) denote the scaled variable and d-t the local version of a's before scaling. Let ct be the scaling factor at time i, where ct — 1/ 2~2iLi &t(i)- Both a's and /?'s are scaled using ct, that is, at(i) — ctat(i) and J3i{i) = ct(3t(i). It can be easily shown that &t(i) = Ctat(i) A+iW = A+iA+iW (4-57) where Ct = nf=i c» an(^ A+i = PI^Lt+i cs- Using these equations, it can be shown that the scaling factors are cancelled out in all of the final update equations, except equations 4.54 and 4.55. In order to cancel the scaling effect on equations 4.54 and 4.55, these equations should be rewritten as r-i (A(Oci_1 - aiibi{yt+i)Pt+i{i)) dt{i) MM = H=i (4-58) ^"tW {Pt{i)ctl - aubi(yt+i)Pt+i(i)) t=\ T-l £"*(*) {Pi{i)ctl - aiibi(yt+i)Pt+i{i)) (M*) ~ MM)2 4, = lzL—r~1 (4-59) {Pt{i)ctl - aabiiyt+JPt+^i)) t=i This assures us that the scaling will have no effect on parameter estimates. 4.7 Online Identification of HSMMs In this section, we present a new algorithm for online identification oi HSMMs. Our algorithm is based on the signal generation model presented in Section 104 4.5. Our approach is to set up the problem of online identification of HSMMs such that the general recursive prediction error (RPE) method can be applied to the problem. First, we will present a general background on general RPE technique in Section 4.7.1. Then, we will present how we adopt this technique for online identification of HSMMs in Section 4.7.2. 4.7.1 General RPE technique RPE is a class of general numerical parameter estimation, where a norm V(6) is defined that measures the prediction error and the estimate of the model parameters is updated according to et+l = et + x- /w (4.60) where is a search direction based on some information about V{0) acquired at previous iterations, and A is a positive step size. An important class of these methods is the Newton type algorithms, where the correction in 4.60 is chosen in the Newton Direction /W = -[K"(0t)]-1-V:,(0t) (4.61) There are approaches that use values of function V as well as of its gradients. 
The most important subclass of these approaches is the quasi-Newton methods, which somehow form an estimate of V" and then use equation 4.61. Now consider a weighted quadratic prediction error criterion t V{d)^ltYJP{t,k)e\k,6) (4.62) k=i where e(k, 0) = y(k) — y{k) is the prediction error. B(t, k) is a weighting sequence with the following property B{t,k) = \{t)B{t-l,k) ,0<k<t-l (4.63) 3(t,t) = 1 (4.647 is a normalizing factor. Then it is shown that the general search equation is given by Bt = 9t-i - MRt]-1 • V (4.65) 105 where V is the gradient of V, and Rt is a matrix that modifies the search direction. It is shown that this updating formula can be written as 8t = + 7t • [Rt]-1 • ib(t, 0t_,) • e(£, 9t^) (4.66) where ijj is the gradient matrix of y with respect to 8 The simplest choice for R is Rt — I, which results in the gradient or the steepest-descent method. This method becomes fairly inefficient in the region close to the minimum of the norm function. Choosing Rt = V"(9t), makes the equation 4.66 a Newton method, which typically perform much better close to the minimum. In this case, V" is given by N N t=i t=i It may however be quite costly to compute all the terms of ip'. Then an approximation of V" is used, where the second sum in equation 4.67 is ignored 1 N (4.68) t=i which is by definition the Hessian of y. If we choose Rt = V" and use the approximation in equation 4.68, then it is shown that Rt is given as t Rt = -rt^2l3(t,k)-it>(k)-Tj;'(k) (4.69) fc=i It is also shown that Rt can be constructed recursively as Rt = Rt-i + 7tbJ>(W(t) - R(t - 1)] - (1 - 7t) • R(t - 1) + lt^' (4.70) The RPE algorithm is summarized with the recursive scheme presented in Figure 4.6. 4.7.2 Online Identification of HSMMs Using the RPE Method Let 0t denote our estimate of the model parameters at t. We define the 'ob jective function' ^(0t) = logP(yi, 2/2, • • •, Ut\dt), as the log-likelihood of the 106 1. e(t) = y(t) - y(t) 2. 6t = 6t-1 + lt-[Rt]-l-W)-e{t) 3. Rt = Rt.! + lt{m^'(t) ~ R(t ~ 1)] Figure 4.6: Parameter update equations performed in each step of a general RPE algorithm. observations up to time t given 9t. £t(9t) can be rewritten as t tt{ot) = J2 iogP(t/T|t/i, y2, • • •, yr-i-,et) T-l t = EM«t) (4-71) T = l where = logP(yT|yi, y2, • • •, yT-i, Qt) is the log-likelihood increment. Our approach here is to update 6t in a direction that maximizes the objective function £t{Qt)- We use the recursive prediction error (RPE) method, where the parameters are updated in the Newton-Raphson direction (see [84] for an extensive discussion of this method). This selection greatly increases the speed of the algorithm. Starting with an initial guess for 9t at t = 1, 9t is updated at each time instance t using Ot+i - 9t + • iV+i • A+i (4-72) where Rt = d2£t(9)/d92 is an estimate of the 'Hessian Matrix', ipt = dut(9)/d0 determines the search direction and is the gradient of ut with respect to 6t. Xt is a step size. Rt and ipt are called 'sensitivity parameters' and are estimated recursively in our algorithm. Figure 4.7 illustrates the basic block diagram of our scheme. The details of how each block performs its function are described in sections 4.7.4 to 4.7.7. Below, we summarize the basic function of each block. 1. Estimation of hidden layer variables: Given the observation yt and model parameters estimate 9t, this block computes the forward variable at and state duration estimate dt. These variables are constructed iteratively from at-i and dt-\. 
107 yt dat 9dt 90 ' 90 yt Sensitivity Equations Oct At 99 Update H Hessian Matrix yt Estimation of Hidden Layer Variables rr Ut+1 Update Model Parameter Rt+1 e t+ Figure 4.7: Recursive Prediction Error scheme for estimating HSMM param eters. 2. Updating the gradient vector V't+i- Using the obtained ctt and dt, this block computes the gradient vector ipt+i- For this computation. dat_i/d9 and ddt/dO should be estimated as well. Hence, this block constructs dat_i/d9 and ddt/d9 recursively and uses these parameters to update 3. Updating the parameters estimate 9: This block updates the model pa rameters estimate, 9t, using equation 4.72. 4. Updating Rt: This block recursively updates the estimate of the Hessian matrix, Rt, from and ibt. We present the details of each part of our algorithm in the following sub-sections. For simplicity, we use 9 instead of 9t in our notations. 4.7.3 Model Parameterizations Similar to the off-line case, our signal model has N2+3N parameters. However, as shown in [71], it is more convenient to parameterize the model as 9 = (AI, a2, c12, c13, • • • , cyv_1)iVl ^, v)' (4-73) where ctj are simply defined as ci3- = (a")1^2. This parameterization selection simplifies the development of the update equations. As in the off-line case, pt 108 and cr2 are the vectors of the mean and variance of the observations process, and v and r\ are the vectors of the parameters of the gamma probability density functions of the state durations. 4.7.4 Estimation of the Hidden Layer Variables As with the off-line case, we define the forward variables as at(i) = F(yiy2 • • - yt, st i\0). Let at = [at(l) ott{2) ••• at(N)] . Then the forward filtering recur sion equation is given by at+1 = BA'dat (4.74) where B is a diagonal matrix, bn = bi(yt+i) (see equation 4.36), and A'dt is the transpose of the state transition matrix (equation 4.35). The conditional estimate of the state at time t is given by 7t = E(st\yuy2, • • • ,yu0) = ctt(cxul)-1 (4.75) Given the observations up to time t, the next state and next observation of the signal are estimated as T£-{8t+1\y1,y2,...,yu0) = A,dt'yt (4.76) iit+i = (M,E(s(+i|yi,2/2, • • -,yt,0)) = (^A'dlat-{at,l)-1) (4.77) The estimate of the state duration variable is updated similarly to the off-line case (equation 4.41), as dt+i = E(st\yl,y2,..., yt, 9) © dt + 1 = at(atll)-10dt + l (4.78) The log-likelihood increment, ut+1 (equation 4.71), is given by ut+1 = logF(yt+l\yi,y2,...,yt;9) (4.79) N = log£P(7/t+i|st+i = ei,y1,y2,...,yt;0) xP(% = e^, y2,..., yt\ 0) i=i (4.80) = \og(l,BA'dtlt) (4.81109 4.7.5 Gradient Vector Update Equations We denote the derivative operator with respect to variable x by DX, that is, Dx(.) = d(.)/dx . Thus, the gradient vector ipt is written as _ dut+i, dut+i dut+l dut+1 dut+1 dut+1 dpi ' do2 ' dcij ' dfj,iiS ' da2s J \e=et = (D^UI+I, DG2Ut+i, DC ut+l, sut+1, DA2 ut+i) (4.82) We write the elements of t/jt+i m terms of the derivative of the filtration parameters (i.e.,a,, and dt) with respect to the model parameters. Our esti mates of the derivatives of the at and d, are constructed recursively from their estimates at t — 1. For example, the first element of f/'t+i, DIHut+\. is writ ten in terms of D^ott and DHdt. Dltjat+i and DHdt+i are also constructed recursively from DHctt and.DFLJdt. In deriving the update equations, it is assumed that the probabilities of non-recurrent transitions (i.e., c^'s), and the parameters of the state duration probability density functions (i.e., v and 77) do not depend on each other. 
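The derivative recursions that follow are propagated alongside the hidden-layer filtering recursion of Section 4.7.4. As a hedged illustration of that filtering block alone (equations 4.74-4.81), the sketch below performs one recursive update of the forward variable, the state-duration estimate and the log-likelihood increment; the names are ours, and the scaling of equation 4.118 is applied at each step.

```python
# Illustrative sketch of one step of the hidden-layer filtering recursion
# (equations 4.74-4.81). Names and structure are ours, not from the thesis.
import numpy as np
from scipy.stats import norm, gamma

def filter_step(y_next, alpha, dhat, A0, mu, sigma2, nu, eta):
    """Advance (alpha_t, dhat_t) to time t+1 given the new observation y_{t+1}."""
    Phi_d = gamma.cdf(dhat, a=nu, scale=1.0 / eta)
    Phi_dm1 = gamma.cdf(dhat - 1.0, a=nu, scale=1.0 / eta)
    p_stay = (1.0 - Phi_d) / (1.0 - Phi_dm1)
    A_d = np.diag(p_stay) + np.diag(1.0 - p_stay) @ A0        # equation 4.32 / 4.35

    B = np.diag(norm.pdf(y_next, loc=mu, scale=np.sqrt(sigma2)))  # diag of b_i(y_{t+1})
    gamma_t = alpha / alpha.sum()                              # conditional state estimate (4.75)
    u_next = np.log((B @ A_d.T @ gamma_t).sum())               # log-likelihood increment (4.81)

    alpha_next = B @ A_d.T @ alpha                             # forward recursion (4.74)
    alpha_next /= alpha_next.sum()                             # scaling (equation 4.118)
    dhat_next = gamma_t * dhat + 1.0                           # duration update (4.78)
    return alpha_next, dhat_next, u_next
```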
Thus, to calculate i/jt+\, we use the following equations for the deriva tives of ut+i with respect to a2, Cij, v and r\. 1-D„ D^ut+i = (l,BA'dtlt)-1 ((l,D^(B)A'dtlt) + (l,BD^(A'dt)7t) + (1, BA'dtD^t))) (4.83) DIHLL = ^(aOa.Qt)"1 +at(l,Z?w(at))-1 (4.84) D^at = D^(B)A'dtat_! + BD^A'Jat.! + BA^^a^O (4.85) £>/J4B = (^=5^) • B • diag(ej) (4.86DtltAdt = D,t(P)(I-A°) (4.87) 110 d.ag { (1 _ ^ _ 1))2 O Z>w* (4.88) Dwdt = dt_i 0 D^7t_i + I>wdt_i 0 7t_i (4.89) {l,BA'dtlt)-1 ((l,Dai{B)A!dtlt) + (l,BDa,{A!dt)lt) + (1, BA'^D^))) (4.90) Da2jt = Da,(at){l,cxt)-1 + cxt(l,Dal(cxt))-x (4.91) Da?at = D^A'^oc^ + BD^A'Ja^ + BA'dDa,(cxt^) (4.92) DalB = ((^2~f)2 - • diag(e8) (4.93) Da?Adl = Da?(P)(I-A°) (4.94diag { (i - Hdt - i))* 0 ^ (4.95) D^dt = Da?(7t-i) © dt-i + 7t-i © D^dt.x) (4.96) 111 (^BA'^t)-1 {(l,BDCmn{A>dt)lt) + (l,BA'dDCmM)) (4.97) D^iaMl,^)-1 + cxt(l,DCmn(at))-1 (4.98) BDCmn(A'dt)<*t-i + BA!dtDCmn{oLt-i) (4.99-P-DCmn(A°) (4.100{0 if m 7^ i 2cij iim = i,n = j ' (4.101) -2c„m if m = i, n ^ j 4- £>„ (4.102) £>w7t = DIh{at){l,at)-1 + at{l,DTK{at))-1 (4.103) = BZ^fA'Ja,-! + BA'dDm(at^) (4.104^Adt = ^(P)(/-A°) (4.105) Dm(P) is a matrix with all zero elements, except the element in row i and column i, which is given by 1 - $(dt(0;7?i,t/j) . = ^4 - ^>K(0 - 1;^,^)^ A»[$ W) - i)](i -  A„[<f>K(*))](i - -1)) (i - $(4(0 -1))2 (4.106) DVi<&(d; rji, v%) is obtained by differentiating <f>(<i; r/j, z/j) as defined in equa tion 4.27 DVi${d; Vi, ^i) = *{${d; r,t, u%) - r*, ^ + 1)) (4.107) DCmnUt+l Dr cx, DCmn Adt DCmna°3 112 5- Du DViut+l = (l,BA'dtlt)-1 {(l,BDUi{A'dt)lt) + (l.BA'dD^(lt))) (4.108) DVilt = A,(at)(l,at)-1 + at(l, ^(a,))"1 (4.109) DViat = B^^Ja,.! + B^D^at.!) (4.110) A,(Adt) = A,(P)(/-A°) (4.111DVi{P) is a matrix with all zero elements, except the element in row i and column i, which is given by D 1 - $(dt(i);^,i/i) DVi[*(dt(i) - 1)1(1 ~ $K(*))) - DvjMdtWW ~ $ W) ~ 1)K (1-*K(*)))2 j (4.112) Unfortunately, differentiating §(d; rj, v) as defined in equation 4.26, with respect to v does not result in a simple form as in equation 4.107. How ever, we can easily find the numerical value of DVi§(d\r)i,vi). We have A,($(d;77,i/)) = (logfa) - ^(v))§(&\i),v) + / log(x)<^(x; v, u)dx Jo (4.113) where is the 'digamma' function [87,88]. The digamma function is a known special function defined as (4.114) where T(x) is the gamma function. The numerical value of the digamma function at any point is easily obtained from ^(x+n) = < 7 + J2(-l)k+1((k + l)xk , forn = 1 and -1< x < 1 n—1 ^ h *(x' + 1) , for n > 1 and -1 < x < 1 r 4- A; • (4.115) fc=i 113 It is shown that only twenty terms of equation 4.115 suffice to compute the $(x) to the full machine precision in 32-bit floating point format [77, 87]. The term log(x)4>(x; rj, v)dx is easily computed using numerical definite integral calculation methods. Therefore, we can compute the numerical values of DVi(§{dt(i) - 1) and DVi($(dt(i)), and substitute them into equation 4.112 to obtain DVi{P). 4.7.6 Parameter Update Equations After finding tpt+i, the parameter vector 0 is updated using 9t+l = 0t + \t+1 • RT^ • ifrt+i (4.116) 4.7.7 Updating the Hessian Matrix Estimate, Rt Rt is an estimate of the second derivative of the criterion function. Rt is updated recursively (see [84,89]) as Rt = Rt-i + \tmW{t) - R{t - 1)] (4.117) 4.7.8 Implementation Issues Our online identification algorithm finds the local maximum likelihood esti mate of the model parameters. Hence, similar to the off-line case, it is im portant to start the algorithm with proper initial values. 
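One implementation detail worth illustrating is the numerical evaluation of D_ν Φ(d; η, ν) from equation 4.113. The sketch below relies on a library digamma routine (equivalent to summing the series of equation 4.115 to machine precision) and a numerical definite integral, and checks the result against a finite difference; the helper name is ours.

```python
# Numerical evaluation of the derivative of the Gamma duration CDF with respect
# to the shape parameter nu (equation 4.113). The function name is illustrative.
import numpy as np
from scipy.stats import gamma
from scipy.special import digamma
from scipy.integrate import quad

def d_nu_Phi(d, eta, nu):
    """D_nu Phi(d; eta, nu) = (log(eta) - psi(nu)) * Phi(d) + int_0^d log(x) phi(x) dx."""
    Phi_d = gamma.cdf(d, a=nu, scale=1.0 / eta)
    integrand = lambda x: np.log(x) * gamma.pdf(x, a=nu, scale=1.0 / eta)
    integral, _ = quad(integrand, 0.0, d)
    return (np.log(eta) - digamma(nu)) * Phi_d + integral

# Sanity check against a central finite difference.
d, eta, nu, h = 12.0, 0.5, 6.0, 1e-5
fd = (gamma.cdf(d, a=nu + h, scale=1 / eta) - gamma.cdf(d, a=nu - h, scale=1 / eta)) / (2 * h)
print(d_nu_Phi(d, eta, nu), fd)   # the two values should agree closely
```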
Furthermore, the following issues should be considered when implementing our algorithm. Scaling Similar to the off-line case, the forward variable a., converges rapidly to zero as t increases. This can be avoided by using the same scaling technique as in the off-line case, where at is scaled as at = at{l,at)-1 (4.118) It can be shown that this scaling does not affect the update and recursion equations. 114 Choice of the step size At At in equation 4.116 is a step size. In theory, Xt can be any sequence satisfying oo oo At>0, ]TAt = co, J^A2<oo (4.119) 1=1 t=i A common choice is At = 1/t, where Xt has a weighting effect on the norm function, such that recent observations have a stronger effect on the update than older observations. There is a trade-off between the 'tracking ability' and 'noise sensitivity' of the algorithm in selecting At. Choosing a larger step size (e.g., Xt = 1/y/t) results in faster convergence of the estimate to the real parameters and allows the algorithm to track the changes in the actual parame ter values faster. However, selecting larger step sizes makes the estimate more sensitive to noise. The possibility of using different and more sophisticated choices for step sizes is discussed in [89,90] and [91]. Avoiding matrix inversion in the update equations The update equation 4.116 involves inversion of Rt. This matrix inversion can be avoided by using the matrix inversion lemma [89]. If A,B,C and D are matrixes then the matrix inversion lemma states [A + BCD}-1 = A'1 - A-XB[DA-IB + C-1]-1^-1 (4.120) By taking A = (1 — jt)Rt-i, B — D' = 'tp and C — 7t, parameter 4.116 can be written as: R7l = 1 (R;\- WWtt ) (4121) Projection of parameters to the constrained domain Since the parameters of an HSMM are constrained, the estimated parameters should be projected into the constrained space. More precisely, the parame ters Cij — {a°j)ll2 are constrained to J2iLi cn = 1- Thus, after updating the parameters using equation 4.116, c^-'s are re-normalized as 4 = 7#^r (4-122) TN c2-115 4.8 Effective Bandwidth of a Hidden Semi-Markov Model In this section, we show how the effective bandwidth of a signal is obtained from its HSMM's parameters. Our approach is to reformulate the Semi-Markov chain of an HSMM as a Markov chain with larger number of states. First, we will show how the effective bandwidth of an HMM is obtained. Next, we show how a Semi-Markov chain is reformulated as a Markov chain with larger number of states. Finally, we show how the effective bandwidth of an HSMM is obtained using this approach. 4.8.1 Effective Bandwidth of Hidden Markov Models In this Section, we show how the effective bandwidth of an HMM is obtained. Let yt be the observation process of an HMM with the hidden Markov state process xt, and with the transition probabilities matrix A. Let Y(t) be the t cumulative sum of observation process y in [0,t], that is Y(t) = ^^yT- Also r=0 let ipi(9) be the moment generating function of the observation layer for state i, that is, <pi(0) = E(e6yt\xt = i). We define V^OM) := E(e9r«|x'i = i). 
We have 116 ^(9,1) =E(e0Y{t)\xl =i) =E(eByi\x1 = i) x E(ee(yW-y'(1»|x-1 = 1) M =ip\0) J2^(e6Y{t)~YW\x2 = =i)x F{x2 = j\xx = i) i=i M =^(e)J2ne6Y{t)-YW\x2=j)a.lJ M ^{e)^2neeY[t-l)\xi=j)al3 i=i M =^(0)£>*(0,*K- (4.123) i=i -If we consider the vector $f(9,t) — [tpi(9,t)} and the diagonal matrix $(0) = diag[<^i(0)], then the above equation can be written in matrix form as \b(0,t) =*(0) -A- #(0,t- 1) (4.124) with the initial condition ^(9,1) = <&(9) • 1T, where 1 is a column vector with all its elements being one. Now let 7Tj be the probability that initial state x\ = i, and also let 7T = [7ri,7r2, ' • • ,7TM], tllUS E{eeY{t)) = 7T^{9,t) = TT{${9)Ay-l<b{9)l T (4.125) Since the transition matrix A is irreducible, then A is primitive. Since <J>(0) is diagonal, then matrix $(9) A is also primitive. Hence, we can use the Perron-Frobenious theorem (see Theorem 8.5.1 in [92]) from matrix theory to show that lim m9)A/sPm9)A)]n = L{9) (4.126) where L{9) is a constant matrix and sp($(9)A) is the spectral radius of the matrix $(9) A. The spectral radius of a matrix is simply the largest absolute 117 eigenvalue of that matrix. Therefore, equations 4.125 and 4.126 result in the following expression lim \- logE(e<"'(t)) = (1/9) \og(sp(<5>(9)A)) (4.127) t—tea 9t Hence, we have a(9) = (1/9) log (sp($(6)A)) (4.128) where <f>(#) is a diagonal matrix. 4.8.2 Reformulating a Semi-Markov chain as a Markov chain Let st be a Semi-Markov chain with N distinct states, taking its value from the space {1,2,..., N}. We assume that the time that signal stays in each state is limited, say by D time units. This means that the state process st will not stay in any state for more than D time units. Our motivation in here is to reformulate the N state Semi-Markov chain st in the form of a Markov chain with N x D states. Our approach in similar to what was presented in [78]. Assume, that the state process is in state i at time t, i.e., st — i. Then define dt as the time that st has spent in state % prior to time t. That is dt = {d\st = z, st-i ='«,..., St-d+i = i, st-d (4.129) Notice that dw = \dt + 1 liSt+1 = St (4.130) L 1 Otherwise Therefore, dt+\ depends only on dt, st and s/+1. Note that by definition, variable dt is restricted to if st+i = st and dt < D if st+i = st and dt = D (4.131) if st+i ^ st Now consider the vector stochastic process xt defined as xt = (st, dt). xt takes its values from the space {(i,d)|l < i < NA < d < D} and clearly is a finite-state process with N x D states. Note that au(dt) if xt = (i, dt) and xt+i = (i, dt + 1) (xl+i\xt) = { al3(dt) if xt = (i,dt) and xt+1 = (j, 1), i + 3 (4.132) 0 Otherwise 118 where au(dt) is the probability that st stays in state i given that it has spent dt time units in state i. Similarly a,ij(dt) is the probability that st transits from state i to state j given that it has spent dt time units in state i. Note that F(xt+i\xt) is independent of t, and hence. xt is a Markov chain. For simplicity of notation, we will denote the state space for xt as {1,2,..., M) from now on, where M — N x D. If xt — m, then we have 5f = [!!7rJ + 1 (4133) dt = m - (st - l)D Til — 1 = m-[^-\D (4.134) where [xJ is the largest integer less than or equal to x. We also have vi — [st - l)D + dt. 
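Equation 4.128 is straightforward to evaluate numerically, and it applies unchanged to the expanded Markov chain constructed in this section. The sketch below computes the effective bandwidth of a hidden Markov source from its transition matrix and per-state observation statistics, assuming Gaussian observation densities so that the moment generating functions φ_i(θ) = exp(μ_i θ + σ_i²θ²/2) are available in closed form; the numbers are purely illustrative.

```python
# Sketch of equation 4.128: alpha(theta) = (1/theta) * log( sp( Phi(theta) A ) ),
# assuming Gaussian per-state observation densities. Names are illustrative.
import numpy as np

def effective_bandwidth(theta, A, mu, sigma2):
    """Effective bandwidth of an HMM via the spectral radius of Phi(theta) @ A."""
    Phi = np.diag(np.exp(mu * theta + 0.5 * sigma2 * theta ** 2))   # diag of per-state MGFs
    spectral_radius = np.max(np.abs(np.linalg.eigvals(Phi @ A)))
    return np.log(spectral_radius) / theta

# Example: a 2-state modulating chain with illustrative rates.
A = np.array([[0.95, 0.05],
              [0.10, 0.90]])
mu = np.array([1.0, 4.0])       # per-state mean rate
sigma2 = np.array([0.2, 0.5])
for theta in (1e-3, 0.5, 2.0):
    # alpha(theta) is nondecreasing in theta and approaches the mean rate as theta -> 0
    print(theta, effective_bandwidth(theta, A, mu, sigma2))
```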
Let d = m - \J^\D, i = [^J + 1 and j = [^\ + 1; then the state transition probabilities matrix for the Markov process xt is given by au{d) if n = m + 1, and L^j = j atj{d) ifn-L351JI> = l, and^J^L^J (4135) 0 Otherwise Since xt has M = N x D states, then the state transition matrix for xt has M2 = (ND)2 states. However, since variable dt is restricted based on equation 4.131, it can be easily shown that (M2 - N2D) elements of the state transition matrix are zero and only N2D elements of the transition matrix are non-zero. 4.8.3 Effective Bandwidth of an HSMM by Reformu lating as an HMM Let yt be an HSMM, with state process st. Following the approach presented in Section 4.8.2, st can be reformulated as a Markov chain xt with larger number of states; Hence, yt can be modelled as an HMM with modulating (or hidden) process xt. Then, the effective bandwidth of yt is easily obtained using the equation 4.128 as presented in Section 4.8.1. Let bi(yt) = F(yt\st = i) denote the densities of the observation process yt when yt is considered as an HSMM. Also let <pi(6) be the moment generating function of the observation process, given that the state process is in state i, that is <pi(9) = E(eme\st — i). Now consider yt as an HMM with state process 119 xt. Let b'm(yt) denote the densities of the observation process given that state at time t is xt — m. According to equation 4.133, we have b'm(yt) = nyt\xt = m) (4.136) = nvt\st = L-= 6L2-iJ+1(yO m — 1 J + l) (4.137) D (4.138) Based on this equation, one can easily construct the diagonal matrix $(6>) (see equation 4.8.2) of size ND x ND as ${9) = diag{<Pl(0),--- ,<Pi(6),M9),--- ,<PN(9),--- ><PN(9)} » ' "> » ' •> ^ ' By substituting the obtained $(6) in equation 4.128, one can easily obtain the effective bandwidth of yt, ay(6). 4.9 Numerical Results In this section, we present the numerical results of implementing the methods presented in this chapter. In Section 4.9.1, we present the results of applying the off-line and online identification methods presented in sections 4.6 and 4.7 to synthetic data generated by simulating an HSMM. Our objective is to experimentally verify that our identification method actually finds the true parameter values. As our online identification method results in the same esti mate as the off-line method, we only apply the off-line identification algorithm to empirical traffic samples of a few typical TV programs in Section 4.9.2. We also present numerical results of estimating the empirical bandwidth curve for these sources. 4.9.1 Numerical Results for Synthetic Data To verify that our identification methods give accurate estimates of an HSMM parameters, we conducted two experiments, one for the off-line case, and one for the online case. In our first experiment, the parameters of two HSMM signals, each having N = 3 distinct states, were estimated using the off-line algorithm presented in Section 4.6. 
The number of samples for each model was T = 10000. The actual values of the parameters are given in Table 4.2. The first model can be considered to be in a low-noise condition (i.e., |μ_i − μ_j| ≫ σ_i²), while the second model is in a high-noise condition. The initial values for the model parameters in both cases are similar, and are shown in Table 4.2.

Table 4.2: Actual and initial values of the parameters of the HSMM models used in simulating our off-line identification algorithm.
  A° (rows):  model 1: [0 0.30 0.70], [0.75 0 0.25], [0.15 0.85 0];  model 2: [0 0.30 0.70], [0.70 0 0.30], [0.50 0.50 0];  initial (both models): [0 0.50 0.50], [0.10 0 0.90], [0.50 0.50 0]
  μ:    model 1: [-10 0 10]';  model 2: [-10 0 10]';  initial: [-15 3 15]'
  σ²:   model 1: [5 5 5]';  model 2: [10 10 10]';  initial: [8 8 8]'
  μ_s:  model 1: [10 20 30]';  model 2: [10 20 30]';  initial: [10 10 10]'
  η:    model 1: [5 10 15]';  model 2: [5 10 6]';  initial: [1 10 20]'
  ν:    model 1: [50 200 450]';  model 2: [50 200 180]';  initial: [10 100 200]'

Figures 4.8 and 4.9 illustrate how some of the parameter estimates are updated in each iteration. We observe that the parameter estimates converge to the actual values of the parameters after a few iterations. The log-likelihood of the total observation Y_T given the parameter estimate, log(P(Y_T|θ)), is also plotted in figures 4.8-d and 4.9-d. As shown, this log-likelihood increases in each iteration, demonstrating that the algorithm finds the maximum-likelihood estimate of the model parameters.

In the next experiment, we applied our method for online identification of HSMMs to two HSMM signals, with the parameters shown in Table 4.3. As shown, the actual parameters of the second model change at t = 5000. The results of estimating some of the parameters of these models are presented in figures 4.10 and 4.11, respectively. We observe that the parameter estimates converge to the actual values of the parameters as t becomes large, and that our algorithm successfully tracks the changes in the model parameters.

Table 4.3: Actual and initial values of the parameters of the HSMM models used in simulating our online identification algorithm.
  A° (rows):  model 1: [0 0.80 0.20], [0.50 0 0.50], [0.30 0.70 0];  model 2 (1 ≤ t ≤ 5×10³): [0 0.50 0.50], [0.50 0 0.50], [0.30 0.70 0];  model 2 (5×10³ < t ≤ 10⁴): [0 0.50 0.50], [0.15 0 0.85], [0.30 0.70 0];  initial (both models): [0 0.50 0.50], [0.50 0 0.50], [0.50 0.50 0]
  μ:    model 1: [-10 0 10]';  model 2 (first half): [-10 0 10]';  model 2 (second half): [-5 0 10]';  initial: [-13 4 20]'
  σ²:   model 1: [2 2 2]';  model 2 (first half): [4 4 4]';  model 2 (second half): [4 4 4]';  initial: [10 10 10]'
  μ_s:  model 1: [10 20 30]';  model 2 (both halves): [10 20 30]';  initial: [5 10 10]'
  η:    model 1: [5 10 15]';  model 2 (both halves): [5 10 15]';  initial: [8 10 20]'
  ν:    model 1: [50 200 450]';  model 2 (both halves): [50 200 450]';  initial: [40 100 200]'

Figure 4.8: a-c) Parameter estimates for the first model versus the iteration number. The dotted lines show the actual values of the parameters. d) The log-likelihood function, log(P(y_1, y_2, ..., y_t|θ)), versus the iteration number. As shown, log(P(y_1, y_2, ..., y_t|θ)) increases in each iteration.

Figure 4.9: a-c) Parameter estimates for the second model versus the iteration number. The dotted lines show the actual values of the parameters. d) The log-likelihood function, log(P(y_1, y_2, ..., y_t|θ)), versus the iteration number. As shown, log(P(y_1, y_2, ..., y_t|θ)) increases in each iteration.

Figure 4.10: Online estimation of a 3-state HSMM: a) state transition probability a°_{12}; b) observation mean for state 1, μ_1; c) state duration mean for state 1, μ_{s,1}.
The dotted line shows the actual value of the parameter. The parameter estimate converges to the actual value of the parameter.

Figure 4.11: Online estimation of a 3-state HSMM, where the actual parameter changes at t = 5000: a) state transition probability a°_{21}; b) observation mean for state 1, μ_1. The dotted lines show the actual value of the parameter. The parameter estimates follow the temporal changes in the actual value of the parameter.

4.9.2 Numerical Results for Empirical TV Traffic Traces

In this experiment, we first fitted an HSMM model to the empirical traffic traces of a few typical TV programs using the off-line parameter identification method presented in Section 4.6. The empirical traffic traces are from the video sequences described in tables 2.4 and 2.5 in Section 2.5. Table 4.4 shows the numerical values of the estimated model parameters for three of the sources. Figures 4.12-4.14 show how the model parameters converge to the final estimate as more iterations of the off-line algorithm are performed.

Then, we obtained the effective bandwidth curve a(θ) from the estimated HSMM model parameters using the method presented in Section 4.8. Figure 4.15 shows the obtained effective bandwidth curves. As shown, a(θ) is an increasing function of θ, where a(0) is the average rate and a(∞) is the maximum rate of the source.

Then, we employed the obtained effective bandwidth curves in our admission control scheme, and plotted the maximum waiting time Tw versus rate R (see Figure 4.16) for an incidental stream that uses the stochastic service class, for a constant loss probability p.

In the next simulation, we examined the accuracy of our stochastic admission control mechanism. We considered a transmission system which multiplexes N main video streams, one incidental stream using the deterministic service class, and one incidental stream using the stochastic service class. We conducted two experiments using two different sets of simulation parameters, as shown in Table 4.5. The first set of parameters is selected to simulate a cable transmission medium, while the second set simulates a terrestrial medium (a 6 MHz channel in the cable medium is capable of delivering digital data at the 19.8 Mbps rate, a capacity usually shared by 4 or 5 TV programs; in the terrestrial medium, a 6 MHz channel is capable of delivering at the 39.8 Mbps rate, which is usually shared by 8 or 9 TV programs). The incidental stream using the deterministic service class has the rate given in Table 4.5. Then, we used the stochastic admission control scheme to obtain the loss probability for an incidental stream with rate R and waiting time Tw that uses the stochastic service class. Finally, we simulated the buffering operations in the transmission system (the packet scheduling method employed in this simulation is described in detail in Chapter 5), and observed p, the loss probability of the incidental stream that uses the stochastic service class. Figures 4.17-a and 4.17-b illustrate the loss probability p versus R obtained from the simulation and from the admission control scheme for the simulation parameter sets I and II, respectively. As shown, the actual loss probabilities obtained from simulation, shown by the plus signs, are very close to the loss probabilities estimated by the admission control scheme. These results verify that our stochastic admission control scheme can accurately estimate the system performance in terms of loss probabilities.
Furthermore, it is noted that the admission control is more accurate for a larger Tw parameter. That is, the actual loss values are closer to the loss probabilities estimated by the admission control in Figure 4.17-a, where Tw = 30, than in Figure 4.17-b, where Tw = 5. This is due to the fact that our admission control scheme is designed based on the assumption that the buffer waiting time (or the buffer size) is very long.

Table 4.4: Numerical values of the estimated HSMM model parameters for three of the empirical TV traffic sources.

Figure 4.13: Online estimation of HSMM parameters for the Documentary (Best Places in Canada) sequence.

Table 4.5: Simulation parameters.
  Transmission rate:  set I (cable medium): 19.8 Mbps;  set II (terrestrial medium): 39.8 Mbps
  Number of TV programs sharing the channel, N:  set I: 4;  set II: 8
  Maximum bitrate assigned to each main video stream:  set I: 4.5 Mbps;  set II: 4.5 Mbps
  Transmission capacity reserved for video streams:  set I: 18 Mbps;  set II: 36 Mbps
  Transmission capacity reserved for audio streams and other ancillary data:  set I: 1.8 Mbps;  set II: 3.8 Mbps
  Rate of the incidental stream that uses the deterministic service class:  set I: 0.8 Mbps;  set II: 1.5 Mbps
  Main video stream sources:  set I: Mission Impossible, News, Talk Show, Documentary;  set II: Mission Impossible, News, Talk Show, Documentary, Court Show, Muppets Show, Soap Opera, Cartoon

Figure 4.16: Tw versus R for an incidental stream that uses the stochastic service class (panels show simulation parameter sets I and II, for p = .05 and p = .01).

Figure 4.17: Probability of loss p versus rate R, where Tw is constant (panels include simulation parameter set I with Tw = 30 and simulation parameter set II with Tw = 5). The solid lines illustrate the loss probabilities obtained by the admission control scheme, and the '+' signs illustrate the actual loss probabilities obtained from simulating the multiplexing system. As shown, the actual loss probabilities obtained from simulation are close to those obtained from the admission control.

4.10 Chapter Conclusion

In this chapter, we studied stochastic traffic models for modelling the traffic of full-screen video sources in digital TV. We showed that though HMMs can adequately capture most of the stochastic properties of video traffic sources, they cannot adequately model the state duration densities of full-screen video sequences. Therefore, we selected the more sophisticated HSMM models for modelling the video traffic sources in the proposed ITV application. Furthermore, we gathered some evidence from empirical traffic samples of typical TV programs, which showed that the Gamma distribution is a good choice for modelling the state duration densities. Next, we presented a novel signal model for HSMMs.
We showed that our signal model results in easier and more efficient parameter identification algorithms. Based on our new model, we presented a variant of the EM algorithm for off-line identification of HSMMs. Furthermore, we presented an online identification algorithm based on our new signal model and on the general recursive prediction error technique. Using these methods, one can efficiently estimate the parameters of an HSMM, for the off-line or online case, from the traffic samples. Next, we showed how the numerical value of the effective bandwidth function is obtained from the parameters of an HSMM. Our approach is based on reformulating an HSMM as an HMM of a higher dimension.

In summary, one can use the methods presented in this chapter to obtain the numerical value of the effective bandwidth function of a source from the traffic samples. The obtained effective bandwidth curve is used in conjunction with the admission control methods presented in Chapter 3 to find the maximum waiting time for an incidental stream that uses the stochastic service class.

Up to this point (i.e., Chapters 2-4), we have presented methods which find the maximum waiting time for an incidental stream that uses the deterministic or the stochastic service class. These methods determine when the data packets of an incidental stream should be made available to the transmitter for transmission. In the next chapter, we describe how the data packets of main and incidental streams are actually handled during multiplexing.

Chapter 5

Broadcast Head-End Architecture

If something anticipated arrives too late it finds us numb, wrung out from waiting, and we feel nothing at all. The best things arrive on time.
- Dorothy Gilman, A New Kind of Country, 1978.

Overview

A system for multiplexing main and incidental stream data is presented. The role and importance of the packet scheduling policy is discussed, and a novel scheduling algorithm is presented. Our approach ensures that all the main and incidental streams are treated according to their importance.

5.1 Introduction

In this chapter, we present our design of the transmitter head-end for our interactive digital TV system. As discussed in Chapter 1, the digital TV standard requires that the encoded video and audio streams of a TV program be delivered to the TV receivers in the form of a single multiplexed stream called the 'Transport Stream' (TS). The syntax of the transport stream is defined in the digital TV standard. Here, we present our design of a multiplexer system, which is capable of multiplexing incidental stream data alongside the main stream data. Our system takes the priority of the main and incidental streams into account, and manages the flow of the data packets in the system such that: 1) all the main data packets are transmitted on time, 2) all the incidental data which are transmitted using the deterministic service class are transmitted on time, 3) the bandwidth that is not used by the main streams or the deterministic service class streams is fairly shared among the incidental streams that use the stochastic service class, and 4) the remaining unused bandwidth is fairly shared by the incidental streams that use the best-effort service class.

One crucial part of our multiplexing system is a scheduling algorithm, which determines the order of packets in the interleaved packet sequence that forms the transport stream. We present a scheduling algorithm for our multiplexing system.
Our scheduling algorithm employs a prioritizing policy, where the input data streams are divided into six different priority classes. Data packets belonging to each class are considered for transmission only if the higher-priority classes do not have any data packet ready for transmission. This ensures that more important data (e.g., main streams) are given a higher priority than less important data (e.g., incidental streams). Furthermore, we use a weighted fair queuing policy for scheduling the streams of each priority class. In this policy, the waiting time of each data packet is considered as the key decision factor to decide which packet must be served next. Our approach ensures that the bandwidth is fairly divided among the streams of each priority class.

Our scheduling algorithm also ensures that the generated transport stream is compliant with the standard TV receiver model, as indicated by the American digital TV standard ATSC. This ensures that any standard digital TV receiver or set-top box can extract and decode the main streams from the transport stream generated by our system.

The rest of this chapter is organized as follows. In Section 5.2, we present an overview of standard digital TV multiplexing systems. Then, we present our design of the multiplexing system for our interactive TV system in Section 5.3. In Section 5.4, we present our scheduling algorithm. The chapter conclusion is presented in Section 5.5.

5.2 Multiplexing System Structure of Standard Digital TV System

A conceptual diagram of a multiplexer system at the head-end of a TV transmission system is shown in Figure 5.1. This system multiplexes the video and audio streams of a number of TV programs, and creates a multiplexed transport stream. This transport stream is then broadcast over a cable, terrestrial or satellite channel to the TV receivers. For most TV programs, the source video and audio sequences are captured at a different location than the transmitter head-end. In this scenario, the source video and audio streams are usually delivered to the transmitter system through a private high-speed link, such as a satellite link. For more details about the standard digital TV transmitter architecture see [93-96].

5.3 Multiplexing System for Our Interactive TV System

Figure 5.2 shows the diagram of our multiplexing system. The data inputs to our multiplexer system are the video and audio streams, which come from a broadcast station in the case of live programs, or from an off-line storage medium in the case of prerecorded programs. The output is a single transport stream, which has the constant bitrate of R_TS bps. As shown, our multiplexer system consists of four basic units: 1) 'TS packetizer' units, which packetize the input streams and generate TS packet streams; 2) the 'Traffic Shaping Unit', which passes the TS packets to the multiplex buffers at a regulated rate; 3) 'Multiplexing Buffers', which hold the TS packets ready for transmission; and 4) the 'Scheduling Unit', which takes the TS packets from the multiplex buffers and creates the multiplexed transport stream. We will discuss the mechanisms of these units in more detail in the following sections.

As shown, a 'scalable transcoder' re-encodes an incidental stream to generate a three-layer scalable stream. In a general scenario, the base layer is transmitted using the deterministic service class; the first enhancement layer is transmitted using the stochastic service class; and the second enhancement layer is transmitted using the best-effort service class. The bitrate of each layer is determined during the admission control process by the admission control
The bitrate of each layer is determined during the admission control process by the admission control 140 Program 1 ^jg^v. Uncomprcsscii 4^lV Video Sequence Uncompressed Audio Video Encoder ISO/IEC 13818-2 Audio Encoder ISO/IEC 13818-3 Time Base STC 1 ^Packetizcr),pi;SVid^ TS MUX Broadcast (Error prone media) Program N Uncompressed ""S [;-. Video Sequence Uncompressed Audio Video Encoder ISO/IEC 13818-2 Audio Encoder ISO/IEC 13818-3 Time Base STCN System Coder ISO/IEC 13818-1 Figure 5.1: Conceptual diagram of a Transport Multiplexer. unit. 141 142 5.3.1 TS packetizer The TS packetizer units simply break the input bitstream into 184 byte units, add the 4 byte TS packet header as indicated by the standard, and generate the TS packets of size 188 byte. For more information about TS packet structure see [93,94,96,97]. 5.3.2 Traffic Shaping Unit As shown in Figure 5.2, the traffic shaping unit passes the TS packets from the packetizer units to the multiplex buffers. The function of this unit is to regulate the rate of packet submission to the multiplex buffers. This rate regulation is necessary to ensure that the actual amount of data submitted to the multiplex buffers is in accordance with the bandwidth reserved for the main streams. Furthermore, the traffic shaping unit controls when the incidental data packets are submitted to the multiplex buffers. That is, this unit is responsible for sending the incidental data to the multiplex buffer Tw seconds before their transmission deadline. As shown, the traffic shaping unit uses a buffer for each stream, and con trols the packet departures from each buffer by using a buffer control unit for each buffer. These buffer control units use different schemes for each stream, as described below. Main Audio Streams Main audio streams in digital TV applications have a constant bitrate. ff the bitrate of a main audio stream is Ri, then the buffer control unit ensures that no more than |"8^84] TS packets are submitted to the multiplex buffer during each 1 second period, where \x] denotes the smallest integer greater than or equal to x. For this purpose, the buffer control unit uses a Token variable. The token variable is updated periodically every 1 second to r8x^184 "I • Whenever a packet from the buffer is submitted to the multiplex buffers, its token variable is decremented. The buffer control unit sends a TS packet only if the assigned token variable is greater than zero. Main Video Streams If a main video stream is characterized by its maxi mum bitrate and not a (a, p) model, then a similar scheme to what is used for main audio streams is used by the buffer control units. 143 However, if a main video stream is characterized by a (a, p) model, then the method used by the buffer control unit is different. In this case, the buffer control unit must ensure that the aggregate number of TS packets submitted to the multiplex buffer in any time window of length t is less than mini(<Ji + prf) 1. Fortunately, there is a very powerful and efficient method for implementing this mechanism based on Token Buckets. A token bucket is a mechanism for ensuring that the traffic generated by a source is compliant with a single (cr, p) model. A token bucket is simply a variable initialized to o and incremented at rate p. This variable is bounded from above by a. Whenever a packet is submitted to the multiplex buffer, the token bucket variable is decremented. The buffer control unit submits a TS packet only when the token variable is greater than zero. 
For a (σ, ρ) model consisting of N (σ, ρ) pairs, N token buckets should be employed. The buffer control unit submits a TS packet only when the minimum of all N token variables is greater than zero. This mechanism ensures that the total traffic delivered to the multiplex buffer complies with the (σ, ρ) model.

PSI Tables  Program Specific Information (PSI) tables are data tables embedded into the transport stream, which contain important information necessary for demultiplexing the transport stream [93,95,97]. For example, PSI tables carry the so-called 'identification numbers', which tell the digital TV receivers which packets should be decoded for a specific TV program. There are four types of PSI tables: the Program Association Table (PAT), the Program Map Table (PMT), the Conditional Access Table (CAT) and Private Tables [95,97]. Since PSI tables carry information necessary for decoding the transport stream, it is necessary that PSI tables are repeatedly inserted into the transport stream. The repetition frequency of PSI data is not specified by the MPEG standard; however, it is advised that the PSI tables be repeated between 10 and 50 times per second. We employ a token variable for controlling the transmission frequency of each PSI table. These token variables are updated periodically according to the repetition frequency of the PSI tables and the actual size of the PSI table in bytes. The buffer control unit of PSI tables submits a TS packet to the multiplex buffer only when the assigned token variable is greater than zero.

Incidental Streams  The function of the buffer control unit for incidental streams is to send the incidental TS packets to the multiplex buffer T_w seconds before their transmission deadline. Since incidental streams have a constant bitrate, this mechanism is easily implemented by sending the first TS packet of an incidental stream exactly T_w seconds before the decoding time of the first frame of the incidental stream; the consecutive TS packets are submitted to the multiplex buffer at the constant rate of the stream. For example, consider an incidental stream with maximum waiting time T_w and rate R bps, which is equivalent to R/(8 × 188) TS packets per second. Also suppose that the decoding of this stream should start at time T. Then the buffer control unit will send the first TS packet of this stream to the multiplex buffer at T − T_w, and each consecutive TS packet is submitted after (8 × 188)/R seconds.

5.3.3 Multiplex Buffer

Multiplex buffers hold the TS packets that are ready for transmission. The size of the multiplex buffers for main streams is selected such that these buffers never overflow. A buffer of size 0.5 × R, where R is the maximum bitrate of the stream in bps, is usually enough. The size of the multiplex buffer for an incidental stream is T_w × R, where T_w is the maximum waiting time assigned to the incidental stream during the admission control process.

5.4 Scheduling

5.4.1 Scheduling Unit Objective

Suppose the bitrate of the transport stream is R_TS bps. Since each TS packet has the constant size of 188 bytes, each TS packet is transmitted in exactly Δ = (8 × 188)/R_TS seconds. We call Δ a transmission time-slot. That is, a transmission time-slot represents the time required for sending 188 bytes of data.
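To give a feel for the numbers, the snippet below evaluates the slot duration for an assumed transport-stream rate; the 19.39 Mbps figure is the approximate nominal ATSC terrestrial payload rate, used here purely as an example.

```python
TS_PACKET_BITS = 188 * 8          # one TS packet = 188 bytes

def slot_duration_seconds(r_ts_bps: float) -> float:
    """Duration of one transmission time-slot, Delta = (8 x 188) / R_TS."""
    return TS_PACKET_BITS / r_ts_bps

r_ts = 19.39e6                    # assumed: nominal ATSC terrestrial rate, in bps
delta = slot_duration_seconds(r_ts)
print(f"{delta * 1e6:.1f} microseconds per slot")   # about 77.6
print(f"{1 / delta:.0f} time-slots per second")     # about 12,893
```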
Hence a free time-slot represents the opportunity of sending only one TS packet. The function of the scheduling algorithm is to assign each transmission time-slot in the TS stream to one of the input streams, as shown in Figure 5.3.

Figure 5.3: Scheduling unit decides which TS packet should be sent at the next transmission time-slot. (The figure shows streams #1 through #N feeding a scheduler that decides which packet will occupy the next time-slot.)

5.4.2 Prioritizing Policy

Our algorithm employs a prioritizing policy, where the input streams are divided into different priority classes. Each class of streams is served only when the higher priority classes do not have any data packet ready for transmission. In priority order, these classes are: 1) PSI tables, 2) main audio streams, 3) main video streams, 4) incidental streams with the deterministic service class, 5) incidental streams with the stochastic service class, and finally 6) incidental streams with the best effort service class.

5.4.3 Limitations Imposed on Scheduling by the Digital TV Standard

The digital TV standard has defined a reference model for the buffering process in digital TV receivers, called the 'Transport Stream System Target Decoder' or TS-STD. This reference model specifies a standard layered buffering structure required to de-multiplex and decode a transport stream [93-96]. It also specifies the minimum size of each buffer and how data flows between the buffers. The purpose of this reference model is to standardize the buffering process at TV receivers. All digital TV receivers are required to implement the TS-STD, and all transport streams must be generated in compliance with this reference model.

Hence, our transport stream multiplexer must employ a mechanism to ensure that the generated transport stream is compliant with the reference target decoder. This function is performed by the scheduling unit. That is, the scheduling algorithm should ensure that assigning the current transmission time-slot to a specific stream will not result in a buffer overflow in the reference TS-STD model. We implement this mechanism by simulating the TS-STD model for each TV channel. Using the simulated TS-STD model, we first check that assigning the current time-slot to a stream does not result in a buffer overflow in the TS-STD model.

5.4.4 Scheduling Algorithm

As discussed before, the function of the scheduling algorithm is to decide which stream should occupy the next transmission time-slot in the transport stream. The mechanism of our scheduling algorithm can be broken into two conceptual steps.

In the first step, our algorithm creates a set of candidate streams. The candidate streams are those streams that 1) have a non-empty multiplex buffer, 2) are allowed by the TS-STD model to be transmitted in the current time-slot, and 3) have a higher priority than any other stream satisfying the first two conditions. Therefore, the candidate streams are all selected from the same priority class, and are all eligible for transmission in the current time-slot.

In the second step, the algorithm selects one stream among the candidate streams for transmission. Depending on the type of the candidate streams, we use different policies to decide which candidate stream should be transmitted. For PSI table data, a simple round robin policy is used. For main streams (either video or audio), let W_i denote the buffer workload and R_i the maximum rate of the stream. Then, we select the stream for which W_i/R_i is maximum.
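To make this two-step decision concrete, the sketch below selects the stream for the next time-slot: it first filters the candidate set (non-empty buffer, TS-STD check, highest eligible priority class), then applies the per-class selection rule. The stream attributes and the TS-STD check are placeholders for the corresponding components of the system, and the deadline used for incidental streams is the EDF deadline d_i defined in equation (5.1) below.

```python
def pick_next_stream(streams, tsstd_allows, now):
    """One scheduling decision for the next transmission time-slot (illustrative sketch).

    `streams` is an iterable of objects carrying the attributes used below;
    `tsstd_allows(stream)` stands in for the simulated TS-STD overflow check.
    """
    # Step 1: candidate set = eligible streams of the highest non-empty priority class.
    eligible = [s for s in streams if s.buffer_workload > 0 and tsstd_allows(s)]
    if not eligible:
        return None
    top_priority = min(s.priority_class for s in eligible)   # 1 = PSI tables ... 6 = best effort
    candidates = [s for s in eligible if s.priority_class == top_priority]

    # Step 2: per-class selection rule (all candidates belong to the same class).
    if candidates[0].kind == "psi":
        # Simple round robin among PSI tables: serve the least recently served table.
        return min(candidates, key=lambda s: s.last_served_time)
    if candidates[0].kind == "main":
        # Weighted-fair-queuing-like rule: largest workload relative to maximum rate.
        return max(candidates, key=lambda s: s.buffer_workload / s.max_rate)
    # Incidental streams: earliest deadline first, d_i = T_i - W_i / R_i (equation 5.1).
    return min(candidates, key=lambda s: s.max_waiting_time - s.buffer_workload / s.rate)
```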
For incidental streams (either deterministic, stochastic or best effort service), we use an 'Earliest Deadline First' (EDF) policy. Let W_i be the buffer workload, R_i the stream rate and T_i the maximum waiting time assigned to the stream. Then we assign the deadline

d_i = T_i - \frac{W_i}{R_i}    (5.1)

to each stream. Note that a small d_i means that the data packets in the buffer are close to passing their maximum waiting time. The scheduler selects the stream that has the smallest d_i.

5.4.5 Packet Drop-Off

The admission control schemes, along with the mechanisms used by the traffic shaping unit, ensure that the multiplex buffers of PSI tables, main streams, and incidental streams using the deterministic service will never overflow. That is, these mechanisms ensure that the aggregate number of TS packets submitted from these streams to the multiplex buffer during each scheduling cycle is less than or equal to the number of packets that can be transmitted. Therefore, we expect to experience no packet drop-off for these streams.

However, we anticipate that the multiplex buffers of the incidental streams with the stochastic and best effort service classes overflow occasionally. This overflow occurs when the transmission line is committed to the main streams and the incidental streams with the deterministic service class for a long time, and the scheduler cannot send enough TS packets from the incidental streams with the stochastic or best effort service classes. In this case, the excess TS packets in the multiplex buffer should be dropped off. This is shown in Figure 5.4.

Figure 5.4: Packet drop-off in the multiplex buffers of the stochastic and best-effort service classes. When a new packet arrives at a full buffer, it is pushed into the buffer. (The figure shows packets #1 through #N+1 in the buffer before and after the arrival, with packet #1 dropped.)

As shown, when a new TS packet arrives at a full multiplex buffer, the new packet is pushed into the buffer. That is, a packet from the beginning of the buffer is dropped, the other packets are shifted, and the new packet is added to the end of the buffer. The reason that we drop the first packet from the beginning of the buffer, and not the new packet, is that the first packet has been in the buffer for more than the assigned maximum waiting time T_w, and hence it is too late to transmit this packet. Note that the multiplex buffer size is T_w × R, and the buffer is filled at the constant rate R.

5.5 Chapter Conclusion

In this chapter, we designed a multiplexer system for the transmitter head-end of our interactive TV system. We described the buffering structure required for handling the main, incidental and other ancillary data packets. Then, we presented a novel scheduling scheme for controlling the multiplexing operations. Our scheduling method ensures that all the main and incidental stream data packets are treated according to their importance during the multiplexing process.

Furthermore, our scheduling algorithm employs a technique which ensures that the broadcasted stream is backward compatible with conventional digital TV receivers. This guarantees that conventional digital TV receivers, which are not programmed for our interactive TV system, are able to display the main video and audio content without any discrepancy.

Chapter 6

Thesis Summary

In this thesis, we proposed and defined a new interactive system for digital TV. This system gives TV viewers the freedom to control TV program content. In so doing, we have introduced a new technological concept, which improves home entertainment technology.
We then addressed the most challenging issue involved in the design of the proposed interactive TV system. This issue concerns adding extra incidental data to a digital TV transmission channel. This must be accomplished without increasing the bandwidth or degrading the quality of other programs. We then presented data transmission schemes for our interactive TV system that allow the incidental data to be transmitted. We efficiently took advantage of any unused bandwidth in the transmission channel to transmit the incidental data. We classified the transmission schemes of incidental data into three classes: the deterministic, stochastic, and best-effort service classes.

We proposed to use scalable video coding for the incidental streams. In this approach, an incidental stream is encoded into a three-layer scalable stream. The base, first enhancement and second enhancement layers are transmitted using the deterministic, stochastic and best-effort service classes respectively. This technique not only results in very efficient bandwidth utilization, but also improves the perceived picture or audio quality of the incidental streams.

We then designed admission control schemes for the deterministic and stochastic service classes. These admission control schemes answer the crucial question of whether an incidental stream can be added to a TV program or not. Our approach in designing the admission control schemes was based on modelling the traffic of main video streams using a traffic model. This model is then used for designing the admission control test.

In the case of the deterministic service class, we employed the (σ, ρ) model for modelling the traffic of main streams. We developed methods for fitting the (σ, ρ) model to traffic sources. These methods are more efficient and more accurate than previously available methods. In so doing, we helped to advance the knowledge in the deterministic traffic modelling field. We then adapted the 'Network Calculus' theory, and designed an efficient admission control scheme for the deterministic service class.

For the stochastic service class, we employed Hidden Semi-Markov Models (HSMM) for modelling the traffic of main video streams. We developed efficient methods for the identification of HSMM model parameters for off-line and online cases. In so doing, we have advanced existing knowledge about general semi-Markovian signal models, off-line and online identification of HSMMs, and stochastic traffic modelling of full-screen video sources. Using the 'Effective Bandwidth' theory, we then designed an efficient admission control scheme for the stochastic service class.

Then, we presented our design of a data multiplexer for the transmitter head-end of our interactive digital TV system. Our design is capable of multiplexing incidental stream data alongside the main stream data. We described how the flow of main and incidental data packets is controlled during the multiplexing process. We presented a novel scheduling scheme, which determines the order of data packets in the broadcasted packet sequence. Furthermore, we employed mechanisms which ensure that conventional digital TV receivers can extract and display the main video and audio content from the multiplexed stream. This makes our system backward compatible with the presently existing conventional digital TV receivers.

We have tested the validity and efficiency of the methods presented in this thesis via simulation experiments.
Numerical results of these experiments are presented throughout the thesis.

In summary, this thesis presents efficient data transmission schemes for transmitting extra video and audio content alongside conventional digital TV data. By exploiting the methods presented in this thesis, new interactive TV applications are enabled, and home entertainment technology is advanced. Furthermore, some research results presented herein can benefit other research areas, such as deterministic traffic modelling, QoS-enabled data networks, and semi-Markovian stochastic models.

6.1 Thesis Contributions Summary

The major contributions of this thesis are summarized as follows, listed in the order of their appearance in the thesis.

• New interactivity concept: We defined a new interactivity concept for TV, which allows TV viewers to personalize the video or audio content of TV programs. This new interactivity concept drastically enhances the TV viewers' experience, and advances home entertainment technology.

• Data transmission before presentation time: We developed a novel transmission technique for transmitting the incidental data units ahead of their presentation time. This technique allows us to take optimal advantage of the transmission bandwidth that is unoccupied by the main streams.

• Deterministic admission control: A new admission control scheme was developed in Chapter 2 to be used in the deterministic service class. This is the most important line of development of this thesis in the context of the deterministic service class.

• (σ, ρ) model fitting: The algorithm presented in Chapter 2 for fitting a (σ, ρ) model to a traffic source is one of the contributions of this thesis. This algorithm is useful in any application that employs the (σ, ρ) model.

• Physical interpretation of effective bandwidth: A new physical interpretation of effective bandwidth is offered in Chapter 3. Such an interpretation is important because it helps in advancing stochastic queuing theory.

• Stochastic admission control: A new admission control scheme was developed in Chapter 3 to be used in the stochastic service class. This is the most important line of development of this thesis in the context of the stochastic service class.

• Employing HSMMs for modelling full-screen video traffic: We showed in Chapter 4 that HSMMs are a better model choice than HMMs for modelling the stochastic properties of full-screen high bitrate video. This line of development advances the video modelling field.

• New signal model and identification algorithms for HSMMs: We presented a new signal model for HSMMs in Chapter 4. This model results in more efficient model parameter identification algorithms. We also presented off-line and online parameter identification algorithms based on our new signal model. The new signal model and identification algorithms are useful in any application that employs HSMMs, and are among the most important contributions of this thesis.

• Effective bandwidth of HSMMs: In Chapter 4, we showed for the first time how to obtain the effective bandwidth of an HSMM signal. This line of development is useful in any queuing application that uses HSMM traffic.

6.2 Future Research

In this thesis, we mainly focused on the mechanisms used at the transmitter head-end. Obviously, in order to display an incidental stream, a digital set-top box receiver is required, which should be specifically designed and programmed for the proposed ITV application.
In the context of this thesis, a set-top box is considered as a black box equipped with a large buffer (e.g., a hard disk) for caching the incidental stream data. This set-top box is assumed to be capable of controlling the playback of main and incidental streams. Though the design concepts for the set-top box receiver architecture are simple, there is room for improvement. Thus, it would be beneficial to explore the set-top box architecture in more detail in future research. For example, one can improve the buffer management schemes used at the receiver end for controlling the caching of the interactive content, such that the incidental data is not lost when a TV viewer switches channels, and such that the random access delay for incidental streams is reduced. One can also improve the user interface (e.g., menus where TV viewers select their choices about a TV program), such that TV viewers can interact with the TV program more efficiently, and navigate among the main and incidental streams more easily.

Furthermore, future research may improve the traffic models used in this thesis. For example, one can explore the possibility of using stochastic traffic models other than HSMMs, such as self-similar models, or Transform-Expand-Sample (TES) traffic models, for modelling the traffic of main video sources. This may result in easier, more efficient, or more accurate traffic models.

Appendix A

Proof of Theorem 1 in Section 2.4

Theorem 1  Assume a traffic source constrained by A*_in(t) traverses a system that offers the service curve β(t). The waiting time d(t) for all t satisfies d(t) ≤ h(A*_in, β) [35,36].

Proof  It follows from the definitions of d(t) (equation 2.28) and h(A_in, A_out) (equation 2.31) that

\tau < h(A_{in}, A_{out}) \iff A_{in}(t - \tau) > A_{out}(t)    (A.1)

Now consider some fixed t. From the definition of d(t), for all τ < d(t) we have

A_{in}(t) > A_{out}(t + \tau)    (A.2)

Now the service curve property at time t + τ (equation 2.29) implies that there is some s such that

A_{out}(t + \tau) \ge A_{in}(t + \tau - s) + \beta(s)    (A.3)

So, from A.2 and A.3 we have

A_{in}(t) > A_{in}(t + \tau - s) + \beta(s)    (A.4)

It follows from this equation that A_in(t) > A_in(t + τ − s), which implies that t > t + τ − s. Thus,

\beta(s) < A_{in}(t) - A_{in}(t + \tau - s) \le A^{*}_{in}(s - \tau)    (A.5)

From the definition of h(A*_in, β) (see A.1) and A.5 it follows that τ < h(A*_in, β). Since this is true for all τ < d(t), we conclude that d(t) ≤ h(A*_in, β). Q.E.D.

Appendix B

Q-Q Plot

The 'Quantile-Quantile' (Q-Q) plot, also known as a 'probability plot', is a graphical technique for assessing whether or not an experimental data set follows a given distribution such as the normal or Weibull [98-100]. This technique is also used for determining if two data sets come from populations with a common distribution. By a 'quantile', we mean the fraction (or percent) of points below the given value. That is, the 0.3 (or 30%) quantile is the point at which 30% of the data fall below and 70% fall above that value. A Q-Q plot is a plot of the quantiles of an experimental data set against the quantiles of the assumed distribution. If a Q-Q plot is used to determine if two data sets come from the same distribution, then the quantiles of the first data set are plotted against the quantiles of the second data set. If the experimental data are actually from the assumed theoretical distribution, then the points in the Q-Q plot should form approximately a straight line.
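As an illustration, the following sketch builds the points of a Q-Q plot for an empirical data set against a candidate distribution and computes the correlation coefficient of the resulting points, which is used below as a goodness-of-fit measure. The choice of distributions and the plotting positions (i − 0.5)/n are assumptions of this example, not prescriptions of the thesis.

```python
import numpy as np
from scipy import stats

def qq_points(data, dist=stats.norm):
    """Return (theoretical, empirical) quantile pairs for a Q-Q plot."""
    data = np.sort(np.asarray(data))
    n = len(data)
    probs = (np.arange(1, n + 1) - 0.5) / n      # plotting positions
    theoretical = dist.ppf(probs)                 # quantiles of the candidate distribution
    return theoretical, data

def qq_correlation(data, dist=stats.norm):
    """Correlation coefficient of the Q-Q points; values near 1 indicate a good fit."""
    x, y = qq_points(data, dist)
    return np.corrcoef(x, y)[0, 1]

# Example: compare a normal and an exponential candidate for the same sample.
sample = np.random.default_rng(0).normal(loc=5.0, scale=2.0, size=500)
print(qq_correlation(sample, stats.norm(5.0, 2.0)))    # close to 1
print(qq_correlation(sample, stats.expon()))           # noticeably smaller
```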
This case is illustrated in Figure B.1-a. In this figure, the vertical axis is the quantile of the experimental data, and the horizontal axis is the quantile of a candidate probability distribution. As shown, the points in this Q-Q plot are very close to forming a line. This indicates that the experimental data are actually drawn from a population with the assumed distribution. However, departures from this straight line indicate departures from the specified distribution. This is illustrated in Figure B.1-b. Hence, one can use the correlation coefficient associated with the linear fit to the data in the Q-Q plot as a measure of the goodness of the fit.

In practice, Q-Q plots can be generated for several competing distributions to see which provides the best fit. In this case, the probability plot generating the highest correlation coefficient is the best choice, since it generates the straightest probability plot.

Appendix C

Likelihood Ratio Test

C.1 General Likelihood Test

Let S_T = {s_1, s_2, ..., s_T} be samples from a stochastic model, and let θ denote the model parameters, which take their values from the space Ω. Using the maximum likelihood principle, one can estimate the model parameters by finding θ such that L(θ) = P(S_T | θ) is maximized. This can be regarded as finding the 'best' explanation for the observed S_T.

Now suppose one wishes to test whether some of the model parameters are restricted or not, for example, whether some of the model parameters are bounded or whether some of the model parameters are zero. Formally, this is denoted by testing whether θ ∈ ω, where ω is a subspace of Ω. The usual approach to this problem is based on the likelihood ratio concept, which is defined as

\Lambda(S_T) = \frac{\sup_{\theta \in \omega} L(\theta)}{\sup_{\theta \in \Omega} L(\theta)}    (C.1)

That is, Λ(S_T) is the ratio of the best chance of observing S_T for θ ∈ ω to the best chance of observing S_T for θ ∈ Ω. Since ω ⊂ Ω, Λ is always between 0 and 1. Values of Λ close to 1 suggest that the data are very compatible with θ ∈ ω. That is, S_T is explained almost as well by the parameter estimates under θ ∈ ω as by the parameter estimates under θ ∈ Ω. For these values of Λ we should accept θ ∈ ω. Conversely, if Λ is close to 0, then the data would not be very compatible with θ ∈ ω and it would make sense to reject θ ∈ ω. This is the rationale behind the likelihood ratio test. A likelihood ratio test is a hypothesis test for testing

H0: \theta \in \omega    (C.2)

against

H1: \theta \in \Omega, \quad \omega \subset \Omega    (C.3)

In order to obtain the rejection region and confidence intervals, it is necessary to know the distribution of Λ. However, this is ordinarily very complicated. Fortunately, it is shown that under very general conditions −2 ln(Λ) has a χ² distribution with n degrees of freedom, where n is the difference between the dimensions of ω and Ω. Hence, by comparing −2 ln(Λ) to the upper 100 × (1 − α) percentile point of a χ² distribution, one can decide whether to reject H0 or not. α is known as the significance level of the test, and is usually selected as α = 5%.

REMARK  As a general concept in hypothesis testing, the H1 hypothesis represents a more general case (or more complex concept) than H0. In these cases, the H1 hypothesis is adopted unless there is sufficient evidence to reject the special (or simple) hypothesis H0. This concept is conveyed in the test by the notion of ω ⊂ Ω.

C.2 Likelihood Ratio Test for Testing the HMM Model against the HSMM Model

Suppose we have two signal model candidates for modelling an empirical sequence S_T, and would like to use the likelihood ratio test to determine which candidate is the better choice. The first model candidate is a Markov chain with N states. In such a model, the signal makes a transition to a new state or stays at the same state at each time instance. The next state of the signal depends only on the current state, and is selected according to a constant state transition probability matrix A = [a_ij]. This model is parameterized with θ = (a_11, ..., a_{N−1,N}). It is easily shown that the state durations have a Geometric probability mass function, where the probability of staying exactly d time units in state i is given by ψ_i(d) = a_ii^{d−1}(1 − a_ii).

The second model candidate is a Semi-Markov chain with the same number of states (i.e., N states). In this model, once the signal enters a new state, a state duration d is selected, and the signal stays exactly for d time units
The first model candidate is a Markov chain with N states. In such a model, the signal makes a transition to a new state or stays at the same state at each time instance. The next state of signal de pends only on the current state, and is selected according to a constant state transition probabilities matrix A = [a^]. This model is parameterized with 9 — (an, • • • , ajv-i,jv). It is easily shown that state durations have a Geomet rical probability mass function, where the probability of staying exactly d time units in state i is given by ipi(d) = — au) The second model candidate is a Semi-Markov chain with the same number of states (i.e., A^ states). In this model, once the signal enters a new state, a state duration d is selected, and the signal stays exactly for d time units 159 in the same state. After d time units, the signal will make a transition to a new state. The state duration d for state i is selected according to the probability mass function <Pi(d), where (pi(d) is a discretized Gamma probability density function. That is, <Pi{d) = I ^j—x^le-^dx (C.4) Vi and i]i are parameters of the Gamma pdf. This signal model is parameterized with 6 = {al2,a13,--- , aN_ltN, rfU • • • ,r)N,vu--- ,vN), where A = [ai3] is the state transition probabilities. Note that a^'s are zero, and XljLi aij = 1 ror an 1 < i < N. We use the likelihood ratio test to determine if the Semi-Markov chain is a better model choice for the empirical trace ST- More precisely, we test the null hypothesis 'HO: Sy is generated by a Markov chain' against 'HI: ST is generated by a Semi-Markov chain'. According to equation C.3, it is necessary to parameterize the signal models such that HO represent a special case of HI. This means that we should model a Markov chain as a special case of a Semi-Markov chain with Gamma state duration densities. This is easily done by defining N HO: 0 ecu, to ={9\vx = v2 = • • • = vN = 1, ]T ciij = 1,1 < i < N} i=i N Hl:0€n, ={0| ^2 aij = 1,1 < i < AT} (C.5) J=I Note that LO is a subspace of fi. We should just show that conditions in HO represent a Markov chain. That is, letting vx — v2 = • • • = v^ = 1 in the Semi-Markov chain model will result in a Markov chain. This is easily done by letting Vi — 1 in equation C.6. = e-j(d-i) _ e-vd (C 6) Selecting an = e~v results in <f>i{d) = ct^_1(l — an), which is identical to Geometrical state duration densities of Markov chains. Hence, one can find —2 ln(A) and compare it to the upper 100 x (I —a) percentile point of a \/2 distribution with degree of freedom to decide to reject the null hypothesis HO or not. 160 Note that the hypothesis testing approach presented here is applicable to testing the validity of a HMM model against a HSMM model for a given empirical trace 3^r with very minor changes. 161 Bibliography [1] J. Liebeherr and D. E. Wrege, "Traffic characterization algorithms for VBR video in multimedia networks," ACM/Springer Multimedia Sys tems Journal, vol. 6, no. 4, pp. 271-283, 1998. [2] ATVEF forum, "ATVEF specicifcations for enhanced TV," avaialbale from http://www.atvef.com. [3] Y. Wang, J. Osterman, and Y.Q. Zhang, Digital Video Processing and Communications, chapter 11, Scalable Video Coding, Prentice Hall, 2002. [4] W. Dapeng, Y.T. Hou, and Y.Q. Zhang, "Scalable video coding and transport over broadband wireless networks," Proceedings of the IEEE, vol. 89, no. 1, pp. 6-20, January 2001. [5] H. Yanagihara, M. Sugano, A. Yoneyama, and Y. 
Nakajima, ''Scalable video decoder and its application to multi-channel multicast system," IEEE Transactions on Consumer Electronics, vol. 46, no. 3, pp. 866-871, 2000. [6] D.P. Heyman, A. Tabatabai, and T.V. Lakshman, "Statistical analysis and simulation study of video teleconference traffic in ATM networks," Circuits and Systems for Video Technology, IEEE Transactions on, vol. 2, no. 1, pp. 49-59, March 1992. [7] O. Rose, "Simple and efficient models for variable bit rate MPEG video traffic," Performance Evaluation, vol. 30, pp. 69-85, 1997. [8] M.Devetsikiotis M.R.lsmail, I.Lambadaris and A.R.Kaye, "Simulation and modeling of variable bit rate MPEG video transmission over ATM networks," International Journal 162 of Communication Systems, vol. 9, 1996, Available from http://www.sce.carleton.ca/bbnlab/bnlpapers.shtml. [9] T.V. Lakshman D.P. Heyman, "Source models for VBR broadcast-video traffic," Proc. IEEE Infocom, pp. 664-671, June 1994. [10] H. Hughes M. Krunz, "A traffic model for MPEG-coded VBR streams," Proc. ACM Sigmetrics, pp. 47-55, May 1995. [11] A. Mukherjee A. Adas, "On resource management and qos guarantee for long range dependent traffic," Proc. IEEE Infocom, pp. 779-787, April 1995. [12] W. Willinger M. W. Garret, "Analysis, modelling and generation of self-similar VBR video traffic," Proc. ACM Sigcomm '94, pp. 269-280, August 1994. [13] J. F. Kurose, "On computing per-session performance bounds in high speed multi hop computer networks," Proc. ACM Sigmetrics and Per formance '92, pp. 128-139, June 1992. [14] H. Zhang and E. W. Knightly, "Providing end-to-end statistical perfor mance guarantees with bounding interval dependent stochastic models," Proc. of ACM Sigmetrics, pp. 211-220, May 1994. [15] R. L. Cruz, "A calculus for network delay, part f: Network elements in isolation," IEEE Trans, on Information Theory, vol. 37, no. 1, pp. 114-131, January 1991. [16] R. L. Cruz, "A calculus for network delay, part ff: Network analysis," IEEE Trans, on Information Theory, vol. 37, no. 1, pp. 132-141, January 1991. [17] D. E. Wrege et. al., "Deterministic delay bounds for VBR video in packet switching networks: Fundamental limits and practical tradeoffs," IEEE/ACM Trans, on Networking, vol. 4, no. 3, pp. 352-362, June 1996. [18] Dallas E. Wrege, Multimedia Networks with Deterministic Quality-of-Service Guarantee, Ph.d. thesis, University of Virginia, August 1996. 163 [19] R. L. Cruz and C. M. Okino. "Service guarantees for window flow con trol," 34th Allerton Conf. Communication, Control, and Computing. October 1996. [20] J.Y. Le Boudec, "Application of netwrok calculus to guaranteed services network," IEEE Tran. on Information Theory, vol. 44, no. 3, pp. 1087-1096, 1998. [21] C. S. Chang, "On deterministic traffic regulation and service guaran tees: A systematic approach by filtering," IEEE Trans. On Information Theory, vol. 44, no. 3, pp. 1097 -1110, 1998. [22] ATM Forum, "ATM forum traffic management specification version 4.0," Contribution 95-0013R.il, March 1996. [23] ATM Forum, "ATM user-network user interface specification version 3.0," Prentice-Hall, 1993. [24] E. W. Knightly, "Traffic models and admission control integrated ser vices networks," Ph.D. Thesis, University of Californium, Berkeley, May 1996. [25] E. W. Knightly and H. Zhang, "Traffic characterization and switch utilization using a deterministic bounding interval dependent traffic model," INFOCOM, vol. 3, pp. 1137-1145, 1995. [26] K. Moth L. Dittman, S. B. 
Jacobsen, "Flow enforcement algorithms for ATM netwroks," IEEE Journal on Selected Areas in Communications, vol. 9, no. 3, pp. 343-350, April 1991. [27] E. P. Ratgheb, "Modelling and performance comparison of policing mechanisms for ATM networks," IEEE Journal on Selected Areas in Communications, vol. 9, no. 4, pp. 325-334, April 1991. [28] M. Pancha, P.; El Zarki, "Leaky bucket access control for VBR MPEG video," INFOCOM, vol. 2, pp. 796-803, 1995. [29] F. Guillemin, C. Rosenberg, and J. Mignault, "On characterizing an ATM source via the sustainable cell rate traffic descriptor," INFOCOM, vol. 3, pp. 1129 -1136, 1995. 164 [30] D. E. Wrege and J. Liebherr, "Video traffic characterization for multi media networks with a deterministic service," INFOCOM, pp. 537-544, 1996. [31] T. H. Cormen, Introduction to Algorithms, McGraw Hill, 2001. [32] P. E. Gill et. al., Practical Optimization, Academic Press, 1981. [33] R. Fletcher, Practical Methods of Optimization, John Wiley, 1980. [34] C. Rosenberg et. al., "New approach for traffic characterisation in ATM networks," IEEE Proc. on Comm., vol. 142, pp. 87 -90, 1995. [35] P. Thiran J.Y. Le Boudec, Network calculus : a theory of deterministic queuing systems for the Internet, Springer, 2001. [36] J.Y. Le Boudec and P. Thiran, "A short tutorial on network calculus. I. fundamental bounds in communication networks," The 2000 IEEE Int. Symposium on Circuits and Systems, vol. 4, pp. 93-96, 2000. [37] R. J. Gibben et al, "Effective bandwidth for multi-type UAS channel," Queueing systems, vol. 9, pp. 17-28, 1991. [38] F. P. Kelly, "Effective bandwidth at multi-class queues," Queueing systems, vol. 9, pp. 5-16, 1991. [39] R. Guerin et al, "Equivalent capacity and its application to bandwidth allocation in high-speed networks," IEEE J. Selected Areas in Commu nication, vol. 9, pp. 968-981, 1991. [40] Alan Weiss, "An introduction to large deviations for communication networks," IEEE Journal on selected areas in Communications, vol. 13, no. 6, pp. 938-952, 1995. [41] C.S. Chang, Performance Quarantees in Communication Networks, Springer-Verlag, 2000. [42] N.G. Duffield, "A large deviation analysis of errors in measurement based admission control to buffered and bufferless resources," Queueing Systems, vol. 34, no. 1, pp. 131-168, January 2000. 165 [43] Jean C. Walrand Gustavo de Veciana, George Kesidis, "Resource man agement in wide-area ATM networks using effective bandwiths," IEEE Journal on Selected Areas in Communications, vol. 13, no. 6, pp. 1081-1090, 1995. [44] N.G. Duffield, J.T. Lewis, N. O'Connell, R. Russell, and F. Toomey, "Entropy of ATM traffic streams: a tool for estimating qos parameters," IEEE Journal on Selected Areas in Communications, vol. 13, no. 6, pp. 981-990, August 1995. [45] P. Rabinovitch, "Statistical estimation of effective bandwidth," M.S. thesis, Carleton University, April 2000. [46] S. Tartarelli, M. Falkner, M. Devetsikiotis, I. Lambadaris, and S. Gior dano, "Empirical effective bandwidths," Global Telecommunications Conference, 2000. GLOBECOM '00, vol. 1, no. 27, pp. 672-678, De cember 2000. [47] Ff. R. Kunsch, "The jacknife and the bootstrap for general stationary observations," The annals of Statistics, vol. 17, pp. 1217-1241, 1989. [48] Raoul LePage and Lynne Billard, Eds., Exploring the Limits of Boot strap, John Wiley and sons, April 1992. [49] G.A. Young, "Bootstarb: More than a stab in the dark?," Statistical Science, vol. 9, pp. 382-415, 1994. [50] N.G. Duffield; J.T. Lewis; N. 
O'Connell, "The entropy of an arrival pro cess: A tool for estimating QOS parameters of ATM traffic," Proceedings of 11th IEE UK Teletraffic Symposium, March 1994. [51] N.G. Duffield, "A large deviation analysis of errors in measurement based admission control to buffered and bufferless resources," AT&T research lab report, 1998. [52] C. Courcoubetis; G. Kesidis; A. Ridder; J. Walrand; R. Weber, "Admis sion control and routing in ATM networks using inferences from mea sured buffer occupancy," IEEE Trans, on Communication, vol. 43, no. 2/3/4, pp. 1778-1784, April 1995. 166 [53] A . Adas, "Traffic models in broadband networks," IEEE Communica tions Magazine, vol. 35, no. 7, pp. 82-89, July 1997. [54] D. L. Jagerman and B. Melamed, "The transition and autocorrelation structure of TES processes, part I: General theory," Stochastic Models, vol. 8, no. 2, pp. 193-219, 1992. [55] D. L. Jagerman and B. Melamed, "The transition and autocorrelation structure of TES processes, part II: Special cases," Stochastic Models, vol. 8, no. 3, pp. 499-527, 1992. [56] B. Melamed, D. Raychaudhuri, B. Sengupta, and J. Zdepski, "TES-based traffic modeling for performance evaluation of integrated net works," INFOCOM '92. Eleventh Annual Joint Conference of the IEEE Computer and Communications Societies, vol. 1, pp. 75-84, 1992. [57] B. Melamed and D. E. Pendarakis, "Modeling full-length VBR video using markov-renewal-modulated tes models," IEEE Journal on Selected Areas in Communications, vol. 16, no. 5, pp. 600-611, June 1998. [58] Chang Bum Lee, Kyeong Bong Ha, and Rae-Hong Park;, "Computa tion of effective bandwidth of aggregated vbr mpeg video traffic in atm networks using the modified equivalent capacity," IEEE International Conference on Communications, ICC 96, Conference Record, Converg ing Technologies for Tomorrow's Applications, vol. 2, no. 2, pp. 627-631, June 1996. [59] N. Ansari, Y.Q. Hai Liu Shi, and Hong Zhao, "On modeling mpeg video traffics," IEEE Transactions on Broadcasting, vol. 48, no. 4, pp. 337 -347, December 2002. [60] J.C. Cano and P. Manzoni, "On the use and calculation of the hurst parameter with mpeg videos data traffic," Proceedings of the 26th Eu-romicro Conference, vol. 1, pp. 448-455, September 2000. [61] R. Narasimha and R.M. Rao, "Discrete-time self-similar systems and stable distributions: applications to vbr video modeling," IEEE Signal Processing Letters, vol. 10, no. 3, pp. 65-68, March 2003. 167 [62] B.N. Bashforth and CL. Williamson. "Statistical multiplexing of self-similar video streams: simulation study and performance results," Pro ceedings of the Sixth International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, pp. 119-126, July 1998. [63] O. Lazaro, D. Girma, and J. Dunlop, "Statistical analysis and evaluation of modelling techniques for self-similar video source traffic," The 11th IEEE International Symposium on Personal, Indoor and Mobile Radio Communications, PIMRC, vol. 2, pp. 1540 -1544, September 2000. [64] V.S. Frost and B. Melamed, "Traffic modeling for telecommunications networks," IEEE Communications Magazine, vol. 32, no. 3, pp. 70-81, March 1994. [65] Z. Liu, J. Huang, and Y. Wang, "Classification TV programs based on audio information using hidden markov model," IEEE Second Workshop on Multimedia Signal Processing, pp. 27-32, December 1998. [66] J. Huang, Z. Liu, Y. Wang, Y. Chen, and E.K. 
Wong, '•'Integration of multimodal features for video scene classification based on HMM," IEEE 3rd Workshop on Multimedia Signal Processing, pp. 53-58, September 1999. [67] D.L. Mclaren and D.T. Nguyen, "An HMM-based simulation model for the production of coded image data," International Conference on Image Processing and its Applications, pp. 93-96, April 1992. [68] J. Huang, Z. Liu, and Y. Wang, "Joint video scene segmentation and classification based on hidden markov model," IEEE International Con ference on Multimedia and Expo, ICME 2000, vol. 3. [69] L. Chaisorn, CT. Seng, and C.H. Lee, "The segmentation of news video into story units," IEEE International Conference on Multimedia and Expo, ICME 2002, vol. 1. [70] P. Morguet and M. Lang, "An integral stochastic approach to image se quence segmentation and classification," IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 1998, vol. 5. 168 [71] R.J. Elliott, L. Aggoun, and J.B. Moore, Hidden Markov Models: Esti mation and Control, Springer-Verlag,New York, 1995. [72] L. R. Rabiner, "A tutorial on hidden markov models and selected ap plications in speech recognition," Proceedings of the IEEE, vol. 77, no. 4, pp. 257-286, February 1989. [73] A. A. Lazar, G. Pacifici, and D. E. Pendarakis, "Modeling of video sources for real time scheduling," Multimedia Systems, vol. 2, no. 6, pp. 253-266, 1994. [74] M. J. Russel and R. K. Moore, "Explicit modelling of state occupancy in hidden markov models for automatic speech recognition," IEEE In ternational Conference on Acoustics, Speech, and Signal Processing, pp. 5-8, March 1985. [75] B. Sin and J. H. Kim, "Nonstationary hidden markov model," Signal Processing, vol. 46, pp. 31-46, 1995. [76] L. H. Jamieson and C. D. Mitchell, "Modelling duration in a hidden markov model with the exponential family," IEEE International Con ference on Acoustics, Speech, and Signal Processing, vol. 2, pp. 331 -334, 1993. [77] S. E. Levinson, "Continuously variable duration hidden markov models for automatic speech recognition," Computer Speech and Language, vol. 1, pp. 29-45, 1986. [78] V. Krishnamurthy, J. B. Moore, and S. H. Chung, "Hidden fractal model signal processing," Signal Processing, vol. 24, no. 2, pp. 177-192, 1991. [79] S. V. Vaseghi, "State duration modelling in hidden markov models," Signal Processing, vol. 41, no. 1, pp. 31-41, 1995. [80] E. Weinstein, M. Feder, and A. V. Oppenheim, "Sequential algorithms for parameter estimation based on the kullback-leibler information mea sure," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 38, no. 9, pp. 1652-1654, September 1990. 169 [81] U. Hoist and G. Lindgren, "Recursive estimation in mixture models with markov regime," IEEE Trans. Inform. Theory, vol. 37, pp. 1683-1690, 1991. [82] I. B. Collings, V. Krishnamurthy, and J. B. Moore, "Online identifica tion of hidden markov models via recursive prediction error techniques," IEEE Transactions on Signal Processing, vol. 42, pp. 3535-3539, 1994. [83] I. B. Collings and T. Ryden, "A new maximum likelihood gradient algorithm for online hidden markov model identification," IEEE Inter national Conference on Acoustics, Speech, and Signal Processing, vol. 4, pp. 2261-2264", May 1998. [84] L. Ljung and T. Soderstrom, Theory and Practice of Recursive Identifi cation, MIT Press, Cambridge, 1983. [85] R. S. Burington, Handbook of probability and statistics with tables, McGraw-Hill, 1970. [86] L. R. Rabiner and B. H. 
Juang, Fundamentals of Speech Recognition, Prentice Hall, 1993. [87] M. Abramowitz and I. Stegun, Handbook Of Mathematical Functions, with Formulas, Graphs, and Mathematical Tables, 1964. [88] G. E. Andrews, R. Askey, and R. Roy, Special Functions, Cambridge University Press, 1999. [89] L. Ljung, System Identification: Theory For The User, Prentice Hall PTR, 1999. [90] B. T. Polyak, "New method of stochastic approximation type," Automat. Remote Control, vol. 51, pp. 937-946, 1990. [91] D. Ruppert,. "Stochastic approximation," in Handbook in Sequential Analysis, B. K. Ghosh and P. K. Sen, Eds., pp. 503-529, New York: Marcel Dekker 1991. [92] R.A. Horn and C.R. Johnson, Matrix Analysis, Cambridge, 1987. 170 [93] Advanced Television Systems Committee, "ATSC digital television stan dard A-53, ATSC digital television standard," 2001. [94] Digital Video Broadcasting Project, "DVB digital television standard," [95] R. S. Chernock, Data broadcasting: Understanding the ATSC Data, Broadcast Standard, McGraw-Hill, New York, 2001. [96] H. Benoit, Digital television : MPEG-1, MPEG-2, and principles of the DVB system, J. Wiley and Sons, 1997. [97] Advanced Television Systems Committee, "ATSC digital television stan dard A-65, program and system information protocol for terrestrial broadcast and cable," 2000. [98] M. Evans, N. Hastings, and B. Peacock, Statistical Distributions, .New-York: John Wiley, 2000. [99] V. Barnett, "Probability plotting methods and order statistics," Applied Statistics, , no. 24, pp. 95-108, 1975. [100] V. Barnett, "Quantile-quantile plot," NIST/SEMATECH e-Handbook of Statistical Methods. 171 
