Data Transmission Schemes for a New Generation of Interactive Digital Television

by

Mehran Azimi

B.Sc., Amir-Kabir (Polytechnic) University of Technology, Tehran, 1994
M.Sc., Sharif University of Technology, Tehran, 1998

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF Doctor of Philosophy in THE FACULTY OF GRADUATE STUDIES (Department of Electrical and Computer Engineering)

We accept this thesis as conforming to the required standard

The University of British Columbia
MARCH 2004
© Mehran Azimi, 2004

Abstract

Interactive television (ITV) is an attractive technology that changes the way TV viewers experience home entertainment. In this thesis, we design an interactive television system that truly gives TV viewers the freedom to control and individualize the presentation of TV program content. In this context, we present methods that add extra video and audio streams (called incidental streams) containing interactive content to the transmission line of a digital TV system. The addition of these extra streams neither increases the transmission bandwidth nor degrades the picture or sound quality of the main TV program content. Our design consists of two major transmission mechanisms for transmitting the incidental data, called the deterministic and stochastic service classes. The deterministic service class is designed such that no incidental stream data packet is lost during transmission. The stochastic service class, on the other hand, is designed such that some incidental data loss is possible; however, the data loss rate is bounded. We present a strategy based on scalable video coding which, in conjunction with the deterministic and stochastic service classes, achieves the best possible picture and sound quality for the incidental streams under the constraint of available bandwidth.
We also present data transmission methods for the deterministic and the stochastic service classes. In the context of the deterministic service class, we employ a deterministic traffic model for modelling the traffic of main streams, and then design a data transmission scheme based on 'Network Calculus' theory. In the context of the stochastic service class, we employ Hidden Semi-Markov Models (HSMMs) for modelling the traffic of main streams. We then design a data transmission scheme based on 'Effective Bandwidth' theory. Furthermore, we design a data multiplexing system for the transmitter head-end of the proposed interactive TV system. This includes a novel scheduling algorithm for controlling the multiplexing of incidental and main stream data packets. We present numerical results of simulation experiments, which support the theoretical methods presented in this thesis.

Contents

Abstract
Contents
List of Tables
List of Figures
Acknowledgements
Dedication

1 Introduction
  1.0.1 A New Vision of Interactivity for Television
  1.1 Challenges
  1.2 Thesis Scope
  1.3 Our Approach
    1.3.1 Framework
    1.3.2 Maximum Waiting Time
  1.4 Structure and Mechanisms of the System
    1.4.1 Admission Control
    1.4.2 Traffic Characterization
    1.4.3 Service Classes
    1.4.4 Scalable Coding
    1.4.5 Data Multiplexing
  1.5 Thesis Contribution and Structure

2 Deterministic Service Class
  2.1 Introduction
  2.2 Traffic Characterization
    2.2.1 Deterministic Traffic Characterization
    2.2.2 Parameterized Traffic Characterization Models
  2.3 Fitting the (σ, ρ) Model to a Source
    2.3.1 Overview of Current Methods for Fitting the (σ, ρ) Model to a Traffic Source
    2.3.2 Our Approach to Constructing the Empirical Envelope
    2.3.3 Obtaining the (σ, ρ) Parameters from the Empirical Envelope
  2.4 Admission Control
    2.4.1 Network Calculus Basics
    2.4.2 Waiting-Time Bound in the Proposed ITV Application
    2.4.3 Our Algorithm to Find T_w
  2.5 Results
    2.5.1 Numerical Results of 'Model Fitting' Methods
    2.5.2 Numerical Results of 'Admission Control Mechanism'
  2.6 Conclusion

3 Stochastic Service Class
  3.1 Introduction
  3.2 Effective Bandwidth
    3.2.1 Large Deviation Principle
    3.2.2 Effective Bandwidth Definition
    3.2.3 Physical Interpretation of Effective Bandwidth
  3.3 Quantifying Data Loss in Communication Networks Using Effective Bandwidth
    3.3.1 Data Loss Probability in a Queuing Model with Priority
  3.4 Admission Control for the Stochastic Service Class
  3.5 Numerical Estimation of Effective Bandwidth
  3.6 Conclusion

4 Video Traffic Modelling
  4.1 Introduction
  4.2 Existing Stochastic Video Traffic Models
    4.2.1 Stochastic Models for Full-Screen Broadcast Video
  4.3 Mathematical Background of General Markovian Models
    4.3.1 Markov Chain
    4.3.2 Hidden Markov Models
    4.3.3 Limitation of HMM Models
    4.3.4 Selecting Between HMM and HSMM Modelling Approaches for the Proposed ITV Application
  4.4 General Background on Semi-Markovian Signal Models
    4.4.1 Semi-Markov Chains
    4.4.2 Hidden Semi-Markov Models
    4.4.3 General Background on Identification of HSMMs
  4.5 New Formulation of HSMMs
    4.5.1 Hidden (or Modulating) Layer Modelling
    4.5.2 Observation Layer Modelling
    4.5.3 Model Parameterizations
  4.6 A New Algorithm for Off-line Identification of HSMMs
    4.6.1 Implementation Issues
  4.7 Online Identification of HSMMs
    4.7.1 General RPE Technique
    4.7.2 Online Identification of HSMMs Using the RPE Method
    4.7.3 Model Parameterizations
    4.7.4 Estimation of the Hidden Layer Variables
    4.7.5 Gradient Vector Update Equations
    4.7.6 Parameter Update Equations
    4.7.7 Updating the Hessian Matrix Estimate, R
    4.7.8 Implementation Issues
  4.8 Effective Bandwidth of a Hidden Semi-Markov Model
    4.8.1 Effective Bandwidth of Hidden Markov Models
    4.8.2 Reformulating a Semi-Markov Chain as a Markov Chain
    4.8.3 Effective Bandwidth of an HSMM by Reformulating It as an HMM
  4.9 Numerical Results
    4.9.1 Numerical Results for Synthetic Data
    4.9.2 Numerical Results for Empirical TV Traffic Traces
  4.10 Chapter Conclusion

5 Broadcast Head-End Architecture
  5.1 Introduction
  5.2 Multiplexing System Structure of a Standard Digital TV System
  5.3 Multiplexing System for Our Interactive TV System
    5.3.1 TS Packetizer
    5.3.2 Traffic Shaping Unit
    5.3.3 Multiplex Buffer
  5.4 Scheduling
    5.4.1 Scheduling Unit Objective
    5.4.2 Prioritizing Policy
    5.4.3 Limitations Imposed on Scheduling by the Digital TV Standard
    5.4.4 Scheduling Algorithm
    5.4.5 Packet Drop-Off
  5.5 Chapter Conclusion

6 Thesis Summary
  6.1 Thesis Contributions Summary
  6.2 Future Research

Appendix A Proof of Theorem 1 in Section 2.4
Appendix B Q-Q Plot
Appendix C Likelihood Ratio Test
  C.1 General Likelihood Test
  C.2 Likelihood Ratio Test for Testing the HMM Model Against the HSMM Model

Bibliography

List of Tables

2.1 Execution time in seconds for calculating E(t) for 1 ≤ t ≤ 10000. Simulations were run on a PC with a Pentium IV processor at 1.7 GHz, using a Matlab implementation.
2.2 Comparison of our method with the method in [1] for finding the (σ, ρ) parameters. E(t) constructed for all 1 ≤ t ≤ N.
2.3 The model parameters constructed from the entire envelope (second column) and from E(t) for 1 ≤ t ≤ r and samples of E(t) at t = iδ for 1 ≤ i ≤ n; r = 200, nδ = N.
2.4 Video sequences used in our study.
2.5 Encoding parameters for the video streams used in our study.
2.6 Numerical values of (σ, ρ) model parameters for the main video sequences used in our simulation.
2.7 Simulation parameters.
4.1 Results of the likelihood ratio test for different typical TV video sequences. The number of states in the HSMM and HMM models is N = 3.
4.2 Actual and initial values of the parameters of HSMM models used in simulating our off-line identification algorithm.
4.3 Actual and initial values of the parameters of HSMM models used in simulating our online identification algorithm.
4.4 Estimated parameter values of the HSMM model for empirical traffic traces of typical TV programs.
4.5 Simulation parameters.

List of Figures

1.1 Screen shot of an Electronic Program Guide (EPG). The EPG lets you search, navigate, and find out what's on your TV, all while watching TV.
1.2 Screen shot of an Enhanced TV. Enhanced TV lets you view extra HTML-based information about the current TV program, or surf the Internet, while watching TV.
1.3 A transmission time-line, which illustrates the effect of T_w; the gray boxes show when the data of the i-th data unit are actually transmitted.
1.4 Structure of the proposed transmission system.
2.1 E(t) for a CBR and a typical VBR source.
2.2 Traffic constraint function for different traffic models.
2.3 A*(t) for the (σ, ρ) model for m = 1, 2 and 3.
2.4 E(t) for two typical TV programs. The traffic is normalized to its maximum rate, i.e., the maximum rate of these sources is 1.
2.5 E(t) obtained from the direct approach, largest subadditive closure extrapolation, and repetition extrapolation for r = 400 and sampling with δ = 400. The source is a VBR MPEG-2 sequence, selected from the motion picture 'Mission Impossible' with 720 x 480 resolution, frame rate 30 and maximum bitrate of 5 Mbps.
2.6 Pseudo-code of our algorithm for obtaining the (σ, ρ) model parameters from E(t).
2.7 Divide-and-conquer approach in our algorithm. (a) First step: a single (σ_1, ρ_1) is fitted to E(t).
(b) Second step: since A*(t) in step one was not a satisfactory estimate of E(t) over [T_1, T_2], the (σ_2, ρ_2) is fitted to E(t) for t ∈ [T_1, T_2]. (c) Third step: the (σ_3, ρ_3) is fitted to E(t) for t ∈ [T_2, T_3].
2.8 The general system S.
2.9 The arrival and departure functions: (a) continuous time, where A_in and A_out are defined for all t > 0; (b) discrete time, where A_in and A_out are only defined at discrete times denoted by dots.
2.10 (a) The transmitter model with N high-priority (main) streams; (b) the equivalent model, where the N high-priority input streams are replaced with one equivalent stream.
2.11 Pseudo-code of our algorithm to find T_w.
2.12 β_L(t) and A*_L(t).
2.13 Utilization ratio, U(A*, d).
2.14 A*(t) and E(t) of video sequences of typical TV programs: a) Mission Impossible, b) News, c) Talk show, d) Documentary.
2.15 A*(t) and E(t) of video sequences of typical TV programs: a) Court show, b) Muppets, c) Soap opera, d) Cartoon.
2.16 R versus T_w for an incidental stream. T_w is obtained for each R using the admission control algorithm presented in Section 2.4.3.
3.1 A sample effective bandwidth curve α(θ), with Λ(θ), Λ*(θ) and the inverse of the effective bandwidth function I(c) = α^{-1}(c). The average rate of the source is 0.448.
3.2 A simple buffer of size B, filled at variable rate y_t and emptied at constant rate c.
3.3 A simple multiplexer model with prioritized inputs.
3.4 A simple buffer filled at constant rate R and emptied at variable rate c.
If the buffer workload is W_L, then we know that the first data unit has waited W_L/R seconds in the buffer.
3.5 A simple multiplexer model with N high-priority inputs and one low-priority input.
4.1 Normalized bitrate and 'state transition path' for two typical TV programs.
4.2 a) Histogram of state (or cluster) duration for one of the states in the News sequence; b) geometric probability mass function; c) discretized Gamma probability distribution function; d) Q-Q plot for the geometric distribution; e) Q-Q plot for the Gamma distribution.
4.3 State transition paths that should be considered for computing α_t(j) in the HMM and HSMM cases.
4.4 Comparison of the signal generation process in our model with that of existing signal models for HSMMs.
4.5 'E' and 'M' steps of our algorithm for off-line identification of HSMMs.
4.6 Parameter update equations performed in each step of a general RPE algorithm.
4.7 Recursive prediction error scheme for estimating HSMM parameters.
4.8 a-c) Parameter estimates for the first model versus the iteration number. The dotted lines show the actual values of the parameters. d) The log-likelihood function, log(P(y_1, y_2, ..., y_t | θ)), versus the iteration number. As shown, log(P(y_1, y_2, ..., y_t | θ)) increases in each iteration.
4.9 a-c) Parameter estimates for the second model versus the iteration number. The dotted lines show the actual values of the parameters. d) The log-likelihood function, log(P(y_1, y_2, ..., y_t | θ)), versus the iteration number. As shown, log(P(y_1, y_2, ..., y_t | θ)) increases in each iteration.
4.10 Online estimation of a 3-state HSMM: a) a state transition probability; b) the observation mean for state 1; c) the state duration mean for state 1. The dotted line shows the actual value of the parameter.
The parameter estimate converges to the actual value of the parameter.
4.11 Online estimation of a 3-state HSMM, where the actual parameter changes at t = 5000: a) a state transition probability; b) the observation mean for state 1. The dotted lines show the actual value of the parameter. The parameter estimates follow the temporal changes in the actual value of the parameter.
4.12 Online estimation of HSMM parameters for the 'Talk show' sequence.
4.13 Online estimation of HSMM parameters for the Documentary (Best Places in Canada) sequence.
4.14 Online estimation of HSMM parameters for the 'Mission Impossible' sequence.
4.15 Effective bandwidth curves of video sequences of typical TV programs.
4.16 T_w versus R for an incidental stream that uses the stochastic service class.
4.17 Probability of loss p versus rate R, where T_w is constant. The solid lines illustrate the loss probabilities obtained by the admission control scheme, and '+' signs illustrate the actual loss probabilities obtained from simulating the multiplexing system. As shown, the actual loss probabilities obtained from simulation are close to those obtained from the admission control.
5.1 Conceptual diagram of a transport multiplexer.
5.2 Diagram of the multiplexing system.
5.3 The scheduling unit decides which TS packet should be sent at the next transmission time-slot.
5.4 Packet drop-off in multiplex buffers of the stochastic and best-effort service classes. When a new packet arrives at a full buffer, it is pushed into the buffer.
B.1 Q-Q plot.

Acknowledgements

This thesis is the result of five years of work wherein I have been accompanied and supported by many people. It is with pleasure that I now take the opportunity to express my gratitude to all of them. First and foremost, I would like to thank my supervisors, Dr. Rabab K. Ward and Dr. Panos Nasiopoulos, who have always been extremely kind and supportive.
I would never have been able to finish my thesis without my supervisors' insights, criticisms and thoughtful suggestions. I am most grateful to Rabab for her continuous moral support, and for showing me the proper research methodology; and I am most grateful to Panos for his academic advice, and for being a mentor to me. I am thankful to all my colleagues at the 'Image Processing Lab' of the Department of Electrical and Computer Engineering at UBC. I am especially grateful to the following individuals for their academic help or moral support during my studies, and for being my friends: Alex Paquet, Kemal Ugur, Parvin Mousavi, Vanessa Mirzaee, Yasser Pourmohammadi, Alen Docef, Parvaneh Saeedi, Farhad Ghasemi, and Purang Abolmaesoumi. I am also grateful to Dr. Vikram Krishnamurthy for his comments on one of my research papers, and to Kirsty Barclay for all her editorial help during my studies. I am grateful to my wife, Paria, for her love and patience, and for her inspirational and moral support during my studies. She is the love of my life, and I dedicate this thesis to her. Last but certainly not least, I would like to thank my parents and my sisters for their continuous love and support. I would not have accomplished this much in my life without their sacrifices for me throughout my life.

MEHRAN AZIMI
The University of British Columbia
MARCH 2004

To my beloved wife, Paria.

Chapter 1

Introduction

Transport of the mails, transport of the human voice, transport of flickering pictures - in this century as in others our highest accomplishments still have the single aim of bringing men together.
- Antoine de Saint-Exupery (1939)

Television systems are migrating to digital technology. More Digital Television (DTV) systems are being deployed around the world every day. This change is creating an incredible technological revolution in the entertainment industry.
The new DTV technology not only delivers crystal-clear picture and superior sound quality, but also allows new services to be added to TV programs. These add-in services enhance the TV viewer's experience by adding extra features or content to TV programs. For example, the presently available Electronic Program Guide (EPG) (see Figure 1.1) is an add-in service for DTV systems. The EPG lists the programs that are or will be available on each TV channel, plus a short summary or a commentary for each program.

Interactivity is the most attractive enhancement promised to be added to digital TV systems. However, the concept of 'interactivity' for TV is not clearly defined, and the term 'Interactive TV' has been used for many different TV systems with many different features. For example, Video-On-Demand systems and TV systems with VCR-like functionality are occasionally referred to as 'Interactive TV.' In Video-On-Demand systems, viewers select a movie or TV show from a library, and that movie is played back on their TV.

Figure 1.1: Screen shot of an Electronic Program Guide (EPG). The EPG lets you search, navigate, and find out what's on your TV, all while watching TV.

In TV systems with VCR-like functionality, TV viewers can pause or rewind a live TV show, or save a TV show to watch later. This system is commercially available now. Today, 'Interactive TV' mostly refers to the so-called 'Enhanced TV' system defined by the Advanced Television Enhancement Forum (ATVEF) [2].
ATVEF is an industry alliance of many major companies. The goal of this forum is to standardize HTML-enabled TV systems. With the Enhanced TV technology, a web page is displayed alongside the TV picture on the TV screen (see Figure 1.2). Viewers surf this web page using their remote controls to get more information about the program, do e-shopping, and so forth. There are two scenarios for sending the web page content. In the first scenario, the web page contents are seamlessly inserted into the broadcast TV signal. In this scenario, no Internet connection is required. However, the size of the inserted web data is limited, and users are limited to the inserted web pages. In the second scenario, an Internet link is inserted into the TV signal. This link is used by the receiver set-top box to download the web page content via an Internet connection. The Enhanced TV system is now commercially available (e.g., Microsoft WebTV), and many TV shows are currently using this service.

Figure 1.2: Screen shot of an Enhanced TV. Enhanced TV lets you view extra HTML-based information about the current TV program, or surf the Internet, while watching TV.

Thus, current interactive TV systems are limited to providing World Wide Web content. This content is mostly composed of text or still pictures. Besides, user interactions in current interactive TV systems are limited to web-surfing-like actions, such as menu selection, typing a URL, and so on. In practice, users actually interact with the web page content, and not with the TV program content. It is evident that current interactive systems for digital TV remain limited. This limitation lies in the fact that none of the current interactive TV systems provide any control over the video or audio content of the TV program.
1.0.1 A New Vision of Interactivity for Television

Our vision of interactivity in television is a system which allows TV viewers to control the final presentation of the TV program content. Such interactivity has been successfully implemented in DVD, which is a non-broadcast interactive medium. Therefore, this interactive TV system could be considered as a technology that offers DVD-like interactivity in broadcast digital TV systems. That is, the new interactive TV system would enable TV viewers to control the final presentation of the video or audio content of a TV program as in a DVD player system. Important applications would include the following:

Multi-lingual audio, where viewers select the language of choice from an available set of languages.

Parental management, where viewers select the content rating of a TV program. If the viewer selects the 'non-adult' version of a program, the receiver then seamlessly replaces the adult-rated scenes of the program with a non-adult-rated video sequence.

Multi-angle video, where viewers can view a scene from one or more spatially different angles. For example, during a soccer program, viewers can select to view the important scenes of the game from various angles.

Video-in-video, where viewers may select to open a small window in the corner of their displays. This small window shows a separate video that enhances the viewer's experience. For example, in a soccer program, the small window may display an important incident on the other side of the field.

In the proposed system, viewers use their remote controls, as if using a DVD-player remote control, to select choices regarding a TV program from a menu overlaid on top of the main video. This can include, for example, selecting a different language audio track, or switching between adult and non-adult versions of a movie.
These choices are conveyed to the receiver and are used to select the appropriate video and audio sequences to be displayed. Hence, the viewer's experience of a TV program is customized based on his or her own choices. Thus, our proposed interactive TV system requires the transmission of extra video or audio streams alongside the main TV program stream. We call these extra streams 'incidental streams.' Incidental streams carry the extra video or audio sequences required for an interactive TV application. For example, the audio stream of another language track in the multi-lingual audio application, or the video stream of a secondary camera angle in the multi-angle video application, are incidental streams. Depending on the application, an incidental stream may carry a video or audio sequence with a limited and specified length, or an unspecified length. For instance, in the parental management application for a movie, the main stream carries the regular version of the movie, which we assume has a few 'offending' scenes. For each offending scene, a non-offending version of the
Therefore, adding the incidental streams to a TV program does not effect the compatibility of the broadcasted TV signal with conventional digital receivers. 1.1 Challenges In designing the proposed interactive TV system several issues and challenges must be addressed. Limited Transmission Bandwidth The channel bandwidth available for transmitting digital TV signal is limited. In most transmission media, such as Cable, Terrestrial and Satellite, afixedchannel is shared among a number of TV programs, where each program uses afixedshare of the transmission bandwidth. In the proposed interactive TV system, incidental streams data must be accommodated in the same transmission channel as the main streams. However, reserving part of the transmission channel bandwidth for incidental streams is not an attractive approach for two reasons. First, the contents of incidental streams are not as important as the main program contents. This is because incidental streams usually carry enhancement content for a TV program; hence, incidental streams are expected to be viewed by much smaller TV audiences than the main program content. Therefore, it is not cost-effective to reserve bandwidth for incidental data, which are of secondary importance. Second, depending on the application, incidental streams may carry video or audio clips, each with a limited length. For example, in 5 the parental management application for a movie, an incidental stream carries the non-offending version of the movie only when there is an offending scene; otherwise, no incidental stream is broadcasted. Therefore, it is wasteful to reserve a fixed portion of bandwidth for an incidental stream, which may be utilized only during a few time intervals. As noted, main streams carry more important content than incidental streams do. Therefore, K e e p i n g the Q u a l i t y of M a i n Streams Intact adding incidental streams to a TV program should not degrade the quality of the main streams. 
Compatibility with Standard TV Receivers

Adding incidental streams to a TV program should not make the broadcast signal incompatible with conventional TV receivers. Thus, it is necessary that our system design be compliant with the digital TV standards.

Same Channel Transmission

The incidental streams should be accommodated within the same transmission channel used for the main streams. It is not an attractive option to use other transmission mechanisms for sending incidental data. For example, consider a possible scenario for implementing the proposed interactive TV system by adding Internet links to TV programs. These links would point to incidental video and audio streams located on the Internet. The set-top box receivers would then use these links to download the incidental streams via a fast Internet connection, and display them on the TV screen. However, the problem with this approach is that each TV receiver would be required to have a fast Internet connection. In fact, this approach has been taken and implemented by a few manufacturers, but it has failed, as its concept has been rejected by both TV broadcasters and consumers.

1.2 Thesis Scope

As noted, the most important challenge in designing the proposed interactive TV system arises from accommodating the incidental streams within the fixed bandwidth allocated for the main streams in the transmission line of a digital television broadcast system. In this thesis, we address this challenging
Our method strives to make the most productive use of the available bandwidth, and delivers incidental video and audio content with the best possible picture and sound quality. Unlike current interactive TV technologies, the proposed system is a one way system. That is, no return path from TV viewers to the transmitter or no Internet connection are required. Furthermore, adding incidental streams to a TV program using our method would not affect the compatibility of the broadcast signal with conventional digital TV receivers. In other words, a broadcast signal that carries both incidental and main streams is receivable by both conventional digital TV receivers and by receivers specifically programmed for the proposed ITV application. Conventional receivers will display the main streams, while receivers programmed for the proposed ITV application will be capable of displaying both incidental and main streams. These features make the proposed interactive TV system even more attractive to both consumers and TV broadcast companies. 1.3 1.3.1 Our Approach Framework We encode the main video sequences with constant picture quality. Therefore, the main video streams are encoded at variable bitrate (VBR). It is well known that simple and slow activity video scenes require a smaller number of encoding bits than complex video scenes do; such that bitrate for complex scenes may reach the maximum allowed bitrate. Digital TV transmission media (e.g., cable or terrestrial) allow a fixed reserved bandwidth for each TV channel equal to the source maximum rate. Therefore, during simple scenes the allowed bandwidth is under-utilized. We propose to use these unused portions of the bandwidth for transmitting incidental stream data. Each data unit of an incidental stream contains time sensitive data. This means, each data unit should be transmitted before a certain transmis7 sion deadline, so that it is available at the receiver at a certain time for decoding and presentation. 
Incidental stream data units are only transmitted at opportune moments, when the transmission bandwidth is not fully utilized by the main streams. In order to transmit incidental data units at these opportune moments, we propose that the transmitter receives each incidental data unit ahead of its transmission deadline, say by t seconds. Each data unit is first buffered at the transmitter and then transmitted whenever some free bandwidth becomes available. It is vital to choose t large enough so that the w w incidental data are transmitted and received by the receiver by the time they are to be decoded and presented to the viewer. Since decoders may receive the the incidental data units prior to their presentation time, these data have to be buffered at the decoder until their decoding time. 1.3.2 M a x i m u m Waiting Time An important question arises here: "for an incidental stream with a given bitrate R, what is the minimum t ?" We will denote the minimum t by T . Therefore, T is defined as the maximum time that the data units of an incidental stream with rate R might wait in the transmitter buffer before being transmitted. Once T is found, the transmitter should then receive the incidental data units t > T seconds before their transmission deadline. This ensures that all incidental data units are transmitted on time and made available at the decoder prior to or by their decoding time. We discuss our approach tofindingT in the next section. A small T is extremely desirable for three reasons. First, a small T reduces the inescapable delay in starting the presentation of an incidental stream in a live program. Suppose that thefirstdata bit of an incidental stream is delivered to the transmitter system for transmission at time t. If we ignore the constant delays caused by multiplexing, transmission and demultiplexing, then the decoding of this incidental stream can start at the receivers at t + T . 
Hence, it is very attractive to have a small T_w, so that playback of incidental streams in live TV programs can start very shortly after they have actually been captured. Second, since the receiver buffers must be capable of storing T_w seconds of an incidental stream's data, a smaller T_w requires a smaller buffer size at the receivers. Third, a small T_w is advantageous when viewers change from one TV channel to another.

Figure 1.3: A transmission time-line illustrating the effect of T_w; the gray boxes show when the data of the i-th data unit are actually transmitted.

With a smaller T_w, viewers experience a shorter 'random access delay' for the incidental streams while they switch channels. By 'random access delay,' we mean the delay TV viewers experience from the moment they switch to a new channel to the time the playback of the new program actually starts. For the main video and audio streams, the random access time rarely exceeds 0.5 seconds. This is because the coded main video and audio frames are broadcast very close to their decoding and presentation times. To illustrate the effect of T_w on the random access delay for incidental streams, consider two transmitters offering two different maximum waiting times, T_w^1 and T_w^2, to an incidental stream of rate R, where T_w^1 < T_w^2. Assume the data units of the same incidental stream are sent to these two transmitters. In order to simplify the discussion, we ignore the delays caused by the transmission line and buffering at the receivers. Let t_decode^i denote the decoding time of the i-th data unit of the incidental stream (see Figure 1.3). Since each data unit arrives at the transmitter buffer T_w seconds before its decoding time, t_arrival^i = t_decode^i − T_w denotes the time when the data bits of the i-th data unit arrive at the transmitter buffer.
The gray boxes in Figure 1.3 show the time instances when the data bits of this data unit are actually transmitted. Now, suppose a TV viewer changes the channel on his or her receiver to this program at t_access. In this case, the TV receiver starts receiving the data of this channel at t_access. In the first system, the receiver completely receives the i-th data unit of the incidental stream, while in the second system, the receiver misses this data unit. Therefore, the presentation of the incidental stream in the first system starts sooner (i.e., from the i-th data unit) than in the second system.

1.4 Structure and Mechanisms of the System

In this section, we introduce the mechanisms that we employ in the proposed system. Our objective here is to introduce the concept of each mechanism, and describe their roles. The details of each mechanism and our implementation approach are discussed throughout the thesis. We refer to the different video and audio streams as 'traffic sources' and to the actual data as 'traffic' from here on. This is because each video and audio stream can be considered a data-generating source. Figure 1.4 illustrates the basic building blocks of the proposed transmission system. As shown, this system consists of the following units: admission control, traffic characterization, scalable coder, service classes, and data multiplexer.

1.4.1 Admission Control

Before an incidental stream is added to a TV program and actually starts submitting data to the transmitter for transmission, it is necessary to determine the rate R and the waiting time t_w for this stream. We refer to this mechanism as 'admission control.' The admission control mechanism determines whether or not a certain incidental stream is allowed to be transmitted. The admission control relies on bandwidth provisioning mechanisms, which forecast the free bandwidth in the system in the future.
The admission control mechanism is initiated by sending a connection request from the TV production studio to the transmitter. The connection request is sent in advance of the actual data, and conveys to the transmitter that an incidental stream is going to be added to the program in the near future. The connection request also conveys a set of minimum service parameters for the incidental stream, namely the minimum bitrate R_min and the largest waiting time t_w^max that can be selected for this incidental stream. Then, the admission control must determine whether or not it can assign a bitrate R and a waiting time t_w to the incidental stream, such that R ≥ R_min and T_w ≤ t_w ≤ t_w^max. If the admission control can find such an (R, t_w) parameter pair, then the incidental stream is accepted by the admission control. If not, then the incidental stream is rejected; that is, the incidental stream will not be added to the program. The above procedure is referred to as the 'admission test.'

Figure 1.4: Structure of the proposed transmission system.

Once the transmitter receives the actual data of an accepted incidental stream, it re-encodes the incidental stream with rate R, and makes the encoded data units available for transmission exactly t_w seconds before their decoding time. This will be described in more detail in the next sections.

1.4.2 Traffic Characterization

The traffic characterization unit assigns a traffic descriptor to each main video source, and conveys these descriptors to the admission control unit.
The traffic descriptor is used in forecasting the bandwidth required by the main streams, from which we can deduce the bandwidth available for the incidental streams. Therefore, the admission control unit uses the traffic descriptors in its bandwidth provisioning mechanism to determine how much bandwidth will be available to the incidental streams. A traffic descriptor is composed of a set of parameters which contain useful and important characteristic information about the traffic shape of the source. Specifically, a traffic descriptor carries information about the traffic 'burstiness' of the sources. A 'traffic burst' refers to a state where a traffic source generates traffic at a rate higher than its average for a long period of time. We say a traffic source is bursty if it frequently generates traffic bursts. As will be discussed later, extracting the traffic descriptors directly from traffic is not a straightforward process. For this reason, we employ a modelling approach, where we use a parameterized model for modelling the traffic; we then find the traffic descriptors from the model parameters. We refer to these parameterized models as traffic models. For pre-recorded TV programs, traffic models and traffic descriptors are obtained using off-line algorithms. For live TV programs, the traffic models and traffic descriptors are obtained by monitoring traffic and using online methods.

1.4.3 Service Classes

An incidental stream can be transmitted using one of three different service classes. These service classes are defined below.

Deterministic Service Class: When an incidental stream is transmitted using the deterministic service class, the transmitter guarantees to send all the incidental data units on time, without any deadline violation or loss. The advantage of this approach is that the incidental stream experiences no data loss; hence the playback of incidental streams at the receivers will have no undesirable visual artifacts such as blocks or picture freezing. The disadvantage of this approach is that T_w (or R) is determined by the admission control process based on the most pessimistic bandwidth provisioning for incidental streams. This results in a very large T_w (or small R), which is not desirable.

Stochastic Service Class: When an incidental stream is transmitted using the stochastic service class, some of the incidental data units may be dropped (i.e., not transmitted); however, the data loss probability is guaranteed to be less than a certain threshold, say p%. The advantage of this approach is that T_w (or R) is determined using a more relaxed bandwidth provisioning for incidental streams. This results in a much smaller T_w (or larger R) than that of the deterministic service class. However, this approach has the disadvantage that data loss is inescapable, and hence playback of incidental streams will have some visual discrepancies.

Best Effort Service Class: In the best effort service class, the transmitter does not provide any guarantee of sending the incidental data. As the name 'best effort' implies, the transmitter uses any free bandwidth in the transmission line for sending the incidental data. Since no service guarantee is given, no admission control is necessary for this service class.

As noted, each service class defined above has its own pros and cons. More precisely, if an incidental stream is transmitted with the deterministic service class, then T_w should be selected large enough (or alternatively, R selected small enough) that there is always enough bandwidth to send all the incidental data on time. This means that T_w is selected such that even under worst-case conditions we find sufficient transmission opportunities for incidental data. This worst-case condition occurs when the bandwidth available to incidental streams is at its minimum. In this case, admission control is performed based on a pessimistic bandwidth provisioning.
Therefore, the bandwidth provisioning mechanism deviates far from the usual state of the system. This means that on average, incidental stream data units will wait much less than T_w seconds in the transmitter buffer. This results in poor bandwidth utilization. Conversely, the bandwidth provisioning in the stochastic service class is based on a more relaxed approach. This results in much higher bandwidth utilization. However, data loss is probable with this approach, which in some instances results in unattractive visual discrepancies in the TV picture, such as green blocks, picture freezing and so on. A solution which offers a trade-off between visual quality and bandwidth utilization is discussed below.

1.4.4 Scalable Coding

In order to achieve a compromise between high bandwidth utilization and the smooth playback of incidental streams, we use the scalable coding technique for encoding the incidental streams [3-5]. In this technique, a video or audio sequence is encoded into more than one bitstream. The first stream is called the base layer stream and usually has a low bitrate. The other bitstreams are called the enhancement layer streams, and carry a better picture or sound quality than that of the base layer alone. Unlike the base layer stream, the decoding of enhancement layer streams is not stand-alone, and requires the base layer stream during the decoding process. We propose to send the base layer stream using the deterministic service class, and the first enhancement layer stream using the stochastic service class. The second enhancement layer is transmitted using the best effort service class. This approach guarantees that the base layer stream data are delivered to receivers without any data loss. Therefore, the incidental stream is guaranteed to play back with the minimum quality offered by the base layer stream.
Meanwhile, the first enhancement layer data will be transmitted by taking advantage of the bandwidth that is not utilized by the main streams and the base layer of the incidental streams. Since the data loss is bounded, most of the enhancement layer stream data are expected to be transmitted on time. The second enhancement layer data are transmitted using the best effort service. Therefore, any bandwidth left over by other streams is utilized by the second enhancement layer. This approach results in an incidental stream playback with a minimum picture or sound quality determined by the base layer stream, an average quality determined by the first enhancement layer, and a best quality determined by the second enhancement layer. Therefore, for an incidental stream that is to be re-encoded using scalable coding and transmitted with different service classes, two admission control processes must be performed. The first admission control process determines the rate R_base and waiting time t_w^base for the base layer stream. The second admission control process determines the rate R_enh and the waiting time t_w^enh, at a given data loss rate, say p%, for the first enhancement layer stream. Since no service guarantee is offered in the best effort service class, no admission control process is necessary for the second enhancement layer stream.

1.4.5 Data Multiplexing

The multiplexing unit is responsible for multiplexing the main and incidental streams together. This unit handles the data units of each traffic source according to their priority. The main streams have the highest priority, followed by the incidental streams with the deterministic service class. These, in turn, are followed by the incidental streams that use the stochastic service class. The incidental streams with the best effort service class have the lowest priority.

1.5 Thesis Contribution and Structure

This thesis consists of 6 chapters and 3 appendices.
Chapters 2-4 describe how the deterministic and stochastic service classes are designed. This includes the selection of the traffic model, the design of efficient model fitting methods, the selection of a traffic descriptor, and finally the admission control mechanism. In Chapter 2, we address the admission control problem for the deterministic service class, and develop methods for the deterministic service class. We use the so-called (σ, ρ) model to model the traffic of main streams, and the so-called 'traffic constraint function' as the traffic descriptor. This approach is based on the 'Network Calculus' theory, which studies deterministic service guarantees in a communication network. We develop efficient algorithms for fitting the (σ, ρ) model to a traffic source. These algorithms are useful in any application that employs the (σ, ρ) model, and are part of the novel contributions of this chapter. Finally, we design an admission control scheme for the deterministic service class. This admission control scheme is the most important novel contribution of this chapter. In Chapter 3, we design the stochastic service class. The approach taken is based on the recently introduced theory of 'Effective Bandwidth.' We develop a new physical interpretation of the effective bandwidth concept based on a data buffering model and the large deviations principle. Then, we design an efficient algorithm for admission control of the stochastic service class using the effective bandwidth concept. The two important contributions of this chapter are the physical interpretation of the effective bandwidth concept, and the admission control scheme for the stochastic service class. In Chapter 4, we address the traffic modelling problem for the stochastic service class. We examine different approaches to stochastic modelling of the main streams, and select the family of Markovian signal models for modelling the data traffic in the proposed ITV application.
We show that Hidden Semi-Markov Models (HSMM) capture the characteristics of digital TV traffic better than other Markovian models; hence, we employ this model. This line of development is one of the contributions of this chapter. Then, we present a new signal model for hidden semi-Markov models, and present novel methods for parameter estimation of this new signal model for both the off-line and online cases. This new signal model and its parameter identification algorithms are the most important contributions of this chapter, and are useful in other applications which employ hidden semi-Markov models. Finally, we show how the effective bandwidth of a source is obtained from the parameters of a hidden semi-Markov model. This line of development is also part of the contributions of this chapter. We design a conceptual transmitting system for the proposed interactive TV system in Chapter 5. We discuss the role and importance of the 'packet scheduling policy,' and present a scheduling algorithm for multiplexing main and incidental stream data. Even though this chapter does not include any major contributions, it shows how the deterministic and stochastic service class concepts are implemented and integrated in an actual system. Finally, in Chapter 6, we present the thesis conclusion. We highlight the thesis contributions, and discuss future research directions in this field.

Chapter 2

Deterministic Service Class

Things which matter most should never be at the mercy of things which matter least. — Goethe

Overview

The scheme needed for the deterministic service class is presented. This includes traffic modelling for the main streams, traffic model fitting, a traffic descriptor, and an admission control scheme. Using the methods presented in this chapter, one can determine the maximum waiting time for an incidental stream with a given rate.
2.1 Introduction

In this chapter, we present the scheme needed for implementing the deterministic service class. For that, we need to forecast the minimum amount of bandwidth not occupied by the main streams. The approach taken is based on using a model to forecast the maximum data flow of each main video source. This model is referred to as the 'traffic model', and the parameters which describe the maximum data flow of the source are referred to as 'traffic descriptors.' These traffic descriptors are then used in the admission control mechanism to obtain the rate and maximum waiting time of the incidental streams. Our approach to this service class studies the transmission system under a worst-case scenario. In this scenario, all the main sources send the maximum possible traffic to the transmission system for long periods of time; thus, during these times the bandwidth available for incidental streams is minimal. Using this minimum-bandwidth knowledge, the rate and the maximum waiting time of incidental streams are then determined by the admission control mechanism. This ensures that even under worst-case conditions, there exists sufficient bandwidth to transmit the incidental data. The traffic descriptor of the main streams is used to specify the maximum traffic that the main sources can generate in any time period of length t. This traffic descriptor is known as the 'traffic constraint function' in the literature, and is defined in Section 2.2. This approach also requires that the traffic model capture the worst-case burstiness of a traffic source. That is, traffic models which capture the statistical properties of traffic are not needed here. The rest of this chapter is organized as follows. In Section 2.2, we discuss the current deterministic traffic models, and select the most suitable model for our application. The model we employ is called the (σ, ρ) model. We also show how the traffic descriptor is obtained from this model.
In Section 2.3, we address the issues that arise in fitting the (σ, ρ) model to empirical traffic traces, and present efficient solutions for these issues. In Section 2.4, we present an admission control scheme. In Section 2.5, we present numerical results of applying the methods presented in this chapter to actual traffic sources.

2.2 Traffic Characterization

In this section we find a mathematical model for characterizing the traffic of TV video sources (i.e., the main streams). As described earlier, the term 'traffic of a video source' refers to the amount of data bits generated by the video encoder. The mathematical model that we seek should provide a deterministic bound on the amount of traffic a video source generates in any time interval. The important features of a good traffic characterization model are thus 1) accuracy in characterizing the traffic, 2) simplicity of implementation, and 3) ability to capture the useful characteristics of traffic on different time scales. For example, though the peak-to-average ratio of the bitrate of a source can roughly show what the burstiness of the source looks like on a large time scale, it does not incorporate any information about the burstiness of the source on short time scales. Therefore, it cannot be used in the design of an efficient admission test. Although there is a great deal of related work on traffic characterization, much of this work cannot be applied to our problem. Most of these methods characterize the video sources using sophisticated stochastic models, such as Markov models [6-8], autoregressive [9,10], self-similar [11,12] and S-BIND [13,14] models. These approaches are all stochastic and do not provide a deterministic bound on the traffic. The problem of deterministic characterization of video has been studied for other applications using different approaches [6,15-18]. These approaches study video bitrate variability at the frame level.
Since none of these approaches address the rate variability of full-screen video of TV programs at the scene level, the results of these studies are not valid for the proposed ITV application. In this section, we show how a deterministic traffic characterization is defined. Then we discuss the current parameterized deterministic traffic models, evaluate these models, and select a suitable model for the proposed ITV application.

2.2.1 Deterministic Traffic Characterization

We describe the traffic by means of a cumulative function, defined as the amount of data (e.g., number of bits or packets) generated by the source in the time interval [0, t]. This function is called the cumulative traffic function, and is defined as

A(t) = ∫₀ᵗ y(τ) dτ    (2.1)

where y(τ) is the bitrate of the source at time τ. We use discrete time, where the time parameter t corresponds to an integer representing the GOP number in the video sequence.¹ Therefore, in our application y(τ) denotes the number of bits generated by a video source during the τ-th GOP (i.e., y(τ) is the size of the τ-th GOP). In this case, (2.1) becomes

A(t) = Σ_{τ=0}^{t} y(τ),    ∀t ≥ 0    (2.2)

¹ A Group Of Pictures (GOP) in MPEG terminology is the group of frames between two consecutive I frames. The GOP length in most NTSC TV programs is 15. Therefore, with a frame rate of 30 fps, each GOP corresponds to 15/30 = 0.5 seconds.

The traffic characterization of a source is obtained by defining the traffic constraint function A*(t), which defines an upper bound on the amount of traffic generated over any time interval of length t. Thus

A*(t) ≥ A(s + t) − A(s),    ∀s ≥ 0    (2.3)

Note that the traffic constraint function A*(t) does not depend on s and hence provides a time-invariant bound for the function A. The traffic constraint function is always wide-sense increasing (i.e., A*(t) ≤ A*(t + τ) for τ ≥ 0).
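In discrete GOP time, Eqs. (2.2) and (2.3) are easy to check directly on a finite trace. The sketch below is illustrative (the function names are ours, not the thesis's): it computes A(t) from a list of GOP sizes and verifies a candidate traffic constraint function against every window.

```python
def cumulative_traffic(gop_sizes):
    """A(t) from Eq. (2.2): total bits generated in GOPs 0..t."""
    A, total = [], 0
    for y in gop_sizes:
        total += y
        A.append(total)
    return A

def satisfies_constraint(gop_sizes, A_star):
    """Check Eq. (2.3): A(s + t) - A(s) <= A*(t) for every window
    of t GOPs starting at GOP s."""
    n = len(gop_sizes)
    A = [0] + cumulative_traffic(gop_sizes)  # A[k] = bits in GOPs 0..k-1
    return all(A[s + t] - A[s] <= A_star(t)
               for t in range(1, n + 1) for s in range(n + 1 - t))
```

For instance, a toy trace of GOP sizes [3, 1, 2] satisfies A*(t) = 3t but violates A*(t) = 2t, since its largest single GOP already exceeds 2 bits.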
As discussed in [19] and [20], A*(t) defines a meaningful constraint only if it is subadditive, which means that A*(t + s) ≤ A*(t) + A*(s) for all s, t ≥ 0. If A*(t) is not subadditive, it can be replaced by its 'subadditive closure' [21]. The subadditive closure of a function f(t) is the function f'(t) defined by the following recursive equation:

f'(0) = 0,    f'(t) = min[ f(t), min_{0<s<t} { f'(s) + f'(t − s) } ],    t > 0.    (2.4)

Empirical Envelope

The empirical envelope is the tightest traffic constraint function of a source and is defined as

E(t) = max_{s≥0} { A(s + t) − A(s) },    ∀t ≥ 0    (2.5)

The empirical envelope indicates the maximum burst size that a source generates in any time interval of length t. The shape of the empirical envelope function carries important information about the burstiness of the source under worst-case conditions. For example, Figure 2.1 shows E(t) for a constant bitrate (CBR) and a typical variable bitrate (VBR) source. For the CBR source with rate R, the empirical envelope is a linear function of t with slope R, i.e., E(t) = Rt. For a VBR source with maximum bitrate R, E(t) is a concave non-decreasing function. For small t, E(t) of a VBR source is typically very close to Rt.

Figure 2.1: E(t) for a CBR and a typical VBR source.

Traffic characterization of multiplexed sources

Consider an ideal multiplexer with N input video sources. An ideal multiplexer does not delay the incoming traffic and generates a multiplexed stream of all the input streams. The instantaneous rate of the multiplexed stream is the aggregate instantaneous rate of the input streams. If the N input streams are each characterized by the traffic constraint functions A*_i(t), i = 1, 2, ...,
N, then the multiplexed stream has the traffic constraint function A*_mux given by

A*_mux(t) = Σ_{i=1}^{N} A*_i(t)    (2.6)

2.2.2 Parameterized Traffic Characterization Models

In order to use the concept of the traffic constraint function in a practical system, it is necessary to represent the function A*(t) with a parameterized model. Such a parameterized model significantly facilitates the design of the admission tests for the incidental streams. Besides, by using a parameterized model, the sources can efficiently specify their traffic characteristics to the system, as only a few parameters need to be conveyed. As discussed above, the criteria used to evaluate a traffic characterization model are accuracy, simplicity, and efficiency of the model in capturing meaningful information about the burstiness of the sources. From the perspective of bandwidth provisioning, the model should be accurate. This means that A*(t) should be as tight as possible, so that we do not overestimate the traffic of the source. Since the empirical envelope is the tightest bound on the traffic of a source, it is used as a benchmark for the accuracy of a traffic constraint function. While in general a model with more parameters can achieve a more accurate traffic constraint function, the additional parameters increase the complexity of the traffic model. Thus, the selection of an appropriate model must involve a compromise between accuracy and simplicity. There are five important parameterized deterministic traffic models studied in the literature, known as the peak rate model, the (r, T) model, the (σ, ρ) model, the (σ̄, ρ̄) model, and the D-BIND model.

The peak rate method

The peak rate model is the simplest and most widely used of all traffic models. In this model, the traffic is described with only one parameter, the peak rate R_max. The traffic constraint function for this model is given by A*(t) = R_max · t for all t.
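To see numerically how loose the peak-rate bound can be for bursty traffic, the sketch below (illustrative toy values only) compares A*(t) = R_max · t with the bits a small VBR trace actually generates:

```python
def peak_rate_bound(gop_sizes):
    """Peak-rate constraint A*(t) = R_max * t, with R_max taken as
    the largest per-GOP size observed in the trace."""
    R_max = max(gop_sizes)
    return lambda t: R_max * t

trace = [9, 1, 1, 1, 9, 1, 1, 1]   # bursty toy VBR trace
A_star = peak_rate_bound(trace)
# Over the whole 8-GOP trace the bound allows 9 * 8 = 72 bits,
# while the source actually generates only sum(trace) = 24 bits.
overestimate = A_star(len(trace)) / sum(trace)
```

Here the peak-rate model overestimates the long-interval traffic by a factor of 3, which is exactly the over-provisioning problem the tighter models below address.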
The model is usually enforced for video sources by the rate control section of the source encoder. Note that the peak-rate model is appropriate for specifying CBR traffic, but overestimates the traffic of VBR sources. This is illustrated in Figure 2.2-a, where the empirical envelope E(t) and the peak-rate traffic constraint function A*(t) are shown for a VBR source. As shown, A*(t) is not a good model for E(t) and overestimates the traffic over long time intervals (i.e., for large t's).

The (r, T) model

The (r, T) model describes the traffic with a rate parameter r and a framing interval T. In this model, time is partitioned into frames of length T and the traffic generated during each frame interval is limited to rT bits. Thus, this model enforces an average rate r, while allowing for moderate bursts. The traffic constraint function for this model is given by A*(t) = (⌈t/T⌉ + 1) rT, ∀t ≥ 0, which is illustrated in Figure 2.2-b. This model is most suitable for VBR sources with small fluctuations in their bitrate. If a VBR source has large fluctuations in its bitrate, then in order to capture all changes in the traffic a large framing interval T must be selected. However, selecting a large T usually results in overestimation of the traffic.

The (σ, ρ) model

The (σ, ρ) model describes the traffic with a burst parameter σ and a rate parameter ρ [15,16]. In this model, the traffic constraint function is A*(t) = σ + ρt. Hence, this model enforces a rate ρ, while allowing some burstiness up to σ. Figure 2.2-c shows the traffic constraint function A*(t) for this model. Though this model is very simple, it has been successfully used for efficient characterization of a large class of traffic sources. This model can easily be implemented using a leaky bucket traffic shaper.² Because of these attractive features, this model has been widely used in many traffic engineering applications.
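A conformance check for the (σ, ρ) model can be sketched as a token bucket in discrete GOP time: the bucket holds at most σ bits of credit and earns ρ bits per GOP interval, so any window of t GOPs can admit at most σ + ρt bits. This is an illustrative sketch only, not the optimized leaky bucket implementations of [22,23]:

```python
def conforms_sigma_rho(gop_sizes, sigma, rho):
    """True if the trace obeys A*(t) = sigma + rho * t for all windows."""
    tokens = sigma                        # bucket starts full
    for y in gop_sizes:
        tokens += rho                     # credit earned this GOP interval
        if y > tokens:
            return False                  # GOP exceeds the (sigma, rho) budget
        tokens = min(tokens - y, sigma)   # spend credit; cap bucket depth
    return True
```

For example, the trace [3, 1, 2] conforms to (σ, ρ) = (2, 2), while [5, 1] does not, since its first GOP exceeds the σ + ρ bits allowed in a single interval.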
The (σ̄, ρ̄) model

The (σ̄, ρ̄) model is a generalization of the (σ, ρ) model. A (σ̄, ρ̄) model consists of a set of m (σ_i, ρ_i) pairs, 1 ≤ i ≤ m. The traffic is limited by each (σ_i, ρ_i) pair, i.e.,

A*(t) = min_{1≤i≤m} (σ_i + ρ_i t),    ∀t ≥ 0    (2.7)

Figure 2.2-d illustrates the A*(t) function for this model with m = 3 piecewise-linear segments. As shown, the traffic constraint function for this model consists of m piecewise-linear segments. By increasing the number m of (σ, ρ) pairs, the model yields a tighter and more accurate constraint function for the traffic. This is illustrated in Figure 2.3, where the (σ̄, ρ̄) model for a source is plotted for m = 1, 2 and 3. As shown, by increasing m, the traffic constraint function A*(t) gets closer to the empirical envelope of the source. However, practical considerations, such as implementation complexity, limit the size m of the model. Therefore, there is a trade-off between the accuracy of the model (which usually requires a large m) and the simplicity of the model.

The D-BIND model

The D-BIND traffic model is a general traffic model that uses a number of rate-interval pairs {(R_i, I_i) | i = 1, ..., n} [24,25]. The maximum rate over any interval of length I_i is restricted to R_i for all pairs i. The traffic constraint function is given as follows:

A*(t) = [(R_i I_i − R_{i−1} I_{i−1}) / (I_i − I_{i−1})] (t − I_i) + R_i I_i,    for I_{i−1} < t ≤ I_i    (2.8)

The traffic constraint function of the D-BIND model thus consists of n piecewise-linear segments, as shown in Figure 2.2-e. Note that the (σ̄, ρ̄) model can be viewed as a special case of the D-BIND model, since the (σ̄, ρ̄) model defines an n-segment concave piecewise-linear constraint function. It should be noted that the traffic constraint function of the D-BIND model may in some instances not be subadditive.

² Efficient implementation of the leaky bucket mechanism is discussed in [22,23].

Figure 2.2: Traffic constraint function for different traffic models.
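Eq. (2.7) is straightforward to evaluate; the helper below (an illustrative sketch, names ours) builds A*(t) from a list of (σ_i, ρ_i) pairs:

```python
def multi_pair_constraint(pairs):
    """A*(t) = min_i (sigma_i + rho_i * t) from Eq. (2.7);
    `pairs` is a list of (sigma_i, rho_i) tuples."""
    return lambda t: min(sigma + rho * t for sigma, rho in pairs)
```

For pairs (0, 10) and (6, 2), the steep first segment dominates for small t and the flatter second segment for large t, producing the concave piecewise-linear shape of Figure 2.2-d.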
Figure 2.3: A*(t) for the (σ̄, ρ̄) model for m = 1, 2 and 3.

Several studies have evaluated these deterministic traffic models for modelling video traffic with respect to the accuracy and simplicity criteria (see [26,27] and [17]).³ In these studies, the simplicity of the models is evaluated based on the complexity of implementing the admission control tests and the traffic monitoring and policing mechanisms.⁴ Meanwhile, it is shown in [27] how the parameters of each model can be expressed in terms of the other models. This enables a direct comparison of these models.

³ In all of these studies, the traffic of the video source is considered at the frame level or ATM cell level.

⁴ In traffic monitoring, the traffic of the source is monitored in real time to make sure that it complies with its traffic characterization model. If the real traffic does not comply with the model, the traffic shape is enforced to follow the model by a mechanism called traffic policing.

All of these studies indicate that the (σ, ρ) model is superior to the peak-rate and (r, T) models. However, they also show that the use of a single (σ, ρ) pair usually cannot achieve an acceptable accuracy for most applications. It is shown in [17] that the (σ̄, ρ̄) model, which employs multiple (σ, ρ) pairs, can accurately characterize VBR video. We employ the (σ̄, ρ̄) model, as it is known to be simple and accurate for modelling VBR video [17,26,27]. In order to evaluate the (σ̄, ρ̄) model with respect to its ability to capture useful characteristics of traffic for our application, we studied the empirical envelopes of several typical TV programs. Figure 2.4 shows E(t) for two typical TV programs, a movie and a news program. The results show that E(t) is a concave increasing function, with two expected characteristics. 1) For very small t's, E(t) is almost linear, that is, E(t) ≈ R_max · t, where R_max is the maximum rate of the source.
2) For very large t's, E(t) is also almost linear, with slope dE(t)/dt ≈ R_avg, where R_avg is the average rate of the source. For intermediate t's, E(t) is concave and increasing. In order to capture these two important characteristics of E(t), we select the first (σ, ρ) pair of the model as σ_1 = 0 and ρ_1 = R_max. In addition, the rate parameter of the last (σ, ρ) pair is set to ρ_m = R_avg. This selection captures the two important characteristics of E(t) in our model. The parameters of the other (σ, ρ) pairs should be found so that A*(t) models the concave section of E(t).

2.3 Fitting the (σ̄, ρ̄) Model to a Source

We here address how to construct an accurate (σ̄, ρ̄) model for a traffic source, specifically a video source, using a few (σ, ρ) pairs and a reasonable amount of computational effort. Finding an efficient and practical way of constructing a model which offers the right trade-off between accuracy, size and computational effort is a real challenge. Such a model should be accurate in order to achieve high bandwidth utilization, and should include as few (σ, ρ) pairs as possible, so that it can be used in practical admission control schemes [27]. Computation time is very important for online traffic sources, where the (σ̄, ρ̄) model must be constructed by monitoring samples from an online traffic source.

There are many methods for selecting the (σ̄, ρ̄) model parameters whose design objective is not to strive for high bandwidth utilization, but instead to select the traffic parameters according to bandwidth availability. These methods include dual leaky bucket, fixed burst [28], concave hull [17], product [29] and maximum distance [17].

Figure 2.4: E(t) for two typical TV programs (panel (b): the "Mission Impossible" movie). The traffic is normalized to its maximum rate, i.e., the maximum rate of these sources is 1.
Characterization of a VBR traffic source with the goal of achieving high bandwidth utilization using the (σ̄, ρ̄) model is studied in [1,17,30]. The benchmark in these methods for evaluating the accuracy of a traffic characterization is the empirical envelope of the source. Since E(t) is the most accurate traffic constraint function for a source, the approach of these methods is to first construct E(t), and then construct A*(t) as an approximation of E(t). However, constructing E(t) for all t is extremely time consuming and is not practical in real-time applications. Existing methods use extrapolation techniques to reduce the computational load of finding E(t) for large time intervals (i.e., large t's), a process that in turn results in rough estimates of E(t). Moreover, finding the model parameters from the empirical envelope is also a challenge. Existing methods are based on a 'brute force' search approach, which is extremely time consuming.

In this section, we present a new algorithm for constructing the empirical envelope of a source. We show that our method results in a better approximation of E(t) when compared to existing methods. Moreover, due to its speed and iterative design, our method can easily be employed for on-line traffic sources, where the source traffic is not known a priori and the speed of the traffic characterization algorithm is important. We also present a unique and robust algorithm for obtaining the optimum model parameters from E(t). Since E(t) is the most accurate constraint function for a source, we select the (σ̄, ρ̄) model parameters so that A*(t) ≥ E(t) for all t, where A*(t) = min_i(σ_i + ρ_i t), and A*(t) is as close to E(t) as possible. We use the 'divide-and-conquer' approach in our algorithm, and set up the problem such that a powerful optimization method, called Sequential Quadratic Programming, can be applied to it.
Unlike other methods in the literature, our method finds the model parameters directly from the true or sub-sampled E(t). In addition, our method is faster and more robust than the current methods, and results in a near-optimum model. In Section 2.3.1, we review previous works related to fitting a (σ̄, ρ̄) model to a traffic source. We present our method for constructing the empirical envelope in Section 2.3.2. In Section 2.3.3, we present our method for obtaining the (σ̄, ρ̄) model parameters from the empirical envelope.

2.3.1 Overview of Current Methods for Fitting the (σ̄, ρ̄) Model to a Traffic Source

As mentioned above, all current methods consist of two parts: first, constructing the empirical envelope E(t), and second, finding the (σ̄, ρ̄) parameters from E(t). We now discuss the current methods for each part.

Constructing E(t) from the traffic

In [17], the empirical envelope E(t) is obtained by running an exhaustive search and finding the maximum burst size in the entire stream. More precisely, if the instantaneous traffic rate of a source is given by y(i) and the total length of the source is N, then E(t) is constructed by calculating:

E(t) = max_{1 ≤ k ≤ N−t+1} Σ_{i=k}^{k+t−1} y(i) (2.9)

The drawback of this approach is its extensive computational complexity: in order to compute E(t) for 1 ≤ t ≤ N, O(N^2) operations are required. In addition, N is generally very large, especially for video traffic sources (e.g., N ≈ 10^5 for a 2-hour movie when the traffic is considered at the frame level). Hence, for the majority of video sources, it is not practical to compute E(t) using equation 2.9. There are methods in the literature that strive to reduce this computational complexity [1,30]. In [1,30], E(t) is approximated using extrapolation. That is, E(t) for 1 ≤ t ≤ τ is computed using equation 2.9, and E(t) for t > τ, denoted by E_τ(t), is extrapolated from E(t), 1 ≤ t ≤ τ.
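For concreteness, the exhaustive-search construction of equation 2.9 can be sketched as follows (our own illustration, not code from the thesis; the trace and names are hypothetical):

```python
# Brute-force empirical envelope per equation 2.9:
# E(t) = maximum total traffic over any window of length t.
def empirical_envelope(y):
    """y[i] = traffic (e.g., packets) generated in slot i; returns [E(1), ..., E(N)]."""
    N = len(y)
    E = []
    for t in range(1, N + 1):
        # scan all N - t + 1 windows of length t; O(N^2) overall, as noted above
        E.append(max(sum(y[k:k + t]) for k in range(N - t + 1)))
    return E

trace = [3, 1, 4, 1, 5, 9, 2, 6]
# E(1) is the peak slot, E(N) the total traffic, and E is nondecreasing:
assert empirical_envelope(trace) == [9, 14, 17, 22, 23, 27, 28, 31]
```

Even this tiny example makes the cost visible: every t rescans the whole trace, which is exactly what the extrapolation and sampling methods below try to avoid.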
The extrapolation method used is either the 'largest subadditive closure', with computational complexity O(N^2) (the same complexity as if E(t) were constructed using equation 2.9), where

E_τ(t) = min_{0 < k < t} {E(k) + E(t − k)}, for t > τ (2.10)

or the 'repetition extrapolation', with computational complexity O(τN), where E_τ(t) for t > τ is obtained by simply repeating the first τ values of the envelope (i.e., E(t), 1 ≤ t ≤ τ):

E_τ(t) = ⌊t/τ⌋ E(τ) + E(t − ⌊t/τ⌋ τ), for t > τ (2.11)

The parameter τ is experimentally selected for each application. The disadvantage of this extrapolation approach is that it results in high utilization only for small maximum waiting times T_w. For large T_w's, traffic characterization based on E_τ(t) results in poor network utilization. This is because for large t's, E_τ(t) is not a good approximation of E(t). This is illustrated in Figure 2.5, where E(t) and E_τ(t) for a typical video source are shown. We observe that E_τ(t) is not a good approximation of E(t) for large t's.

Finding the model parameters from E(t)

Once E(t) (or E_τ(t)) is found, A*(t) is constructed as a piece-wise linear approximation of E(t). The current approach includes two steps. First, since E(t) is not necessarily concave and sub-additive, it is replaced with the concave hull of E(t), denoted by H(E) [17,18]. H(E) is the smallest piece-wise linear concave function larger than E [31]. Theoretically, H(E) could be used as A*. However, the number of piece-wise linear segments in H(E) is usually very large, resulting in an impractically large model size. For this reason, H(E) is replaced with another piece-wise linear function that has only a few linear segments. The idea behind this approach is to use a cost function to measure the difference between H(E) and the traffic constraint function of the new model (say A*_n), where the model size n is small enough. Then an algorithm with a heuristic approach is used to find the (σ, ρ) parameters of the A*_n model. In this algorithm, each (σ_i, ρ_i) pair is updated in each iteration through an exhaustive search over all possible values for that pair, to find the one that minimizes the cost function (see [1,30]). The major drawbacks of this method are: 1) the computation of the concave hull of E and the heuristic approach to reduce the model size are both computationally expensive; 2) the heuristic method to reduce the model size is not always guaranteed to converge to a result; and 3) this method might converge to a local minimum of the cost function, and thus does not guarantee that the optimum parameters are found.

Figure 2.5: E(t) obtained from the direct approach, largest subadditive closure extrapolation, repetition extrapolation for τ = 400, and sampling with δ = 400. The source is a VBR MPEG-2 sequence, selected from the motion picture 'Mission Impossible', with 720 x 480 resolution, frame rate 30 and maximum bitrate of 5 Mbps.

2.3.2 Our Approach to Constructing the Empirical Envelope

Here, we present a method that finds the exact values of E(t) for all 1 ≤ t ≤ τ. For t > τ, we find E(t) at equally spaced samples t = δ, 2δ, ..., nδ (see Figure 2.5). The sampling interval δ is a positive integer. If δ < τ, then the samples at t = iδ for which iδ ≤ τ are repeated values and need not be computed again. For simplicity, we present our algorithms for constructing E(t) for t ≤ τ and for the samples of E(t) separately in the two parts of this section below. The numbers of operations required to construct E(t) for 1 ≤ t ≤ τ and the equally spaced samples of E(t) are O(τN) and O(nN), respectively.

Construction of E(t) for t ≤ τ

Assume the amounts of data generated by the source, y(i) for i = 1, 2, ..., N, are given, where N is the total length of the source. In our case, N is the total number of GOPs in the whole video sequence. We would like to find E(t) for t = 1, 2, ..., τ, where τ < N.
E(t) is given by [17]

E(t) = max_{1 ≤ k ≤ N−t+1} Σ_{i=k}^{k+t−1} y(i) (2.12)

We define the vector s_k of size τ × 1 as

s_k = [ y(k), y(k) + y(k−1), ..., Σ_{i=k−τ+1}^{k} y(i) ]^T (2.13)

that is, s_k(i) is the total traffic in the window of length i that ends at slot k. The objective is to construct s_k for k = 1, 2, ..., N. The empirical envelope is then computed as E(i) = max_k(s_k(i)). s_k is easily constructed from s_{k−1} by shifting the elements of s_{k−1} down and adding y(k) to the result. Our algorithm consists of the following steps:

1. Let k = 1 and initialize s_1 and e as follows:

s_1 = [ y(1) 0 ... 0 ]^T (τ × 1) (2.14)
e = s_1 (2.15)

2. Let k = k + 1. Find s_k and e using

s_k = (shift elements of s_{k−1} down by one) + [ y(k) y(k) ... y(k) ]^T (2.16)
e = max(e, s_k) (2.17)

3. Repeat step 2 until the last input is reached, i.e., k = N.

When the algorithm finishes, e is the empirical envelope of the source: e(i) = E(i) for 1 ≤ i ≤ τ. The computational complexity of our algorithm is only O(τN), which is considerably less than the O(τ^2 N) of a brute-force approach using equation 2.12. Due to the iterative structure of our method, it can easily be adopted in on-line applications, where the source traffic y(i) is not known a priori.

Construction of samples of E(t) for t = δ, 2δ, ..., nδ

Given the traffic source y(i), we would like to compute E(t) for t = δ, 2δ, ..., nδ, where n is the number of samples to be computed. Let

A_k(i) = Σ_{l=k−iδ+1}^{k} y(l) (2.18)

Then, using equation 2.12, we have E(iδ) = max_k A_k(i). Our goal here is to construct A_k(i) in an efficient way, rather than computing Σ_{l=k−iδ+1}^{k} y(l) for all 1 ≤ k ≤ N and 1 ≤ i ≤ n. Our algorithm iteratively constructs A_k for k = 1, 2, ..., N and 1 ≤ i ≤ n. The key idea of our algorithm is that for each new k, we efficiently re-use some pre-computed terms to construct A_k(i). By doing so, our algorithm drastically reduces the number of operations required to construct each A_k(i). First, we define the vector A_k of size n × 1, where A_k(i) is the sum of the iδ consecutive inputs ending at y(k), as defined in equation 2.18.
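The shift-and-add recursion of equations 2.13-2.17 (the exact part of the envelope, t ≤ τ) can be sketched as follows. This is our own minimal illustration with hypothetical names; it also checks the result against the brute force of equation 2.12:

```python
# Shift-and-add construction of the exact empirical envelope for t <= tau.
# s[i-1] holds s_k(i): the traffic in the window of length i ending at slot k.
def envelope_iterative(y, tau):
    s = [y[0]] + [0] * (tau - 1)          # s_1  (equation 2.14)
    e = list(s)                           # e = s_1  (equation 2.15)
    for k in range(1, len(y)):
        # shift s down by one and add y(k) to every element (equation 2.16)
        s = [y[k]] + [y[k] + v for v in s[:-1]]
        e = [max(a, b) for a, b in zip(e, s)]   # e = max(e, s_k)  (equation 2.17)
    return e                              # e(i) = E(i) for 1 <= i <= tau

trace, tau = [3, 1, 4, 1, 5, 9, 2, 6], 4
brute = [max(sum(trace[k:k + t]) for k in range(len(trace) - t + 1))
         for t in range(1, tau + 1)]
assert envelope_iterative(trace, tau) == brute   # O(tau*N) matches the brute force
```

Note that early iterations produce zero-padded partial windows; since the traffic is nonnegative, these never exceed the true maxima, so the running elementwise maximum is still exact.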
It can be easily shown that

A_k(i) = A_k(i−1) + Σ_{l=k−iδ+1}^{k−(i−1)δ} y(l) = A_k(i−1) + A_{k−(i−1)δ}(1), for i > 1 (2.19)

Our algorithm relies on equation 2.19 to iteratively construct A_k for k = 1, 2, ..., N. Equation 2.19 requires A_{k−(i−1)δ}(1), which was already computed in previous iterations. Hence, we save the first element of A_k in each iteration for future use. For this purpose, we use a vector Λ of size ((n−1)δ + 1) × 1, where A_k(1) is pushed into Λ in each iteration. In the k-th iteration, we have Λ(i) = A_{k−i+1}(1), and equation 2.19 becomes

A_k(i) = A_k(i−1) + Λ((i−1)δ + 1), for i > 1 (2.20)

The empirical envelope samples, i.e., E(t) at t = iδ, are easily obtained as max_k A_k(i). The algorithm is summarized as follows.

1. Let k = 1. Initialize the n × 1 vector A, the n × 1 vector E_δ, and the ((n−1)δ + 1) × 1 vector Λ as:

E_δ = A = [ y(1) y(1) ... y(1) ]^T (2.21)
Λ(i) = y(1) for i = 1; Λ(i) = 0 for 1 < i ≤ (n−1)δ + 1 (2.22)

2. Let k = k + 1. Update A(1) as

A(1) = A(1) + y(k) − y(k − δ) (2.23)

Note that if this algorithm is executed in parallel with the algorithm presented in Section 2.3.2 and δ ≤ τ, then we have A(1) = s_k(δ), and this step of the algorithm can be skipped.

3. Update the vector Λ:

Λ(i) = A(1) for i = 1; Λ(i) = Λ(i−1) for 1 < i ≤ (n−1)δ + 1 (2.24)

4. Update A(i) for i > 1 using equation 2.20.

5. Update E_δ = max(E_δ, A).

Steps 2 to 5 of the algorithm are repeated for k = 2, 3, ..., N. When the algorithm finishes, we have E(iδ) = E_δ(i), 1 ≤ i ≤ n.

Algorithm 1 Find the (σ̄, ρ̄) model parameters.
INPUTS: E(t) for t = 1, 2, ..., N; a criterion to end the algorithm (i.e., the size n of the (σ̄, ρ̄) model, or the maximum acceptable error in the model).
OUTPUT: σ_i and ρ_i for 1 ≤ i ≤ n.
1: Fit a single (σ, ρ) to E(t) for t = 1, 2, ..., N
2: A*(t) = σ + ρt
3: while A*(t) does not satisfy the accuracy criterion and the model has not reached its maximum size do
4:   Find [T1, T2] such that for all t ∈ [T1, T2], A*(t) overestimates E(t) by more than a threshold
5:   Fit a single (σ, ρ) to E(t) for t ∈ [T1, T2]
6:   Add the new (σ, ρ) to the model
7:   A*(t) = min_i(σ_i + ρ_i t)
8: end while

Figure 2.6: Pseudo-code of our algorithm for obtaining the (σ̄, ρ̄) model parameters from E(t).

2.3.3 Obtaining the (σ̄, ρ̄) Parameters from the Empirical Envelope

Our method for finding the (σ̄, ρ̄) model parameters from E(t) follows a divide-and-conquer approach [31]. In this approach, the problem is broken into subproblems which are similar to the original problem but smaller in size; the subproblems are solved, and the results are then combined to create the solution to the original problem. Following this technique, we divide our problem into subproblems of fitting a single (σ, ρ) to E(t) for a specific range of t, say t ∈ [T1, T2]. This subproblem has a smaller size than the original problem and is easier to solve. Below, we describe how we solve this subproblem, and how all the results are combined to obtain the final (σ̄, ρ̄) model.

Suppose E(t), for t = 1, 2, ..., N, is given. Our algorithm first fits a single (σ, ρ) model to the whole input data. That is, we find σ and ρ so that σ + ρt ≥ E(t), and σ + ρt is a good approximation of E(t), for t = 1, 2, ..., N. Then we proceed by reducing our problem to a subproblem of smaller size. For this, we first select an interval [T1, T2] such that A*(t) is not a satisfactory estimate of E(t) for all t ∈ [T1, T2]. For example, we find a [T1, T2] such that (A*(t) − E(t))/E(t) is greater than a threshold for all t ∈ [T1, T2]. If there is more than one such interval, we select the one where A*(t) is the worst estimate of E(t). The same approach is then used, and a single (σ, ρ) is fitted to E(t) for t ∈ [T1, T2].
This adds a new (σ, ρ) pair to our model. This procedure of adding a single (σ, ρ) pair to the model is repeated until a criterion for ending the algorithm is met. This criterion depends on the application, and is either a maximum number of (σ, ρ) pairs in the model or an accuracy criterion. For example, in some applications practical considerations may require that the model size not exceed a certain size, say n. In this case, the algorithm ends when n (σ, ρ) pairs have been added to the constructed model. On the other hand, some applications may require that a certain level of accuracy be preserved in the model, e.g., that A*(t) not overestimate E(t) by more than a threshold, say p%. In this case, we add (σ, ρ) pairs to the model until A*(t) satisfies this accuracy criterion.

Our sub-problem: fitting a single (σ, ρ) to E(t)

In each iteration of our algorithm, we need to solve the sub-problem of fitting a single (σ, ρ) to a part of E(t). That is, given E(t), T1, and T2, we should find σ and ρ such that: 1) σ + ρt ≥ E(t) for all 1 ≤ t ≤ N; this constraint ensures that the constructed model is concave and does not underestimate E(t) for any t; and 2) σ + ρt is an optimum approximation of E(t) for t ∈ [T1, T2]. In order to measure the closeness between A*(t) and E(t) for t ∈ [T1, T2], we use the error function defined as:

error(σ, ρ) = Σ_{t=T1}^{T2} (σ + ρt − E(t)) = (T2 − T1 + 1)(σ + ((T1 + T2)/2) ρ) − Σ_{t=T1}^{T2} E(t) (2.25)

The terms Σ_{t=T1}^{T2} E(t) and T2 − T1 + 1 do not depend on σ or ρ. Hence, we only need to minimize the function

error(σ, ρ) = σ + ((T1 + T2)/2) ρ (2.26)

subject to the constraint σ + ρt ≥ E(t) for all 1 ≤ t ≤ N. This is a classic optimization problem, and there are many standard approaches available in the literature to solve such a problem [32]. We choose to employ the Sequential Quadratic Programming (SQP) method to solve this problem in our application.
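The thesis solves this subproblem with SQP; since the objective (equation 2.26) and the constraints are all linear, the same subproblem can also be illustrated with a tiny vertex-enumeration sketch of our own (names and tolerances are ours, not from the thesis): an optimum line can be found among lines through pairs of envelope points, plus the flat line through the peak.

```python
# Fit a single (sigma, rho): minimize sigma + ((T1+T2)/2)*rho  (equation 2.26)
# subject to sigma + rho*t >= E(t) for all t = 1..N.
def fit_single_pair(E, T1, T2):
    N = len(E)                       # E[t-1] = E(t) for t = 1..N
    tm = 0.5 * (T1 + T2)
    feasible = lambda s, r: all(s + r * t >= E[t - 1] - 1e-9 for t in range(1, N + 1))
    best, best_err = (float(max(E)), 0.0), max(E) + 0.0   # flat line through the peak
    # candidate lines through two envelope points (a, E(a)) and (b, E(b))
    for a in range(1, N + 1):
        for b in range(a + 1, N + 1):
            r = (E[b - 1] - E[a - 1]) / (b - a)
            s = E[a - 1] - r * a
            if r >= 0 and s >= 0 and feasible(s, r):
                err = s + tm * r
                if err < best_err:
                    best, best_err = (s, r), err
    return best

E = [9, 14, 17, 22, 23, 27, 28, 31]        # a toy empirical envelope (packets)
sigma, rho = fit_single_pair(E, 1, 8)
assert all(sigma + rho * t >= E[t - 1] for t in range(1, 9))   # never underestimates
```

The O(N^2) enumeration is of course only illustrative; SQP (or any LP solver) reaches the same optimum far more efficiently on realistic envelope sizes.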
SQP is a robust, state-of-the-art technique for solving optimization problems. For implementation details of this technique, see [32,33]. Using the SQP technique, we easily minimize the function defined in equation 2.26, and find the optimum σ and ρ for our problem within a few iterations.

Figure 2.7: Divide-and-conquer approach in our algorithm. (a) First step: a single (σ1, ρ1) is fitted to E(t). (b) Second step: since A*(t) in step one was not a satisfactory estimate of E(t) for [T1, T2], the pair (σ2, ρ2) is fitted to E(t) for t ∈ [T1, T2]. (c) Third step: the pair (σ3, ρ3) is fitted to E(t) for t ∈ [T1, T2].

Note that there are other approaches to finding the parameters of a single (σ, ρ) model, such as the 'product method' [29], 'maximum distance' [34] and 'fixed burst' [28]. These methods use different optimality criteria for selecting the parameters. We should point out that instead of using SQP, any of the above-mentioned approaches can be used to solve the subproblem of fitting a single (σ, ρ).

2.4 Admission Control

In this section we present a method which finds a bound on the waiting time of the incidental stream data. This method is based on the 'Network Calculus' theory [19,35,36]. In Section 2.4.1 we discuss the basics of Network Calculus and describe how the bound on the waiting time is obtained. Then, in Section 2.4.2, we present an algorithm that employs this method and finds T_w.

2.4.1 Network Calculus Basics

Consider a general system S which is viewed as a black box (Figure 2.8); S receives input data at the variable rate y_in(t) and delivers the data, after a variable waiting time, at the variable rate y_out(t). In the proposed ITV application, S is the multiplexer on the transmitter side, which multiplexes the main and incidental streams.

Figure 2.8: The general system S.
We define the cumulative amounts of data input to and output from S as

A_in(t) = Σ_{τ=0}^{t} y_in(τ), ∀t ≥ 0
A_out(t) = Σ_{τ=0}^{t} y_out(τ), ∀t ≥ 0 (2.27)

A_in(t) and A_out(t) are called the 'arrival' and 'departure' functions, respectively [35,36]. The arrival and departure functions for a sample system are illustrated in Figure 2.9.

Figure 2.9: The arrival and departure functions: (a) continuous time, where A_in and A_out are defined for all t ≥ 0; (b) discrete time, where A_in and A_out are only defined at discrete times, denoted by dots.

From the arrival and departure functions we derive the following two quantities:

Backlog: The backlog at time t is the amount of data waiting in the system S at time t, and is given by A_in(t) − A_out(t). As shown in Figure 2.9, the backlog at t1 is simply the vertical distance between A_in(t1) and A_out(t1).

Waiting time (or delay): The waiting time at time t is the time that the incoming data at time t will wait in the system S before being served. The waiting time for the data that is input to the system at time t is given by:

d(t) = min{τ ≥ 0 : A_in(t) ≤ A_out(t + τ)} (2.28)

The waiting time at t0 is illustrated by d(t0) in Figure 2.9. If the traffic is continuous, then A_in(t0) = A_out(t0 + d(t0)), which means that all the input data to the system up to time t0 are served by time t0 + d(t0). As shown, for continuous traffic the waiting time is simply the horizontal distance between A_in and A_out.

Bounds on Waiting Time and Backlog

Network Calculus gives computational rules for bounding the waiting time and backlog. Before discussing how these bounds are obtained, we need to define the service curve and the horizontal deviation concepts:

Service Curve: Assume A_in(t) and A_out(t) are given functions. We say that the system S offers the input a service curve β(t)
if and only if for all t ≥ 0 there exists some s, 0 ≤ s ≤ t, such that

A_out(t) − A_in(t − s) ≥ β(s) (2.29)

where β(t) ≥ 0 for all t ≥ 0. The service curve is an abstract concept, and indicates the capacity of the system to accommodate traffic during a time interval of length s. Roughly speaking, β(s) is a lower bound on the amount of traffic that can depart from the system during any time interval of length s, that is, A_out(t) − A_out(t − s) ≥ β(s). To better understand the physical meaning of β(s), we write equation 2.29 as

A_out(t) ≥ A_in(s) + β(t − s) (2.30)

Then, a more precise physical interpretation of β is that if s is the beginning of a busy period, that is, the backlog at s is zero (A_in(s) − A_out(s) = 0) and there are always some data waiting in the system during [s, t], then the system will send at least β(t − s) data units in [s, t].

Horizontal deviation: The horizontal deviation between the arrival and departure functions, denoted by h(A_in, A_out), is defined as the maximum of all the waiting time values d(t), and mathematically is defined as

h(A_in, A_out) = max_t d(t) = max_t {min{τ ≥ 0 : A_in(t) ≤ A_out(t + τ)}} (2.31)

Now assume the input traffic to the system is characterized by the traffic constraint function A*_in(t). This means that for all t ≥ s ≥ 0 (see Section 2.2)

A_in(t) − A_in(s) ≤ A*_in(t − s) (2.32)

Two theorems in Network Calculus state that the backlog and waiting time in a system are bounded, respectively, by the vertical and horizontal deviations between the traffic constraint function of the input, A*_in(t), and the service curve of the system, β(t) (see [19,20,35,36]).
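On discrete-time traces, the backlog and waiting-time definitions above (equation 2.28 and the backlog formula) can be evaluated directly. A minimal sketch of our own, with a hypothetical trace:

```python
# Backlog and waiting time from discrete arrival/departure functions.
# A_in[t], A_out[t] are cumulative packet counts at t = 0, 1, 2, ...
def backlog(A_in, A_out, t):
    return A_in[t] - A_out[t]          # vertical distance at time t

def waiting_time(A_in, A_out, t):
    # d(t) = min{tau >= 0 : A_in(t) <= A_out(t + tau)}  (equation 2.28)
    for tau in range(len(A_out) - t):
        if A_in[t] <= A_out[t + tau]:
            return tau
    return None                        # not served within the trace

# Toy trace: a burst of 5 packets arrives at t = 1, served at 1 packet per slot.
A_in  = [0, 5, 5, 5, 5, 5, 5]
A_out = [0, 1, 2, 3, 4, 5, 5]
assert backlog(A_in, A_out, 1) == 4        # 4 packets still queued at t = 1
assert waiting_time(A_in, A_out, 1) == 4   # the last packet of the burst departs at t = 5
```

The horizontal deviation h(A_in, A_out) of equation 2.31 is then simply the maximum of waiting_time over all t.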
Since we are interested in the bound on the waiting time, we only state the theorem that defines a bound on the waiting time:

Theorem 1 Assume a traffic source constrained by A*_in(t) traverses a system S that offers the service curve β(t). The waiting time d(t) for all t satisfies d(t) ≤ h(A*_in, β) [35,36].

For a proof of this theorem, see Appendix 1.

2.4.2 Waiting-Time Bound in the Proposed ITV Application

We model the multiplexer in our interactive TV application with the model shown in Figure 2.10-a. In this model, the inputs to the system are N main streams and one incidental stream. The main streams have transmission priority over the incidental stream, i.e., the multiplexer serves the incidental stream only when the main streams' buffers are empty. The rate of the outgoing channel is constant, equal to C packets per second. We assume the traffic of the i-th main stream is characterized by the traffic constraint function A*_i(t). As discussed in Section 2.2.1, all the main streams can be replaced with one equivalent high-priority stream, which is constrained by A*_H(t), where

A*_H(t) = Σ_{i=1}^{N} A*_i(t) (2.33)

This is shown in Figure 2.10-b.

Figure 2.10: (a) The transmitter model with N high-priority (main) streams and one low-priority (incidental) stream; (b) the equivalent model, where the N high-priority input streams are replaced with one equivalent stream.

We denote the arrival and departure functions for the low-priority input (i.e., the incidental stream) by A_in,L(t) and A_out,L(t). Our approach here is to first find the service curve for the low-priority input; then, using Theorem 1, we will find T_w. Consider an arbitrary time t.
Call s ≤ t the beginning of a busy period for the low-priority input, i.e., the backlog for the low-priority input at s is zero (A_in,L(s) = A_out,L(s)) and there is always some low-priority data waiting in the system during [s, t]. During [s, t], the high-priority inputs can send up to A*_H(t − s) packets to the system. Hence, the system will send at least C(t − s) − A*_H(t − s) packets of the low-priority input in [s, t]:

A_out,L(t) − A_out,L(s) ≥ C(t − s) − A*_H(t − s) (2.34)

Since the backlog at s is zero, we then have

A_out,L(t) − A_in,L(s) ≥ C(t − s) − A*_H(t − s) (2.35)

It follows from this equation that the service curve for the low-priority input is β_L(t) = Ct − A*_H(t). The traffic constraint function for a constant-bitrate incidental stream is given by A*_in,L(t) = R t, where R is the rate of the stream. We use Theorem 1 to find the maximum waiting time T_w in the proposed ITV application. That is, we consider a hypothetical system whose input is constrained by A*_in,L(t) and whose service curve is β_L(t). Theorem 1 states that T_w, defined as the maximum of d(t), is given by the horizontal deviation between A*_in,L and β_L, i.e., T_w = h(A*_in,L, β_L), where β_L(t) and A*_in,L(t) are given above.

Algorithm: Find T_w
INPUTS: (σ_i, ρ_i) pairs for i = 1, 2, ..., m; the incidental stream rate R; the incidental stream duration T; and the channel rate C.
OUTPUT: T_w
1: for i = 1 to m do
2:   t_i = (σ_i − σ_{i−1}) / (ρ_{i−1} − ρ_i)   {t_i is the abscissa of the intersection of the i-th and (i−1)-th line segments of β_L}
3:   d_i = ((R − C + ρ_i) t_i + σ_i) / R   {d_i is the horizontal distance between the line Rt and β_L(t) at t_i}
4: end for
5: if T ≠ ∞ then
6:   Find T' such that β_L(T') = R × T
7:   d_T = T' − T
8: else
9:   if max_i{C − ρ_i} < R then
10:     d_T = ∞
11:   else
12:     d_T = 0
13:   end if
14: end if
15: T_w = max{d_T, max_i{d_i}}
16: RETURN T_w

Figure 2.11: Pseudo-code of our algorithm to find T_w.
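For an unlimited-duration incidental stream, the algorithm reduces to elementary piece-wise linear geometry: evaluate the horizontal distance at each breakpoint of β_L. A sketch of our own (the two-segment main-stream model and all numbers are hypothetical):

```python
# Maximum waiting time T_w for a low-priority CBR stream of rate R behind
# high-priority traffic A*_H(t) = min_i(sigma_i + rho_i * t) on a channel of rate C.
# The low-priority service curve is beta_L(t) = C*t - A*_H(t) (equation 2.35).
def find_Tw(pairs, R, C):
    # pairs sorted with sigma increasing and rho decreasing
    if max(C - rho for _, rho in pairs) < R:
        return float('inf')            # beta_L grows slower than R*t: no bound exists
    d = [0.0]                          # deviation is zero at the origin
    for i in range(1, len(pairs)):
        s0, r0 = pairs[i - 1]
        s1, r1 = pairs[i]
        t_i = (s1 - s0) / (r0 - r1)    # breakpoint of beta_L
        beta = C * t_i - min(s + r * t_i for s, r in pairs)
        d.append(t_i - beta / R)       # horizontal distance from R*t back to beta_L
    return max(d)

# Example: C = 10, main stream A*_H(t) = min(8t, 4 + 4t), incidental rate R = 3.
# Breakpoint t_1 = 1, beta_L(1) = 2, so T_w = 1 - 2/3 = 1/3.
Tw = find_Tw([(0.0, 8.0), (4.0, 4.0)], R=3.0, C=10.0)
assert abs(Tw - 1.0 / 3.0) < 1e-9
```

The sketch assumes σ_1 = 0 so that β_L is nonnegative from the start; with σ_1 > 0 the early negative part of Ct − A*_H(t) would have to be clamped to zero.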
2.4.3 Our Algorithm to Find T_w

Our algorithm is presented in Figure 2.11; it uses the method presented in the previous section to find T_w. The inputs to the algorithm are the m parameter pairs (σ_i, ρ_i) of the main-stream constraint function A*_H(t), the channel output rate C, the duration T of the incidental stream, and the rate R of the incidental stream. We have

β_L(t) = Ct − A*_H(t) (2.36)
β_L(t) = Ct − min_i{σ_i + ρ_i t} = max_i{−σ_i + (C − ρ_i) t} (2.37)

As shown in Figure 2.12, β_L(t) is a convex, piece-wise linear and non-decreasing function.

Figure 2.12: β_L(t) and A*_L(t).

First, the algorithm finds t_i for i = 1, 2, ..., m, where t_i is the abscissa of the intersection of the i-th and (i−1)-th line segments of β_L. In the next step, the algorithm computes the horizontal distance d_i between A*_L(t) and β_L(t) at t = t_i, for i = 1, 2, ..., m (see Figure 2.12):

d_i = t_i − β_L(t_i)/R = ((R − C + ρ_i) t_i + σ_i) / R (2.38)

If the incidental stream is a video or audio sequence of length T, then the horizontal deviation at t = T is also computed and denoted by d_T. Otherwise, that is, if the incidental stream is a video or audio sequence of unlimited duration, the algorithm checks whether β_L(t) ≥ Rt holds for very large time intervals (i.e., for t → ∞). If this condition is not met, the system cannot guarantee any maximum waiting time, and d_T is set to ∞. Finally, T_w is found as the maximum of all the d_i's and d_T.

2.5 Results

In this section, we present the numerical results of implementing the methods presented in this chapter. First, we evaluate the performance and accuracy of our model fitting methods presented in Sections 2.3.2 and 2.3.3 by comparing them with the current methods in the literature [1,17]. Then, we present the results of implementing our admission control scheme using empirical traffic traces from video sequences of typical TV programs. These results demonstrate how the rate R and the maximum waiting time T_w depend on each other in a typical digital TV system.
2.5.1 Numerical Results of 'Model Fitting' Methods

In our first experiment, we evaluated the performance and accuracy of our model fitting methods presented in Sections 2.3.2 and 2.3.3 by comparing them with the current methods in the literature [1,17]. The traffic source used in our experiment is an MPEG-2 video stream from the motion picture "Mission Impossible". This video stream was encoded with constant picture quality, with picture resolution 720 x 480, frame rate 30 and maximum bitrate of 4.5 Mbps. The length of this video is about one hour, which corresponds to N ≈ 10^5 frames. In our simulation, the time parameter is an integer t ∈ {0, 1, 2, ...} that represents the GOP number in the MPEG video stream. Since the frame rate of the video stream is 30 fps and the GOP size is 15, each GOP is 0.5 seconds; thus, a 25-minute sequence represents 3000 GOPs. The traffic in our simulation is also discrete and represents the number of packets. We use constant-size packets of 184 bytes, which conforms to the digital TV and MPEG-2 standards. For example, if a source generates 2 Mbits in the time interval [0, 0.5] (i.e., in the first GOP), then the discrete traffic is represented by y(1) = ceil[(2 x 10^6)/(184 x 8)] = 1359 packets. In order to make the results independent of the maximum bitrate of the source, we normalize the traffic to 1 by dividing y by its maximum. For example, if the maximum bitrate of the source is 4.5 Mbps and the GOP time is 0.5 seconds, then we have y(1) = 1359/1529 ≈ 0.88.

Footnote 5: The American and the European digital TV standards employ the MPEG-2 transport stream (TS) syntax for the transmission stream. Each packet in MPEG-2 TS syntax contains 184 bytes of data payload plus 4 bytes of header.

In our first experiment, we construct the empirical envelope E(t) and compare the speed of our algorithm with other methods in the literature. We construct E(t) using the direct method [17], the largest sub-additive extrapolation method [1], the repetition extrapolation method [1], and our method presented in Section 2.3.2. Table 2.1 summarizes the computation time of each method.
As shown, our algorithm's speed is almost the same as that of the repetition extrapolation approach, and both are considerably faster than the direct method. However, the extrapolation approaches do not estimate E(t) for t > τ closely, while our method finds the exact samples of E(t) for t > τ (see Figure 2.5).

Table 2.1: Execution time in seconds for calculating E(t) for 1 ≤ t ≤ 10000. Simulations were run on a PC with a Pentium IV processor at 1.7 GHz, using a Matlab implementation.

Computation method                         | Execution time (seconds)
Direct approach                            | 161.27
Repetition extrapolation, τ = 200          | 17.28
Sub-additive extrapolation, τ = 200        | 150.48
Our sampling method, τ = 200, δ = 50       | 23.60
Our sampling method, τ = 200, δ = 100      | 16.24
Our sampling method, τ = 200, δ = 200      | 7.89

In our second experiment, we fit a (σ̄, ρ̄) model to the whole empirical envelope E(t) using our method presented in Section 2.3.3 and the heuristic method presented in [1]. We computed the error function Σ_t (A*(t) − E(t))/E(t) as a metric for the accuracy of each model. Table 2.2 summarizes the results. As shown, our method results in a more accurate model for the source than the method in [1]. In order to evaluate the effect of our sampling approach on the accuracy of the constructed (σ̄, ρ̄) model, we also fit a (σ̄, ρ̄) model to the empirical envelope constructed by our sampling approach. Table 2.3 summarizes the results. As shown, the model parameters and the error function for the models constructed from the samples of E(t) are fairly close to those of the models constructed from the whole E(t). In our next experiment, we evaluate the accuracy of our method with respect to achieving high bandwidth utilization.
For this purpose, we use a metric that determines how closely a particular model A*(t) approximates E(t) with respect to bandwidth utilization [1]. We consider a single FCFS multiplexer with a switch that operates at r = 155 Mbps. We assume that all the input traffic sources connected to this switch are copies of the same source, all characterized by A*(t). We also assume that all the sources have an identical delay bound d. Then, as discussed in [15], n such sources are supported by this FCFS multiplexer without maximum waiting time violation if and only if

    d ≥ (1/r) [ Σ_{i=1}^{n} A*(t) - t x r ],   ∀t > 0        (2.39)

We define the 'utilization ratio', U(A*,d), as the number of admissible sources using A*(t) divided by the number of admissible sources using the empirical envelope E(t) at maximum waiting time d. That is, U(A*,d) is the maximum n that satisfies equation 2.39, divided by the maximum m that satisfies d ≥ (1/r) [ Σ_{i=1}^{m} E(t) - t x r ] for all t > 0. U(A*,d) shows how closely A*(t) approximates E(t) with respect to bandwidth utilization. An ideal model, which admits the same number of streams as the empirical envelope, results in the constant U(A*,d) = 1.

Table 2.2: Comparison of our method with the method in [1] for finding the (σ, ρ) parameters. E(t) constructed for all 1 ≤ t ≤ N.

  Our method:     (0, 1.0000), (53.4927, 0.3805), (11.4585, 0.4622), (2.9396, 0.5837), (1.7966, 0.6573);  Error: 65.47
  Method in [1]:  (0, 1.0000), (50.1352, 0.3063), (13.4325, 0.3293), (3.5398, 0.5644), (1.2903, 0.6888);  Error: 93.96

Table 2.3: The model parameters constructed from the entire envelope and from E(t) for 1 ≤ t ≤ τ together with samples of E(t) at t = iδ for 1 ≤ i ≤ n; τ = 200, nδ = N.

  Entire envelope:  (0, 1.0000), (53.4927, 0.3805), (11.4585, 0.4622), (2.9396, 0.5837), (1.7966, 0.6573);  Error: 65.47
  δ = 100:          (0, 1.0000), (53.4511, 0.3805), (12.6243, 0.4546), (4.4318, 0.5515), (2.3172, 0.6117);  Error: 76.74
  δ = 400:          (0, 1.0000), (90.5083, 0.3567), (40.9453, 0.3961), (13.9549, 0.4475), (6.6479, 0.5189);  Error: 84.12
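The admission condition of equation 2.39 translates directly into code: for identical sources, n streams are admissible iff n · A*(t) ≤ r · (t + d) for every t. A hedged sketch of the utilization-ratio computation (toy envelope values of our choosing, not the thesis data):

```python
import math

def max_admissible(A, r, d):
    """Largest n with n * A[t] <= r * (t + d) for all t >= 1 (eq. 2.39,
    identical sources through an FCFS multiplexer of rate r, delay bound d).
    A[t] holds the envelope/model value at time t; A[0] is unused."""
    return min(math.floor(r * (t + d) / A[t])
               for t in range(1, len(A)) if A[t] > 0)

def utilization_ratio(A_model, E_emp, r, d):
    """U(A*, d): admissible count under the model over that under E(t)."""
    return max_admissible(A_model, r, d) / max_admissible(E_emp, r, d)

# toy example: a model that slightly over-estimates the envelope
E = [0, 1, 2, 3, 4]              # empirical envelope of a rate-1 source
A = [0, 1.25, 2.25, 3.25, 4.25]
u = utilization_ratio(A, E, r=2.0, d=1.0)   # <= 1; approaches 1 as A -> E
```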
Figure 2.13 shows U(A*,d) for a (σ, ρ) model of size 4, constructed using our method and the method presented in [1]. As shown, our method results in a higher utilization ratio than that of the method in [1]. Figure 2.13 also shows the utilization ratio for a (σ, ρ) model constructed from samples of E(t).

[Figure 2.13: Utilization ratio U(A*,d) versus the delay bound d (d = 500 to 5000), for four cases: (i) exact E(t) computed for all t, model parameters obtained using our method; (ii) exact E(t) computed for all t, model parameters obtained using Liebeherr's method; (iii) E(t) constructed by repetition extrapolation (τ = 200), model parameters obtained using our method; (iv) E(t) constructed by sampling (τ = 200, δ = 100, n = 30), model parameters obtained using our method.]

As shown, the utilization ratio for the model constructed from samples of the envelope is very close to the utilization ratio for the model constructed from the entire envelope. This means that a few samples of E(t) are sufficient to construct an accurate (σ, ρ) model, and the computation of the entire envelope is not required. We also studied the utilization ratio curve for different parameters δ and τ. In practical applications, δ and τ should be selected such that the utilization ratio is close to one for the selected delay bound d, and the computation time is reasonable for the application. Based on this experiment, we suggest 100 ≤ τ ≤ 300 and δ = τ. This selection results in a reasonable utilization ratio for almost all maximum waiting times d.

2.5.2 Numerical Results of 'Admission Control Mechanism'

In this section, we present the results of applying our admission control method to typical digital TV programs.
All the video sequences used as the main video streams (see Table 2.4) were selected from typical TV programs. These video sequences were encoded with constant picture quality, and with a maximum bitrate of 4.5 Mbps. The picture quality of these videos was subjectively selected to be at a satisfactory level for TV applications. The length of every sequence used was 25 minutes. Table 2.5 summarizes the encoding parameters of the video sequences used in our simulation.

Table 2.4: Video sequences used in our study.

  Source type        Sequence name
  Action movie       Mission Impossible
  Children TV show   Muppets
  News show          News
  Talk show          Oprah Winfrey Show
  Documentary        Documentary
  Court show         Judge Judy show
  Muppets show       Sesame St. show
  Soap opera         Days of our lives
  Cartoon            Tigger movie

Table 2.5: Encoding parameters for the video streams used in our study.

  Compression standard              MPEG-2
  Resolution                        720 x 480
  Maximum rate                      4.50 Mbps
  Frame rate                        30 fps
  GOP size                          15
  Number of P frames in each GOP    4
  Number of B frames in each GOP    10

By studying the empirical envelopes of the main video sequences, we observed that with m = 5 (σ, ρ) pairs one can accurately model the video of most TV programs. Therefore, we select m = 5 in our application. Figures 2.14 and 2.15 show the empirical envelopes and the fitted (σ, ρ) models for the traffic sources. Table 2.6 shows the numerical values of the (σ, ρ) model parameters for each source. As shown, E(t) is an increasing function where E(t)/t ≈ 1 for small t's, and dE(t)/dt is approximately the average rate of the source for large t's. E(t)/t drops faster for large t's in video sources with simple content than in video sources with active and complex content. The fitted (σ, ρ) model with five (σ, ρ) pairs is also shown in these figures. The (σ, ρ) pairs are selected such that (σ1, ρ1) = (0, 1) and the rate parameter of the last (σ, ρ) pair (i.e., ρ5) is the average rate of the source.
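A multi-leaky-bucket (σ, ρ) model with m pairs bounds the traffic by the concave piecewise-linear function A*(t) = min_i (σ_i + ρ_i t). A small sketch evaluating such a model, using the 'Mission Impossible' parameters from Table 2.6 (the function name is ours):

```python
# (sigma, rho) pairs for the "Mission Impossible" sequence (Table 2.6);
# the (0, 1) pair enforces the normalized unit peak rate.
PAIRS = [(60.5602, 0.6376), (20.1277, 0.7025), (11.3654, 0.7332),
         (7.8706, 0.7615), (0.0, 1.0)]

def constraint(t, pairs=PAIRS):
    """(sigma, rho) traffic constraint: A*(t) = min_i (sigma_i + rho_i * t)."""
    return min(sigma + rho * t for sigma, rho in pairs)

# near t = 0 the (0, 1) peak-rate pair is the tightest bound ...
small = constraint(1.0)          # = 1.0
# ... while for large t the pair with the smallest rate dominates
large = constraint(1000.0)       # = 60.5602 + 0.6376 * 1000
```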
As shown, the model can approximate E(t) very well with only a few (σ, ρ) pairs.

Table 2.6: Numerical values of the (σ, ρ) model parameters for the main video sequences used in our simulation.

  1 Mission Impossible:  (60.5602, 0.6376), (20.1277, 0.7025), (11.3654, 0.7332), (7.8706, 0.7615), (0, 1.0000)
  2 News:                (100.9816, 0.5018), (30.4283, 0.5727), (12.9072, 0.6938), (2.1920, 0.8130), (0, 1.0000)
  3 Talk Show:           (27.7782, 0.4642), (11.0175, 0.4860), (4.9481, 0.5433), (1.7709, 0.6584), (0, 1.0000)
  4 Documentary:         (103.1472, 0.6157), (53.8328, 0.6629), (20.7147, 0.7503), (4.5189, 0.8782), (0, 1.0000)
  5 Court Show:          (16.7626, 0.4403), (5.8024, 0.6030), (2.3032, 0.7112), (0.6417, 0.8298), (0, 1.0000)
  6 Muppets show:        (40.4100, 0.8688), (36.5059, 0.8703), (34.3194, 0.8740), (3.8070, 0.9445), (0, 1.0000)
  7 Soap opera:          (53.4927, 0.3805), (11.4585, 0.4622), (2.9396, 0.5837), (1.7966, 0.6573), (0, 1.0000)
  8 Cartoon:             (23.4084, 0.8482), (8.3675, 0.8680), (4.6805, 0.8811), (0.6067, 0.9556), (0, 1.0000)

Table 2.7: Simulation parameters.

                                               Set I (cable medium)   Set II (terrestrial medium)
  Transmission rate                            19.8 Mbps              39.8 Mbps
  Number of TV programs sharing channel, N     4                      8
  Maximum bitrate per main video stream        4.5 Mbps               4.5 Mbps
  Capacity reserved for video streams          18 Mbps                36 Mbps
  Capacity reserved for audio/ancillary data   1.8 Mbps               3.8 Mbps
  Main video stream sources                    Mission Impossible,    Mission Impossible, News,
                                               News, Talk Show,       Talk Show, Documentary,
                                               Documentary            Court Show, Muppets show,
                                                                      Soap opera, Cartoon

After fitting a (σ, ρ) model to each video sequence, we conducted another set of experiments where we considered a system similar to Figure 2.10, which consisted of N main (high-priority) streams and one incidental (low-priority) stream. This system simulates the head-end of a digital TV transmission system. We conducted two experiments using two different sets of simulation parameters, as shown in Table 2.7.
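A hedged sketch (our formulation, not the thesis code) of how the maximum admissible incidental-stream rate R for a given waiting time T_W could be computed from the aggregate main-stream envelope: one sufficient worst-case condition, in the spirit of equation 2.39, is that A(t) + R·t ≤ C·(t + T_W) for all t, giving R(T_W) = min_t [C·(t + T_W) - A(t)]/t. The toy envelope below is illustrative only:

```python
def max_incidental_rate(A, C, Tw, horizon=10000):
    """Largest constant rate R such that A(t) + R*t <= C*(t + Tw) for all
    t in [1, horizon]; A is the aggregate envelope of the main streams.
    A sufficient (worst-case) condition in the spirit of eq. 2.39."""
    return min((C * (t + Tw) - A(t)) / t for t in range(1, horizon + 1))

# toy aggregate envelope: burst of 50 units on top of average rate 7,
# capped by the channel rate C = 10
C = 10.0
A = lambda t: min(50.0 + 7.0 * t, 10.0 * t)
for Tw in (0.0, 1.0, 10.0, 100.0):
    R = max_incidental_rate(A, C, Tw)   # non-decreasing in Tw, -> C - 7 = 3
```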
The first set of parameters is selected to simulate a cable transmission medium, while the second set simulates a terrestrial medium. As noted, no portion of the transmission capacity is reserved for incidental streams.

Footnote: A 6 MHz channel in the cable medium is capable of delivering digital data at the 19.8 Mbps rate. This capacity is usually shared by 4 or 5 TV programs. In the terrestrial medium, a 6 MHz channel is capable of delivering data at the 39.8 Mbps rate, which is usually shared by 8 or 9 TV programs.

In order to illustrate the relation between R and T_W for the incidental stream, we plotted R versus T_W, as shown in Figure 2.16. This graph is interesting as it provides exemplary numerical values for the rate of an incidental stream in a typical digital TV transmission system. As we expected, R is an increasing function of T_W. This means that by allowing a larger waiting time in the multiplexing system, the system can accept higher rate incidental streams. However, for very large T_W, R becomes constant and equal to C - R_avg, where C is the transmission rate reserved for the main video streams and R_avg is the total average rate of all the main streams. This is due to the fact that even by increasing T_W, R cannot become larger than the channel rate minus the main streams' average rate.

In the next experiment, we tested the accuracy of our admission control scheme via simulation by observing the waiting time of the incidental stream's data units during multiplexing. The results showed that if an incidental stream with rate R and maximum waiting time T_W is accepted by the admission test, then the waiting time of its data units in the system is always less than T_W. However, for the incidental streams which were rejected by the admission test, the waiting time of some data units exceeded T_W seconds.

2.6 Conclusion

In this chapter, we presented methods for implementing the deterministic service class.
We employed a deterministic model for the traffic of the main video sources. We used the concept of the traffic constraint function, and the empirical envelope as the tightest traffic constraint function. After discussing the current approaches to deterministic traffic modelling, we selected the (σ, ρ) model as the traffic model for our application. We showed that the (σ, ρ) model can accurately model the empirical envelope of main video sources. Then, we presented efficient methods for fitting the (σ, ρ) model to a traffic source. We showed that our model fitting methods result in more efficient and more accurate model parameters than other methods in the literature.

Next, we adapted the newly developed 'Network Calculus' theory, and designed an admission control mechanism for the deterministic service class of the proposed ITV application. Our admission control mechanism finds the maximum waiting time T_W for an incidental stream with rate R. Our simulation results provided some exemplary numerical values for the maximum waiting time T_W and the rate R of an incidental stream in a typical digital TV system.

The deterministic admission control scheme presented in this chapter relies on the traffic constraint function of the main video streams, which is a worst-case estimate of the traffic a main video source can generate. Therefore, the obtained T_W is based on the most pessimistic forecast of the system. This approach is attractive since it ensures that no incidental data packet will be lost. However, it does not result in high utilization of the available bandwidth. In the next chapter, we will discuss the stochastic service class, where T_W is found such that some data loss is possible; however, this data loss is limited. Our design of the stochastic service class is fundamentally different from that of the deterministic service class, and is based on a different type of traffic model.

Chapter 3

Stochastic Service Class

Chance favors only the prepared mind.
- Louis Pasteur, A New Kind of Country, 1978.

Overview

This chapter presents a scheme for implementing the stochastic service class based on the 'effective bandwidth' theory. The effective bandwidth concept and its characteristics are explained. We also show how the effective bandwidth is used to design an admission control scheme for the stochastic service class of the proposed ITV application. Using the methods presented in this chapter, one can find T_W for an incidental stream with a given rate R and data loss probability p.

3.1 Introduction

In this chapter, we develop a method for implementing the 'Stochastic Service Class'. As discussed in Chapter 1, when an incidental stream is added to a TV program using the stochastic service class, the transmitter does not guarantee to send all the incidental data units on time (i.e., before their transmission deadline). The data units which are not transmitted on time are considered lost data. Therefore, some incidental data loss is possible in the stochastic service class. However, the rate and waiting time of an incidental stream should be selected so that the data loss probability is less than a threshold. Therefore, an incidental stream that is to be transmitted using the stochastic service class should first be accepted by an admission control mechanism. This admission control mechanism verifies that the transmitter can send the incidental stream with rate R and maximum waiting time T_W with a data loss probability less than a given threshold, say p%. Our ultimate goal in this chapter is to present a scheme by which the admission control finds the maximum waiting time T_W, given the rate R and the loss probability p% for an incidental stream.

A key issue in implementing the stochastic service class is to find a suitable traffic descriptor, which encapsulates the stochastic properties of the main streams' traffic. Then an accurate admission control test using the selected traffic descriptor should be designed.
Our approach here is based on the theory of 'effective bandwidth'. The main motivation behind this theory is to provide a measure of bandwidth usage by a traffic source in a communication network which adequately represents the statistical characteristics of the source. In this theory, each traffic source is described by a traffic descriptor called the 'effective bandwidth curve'. The theory then provides mechanisms to find a level of statistical service guarantee for common network operations, such as multiplexing and buffering. We use this theory to design the admission control mechanism for our application problem.

The rest of this chapter is organized as follows. In Section 3.2, the effective bandwidth is defined and its characteristics are described. In Section 3.3, we show how the effective bandwidth is used to bound the data loss in general network operations. Based on this theory, we design an admission control mechanism for the stochastic service class in Section 3.4. In Section 3.5, we discuss the current approaches to the numerical estimation of the effective bandwidth curve.

3.2 Effective Bandwidth

The theory of effective bandwidth was first introduced in the early 1990's by Gibbens and Hunt [37], Kelly [38], and Guerin et al. [39]. Since then, this theory has attracted much attention from both the mathematics and engineering communities, and has emerged as a powerful but complicated mathematical theory. Currently, a great effort is in progress to expand the effective bandwidth theory and its applications.

The mathematical theory of the effective bandwidth concept is built upon the Large Deviation Principle (LDP), which studies the tail properties of probability distributions. The effective bandwidth of a source is closely related to the moment generating function of the arrival process of the source.
The moment generating function of a random variable contains more information about the stochastic characteristics of a process than its mean. Hence, traffic characterization methods based on the effective bandwidth function are more accurate than the widely used traffic characterization methods based on Poisson processes, which rely only on the average rate.

A useful interpretation of the effective bandwidth concept is that the theory quantifies the probability of a traffic source generating traffic at a rate higher than its average for a long period of time. More precisely, let us denote the instantaneous rate of a traffic source by y(t) and the average rate of the source by μ. Then we expect Σ_{τ=1}^{t} y(τ) to be close to μt for large t's. Effective bandwidth theory bounds the probability that Σ_{τ=1}^{t} y(τ) ≈ at, where a > μ. Hence, under some mild conditions, the effective bandwidth theory gives the probability that a variable rate source generates traffic equal to that of a constant rate source with rate a during a long period. The probability that the source follows the effective bandwidth model is incorporated into the model through a parameter named the 'scale factor', θ. Therefore, the effective bandwidth is a function of the scale factor, usually denoted as α(θ).

The effective bandwidth value always lies between the average and the peak rate of the source. Higher levels of certainty result in a larger θ and an effective bandwidth that is closer to the peak rate; e.g., a certainty value equal to one corresponds to θ = ∞, and α(∞) is the peak rate of the source. The effective bandwidth concept can be viewed as a compromise between two alternative bandwidth allocation schemes, a pessimistic outlook and an optimistic one. In the pessimistic case, one uses a strict approach to bandwidth allocation, where the bandwidth allocation is based on the source's peak rate. This approach seeks to eliminate data loss.
In the optimistic case, one uses a lenient approach to bandwidth allocation, where the bandwidth is allocated based on the source's average rate. This approach seeks high bandwidth utilization. The effective bandwidth α(θ) gives a spectrum between these two approaches, where the scale factor 0 < θ < ∞ determines how lenient or strict the approach is. In the next section, we first briefly review the large deviation principle. Then, we present a precise definition of the effective bandwidth in Section 3.2.2.

3.2.1 Large Deviation Principle

As mentioned, the theory of effective bandwidth relies on the Large Deviation Principle (LDP). The large deviation principle is a theory that studies the tail properties of probability distributions. It refers to a collection of techniques used for estimating properties of rare events, such as the frequency of their occurrence, or the most likely manner of their occurrence. Large deviations do not apply to every event that has a very low probability of occurrence. Roughly speaking, a large deviation event is caused by a large number of unlikely things occurring together, rather than a single event of small probability. For example, winning a lottery cannot be studied with large deviations, since it is a single event composed of a single trial and cannot be broken into more than one sub-event. However, the probability that the average grade of a class on an easy exam becomes very low can be considered such a rare event, since it can be decomposed into the improbable sub-events that each individual student gets a very low mark. In the proposed ITV application, a large burst of traffic is generated by a source when the source starts sending traffic at a rate higher than its average, and continues to do so for a long period of time. Therefore, the occurrence of a large burst of traffic can be broken into many low-probability sub-events.
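The class-average example above can be made numerically concrete in the simplest large-deviation setting, i.i.d. Bernoulli trials (a toy check of our own, not from the thesis): the probability that the empirical mean of n fair coin flips reaches a level a decays like e^{-nΛ*(a)}, where Λ*(a) = a log(a/p) + (1-a) log((1-a)/(1-p)) is the Cramer rate function.

```python
import math

p, a, n = 0.5, 0.8, 100   # fair coin; rare event: sample mean >= 0.8

# Cramer rate function for Bernoulli(p) at level a
rate = a * math.log(a / p) + (1 - a) * math.log((1 - a) / (1 - p))

# exact tail probability P(mean of n trials >= a)
exact = sum(math.comb(n, k) for k in range(math.ceil(a * n), n + 1)) / 2**n

# Chernoff bound: exact <= exp(-n * rate); the exponents agree as n grows
print(-math.log(exact) / n, rate)
```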
One can consider the LDP as a tool that turns probability problems into deterministic optimization problems. Loosely speaking, to calculate the probability of a rare event, one assigns a cost to each sample path that would cause the event to occur. In the example of a very low average grade for a big class on an easy exam, one path is that all the students get a low mark, and an alternate path is that many students get zero and only a few get very good marks. One then finds the cheapest (i.e., the most probable) path in that set of sample paths. The probability of the event is then estimated by

    P(event) ≈ e^{-n x const}        (3.1)

where n is an asymptotic parameter, usually the length of time over which we observe the process. Therefore, one can think about rare events in terms of sample paths and costs; to find the probability of a rare event, one can simply consider the cheapest way that the event can happen.

General Definition of LDP. Let y(1), y(2), ... be a sequence of random processes. Let Y(t) = Σ_{τ=1}^{t} y(τ). We say that Y(1), Y(2), ... satisfies the large deviation principle with rate function I if (see [40]):

1. For every closed set C ⊆ R, we have

    limsup_{n→∞} (1/n) log P(Y(n)/n ∈ C) ≤ - inf_{a∈C} I(a)        (3.2)

2. For every open set G ⊆ R, we have

    liminf_{n→∞} (1/n) log P(Y(n)/n ∈ G) ≥ - inf_{a∈G} I(a)        (3.3)

It is shown in the LDP theory that if the random variable Y has a finite moment generating function E(e^{θY}) for all θ, where E denotes expected value, then Y satisfies the large deviation principle with the rate function I = Λ*, where

    Λ(θ) = lim_{t→∞} (1/t) log E(e^{θY(t)})        (3.4)

    Λ*(a) = sup_θ [θa - Λ(θ)]        (3.5)

Λ(θ) is called the 'Gartner-Ellis' limit, and Λ*(a) is called the 'Legendre Transform' of Λ(θ). It is also shown that Λ*(a) has the following characteristics:

1. Λ*(a) is non-negative, i.e., Λ*(a) ≥ 0.
2. Λ*(a) is strictly increasing in a for a above the average of y.
3. Λ*(a) attains its minimum at a = E(y), i.e., at the average of y.
4. Λ*(E(y)) = 0.
5.
Λ*(a) is convex, i.e., Λ*(αa_1 + (1-α)a_2) ≤ αΛ*(a_1) + (1-α)Λ*(a_2) for 0 ≤ α ≤ 1.
6. If Λ*(a) is the Legendre transform of Λ(θ), then Λ(θ) is the Legendre transform of Λ*(a) as well, i.e., (Λ*)* = Λ. For this reason, the two functions Λ* and Λ are called Legendre transform pairs.

3.2.2 Effective Bandwidth Definition

Let y(t) denote the instantaneous rate of a traffic source. The arrival process is defined as Y(t) = Σ_{τ=0}^{t} y(τ), which is the total traffic generated in [0, t]. Then the effective bandwidth of this source is defined as (see [41])

    α(θ) = lim_{t→∞} (1/(θt)) log E(e^{θY(t)}),   ∀θ ∈ R+        (3.6)

The parameter θ is called the scale factor. By definition, ψ_Y(θ) = E(e^{θY(t)}) is the moment generating function of the random process Y(t). Therefore, the effective bandwidth of a traffic source is closely related to the moment generating function of the arrival process of the source. Figure 3.1 shows a typical effective bandwidth curve for a traffic source with bounded peak rate. It has been shown that [41]:

1. α(θ) is a non-decreasing function of θ.
2. α(θ) lies between the average and peak rate of the source, i.e., E(y) ≤ α(θ) ≤ ŷ, where ŷ denotes the peak rate of the source y.
3. If we have N sources, each with arrival process Y_i(t), and if Y(t) is the multiplexed stream of these sources, i.e., Y(t) = Σ_i Y_i(t), then

    α(θ) = Σ_i α_i(θ)        (3.7)

4. The shape of α(θ) around θ = 0 primarily depends on the mean, the variance and the higher moments of Y(t), while the shape of α(θ) for large θ's is primarily influenced by the tail distribution of Y(t) around its maximum. This can be justified using the Taylor expansions of α(θ) at θ = 0 and at 1/θ = 0:

    α(θ) = a_0 + a_1 θ + O(θ^2),   θ → 0        (3.8)
    α(θ) = b_0 - b_1 (1/θ) + O(1/θ^2),   θ → ∞        (3.9)

It is shown that a_0 = E(y) and a_1 = (1/2) var(y), while b_0 = ŷ, the peak rate of y, and b_1 is determined by τ_ŷ, the average length of the periods during which y is equal to its maximum, that is, the average length of burst periods [41].
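For an i.i.d. source the limit in equation 3.6 simplifies: since E(e^{θY(t)}) = (E(e^{θy}))^t, we get α(θ) = (1/θ) log E(e^{θy}). This makes properties 1 and 2 easy to check numerically; a minimal sketch with a toy two-state source of our choosing:

```python
import math

def eb_iid(rates, probs, theta):
    """Effective bandwidth of an i.i.d. source: since E(e^{theta*Y(t)}) =
    (E(e^{theta*y}))^t, eq. 3.6 reduces to (1/theta) * log E(e^{theta*y})."""
    mgf = sum(q * math.exp(theta * r) for r, q in zip(rates, probs))
    return math.log(mgf) / theta

rates, probs = [0.0, 10.0], [0.5, 0.5]      # mean 5, peak 10
curve = [eb_iid(rates, probs, th) for th in (0.001, 0.1, 1.0, 5.0)]
# non-decreasing in theta, and always between the mean (5) and the peak (10)
```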
[Figure 3.1: A sample effective bandwidth curve α(θ), together with Λ(θ), Λ*(a) and the inverse of the effective bandwidth function I(c) = α^{-1}(c). The average rate of the source is 0.448.]

3.2.3 Physical Interpretation of Effective Bandwidth

One useful physical interpretation of α(θ) is that of using the effective bandwidth concept to approximate the behavior of a VBR source with a constant rate. Suppose y(t) has a finite mean value E(y) = μ. We know from the law of large numbers that, with probability 1, Y(t)/t → μ as t → ∞. Thus the probability that Y(t)/t is away from μ goes to 0 as t increases. In the theory of the large deviation principle, it is shown that this convergence to 0 occurs exponentially fast; that is, for a > μ,

    lim_{t→∞} (1/t) log P(Y(t) > ta) = -Λ*(a)        (3.10)

where Λ*(a) is known as the rate function. Roughly speaking, equation 3.10 states that

    P(Y(t)/t ≈ a) ≈ e^{-tΛ*(a)}        (3.11)

Therefore, the value of Λ*(a) indicates how difficult it is for Y(t)/t to be close to the constant rate a. Λ*(a) is related to α(θ) by Λ*(a) = sup_θ [θa - θα(θ)]; that is, Λ*(a) is the Legendre transform of θα(θ).

3.3 Quantifying Data Loss in Communication Networks Using Effective Bandwidth

In this section, we describe how the effective bandwidth concept is used for quantifying the probability of loss in communication networks. Let us first consider a simple buffering model, where a single buffer of size B is filled by data from a variable rate source with rate y(t), and is emptied at constant rate c (see Figure 3.2). Let W(t) denote the buffer workload at time t. c is selected to be greater than the average input rate, but less than the maximum input rate. Therefore, we expect some data workload to build up in the buffer occasionally, and also that the buffer becomes empty regularly. It is shown that the probability of buffer overflow, i.e., the probability that starting with an empty buffer the workload exceeds the buffer size before the buffer becomes empty again, is bounded as

    P(W(t) > B) ≤ p        (3.12)

where p = e^{-Bθ*}, θ* is selected such that α(θ*) = c, and t is large. Equation 3.12 also gives a bound on the buffering delay, defined as d(t) = W(t)/c:

    P(d(t) > D) ≤ p = e^{-Bθ*}        (3.13)

where D = B/c is the maximum acceptable buffering delay. The validity of equations 3.12 and 3.13 has been proven [41-43]. An intuitive proof of these equations can be given based on equation 3.11 as follows. Note that a buffer overflow in this simple buffering model happens if the input has rate a > c for at least B/(a - c) time units. According to equation 3.11, the probability that the input behaves in that way is approximately

    exp{ -(B/(a - c)) Λ*(a) }        (3.14)

Footnote: The buffer workload random process W(t) converges to a marginal distribution as t becomes large enough, usually denoted by W_∞. Therefore, P(W(t) > B) actually denotes the tail properties of this distribution.

[Figure 3.2: A simple buffer of size B, filled at variable rate y(t) and emptied at constant rate c.]
It is shown that the probability of the buffer overflow, i.e., the probability that starting with an empty buffer the workload exceeds the buffer size before the buffer becomes empty again, is bounded as F(W(t)>B)<p (3.12) where p = e~ *. 9* is selected such that a(9*) = c and t is large . Equation 3.12 also gives a bound on the buffering delay, defined as d(t) = W(t)/c, F(d(t) > D) < p = e~ ' (3.13) B0 1 Be where D — B/c is the maximum acceptable buffering delay. Validity of equations 3.12 and 3.13 have been proven [41-43]. An intuitional proof of these equations can be given based on equation 3.11 as follows. Note that a buffer overflow in this simple buffering model happens if the input has the rate a > c for at least B/(a — c) time units. According to equation 3.11, the probability that the input behaves in that way is approximately exp{ —A*(a)\ a—c (3.14) Note that the buffer workload random process, W(t), converges to a marginal distribution as t becomes large enough, usually denoted with W^. Therefore, ¥(W(t) > B) actually denotes the tail properties of the distribution. x 63 y(t) Figure 3.2: A simple buffer of size B, filled at variable rate y and emptied at t constant rate c. where the parameter a can be any value larger than c. Let 9* be selected such that a(0*) = c. T h e n the probability that buffer occupancy reaches B is approximately given by: £ ^-dW' (B)} - <3 15) Now we approximate this sum of exponentials w i t h the exponential w i t h the largest exponent, that is, by p^exp{- . A*(a')} B a* — a(9*) (3.16) where a* is such that A*(«*) 1 = J a* - a(9*) • A*(o) nun — - A - A - a>a(s*) a - a(9*) , „, (3.17) 0 1 However, from the definition of A * i n equation 3.5, it can be noted that A* (a) > 9a — 9a(9) for a l l 9. Hence we have V ' P Therefore, one can say that a(9) min — ^ ^ a>a(6*) a - ^e~ B e ' . 
= 9* and a(9*) (3.18) is the rate at which the buffer must be emptied so that the buffer overflow probability decays exponentially w i t h rate 3.3.1 D a t a Loss P r o b a b i l i t y i n a Q u e u i n g M o d e l w i t h Priority W e consider a simple system w i t h two prioritized inputs as shown i n Figure 3.3. T h e inputs are buffered at two different buffers of size BH and BL, and are 64 H i g h priorily buffer VH(t) y (t) + y (t) H L I L o w priority buffer ih.it). vdt) B it) L (a) (b) The equivalent hypothetical buffer for A Multiplexer with prioritized inputs determining the low-priority buffer overflow probability Figure 3.3: A simple multiplexer model with prioritized inputs. then multiplexed to one output with constant rate c. The high-priority input has primitive priority over the low-priority input, which means high-priority packets will never be impeded by low-priority packets. Let and yi{t) be the rate of the high-priority and low-priority inputs at t respectively. The cumulative arrival functions for these inputs are denoted by Yfj and Yjr, where we have t Y (t) = H J2yH(r), T= Vt>0 0 t Y (t) = ^2y (T), L L Vt>0 (3.19) T=0 Also, let y and y denote the maximum rate of the input sources. The output rate c is such that y < c <y + y . Therefore, the high-priority buffer will never overflow, while it is possible that the low-priority buffer overflows. Also, c is greater than the average of y# + yi, which means that the low-priority buffer becomes empty regularly. Let p denote the probability that the lowpriority buffer overflows. Let Wi{t) be the low-priority buffer workload at t. The low-priority buffer overflows if W (t) > BL- During [0,t] the high-priority source sends Yn(t) bits to the system and the system will transmit ext — Yait) bits from H L H H L L 65 the low-priority buffer data. 
Therefore, we have W (t) L y [t) - ( c x t - Y (t)) L (Y (t) + H (3.20) Y (t))-cxt L H and thus F(W (t) L > B) L F((Y (t) + Y (t))-cxt> L B) H L (3.21) The right hand side of this equation can be interpreted as the probability of buffer overflow in a hypothetical single buffer of size Bi, filled at rate llL{t) + Un{t) and emptied at rate c. Therefore, in order to study the data loss probability of the low-priority source, we can replace the prioritized model with another model, which has a single hypothetical buffer of size Bi, filled at rate yi(t) + yu{t)\ and emptied at rate c, as shown in Figure 3.3-b. Then, as discussed in Section 3.3, if 9* is selected such that a ,(9*) + ai(9*) — c, we will have p = e~ " or B = — \og(p)/9*. If the low-priority input source has a constant input rate R (i.e., yi(t) = R for all £), then we can find a bound on the time that the low-priority data units wait in the buffer. Let say that W data units of the low-priority input are waiting in the buffer, as shown in Figure 3.4. Let di(t) denote the time that data unit #1 has being waiting in the buffer. Then, since the buffer fill up rate is constant, we have di{t) = Wi{t)/R. Therefore, F{di{t) > d,) = P ( l I i ( 0 > Rxd). Note that if the low-priority source is a variable rate source rather than a constant rate source, then this method will not be applicable. When the low-priority source has constant bitrate, we have H Be L / / ¥(W (t) L > B ) = n(Y (t) + Y (t) L L H -cxt)>B ) L = F{(R x t + Y {t) - c x t) > H = F(Y {t)-{c-R)xt> H B ) L B) L (3.22) (3.23) (3.24) which means that F(W (t) > BL) is equal to probability of loss in a buffer of size Bi, filled with the high-priority source at rate and emptied at constant rate c — R. Now we extend the simple model in Figure 3.3-a to a system with ./V high-priority inputs and one low-priority input, as shown in Figure 3.5-a. 
In this model, ./V high-priority sources are multiplexed with one low-priority L 66 W (t) L data unit data unit *W (t) #2 L Figure 3.4: A simple buffer filled at constant rate R and emptied at variable rate c. If the buffer workload is W , then we know that the first data unit has waited W /R seconds in the buffer. L L source. The i high-priority input has the instant rate of yn,ii maximum rate of ym, and effective bandwidth of a (9). The maximum rate of multiplexed stream is c. c is selected such that Y^iLiVn.i < < VL,I + Y^=iVH,ii therefore the high-priority buffers never overflows, but it is possible for the low-priority buffer to overflow. It is easily noted that the probability of the low-priority buffer overflow is given by F(W (t) > B ) = F((Y (t) + £ = i Y {t)) -cxt> BL), where is the arrival process of the i high-priority input. As stated in property 4 in Section 3.2.2, the effective bandwidth of a multiplexed source is simply the sum of the effective bandwidths of all the sources. Therefore, the probability of the low-priority buffer overflow is bounded by e~ "', where 6* is selected such that a + YliLi i{Q*) ~ - ^ low-priority source has a constant rate, then ai(0) = R for all 9; and 9* is selected such that th t c L 2 L L Hiri th m a c t n e L zZlx<0*) = 3.4 c-R Admission Control for the Stochastic Service Class In this Section, we describe how the effective bandwidth, and the queuing model presented in Section 3.3.1, are adapted to design an admission control test for our interactive TV system. Suppose N television programs are sharing one transmission line with constant capacity c. We assume that each main video stream is characterized by a known effective bandwidth curve, say ai(0) for the i main video. Now suppose an incidental stream with constant rate R, maximum waiting-time d, and probability of violating the maximum waiting time constraint of p, is requested to be added to the system using the stochastic service class. 
The multiplexer system can be modelled with the buffering system shown in Figure 3.5.

Figure 3.5: A simple multiplexer model with N high-priority inputs and one low-priority input. Panel (a) shows the multiplexer; panel (b) shows the equivalent hypothetical buffer for determining the low-priority buffer overflow probability.

As discussed in Section 3.3.1, the following equations hold:

p = e^{-B_L θ*},  B_L = R × d,  Σ_{i=1}^N α_i(θ*) = c - R    (3.25)

Hence, given two out of the three parameters R, d and p, one can use equation 3.25 to determine the third, unknown parameter. If the triple (R, d, p) conforms to the quality requirements of the connection request, then the connection request is accepted.

3.5 Numerical Estimation of Effective Bandwidth

In this section, we describe the current approaches to estimating the effective bandwidth of a source. Before doing so, it is noted from the definition of effective bandwidth (equation 3.6) that α(θ) depends closely on the moment generating function² of the arrival process of the source. Therefore, to estimate the effective bandwidth, one should estimate the moment generating function, or all the moments³, of the arrival process of the source, which is a very complicated task. Also, note that in many applications, such as the proposed ITV application, the whole effective bandwidth curve, i.e., α(θ) for all θ, should be estimated. This makes the estimation process even harder. Current approaches to the numerical estimation of effective bandwidth are as follows.

1. Direct Approach

Recall that the effective bandwidth α is defined by

α(θ) = lim_{t→∞} (1/(θt)) log E(e^{θ Y(t)})

Thus, by monitoring the traffic, one can use

α̂_{k×m}(θ) = θ^{-1} k^{-1} log( m^{-1} Σ_{i=1}^m e^{θ(Y_{ik} - Y_{(i-1)k})} )    (3.27)

as an estimator for α [43-46]. This approach is attractive since it circumvents modelling the traffic source. However, it takes a very long time to converge to an accurate result.
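The admission computation above can be sketched end-to-end. The effective bandwidth curves below are illustrative stand-ins (the closed form α(θ) = μ + θσ²/2 of a Gaussian-increment source, not the curves used for real TV traffic), and all function names and constants are ours:

```python
import math

# Illustrative per-program effective bandwidth curves (Gaussian-source closed
# form alpha(theta) = mu + theta * sigma^2 / 2); stand-ins for the real alpha_i.
programs = [(1.0, 0.5), (1.5, 0.8), (0.8, 0.4)]        # (mean rate, std) pairs
alphas = [lambda th, m=m, s=s: m + th * s * s / 2.0 for m, s in programs]

def theta_star(alphas, c, R, lo=1e-9, hi=100.0, iters=200):
    """Bisect for theta* with sum_i alpha_i(theta*) = c - R (alphas increasing)."""
    def excess(th):
        return sum(a(th) for a in alphas) - (c - R)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def violation_probability(theta, R, d):
    """Equation 3.25: p = exp(-theta* * B_L) with B_L = R * d."""
    return math.exp(-theta * R * d)

c, R, d = 5.0, 0.5, 8.0          # capacity, incidental rate, max waiting time
th = theta_star(alphas, c, R)    # here equals (c - R - sum mu) / (sum sigma^2/2)
p = violation_probability(th, R, d)
accept = p <= 1e-2               # admission decision against a requested bound
```

Given any two of (R, d, p), the same relation yields the third, e.g. d = -log(p)/(θ* R).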
According to [43], an accurate estimator of α(θ) requires that both k and m be large, so the monitoring time k × m may in fact be very lengthy. In [45], a technique called re-sampling, or bootstrap, is proposed for this problem. In these approaches, some synthetic data are generated which are similar to the original data in 'some sense' [47-49]. These synthetic data are used along with the original data, as a data set with a larger sample size, to estimate α(θ) using equation 3.27. However, this approach introduces two important and problematic issues.

First, we should determine how the synthetic data are generated. This depends on what kind of similarity concept is suitable for our data and application. In the original bootstrap method, which was developed for i.i.d. random variables, the empirical distribution F of the original samples is estimated, and new samples are then drawn from this distribution F. However, this method is not suitable for dependent data. In a method proposed for dependent data, called the 'moving block bootstrap', the data is partitioned into blocks of size b, and the same algorithm is then performed on these blocks. However, the disadvantage of this method is that it destroys all the dependency at lags larger than b; hence it is suitable only for short-term dependent data. Furthermore, this approach introduces another complication, which is the selection of an appropriate block size b.

The second issue in using the bootstrap method is to determine how many synthetic data are enough to find an accurate estimate of α(θ).

² The moment generating function of a random variable X is defined as ψ(θ) = E(e^{θX})    (3.26)
³ The n-th moment of a random variable X is defined as E(X^n). All the moments of X can be successively obtained by differentiating ψ(θ) with respect to θ and then evaluating at θ = 0, that is, ψ^{(n)}(0) = E(X^n), n ≥ 1.
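The moving-block resampling step described above can be sketched as follows (the block size, data, and function name are illustrative; the thesis does not adopt this method):

```python
import random

def moving_block_bootstrap(data, b, rng):
    """Resample overlapping blocks of size b with replacement and concatenate,
    preserving dependence at lags < b but destroying it at lags >= b."""
    blocks = [data[i:i + b] for i in range(len(data) - b + 1)]
    out = []
    while len(out) < len(data):
        out.extend(rng.choice(blocks))
    return out[:len(data)]

rng = random.Random(0)
data = [float(i % 17) for i in range(200)]   # any dependent series
synth = moving_block_bootstrap(data, b=5, rng=rng)
```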
This question is usually addressed either by using a sample size that is thought in advance to be sufficiently large, or by continuing to generate synthetic data sets until the final estimate of α(θ) does not change when a new set of synthetic data is added. For example, in each iteration we generate a new set of synthetic data and estimate α(θ). If this α(θ) differs from the α(θ) estimated in the previous iteration by less than a threshold, then this estimate of α(θ) is accepted; otherwise, more synthetic data are generated.

In general, this approach is not appropriate for the proposed ITV application, since the sample sizes from the traffic of a typical TV program are not large enough to result in an accurate estimate of α(θ). Besides, using the bootstrap method is not appropriate, since the current approaches to generating synthetic data destroy many important characteristic features of our original data.

2. Virtual Buffer Approach

In this approach, a virtual buffer of size b is considered, which is filled with the source and is emptied at a constant rate c, such that c is between the average rate and the peak rate of the source (see [42,44,50,51]). Let α^{-1}(c) be the inverse of the α(θ) function (i.e., α^{-1}(c) = θ such that α(θ) = c). As mentioned in Section 3.3, the probability of buffer overflow can be estimated as exp(-b α^{-1}(c)). Therefore, the buffer workload is monitored in simulation, and the empirical distribution of data loss, π(b), is obtained. Then, α^{-1}(c) is chosen so that the distance between π(b) and p(b) = exp(-b α^{-1}(c)) is minimized [44]. The distance measure employed is usually the 'Kullback-Leibler distance'⁴ [43,52]. The resulting estimate for α^{-1}(c) is

α̂^{-1}(c) = log(1 + (Σ_b b π(b))^{-1})

Alternatively, if a sufficiently large data set is available, one can plot the logarithm of the loss probability versus b. As shown in [44], this plot has a straight part whose slope magnitude equals α^{-1}(c).
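The virtual buffer approach, including the slope-based estimate just described, can be sketched in simulation. The arrival process here (i.i.d. exponential work per slot) and all constants are illustrative, not TV traffic:

```python
import math
import random

def workload_trace(arrivals, c):
    """Lindley recursion for the virtual-buffer workload, drained at rate c."""
    w, trace = 0.0, []
    for a in arrivals:
        w = max(0.0, w + a - c)
        trace.append(w)
    return trace

random.seed(1)
arrivals = [random.expovariate(1.0) for _ in range(200_000)]   # mean rate 1.0
c = 1.5                                                        # between mean and peak
trace = workload_trace(arrivals, c)

# Empirical tail pi(b) = P(W > b); log pi(b) is roughly linear in b with
# slope -alpha^{-1}(c), so a least-squares fit recovers the estimate.
levels = [1.0, 2.0, 3.0, 4.0]
tails = [sum(w > b for w in trace) / len(trace) for b in levels]
ys = [math.log(t) for t in tails]
mx, my = sum(levels) / len(levels), sum(ys) / len(ys)
slope = sum((x - mx) * (y - my) for x, y in zip(levels, ys)) / \
        sum((x - mx) ** 2 for x in levels)
alpha_inv_estimate = -slope
```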
Hence the slope of this part of the plot can be used as an estimate for α^{-1}(c). The disadvantage of this method is that the buffer occupancy must be simulated for a long time to obtain an accurate estimate for α, which means that a large sample size is required. Furthermore, this procedure must be repeated for each c to obtain the whole effective bandwidth curve, i.e., α(θ) for all θ. On the other hand, this approach is attractive in that it does not assume any signal model for the traffic, which circumvents modelling the traffic.

3. Model Fitting Approach

This is a two-step approach. First, an appropriate parametric model for the source is selected, and its parameters are estimated. Second, the effective bandwidth is numerically computed or obtained from the model parameters. This approach has been carried out successfully in many different applications. An important advantage of this approach is that one can estimate the model parameters in real time, and use the current estimate of the model parameters to update the effective bandwidth estimate online.

Model fitting itself consists of several steps: 1) Model selection: a mathematical parameterized signal model should be selected for the traffic. 2) Model order selection: depending on the selected model, it is usually necessary to select the model size. There are some systematic approaches to this problem, such as the Akaike information criterion. Selecting a larger model size usually increases the complexity of the model. 3) Estimating the model parameters. For off-line traffic sources, the traffic is known a priori and off-line parameter estimation methods can be used. This is usually easier than the online case, where the traffic itself is not known a priori.

⁴ The Kullback-Leibler distance, or relative entropy, of two discrete distributions p and q is defined by d = Σ_k p_k log(p_k / q_k)    (3.28)
In online parameter estimation methods, the parameter estimate is usually updated after observing each traffic sample, such that the estimate gradually converges to the actual parameter values. Online methods are usually able to track changes in the model parameters.

We employ the traffic modelling approach in the proposed ITV application to find the numerical value of the effective bandwidth curve of a traffic source. This modelling approach comprises three important issues. First, we should select a suitable traffic model, which can capture the important characteristics of TV video traffic. Second, we should find model parameter identification (i.e., model fitting) methods, which can estimate the model parameters from the actual traffic. Finally, we should design a method for obtaining the effective bandwidth from the model parameters. We address these three issues in the next chapter.

3.6 Conclusion

In this chapter, we presented a scheme for the admission control of the stochastic service class. Our approach is based on the effective bandwidth theory. We defined the effective bandwidth and examined the important properties of this concept. Then, we showed how the effective bandwidth concept is used to design an admission control scheme for the stochastic service class. Using the methods presented in this chapter, one can find the maximum waiting time for an incidental stream whose bitrate is R and data loss probability is p. We also discussed current approaches for estimating the numerical value of the effective bandwidth curve from traffic samples of a source. We selected a modelling approach, where a traffic source is modelled using a stochastic model, and the effective bandwidth is then obtained from the model parameters. However, we did not indicate which stochastic model shall be used for the video sources in the proposed ITV application. In the next chapter, we address this important problem.
That is, we select a stochastic model for modelling the traffic generated by full-screen video sequences of TV programs. We justify our model selection with evidence from traffic samples of video sequences of actual TV programs. We present methods for estimating the model parameters from the traffic samples. We also show how the effective bandwidth curve is obtained from the parameters of the selected stochastic model. Using the methods presented in the next chapter, one can estimate the numerical value of the effective bandwidth curve for a main video source. Hence, the methods presented in this chapter, in conjunction with those presented in the next chapter, complete the big picture of the admission control mechanism for the stochastic service class.

Chapter 4

Video Traffic Modelling

All models are wrong, but some of them are useful. Modelling should never be an end in itself, but an aid to understand our complex world. -George Box

Overview

Stochastic modelling of the traffic of main video streams is considered, and general hidden Markovian models are discussed for these streams. Specifically, Hidden Semi-Markov Models (HSMMs) are selected for modelling the traffic of main video streams. A new formulation for HSMMs is presented, which significantly improves the computational efficiency of HSMM parameter identification algorithms. Based on our new formulation of HSMMs, efficient algorithms for off-line and online identification of HSMM parameters are presented. Then, it is shown how the effective bandwidth curve is obtained from the HSMM parameters. Using the methods presented in this chapter, one can estimate the numerical value of the effective bandwidth curve of a video source. The estimated effective bandwidth curve is then utilized in the admission control of incidental streams in the stochastic service class, as discussed in Chapter 3.

4.1 Introduction

In this chapter we seek a stochastic model to characterize the traffic of the main video streams in a TV network.
This model should accurately capture the important stochastic characteristics of the traffic. Our motivation is to use this traffic model to find the numerical value of the effective bandwidth curve of the source. Hence, our objectives in this chapter are (1) to select an appropriate stochastic model for video traffic sources, (2) to present a method for estimating the model parameters from traffic (parameter identification), and (3) to obtain the numerical value of the effective bandwidth from the model parameters. As mentioned earlier, this traffic model is then used by the admission control unit to determine how much bandwidth will be available to the incidental streams that use the stochastic service class. Ideally, a good traffic model for our application should be (a) accurate enough to characterize the important statistical properties of the traffic, (b) computationally efficient, and (c) usable for obtaining the effective bandwidth curve.

Note that video traffic demonstrates different characteristics at different time scales. Many of the current modelling efforts strive to capture the rate variability of traffic at the frame level. In those approaches, the periodic pattern in frame sizes plays an important role in the traffic shape. This periodic pattern is caused by the frame coding pattern of DCT-based video compression standards (e.g., the IBBPBB frame coding pattern in each GOP of the MPEG-2 standard). For the proposed ITV application, the traffic model should be able to capture the video rate variability at the GOP level (or scene level) rather than at the frame level. This is because the waiting times in buffers in the proposed ITV application are much larger than the video frame interval, and hence rate variabilities due to frame patterns are ruled out by buffering. The rest of this chapter is organized as follows.
In Section 4.2, we review current approaches to video traffic modelling. In Section 4.3, we present our modelling approach, which is based on 'Hidden Markov Models' (HMMs). In so doing, we discuss the concept of 'state-duration modelling' using the notion of semi-Markov signal models. We show that Hidden Semi-Markov Models (HSMMs) are a better model choice for the proposed ITV application than the previously used HMMs. In Section 4.4, we present the theoretical background of HSMMs. As will be discussed, the challenging issue involved in employing HSMMs is that of 'parameter identification'. We address this problem in Sections 4.5-4.7. To that end, we first present a novel signal model for HSMMs in Section 4.5. Based on our new model, we present novel methods for parameter identification of HSMMs for the off-line and online cases in Sections 4.6 and 4.7. In Section 4.8, we show how the numerical value of the effective bandwidth curve is obtained from the parameters of an HSMM model. Finally, in Section 4.9, we present experimental results of applying the methods developed in this chapter to empirical traffic samples of typical TV programs.

4.2 Existing Stochastic Video Traffic Models

Current approaches to video traffic modelling (also known as 'source modelling' in the literature) can be divided into five main classes.

Renewal Traffic Models

Renewal models are mathematically simple and have a long history. These models describe packet generation at certain points in time. Let A_n be the time between the generation of the n-th and (n+1)-th packets; in a renewal model the A_n are independent and identically distributed, but their distribution function is allowed to be general. Poisson and Bernoulli processes are the most popular cases of renewal models in the continuous-time and discrete-time cases, respectively. In a Poisson process, A_n is described by an exponential distribution P(A_n ≤ τ) = 1 - e^{-λτ}, where λ is the average number of packets generated per time unit.
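The Poisson case can be simulated directly; since the gaps A_n are i.i.d., their empirical lag-1 autocorrelation is near zero. The rate constant below is illustrative:

```python
import random

random.seed(0)
lam = 2.0                                          # packets per time unit
gaps = [random.expovariate(lam) for _ in range(100_000)]

mean_gap = sum(gaps) / len(gaps)                   # approaches 1/lam = 0.5

# Lag-1 autocorrelation of the inter-arrival times: ~0 for a renewal source.
m = mean_gap
num = sum((a - m) * (b - m) for a, b in zip(gaps, gaps[1:]))
den = sum((a - m) ** 2 for a in gaps)
r1 = num / den
```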
These models have the severe drawback that the autocorrelation function of the A_n is zero for any non-zero lag. Therefore, these models do not capture any dependency among the A_n's, past or future.

Autoregressive Processes

Autoregressive (AR) models estimate the next signal value in a stochastic process as a function of the previous signal values. The most important class of these models is the linear autoregressive models, with the form

y_t = Σ_{i=1}^N a_i y_{t-i} + e_t    (4.1)

where y_t is the random variable, the a_i's are real constants, and e_t is an i.i.d. random variable with zero mean. More complicated autoregressive models, such as MA (Moving Average), ARMA (Auto-Regressive Moving Average) and ARIMA (Auto-Regressive Integrated Moving Average), have also been considered for modelling video traffic. The drawback of these models is that they cannot successfully model the marginal distribution of the video traffic [53].

TES Models

The Transform-Expand-Sample (TES) modelling approach strives to simultaneously model both the marginal distribution and the autocorrelation function of an empirical sample set. This means that TES models can capture the first- and second-order statistical characteristics of a stochastic time series at the same time. TES models consist of two classes, called TES+ and TES-, where the plus or minus superscript distinguishes between the cases where the model gives rise to processes with positive or negative lag-1 autocorrelations, respectively. TES models consist of two stochastic processes, called background and foreground sequences [53-55]. Background sequences have the form

U_n^+ = U_0 for n = 0, and U_n^+ = ⟨U_{n-1}^+ + V_n⟩ for n > 0    (4.2)

U_n^- = U_n^+ for n even, and U_n^- = 1 - U_n^+ for n odd    (4.3)

where U_0 is uniformly distributed on [0,1), {V_n} is a sequence of i.i.d. random variables called the innovation sequence, and the operator ⟨·⟩ denotes the fractional part (also known as the 'modulo-1 operator').
It is shown that the marginal distributions of U_n^+ and U_n^- are both uniform over [0,1), regardless of the distribution of V_n. The foreground sequence is of the form

X_n^+ = D(U_n^+),  X_n^- = D(U_n^-)    (4.4)

where D is a transformation from [0,1) to the real numbers, called a distortion. Given an empirical data sample set, TES models select the innovation sequence V_n and the distortion function D so that both the autocorrelation function and the density function of the foreground process match the autocorrelation and density functions of the empirical data.

The most important family of distortion functions consists of a compound distortion function of the form [54,55]

D(x) = H_Y^{-1}(S_ξ(x)),  x ∈ [0,1)    (4.5)

where the inner transform, S_ξ(x), is a 'smoothing' transform called the stitching transform, parameterized by 0 < ξ < 1, and given by

S_ξ(y) = y/ξ for 0 ≤ y < ξ, and S_ξ(y) = (1-y)/(1-ξ) for ξ ≤ y < 1    (4.6)

The outer transform, H_Y^{-1}, is the inverse of the empirical distribution function (i.e., histogram) computed from the empirical samples {Y_n}_{n=1}^N.

The rationale for the TES modelling approach is based on the following facts. First, all TES background processes are stationary and Markovian, and their marginal distributions are uniform on [0,1), regardless of the distribution of the innovation process V_n [54]. Second, the inversion method from elementary statistics allows any uniform variate U on [0,1) to be transformed to a desired marginal distribution F by applying F^{-1}(U); in particular, TES models use F = H_Y. Third, for 0 < ξ < 1 the stitching transform S_ξ(U_n^+) preserves the uniformity of U_n^+; that is, S_ξ(U_n^+) is also uniformly distributed on [0,1). Beyond preserving uniformity, the stitching transform has a 'smoothing' effect: a sequence {S_ξ(U_n^+)} is more 'continuous looking' than the underlying sequence {U_n^+}.
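A minimal sketch of a TES+ background sequence and the stitching transform of equation 4.6 (the uniform innovation density and all constants are arbitrary illustrative choices):

```python
import random

def tes_background(n, innovation, rng):
    """TES+ background: U_0 uniform on [0,1), U_n = <U_{n-1} + V_n> (modulo 1)."""
    u = [rng.random()]
    for _ in range(n - 1):
        u.append((u[-1] + innovation(rng)) % 1.0)
    return u

def stitch(y, xi):
    """Stitching transform S_xi: smooths the sequence, preserves uniformity."""
    return y / xi if y < xi else (1.0 - y) / (1.0 - xi)

rng = random.Random(42)
innovation = lambda r: r.uniform(-0.05, 0.05)      # illustrative innovation density
u = tes_background(100_000, innovation, rng)
s = [stitch(y, 0.5) for y in u]
mean_u = sum(u) / len(u)                           # near 0.5 (uniform marginal)
```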
Hence, it follows that any foreground TES process of the form

X_n = H_Y^{-1}(S_ξ(U_n^+))    (4.7)

obtained from the background process {U_n^+} is always guaranteed to have the histogram distribution H_Y, regardless of the selected innovation sequence V_n and the stitching parameter ξ. The choice of the density of the innovation sequence, f_V, determines the autocorrelation function of the foreground process. Thus, TES modelling decouples the fitting of the empirical distribution from the fitting of the empirical autocorrelation function. Since the former is guaranteed, one can concentrate on the latter. In practice, model fitting is carried out by a heuristic search for pairs (ξ, f_V). The search is considered successful if the corresponding TES sequence gives rise to an autocorrelation function that adequately approximates its empirical counterpart [56-58].

Though TES models have recently been used for modelling video traffic with acceptable accuracy, their implementation is too computationally complicated for practical applications. In practice, the heuristic search for (ξ, f_V) relies on a brute-force computation and on selecting the best n combinations of (ξ, f_V) pairs, in the sense that the resulting TES model autocorrelation function minimizes the mean square error with respect to the empirical autocorrelation function. Then, another algorithm is employed to select, among the n candidate models, the one whose sample path bears the 'most resemblance' to the empirical samples. Furthermore, it is still not known how a TES model can be used for obtaining performance guarantees in a communication network. Specifically, there is no method available for obtaining the effective bandwidth of a traffic source from its TES model.

Self-Similar or Fractal Models

These models are based on self-similar process models. The self-similarity concept implies that samples of a process demonstrate similar statistical characteristics when considered at different time scales.
The key characterizing parameter is the so-called Hurst parameter, H, which captures the degree of self-similarity in a given empirical signal. Recently, many researchers have used self-similar process models for modelling video traffic [12,53,59-63]. Most of these studies deal only with the statistical analysis of data sets, including the estimation of the Hurst parameter, and provide only limited information about traffic characteristics. Furthermore, very little research has been done on the analysis of communication networks (or queues) with self-similar sources.

Markovian Signal Models

Markov signal models have been successfully used for modelling video traffic [6,9,53,64]. One of the popular Markovian models is the two-state Markov chain (also called the ON/OFF model), where one state represents the peak rate and the other represents the minimum rate. Though this model is simple and useful for modelling video traffic in some cases, it is not accurate enough for modelling full-screen video sources. The other commonly used model is the Markov Modulated Process, including the Markov Modulated Poisson Process.

Most of the previous research on video modelling studies the statistical characteristics of 'video conference' sources rather than 'broadcast video' sources. Broadcast video traffic, such as TV video, has different characteristics from that found in video conference applications. Usually, video conference sequences are encoded at a very low bitrate and consist of head-and-shoulder pictures with little or no camera movement. Hence, video conference sequences differ from full-screen television video in two ways: first, they contain very rare scene changes and few object movements in the picture; and second, the bitrate of video conference streams fluctuates little when compared to the bitrate of full-screen video.
4.2.1 Stochastic Models for Full-Screen Broadcast Video

The usual first step in modelling a real-world stochastic process is the analysis of the system generating the signal. The system that generates video traffic in digital TV applications is an MPEG-2 encoder, and we are interested in modelling the amount of traffic that this encoder generates. MPEG-2 encoders use several important encoding parameters, such as the number of slices, GOP pattern, GOP length, quantization scale, and so on. These parameters affect the quality of the coded video, and are selected according to the application needs. Unfortunately, given the input video sequence and the coding parameters, there is no procedure to obtain the statistical parameters of the traffic generated by the encoder. Therefore, it is not useful to incorporate the video coding parameters into the traffic model. However, analysis of MPEG-2 encoding techniques can provide valuable insights into the video traffic shape.

One important characteristic of MPEG video traffic is the periodic frame pattern in each GOP. However, in the proposed ITV application, buffering delays for incidental streams (i.e., waiting times in buffers) are much larger than the video frame interval. This means that incidental buffer sizes are much larger than an average incidental video frame, and the periodic frame pattern in each GOP is filtered out during the buffering process. Thus, rate variability caused by the periodic frame pattern does not play any role in the proposed ITV application. Since the buffering delay for an incidental stream in the proposed ITV application is usually more than a couple of seconds, we consider the video source rate at the GOP level rather than the frame level. Note that in most video streams in digital TV applications, each GOP contains 15 frames, which results in a GOP length of 0.5 seconds.
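The filtering of the periodic frame pattern by GOP-level aggregation is easy to see on a synthetic trace (the frame sizes below are hypothetical, not measured):

```python
# Hypothetical 15-frame GOP with an I B B P B B ... coding pattern
# (sizes in kbit; one I, four P, ten B frames per GOP).
gop_pattern = [120] + [30, 30, 60] * 4 + [30, 30]
frames = gop_pattern * 200                     # 200 identical GOPs, no scene changes

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

gops = [sum(frames[i:i + 15]) for i in range(0, len(frames), 15)]

frame_var = variance(frames)                   # large: periodic frame-level pattern
gop_var = variance(gops)                       # zero: the pattern is averaged out
```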
Therefore, we seek a traffic model which can capture the correlation between consecutive GOP sizes in a video stream. It is well known that each GOP size depends on the visual content of its corresponding scene. Two features of a video scene that affect its corresponding GOP size are 1) the visual detail in each frame, and 2) video activity, such as object movements, background movements (e.g., camera panning and zooming), scene changes, etc. In general, simple and non-active scenes result in small GOP sizes, while complex scenes result in larger GOP sizes. This inspires us to select a traffic model which can capture the trend in video traffic caused by the underlying scene changes, while also capturing the rate fluctuations caused by other parameters.

'Hidden Markov Models' (HMMs) are known to match this intuitive requirement. These models consist of a hidden layer and an observable layer. The hidden layer is a Markov chain process, which determines the state of the signal. The state of the signal is not observable, and is considered 'hidden'. One can only observe the output of the observation layer. The state of the signal at time t determines the spectral characteristics of the observable layer of the model. That is, the statistical characteristics of the observed signal at time t depend on the hidden state of the signal at time t.

Other research studies have confirmed that HMMs are relatively successful in capturing the traffic characteristics of video conference and broadcast video sources [65-70]. HMMs have also been widely used in many other engineering applications, such as speech processing, signal estimation, and queuing networks [71,72]. The underlying mathematical theory for these models is well established, and efficient algorithms are available for their implementation. However, HMMs have a limitation. Before we explore this limitation, we present a few mathematical preliminaries on Markov models in the next section.
We will then discuss the limitation of HMMs, and use a new model which can alleviate this limitation.

4.3 Mathematical Background of General Markovian Models

4.3.1 Markov Chain

The simplest form of Markovian signal model is a 'Markov chain'. Consider the discrete-time stochastic process {s_t}, t = 1, 2, ..., which takes its values from the set {1, 2, ..., N}. The space {1, 2, ..., N} is called the state space. If s_t = i, then the process is said to be in state i at time t. s_t is a Markov chain if, whenever the process is in state i, the probability that it enters state j in the next time unit is constant. This property is the essence of Markovian signal models and is called the Markovian property; it simply states that the probability of transition from one state to another does not depend on the previous states of the process, that is,

P(s_t = j | s_{t-1} = i, s_{t-2} = k, ...) = P(s_t = j | s_{t-1} = i)    (4.8)

Let a_ij represent the probability of going from state i to state j, that is,

a_ij = P(s_t = j | s_{t-1} = i)    (4.9)

Note that for all i and j, a_ij is constant and time-invariant. The matrix A = [a_ij] is called the state transition probability matrix, or state transition matrix. Since the a_ij's are probabilities, and since the process s_t must make a transition to some state at each time instance, we have

a_ij ≥ 0, 1 ≤ i, j ≤ N;  Σ_{j=1}^N a_ij = 1, i = 1, ..., N    (4.10)

Similarly, the n-step transition matrix A^n is defined, where A^n_ij represents the probability of being in state j after n state transitions starting in state i.

4.3.2 Hidden Markov Models

Hidden Markov Models, HMMs (also called Markov Modulated Processes, MMPs), are a powerful and widely used class of Markovian signal models. These models are 'doubly stochastic', and consist of a hidden layer and an observable layer. The hidden layer is a Markov chain process s_t, which follows equations 4.8-4.10 and determines the state of the signal at time t. The state of the signal is not observable, and is considered 'hidden'.
We observe the observation process y_t. The spectral characteristics of the observed signal at time t are determined by the state of the signal at t. A common choice for modelling the observable layer is a parametric probability density function (pdf), where the pdf parameters are determined by the current state of the signal. This is denoted by

P(y_t | s_t = i) = b_i(y_t)    (4.11)

where b_i(x) is a parameterized density function. For example, if b_i(x) is a Normal distribution with mean μ_i and variance σ_i², then the probability of observing y_t, given that the state process is in state i at time t, is

P(y_t | s_t = i) = (1/√(2πσ_i²)) e^{-(y_t - μ_i)² / (2σ_i²)}    (4.12)

which depends only on i. Note that, rather than probability density functions, other stochastic models, such as Poisson processes or TES models, have been considered for modelling the observation layer of HMMs [57].

4.3.3 Limitation of HMM Models

One disadvantage of HMMs lies in their limited ability to model 'state duration' densities. The state duration is defined as the time that the state process of an HMM spends in each state before making a transition to another state. Note that in a Markov chain, the probability of remaining in a state is constant; that is, P(s_{t+1} = i | s_t = i) = a_ii, where a_ii is constant. Therefore, the probability of staying exactly d time units in state i is given by

p_i(d) = a_ii^{d-1} (1 - a_ii)    (4.13)

Hence, state duration densities in HMMs are limited to the form given in equation 4.13, which is known in the literature as the 'Geometric' discrete distribution. This implies that the probability of staying d time units in each state goes to zero exponentially as d increases. However, the Geometric distribution is not appropriate for modelling the state durations of many physical signals. In order to remove this limitation, a more sophisticated model, called a 'semi-Markov chain', is used, where state duration densities are modelled with some non-Geometric distribution.
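The geometric-duration property of equation 4.13 can be checked by simulation; the two-state chain and Gaussian emission parameters below are arbitrary illustrations:

```python
import random

A = [[0.9, 0.1],
     [0.2, 0.8]]                      # state transition matrix, rows sum to 1
mu, sigma = [1.0, 3.0], [0.2, 0.5]    # Gaussian emission parameters per state

def next_state(i, rng):
    r, acc = rng.random(), 0.0
    for j, p in enumerate(A[i]):
        acc += p
        if r < acc:
            return j
    return len(A) - 1

rng = random.Random(7)
states, s = [], 0
for _ in range(100_000):
    states.append(s)
    s = next_state(s, rng)
obs = [rng.gauss(mu[s], sigma[s]) for s in states]   # observable layer

# Durations of stays in state 0: geometric with mean 1/(1 - a_00) = 10.
durations, run = [], 0
for s in states:
    if s == 0:
        run += 1
    elif run:
        durations.append(run)
        run = 0
mean_duration = sum(durations) / len(durations)
```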
A hidden Markov model that uses a semi-Markov chain for modelling the hidden state process is called a 'Hidden Semi-Markov Model' (HSMM). Generally speaking, HSMMs are more powerful than HMMs in modelling physical signals; however, HSMMs are also more complicated.

4.3.4 Selecting Between the HMM and HSMM Modelling Approaches for the Proposed ITV Application

Is an HSMM a better model choice for TV video traffic than an HMM? The answer depends on whether the Geometric distribution is a good model for the state durations of video traffic processes. In order to gain insight into this matter, we conducted two different experiments using the empirical bitrates of a few typical TV programs, as discussed below.

The first experiment was based on an empirical approach to estimating the state sequence from the empirical bitrate traces of typical TV programs. In this experiment, we first partitioned the video sequences into different scenes. The partitioning method used is similar to that presented in [73], where sudden jumps in the empirical bitrate of the video are used to detect scene changes. A manual comparison of the detected scene changes and the video content showed that this method successfully captures the scene changes from the empirical bitrate record. Then, all scenes were partitioned into three clusters (low, medium and high bitrate) according to the average encoded GOP size of each scene. The clustering algorithm used is presented in [57]; its approach is to partition the given data set so that the elements in each cluster are 'closer' to each other than to the elements of all other clusters. We assume that each cluster represents a hidden state of the Markov chain model. According to this classification, we constructed a 'state transition sequence' for each video sequence (see Figure 4.1).
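As a simple illustration of this clustering step (the actual algorithm is the one in [57]; this tercile rule and the scene rates are only an illustrative stand-in):

```python
def tercile_labels(values):
    """Label each value 0/1/2 (low/medium/high) by tercile boundaries."""
    s = sorted(values)
    t1, t2 = s[len(s) // 3], s[2 * len(s) // 3]
    return [0 if v < t1 else (1 if v < t2 else 2) for v in values]

scene_rates = [1.1, 3.2, 2.0, 0.9, 4.1, 2.2, 1.0, 3.8, 2.1]   # mean GOP size per scene
states = tercile_labels(scene_rates)    # a crude 'state transition sequence'
```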
Next, we extracted the empirical state-duration traces for each state from the state transition sequence of each video, and calculated the histograms of these state-duration traces. In order to determine whether the Geometric distribution is a good fit for these empirical state-duration traces, we used both visual histogram comparison and the 'Q-Q plot technique' (see Appendix B). Figure 4.2-a shows the histogram of one of the state durations for a 'News' video sequence, and Figure 4.2-b shows the typical shapes of a Geometric and a Gamma distribution. Note that the empirical data histogram reveals a curve shape more similar to the Gamma distribution than to the Geometric distribution. Figures 4.2-c and d show the Q-Q plots for the Geometric and Gamma distributions. As discussed in Appendix B, if the statistical properties of the empirical data match the selected parametric distribution, then the Q-Q plot is expected to be approximately linear with an intercept of 0 and a slope of 1. We note that the Q-Q plot for the Gamma distribution is closer to a line with slope 1 than the Q-Q plot for the Geometric distribution. Hence, we conclude that a discretized Gamma distribution is a better choice for modelling the state durations of TV video traffic.

[Figure 4.1: Normalized bitrate and 'state transition path' for two typical TV programs: (a) normalized bitrate of the 'News' sequence; (b) normalized bitrate of the 'Mission Impossible' sequence; (c) state transition path for the 'News' sequence; (d) state transition path for the 'Mission Impossible' sequence.]

The second experiment was motivated by the results of the first experiment. In this experiment, we used the 'likelihood ratio test' hypothesis testing method (see Appendix C) to determine whether an HMM model or an HSMM model with Gamma state-duration densities is a better choice for digital TV traffic sources. We let y_T = {y_1, y_2, ..., y_T} denote the empirical bitrate trace of the video sequence under study.
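The Q-Q comparison of Appendix B can be sketched as follows: sort the empirical dwell times and pair them with the quantiles of the candidate law at matching probability levels. The sketch below uses the closed-form Geometric inverse CDF and a moment fit for its parameter; all dwell-time values are made up for illustration.

```python
import math

def geometric_quantile(u, p):
    """Inverse CDF of the Geometric law on {1, 2, ...} with success prob p:
    the smallest d with 1 - (1-p)**d >= u."""
    return math.ceil(math.log(1.0 - u) / math.log(1.0 - p))

def qq_pairs(durations, p):
    """(theoretical, empirical) quantile pairs for a Q-Q plot."""
    data = sorted(durations)
    n = len(data)
    pairs = []
    for k, d in enumerate(data, start=1):
        u = (k - 0.5) / n                 # plotting position
        pairs.append((geometric_quantile(u, p), d))
    return pairs

durations = [2, 3, 3, 4, 5, 5, 6, 8, 9, 14]      # made-up dwell times
p_hat = 1.0 / (sum(durations) / len(durations))  # moment fit: mean = 1/p
pairs = qq_pairs(durations, p_hat)
```

A good fit would place these pairs near the line with intercept 0 and slope 1; the same pairing against a Gamma's quantiles gives the competing plot.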
Then, we test the null hypothesis "H0: y_T is an HMM" against "H1: y_T is an HSMM". Details of adopting the general likelihood ratio test method for conducting this hypothesis test are given in Appendix C. In this test, the likelihood ratio Λ is computed as the ratio of the likelihood of y_T being generated by an HMM model to the likelihood of y_T being generated by an HSMM model. It is shown that −2 ln(Λ) has a χ² distribution (see Appendix C). Hence, −2 ln(Λ) is compared to the 100(1 − α) percentile point of a χ² distribution, where α is the significance level of the test. Table 4.1 summarizes the results of this test for the empirical bitrates of a few TV programs. The significance level of the test is α = 5%. As shown, for all the sources, −2 ln(Λ) is greater than the 100(1 − α) percentile point, which means that the null hypothesis (i.e., 'y_T is an HMM') should be rejected. Hence, we choose hidden semi-Markov models for modelling the TV video traffic in the proposed ITV application, where the state durations are modelled with a Gamma distribution. In the next section, we present the mathematical formulation of HSMMs, and address the important issues raised in employing HSMMs. In Section 4.9, we present the results of fitting an HSMM model to empirical bitrate traces of typical TV programs.

Sequence                −2 ln(Λ)    95% percentile point of the χ² distribution with ν = 3 degrees of freedom    Test result
Mission Impossible II   21.29       7.815                                                                         Reject H0
News                    38.93       7.815                                                                         Reject H0
Soap Opera              43.17       7.815                                                                         Reject H0

Table 4.1: Results of the likelihood ratio test for different typical TV video sequences. The number of states in the HSMM and HMM models is N = 3.
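The decision rule behind Table 4.1 can be sketched as follows. The 7.815 critical value is the standard 95% point of the χ² distribution with 3 degrees of freedom; the log-likelihood values below are hypothetical stand-ins chosen so that the statistic reproduces the 'News' entry of the table, not outputs of any fitted model.

```python
def likelihood_ratio_test(loglik_h0, loglik_h1, critical_value=7.815):
    """Reject H0 when -2 ln(Lambda) = -2 (loglik_h0 - loglik_h1)
    exceeds the chi-square critical value."""
    stat = -2.0 * (loglik_h0 - loglik_h1)
    return stat, stat > critical_value

# Hypothetical log-likelihoods whose gap matches -2 ln(Lambda) = 38.93.
stat, reject = likelihood_ratio_test(loglik_h0=-1000.0, loglik_h1=-980.535)
```

Because the HSMM nests the HMM here, loglik_h1 can never be smaller than loglik_h0 at the optimum; the test asks whether the improvement is large enough to be statistically significant.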
4.4 General Background on Semi-Markovian Signal Models

4.4.1 Semi-Markov Chains

The discrete-time stochastic process {s_t}, t = 1, 2, ..., which takes its values from the state space {1, 2, ..., N}, is a Semi-Markov chain if the next state of the signal depends only on the current state and the amount of time that the signal has spent in the current state.

[Figure 4.2: a) Histogram of the state (or cluster) duration for one of the states in the 'News' sequence; b) Geometric probability mass function; c) discretized Gamma probability distribution function; d) Q-Q plot for the Geometric distribution; e) Q-Q plot for the Gamma distribution.]

Formally, this condition is stated as

P(s_t = j | s_{t−1} = s_{t−2} = ··· = s_{t−d} = i, s_{t−d−1} = k, ···) = P(s_t = j | s_{t−1} = i, d)   (4.14)

where i ≠ k. A = [a_ij] is known as the state transition probability matrix, where a_ij is the probability of going to state j from state i, knowing that the signal is leaving state i:

a_ij = P(s_t = j | s_{t−1} = i, s_t ≠ i)   (4.15)

Note that the a_ij's are all constant, and are constrained to

a_ij ≥ 0, 1 ≤ i, j ≤ N;   Σ_{j=1}^{N} a_ij = 1, i = 1, ..., N   (4.16)

All a_ii's are zero. In a Semi-Markov chain, the state duration densities are modelled in some non-Geometric form, unlike that of equation 4.13. This non-Geometric form is usually denoted by a state duration probability mass function φ_i(d), where φ_i(d) is the probability that the signal stays exactly d time units in state i during its visits to this state. φ_i(d) may be selected to be one of the known parameterized probability mass functions, such as the Poisson or Binomial. Alternatively, φ_i(d) may be selected as a discretized probability distribution function, such as a Normal or Gamma pdf. The signal generation process of a Semi-Markov chain s_t can be summarized as follows:

1. Start with a given initial state, e.g., s_t = i for t = 1.

2. Select a duration d according to the state-duration density function of the i-th state, φ_i(d).

3.
For the next d time units, stay in the same state i.

4. Select the next state according to a constant state transition matrix A = [a_ij], with the constraint that the signal must leave the current state (i.e., a_ii = 0).

5. Go back to step 2.

4.4.2 Hidden Semi-Markov Models

A Hidden Semi-Markov Model (HSMM), also known as a Semi-Markov Modulated Process (SMMP), is similar to an HMM except that the hidden state process is a semi-Markov chain. Generally speaking, HSMMs are a generalization of HMMs and are more powerful in modelling physical signals. However, HSMMs are more complicated than HMMs. The most important problem that arises in using HSMMs is the identification of the model parameters. This identification problem is studied in either the off-line or the online case. In the off-line case, given a set of observations from an HSMM signal, y_T = {y_1, y_2, ..., y_T}, one should find the parameters of the HSMM model, denoted by θ. In the online case, one observes a signal sample y_t at time t, and should update the current estimate of the model parameters θ_t such that θ_t gradually converges to the actual model parameters. In the rest of this chapter, we address the identification of HSMMs in both the off-line and online cases, as both are necessary in the implementation of the proposed interactive TV system. Off-line identification methods are necessary for estimating the model parameters of pre-recorded video streams. Online identification methods are employed for online model parameter estimation from live video sources. Ultimately, the estimated model parameters are used to find the effective bandwidth curves of the sources.

4.4.3 General Background on Identification of HSMMs

Identification of HSMMs is conceptually similar to identification of HMMs, and current approaches to the identification of HSMMs constitute a generalization of the identification methods for HMMs. There are powerful methods available for identification of HMMs.
Off-line identification of HMMs is based on an iterative algorithm, known as the Baum-Welch or Expectation-Maximization (EM) algorithm.¹ This algorithm finds the maximum likelihood estimate of the model parameters θ. That is, θ is estimated so that P(y_T | θ) is maximized. It is shown that the maximum likelihood estimate converges to the true parameter value as the sample size T increases. The EM algorithm consists of two steps in each iteration: the E or Expectation step and the M or Maximization step. The algorithm starts with an initial estimate of the model parameters θ. Note that a part of an HMM is hidden from us, namely the state transition path of the underlying Markov chain, denoted by S_T = {s_1, s_2, ..., s_T}. In the E step the "optimal" state transition path S_T is estimated from the samples y_T and the current parameter estimates θ. This is done by estimating a set of so-called forward and backward variables, the α_t's and β_t's. The forward variables, the α_t's, are computed by induction from α_{t−1}. Similarly, the backward variables, the β_t's, are computed from β_{t+1}. In the M step, the model parameters are re-estimated by finding the maximum likelihood estimate of θ from the computed forward-backward variables and y_T. The E and M steps are iterated until θ converges.

¹ The EM method refers to a general class of approaches. The Baum-Welch algorithm is a variant of the EM algorithm for estimating the HMM parameters.

The EM algorithm has been extended to the context of HSMMs by using 'explicit state duration modelling' [72,74-76] or 'parametric state duration modelling' [77]. The first approach estimates the density of the state durations explicitly for all possible values, while the second approach uses a parametric distribution to model the state durations, and only estimates its parameters.
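The E-step's forward induction (equation 4.17 in the HMM case) can be sketched as follows. This is a generic illustration with Gaussian observation densities; the two-state parameter values and the short observation sequence are assumptions for the example, not fitted to any TV trace.

```python
import math

def gaussian(y, mu, var):
    """Normal observation density b_i(y), as in equation 4.12."""
    return math.exp(-(y - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def forward(y, A, mus, variances, pi):
    """alpha[t][j] = P(y_1..y_t, s_t = j); each alpha_t is built by
    induction from alpha_{t-1} (equation 4.17)."""
    N = len(pi)
    alpha = [[pi[j] * gaussian(y[0], mus[j], variances[j]) for j in range(N)]]
    for t in range(1, len(y)):
        prev = alpha[-1]
        alpha.append([
            sum(prev[i] * A[i][j] for i in range(N))
            * gaussian(y[t], mus[j], variances[j])
            for j in range(N)
        ])
    return alpha

# Illustrative two-state model (numbers are not from the thesis).
A = [[0.9, 0.1], [0.2, 0.8]]
alpha = forward([0.1, 0.2, 2.1], A, mus=[0.0, 2.0],
                variances=[1.0, 1.0], pi=[0.5, 0.5])
likelihood = sum(alpha[-1])   # P(y_1..y_T | theta)
```

The cost per time step is O(N²), which is exactly the property the HSMM generalization in equation 4.18 loses.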
However, current methods for identification of HSMMs, which are based on these two approaches, have the major drawback of a greatly increased computational load compared to the HMM case. This increase in the computational load lies in the estimation formulas for the forward-backward variables in the E step, where α_t (β_t) must be computed from {α_{t−1}, α_{t−2}, ..., α_{t−D}} (and from {β_{t+1}, β_{t+2}, ..., β_{t+D}}), instead of α_{t−1} (β_{t+1}) as in HMMs, where D is the maximum allowable state duration. More precisely, in the EM algorithm for identification of HMMs, the forward variables α_t(j) are computed by induction from the α_{t−1}'s using

α_t(j) = [Σ_{i=1}^{N} α_{t−1}(i) a_ij] b_j(y_t),   1 < t ≤ T, 1 ≤ j ≤ N   (4.17)

In this equation α_t(j) is obtained by adding up the probabilities that s_t has traversed one of the state transition paths shown in Figure 4.3-a. In identification of HSMMs, however, α_t(j) is computed as

α_t(j) = Σ_{d=1}^{D} Σ_{i≠j} α_{t−d}(i) a_ij φ_j(d) Π_{τ=t−d+1}^{t} b_j(y_τ)   (4.18)

That is, the probability of being in state j at t is computed by summing the probabilities of the paths shown for d = 1, 2, 3, ..., D in Figure 4.3-b. Similar equations are used for computing the β_t's.

[Figure 4.3: State transition paths that should be considered for computing α_t(j) in the HMM and HSMM cases.]

It is shown that current approaches to identification of HSMMs using the EM algorithm increase the memory usage by a factor of D and the computational load by a factor of D²/2, compared to the EM algorithm for HMMs. Since D is usually large in many applications (e.g., D = 25 in most speech processing applications), these algorithms become impractical. In [78], an HSMM with N states and maximum state duration D is reformulated as an HMM with N·D states, and then the standard EM algorithm is used to estimate the model parameters. There are other approaches, which are based on 'state duration dependent transition probabilities' [79]. In these approaches, the state transition matrix is time-varying or is replaced with a tensor.
The drawback of the methods presented in [78,79] is the addition of a large number of parameters to the model. These extra parameters must be estimated in addition to the usual HMM parameters, and this requires a large sample size in order to obtain accurate estimates.

Online identification of HSMMs is conceptually harder than the off-line case. Currently, there are no methods available in the literature for online identification of HSMMs. However, online identification of HMMs has been studied in [71,80-83]. These approaches are based on either the 'recursive maximum likelihood' (RML) or the 'recursive prediction error' (RPE) technique. In the RML approach, the current estimate of the model parameter vector, θ_t, is updated at each time instance in such a direction that the 'Kullback-Leibler' (KL) information measure is maximized. It is shown that this results in the maximum likelihood estimate of the model parameters as t → ∞. The RPE method is a class of general numerical parameter estimation methods. RPE algorithms are, in essence, 'Extended Kalman filters' (EKF). More precisely, RPE methods are a special class of Extended Kalman filters for the case when the unknown constant parameters of the model are viewed, and estimated, as states. In this approach a norm V(θ) that measures the prediction error of the model is defined. The model parameters are updated according to

θ_{t+1} = θ_t + λ_t · f_t(θ_t)   (4.19)

where f_t is a search direction based on some information about V(θ) acquired at previous iterations, and λ_t is a positive step size. There are different algorithms based on equation 4.19 which use different choices for the search direction f_t. The most important class of these approaches is the quasi-Newton methods, where the search direction f_t is

f_t = −[V''(θ_t)]^{−1} · V'(θ_t)   (4.20)

The mathematical theory for the general RPE technique is presented in [84].
In [71,82], a formulation of HMMs is presented which allows the general RPE scheme to be applied to online identification of HMMs. The norm function used in this method is a weighted quadratic prediction error criterion:

V_t(θ) = γ_t Σ_{k=1}^{t} β(t, k) e²(k, θ)   (4.21)

where e(k, θ) = ŷ(k, θ) − y(k) is the prediction error, γ_t is a normalizing factor and β(t, k) is a weighting sequence, introduced to increase the effect of recent observations on the estimate. In [83], an algorithm based on the RPE technique is presented which finds the MLE of the model parameters, and hence can be considered an RML algorithm.

In summary, the existing off-line parameter identification methods for HSMMs are very computationally demanding, and no online parameter identification method has been suggested. In the next sections of this chapter, we will present new methods for identification of HSMMs for both the off-line and online cases. Our methods are based on a new formulation of HSMMs. In our new signal model, we introduce a new variable into the traditional HSMM, named the 'state duration' variable, d_t. d_t is actually a vector, where d_t(i) denotes the time that the signal has spent in state i, given that the state at time t is i. We then use state duration dependent transition probabilities, where the probability of a transition from state i to state j is not constant and depends on d_t(i). Hence, in our model, the probability of being in any state at time t depends on the state at t − 1 as well as on d_{t−1}. We model the state duration densities with parameterized probability density functions. We then present a novel version of the EM algorithm for off-line identification of HSMMs based on our model. Our algorithm finds the local maximum likelihood estimate of the model parameters. The major advantage of our method over current ones is that it has almost the same computational complexity as the EM algorithm for HMMs, and hence is useful for a larger set of practical applications.
We also present a sophisticated method for online identification of HSMMs, which is based on our proposed signal model. Our approach adaptively updates the parameter estimates, so that the log-likelihood of the model parameters is maximized. We use an estimate of the Newton-Raphson direction as the update direction of our model parameters. This choice results in a near optimum convergence rate for our algorithm. In this regard, our method is similar to the online identification algorithm for HMMs in [82].

4.5 New Formulation of HSMMs

4.5.1 Hidden (or Modulating) Layer Modelling

We consider a signal model where the state of the signal at time t, s_t, t ∈ N, is determined by a finite-state discrete-time semi-Markov chain. We assume the initial state s_1 is given or its distribution is known. The state space has N distinct states. Without loss of generality, we assume s_t takes its values from the set {e_1, e_2, ..., e_N}, where e_i is an N × 1 vector with unity as the i-th element and zeros elsewhere. If s_t = e_i, we say the signal is in state i at time t. The semi-Markov property of the model implies that the probability of a transition from state e_j to e_i, j ≠ i, at time t depends on the duration spent in state e_j prior to time t. This can be written as

P(s_{t+1} = e_i | s_t = e_j, s_{t−1} = e_k, ..., s_1 = e_l) = P(s_{t+1} = e_i | s_t = e_j, d_t(j))   (4.22)

where d_t(j) is the duration spent in the j-th state prior to time t, that is

d_t(j) = d   if   s_t = e_j, ..., s_{t−d+1} = e_j, s_{t−d} ≠ e_j   (4.23)

For each time t, we define the 'state duration' vector d_t of size N × 1 as

d_t(j) = { d_t(j)   if s_t = e_j
         { 1        if s_t ≠ e_j   (4.24)

d_t(j) is easily constructed from d_{t−1}(j) as d_t(j) = s_t(j) × d_{t−1}(j) + 1, which can be written in vector form as

d_t = s_t ⊙ d_{t−1} + 1   (4.25)

where ⊙ denotes the element-by-element product. We model the state duration densities with a parametric probability mass function, pmf, φ_i(d).
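The recursion of equation 4.25 can be sketched in a few lines. This is illustrative only; the two-state path below is made up. The recursion is taken literally: the active state's counter is carried forward and incremented, and every other counter resets to 1.

```python
def update_duration(d_prev, s_t):
    """d_t = s_t (.) d_{t-1} + 1, element by element (equation 4.25).
    s_t is a one-hot list; d_prev is the previous duration vector."""
    return [s * d + 1 for s, d in zip(s_t, d_prev)]

# Stay in state 0 for two more steps, then switch to state 1.
d = [1, 1]                                # d_1 initialized to all ones
for s_t in ([1, 0], [1, 0], [0, 1]):
    d = update_duration(d, s_t)
```

After the switch, the counter of the abandoned state has snapped back to 1 while the newly entered state's counter starts growing.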
This means that the probability that s_t stays exactly d time units in state i is given by φ_i(d). φ_i(d) should be selected so that it adequately captures the properties of the signal under study. Hence, the selection of φ_i(d) should be justified by some evidence from samples of the signal. Though HMM state durations are inherently discrete, it has been noted in many studies that continuous parametric density functions are also suitable for modelling state durations in many applications, including speech processing [76,77]. In this approach state durations are modelled with the best fitting parametric probability density function, pdf, and then the discrete counterpart of this density function is taken as the best pmf. That is, if φ_i(x) is the continuous pdf for the state duration of the i-th state, then the probability that the signal stays in state i for exactly d time units is given by ∫_{d−1}^{d} φ_i(x) dx. Since negative state durations are not physically meaningful, it is usually more appropriate to select φ_i(x) from the family of exponential distributions [76]. Specifically, the family of Gamma distributions is considered in [77] for speech processing applications. In this thesis, we assume that φ_i(x) is a Gamma distribution function with shape parameter ν_i and scale parameter η_i, that is

φ_i(d) = (η_i^{ν_i} / Γ(ν_i)) d^{ν_i − 1} e^{−η_i d},   0 < d < ∞   (4.26)

where Γ is the gamma function. The mean and variance of φ_i are ν_i/η_i and ν_i/η_i², respectively (see [85]). Note that our signal model presented here is applicable with minor changes if a pdf other than the Gamma is selected. Furthermore, let Φ_i(d) denote the cumulative probability distribution function of φ_i(x), i.e.,

Φ_i(d) = P(s_t stays in state i for at most d time units)   (4.27)
       = ∫_0^d φ_i(x) dx   (4.28)

We construct our model for HSMMs using state duration dependent transition probabilities. We define the state transition matrix A_{d_t} as A_{d_t} = [a_ij(d_t)], where a_ij(d_t) = P(s_{t+1} = e_j | s_t = e_i, d_t(i)).
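The discretized Gamma pmf, φ_i(d) = ∫_{d−1}^{d} φ_i(x) dx, can be sketched numerically without any special libraries. This is a minimal illustration: the parameter values ν = 4, η = 1 are made up, and Simpson's rule stands in for an exact incomplete-gamma evaluation.

```python
import math

def gamma_pdf(x, nu, eta):
    """Gamma density with shape nu and parameter eta, mean nu/eta
    (equation 4.26)."""
    return eta ** nu / math.gamma(nu) * x ** (nu - 1) * math.exp(-eta * x)

def duration_pmf(d, nu, eta, n=200):
    """phi_i(d): integral of the Gamma pdf over [d-1, d], via Simpson's rule."""
    a, b = d - 1.0, float(d)
    h = (b - a) / n
    s = gamma_pdf(a, nu, eta) + gamma_pdf(b, nu, eta)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * gamma_pdf(a + k * h, nu, eta)
    return s * h / 3.0

# Discretized duration pmf with mean dwell time around 4 time units.
phi = [duration_pmf(d, 4.0, 1.0) for d in range(1, 61)]
```

The resulting pmf sums to one (up to the truncated tail) and, unlike the Geometric law, is not monotone: short dwells are less likely than dwells near the mean.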
Clearly, the a_ij(d_t)'s are not constant and change in time; however, we will denote a_ij(d_t) by a_ij for notational simplicity. The diagonal elements of A_{d_t}, the a_ii's, are the probability of staying in state i knowing that s_t has been in state i for d_t(i) time units:

a_ii = P(s_{t+1} = e_i | s_t = e_i, d_t(i))
     = P(s_{t+1} = e_i, s_t = e_i, ..., s_{t−d_t(i)+2} = e_i | s_{t−d_t(i)+1} = e_i, s_{t−d_t(i)} ≠ e_i)
       / P(s_t = e_i, s_{t−1} = e_i, ..., s_{t−d_t(i)+2} = e_i | s_{t−d_t(i)+1} = e_i, s_{t−d_t(i)} ≠ e_i)   (4.29)

Meanwhile, the denominator of equation 4.29 can be written as Σ_{k=1}^{∞} P(s_t stays in state i for exactly d_t(i) − 1 + k time units), which is 1 − Φ_i(d_t(i) − 1). Therefore, equation 4.29 can be written as

a_ii = (1 − Φ_i(d_t(i))) / (1 − Φ_i(d_t(i) − 1))   (4.30)

The probability that the state process stays in the i-th state during a visit for exactly d time units is given by (1 − a_ii(d)) · Π_{k=1}^{d−1} a_ii(k). By substituting a_ii from equation 4.30, it is easily shown that this probability density function of the state durations is actually equal to the selected model φ_i(d). For i ≠ j, a_ij is the probability of leaving state i and entering state j, and is given by

a_ij = P(s_{t+1} ≠ e_i | s_t = e_i, d_t(i)) × P(s_{t+1} = e_j | s_t = e_i, s_{t+1} ≠ e_i)
     = (1 − a_ii) · a°_ij   (4.31)

where a°_ij = P(s_{t+1} = e_j | s_t = e_i, s_{t+1} ≠ e_i) is defined as the probability of a transition from state i to state j, knowing that the signal leaves state i. We write the matrix A_{d_t} in terms of a diagonal matrix P(d_t) representing the recurrent state transition probabilities, and a constant matrix A° representing the non-recurrent state transition probabilities.
A_{d_t} = P(d_t) + (I − P(d_t)) A°   (4.32)

P(d_t)_{ij} = { (1 − Φ_i(d_t(i))) / (1 − Φ_i(d_t(i) − 1)),   i = j
             { 0,                                            i ≠ j   (4.33)

A°_{ij} = { 0,                                              i = j
          { P(s_{t+1} = e_j | s_t = e_i, s_{t+1} ≠ e_i),     i ≠ j   (4.34)

Note that the a°_ij's are constrained to Σ_{j=1}^{N} a°_ij = 1. Since P(d_t) is a diagonal matrix, one can show that Σ_{j=1}^{N} a_ij(d_t) = 1 for each i. Hence, the hidden state process s_t evolves in time based on the following equations:

s_{t+1} = A'_{d_t} · s_t + v_{t+1}
A_{d_t} = P(d_t) + (I − P(d_t)) · A°   (4.35)
d_{t+1} = s_{t+1} ⊙ d_t + 1

where v_{t+1} is a martingale increment; that is, E(v_{t+1} | s_1, s_2, ..., s_t) = 0.

REMARK. Our modelling scenario can be encapsulated as a time-homogeneous first-order Markov model. Consider the 2-vector process S_t defined as S_t = (s_t, d_t). s_t takes its values from the finite set {e_i | i = 1, 2, ..., N}, and d_t takes its values from the infinite set {f_i^d | i = 1, 2, ..., N; d = 1, 2, ...}, where f_i^d is an N × 1 vector with d as the i-th element and unity elsewhere. According to equation 4.25, d_{t+1} depends only on s_{t+1}, s_t and d_t, and according to equation 4.22, s_{t+1} depends only on s_t and d_t. Therefore, P(S_{t+1} | S_t) is independent of t, and hence S_t is a homogeneous first-order infinite-state Markov chain.

4.5.2 Observation Layer Modelling

The state process s_t is hidden and is not observed. We observe the observation process y_t, where the probabilistic distribution of y_t is determined by the state at time t, i.e., s_t. In this thesis, we assume that for each state i, y_t has a normal distribution. That is, if s_t = e_i then P(y_t | s_t = e_i) = N(y_t; μ_i, σ_i²), where μ_i and σ_i² are the mean and variance of the observation process y_t for state i. We denote the probability of observing y_t in state i by b_i(y_t) throughout this thesis, that is
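Equations 4.30 and 4.32-4.34 can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the dwell-time CDF below is a toy Geometric-tailed one chosen purely to keep the arithmetic short (the thesis uses a discretized Gamma), and the A° entries are made up. The sketch builds the time-varying matrix A_{d_t} from the recurrent probabilities P(d_t) and the constant non-recurrent matrix A°, and its rows sum to one as required.

```python
def recurrent_prob(Phi, d):
    """a_ii = (1 - Phi(d)) / (1 - Phi(d - 1)), equation 4.30."""
    return (1.0 - Phi(d)) / (1.0 - Phi(d - 1))

def transition_matrix(Phi_list, d_t, A0):
    """A_{d_t} = P(d_t) + (I - P(d_t)) A0, equations 4.32-4.34."""
    N = len(A0)
    A = [[0.0] * N for _ in range(N)]
    for i in range(N):
        p_ii = recurrent_prob(Phi_list[i], d_t[i])
        for j in range(N):
            A[i][j] = (p_ii if i == j else 0.0) + (1.0 - p_ii) * A0[i][j]
    return A

# Toy duration CDF: Phi(d) = 1 - 0.8**d, so each a_ii works out to 0.8.
Phi = lambda d: 1.0 - 0.8 ** max(d, 0)
A0 = [[0.0, 1.0], [1.0, 0.0]]          # two states: always alternate on exit
A = transition_matrix([Phi, Phi], d_t=[3, 1], A0=A0)
```

With a Gamma-based Φ_i, the same code would make a_ii fall as the dwell time d_t(i) grows, which is exactly what lets the model escape the Geometric duration law.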
We denote the probability of observing y in state i with bi(y ) throughout this paper, that is t t t t L 2 t t t t t k(ijt) = F{yt\st = t (4.36) Therefore, y may be written as t y = (At, s ) + (Va , s )w 2 t t t t (4.37) where fj, — / i , • • • , P-N]', °" [°~h h ''' > Af] > (•> •) denotes inner product and w is Gaussian white noise with zero mean and variance 1 . Equations 4.35 and 4.37 define the signal generation process in our signal model. Figure 4.4 summarizes and compares the signal generation process in our model with that of other HSMM signal models. 2= a 2 t 97 a 1. Start with an initial state, e.g., s — ei for t = 1. t 2. Initialize d = [l 1 t ... l]' 3. Generate y based on the density function of the current state. t 4. Compute A on Ad - dl (equation 4.32), and choose a new state st+i based t 5. Update the state duration variable, dt, using equation 4.25. 6. Go to step 3. a) Our H S M M model. 1. Start with an initial state, e.g., St — ei for t = 1. 2. Select a duration d according to the i function. th state duration density 3. For the next d time units, stay at state st, and generate yt based on the densities of state S j . 4. Select the next state according to a constant state transition matrix A, with the constraint that the signal should leave the current state. 5. Go to step 2. b) Traditional HSMM model. Figure 4.4: Comparison of the signal generation process in our model with that of existing signal models for HSMMs. 4.5.3 M o d e l Parameterizations There are iV + 3N parameters that define an HSMM signal using our model. These parameters are N — non-recurrent transition probabilities a°-, the mean and variance of the observation process Hi and of for 1 < i < N, and the Vi and % parameters of the gamma distribution of the state-durations for 2 2 98 1 < i < N. 
We define θ as a vector containing all the model parameters:

θ = (μ_1, μ_2, ..., μ_N, σ_1², σ_2², ..., σ_N², a°_12, a°_13, ..., a°_{N−1,N}, ν_1, ν_2, ..., ν_N, η_1, η_2, ..., η_N)   (4.38)

4.6 A New Algorithm for Off-line Identification of HSMMs

In this section, we present a new algorithm for off-line identification of HSMMs. Our algorithm is based on the signal generation model presented in Section 4.5, and requires less computational effort than the presently existing methods. Given a set of observations from an HSMM signal, y_T = {y_1, y_2, ..., y_T}, we would like to find θ, the parameters of the HSMM model. The algorithm we use is a variant of the EM algorithm. We first initialize θ to an initial guess. Similar to the EM algorithm for identification of HMMs, in the 'E' step of our algorithm we define a set of probabilistic measures which describe the evolution of the hidden state variable s_t. We define the 'forward variables' α_t(i) as the conditional probability of observing the partial sequence y_1, y_2, ..., y_t and being in state i at time t, given the model parameters θ. That is,

α_t(i) = P(s_t = e_i, y_1 y_2 ... y_t | θ)   (4.39)

Let d̂_t = [d̂_t(1) d̂_t(2) ··· d̂_t(N)]', where

d̂_t(i) = E(d_t(i) | s_t = i, θ, y_1, y_2, ..., y_t)   (4.40)

is our estimate of the state-duration variable for state i at time t. d̂_t is initialized to [1 1 ··· 1]' for t = 1. We reconstruct d̂_{t+1}(i) iteratively as

d̂_{t+1}(i) = 1 + E(s_t(i) | y_1 ... y_t, θ) · d̂_t(i),   1 ≤ i ≤ N
           = 1 + (α_t(i) / Σ_{j=1}^{N} α_t(j)) · d̂_t(i),   1 ≤ i ≤ N   (4.41)

The state transition matrix A_{d̂_t} is updated for each t as

A_{d̂_t} = P(d̂_t) + (I − P(d̂_t)) A°   (4.42)

where P is given in equation 4.33. The forward variable α_1(i) is initialized according to the given initial state, for 1 ≤ i ≤ N. The other forward variables are constructed iteratively as

α_{t+1}(j) = [Σ_{i=1}^{N} α_t(i) a_ij(d̂_t)] b_j(y_{t+1}),   1 ≤ t < T, 1 ≤ j ≤ N   (4.43)

Similarly, the backward variables β_t(i) are defined as the probability of observing the partial sequence y_{t+1}, y_{t+2}, ...
, y_T, given the model parameters θ and that the state at time t is i:

β_t(i) = P(y_{t+1} y_{t+2} ... y_T | s_t = e_i, θ)

The β_t's are computed by initializing β_T(i) = 1 for 1 ≤ i ≤ N and constructing the other variables iteratively using

β_t(i) = Σ_{j=1}^{N} β_{t+1}(j) · a_ij · b_j(y_{t+1}),   1 ≤ t < T, 1 ≤ i ≤ N   (4.44)

We define γ_t(i) as the probability of being in state i at time t, given the observation sequence y_T and the model parameters θ:

γ_t(i) = P(s_t = i | y_T, θ)   (4.45)

γ_t(i) is expressed in terms of the α's and β's as

γ_t(i) = α_t(i) β_t(i) / Σ_{i=1}^{N} α_t(i) β_t(i)   (4.46)

Also, we define ξ_t(i,j) as the conditional probability of being in state i at time t, and in state j at time t + 1:

ξ_t(i,j) = P(s_t = i, s_{t+1} = j | y_T, θ)   (4.47)

ξ_t(i,j) is given in terms of the α's and β's as

ξ_t(i,j) = α_t(i) a_ij b_j(y_{t+1}) β_{t+1}(j) / Σ_{i=1}^{N} Σ_{j=1}^{N} α_t(i) a_ij b_j(y_{t+1}) β_{t+1}(j)   (4.48)

The variables γ and ξ are useful in interpreting the number of transitions between states. That is, Σ_{t=1}^{T−1} γ_t(i) is the expected number of transitions from state i to any other state in y_T, and Σ_{t=1}^{T−1} ξ_t(i,j) is the expected number of transitions from state i to state j in y_T. In the M step of the algorithm, the model parameters θ are updated to the maximum likelihood estimate of the model parameters computed from the forward-backward variables of the E step. There are different approaches to obtaining the updating equations, all of which result in the same equations. These approaches are: 1) One can find θ such that 'Baum's auxiliary function,' Q, is maximized. Baum's auxiliary function is defined as the expectation of log P(y_T, S | θ). It can be shown that maximizing Baum's auxiliary function is analogous to maximizing the likelihood P(y_T | θ). Hence, one can solve ∂Q/∂θ = 0 to obtain the update equations for the model parameters. 2) The problem can be set up as a constrained maximization problem and the Lagrange-multiplier approach can be used to maximize the auxiliary function Q.
3) One can use the filtration variables, the α_t's and β_t's, to count the expected number of transitions and use the concept of counting event occurrences to obtain the update equations. We use the latter approach here, though the result is analogous if the other approaches are used. Based on the definitions, Σ_{t=1}^{T−1} γ_t(i) is the expected number of transitions from state i in y_T and Σ_{t=1}^{T−1} ξ_t(i,j) is the expected number of transitions from state i to state j in y_T. Then, the transition probabilities are estimated as

â°_ij = (expected number of transitions from state i to state j in y_T) / (expected number of transitions from state i in y_T)

      = Σ_{t=1}^{T−1} ξ_t(i,j) / Σ_{t=1}^{T−1} γ_t(i)   (4.49)

      = Σ_{t=1}^{T−1} α_t(i) a_ij b_j(y_{t+1}) β_{t+1}(j) / Σ_{t=1}^{T−1} α_t(i) β_t(i)   (4.50)

Similarly, the mean and variance of the observation process are estimated as

μ̂_i = Σ_{t=1}^{T−1} γ_t(i) y_t / Σ_{t=1}^{T−1} γ_t(i)   (4.51)

σ̂_i² = Σ_{t=1}^{T−1} γ_t(i) (y_t − μ̂_i)² / Σ_{t=1}^{T−1} γ_t(i)   (4.52)

Let μ_{i,s} and σ²_{i,s} be the mean and variance of the state duration for state i, respectively. μ_{i,s} is estimated as

μ̂_{i,s} = Σ_{t=1}^{T−1} P(s_{t+1} ≠ i, s_t = i | y_T, θ) d̂_t(i) / Σ_{t=1}^{T−1} P(s_{t+1} ≠ i, s_t = i | y_T, θ)   (4.53)

We have P(s_{t+1} ≠ i, s_t = i | y_T, θ) = γ_t(i) − ξ_t(i,i). Hence, μ̂_{i,s} in terms of the α's and β's is given by

μ̂_{i,s} = Σ_{t=1}^{T−1} α_t(i) (β_t(i) − a_ii b_i(y_{t+1}) β_{t+1}(i)) d̂_t(i) / Σ_{t=1}^{T−1} α_t(i) (β_t(i) − a_ii b_i(y_{t+1}) β_{t+1}(i))   (4.54)

Similarly, the variance of the state duration distribution is given by

σ̂²_{i,s} = Σ_{t=1}^{T−1} α_t(i) (β_t(i) − a_ii b_i(y_{t+1}) β_{t+1}(i)) (d̂_t(i) − μ̂_{i,s})² / Σ_{t=1}^{T−1} α_t(i) (β_t(i) − a_ii b_i(y_{t+1}) β_{t+1}(i))   (4.55)

E Step:
1. Forward Filtering: Compute α_t(i) and d̂_t(i) for 1 ≤ i ≤ N and 1 ≤ t ≤ T − 1 using equations 4.41 and 4.43.
2. Backward Filtering: Compute β_t(i) for 1 ≤ i ≤ N and 1 ≤ t ≤ T using equation 4.44.
3. Find the State Probabilities: Compute ξ_t(i,j) and γ_t(i) using equations 4.48 and 4.46.

M Step:
1. Update the model parameters, θ: Use equations 4.49, 4.51, 4.52, 4.54 and 4.55 to update the model parameters.

Figure 4.5: 'E' and 'M' steps of our algorithm for off-line identification of HSMMs.
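The observation-layer updates 4.51 and 4.52 are plain γ-weighted averages, and can be sketched in isolation. The posteriors and observations below are made-up toy values, not outputs of the E step.

```python
def weighted_obs_estimates(gammas, ys):
    """M-step updates 4.51-4.52: gamma-weighted mean and variance of the
    observations for one state; gammas[t] plays the role of gamma_t(i)."""
    w = sum(gammas)
    mu = sum(g * y for g, y in zip(gammas, ys)) / w
    var = sum(g * (y - mu) ** 2 for g, y in zip(gammas, ys)) / w
    return mu, var

# Toy posteriors: the state 'owns' only the first two observations.
gammas = [1.0, 1.0, 0.0, 0.0]
ys = [1.0, 3.0, 10.0, 12.0]
mu, var = weighted_obs_estimates(gammas, ys)
```

With hard 0/1 posteriors the updates collapse to the sample mean and variance of the observations assigned to the state, which is the intuition behind reading them as "counting event occurrences."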
The parameters of the state duration distributions, ν_i and η_i, are given in terms of μ̂_{i,s} and σ̂²_{i,s} as

ν̂_i = μ̂²_{i,s} / σ̂²_{i,s},   η̂_i = μ̂_{i,s} / σ̂²_{i,s}   (4.56)

Figure 4.5 summarizes our algorithm. The algorithm stops when θ converges to a constant vector. The forward-backward algorithm has a computational complexity of O(N²T) per pass and requires a memory of 3NT, because all the forward-backward variables and the estimates of the state duration variables need to be stored.

4.6.1 Implementation Issues

Choice of Initial Estimates. Since the EM algorithm finds a local maximum of the likelihood function, it is important to start the algorithm with suitable initial values. Though there is no straightforward method for selecting proper initial values, there are a few techniques in the literature which can assist in selecting them. One of these methods uses segmentation of the observations into states, and averaging of the observations within the segmented states. Segmentation can be performed manually, using maximum likelihood segmentation or the 'segmental k-means' method. For more information on these techniques, please refer to [86].

Scaling. From equation 4.43 it becomes clear that as t increases, the α_t's become small very quickly, and can soon fall out of the numerical range of any computer. The same argument applies to the β_t's as t decreases. To avoid this, we suggest using a scaling scheme similar to the scheme used in [72] and [86] for the HMM case, where the α_t's are scaled to sum up to 1 for all t. More precisely, let α_t denote the unscaled variable, α̂_t(i) the scaled variable and α̃_t the local version of the α's before scaling. Let c_t be the scaling factor at time t, where c_t = 1 / Σ_{i=1}^{N} α̃_t(i). Both the α's and β's are scaled using c_t; that is, α̂_t(i) = c_t α̃_t(i) and β̂_t(i) = c_t β_t(i).
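The effect of the scaling factors can be shown on a plain discrete-output HMM forward pass; this is a generic numerical illustration, not the thesis's HSMM code, and the two-state model and 5000-sample observation sequence are made up. Without scaling, the forward mass shrinks by roughly a constant factor per step and underflows; with c_t = 1/Σ_i α_t(i) the vector stays normalized and the log-likelihood is still recovered, as −Σ_t log c_t.

```python
import math

def scaled_forward(y, A, B, pi):
    """Scaled forward recursion: at each t the alpha vector is renormalized
    to sum to 1, and the log-likelihood accumulates -log(c_t)."""
    N = len(pi)
    alpha = [pi[i] * B[i][y[0]] for i in range(N)]
    loglik = 0.0
    for t in range(len(y)):
        if t > 0:
            alpha = [sum(alpha[i] * A[i][j] for i in range(N)) * B[j][y[t]]
                     for j in range(N)]
        total = sum(alpha)            # total = 1 / c_t
        alpha = [a / total for a in alpha]
        loglik += math.log(total)
    return alpha, loglik

# Two hidden states, two output symbols (all numbers illustrative).
A = [[0.95, 0.05], [0.10, 0.90]]
B = [[0.9, 0.1], [0.2, 0.8]]
y = [0, 1] * 2500                     # 5000 observations
alpha, loglik = scaled_forward(y, A, B, pi=[0.5, 0.5])
```

The unscaled product of 5000 per-step factors of roughly 0.5 would be about 10^{−1500}, far below double-precision range, while the scaled pass returns a finite log-likelihood directly.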
It can be easily shown that

  \hat{\alpha}_t(i) = C_t \bar{\alpha}_t(i)    (4.57)
  \hat{\beta}_{t+1}(i) = D_{t+1} \bar{\beta}_{t+1}(i)

where C_t = \prod_{s=1}^{t} c_s and D_{t+1} = \prod_{s=t+1}^{T} c_s. Using these equations, it can be shown that the scaling factors cancel out in all of the final update equations, except equations 4.54 and 4.55. In order to cancel the scaling effect on equations 4.54 and 4.55, these equations should be rewritten as

  \hat{\mu}_{d,i} = \sum_{t=1}^{T-1} \hat{\alpha}_t(i) d_t(i) (\hat{\beta}_t(i) c_t - a_{ii} b_i(y_{t+1}) \hat{\beta}_{t+1}(i)) / \sum_{t=1}^{T-1} \hat{\alpha}_t(i) (\hat{\beta}_t(i) c_t - a_{ii} b_i(y_{t+1}) \hat{\beta}_{t+1}(i))    (4.58)

  \hat{\sigma}^2_{d,i} = \sum_{t=1}^{T-1} \hat{\alpha}_t(i) (d_t(i) - \hat{\mu}_{d,i})^2 (\hat{\beta}_t(i) c_t - a_{ii} b_i(y_{t+1}) \hat{\beta}_{t+1}(i)) / \sum_{t=1}^{T-1} \hat{\alpha}_t(i) (\hat{\beta}_t(i) c_t - a_{ii} b_i(y_{t+1}) \hat{\beta}_{t+1}(i))    (4.59)

This assures us that the scaling has no effect on the parameter estimates.

4.7 Online Identification of HSMMs

In this section, we present a new algorithm for online identification of HSMMs. Our algorithm is based on the signal generation model presented in Section 4.5. Our approach is to set up the problem of online identification of HSMMs such that the general recursive prediction error (RPE) method can be applied. First, we present a general background on the RPE technique in Section 4.7.1. Then, we present how we adapt this technique to online identification of HSMMs in Section 4.7.2.

4.7.1 General RPE Technique

RPE is a class of general numerical parameter estimation methods, where a norm V(\theta) is defined that measures the prediction error, and the estimate of the model parameters is updated according to

  \theta_{t+1} = \theta_t + \lambda_t f(\theta_t)    (4.60)

where f(\theta_t) is a search direction based on information about V(\theta) acquired in previous iterations, and \lambda_t is a positive step size. An important class of these methods is the Newton-type algorithms, where the correction in 4.60 is chosen in the Newton direction

  f(\theta_t) = -[V''(\theta_t)]^{-1} V'(\theta_t)    (4.61)

There are approaches that use values of the function V as well as of its gradients.
The most important subclass of these approaches is the quasi-Newton methods, which form an estimate of V'' and then use equation 4.61. Now consider a weighted quadratic prediction error criterion

  V_t(\theta) = \gamma_t \sum_{k=1}^{t} \beta(t,k) e^2(k,\theta)    (4.62)

where e(k,\theta) = y(k) - \hat{y}(k) is the prediction error, and \beta(t,k) is a weighting sequence with the following properties

  \beta(t,k) = \lambda(t) \beta(t-1,k),  0 ≤ k ≤ t-1    (4.63)
  \beta(t,t) = 1    (4.64)

\gamma_t is a normalizing factor. Then it is shown that the general search equation is given by

  \theta_t = \theta_{t-1} - \mu_t [R_t]^{-1} V_t'    (4.65)

where V_t' is the gradient of V_t, and R_t is a matrix that modifies the search direction. It is shown that this updating formula can be written as

  \theta_t = \theta_{t-1} + \gamma_t [R_t]^{-1} \psi(t, \theta_{t-1}) e(t, \theta_{t-1})    (4.66)

where \psi is the gradient of \hat{y} with respect to \theta. The simplest choice for R_t is R_t = I, which results in the gradient or steepest-descent method. This method becomes fairly inefficient in the region close to the minimum of the norm function. Choosing R_t = V''(\theta_t) makes equation 4.66 a Newton method, which typically performs much better close to the minimum. In this case, V'' is given by

  V_t''(\theta) = \gamma_t \sum_{k=1}^{t} \beta(t,k) \psi(k,\theta)\psi'(k,\theta) - \gamma_t \sum_{k=1}^{t} \beta(t,k) e(k,\theta) \, d\psi(k,\theta)/d\theta    (4.67)

It may, however, be quite costly to compute all the terms of d\psi/d\theta. Then an approximation of V'' is used, where the second sum in equation 4.67 is ignored:

  V_t''(\theta) ≈ \gamma_t \sum_{k=1}^{t} \beta(t,k) \psi(k,\theta)\psi'(k,\theta)    (4.68)

which uses only the gradients of \hat{y}. If we choose R_t = V_t'' and use the approximation in equation 4.68, then R_t is given as

  R_t = \gamma_t \sum_{k=1}^{t} \beta(t,k) \psi(k) \psi'(k)    (4.69)

It is also shown that R_t can be constructed recursively as

  R_t = R_{t-1} + \gamma_t [\psi(t)\psi'(t) - R_{t-1}]    (4.70)

The RPE algorithm is summarized by the recursive scheme presented in Figure 4.6.

  1. e(t) = y(t) - \hat{y}(t)
  2. \theta_t = \theta_{t-1} + \gamma_t [R_t]^{-1} \psi(t) e(t)
  3. R_t = R_{t-1} + \gamma_t [\psi(t)\psi'(t) - R_{t-1}]

Figure 4.6: Parameter update equations performed in each step of a general RPE algorithm.

4.7.2 Online Identification of HSMMs Using the RPE Method

Let \theta_t denote our estimate of the model parameters at time t. We define the 'objective function' \ell_t(\theta_t) = log P(y_1, y_2, ..., y_t | \theta_t) as the log-likelihood of the observations up to time t given \theta_t. \ell_t(\theta_t) can be rewritten as

  \ell_t(\theta_t) = \sum_{\tau=1}^{t} log P(y_\tau | y_1, ..., y_{\tau-1}; \theta_t) = \sum_{\tau=1}^{t} u_\tau(\theta_t)    (4.71)

where u_\tau(\theta_t) = log P(y_\tau | y_1, y_2, ..., y_{\tau-1}, \theta_t) is the log-likelihood increment. Our approach here is to update \theta_t in a direction that maximizes the objective function \ell_t(\theta_t). We use the recursive prediction error (RPE) method, where the parameters are updated in the Newton-Raphson direction (see [84] for an extensive discussion of this method). This selection greatly increases the speed of the algorithm. Starting with an initial guess for \theta at t = 1, \theta_t is updated at each time instance t using

  \theta_{t+1} = \theta_t + \lambda_t R_{t+1}^{-1} \psi_{t+1}    (4.72)

where R_t = \partial^2 \ell_t(\theta)/\partial\theta^2 is an estimate of the 'Hessian matrix', and \psi_t = \partial u_t(\theta)/\partial\theta determines the search direction and is the gradient of u_t with respect to \theta_t. \lambda_t is a step size. R_t and \psi_t are called 'sensitivity parameters' and are estimated recursively in our algorithm. Figure 4.7 illustrates the basic block diagram of our scheme. The details of how each block performs its function are described in sections 4.7.4 to 4.7.7. Below, we summarize the basic function of each block.

Figure 4.7: Recursive Prediction Error scheme for estimating HSMM parameters.

1. Estimation of hidden layer variables: Given the observation y_t and the model parameters estimate \theta_t, this block computes the forward variable \alpha_t and the state duration estimate d_t. These variables are constructed iteratively from \alpha_{t-1} and d_{t-1}.

2. Updating the gradient vector \psi_{t+1}: Using the obtained \alpha_t and d_t, this block computes the gradient vector \psi_{t+1}. For this computation,
the derivatives \partial\alpha_{t-1}/\partial\theta and \partial d_t/\partial\theta must be estimated as well. Hence, this block constructs \partial\alpha_{t-1}/\partial\theta and \partial d_t/\partial\theta recursively and uses these parameters to update \psi_{t+1}.

3. Updating the parameters estimate \theta: This block updates the model parameters estimate, \theta_t, using equation 4.72.

4. Updating R_t: This block recursively updates the estimate of the Hessian matrix, R_t, from R_{t-1} and \psi_t.

We present the details of each part of our algorithm in the following subsections. For simplicity, we use \theta instead of \theta_t in our notation.

4.7.3 Model Parameterizations

Similar to the off-line case, our signal model has N^2 + 3N parameters. However, as shown in [71], it is more convenient to parameterize the model as

  \theta = (\mu, \sigma^2, c_{12}, c_{13}, ..., c_{N-1,N}, \eta, \nu)'    (4.73)

where the c_{ij} are simply defined as c_{ij} = (a°_{ij})^{1/2}. This parameterization simplifies the development of the update equations. As in the off-line case, \mu and \sigma^2 are the vectors of the mean and variance of the observation process, and \nu and \eta are the vectors of the parameters of the gamma probability density functions of the state durations.

4.7.4 Estimation of the Hidden Layer Variables

As in the off-line case, we define the forward variables as \alpha_t(i) = P(y_1, ..., y_t, s_t = i | \theta). Let \alpha_t = [\alpha_t(1) \alpha_t(2) ... \alpha_t(N)]'. Then the forward filtering recursion equation is given by

  \alpha_{t+1} = B A'_{d_t} \alpha_t    (4.74)

where B is a diagonal matrix with b_{ii} = b_i(y_{t+1}) (see equation 4.36), and A'_{d_t} is the transpose of the state transition matrix (equation 4.35).
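A single step of the recursion in equation 4.74 is simple to express; the sketch below uses illustrative values and a fixed transition matrix in place of the duration-dependent A_{d_t}:

```python
import numpy as np

def forward_step(alpha, A_d, b_vals):
    # alpha_{t+1} = B A'_{d_t} alpha_t (eq. 4.74): propagate through the
    # transposed transition matrix, then reweight by the emission densities.
    return np.diag(b_vals) @ A_d.T @ alpha
```

In practice A_d would be rebuilt at every step from the current duration estimates, and the result rescaled as in equation 4.118.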
The conditional estimate of the state at time t is given by

  \gamma_t = E(s_t | y_1, y_2, ..., y_t, \theta) = \alpha_t (1, \alpha_t)^{-1}    (4.75)

Given the observations up to time t, the next state and next observation of the signal are estimated as

  E(s_{t+1} | y_1, y_2, ..., y_t, \theta) = A'_{d_t} \gamma_t    (4.76)

  \hat{y}_{t+1} = (\mu, E(s_{t+1} | y_1, y_2, ..., y_t, \theta)) = (\mu, A'_{d_t} \alpha_t (1, \alpha_t)^{-1})    (4.77)

The estimate of the state duration variable is updated similarly to the off-line case (equation 4.41), as

  d_{t+1} = E(s_t | y_1, y_2, ..., y_t, \theta) \odot d_t + 1 = \alpha_t (1, \alpha_t)^{-1} \odot d_t + 1    (4.78)

The log-likelihood increment u_{t+1} (equation 4.71) is given by

  u_{t+1} = log P(y_{t+1} | y_1, y_2, ..., y_t; \theta)    (4.79)
        = log \sum_{i=1}^{N} P(y_{t+1} | s_{t+1} = e_i, y_1, ..., y_t; \theta) P(s_{t+1} = e_i | y_1, ..., y_t; \theta)    (4.80)
        = log (1, B A'_{d_t} \gamma_t)    (4.81)

4.7.5 Gradient Vector Update Equations

We denote the derivative operator with respect to a variable x by D_x, that is, D_x(.) = \partial(.)/\partial x. Thus, the gradient vector \psi_{t+1} is written as

  \psi_{t+1} = (D_{\mu_i} u_{t+1}, D_{\sigma^2_i} u_{t+1}, D_{c_{ij}} u_{t+1}, D_{\eta_i} u_{t+1}, D_{\nu_i} u_{t+1})' |_{\theta = \theta_t}    (4.82)

We write the elements of \psi_{t+1} in terms of the derivatives of the filtration parameters (i.e., \alpha_t, \gamma_t and d_t) with respect to the model parameters. Our estimates of the derivatives of \alpha_t and d_t are constructed recursively from their estimates at t - 1. For example, the first element of \psi_{t+1}, D_{\mu_i} u_{t+1}, is written in terms of D_{\mu_i}\alpha_t and D_{\mu_i}d_t; D_{\mu_i}\alpha_{t+1} and D_{\mu_i}d_{t+1} are in turn constructed recursively from D_{\mu_i}\alpha_t and D_{\mu_i}d_t. In deriving the update equations, it is assumed that the probabilities of non-recurrent transitions (i.e., the c_{ij}'s) and the parameters of the state duration probability density functions (i.e., \nu and \eta) do not depend on each other. Thus, to calculate \psi_{t+1}, we use the following equations for the derivatives of u_{t+1} with respect to \mu_i, \sigma^2_i, c_{ij}, \eta_i and \nu_i.
1. Derivatives with respect to \mu_i:

  D_{\mu_i} u_{t+1} = (1, B A'_{d_t} \gamma_t)^{-1} ((1, D_{\mu_i}(B) A'_{d_t} \gamma_t) + (1, B D_{\mu_i}(A'_{d_t}) \gamma_t) + (1, B A'_{d_t} D_{\mu_i}(\gamma_t)))    (4.83)

  D_{\mu_i} \gamma_t = D_{\mu_i}(\alpha_t)(1, \alpha_t)^{-1} - \alpha_t (1, D_{\mu_i}(\alpha_t))(1, \alpha_t)^{-2}    (4.84)

  D_{\mu_i} \alpha_t = D_{\mu_i}(B) A'_{d_{t-1}} \alpha_{t-1} + B D_{\mu_i}(A'_{d_{t-1}}) \alpha_{t-1} + B A'_{d_{t-1}} D_{\mu_i}(\alpha_{t-1})    (4.85)

  D_{\mu_i} B = ((y_{t+1} - \mu_i)/\sigma^2_i) B diag(e_i)    (4.86)

  D_{\mu_i} A_{d_t} = D_{\mu_i}(P)(I - A°)    (4.87)

where e_i is the i-th unit vector and P is the diagonal matrix of recurrent transition probabilities, with p_j(d) = (1 - \Phi(d; \eta_j, \nu_j)) / (1 - \Phi(d - 1; \eta_j, \nu_j)). Since P depends on \mu_i only through d_t,

  D_{\mu_i}(P) = diag( [\phi(d_t(j) - 1)(1 - \Phi(d_t(j))) - \phi(d_t(j))(1 - \Phi(d_t(j) - 1))] (1 - \Phi(d_t(j) - 1))^{-2} D_{\mu_i} d_t(j) )    (4.88)

  D_{\mu_i} d_t = d_{t-1} \odot D_{\mu_i}(\gamma_{t-1}) + D_{\mu_i}(d_{t-1}) \odot \gamma_{t-1}    (4.89)

2. Derivatives with respect to \sigma^2_i: equations 4.90-4.95 are identical in form to equations 4.83-4.89, with D_{\mu_i} replaced by D_{\sigma^2_i} throughout; the derivative of the observation matrix becomes

  D_{\sigma^2_i} B = ((y_{t+1} - \mu_i)^2/(2\sigma^4_i) - 1/(2\sigma^2_i)) B diag(e_i)    (4.93)

3. Derivatives with respect to c_{mn}: equations 4.96-4.99 follow the same chain with D_{c_{mn}}(B) = 0, and

  D_{c_{mn}} A_{d_t} = (I - P) D_{c_{mn}}(A°)    (4.100)

where D_{c_{mn}}(A°) is non-zero only in row m; differentiating a°_{mj} = c^2_{mj} / \sum_k c^2_{mk} gives

  D_{c_{mn}} a°_{ij} = 0, if i ≠ m;  2c_{mn}(1 - a°_{mn}), if i = m, j = n;  -2c_{mn} a°_{mj}, if i = m, j ≠ n    (4.101)

4. Derivatives with respect to \eta_i: equations 4.102-4.105 again follow the same chain, with D_{\eta_i}(B) = 0 and D_{\eta_i} A_{d_t} = D_{\eta_i}(P)(I - A°). D_{\eta_i}(P) is a matrix with all zero elements, except the element in row i and column i, which is given by

  [D_{\eta_i}(P)]_{ii} = [D_{\eta_i}\Phi(d_t(i) - 1; \eta_i, \nu_i)(1 - \Phi(d_t(i); \eta_i, \nu_i)) - D_{\eta_i}\Phi(d_t(i); \eta_i, \nu_i)(1 - \Phi(d_t(i) - 1; \eta_i, \nu_i))] (1 - \Phi(d_t(i) - 1; \eta_i, \nu_i))^{-2}    (4.106)

D_{\eta_i}\Phi(d; \eta_i, \nu_i) is obtained by differentiating \phi(d; \eta_i, \nu_i) as defined in equation 4.27.    (4.107)

5. Derivatives with respect to \nu_i: equations 4.108-4.110 for D_{\nu_i} u_{t+1}, D_{\nu_i} \gamma_t and D_{\nu_i} \alpha_t follow the same chain once more, with D_{\nu_i}(B) = 0.
The derivative of the transition matrix is

  D_{\nu_i} A_{d_t} = D_{\nu_i}(P)(I - A°)    (4.111)

D_{\nu_i}(P) is a matrix with all zero elements, except the element in row i and column i, which is given by

  [D_{\nu_i}(P)]_{ii} = [D_{\nu_i}\Phi(d_t(i) - 1; \eta_i, \nu_i)(1 - \Phi(d_t(i); \eta_i, \nu_i)) - D_{\nu_i}\Phi(d_t(i); \eta_i, \nu_i)(1 - \Phi(d_t(i) - 1; \eta_i, \nu_i))] (1 - \Phi(d_t(i) - 1; \eta_i, \nu_i))^{-2}    (4.112)

Unfortunately, differentiating \Phi(d; \eta, \nu) as defined in equation 4.26 with respect to \nu does not result in a simple form as in equation 4.107. However, we can easily find the numerical value of D_{\nu_i}\Phi(d; \eta_i, \nu_i). We have

  D_\nu(\Phi(d; \eta, \nu)) = (log(\eta) - \Psi(\nu)) \Phi(d; \eta, \nu) + \int_0^d log(x) \phi(x; \eta, \nu) dx    (4.113)

where \Psi is the 'digamma' function [87, 88]. The digamma function is a known special function defined as

  \Psi(x) = d log \Gamma(x) / dx    (4.114)

where \Gamma(x) is the gamma function. The numerical value of the digamma function at any point is easily obtained from

  \Psi(x + n) = -\gamma + \sum_{k=1}^{\infty} (-1)^{k+1} \zeta(k+1) x^k,  for n = 1 and -1 < x < 1
  \Psi(x + n) = \Psi(x + 1) + \sum_{k=1}^{n-1} 1/(x + k),  for n > 1 and -1 < x < 1    (4.115)

where \gamma is the Euler constant and \zeta is the Riemann zeta function. It is shown that only twenty terms of equation 4.115 suffice to compute \Psi(x) to full machine precision in 32-bit floating-point format [77, 87]. The term \int_0^d log(x)\phi(x; \eta, \nu) dx is easily computed using numerical definite-integral methods. Therefore, we can compute the numerical values of D_{\nu_i}\Phi(d_t(i) - 1) and D_{\nu_i}\Phi(d_t(i)), and substitute them into equation 4.112 to obtain D_{\nu_i}(P).

4.7.6 Parameter Update Equations

After finding \psi_{t+1}, the parameter vector \theta is updated using

  \theta_{t+1} = \theta_t + \lambda_{t+1} R_{t+1}^{-1} \psi_{t+1}    (4.116)

4.7.7 Updating the Hessian Matrix Estimate, R_t

R_t is an estimate of the second derivative of the criterion function. R_t is updated recursively (see [84, 89]) as

  R_t = R_{t-1} + \lambda_t [\psi(t)\psi'(t) - R_{t-1}]    (4.117)

4.7.8 Implementation Issues

Our online identification algorithm finds a local maximum-likelihood estimate of the model parameters. Hence, similar to the off-line case, it is important to start the algorithm with proper initial values.
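The series of equation 4.115 is straightforward to implement; the sketch below (helper names are our own) combines it with the digamma recurrence \Psi(x+1) = \Psi(x) + 1/x and a truncated zeta sum with a tail correction:

```python
import numpy as np

def zeta(s, n_terms=10000):
    # Truncated sum with an Euler-Maclaurin tail correction; fine for s >= 2.
    n = np.arange(1, n_terms + 1, dtype=float)
    tail = n_terms ** (1.0 - s) / (s - 1.0) - 0.5 * n_terms ** (-s)
    return float((n ** -s).sum() + tail)

def digamma(x, n_terms=60):
    euler_gamma = 0.57721566490153286
    shift = 0.0
    while x > 1.5:                  # recurrence: psi(x) = psi(x - 1) + 1/(x - 1)
        x -= 1.0
        shift += 1.0 / x
    u = x - 1.0                     # now |u| <= 0.5, so the series converges fast
    series = sum((-1) ** (k + 1) * zeta(k + 1) * u ** k
                 for k in range(1, n_terms + 1))
    return -euler_gamma + series + shift
```

With the argument reduced to the interval (0.5, 1.5], the series terms decay at least as fast as 2^{-k}, consistent with the remark that a small number of terms suffices.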
Furthermore, the following issues should be considered when implementing our algorithm.

Scaling

Similar to the off-line case, the forward variable \alpha_t converges rapidly to zero as t increases. This can be avoided by using the same scaling technique as in the off-line case, where \alpha_t is scaled as

  \hat{\alpha}_t = \alpha_t (1, \alpha_t)^{-1}    (4.118)

It can be shown that this scaling does not affect the update and recursion equations.

Choice of the step size \lambda_t

\lambda_t in equation 4.116 is a step size. In theory, \lambda_t can be any sequence satisfying

  \lambda_t > 0,  \sum_{t=1}^{\infty} \lambda_t = \infty,  \sum_{t=1}^{\infty} \lambda_t^2 < \infty    (4.119)

A common choice is \lambda_t = 1/t, where \lambda_t has a weighting effect on the norm function such that recent observations have a stronger effect on the update than older observations. There is a trade-off between the 'tracking ability' and 'noise sensitivity' of the algorithm in selecting \lambda_t. Choosing a larger step size (e.g., \lambda_t = 1/\sqrt{t}) results in faster convergence of the estimate to the real parameters and allows the algorithm to track changes in the actual parameter values faster. However, larger step sizes make the estimate more sensitive to noise. The possibility of using different and more sophisticated choices of step size is discussed in [89, 90] and [91].

Avoiding matrix inversion in the update equations

The update equation 4.116 involves the inversion of R_t. This matrix inversion can be avoided by using the matrix inversion lemma [89]. If A, B, C and D are matrices, then the matrix inversion lemma states

  [A + BCD]^{-1} = A^{-1} - A^{-1} B [D A^{-1} B + C^{-1}]^{-1} D A^{-1}    (4.120)

By taking A = (1 - \lambda_t) R_{t-1}, B = D' = \psi_t and C = \lambda_t, the inverse needed in equation 4.116 can be written as

  R_t^{-1} = A^{-1} - A^{-1} \psi_t [\psi_t' A^{-1} \psi_t + \lambda_t^{-1}]^{-1} \psi_t' A^{-1}    (4.121)

Projection of parameters to the constrained domain

Since the parameters of an HSMM are constrained, the estimated parameters should be projected into the constrained space.
More precisely, the parameters c_{ij} = (a°_{ij})^{1/2} are constrained to \sum_{j=1}^{N} c^2_{ij} = 1. Thus, after updating the parameters using equation 4.116, the c_{ij}'s are re-normalized as

  \tilde{c}_{ij} = c_{ij} / (\sum_{k=1}^{N} c^2_{ik})^{1/2}    (4.122)

4.8 Effective Bandwidth of a Hidden Semi-Markov Model

In this section, we show how the effective bandwidth of a signal is obtained from its HSMM parameters. Our approach is to reformulate the semi-Markov chain of an HSMM as a Markov chain with a larger number of states. First, we show how the effective bandwidth of an HMM is obtained. Next, we show how a semi-Markov chain is reformulated as a Markov chain with a larger number of states. Finally, we show how the effective bandwidth of an HSMM is obtained using this approach.

4.8.1 Effective Bandwidth of Hidden Markov Models

In this section, we show how the effective bandwidth of an HMM is obtained. Let y_t be the observation process of an HMM with the hidden Markov state process x_t, and with the transition probabilities matrix A. Let Y(t) be the cumulative sum of the observation process y_t in [0, t], that is, Y(t) = \sum_{\tau=0}^{t} y_\tau. Also let \varphi_i(\theta) be the moment generating function of the observation layer for state i, that is, \varphi_i(\theta) = E(e^{\theta y_t} | x_t = i). We define \psi_i(\theta, t) := E(e^{\theta Y(t)} | x_1 = i). We have

  \psi_i(\theta, t) = E(e^{\theta Y(t)} | x_1 = i)
              = E(e^{\theta y_1} | x_1 = i) E(e^{\theta (Y(t) - y_1)} | x_1 = i)
              = \varphi_i(\theta) \sum_{j=1}^{M} E(e^{\theta Y(t-1)} | x_2 = j) P(x_2 = j | x_1 = i)
              = \varphi_i(\theta) \sum_{j=1}^{M} \psi_j(\theta, t - 1) a_{ij}    (4.123)

If we consider the vector \Psi(\theta, t) = [\psi_i(\theta, t)] and the diagonal matrix \Phi(\theta) = diag[\varphi_i(\theta)], then the above equation can be written in matrix form as

  \Psi(\theta, t) = \Phi(\theta) A \Psi(\theta, t - 1)    (4.124)

with the initial condition \Psi(\theta, 1) = \Phi(\theta) 1, where 1 is a column vector with all its elements equal to one.
Now let \pi_i be the probability that the initial state x_1 = i, and let \pi = [\pi_1, \pi_2, ..., \pi_M]'; thus

  E(e^{\theta Y(t)}) = \pi' \Psi(\theta, t) = \pi' (\Phi(\theta) A)^{t-1} \Phi(\theta) 1    (4.125)

Since the transition matrix A is irreducible, A is primitive. Since \Phi(\theta) is diagonal, the matrix \Phi(\theta) A is also primitive. Hence, we can use the Perron-Frobenius theorem (see Theorem 8.5.1 in [92]) from matrix theory to show that

  lim_{n \to \infty} [\Phi(\theta) A / sp(\Phi(\theta) A)]^n = L(\theta)    (4.126)

where L(\theta) is a constant matrix and sp(\Phi(\theta) A) is the spectral radius of the matrix \Phi(\theta) A. The spectral radius of a matrix is simply the largest absolute eigenvalue of that matrix. Therefore, equations 4.125 and 4.126 result in the following expression

  lim_{t \to \infty} (1/\theta t) log E(e^{\theta Y(t)}) = (1/\theta) log(sp(\Phi(\theta) A))    (4.127)

Hence, we have

  a(\theta) = (1/\theta) log(sp(\Phi(\theta) A))    (4.128)

where \Phi(\theta) is a diagonal matrix.

4.8.2 Reformulating a Semi-Markov Chain as a Markov Chain

Let s_t be a semi-Markov chain with N distinct states, taking its values from the space {1, 2, ..., N}. We assume that the time that the signal stays in each state is limited, say by D time units. This means that the state process s_t will not stay in any state for more than D time units. Our motivation here is to reformulate the N-state semi-Markov chain s_t as a Markov chain with N × D states. Our approach is similar to that presented in [78]. Assume that the state process is in state i at time t, i.e., s_t = i. Then define d_t as the time that s_t has spent in state i prior to time t. That is

  d_t = max{ d | s_t = i, s_{t-1} = i, ..., s_{t-d+1} = i, s_{t-d} \neq i }    (4.129)

Notice that

  d_{t+1} = d_t + 1, if s_{t+1} = s_t;  d_{t+1} = 1, otherwise    (4.130)

Therefore, d_{t+1} depends only on d_t, s_t and s_{t+1}. Note that by definition, the variable d_t is restricted to d_t ≤ D: the process may remain in its state (d_{t+1} = d_t + 1) only while d_t < D, and when d_t = D it must leave, i.e., s_{t+1} \neq s_t.    (4.131)

Now consider the vector stochastic process x_t defined as x_t = (s_t, d_t).
x_t takes its values from the space {(i, d) | 1 ≤ i ≤ N, 1 ≤ d ≤ D} and clearly is a finite-state process with N × D states. Note that

  P(x_{t+1} | x_t) = a_{ii}(d_t), if x_t = (i, d_t) and x_{t+1} = (i, d_t + 1);
              = a_{ij}(d_t), if x_t = (i, d_t) and x_{t+1} = (j, 1), j \neq i;
              = 0, otherwise    (4.132)

where a_{ii}(d_t) is the probability that s_t stays in state i given that it has spent d_t time units in state i. Similarly, a_{ij}(d_t) is the probability that s_t transits from state i to state j given that it has spent d_t time units in state i. Note that P(x_{t+1} | x_t) is independent of t, and hence x_t is a Markov chain. For simplicity of notation, we will denote the state space of x_t as {1, 2, ..., M} from now on, where M = N × D. If x_t = m, then we have

  s_t = \lfloor (m - 1)/D \rfloor + 1    (4.133)
  d_t = m - (s_t - 1) D    (4.134)

where \lfloor x \rfloor is the largest integer less than or equal to x. We also have m = (s_t - 1) D + d_t. Let d = m - \lfloor (m-1)/D \rfloor D, i = \lfloor (m-1)/D \rfloor + 1 and j = \lfloor (n-1)/D \rfloor + 1; then the state transition probabilities matrix for the Markov process x_t is given by

  [A_x]_{mn} = a_{ii}(d), if n = m + 1 and \lfloor (n-1)/D \rfloor = \lfloor (m-1)/D \rfloor;
         = a_{ij}(d), if n - \lfloor (n-1)/D \rfloor D = 1 and \lfloor (n-1)/D \rfloor \neq \lfloor (m-1)/D \rfloor;
         = 0, otherwise    (4.135)

Since x_t has M = N × D states, the state transition matrix for x_t has M^2 = (ND)^2 elements. However, since the variable d_t is restricted according to equation 4.131, it can be easily shown that (M^2 - N^2 D) elements of the state transition matrix are zero and only N^2 D elements of the transition matrix are non-zero.

4.8.3 Effective Bandwidth of an HSMM by Reformulating as an HMM

Let y_t be an HSMM with state process s_t. Following the approach presented in Section 4.8.2, s_t can be reformulated as a Markov chain x_t with a larger number of states; hence, y_t can be modelled as an HMM with modulating (or hidden) process x_t. Then, the effective bandwidth of y_t is easily obtained using equation 4.128, as presented in Section 4.8.1.
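The whole pipeline of sections 4.8.1-4.8.2 can be sketched numerically. The example below expands a small semi-Markov chain into its N·D-state Markov form (equation 4.135), builds \Phi(\theta) by repeating each \varphi_i(\theta) D times (equation 4.139), and evaluates equation 4.128; the concrete numbers and helper names are illustrative only:

```python
import numpy as np

def hsmm_effective_bandwidth(theta, N, D, a_stay, jump, phi):
    """a_y(theta) = (1/theta) log sp(Phi(theta) A_x) for the expanded chain.

    a_stay(i, d): probability of remaining in state i after d time units.
    jump[i][j]: conditional next state given a transition out of i.
    phi[i]: per-state MGF value phi_i(theta); state (i, d) maps to row i*D + d - 1.
    """
    M = np.zeros((N * D, N * D))
    for i in range(N):
        for d in range(1, D + 1):
            row = i * D + (d - 1)
            p = a_stay(i, d) if d < D else 0.0     # must leave after D units
            if d < D:
                M[row, i * D + d] = p              # (i, d) -> (i, d + 1)
            for j in range(N):
                if j != i:
                    M[row, j * D] = (1.0 - p) * jump[i][j]   # (i, d) -> (j, 1)
    Phi = np.diag(np.repeat(phi, D))               # each phi_i repeated D times
    spectral_radius = max(abs(np.linalg.eigvals(Phi @ M)))
    return np.log(spectral_radius) / theta
```

For small theta the curve approaches the long-run mean rate, and for large theta it approaches the peak sustainable rate, matching the behaviour of the effective bandwidth curves described later in this chapter.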
Let b_i(y_t) = P(y_t | s_t = i) denote the densities of the observation process when y_t is considered as an HSMM. Also let \varphi_i(\theta) be the moment generating function of the observation process given that the state process is in state i, that is, \varphi_i(\theta) = E(e^{\theta y_t} | s_t = i). Now consider y_t as an HMM with state process x_t. Let b'_m(y_t) denote the densities of the observation process given that the state at time t is x_t = m. According to equation 4.133, we have

  b'_m(y_t) = P(y_t | x_t = m) = P(y_t | s_t = \lfloor (m-1)/D \rfloor + 1) = b_{\lfloor (m-1)/D \rfloor + 1}(y_t)    (4.136-4.138)

Based on this equation, one can easily construct the diagonal matrix \Phi(\theta) (see equation 4.124) of size ND × ND as

  \Phi(\theta) = diag{ \varphi_1(\theta), ..., \varphi_1(\theta), \varphi_2(\theta), ..., \varphi_2(\theta), ..., \varphi_N(\theta), ..., \varphi_N(\theta) }    (4.139)

where each \varphi_i(\theta) is repeated D times. By substituting the obtained \Phi(\theta) into equation 4.128, one can easily obtain the effective bandwidth of y_t, a_y(\theta).

4.9 Numerical Results

In this section, we present the numerical results of implementing the methods presented in this chapter. In Section 4.9.1, we present the results of applying the off-line and online identification methods presented in sections 4.6 and 4.7 to synthetic data generated by simulating an HSMM. Our objective is to experimentally verify that our identification methods actually find the true parameter values. As our online identification method results in the same estimate as the off-line method, we only apply the off-line identification algorithm to the empirical traffic samples of a few typical TV programs in Section 4.9.2. We also present numerical results of estimating the effective bandwidth curve for these sources.

4.9.1 Numerical Results for Synthetic Data

To verify that our identification methods give accurate estimates of an HSMM's parameters, we conducted two experiments, one for the off-line case and one for the online case.
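A synthetic HSMM trace of the kind used in these experiments can be generated directly from the signal model. The sketch below is our own simplified sampler (gamma sojourns rounded to integers, Gaussian emissions), not the thesis's simulator:

```python
import numpy as np

def sample_hsmm(T, A0, mu, sig2, shape, scale, rng=None):
    """Generate T observations: gamma-distributed sojourns (rounded, >= 1),
    Gaussian emissions per state, next state drawn from the non-recurrent
    transition matrix A0 (whose diagonal is zero)."""
    rng = rng or np.random.default_rng(0)
    y, s = [], 0
    while len(y) < T:
        d = max(1, int(round(rng.gamma(shape[s], scale[s]))))   # sojourn length
        y.extend(rng.normal(mu[s], np.sqrt(sig2[s]), d))
        s = rng.choice(len(mu), p=A0[s])                        # jump to a new state
    return np.array(y[:T])
```

Feeding such a trace to the off-line algorithm and checking that the estimates recover the generating parameters is exactly the verification strategy used below.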
In our first experiment, the parameters of two HSMM signals, each having N = 3 distinct states, were estimated using the off-line algorithm presented in Section 4.6. The number of samples for each model was T = 10000. The actual values of the parameters are given in Table 4.2. The first model can be considered to be in a low-noise condition (i.e., |\mu_i - \mu_j| \gg \sigma_i) while the second model is in a high-noise condition. The initial values for the model parameters are the same in both cases, and are also shown in Table 4.2.

Table 4.2: Actual and initial values of the parameters of the HSMM models used in simulating our off-line identification algorithm.

Figures 4.8 and 4.9 illustrate how some of the parameter estimates are updated in each iteration. We observe that the parameter estimates converge to the actual values of the parameters after a few iterations. The log-likelihood of the total observation Y_T given the parameter estimates, log(P(Y_T | \theta)), is also plotted in figures 4.8-d and 4.9-d. As shown, this log-likelihood increases in each iteration, demonstrating that the algorithm finds the maximum-likelihood estimate of the model parameters.

In the next experiment, we applied our method for online identification of HSMMs to two HSMM signals, with the parameters shown in Table 4.3. As shown, the actual parameters of the second model change at t = 5000. The results of estimating some of the parameters of these models are presented in figures 4.10 and 4.11, respectively.
We observe that the parameter estimates converge to the actual values of the parameters as t becomes large, and that our algorithm successfully tracks the changes in the model parameters.

Table 4.3: Actual and initial values of the parameters of the HSMM models used in simulating our online identification algorithm.

Figure 4.8: a-c) Parameter estimates for the first model versus the iteration number. The dotted lines show the actual values of the parameters. d) The log-likelihood function, log(P(y_1, y_2, ..., y_t | \theta)), versus the iteration number. As shown, log(P(y_1, y_2, ..., y_t | \theta)) increases in each iteration.

Figure 4.9: a-c) Parameter estimates for the second model versus the iteration number. The dotted lines show the actual values of the parameters. d) The log-likelihood function, log(P(y_1, y_2, ..., y_t | \theta)), versus the iteration number. As shown, log(P(y_1, y_2, ..., y_t | \theta)) increases in each iteration.

Figure 4.10: Online estimation of a 3-state HSMM: a) state transition probability a°_{21}; b) observation mean for state 1, \mu_1; c) state duration mean for state 1, \mu_{d,1}. The dotted lines show the actual values of the parameters. The parameter estimates converge to the actual values of the parameters.
Figure 4.11: Online estimation of a 3-state HSMM, where the actual parameter changes at t = 5000: a) state transition probability a°_{21}; b) observation mean for state 1, \mu_1. The dotted lines show the actual values of the parameters. The parameter estimates follow the temporal changes in the actual values of the parameters.

4.9.2 Numerical Results for Empirical TV Traffic Traces

In this experiment, we first fitted an HSMM model to the empirical traffic traces of a few typical TV programs using the off-line parameter identification method presented in Section 4.6. The empirical traffic traces are from the video sequences described in tables 2.4 and 2.5 in Section 2.5. Table 4.4 shows the numerical values of the estimated model parameters for three of the sources. Figures 4.12-4.14 show how the model parameters converge to the final estimates as more iterations of the off-line algorithm are performed. Then, we obtained the effective bandwidth curve a(\theta) from the estimated HSMM model parameters using the method presented in Section 4.8. Figure 4.15 shows the obtained effective bandwidth curves. As shown, a(\theta) is an increasing function of \theta, where a(0) is the average rate and a(\infty) is the maximum rate of the source. Then, we employed the obtained effective bandwidth curves in our admission control scheme, and plotted the maximum waiting time T_w versus rate R (see Figure 4.16) for an incidental stream that uses the stochastic service class, for a constant loss probability p.

In the next simulation, we examined the accuracy of our stochastic admission control mechanism. We considered a transmission system which multiplexes N main video streams, one incidental stream using the deterministic service class, and one incidental stream using the stochastic service class. We conducted two experiments using two different sets of simulation parameters, as shown in Table 4.5.
The first set of parameters is selected to simulate a cable transmission medium, while the second set simulates a terrestrial medium. (A 6 MHz channel in the cable medium is capable of delivering digital data at a 19.8 Mbps rate; this capacity is usually shared by 4 or 5 TV programs. In the terrestrial medium, a 6 MHz channel is capable of delivering data at a 39.8 Mbps rate, which is usually shared by 8 or 9 TV programs.) The incidental stream using the deterministic service class has the rate given in Table 4.5. Then, we used the stochastic admission control scheme to obtain the loss probability for an incidental stream with rate R and waiting time T_w that uses the stochastic service class. Finally, we simulated the buffering operations in the transmission system (the packet scheduling method employed in this simulation is described in detail in Chapter 5), and observed p, the loss probability of the incidental stream that uses the stochastic service class.

Figures 4.17-a and 4.17-b illustrate the loss probability p versus R obtained from simulation and from the admission control scheme for simulation parameter sets I and II, respectively. As shown, the actual loss probabilities obtained from simulation, shown by the plus signs, are very close to the loss probabilities estimated by the admission control scheme. These results verify that our stochastic admission control scheme can accurately estimate the system performance in terms of loss probabilities. Furthermore, it is noted that the admission control is more accurate for a larger T_w parameter. That is, the actual loss values are closer to the loss probabilities estimated by the admission control in Figure 4.17-a, where T_w = 30, than in Figure 4.17-b, where T_w = 5. This is due to the fact that our admission control scheme is designed based on the assumption that the buffer waiting time (or the buffer size) is very long.
Table 4.4: Estimated HSMM model parameters for three of the empirical TV traffic sources.

Figure 4.13: Online estimation of HSMM parameters for the Documentary (Best Places in Canada) sequence.

Table 4.5: Simulation parameters.

  Transmission rate: 19.8 Mbps (set I, cable medium); 39.8 Mbps (set II, terrestrial medium)
  Number of TV programs sharing the channel, N: 4 (set I); 8 (set II)
  Maximum bitrate assigned to each main video stream: 4.5 Mbps; 4.5 Mbps
  Transmission capacity reserved for video streams: 18 Mbps; 36 Mbps
  Transmission capacity reserved for audio streams and other ancillary data: 1.8 Mbps; 3.8 Mbps
  Rate of the incidental stream that uses the deterministic service class: 0.8 Mbps; 1.5 Mbps
  Main video stream sources: set I: 1. Mission Impossible, 2. News, 3. Talk Show, 4. Documentary; set II: the same four, plus 5. Court Show, 6. Muppets Show, 7. Soap Opera, 8. Cartoon

Figure 4.16: T_w versus R for an incidental stream that uses the stochastic service class: a) simulation parameter set I, p = .05; b) simulation parameter set II, p = .05; c) simulation parameter set I, p = .01; d) simulation parameter set II, p = .01.
Figure 4.17-c: Probability of loss p versus R for simulation parameter set II, T_w = 5 (solid line: loss computed by the admission control; '+': actual loss observed in experiment).

Figure 4.17: Probability of loss p versus rate R, where T_w is constant. The solid lines illustrate the loss probabilities obtained by the admission control scheme, and '+' signs illustrate the actual loss probabilities obtained from simulating the multiplexing system. As shown, the actual loss probabilities obtained from simulation are close to those obtained from the admission control.

4.10 Chapter Conclusion

In this chapter, we studied stochastic traffic models for modelling the traffic of full-screen video sources in digital TV. We showed that though HMMs can adequately capture most of the stochastic properties of video traffic sources, they cannot adequately model the state duration densities of full-screen video sequences. Therefore, we selected the more sophisticated HSMM models for modelling the video traffic sources in the proposed ITV application. Furthermore, we gathered evidence from empirical traffic samples of typical TV programs, which showed that the Gamma distribution is a good choice for modelling the state duration densities. Next, we presented a novel signal model for HSMMs. We showed that our signal model results in easier and more efficient parameter identification algorithms. Based on our new model, we presented a variant of the EM algorithm for off-line identification of HSMMs. Furthermore, we presented an online identification algorithm based on our new signal model and on the general recursive prediction error technique. Using these methods, one can efficiently estimate the parameters of an HSMM from traffic samples, in either the off-line or the online case. Next, we showed how the numerical value of the effective bandwidth function is obtained from the parameters of an HSMM.
Our approach is based on reformulating an HSMM as an HMM of a higher dimension. In summary, one can use the methods presented in this chapter to obtain the numerical value of the effective bandwidth function of a source from its traffic samples. The obtained effective bandwidth curve is used in conjunction with the admission control methods presented in Chapter 3 to find the maximum waiting-time for an incidental stream that uses the stochastic service class. Up to this point (i.e., Chapters 2-4), we have presented methods which find the maximum waiting time for an incidental stream that uses the deterministic or the stochastic service classes. These methods determine when the data packets of an incidental stream should be made available to the transmitter for transmission. In the next chapter, we describe how the data packets of main and incidental streams are actually handled during multiplexing.

Chapter 5

Broadcast Head-End Architecture

If something anticipated arrives too late it finds us numb, wrung out from waiting, and we feel nothing at all. The best things arrive on time. -Dorothy Gilman, A New Kind of Country, 1978.

Overview

A system for multiplexing main and incidental stream data is presented. The role and importance of the packet scheduling policy is discussed, and a novel scheduling algorithm is presented. Our approach ensures that all the main and incidental streams are treated according to their importance.

5.1 Introduction

In this chapter, we present our design of the transmitter head-end for our interactive digital TV system. As discussed in Chapter 1, the digital TV standard requires that the encoded video and audio streams of a TV program be delivered to the TV receivers in the form of a single multiplexed stream called a 'Transport Stream' (TS). The syntax of the transport stream is defined in the digital TV standard.
Here, we present our design of a multiplexer system, which is capable of multiplexing incidental stream data alongside the main stream data. Our system takes the priority of the main and incidental streams into account, and manages the flow of data packets in the system such that: 1) all the main data packets are transmitted on time; 2) all the incidental data transmitted using the deterministic service class are transmitted on time; 3) the bandwidth that is not used by the main streams or by deterministic service class streams is fairly shared among the incidental streams that use the stochastic service class; and 4) the remaining unused bandwidth is fairly shared by the incidental streams that use the best-effort service class. One crucial part of our multiplexing system is a scheduling algorithm, which determines the order of packets in the interleaved packet sequence that forms the transport stream. We present a scheduling algorithm for our multiplexing system. Our scheduling algorithm employs a prioritizing policy, where input data streams are divided into six different priority classes. Data packets belonging to each class are considered for transmission only if the higher priority classes do not have any data packet ready for transmission. This ensures that more important data (e.g., main streams) are given a higher priority than less important data (e.g., incidental streams). Furthermore, we use a weighted fair queuing policy for scheduling the streams of each priority class. In this policy, the waiting time of each data packet is considered as the key factor in deciding which packet must be served next. Our approach ensures that the bandwidth is fairly divided among the streams of each priority class. Our scheduling algorithm also ensures that the generated transport stream is compliant with the standard TV receiver model, as specified by the American digital TV standard ATSC.
This ensures that any standard digital TV receiver or set-top box can extract and decode the main streams from the transport stream generated by our system. The rest of this chapter is organized as follows. In Section 5.2, we present an overview of standard digital TV multiplexing systems. Then, we present our design of the multiplexing system for our interactive TV system in Section 5.3. In Section 5.4, we present our scheduling algorithm. The chapter conclusion is presented in Section 5.5.

5.2 Multiplexing System Structure of a Standard Digital TV System

A conceptual diagram of a multiplexer system at the head-end of a TV transmission system is shown in Figure 5.1. This system multiplexes the video and audio streams of a number of TV programs, and creates a multiplexed transport stream. This transport stream is then broadcast over a cable, terrestrial or satellite channel to the TV receivers. For most TV programs, the source video and audio sequences are captured at a different location than the transmitter head-end. In this scenario, the source video and audio streams are usually delivered to the transmitter system through a private high-speed link, such as a satellite link. For more details about the standard digital TV transmitter architecture see [93-96].

5.3 Multiplexing System for Our Interactive TV System

Figure 5.2 shows the diagram of our multiplexing system. The data inputs to our multiplexer system are the video and audio streams, which come from a broadcast station in the case of live programs, or from an off-line storage medium in the case of prerecorded programs. The output is a single transport stream, which has the constant bitrate of R_TS bps.
As shown, our multiplexer system consists of four basic units: 1) 'TS packetizer' units, which packetize the input streams and generate TS packet streams; 2) a 'Traffic Shaping Unit', which passes the TS packets to the multiplex buffers at a regulated rate; 3) 'Multiplexing Buffers', which hold the TS packets ready for transmission; and 4) a 'Scheduling Unit', which takes the TS packets from the multiplex buffers and creates the multiplexed transport stream. We will discuss the mechanisms of these units in more detail in the following sections. As shown, a 'scalable transcoder' re-encodes an incidental stream to generate a three-layer scalable stream. In a general scenario, the base layer is transmitted using the deterministic service class; the first enhancement layer is transmitted using the stochastic service class; and the second enhancement layer is transmitted using the best-effort service class. The bitrate of each layer is determined during the admission control process by the admission control unit.

Figure 5.1: Conceptual diagram of a Transport Multiplexer. (For each of programs 1 through N, the uncompressed video and audio sequences are fed to an ISO/IEC 13818-2 video encoder and an ISO/IEC 13818-3 audio encoder, packetized into PES streams under a common time base (STC), and combined by the ISO/IEC 13818-1 system coder into a single transport stream, which is broadcast over the error-prone medium.)

5.3.1 TS Packetizer

The TS packetizer units simply break the input bitstream into 184-byte units, add the 4-byte TS packet header as indicated by the standard, and generate TS packets of size 188 bytes. For more information about the TS packet structure see [93,94,96,97].

5.3.2 Traffic Shaping Unit

As shown in Figure 5.2, the traffic shaping unit passes the TS packets from the packetizer units to the multiplex buffers.
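The packetization step performed by the TS packetizer units (Section 5.3.1 above) can be sketched as follows. This is a minimal illustration with a hypothetical function name; a real packetizer would also fill in the PID and continuity-counter fields of the 4-byte header, which are replaced here by a fixed placeholder (only the 0x47 sync byte is standard):

```python
def ts_packetize(bitstream, header=b"\x47\x00\x00\x00"):
    """Break a bitstream into 184-byte payload units, prepend a 4-byte
    header, and emit 188-byte TS packets (last payload padded with 0xFF)."""
    packets = []
    for i in range(0, len(bitstream), 184):
        payload = bitstream[i:i + 184].ljust(184, b"\xff")
        packets.append(header + payload)
    return packets
```

For example, a 400-byte input yields three 188-byte packets, the last one padded.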
The function of this unit is to regulate the rate of packet submission to the multiplex buffers. This rate regulation is necessary to ensure that the actual amount of data submitted to the multiplex buffers is in accordance with the bandwidth reserved for the main streams. Furthermore, the traffic shaping unit controls when the incidental data packets are submitted to the multiplex buffers. That is, this unit is responsible for sending the incidental data to the multiplex buffer T_w seconds before their transmission deadline. As shown, the traffic shaping unit uses a buffer for each stream, and controls the packet departures from each buffer by using a buffer control unit for each buffer. These buffer control units use different schemes for each stream type, as described below.

Main Audio Streams

Main audio streams in digital TV applications have a constant bitrate. If the bitrate of a main audio stream is R_i, then the buffer control unit ensures that no more than ⌈R_i/(8 × 184)⌉ TS packets are submitted to the multiplex buffer during each 1-second period, where ⌈x⌉ denotes the smallest integer greater than or equal to x. For this purpose, the buffer control unit uses a token variable. The token variable is updated periodically every 1 second to ⌈R_i/(8 × 184)⌉. Whenever a packet from the buffer is submitted to the multiplex buffers, the token variable is decremented. The buffer control unit sends a TS packet only if the assigned token variable is greater than zero.

Main Video Streams

If a main video stream is characterized by its maximum bitrate and not by a (σ, ρ) model, then a scheme similar to that used for main audio streams is used by the buffer control units. However, if a main video stream is characterized by a (σ, ρ) model, then the method used by the buffer control unit is different.
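Before turning to that method, the token-variable scheme for constant-bitrate main audio streams described above can be sketched as follows (class and method names are ours; tokens are counted in TS packets, using the 184-byte payload size):

```python
import math

class CbrTokenControl:
    """Token counter for a constant-bitrate stream of rate_bps: refreshed
    once per second to ceil(rate / (8*184)) packets, decremented per packet."""
    def __init__(self, rate_bps):
        self.per_second = math.ceil(rate_bps / (8 * 184))
        self.tokens = self.per_second

    def tick(self):
        # Called at the start of each 1-second period.
        self.tokens = self.per_second

    def try_send(self):
        # Submit one TS packet to the multiplex buffer if tokens remain.
        if self.tokens > 0:
            self.tokens -= 1
            return True
        return False
```

A 192 kbps audio stream, for instance, is limited to ceil(192000/1472) = 131 TS packets per second regardless of how many packets are waiting.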
In this case (a main video stream characterized by a (σ, ρ) model), the buffer control unit must ensure that the aggregate number of TS packets submitted to the multiplex buffer in any time window of length t is less than min_i(σ_i + ρ_i t). Fortunately, there is a very powerful and efficient method for implementing this mechanism, based on token buckets. A token bucket is a mechanism for ensuring that the traffic generated by a source is compliant with a single (σ, ρ) model. A token bucket is simply a variable initialized to σ and incremented at rate ρ. This variable is bounded from above by σ. Whenever a packet is submitted to the multiplex buffer, the token bucket variable is decremented. The buffer control unit submits a TS packet only when the token variable is greater than zero.¹ For a (σ, ρ) model consisting of N (σ, ρ) pairs, N token buckets should be employed. The buffer control unit submits a TS packet only when the minimum of all the N token variables is greater than zero. This mechanism ensures that the total traffic delivered to the multiplex buffer complies with the (σ, ρ) model.

PSI Tables

Program Specific Information (PSI) tables are data tables embedded into the transport stream, which contain important information necessary for demultiplexing the transport stream [93,95,97]. For example, PSI tables carry the so-called 'identification numbers', which tell the digital TV receivers which packets should be decoded for a specific TV program. There are four types of PSI tables: the Program Association Table (PAT), the Program Map Table (PMT), the Conditional Access Table (CAT) and Private Tables [95,97]. Since PSI tables carry information necessary for decoding the transport stream, it is necessary that PSI tables be repeatedly inserted into the transport stream. The repetition frequency of PSI data is not specified by the MPEG standard. However, it is advised that the PSI tables be repeated between 10 and 50 times per second. We employ a token variable for controlling the transmission frequency of each PSI table.
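As a sketch of the N-bucket (σ, ρ) regulation described above for main video streams (hypothetical class names; bucket levels are kept in TS-packet units):

```python
class TokenBucket:
    """One (sigma, rho) bucket: starts full at sigma, refills at rate rho,
    capped at sigma."""
    def __init__(self, sigma, rho):
        self.sigma, self.rho, self.level = sigma, rho, sigma

    def advance(self, dt):
        # Refill for dt seconds, never exceeding sigma.
        self.level = min(self.sigma, self.level + self.rho * dt)

class SigmaRhoRegulator:
    """Conformance to an N-pair (sigma, rho) model: a TS packet is submitted
    only while the minimum over all N bucket levels is positive."""
    def __init__(self, pairs):
        self.buckets = [TokenBucket(s, r) for s, r in pairs]

    def advance(self, dt):
        for b in self.buckets:
            b.advance(dt)

    def try_send(self, packet_cost=1.0):
        if min(b.level for b in self.buckets) > 0:
            for b in self.buckets:
                b.level -= packet_cost
            return True
        return False
```

With pairs (σ=2, ρ=1) and (σ=10, ρ=0.5), an initial burst is limited to 2 packets by the tighter bucket, after which packets pass only as the buckets refill.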
These token variables are updated periodically according to the repetition frequency of the PSI tables and the actual size of the PSI table in bytes. The buffer control unit of PSI tables submits a TS packet to the multiplex buffer only when the assigned token variable is greater than zero.

¹Note that the variables σ and ρ should be translated from bits and bps to 'TS packet count' and 'TS packets per second' by dividing them by 8 × 188 and rounding towards infinity.

Incidental Streams

The function of the buffer control unit for incidental streams is to send the incidental TS packets to the multiplex buffer T_w seconds before their transmission deadline. Since incidental streams have a constant bitrate, this mechanism is easily implemented by sending the first TS packet of an incidental stream exactly T_w seconds before the decoding time of the first frame of the incidental stream; the consecutive TS packets are submitted to the multiplex buffer at the constant rate of the stream. For example, consider an incidental stream with maximum waiting time T_w and rate R bps, which is equivalent to R/(8 × 188) TS packets per second. Also suppose that the decoding of this stream should start at time T. Then the buffer control unit will send the first TS packet of this stream to the multiplex buffer at T − T_w; and each consecutive TS packet is transmitted after (8 × 188)/R seconds.

5.3.3 Multiplex Buffers

Multiplex buffers hold the TS packets that are ready for transmission. The size of the multiplex buffers for main streams is selected such that these buffers never overflow; a buffer of size 0.5 × R, where R is the maximum bitrate of the stream in bps, is usually enough. The size of the multiplex buffer for an incidental stream is T_w × R, where T_w is the maximum waiting time assigned to the incidental stream during the admission control process.

5.4 Scheduling Unit

5.4.1 Scheduling Objective

Suppose the bitrate of the transport stream is R_TS bps.
Since each TS packet has the constant size of 188 bytes, each TS packet is transmitted in exactly Δ = (188 × 8)/R_TS seconds. We call Δ a transmission time-slot. That is, a transmission time-slot represents the time required for sending 188 bytes of data. Hence a free time-slot represents the opportunity of sending exactly one TS packet. The function of the scheduling algorithm is to assign each transmission time-slot in the TS stream to one of the input streams, as shown in Figure 5.3.

Figure 5.3: The scheduling unit decides which TS packet (from streams #1 through #N) should be sent at the next transmission time-slot.

5.4.2 Prioritizing Policy

Our algorithm employs a prioritizing policy, where the input streams are divided into different priority classes. Each class of streams is served only when the higher priority streams do not have any data packet ready for transmission. In priority order, these classes are: 1) PSI tables, 2) main audio streams, 3) main video streams, 4) incidental streams with the deterministic service class, 5) incidental streams with the stochastic service class, and finally 6) incidental streams with the best-effort service class.

5.4.3 Limitations Imposed on Scheduling by the Digital TV Standard

The digital TV standard has defined a reference model for the buffering process in digital TV receivers, called the 'Transport Stream System Target Decoder' or TS-STD. This reference model specifies a standard layered buffering structure required to de-multiplex and decode a transport stream [93-96]. It also specifies the minimum size of each buffer and how data flows between the buffers. The purpose of this reference model is to standardize the buffering process at TV receivers. All digital TV receivers are required to implement the TS-STD, and all transport streams must be generated in compliance with this reference model.
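The time-slot arithmetic of Section 5.4.1 and the strict priority ordering of Section 5.4.2 can be captured in a few lines (function and class names are ours):

```python
PRIORITY_CLASSES = [
    "PSI tables", "main audio", "main video",
    "deterministic incidental", "stochastic incidental", "best-effort incidental",
]

def slot_duration(r_ts_bps):
    """One transmission time-slot: the time to send a single 188-byte TS packet."""
    return (188 * 8) / r_ts_bps

def active_class(ready_counts):
    """Highest-priority class with at least one packet ready; lower classes
    are served only when every class above them has nothing to send."""
    for cls in PRIORITY_CLASSES:
        if ready_counts.get(cls, 0) > 0:
            return cls
    return None
```

For a 19.8 Mbps transport stream this gives about 13165 time-slots per second, each of which the scheduler hands to the highest non-empty priority class.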
Hence, our transport stream multiplexer must employ a mechanism to ensure that the generated transport stream is compliant with the reference target decoder. This function is performed by the scheduling unit. That is, the scheduling algorithm should ensure that assigning the current transmission time-slot to a specific stream will not result in a buffer overflow in the reference TS-STD model. We implement this mechanism by simulating the TS-STD model for each TV channel. Using the simulated TS-STD model, we check, before making the assignment, that assigning the current time-slot to a stream does not result in a buffer overflow in the TS-STD model.

5.4.4 Scheduling Algorithm

As discussed before, the function of the scheduling algorithm is to decide which stream should occupy the next transmission time-slot in the transport stream. The mechanism of our scheduling algorithm can be broken into two conceptual steps. In the first step, our algorithm creates a set of candidate streams. The candidate streams are those that 1) have a non-empty multiplex buffer, 2) are allowed by the TS-STD model to transmit at the current time-slot, and 3) have higher priority than any other stream satisfying the first two conditions. Therefore, the candidate streams are all selected from the same priority class, and are all eligible for transmission at the current time-slot. In the second step, the algorithm selects one stream among the candidates for transmission. Depending on the type of the candidate streams, we use different policies to decide which candidate stream should be transmitted. For PSI table data, a simple round-robin policy is used. For main streams (either video or audio), let W_i denote the buffer workload and R_i the maximum rate of the stream; then we select the stream for which W_i/R_i is maximum. For incidental streams (whether deterministic, stochastic or best-effort service), we use an 'Earliest Deadline First' (EDF) policy.
Let W_i be the buffer workload, R_i the stream rate and T_i the maximum waiting time assigned to the stream. Then we assign the deadline

d_i = T_i − W_i/R_i    (5.1)

to each stream. Note that a small d_i means that the data packets in the buffer are close to passing their maximum waiting time. The scheduler selects the stream that has the smallest d_i.

5.4.5 Packet Drop-Off

The admission control schemes, along with the mechanisms used by the traffic shaping unit, ensure that the multiplex buffers of PSI tables, main streams, and incidental streams using the deterministic service class will never overflow. That is, these mechanisms ensure that the aggregate number of TS packets submitted from these streams to the multiplex buffer during each scheduling cycle is less than or equal to the number of packets that can be transmitted. Therefore, we expect no packet drop-off for these streams. However, we anticipate that the multiplex buffers of the incidental streams with the stochastic and best-effort service classes will overflow occasionally. This overflow occurs when the transmission line is committed for a long time to the main streams and to the incidental streams with the deterministic service class, and the scheduler cannot send enough TS packets from the incidental streams with the stochastic or best-effort service classes. In this case, the excess TS packets in the multiplex buffer must be dropped. This is shown in Figure 5.4.

Figure 5.4: Packet drop-off in the multiplex buffers of the stochastic and best-effort service classes. When a new packet (#N+1) arrives at a full buffer of N packets, it is pushed into the buffer and the oldest packet (#1) is dropped.

As shown, when a new TS packet arrives at a full multiplex buffer, the new packet is pushed into the buffer. That is, a packet from the beginning of the buffer is dropped, the other packets are shifted, and the new packet is added to the end of the buffer.
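The second-step selection policies and the drop-off buffer described above can be sketched as follows (dictionary keys and function names are ours; Python's collections.deque with maxlen reproduces the drop-oldest behaviour):

```python
from collections import deque

def pick_main_stream(candidates):
    """Main streams: serve the candidate with the largest W_i / R_i
    (buffer workload over maximum rate)."""
    return max(candidates, key=lambda c: c["w"] / c["r"])

def pick_incidental(candidates):
    """Incidental streams (EDF): deadline d_i = T_i - W_i / R_i, eq. (5.1);
    serve the candidate with the smallest d_i."""
    return min(candidates, key=lambda c: c["T"] - c["w"] / c["r"])

def make_dropoff_buffer(capacity_packets):
    """Bounded buffer for the stochastic/best-effort classes: appending to
    a full deque with maxlen silently drops the oldest (head) packet."""
    return deque(maxlen=capacity_packets)
```

For instance, appending five packets to a three-packet drop-off buffer leaves only the three most recent ones, exactly as in Figure 5.4.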
The reason that we drop the first packet from the beginning of the buffer, and not the new packet, is that the first packet has been in the buffer for more than the assigned maximum waiting time T_w, and hence it is too late to transmit this packet. Note that the multiplex buffer size is T_w × R, and the buffer is filled at the constant rate R.

5.5 Chapter Conclusion

In this chapter, we designed a multiplexer system for the transmitter head-end of our interactive TV system. We described the buffering structure required for handling the main, incidental and other ancillary data packets. Then, we presented a novel scheduling scheme for controlling the multiplexing operations. Our scheduling method ensures that all the main and incidental stream data packets are treated according to their importance during the multiplexing process. Furthermore, our scheduling algorithm employs a technique which ensures that the broadcast stream is backward compatible with conventional digital TV receivers. This guarantees that conventional digital TV receivers, which are not programmed for our interactive TV system, are able to display the main video and audio content without any discrepancy.

Chapter 6

Thesis Summary

In this thesis, we proposed and defined a new interactive system for digital TV. This system gives TV viewers the freedom to control TV program content. In so doing, we have introduced a new technological concept, which improves home entertainment technology. We then addressed the most challenging issue involved in the design of the proposed interactive TV system. This issue concerns adding extra incidental data to a digital TV transmission channel. This must be accomplished without increasing the bandwidth or degrading the quality of other programs. We then presented data transmission schemes for our interactive TV system that allow the transmission of the incidental data.
We efficiently took advantage of any unused bandwidth in the transmission channel to transmit the incidental data. We classified the transmission schemes of incidental data into three classes: the deterministic, stochastic, and best-effort service classes. We proposed to use scalable video coding for the incidental streams. In this approach, an incidental stream is encoded into a three-layer scalable stream. The base, first enhancement and second enhancement layers are transmitted using the deterministic, stochastic and best-effort service classes respectively. This technique not only results in very efficient bandwidth utilization, but also improves the perceived picture or audio quality of the incidental streams. We then designed admission control schemes for the deterministic and stochastic service classes. These admission control schemes answer the crucial question of whether an incidental stream can be added to a TV program or not. Our approach in designing the admission control schemes was based on modelling the traffic of the main video streams using a traffic model. This model is then used for designing the admission control test. In the case of the deterministic service class, we employed the (σ, ρ) model for modelling the traffic of the main streams. We developed methods for fitting the (σ, ρ) model to traffic sources. These methods are more efficient and more accurate than previously available methods. In so doing, we helped to advance knowledge in the deterministic traffic modelling field. We then adapted the 'Network Calculus' theory, and designed an efficient admission control scheme for the deterministic service class. For the stochastic service class, we employed Hidden Semi-Markov Models (HSMM) for modelling the traffic of the main video streams. We developed efficient methods for the identification of HSMM model parameters for the off-line and online cases.
In so doing, we have advanced existing knowledge about general semi-Markovian signal models, off-line and online identification of HSMMs, and stochastic traffic modelling of full-screen video sources. Using the 'Effective Bandwidth' theory, we then designed an efficient admission control scheme for the stochastic service class. Then, we presented our design of a data multiplexer for the transmitter head-end of our interactive digital TV system. Our design is capable of multiplexing incidental stream data alongside the main stream data. We described how the flow of main and incidental data packets is controlled during the multiplexing process. We presented a novel scheduling scheme, which determines the order of data packets in the broadcast packet sequence. Furthermore, we employed mechanisms which ensure that conventional digital TV receivers can extract and display the main video and audio content from the multiplexed stream. This makes our system backward compatible with presently existing conventional digital TV receivers. We have tested the validity and efficiency of the methods presented in this thesis via simulation experiments. Numerical results of these experiments are presented throughout the thesis. In summary, this thesis presents efficient data transmission schemes for transmitting extra video and audio content alongside conventional digital TV data. By exploiting the methods presented in this thesis, new interactive TV applications are enabled, and home entertainment technology is advanced. Furthermore, some research results presented herein can benefit other research areas, such as deterministic traffic modelling, QoS-enabled data networks, and semi-Markovian stochastic models.

6.1 Thesis Contributions Summary

The major contributions of this thesis are summarized as follows, listed in the order of their appearance in the thesis.
• New interactivity concept: We defined a new interactivity concept for TV, which allows TV viewers to personalize the video or audio content of TV programs. This new interactivity concept drastically enhances the TV viewers' experience, and advances home entertainment technology.

• Data transmission before presentation time: We developed a novel transmission technique for transmitting the incidental data units ahead of their presentation time. This technique allows us to take optimal advantage of the transmission bandwidth that is unoccupied by the main streams.

• Deterministic admission control: A new admission control scheme was developed in Chapter 2 to be used in the deterministic service class. This is the most important line of development of this thesis in the context of the deterministic service class.

• (σ, ρ) Model fitting: The algorithm presented in Chapter 2 for fitting the (σ, ρ) model to a traffic source is one of the contributions of this thesis. This algorithm is useful in any application that employs the (σ, ρ) model.

• Physical interpretation of effective bandwidth: A new physical interpretation of effective bandwidth is offered in Chapter 3. Such an interpretation is important because it helps in advancing stochastic queuing theory.

• Stochastic admission control: A new admission control scheme was developed in Chapter 3 to be used in the stochastic service class. This is the most important line of development of this thesis in the context of the stochastic service class.

• Employing HSMMs for modelling full-screen video traffic: We showed in Chapter 4 that HSMMs are a better model choice than HMMs for modelling the stochastic properties of full-screen high-bitrate video. This line of development advances the video modelling field.

• New signal model and identification algorithms for HSMMs: We presented a new signal model for HSMMs in Chapter 4.
This model results in more efficient model parameter identification algorithms. We also presented off-line and online parameter identification algorithms based on our new signal model. The new signal model and identification algorithms are useful in any application that employs HSMMs, and are among the most important contributions of this thesis.

• Effective bandwidth of HSMMs: In Chapter 4, we showed for the first time how to obtain the effective bandwidth of an HSMM signal. This line of development is useful in any queuing application that uses HSMM traffic.

6.2 Future Research

In this thesis, we mainly focused on the mechanisms used at the transmitter head-end. Obviously, in order to display an incidental stream, a digital set-top box receiver is required, which should be specifically designed and programmed for the proposed ITV application. In the context of this thesis, a set-top box is considered as a black box equipped with a large buffer (e.g., a hard disk) for caching the incidental stream data. This set-top box is assumed to be capable of controlling the playback of main and incidental streams. Though the design concepts for the set-top box receiver architecture are simple, there is room for improvement; thus, it would be beneficial to explore the set-top box architecture in more detail in future research. For example, one can improve the buffer management schemes used at the receiver end for controlling the caching of the interactive content, such that the incidental data is not lost when a TV viewer switches channels, and such that the random access delay for incidental streams is reduced. One can also improve the user interface (e.g., menus where TV viewers select their choices about a TV program), such that TV viewers can interact with the TV program more efficiently, and navigate among the main and incidental streams more easily. Furthermore, future research may improve the traffic models used in this thesis.
For example, one can explore the possibility of using stochastic traffic models other than HSMMs, such as self-similar models or Transform-Expand-Sample (TES) traffic models, for modelling the traffic of main video sources. This can result in easier, more efficient, or more accurate traffic models.

Appendix A

Proof of Theorem 1 in Section 2.4

Theorem 1 Assume a traffic source constrained by A*_in(t) traverses a system that offers the service curve β(t). The waiting time d(t) satisfies d(t) ≤ h(A*_in, β) for all t [35,36].

Proof It follows from the definitions of d(t) (equation 2.28) and h(A_in, A_out) (equation 2.31) that

τ < h(A_in, A_out) if and only if there exists some t such that A_in(t − τ) > A_out(t).    (A.1)

Now consider some fixed t. From the definition of d(t), for all τ < d(t) we have

A_in(t) > A_out(t + τ).    (A.2)

Now the service curve property at time t + τ (equation 2.29) implies that there is some s such that

A_out(t + τ) ≥ A_in(t + τ − s) + β(s).    (A.3)

So, from A.2 and A.3 we have

A_in(t) > A_in(t + τ − s) + β(s).    (A.4)

It follows from this equation that A_in(t) > A_in(t + τ − s), which implies that t > t + τ − s. Thus,

β(s) < A_in(t) − A_in(t + τ − s) ≤ A*_in(s − τ).    (A.5)

From the definition of h(A*_in, β) (see A.1) and A.5 it follows that τ < h(A*_in, β). Since this is true for all τ < d(t), we conclude that d(t) ≤ h(A*_in, β). Q.E.D.

Appendix B

Q-Q Plot

The 'Quantile-Quantile' (Q-Q) plot, also known as a 'probability plot', is a graphical technique for assessing whether or not an experimental data set follows a given distribution such as the normal or Weibull [98-100]. This technique is also used for determining if two data sets come from populations with a common distribution. By a 'quantile', we mean the fraction (or percent) of points below the given value. That is, the 0.3 (or 30%) quantile is the point at which 30% of the data fall below and 70% fall above that value.
A Q-Q plot is a plot of the quantiles of an experimental data set against the quantiles of the assumed distribution. If a Q-Q plot is used to determine whether two data sets come from the same distribution, then the quantiles of the first data set are plotted against the quantiles of the second data set. If the experimental data are actually from the assumed theoretical distribution, then the points in the Q-Q plot should form approximately a straight line. This case is illustrated in Figure B.1-a. In this figure, the vertical axis is the quantile of the experimental data, and the horizontal axis is the quantile of a candidate probability distribution. As shown, the points in this Q-Q plot are very close to forming a line. This indicates that the experimental data are actually drawn from a population with the assumed distribution. Conversely, departures from this straight line indicate departures from the specified distribution, as illustrated in Figure B.1-b. Hence, one can use the correlation coefficient associated with the linear fit to the data in the Q-Q plot as a measure of the goodness of the fit. In practice, Q-Q plots can be generated for several competing distributions to see which provides the best fit. In this case, the probability plot generating the highest correlation coefficient is the best choice, since it generates the straightest probability plot.

Appendix C

Likelihood Ratio Test

C.1 General Likelihood Ratio Test

Let S_T = {s_1, s_2, ..., s_T} be samples from a stochastic model, and let θ denote the model parameters, which take their values from the space Ω. Using the maximum likelihood principle, one can estimate the model parameters by finding θ such that L(θ) = P(S_T | θ) is maximized. This can be regarded as finding the 'best' explanation for the observed S_T. Now suppose one wishes to test whether some of the model parameters are restricted or not, for example, whether some of the model parameters are bounded or whether some of the model parameters are zero.
Formally, this is denoted by testing whether θ ∈ ω, where ω is a subspace of Ω. The usual approach to this problem is based on the likelihood ratio concept, which is defined as

Λ(S_T) = sup_{θ ∈ ω} L(θ) / sup_{θ ∈ Ω} L(θ)   (C.1)

That is, Λ(S_T) is the ratio of the best chance of observing S_T for θ ∈ ω to the best chance of observing S_T for θ ∈ Ω. Since ω ⊂ Ω, Λ is always between 0 and 1. Values of Λ close to 1 suggest that the data are very compatible with θ ∈ ω. That is, S_T is explained almost as well by the parameter estimates under θ ∈ ω as by the parameter estimates under θ ∈ Ω. For these values of Λ we should accept θ ∈ ω. Conversely, if Λ is close to 0, then the data are not very compatible with θ ∈ ω, and it makes sense to reject θ ∈ ω. This is the rationale behind the likelihood ratio test. A likelihood ratio test is a hypothesis test for testing

H0: θ ∈ ω   (C.2)

against

H1: θ ∈ Ω,  ω ⊂ Ω   (C.3)

In order to obtain the rejection region and confidence intervals, it is necessary to know the distribution of Λ. However, this is ordinarily very complicated. Fortunately, it can be shown that under very general conditions −2 ln(Λ) has a χ² distribution with n degrees of freedom, where n is the difference in the dimensions of ω and Ω. Hence, by comparing −2 ln(Λ) to the upper 100 × (1 − α) percentile point of a χ² distribution, one can decide whether or not to reject H0. Here α is known as the significance level of the test, and is usually selected as α = 5%.

REMARK As a general concept in hypothesis testing, the H1 hypothesis represents a more general case (or more complex model) than H0. In such cases, the H1 hypothesis is adopted unless there is sufficient evidence to reject the special (or simple) hypothesis H0. This concept is conveyed in the test by the notion of ω ⊂ Ω.
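The general test above can be sketched numerically. The following Python example is a hypothetical illustration (it is not the thesis's HMM/HSMM test): it tests whether Gamma-distributed data are adequately explained by the restricted exponential model, i.e., a Gamma distribution with its shape parameter fixed at 1, so one parameter is restricted and −2 ln(Λ) is compared against a χ² distribution with one degree of freedom. The maximum likelihood fits use scipy:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Hypothetical data drawn from a Gamma distribution with shape 3, so the
# restricted hypothesis H0 (shape = 1, i.e., exponential) should be rejected.
x = rng.gamma(shape=3.0, scale=2.0, size=500)

# Restricted model (omega): exponential; the ML estimate of its scale
# parameter is the sample mean.
loglik_h0 = np.sum(stats.expon.logpdf(x, scale=x.mean()))

# Full model (Omega): Gamma with shape and scale fitted by maximum
# likelihood (location fixed at zero).
shape, loc, scale = stats.gamma.fit(x, floc=0.0)
loglik_h1 = np.sum(stats.gamma.logpdf(x, a=shape, loc=0.0, scale=scale))

# Test statistic: -2 ln(Lambda), compared to the upper 95th percentile
# of a chi-squared distribution with 1 degree of freedom (alpha = 5%).
stat = -2.0 * (loglik_h0 - loglik_h1)
critical = stats.chi2.ppf(0.95, df=1)
reject_h0 = stat > critical
```

Since ω ⊂ Ω, the full model always attains at least the restricted model's likelihood, so `stat` is nonnegative; with shape-3 data it is large and H0 is rejected.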
C.2 Likelihood Ratio Test for Testing the HMM Model against the HSMM Model

Suppose we have two signal model candidates for modelling an empirical sequence S_T, and would like to use the likelihood ratio test to determine which candidate is the better choice. The first model candidate is a Markov chain with N states. In such a model, the signal makes a transition to a new state, or stays in the same state, at each time instance. The next state of the signal depends only on the current state, and is selected according to a constant state transition probability matrix A = [a_ij]. This model is parameterized with θ = (a_11, ..., a_{N−1,N}). It is easily shown that the state durations have a Geometric probability mass function, where the probability of staying exactly d time units in state i is given by

p_i(d) = a_ii^{d−1} (1 − a_ii)

The second model candidate is a Semi-Markov chain with the same number of states (i.e., N states). In this model, once the signal enters a new state, a state duration d is selected, and the signal stays exactly d time units in the same state. After d time units, the signal makes a transition to a new state. The state duration d for state i is selected according to the probability mass function φ_i(d), where φ_i(d) is a discretized Gamma probability density function. That is,

φ_i(d) = ∫_{d−1}^{d} (ν_i^{η_i} / Γ(η_i)) x^{η_i − 1} e^{−ν_i x} dx   (C.4)

where η_i and ν_i are the parameters of the Gamma pdf. This signal model is parameterized with θ = (a_12, a_13, ..., a_{N−1,N}, η_1, ..., η_N, ν_1, ..., ν_N), where A = [a_ij] is the state transition probability matrix. Note that the a_ii's are zero, and Σ_{j=1}^{N} a_ij = 1 for 1 ≤ i ≤ N. We use the likelihood ratio test to determine whether the Semi-Markov chain is a better model choice for the empirical trace S_T. More precisely, we test the null hypothesis 'H0: S_T is generated by a Markov chain' against 'H1: S_T is generated by a Semi-Markov chain'. According to equation C.3, it is necessary to parameterize the signal models such that H0 represents a special case of H1.
This means that we should model a Markov chain as a special case of a Semi-Markov chain with Gamma state duration densities. This is easily done by defining

H0: θ ∈ ω,  ω = {θ | η_1 = η_2 = ... = η_N = 1,  Σ_{j=1}^{N} a_ij = 1, 1 ≤ i ≤ N}

H1: θ ∈ Ω,  Ω = {θ | Σ_{j=1}^{N} a_ij = 1, 1 ≤ i ≤ N}   (C.5)

Note that ω is a subspace of Ω. We should just show that the conditions in H0 represent a Markov chain; that is, that letting η_1 = η_2 = ... = η_N = 1 in the Semi-Markov chain model results in a Markov chain. This is easily seen by letting η_i = 1 in equation C.4, which gives

φ_i(d) = e^{−ν_i (d−1)} − e^{−ν_i d}   (C.6)

Selecting a_ii = e^{−ν_i} results in φ_i(d) = a_ii^{d−1} (1 − a_ii), which is identical to the Geometric state duration densities of Markov chains. Hence, one can find −2 ln(Λ) and compare it to the upper 100 × (1 − α) percentile point of a χ² distribution with N degrees of freedom to decide whether or not to reject the null hypothesis H0.

Note that the hypothesis testing approach presented here is applicable, with very minor changes, to testing the validity of an HMM model against an HSMM model for a given empirical trace S_T.

Bibliography

[1] J. Liebeherr and D. E. Wrege, "Traffic characterization algorithms for VBR video in multimedia networks," ACM/Springer Multimedia Systems Journal, vol. 6, no. 4, pp. 271-283, 1998.

[2] ATVEF forum, "ATVEF specifications for enhanced TV," available from http://www.atvef.com.

[3] Y. Wang, J. Osterman, and Y.Q. Zhang, Digital Video Processing and Communications, chapter 11, Scalable Video Coding, Prentice Hall, 2002.

[4] W. Dapeng, Y.T. Hou, and Y.Q. Zhang, "Scalable video coding and transport over broadband wireless networks," Proceedings of the IEEE, vol. 89, no. 1, pp. 6-20, January 2001.

[5] H. Yanagihara, M. Sugano, A. Yoneyama, and Y. Nakajima, "Scalable video decoder and its application to multi-channel multicast system," IEEE Transactions on Consumer Electronics, vol. 46, no. 3, pp. 866-871, 2000.

[6] D.P. Heyman, A.
Tabatabai, and T.V. Lakshman, "Statistical analysis and simulation study of video teleconference traffic in ATM networks," IEEE Transactions on Circuits and Systems for Video Technology, vol. 2, no. 1, pp. 49-59, March 1992.

[7] O. Rose, "Simple and efficient models for variable bit rate MPEG video traffic," Performance Evaluation, vol. 30, pp. 69-85, 1997.

[8] M.R. Ismail, I. Lambadaris, M. Devetsikiotis, and A.R. Kaye, "Simulation and modeling of variable bit rate MPEG video transmission over ATM networks," International Journal of Communication Systems, vol. 9, 1996. Available from http://www.sce.carleton.ca/bbnlab/bnlpapers.shtml.

[9] D.P. Heyman and T.V. Lakshman, "Source models for VBR broadcast-video traffic," Proc. IEEE Infocom, pp. 664-671, June 1994.

[10] M. Krunz and H. Hughes, "A traffic model for MPEG-coded VBR streams," Proc. ACM Sigmetrics, pp. 47-55, May 1995.

[11] A. Adas and A. Mukherjee, "On resource management and QoS guarantees for long range dependent traffic," Proc. IEEE Infocom, pp. 779-787, April 1995.

[12] M. W. Garret and W. Willinger, "Analysis, modelling and generation of self-similar VBR video traffic," Proc. ACM Sigcomm '94, pp. 269-280, August 1994.

[13] J. F. Kurose, "On computing per-session performance bounds in high-speed multi-hop computer networks," Proc. ACM Sigmetrics and Performance '92, pp. 128-139, June 1992.

[14] H. Zhang and E. W. Knightly, "Providing end-to-end statistical performance guarantees with bounding interval dependent stochastic models," Proc. of ACM Sigmetrics, pp. 211-220, May 1994.

[15] R. L. Cruz, "A calculus for network delay, part I: Network elements in isolation," IEEE Trans. on Information Theory, vol. 37, no. 1, pp. 114-131, January 1991.

[16] R. L. Cruz, "A calculus for network delay, part II: Network analysis," IEEE Trans. on Information Theory, vol. 37, no. 1, pp. 132-141, January 1991.

[17] D. E. Wrege et
al., "Deterministic delay bounds for VBR video in packet-switching networks: Fundamental limits and practical tradeoffs," IEEE/ACM Trans. on Networking, vol. 4, no. 3, pp. 352-362, June 1996.

[18] D. E. Wrege, Multimedia Networks with Deterministic Quality-of-Service Guarantees, Ph.D. thesis, University of Virginia, August 1996.

[19] R. L. Cruz and C. M. Okino, "Service guarantees for window flow control," 34th Allerton Conf. on Communication, Control, and Computing, October 1996.

[20] J.Y. Le Boudec, "Application of network calculus to guaranteed service networks," IEEE Trans. on Information Theory, vol. 44, no. 3, pp. 1087-1096, 1998.

[21] C. S. Chang, "On deterministic traffic regulation and service guarantees: A systematic approach by filtering," IEEE Trans. on Information Theory, vol. 44, no. 3, pp. 1097-1110, 1998.

[22] ATM Forum, "ATM forum traffic management specification version 4.0," Contribution 95-0013R.il, March 1996.

[23] ATM Forum, "ATM user-network interface specification version 3.0," Prentice-Hall, 1993.

[24] E. W. Knightly, "Traffic models and admission control for integrated services networks," Ph.D. thesis, University of California, Berkeley, May 1996.

[25] E. W. Knightly and H. Zhang, "Traffic characterization and switch utilization using a deterministic bounding interval dependent traffic model," INFOCOM, vol. 3, pp. 1137-1145, 1995.

[26] L. Dittman, S. B. Jacobsen, and K. Moth, "Flow enforcement algorithms for ATM networks," IEEE Journal on Selected Areas in Communications, vol. 9, no. 3, pp. 343-350, April 1991.

[27] E. P. Rathgeb, "Modelling and performance comparison of policing mechanisms for ATM networks," IEEE Journal on Selected Areas in Communications, vol. 9, no. 4, pp. 325-334, April 1991.

[28] P. Pancha and M. El Zarki, "Leaky bucket access control for VBR MPEG video," INFOCOM, vol. 2, pp. 796-803, 1995.

[29] F. Guillemin, C. Rosenberg, and J.
Mignault, "On characterizing an ATM source via the sustainable cell rate traffic descriptor," INFOCOM, vol. 3, pp. 1129-1136, 1995.

[30] D. E. Wrege and J. Liebeherr, "Video traffic characterization for multimedia networks with a deterministic service," INFOCOM, pp. 537-544, 1996.

[31] T. H. Cormen, Introduction to Algorithms, McGraw Hill, 2001.

[32] P. E. Gill et al., Practical Optimization, Academic Press, 1981.

[33] R. Fletcher, Practical Methods of Optimization, John Wiley, 1980.

[34] C. Rosenberg et al., "New approach for traffic characterisation in ATM networks," IEE Proceedings - Communications, vol. 142, pp. 87-90, 1995.

[35] J.Y. Le Boudec and P. Thiran, Network Calculus: A Theory of Deterministic Queuing Systems for the Internet, Springer, 2001.

[36] J.Y. Le Boudec and P. Thiran, "A short tutorial on network calculus. I. Fundamental bounds in communication networks," The 2000 IEEE Int. Symposium on Circuits and Systems, vol. 4, pp. 93-96, 2000.

[37] R. J. Gibbens et al., "Effective bandwidths for the multi-type UAS channel," Queueing Systems, vol. 9, pp. 17-28, 1991.

[38] F. P. Kelly, "Effective bandwidths at multi-class queues," Queueing Systems, vol. 9, pp. 5-16, 1991.

[39] R. Guerin et al., "Equivalent capacity and its application to bandwidth allocation in high-speed networks," IEEE J. Selected Areas in Communication, vol. 9, pp. 968-981, 1991.

[40] A. Weiss, "An introduction to large deviations for communication networks," IEEE Journal on Selected Areas in Communications, vol. 13, no. 6, pp. 938-952, 1995.

[41] C.S. Chang, Performance Guarantees in Communication Networks, Springer-Verlag, 2000.

[42] N.G. Duffield, "A large deviation analysis of errors in measurement based admission control to buffered and bufferless resources," Queueing Systems, vol. 34, no. 1, pp. 131-168, January 2000.

[43] Jean C.
Walrand, G. de Veciana, and G. Kesidis, "Resource management in wide-area ATM networks using effective bandwidths," IEEE Journal on Selected Areas in Communications, vol. 13, no. 6, pp. 1081-1090, 1995.

[44] N.G. Duffield, J.T. Lewis, N. O'Connell, R. Russell, and F. Toomey, "Entropy of ATM traffic streams: a tool for estimating QoS parameters," IEEE Journal on Selected Areas in Communications, vol. 13, no. 6, pp. 981-990, August 1995.

[45] P. Rabinovitch, "Statistical estimation of effective bandwidth," M.Sc. thesis, Carleton University, April 2000.

[46] S. Tartarelli, M. Falkner, M. Devetsikiotis, I. Lambadaris, and S. Giordano, "Empirical effective bandwidths," Global Telecommunications Conference, 2000, GLOBECOM '00, vol. 1, pp. 672-678, December 2000.

[47] H. R. Künsch, "The jackknife and the bootstrap for general stationary observations," The Annals of Statistics, vol. 17, pp. 1217-1241, 1989.

[48] R. LePage and L. Billard, Eds., Exploring the Limits of Bootstrap, John Wiley and Sons, April 1992.

[49] G.A. Young, "Bootstrap: More than a stab in the dark?," Statistical Science, vol. 9, pp. 382-415, 1994.

[50] N.G. Duffield, J.T. Lewis, and N. O'Connell, "The entropy of an arrival process: A tool for estimating QoS parameters of ATM traffic," Proceedings of the 11th IEE UK Teletraffic Symposium, March 1994.

[51] N.G. Duffield, "A large deviation analysis of errors in measurement based admission control to buffered and bufferless resources," AT&T research lab report, 1998.

[52] C. Courcoubetis, G. Kesidis, A. Ridder, J. Walrand, and R. Weber, "Admission control and routing in ATM networks using inferences from measured buffer occupancy," IEEE Trans. on Communication, vol. 43, no. 2/3/4, pp. 1778-1784, April 1995.

[53] A. Adas, "Traffic models in broadband networks," IEEE Communications Magazine, vol. 35, no. 7, pp. 82-89, July 1997.

[54] D. L. Jagerman and B.
Melamed, "The transition and autocorrelation structure of TES processes, part I: General theory," Stochastic Models, vol. 8, no. 2, pp. 193-219, 1992.

[55] D. L. Jagerman and B. Melamed, "The transition and autocorrelation structure of TES processes, part II: Special cases," Stochastic Models, vol. 8, no. 3, pp. 499-527, 1992.

[56] B. Melamed, D. Raychaudhuri, B. Sengupta, and J. Zdepski, "TES-based traffic modeling for performance evaluation of integrated networks," INFOCOM '92, Eleventh Annual Joint Conference of the IEEE Computer and Communications Societies, vol. 1, pp. 75-84, 1992.

[57] B. Melamed and D. E. Pendarakis, "Modeling full-length VBR video using Markov-renewal-modulated TES models," IEEE Journal on Selected Areas in Communications, vol. 16, no. 5, pp. 600-611, June 1998.

[58] C. B. Lee, K. B. Ha, and R.-H. Park, "Computation of effective bandwidth of aggregated VBR MPEG video traffic in ATM networks using the modified equivalent capacity," IEEE International Conference on Communications, ICC 96, vol. 2, pp. 627-631, June 1996.

[59] N. Ansari, H. Liu, Y.Q. Shi, and H. Zhao, "On modeling MPEG video traffics," IEEE Transactions on Broadcasting, vol. 48, no. 4, pp. 337-347, December 2002.

[60] J.C. Cano and P. Manzoni, "On the use and calculation of the Hurst parameter with MPEG videos data traffic," Proceedings of the 26th Euromicro Conference, vol. 1, pp. 448-455, September 2000.

[61] R. Narasimha and R.M. Rao, "Discrete-time self-similar systems and stable distributions: applications to VBR video modeling," IEEE Signal Processing Letters, vol. 10, no. 3, pp. 65-68, March 2003.

[62] B.N. Bashforth and C.L. Williamson, "Statistical multiplexing of self-similar video streams: simulation study and performance results," Proceedings of the Sixth International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, pp.
119-126, July 1998.

[63] O. Lazaro, D. Girma, and J. Dunlop, "Statistical analysis and evaluation of modelling techniques for self-similar video source traffic," The 11th IEEE International Symposium on Personal, Indoor and Mobile Radio Communications, PIMRC, vol. 2, pp. 1540-1544, September 2000.

[64] V.S. Frost and B. Melamed, "Traffic modeling for telecommunications networks," IEEE Communications Magazine, vol. 32, no. 3, pp. 70-81, March 1994.

[65] Z. Liu, J. Huang, and Y. Wang, "Classification of TV programs based on audio information using hidden Markov model," IEEE Second Workshop on Multimedia Signal Processing, pp. 27-32, December 1998.

[66] J. Huang, Z. Liu, Y. Wang, Y. Chen, and E.K. Wong, "Integration of multimodal features for video scene classification based on HMM," IEEE 3rd Workshop on Multimedia Signal Processing, pp. 53-58, September 1999.

[67] D.L. McLaren and D.T. Nguyen, "An HMM-based simulation model for the production of coded image data," International Conference on Image Processing and its Applications, pp. 93-96, April 1992.

[68] J. Huang, Z. Liu, and Y. Wang, "Joint video scene segmentation and classification based on hidden Markov model," IEEE International Conference on Multimedia and Expo, ICME 2000, vol. 3.

[69] L. Chaisorn, C.T. Seng, and C.H. Lee, "The segmentation of news video into story units," IEEE International Conference on Multimedia and Expo, ICME 2002, vol. 1.

[70] P. Morguet and M. Lang, "An integral stochastic approach to image sequence segmentation and classification," IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 1998, vol. 5.

[71] R.J. Elliott, L. Aggoun, and J.B. Moore, Hidden Markov Models: Estimation and Control, Springer-Verlag, New York, 1995.

[72] L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257-286, February 1989.

[73] A. A. Lazar, G. Pacifici, and D. E.
Pendarakis, "Modeling of video sources for real time scheduling," Multimedia Systems, vol. 2, no. 6, pp. 253-266, 1994.

[74] M. J. Russel and R. K. Moore, "Explicit modelling of state occupancy in hidden Markov models for automatic speech recognition," IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 5-8, March 1985.

[75] B. Sin and J. H. Kim, "Nonstationary hidden Markov model," Signal Processing, vol. 46, pp. 31-46, 1995.

[76] L. H. Jamieson and C. D. Mitchell, "Modelling duration in a hidden Markov model with the exponential family," IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2, pp. 331-334, 1993.

[77] S. E. Levinson, "Continuously variable duration hidden Markov models for automatic speech recognition," Computer Speech and Language, vol. 1, pp. 29-45, 1986.

[78] V. Krishnamurthy, J. B. Moore, and S. H. Chung, "Hidden fractal model signal processing," Signal Processing, vol. 24, no. 2, pp. 177-192, 1991.

[79] S. V. Vaseghi, "State duration modelling in hidden Markov models," Signal Processing, vol. 41, no. 1, pp. 31-41, 1995.

[80] E. Weinstein, M. Feder, and A. V. Oppenheim, "Sequential algorithms for parameter estimation based on the Kullback-Leibler information measure," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 38, no. 9, pp. 1652-1654, September 1990.

[81] U. Holst and G. Lindgren, "Recursive estimation in mixture models with Markov regime," IEEE Trans. Inform. Theory, vol. 37, pp. 1683-1690, 1991.

[82] I. B. Collings, V. Krishnamurthy, and J. B. Moore, "Online identification of hidden Markov models via recursive prediction error techniques," IEEE Transactions on Signal Processing, vol. 42, pp. 3535-3539, 1994.

[83] I. B. Collings and T. Ryden, "A new maximum likelihood gradient algorithm for online hidden Markov model identification," IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 4, pp. 2261-2264, May 1998.

[84] L.
Ljung and T. Soderstrom, Theory and Practice of Recursive Identification, MIT Press, Cambridge, 1983.

[85] R. S. Burington, Handbook of Probability and Statistics with Tables, McGraw-Hill, 1970.

[86] L. R. Rabiner and B. H. Juang, Fundamentals of Speech Recognition, Prentice Hall, 1993.

[87] M. Abramowitz and I. Stegun, Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, 1964.

[88] G. E. Andrews, R. Askey, and R. Roy, Special Functions, Cambridge University Press, 1999.

[89] L. Ljung, System Identification: Theory for the User, Prentice Hall PTR, 1999.

[90] B. T. Polyak, "New method of stochastic approximation type," Automat. Remote Control, vol. 51, pp. 937-946, 1990.

[91] D. Ruppert, "Stochastic approximation," in Handbook in Sequential Analysis, B. K. Ghosh and P. K. Sen, Eds., pp. 503-529, New York: Marcel Dekker, 1991.

[92] R.A. Horn and C.R. Johnson, Matrix Analysis, Cambridge, 1987.

[93] Advanced Television Systems Committee, "ATSC digital television standard A-53," 2001.

[94] Digital Video Broadcasting Project, "DVB digital television standard."

[95] R. S. Chernock, Data Broadcasting: Understanding the ATSC Data Broadcast Standard, McGraw-Hill, New York, 2001.

[96] H. Benoit, Digital Television: MPEG-1, MPEG-2, and Principles of the DVB System, J. Wiley and Sons, 1997.

[97] Advanced Television Systems Committee, "ATSC digital television standard A-65, program and system information protocol for terrestrial broadcast and cable," 2000.

[98] M. Evans, N. Hastings, and B. Peacock, Statistical Distributions, New York: John Wiley, 2000.

[99] V. Barnett, "Probability plotting methods and order statistics," Applied Statistics, no. 24, pp. 95-108, 1975.

[100] V. Barnett, "Quantile-quantile plot," NIST/SEMATECH e-Handbook of Statistical Methods.