UBC Theses and Dissertations


Design, implementation and evaluation of a variable bit-rate continuous media file server Makaroff, Dwight J.


Full Text


Design, Implementation and Evaluation of a Variable Bit-Rate Continuous Media File Server

by

Dwight J. Makaroff
M.Sc., University of Saskatchewan, 1988

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF Doctor of Philosophy in THE FACULTY OF GRADUATE STUDIES (Department of Computer Science)

We accept this thesis as conforming to the required standard

The University of British Columbia
September 1998

© Dwight Makaroff, 1998

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Computer Science
The University of British Columbia
Vancouver, Canada

Abstract

A Continuous Media File Server (CMFS) is a computer system that stores and retrieves data that is intended to be presented to a client application continuously over time. The primary examples of this kind of data are audio and video, although any other type of time-dependent media can be included (closed-caption text, presentation slides, etc.). The presentation of these media types must be performed in real time and with low latency for user satisfaction. This dissertation describes the design, implementation and performance analysis of a file server for variable-bit-rate (VBR) continuous media. A CMFS has been implemented on a variety of hardware platforms and tested within a high-speed network environment.
The server is designed to be used in a heterogeneous environment and is linearly scalable. A significant aspect of the design of the system is the detailed consideration of the variable bit-rate profile of each data stream in performing admission control for the disk and for the network. The disk admission control algorithm simulates reading data blocks early and storing them in memory buffers at the server, achieving read-ahead and smoothing out peaks in the bandwidth requirements of individual streams. The network algorithm attempts to send data early and reserves bandwidth only for the time that it is required. The algorithms are sensitive to the variability in the bandwidth requirements, but can provide system utilization that approaches 100% of the disk bandwidth achievable for medium-length video streams in the test hardware environment.

Contents

Abstract
Contents
List of Tables
List of Figures
Acknowledgements
Dedication

1 Introduction
1.1 Motivation
1.2 Current Issues in CMFS Design
1.3 Thesis Statement
1.4 Research Contributions
1.5 Thesis Organization and Summary of Results

2 System Model
2.1 Overall System Design Model
2.2 Design Objectives
2.3 Admission Control
2.4 System Architecture
2.5 Data Delivery
2.6 Design Limitations
2.7 Stream Characterization

3 System Design and Implementation
3.1 System Initialization and Configuration
3.2 User Interface to CMFS Facilities
3.3 Slot Size Implications
3.4 Data Delivery and Flow Control
3.5 Real-Time Writing
3.6 Implementation
3.6.1 Environment and Calibration
3.6.2 Implementation Environment
3.6.3 Transport Protocol Implementations
3.6.4 Server Memory Requirements
3.7 Example Client Application

4 Disk Admission Control
4.1 Admission Control Algorithm Design
4.1.1 Experimental Setup and System Measurements
4.1.2 Simple Maximum
4.1.3 Instantaneous Maximum
4.1.4 Average
4.1.5 vbrSim
4.1.6 Optimal Algorithm
4.2 Analytical Evaluation of Disk Performance
4.3 Disk Admission Algorithm Execution Performance
4.4 Performance Experiments
4.4.1 Scenario Descriptions
4.4.2 Simple Maximum
4.4.3 Instantaneous Maximum
4.4.4 Average
4.4.5 vbrSim
4.4.6 Summary
4.5 Full-Length Streams
4.6 Analytical Extrapolation

5 Network Admission Control and Transmission
5.1 Server Transmission Mechanism and Flow Control
5.2 Network Admission Algorithm Design
5.3 Network Bandwidth Schedule Creation
5.4 Network Bandwidth Schedule Creation Performance
5.5 Stream Variability Effects
5.6 Network Slot Granularity
5.7 Network Admission and Scalability

6 Related Work
6.1 Complete System Models
6.2 Synchronization of Media Streams
6.3 User Interaction Models
6.4 Scalability
6.5 Real-Time Scheduling/Guarantees
6.6 Encoding Format
6.7 Data Layout Issues
6.8 Disk Admission Algorithms
6.9 Network Admission Control Algorithms
6.10 Network Transmission
6.11 Summary

7 Conclusions and Future Work
7.1 Conclusions
7.2 Future Work
7.2.1 Long Streams
7.2.2 Disk and Network Configurations
7.2.3 Relaxing the Value of minRead
7.2.4 Variants of the Average Algorithm
7.2.5 Reordering Requests

Bibliography

Appendix A CMFS Application Programmer's Interface
A.1 Object Manipulation
A.1.1 CMFS Storage API
A.1.2 Moving and Deleting
A.2 Stream Delivery and Connection Management
A.2.1 Stream Control
A.3 MetaData Management
A.4 Directory Service
A.5 Miscellaneous
A.5.1 Conversions and Stream Display Information

Appendix B Stream Scenarios
B.1 Stream Groupings
B.2 Scenario Selection
B.2.1 Algorithm Comparison
B.2.2 All Remaining Comparisons

List of Tables

2.1 Stream Characteristics
2.2 Stream Sources
3.1 Cmfs Interface Procedures
4.1 Block Schedule Creation Timings (msec)
4.2 Admission Control Timings (msec)
4.3 Stream Groupings
4.4 Selection of Stream Scenarios
4.5 Short Streams - Admission Results - Staggered Arrivals
4.6 Long Streams - Admission Results - Staggered Arrivals
5.1 Network Bandwidth Characterization Summary
5.2 Network Admission Performance: Simultaneous Arrivals (% of Network)
5.3 Network Admission Performance: Staggered Arrivals (% of Network)
5.4 Network Admission Performance: Simultaneous Arrivals (% of Network)
5.5 Network Admission Performance: Staggered Arrivals (% of Network)
5.6 Network Bandwidth Schedule Summary for Different Slot Lengths
5.7 Network Admission Granularity: Simultaneous Arrivals (% of Network)
5.8 Network Admission Granularity: Staggered Arrivals (% of Network)
6.1 Research Summary
B.1 Disk Admission Control Stream Groupings
B.2 Network Admission Control Stream Groupings - MIXED
B.3 Network Admission Control Stream Groupings - LOW
B.4 Network Admission Control Stream Groupings - HIGH
B.5 Stream Selection into Scenarios (First Tests)
B.6 Stream Selection into Scenarios (Remaining Tests)
B.7 Selection of Extra Scenarios: First 22 Streams
B.8 Selection of Extra Scenarios: Last 22 Streams

List of Figures

2.1 Communication Model
2.2 Organization of System
3.1 Software Structure of Server Node
3.2 Prepare Timings
3.3 First Read Operation
4.1 Typical Stream Block Schedule
4.2 Stream Block Schedule (Entire Object)
4.3 Server Schedule During Admission
4.4 Server Block Schedule
4.5 Example of Server Schedule and Buffer Allocation Vectors
4.6 Admissions Control Algorithm
4.7 Modified Admissions Control Algorithm
4.8 Buffer Reclamation
4.9 Streams Accepted by Admission Algorithms
4.10 Simultaneous Requests - Invalid Scenario
4.11 Acceptance Rate - Simple Maximum
4.12 Acceptance Rate - Instantaneous Maximum
4.13 Acceptance Rate - Average
4.14 Acceptance Rate - vbrSim
4.15 Algorithm Performance Comparison - Simultaneous Arrivals
4.16 Algorithm Performance Comparison - 5 Second Stagger
4.17 Algorithm Performance Comparison - 10 Second Stagger
4.18 Stream Variability: Acceptance Rates for Simultaneous Arrivals
4.19 Stream Variability: Acceptance Rates for Stagger = 5 Seconds
4.20 Stream Variability: Acceptance Rates for Stagger = 10 Seconds
4.21 Observed Disk Performance: Stagger = 10 Seconds
4.22 Buffer Space Analysis Technique
4.23 Buffer Space Requirements: Simultaneous Arrivals
4.24 Buffer Space Requirements: Stagger = 5 Seconds
4.25 Buffer Space Requirements: Stagger = 10 Seconds
4.26 Short Stream Scenario
4.27 Looped Scenario
4.28 Short Stream Excerpt
5.1 Network Admissions Control Algorithm
5.2 Network Bandwidth Schedule - Original (Minimum Client Buffer Space)
5.3 Network Bandwidth Schedule - Smoothed (Minimum Client Buffer Space)
5.4 Simultaneous Arrivals: Network Admission
5.5 Staggered Arrivals: Network Admission
5.6 Network Slot Characterization

Acknowledgements

There are many people who deserve thanks and credit for contributing to this part of my life. Their roles are many and varied, and I'm sure to miss someone. I'd like to thank the remainder of the CMFS design team, most notably my supervisor, Dr. Gerald Neufeld, and Dr. Norman Hutchinson, who clarified both the big picture and the several small pictures of the design issues and the mechanisms for implementing these within the CMFS. Along with their own contributions in this area, the support of David Finkelstein and Roland Mechler with respect to the implementation of network-level and client-level code was invaluable. The technical management and assistance of Mark McCutcheon was also greatly appreciated. The personal encouragement, support and steadying influence of many friends has helped me persevere. In no particular order, I'd like to mention a number of them: Peggy Logan, Margaret Petrus, Christina Chan, Kristin Janz, Bill Thomlinson, Alistair Veitch, Peter Smith, Tarik Ono-Tesfaye, Elisa Baniassad, Christoph Kern, and Holly Mitchell.
I'd also like to acknowledge the particular editing skills of Nigel Todd, Kristin Janz, Peter Lawrance, Sharon McLeish, and Sarah Walker. Finally, I'd like to thank my family, most notably my mother and my sister, for encouraging and believing in me through this project.

DWIGHT J. MAKAROFF

The University of British Columbia
September 1998

In memory of my father, Albert James Makaroff.

Chapter 1

Introduction

1.1 Motivation

Within the last 100 years, communication technology has undergone several significant revolutions. Beginning with the telegraph and telephone, ideas expressed in text and audio could be transmitted almost instantaneously across vast distances. In subsequent years, radio and television technology has enabled instantaneous mass electronic communication. The mode of this communication has traditionally been broadcast, where one signal sent on a particular frequency is received by many listeners or viewers. The content of the data has therefore been controlled by organizations with the transmission capability.

Although the technology necessary to communicate point-to-point using video has existed for several years, it has only been widely used in very specific, high-end systems, such as video conferencing for large corporations and government organizations. One of the difficulties related to point-to-point video communication has been the high bandwidths and other associated costs related to the transmission of analog signals. This produced a broadcast system where the receivers have very little direct control over what content they receive and when they receive it.

Another technology enabling the long-distance sharing of ideas expressed in audio and/or video is the recording of analog signals for later playback, which began with the phonograph and continued with tape and disk recording devices for both audio and video.
These devices enable a user to control when s/he views or listens to this data, but the choice of content is still limited by the size of a video/audio library.

Concurrent with these developments in continuous media (defined as media which must be presented continuously over time) has been the use of computer systems for data communications and high-quality graphics display. High-speed networks and efficient compression techniques have permitted data to be transferred at extremely high rates over long distances. It has become feasible to transmit continuous media in a digital format between computer systems. As well, the processing power of today's computers and the development of specialized video encoder/decoder cards permit the conversion of digital audio/video data into pictures and sound for a human user in real time.

Software has been developed for presentation of continuous media by computer systems from local storage (magnetic disk or CD-ROM) in the form of encyclopedia CD-ROMs and high-resolution video games. The content is still limited by the size of an individual's or organization's media library.

These concepts were merged by the development of continuous media server technology. With a specialized server (known as a CMFS, for Continuous Media File Server), a service provider with large resources for storing a wide variety of media content can provide access to this data to clients in an individualized manner or on demand. This media content is stored on the server as presentation objects. A presentation object is a video clip, a feature-length movie, an audio soundtrack, closed-captioned text, or any other time-sensitive data, i.e., any continuous media object that is stored in the CMFS as a separate entity. The terms audio object, video object, etc. are used to denote presentation objects of a specific media type.
The infrastructure for this technology has been provided by independent technologies, and much interest has been generated by the potential for a low-cost approach to demand-driven delivery of continuous media data. Two of these technologies previously mentioned are high-speed networks and video compression. Without compression, the bandwidth of television-quality video is enormous. Lossy compression techniques have been developed which can reduce the amount of data necessary to store a video by factors of up to 100 to 1, with acceptable levels of degradation in picture quality. Similar techniques have been used with audio. Even with compression, transmission of a single video object in real time consumes a significant portion of the capacity of first-generation local area networks. Thus, for a system to be capable of supporting a large number of simultaneous users, bandwidth in excess of 100 Megabits per second (Mbps) is required at the server. Even in this restricted environment, there are still significant technical problems to be overcome in the design and implementation of a CMFS.

Even with Moore's law in effect for several decades, which brought the computer industry ever-increasing processing speeds at steadily decreasing costs, handling continuous media by computers has remained a formidable task. The reason for this is that the basic design of computers and their operating systems has always favoured the efficient use of resources, which essentially has been achieved by dynamic allocation of CPU cycles, memory, and input/output capabilities to processes. This does not mean that designers do not know how to design hard real-time systems, but they must resort to static allocation of critical resources if the deadlines imposed on processes have to be met.

The applications of CMFS technology are wide ranging.
Video-On-Demand [70, 75, 89] has been an application whose time may never come, but it still remains interesting from a technical point of view. The ability to provide consumer-oriented, mainstream entertainment such as movies, concerts and sporting events retains the allure of convenience and flexibility for content providers and customers alike. Although previous experiments in Video-On-Demand trials have been spectacularly unsuccessful for a variety of reasons, research continues.

Perhaps a more promising form of on-demand continuous media technology is News-On-Demand, where short audio/video objects can be combined to form a "video newspaper" environment tailored to an individual's preferences. Specialized servers for distance education or corporate training can be deployed on a smaller scale with a narrower choice of content and a smaller user population. All of these technologies enable ideas, news, and entertainment to be made available to a community in ways that are more sensitive to the needs of that community.

This dissertation investigates the technical challenges of developing a CMFS in the context of a high-speed network. Such a server must be capable of delivering multiple high-quality audio and video objects simultaneously to multiple client applications in an efficient and flexible manner. The delivery of live video and/or audio objects is outside the scope of this dissertation, as there is no storage or retrieval required in such a system, although many of the concepts can be modified or extended to be applicable to live presentations.

1.2 Current Issues in CMFS Design

A CMFS is a computer system that stores and retrieves data that is intended to be presented to a client application continuously over time. The primary examples of this kind of data are audio and video, although any other type of time-dependent media can be included (closed-caption text, presentation slides, etc.).
The presentation of these media types must be performed in real time and with low latency for user satisfaction. Such a file server shares many characteristics with traditional file servers, but has added requirements which are unique to continuous media systems. All file servers need file access primitives to allow the creation of files and the storing of information, as well as for reading information out of these files. A CMFS also requires primitives to allow access via virtual-VCR functions (i.e., play, record, stop, fast-forward, slow-scan, and rewind), which are more natural methods for interacting with this type of media.

A file server is implemented in the context of a computer network with multiple "client" machines that request data from the server. Appropriate network protocols must be provided to ensure the proper delivery of request messages and raw data. In a continuous media environment, request messages must be delivered reliably and with very little delay. An efficient protocol which ensures correctness of data is required. In contrast, the delivery of continuous media data has different performance priorities than the request messages. Retrieval for playback/presentation to a user must be performed in real time in the presence of deadlines. In this case, missing or corrupted data can be tolerated more easily than large latencies. The transmission of continuous media for storage at the server has large bandwidth requirements, and real-time performance is desirable, but the media data must be delivered reliably, so corruption or loss cannot be tolerated.

There have been several research and commercial applications of continuous media technology in recent years. These continuous media servers have used parallel computing technology to create systems which retrieve large amounts of video data in relatively straightforward ways.
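The virtual-VCR primitives mentioned above (play, record, stop, fast-forward, slow-scan, rewind) can be pictured as a thin client-side stream handle. The sketch below is purely illustrative: the class and method names are assumptions, not the actual CMFS API (which the thesis documents in Appendix A).

```python
# Hypothetical sketch of a virtual-VCR style client handle for a CMFS.
# Names and signatures are illustrative assumptions, not the real API.

class StreamHandle:
    """Represents one open continuous-media stream on a server."""

    def __init__(self, object_name):
        self.object_name = object_name
        self.position = 0.0     # current presentation time, in seconds
        self.state = "stopped"
        self.speed = 1.0        # 1.0 = normal, 2.0 = fast-forward, -1.0 = rewind

    def play(self, speed=1.0):
        # A real server would reserve disk/network bandwidth here.
        self.state = "playing"
        self.speed = speed

    def stop(self):
        self.state = "stopped"

    def seek(self, seconds):
        # Jump to an arbitrary point in the presentation.
        self.position = max(0.0, seconds)

h = StreamHandle("news-clip-01")
h.play(speed=2.0)   # fast motion, e.g. by skipping alternate frames
h.seek(30.0)
```

A fast-forward request like `play(speed=2.0)` maps naturally onto the frame-skipping technique described later in this chapter, which raises playback speed without a proportional increase in bandwidth.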
These systems do not necessarily incorporate intelligent reservation mechanisms or attempt to maximize the performance and/or utilization level of scarce system resources such as disk and network bandwidth.

Two methods of delivering continuous media are possible: store-and-display and streaming. Store-and-display requires that the complete object be copied to the client machine before display, whereas streaming allows presentation to begin as soon as enough data has been received that the continuity of the presentation will not be disrupted by a lack of data during the remainder of the presentation. The former method may be suitable when the media objects are relatively small or the network bandwidth is inadequate for streaming, but the latter is preferable because the delay between the time the user requests the object and the time the object can be presented is minimal (1 to 2 seconds). With store-and-display, this delay can be minutes for a stream which is itself only minutes in playback duration. Because data is transferred sequentially in real time, the term "stream" is often used to identify continuous media objects.

The size of video data in uncompressed form is extremely large. At a resolution comparable to current North American (NTSC) television standards (640x480 colour pixels at 20 bits/pixel), a single image contains 6,144,000 bits. One second of full-motion video (30 frames per second) thus requires 184,320,000 bits. Computer networks are incapable of transmitting digital data at that rate for more than one object simultaneously. Additionally, the amount of storage for video objects of moderate length cannot be provided on conventional storage media. Thus, compression of the data is required.
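The arithmetic above can be verified directly. The short calculation below reproduces the text's NTSC example; the 100:1 compression ratio is the upper bound cited earlier in this chapter, used here only for illustration.

```python
# Reproduce the uncompressed-video arithmetic from the text.
width, height, bits_per_pixel = 640, 480, 20
frames_per_second = 30

frame_bits = width * height * bits_per_pixel   # bits in one NTSC-quality image
second_bits = frame_bits * frames_per_second   # bits for one second of video

# frame_bits  -> 6,144,000 bits per image
# second_bits -> 184,320,000 bits per second (about 184 Mbps uncompressed)

# At the 100:1 lossy compression ratio mentioned above, one second shrinks
# to roughly 1.8 Megabits, which is comfortably within the 3-7 Mbps range
# the text later cites for television-quality compressed video.
compressed_bps = second_bits // 100
```

This makes concrete why uncompressed delivery is infeasible: a single uncompressed stream would exceed the 100 Mbps server bandwidth figure quoted earlier.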
One of two approaches is taken in all existing compression techniques: constant bit-rate (CBR) encoding with varying quality (when compared with the original signal), or variable bit-rate (VBR) encoding with consistent quality. CBR streams are easy for computer systems: the bandwidth required from the disk system and the network can be simply calculated and allocated, but user satisfaction is compromised. The benefits of consistent-quality VBR streams are obvious from a human user's point of view, but variable bit-rates introduce complexities for the computer system that delivers and displays the data. The algorithms for disk retrieval and network data transmission must either make conservative estimates of resource usage, or be made explicitly aware of the time-varying bandwidth needs of each object.

The combination of the large size of continuous media objects and the capacity limitations of conventional disk technology restricts the amount of video data that can be stored on a single disk. Even with current compression technology, television-quality, full-motion video consumes approximately 3 to 7 Megabits/sec, depending on the complexity of the images. A 9 GByte disk can thus store between 10,500 and 24,000 seconds of video (less than 7 hours). The bandwidth that can be achieved off a single disk is less than 40 Mbps, which limits the number of independent users to between 6 and 12. From these numbers, it can be seen that a CMFS for a reasonably-sized user population with a moderate library of video objects must comprise multiple server components. A scalable server design permits the capacity of the server to increase by adding components. This is only possible when the model incorporates such component integration.

The most common mode of interaction with a CMFS is the transfer of very large volumes of video/audio data in a time-dependent continuous flow from the
In order to guarantee the continuity of the flow of data, the allocation of network resources such as bandwidth must be guaranteed. The avail-ability of other resources at the server, such as processor cycles, R A M , and disk bandwidth must also be guaranteed to properly service the client. The understand-i n g of how these resources are guaranteed has been a fruitful area of research for many years. The process of. reserving bandwidth at the server is known as admission-control. Bandwid th must be reserved for both, the disk and the network to ensure that resources exist to deliver the data to the client application in time for presen-tation to the user; A n estimate of the bandwidth requirements is required for every stream and these requests are summed in some manner and presented to the ad-mission control a lgori thm. If a new request results in fewer resources required than available, the request is accepted; otherwise it is denied. The choice of admission control algorithms at the server greatly influences the system's abili ty to maximize performance as measured by the cumulative bandwidth of accepted streams. Thus , provision of efficient and accurate admission control algorithms is one of the most significant problems to solve in C M F S design. Methods that provide deterministic guarantees wherever possible result in conservative usage of server resources, but the results of this dissertation show that a system designed with deterministic guar-7 antees and variabili ty of bit-rates in mind is capable of ut i l izing the resources in a near-optimal fashion. M u c h research into admission control methods has been done in recent years, but most of the work provides results v i a simulation and has not been integrated-into real systems. 
A CMFS should also be able to accommodate media of many encoding types (including any defined in the future), be capable of running in different software and hardware environments, and be able to handle varying data rates of objects in order to use system resources efficiently. As a result, an abstract design is implemented in this dissertation which has limited dependence on particular hardware performance characteristics. In this way, the design can remain constant when increases in disk access speed and improvements in compression techniques are encountered in the future.

One of the characteristics of continuous media that helps alleviate the problems of variability is that some loss of data can be tolerated by an application without degrading the presentation enough to be detectable by a human user. One of the most compelling design decisions/dilemmas is how to limit the effect of data loss on the human user. The most conservative solution is to reserve bandwidth at both the disk interface and the network interface so that transient overload or other resource contention is not possible. Simplistic implementations of this policy under-utilize the system resources because they must account for peaks in the requirements of the data streams. Even this conservative reservation does not absolutely guarantee delivery to the client application, as there are various network hardware components (routers and switches) between the server and client machines which could drop or delay packets for reasons beyond the control of the server. The analysis of compressed VBR video transmission over lossy networks has been covered extensively [12, 26, 27, 34, 35, 59], and is beyond the scope of this dissertation.

One of the benefits of CMFS technology is that the user is able to request media content on an individualized basis.
This should extend to the ability to request only portions of a stream, and to view in fast motion or slow motion in either forward or reverse. As well, the choice of audio accompaniment should be available so that multiple languages can be supported. This requires flexible user interface primitives that independently access portions of video and audio objects. One method of providing fast motion is to skip the reading and transmission of some of the object (i.e., delivering every other video frame, or every other second of video footage), significantly increasing the speed of playback without significantly increasing the bandwidth usage.

1.3 Thesis Statement

The development of continuous media file server technology has made possible the straightforward use of high-performance computer components and sophisticated methods for storage and retrieval of media data. These techniques have most often been implemented in isolation from each other, in the sense that simple procedures have been used in systems with large, parallel delivery mechanisms, whereas the elegant techniques for maximizing the performance of particular components have been evaluated primarily via simulation only. A complete model and implementation of a CMFS can greatly improve the understanding of realistic performance and design issues in the use of continuous media in a distributed environment. Therefore, the thesis of this dissertation is:

An efficient, scalable, real-time Continuous Media File Server (CMFS) can be implemented that is based on an abstract model of the disk and network performance characteristics, which explicitly considers the variable bit-rate nature of compressed continuous media, accommodates heterogeneous data format syntax and permits arbitrary client synchronization of multiple streams, using heterogeneous hardware components and a parallel, distributed software architecture.
The model upon which the system is based includes all aspects of system design, from a flexible client interface to admission control algorithms and resource management of both the disk and the network.

An efficient server provides admission decisions with a minimal amount of overhead (in terms of execution time for the decision) and maximizes the use of physical resources, such as disk bandwidth, server and client buffer space, and network bandwidth, subject to the constraint that promised performance is guaranteed.

A scalable system has the capability to add components and increase performance in a linear fashion, as measured by the number of simultaneous users and the cumulative data transfer rate.

A system is real-time if it can guarantee delivery of continuous media to a client workstation such that each client can maintain the continuity of presentation throughout the entire playback duration.

An abstract model of the performance of the primary system resources (disk and network bandwidth) is one that is free from detailed dependence on hardware characteristics. For the disk, this means precise data layout and knowledge of the mechanical characteristics are not incorporated into the model, but are subsumed by summary performance measures. Likewise, for the network, details of the transport layer are hidden by a simple performance metric.

A heterogeneous system permits differences in data format, as well as in the software and hardware platforms upon which the system can be installed. Variable bit-rate data (data rate heterogeneity) is accommodated by acknowledging that the sizes of presentation units of continuous media (video frames/audio samples) may vary considerably, both within a short amount of time and on larger time scales, and by explicitly incorporating this variability into the resource allocation (admission) and resource utilization (retrieval and transmission) strategies.
Various standards exist for encoding digital audio and video. A general purpose server is capable of storing different data formats (data format heterogeneity) on the same storage devices. Ideally, a server would be composed of commercially available workstation computer hardware to reduce the incremental costs associated with this new functionality. It would also be capable of being configured and installed on various architectures (heterogeneous hardware) with minimal changes to performance parameter settings. The continuous media data sent across the network can be displayed by client systems of equally varied flavours, subject to the client application's ability to perform the appropriate decoding.

A flexible CMFS allows a client application to retrieve streams of continuous media (possibly from different locations) in a manner that easily permits synchronous playback of any combination of video, audio, continuous text (i.e. closed captioning), or other continuous media. In particular, the choice of differing qualities/encoding formats of video, and of different language audio or text streams, is provided to the client application and supported transparently by the server. This is a substantial enhancement to the functionality of a video server in that it permits access to the system resources in a less constrained manner. Audio and video streams may be requested independently, possibly from different servers in different locations, for the presentation of a single multi-media document.

1.4 Research Contributions

Previous research has considered many of the issues involved in the creation of a CMFS in isolation. Very few systems have been built that consider the issues of scalability, heterogeneity, and the variable bit-rate nature of the data itself.
The majority of the analysis of variable bit-rate data has been in the context of trace-driven or statistical model-driven simulations that have not been integrated into a working system. The development of a CMFS and its associated admission control schemes provides the three main contributions of this dissertation:

1. A comprehensive CMFS model is developed and implemented on several hardware architectures, verifying the feasibility of the design objectives. To aid in achieving scalability, the system is designed in a distributed fashion, allowing the components to reside on different computers and thereby to achieve parallel execution wherever possible.

2. A new disk admission strategy is designed and analyzed. This strategy examines the detailed bit-rate profile of a continuous media object and the available server disk bandwidth and buffer space to determine if sufficient resources exist to deliver the object from the server in a guaranteed fashion. This algorithm simulates the future behaviour of the disk, accounting for variability in the requirements of the streams, and is thus named the vbrSim algorithm.

3. A network bandwidth characterization scheme is developed and integrated with a network admission algorithm. The network admission algorithm is a relatively straightforward extension of algorithms presented in other research, but ones which have not previously been integrated into a comprehensive CMFS.

1.5 Thesis Organization and Summary of Results

The remainder of this dissertation is organized as follows: The system model and motivation for the specific features of the CMFS are given in Chapter 2. This includes a description of the scope of the intended application domain and the type of continuous media streams that comprise the testing environment. The design details of the server components are described in Chapter 3.
The user interface is also presented, along with the manner in which the model of interaction influenced the design of the system. The major functional components of data delivery (retrieval), server flow control, and storage of data are discussed in order. Finally, the implementation environment is introduced, which includes the hardware and software context for the testing and validation of the unique VBR-sensitive algorithms of the CMFS.

Disk admission control is the focus of Chapter 4. Several possible approaches to disk admission are presented and compared with the vbrSim algorithm. The vbrSim algorithm is shown to be both efficient to execute and superior to the alternatives in terms of admission performance. This is done both analytically and via performance experiments. Analytically, the vbrSim algorithm approaches the performance of an Optimal algorithm as the variability in disk performance decreases. A large number of performance experiments explore the effect of differing request patterns for different types of streams, in terms of the number of simultaneous users and the amount of sustainable bandwidth that a single-disk server can support. In the performance experiments, a set of requests presented to the server as a group is defined to be a scenario. The results show that the disk system can admit scenarios with larger cumulative bandwidth when the streams have lower variability than when the streams have higher variability. As well, the introduction of stagger into the arrival pattern permits contiguous reading, which allows the admission control to accept more simultaneous streams. Groups of requests which utilize nearly 100% of the available disk bandwidth can be accepted for the short to medium length video objects considered. The addition of client buffer space and increased stagger between requests allows a marginal increase in supportable request bandwidth, but only for the shorter streams (i.e.
less than 3 minutes in length).

The network admission strategy is discussed in Chapter 5. The first aspect of network admission control is the development of an appropriate network bandwidth characterization. A simple smoothing technique which takes advantage of available client buffer space is able to reduce the overall bandwidth reservation required, as well as the variability in the bandwidth profile. This characterization is then integrated with the network admission algorithm. The network admission algorithm is able to accept requests that use up to 90% of the network interface bandwidth. Performance tests also show that simultaneous requests for streams of low variability can successfully use more network resources than high variability streams. A staggered arrival pattern, however, improves admission performance for the high variability streams more than for low variability streams. Network admission uses a larger time granularity than disk admission. A comparison of network slot sizes shows that relatively small network slots provide the best admission performance for the type of video streams being studied.

A survey and evaluation of related work in distributed continuous media servers is given in Chapter 6. The contributions of the dissertation are summarized in Chapter 7, along with a discussion of directions for future work.

Chapter 2

System Model

Continuous media file servers must operate in the context of a network environment that is capable of sustaining the high bit-rates associated with high-resolution video. At one side of the network are the server components, which contain presentation objects, while at the other side are the client workstations (or set-top boxes) which request delivery of presentation objects.
The first major contribution of this dissertation is the development of an end-to-end system model that incorporates all aspects of design, from the client application's interaction with the server (and human user interaction) to the server composition and connection. This model is then verified by the design and construction of a CMFS that conforms to the model. The purpose of this chapter is to define the scope of the system design and describe the guidelines behind the design of the system. This model specifically includes the application programmer's interface (API) available to client applications and the desired levels of abstraction provided for storage/retrieval of the media data. The details of the data delivery in the network itself, that is, between the server's network interface and the client's network interface, are outside the scope of this dissertation except where they influence the manner in which data is sent from the server.

2.1 Overall System Design Model

The highest level of system description is shown in Figure 2.1. The client first communicates with a database server (1) to obtain information about the presentation objects stored in the CMFS. It is not strictly necessary to have a separate database server, as there is some amount of database functionality in the CMFS itself. This is provided by an attribute facility for descriptions and annotations of presentation objects. Some attributes are necessary in order to retrieve the object, such as data rates, frame sizes, and disk locations, whereas others provide cataloguing and description functions only, such as date of creation, service provider (CBC, Global, Paramount, etc.), or genre (drama, news story, or music video). Each presentation object has a unique object identifier (hereafter called a UOI) that client applications require in order to access the object or any of its metadata. The UOI is unique over space and time.
Figure 2.1: Communication Model

When a client application wishes to initiate the presentation of an object, it sends a request to the server (2) to establish a real-time data connection between the server and the client (3). If the connection is successfully established, separate out-of-band requests (4) are given by the client for the server to perform virtual VCR functions. Interactions (2) and (4) use a reliable request-response protocol, whereas the delivery of the continuous media data (3) is unidirectional from the server to the client with no reliability guarantees. In continuous media applications, the timeliness of data retrieval is more important than precise fidelity [19], and methods of transmission requiring retransmission of lost or corrupt data introduce unacceptable worst-case latencies.

Storing continuous media has the same high-bandwidth requirements as retrieval, but correctness of the data must be preserved in this situation, so a reliable connection is established in the reverse direction from interaction (3). The storage of data in such a file server can be concurrent with retrieval, but it is unreasonable in most cases to have real-time guarantees, since even if the pattern of data traffic to be sent can be completely specified, a server cannot require the client application to push the continuous media data to the server in real-time. The data can be sent over a real-time connection as long as it is the client's responsibility to fill the channel with data. A modified version of TCP could be utilized that involves selective retransmission of lost or corrupted packets. Selective acknowledgments or negative acknowledgments from the server are necessary in such a protocol to let the client know when to release data that has been successfully stored at the server.
Selective acknowledgment is absolutely necessary for "live" video, as client resources must be reserved until the data is safely on disk at the server. For stored media at the client, it may be sufficient to simply re-read the blocks associated with the missing data. On a highly-reliable network, most of the data is transferred in real-time.

Most continuous media servers divide time into intervals called slots or rounds, during which, for each active stream, sufficient blocks of data are read off the disk and/or transmitted across the network to allow continuous playback at the client application. A reasonable length for such a slot is 500 milliseconds, as it provides a fine level of granularity while limiting the amount of overhead required for the operation of the server. The slot is the fundamental unit of time which drives the software design of the CMFS. It is a unit of time which can be configured differently for different servers, if desired. Section 3.3 analyzes some of the major implications on system resource usage that are influenced by the choice of slot size.

Continuous media clients can tolerate some loss of data and still provide an acceptable presentation to the user. In some cases, this loss is caused by corruption of the network packets themselves, while in other cases, overflows at the network interfaces of either the client or the server, or at intermediate points in the network between the server and the client, result in dropped packets. The design of the CMFS must include appropriate methods to deal with disk and network overload. There are a number of possible approaches, ranging from ignoring potential overload to providing deterministic guarantees that the server will always be capable of reading the required data from the disk system and will never overflow the network.
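Because the bandwidth requirement of each object is known in advance, the slot abstraction makes potential overload computable before delivery begins. The following sketch illustrates the idea; the helper names, the fixed block size, and the 500 ms slot length are assumptions for illustration, not the CMFS's actual bookkeeping.

```python
def slot_demand(frame_sizes, fps, block_size, slot_ms=500):
    """Group a VBR stream's per-frame sizes (bytes) into slots and
    return the number of whole disk blocks required in each slot."""
    frames_per_slot = int(fps * slot_ms / 1000)
    demands = []
    for i in range(0, len(frame_sizes), frames_per_slot):
        slot_bytes = sum(frame_sizes[i:i + frames_per_slot])
        demands.append(-(-slot_bytes // block_size))  # ceiling division
    return demands

def overflow_slots(demands, capacity_blocks):
    """Indices of slots whose demand exceeds the capacity guarantee."""
    return [i for i, d in enumerate(demands) if d > capacity_blocks]

# One second of 30 fps video whose second half is three times larger:
# the second slot demands more blocks than the first.
demands = slot_demand([1000] * 15 + [3000] * 15, fps=30, block_size=4096)
busy = overflow_slots(demands, capacity_blocks=8)
```

A server could use such a per-slot profile either to reject the request outright or, as discussed next, to tolerate short predicted overloads by dropping data.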
Enforcing the actual delivery of the data at the client is beyond the scope of server design because it involves network components over which the server has no control. The server can determine at what points in time the requested bandwidth exceeds the capacity guarantee, since the bandwidth requirements of each object are known when the user requests real-time delivery. If the predicted overloads are short in duration, a reasonable server policy could be to accept the series of requests and not send all the data during overflow conditions. The loss would appear indistinguishable from a loss caused by network problems. The overload may be in the disk system or the network system, or both.

Potential problems in the disk system could be ignored if it is expected that the disk is likely to achieve bandwidth in excess of the guarantee. In this case, the disk can read early and store data in server buffers, thus reducing and perhaps eliminating the amount of overload time. This is a vague prediction, since neither the number of buffers needed nor the future bandwidth can be accurately estimated. If the problem is in the network, then both the disk and the network systems must adapt. The server does not send all the data during overflow rounds. The disk does not need to read data which cannot be sent, so the reading schedule should also be altered when overflow does occur.

It is a complicated process for the server to determine which blocks of data should not be retrieved and/or not delivered so as to cause as little disruption as possible, but progress is being made in some research efforts [82, 87]. In particular, knowledge of the encoding format could allow a server to neglect inter-coded frames (i.e. B and P frames for an MPEG encoded object) while maintaining priority on intra-coded frames (I frames).
Unfortunately, if this knowledge is required by the server to deliver only some of the data, it limits the flexibility of the server in its ability to be useful for many heterogeneous encoding formats simultaneously. It is possible to achieve this adaptability by storing the different types of frames as separate streams and combining them at the client. When loss occurs due to network overload, the client could simply drop the request for the least important of the streams (i.e. the B-frames). This unpredictability and dependence on encoding format can be avoided by a deterministic guarantee that all the data will be sent from the server such that it will arrive at the client by the time it is due. Bandwidth reservation prevents overload, but also results in conservative resource utilization, since it is based on worst-case estimates. This is the approach taken in this dissertation. The server incorporates read-ahead and send-ahead policies to increase the effective resource utilization without causing the system to become overloaded.

2.2 Design Objectives

In order to achieve the design goals of scalability, heterogeneity, and an abstract performance model of hardware resources in a natural and efficient manner, a distributed system model was chosen. Implementing the logical components of the server in a manner that allows them to be executed on separate hardware platforms enables scalability and enhances the opportunities to support heterogeneity efficiently. The disk devices and network interfaces have bandwidth limitations that prevent centralized servers with a single network interface from supporting more than a few dozen high bandwidth streams. It should be possible to store different media types or encoding formats on different (possibly heterogeneous) server components, potentially in different geographic locations.
The user interaction model to support synchronization of multiple streams is complicated somewhat by having a distributed model, but gains added flexibility. A distributed design allows network latencies between servers and clients to be reduced by strategic placement of the server components within the network. For instance, it is possible to consider a server for audio data in French located nearer the French-speaking population, with the Chinese audio server in another location. The video servers could be located in a central location to serve users of all languages. The specific design decisions made to achieve each of the objectives are described in the remainder of this section.

Heterogeneity. Several types of heterogeneity influence the system design. The first aspect of heterogeneity is the data encoding format. Compression is necessary both to reduce the data rate required for real-time video data transmission and to enable the storage of reasonably long streams (1 hour in duration) on conventional disks. Several standards have been proposed (MPEG [65, 38], MJPEG [1], H.261 [92], and H.263 [39], to name a few). It is likely that the continuing work on data compression will lead to the development of additional standards with better compression speeds and higher quality (i.e. less lossy) results. To accommodate data format heterogeneity, the CMFS ignores any details of data encoding with respect to data placement, retrieval, or transmission.

To achieve this independence, the CMFS uses two fundamental units of storage: the presentation unit and the sequence. A presentation unit could be a video frame, a second's worth of audio samples, a PostScript version of a slide, closed-captioned text, or any other time-sensitive media. Presentation units are grouped
The number of presentation units per se-quence is defined by the client application that stores the data to the C M F S and can vary within a stream according to the anticipated needs of client applications that present data in real-time. This implies a co-operation between display clients and storage clients. . A l l encoding and decoding is performed at the client side, allowing the server to be used by many different types of clients. It is important to note that each mono-media object may be encoded and stored independently, or where the encoding permits, as a.systerri stream. System streams have audio, video, and text as a single object, which simplifies admission control at the server as well as display at the client. The capability of separating the media types allows the server to retrieve each stream independently and for the client to combine them in various ways. In particular, various audio objects could be retrieved with the same video object where the audio is in different languages or from different providers (as in different news services from different networks). Some video encoding algorithms are capable of producing objects which can be decoded at differing resolutions [18, 22, 25]. The resulting encoded object consists of a "base stream" and one or more "enhancement streams". The C M F S treats each resolution as an independent presentation object and does not influence the manner in which client applications can retrieve them or store them. A client wi th lower network bandwidth and decoding capabilities could provide' a lower-quality display by requesting fewer enhancement streams. In order to provide constant quality video and audio, the compression tech-niques produce media streams which have variable bit-rates. The resource require-ments of the streams exhibit both long-term and short-term variabil i ty [37, 96] and this variabil i ty adds to the heterogeneity of the system. 
Many components of the CMFS give explicit consideration to the VBR nature of the data. It has been noted that VBR streams tend to exhibit the burstiness characteristics of regular file traffic [54]. This burstiness is precisely predictable, given knowledge of the size of each presentation unit in every stream.

The method of disk block allocation to stream data affects the heterogeneity that can be accommodated in the system. Previous work has focussed on careful disk layout [73], often at the expense of heterogeneity in data encoding format. When VBR data streams in varying encoding formats are placed on the same server, any attempt at using a layout policy optimized for a single data format has questionable validity. This becomes a particular problem for data encoded in MPEG format. If a disk layout method optimized the allocation based on an MPEG group of pictures (GOP) as a unit of storage, this may provide poor performance for video objects encoded in MJPEG, which has no concept of GOPs. For example, video streams could be stored on a system that is configured as a redundant array of inexpensive disks (RAID). If 9 frames comprise a GOP in a 30 frame per second video, one choice could be to allocate blocks based on 9 frames, with consecutive GOPs on different stripes of a RAID system. Thus, the stripe size would correspond to 9 frames. An MJPEG video of the same average bit-rate would be constrained to use the same stripe size. A server may choose to implement fast motion based on retrieving alternating GOPs. While this makes sense for the MPEG objects, there is no logical reason for grouping MJPEG video in 9-frame units. The allocation design for MPEG unnecessarily restricts the way in which MJPEG video can be accessed. Thus, the CMFS makes no layout decisions based on sequential or striped access for a single stream or a group of streams.
Some systems use striping to achieve higher disk bandwidth [15, 86, 56, 70]. Such a system must choose a stripe size appropriate for the common disk access patterns. If the streams have similar average data rates, the stripe size can be chosen so that the average number of seeks per stream per slot is very close to 1. In this case, the greatest amount of speedup is achieved. Even with variable bit-rate data, a system with sufficient buffering can provide good performance by reading in a striped fashion. If the average data rates vary widely, the desired stripe size for the high bandwidth streams will conflict with the desired stripe size for the low-bandwidth streams.

Scalability. Disk bandwidth performance limits the number of streams that can be simultaneously retrieved from a system with a single disk [76, 87, 89]. Consider moderate quality video streams with an average bit-rate of between 3 and 5 Mbps, and a disk device capable of transferring data at 40 Mbps. No more than a dozen moderate quality video streams can be delivered simultaneously from a single disk. Depending on the intended user community, a CMFS should be capable of supporting hundreds or even thousands of simultaneous users [55, 89]. This requires a server with tens to hundreds of disks. For distance education or corporate training programs, smaller systems may be viable because the variety of content provided and the user populations are constrained. A CMFS intended to provide News-On-Demand or Video-On-Demand services to a large user community must be capable of significantly higher bandwidth. The simulations in [20] and [89] are but two of the studies which show the relative performance of differently configured systems with hundreds of disks and thousands of users, but give no support for their claim that such systems can be built. CMFS components are independent and can be executed on separate CPU and disk platforms to provide scalability.
No particular hardware resource can be an ultimate performance bottleneck in a point-to-point network environment. As long as high-speed real-time connections can be established between the server and clients, the size of the system can be incrementally expanded. A shared network topology cannot provide this scalability, since the physical medium has a maximum bandwidth which is divided among the connected components according to their data transfer patterns. A small amount of administrative state is necessary to co-ordinate the various server components. Thus, more users can be supported by adding server nodes until the cumulative disk bandwidth exceeds the network's capacity to route continuous media traffic to client destinations. The storage capacity for administrative state is proportional to the total number of objects stored in the CMFS. Metadata storage is also required, which is proportional to the total display time of the objects, but this is only a small portion of the total data storage needs.

Since the bandwidth limitations of a single disk restrict the number of simultaneous retrievals to a small number (i.e. less than 10 for moderate bit-rate video streams), a method of increasing the disk bandwidth from the server is necessary, especially for popular video objects. There are at least two ways of providing this increase in bandwidth: striping and replication. The method chosen in the CMFS for bandwidth enhancement for individual objects is replication, because it is completely scalable. Multiple copies of objects can be stored on a single server, or on different servers that are configured to share information. Thus, each instance of an object is treated as a separate presentation object for the purposes of bandwidth allocation. It serves no purpose to instantiate more than one copy of an object on the same disk, so replication is done on different disks, thus providing a minimal level of fault-tolerance as well.
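The replication policy just described can be sketched as a small registry that enforces one copy per disk and treats each copy as a separately schedulable instance. The class and its selection rule (least-loaded disk) are hypothetical illustrations, not the CMFS's actual data structures or copy-selection policy:

```python
class ReplicaRegistry:
    """Track replicas of an object across disks: at most one copy per
    disk, and route each new request to the least-loaded copy."""

    def __init__(self):
        self.copies = {}  # object identifier -> set of disk ids
        self.load = {}    # disk id -> number of active streams

    def add_copy(self, obj, disk):
        disks = self.copies.setdefault(obj, set())
        if disk in disks:
            # A second copy on the same disk adds no bandwidth.
            raise ValueError("object already replicated on this disk")
        disks.add(disk)
        self.load.setdefault(disk, 0)

    def select(self, obj):
        # Each instance is a separate presentation object for bandwidth
        # allocation, so pick the copy on the disk with fewest streams.
        disk = min(self.copies[obj], key=lambda d: self.load[d])
        self.load[disk] += 1
        return disk
```

With two copies on different disks, two concurrent requests for the same object are served from different disks, doubling the available bandwidth for that object.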
Although fault-tolerance is not a major goal of this dissertation, the extra reliability is an added benefit of a system with replication. When a client requests an object, the copy that is most appropriate is selected. Striping is used in several other systems [9, 40, 70], but in these cases, allowing additional users to access an individual stream involves a complicated scheduling process whereby delivery of a new stream request is delayed until it can fit in with the current retrieval pattern.

Since multiple copies of an object may exist, a method of distinguishing between the copies is necessary, while at the same time maintaining a record that the copies contain the same data. This is done by providing two types of unique identifiers for each object: a high-level UOI (HLUOI) and a low-level UOI (LLUOI). An HLUOI identifies the content of a presentation object (i.e. CBC News 01/01/1998, or Star Wars MPEG-2) and is independent of the object's location. An LLUOI refers to an instance of an object and identifies the location of the raw data for the object. Within an individual server, it may be advantageous to perform replication for the popular objects, as well as migration to perform load balancing activities. Migration facilities are discussed further in Chapter 3.

When replication is used in a system, a method of choosing a particular copy is needed. If clients keep track of the individual copies, they may directly request the LLUOI of an object. Alternatively, this selection can be accomplished by a directory service, which can be implemented in several manners, including manual transcription. A directory service is also required when multiple self-contained servers are located in the same network environment and have been configured to be able to share their data. One implementation of a directory service is a computer system known as a Location Server [46].
A Location Server contains metadata about presentation objects. It operates by providing the appropriate mapping between an object's HLUOI and the associated LLUOIs. The Location Server can choose a copy for the client, or provide a list of copies and locations that allows a client to choose which copy to retrieve, based on factors that may include geographic distance and current load at the servers. The details of the location service functionality which has been integrated into the CMFS are described in Chapter 3 as they relate to the design of the CMFS. The full details are given in Kraemer [46] and are outside the scope of this dissertation.

Abstract Disk/Network Performance Model. In the CMFS, the only significant parameter in the disk subsystem is the number of I/O operations guaranteed per slot (hereafter referred to as minRead). This "number of reads per slot" value is constant for each disk configuration, and is determined by running a calibration program on a logical disk device (depending on the implementation environment, this could be a UNIX file, a raw disk, or a set of RAID disks) to determine the largest number of blocks that can be guaranteed to be read off the disk, given the worst possible selection of disk block locations from which to read. This is done by uniformly spacing the blocks across the disk, thus maximizing the seek times (assuming a SCAN algorithm [76]). This value most accurately reflects the actual capacity of the server, since it includes all transfer delays (through the I/O bus to memory) as well as server software overhead.

With respect to network transmission, a fixed maximum bandwidth exists. The number of blocks that can be transmitted during a slot is defined to be maxXmit, because it represents the maximum amount of data that can be sent out from the server across its interfaces in a specified amount of time. The value of maxXmit
The value of maxXmit can be calculated in the same manner as the value of minRead: by running a calibra-tion test and observing the guaranteed number of packets that can be transmitted. Th i s value depends on the network connection bandwidth as well as the packet size chosen. A s far as the server-is concerned, this is an upper bound. N o more data can be sent in any slot, regardless of location of other factors. Synchronization Support. The C M F S is designed to accommodate flexible user access. One aspect of this is availability of the server. For many applications, it is undesirable to have the server operate in two exclusive modes: playback (retrieve) and record (store). A potential application for the C M F S is News-On-Demand, which requires frequent updating of relevant stories. It must therefore be possible to store information concurrently with playback. The C M F S provides primitive operations to store objects dur ing the normal operation of the server. The C M F S is capable of storing mono-media streams independently to fa-cilitate the most flexible method of client access. The simplest mechanism for the client is to store all media for a presentation as one object. The client application would then need, only to send appropriate data to the peripheral devices as it is received without the need for synchronization mechanisms. In many cases it is desirable to decouple the audio from the video. The interface provided allows clients, to synchronize the multiple objects in a way that does not require server knowledge of that relationship. It also eliminates correlated 26 data loss resulting from combined audio/video streams. Another benefit of having separate streams at the server for synchronization at the client is the support of scalable video, which enhances the support for heterogeneity. A n example which highlights the synchronization support provided by the A P I is the viewing of lectures in a distance learning environment. 
Several independent continuous media objects could be created from this type of event. They include: video of the speaker, text of the presentation slides, postscript versions of the slides, audio of the speaker, multiple language translations of the audio, translations of the text, video of the audience, and even audio of the audience during question periods. These can all be obtained independently and stored in the CMFS with appropriate timing information for presentation to the user. A client application may combine these objects to view the lecture with two video windows (each of which may have more than one substream associated with the video), a text window, and an audio device (with the ability to switch between audio streams at any moment). Either of the video windows could be in "full-screen" resolution with the other in a "thumbnail" version. The design of the CMFS allows for such a complex viewing scenario to be supported in a straightforward manner. In Wong et al. [90], such a scenario is presented which utilizes the CMFS as one component of the technology in that system. A detailed example of client synchronization is given in Section 3.7.

2.3 Admission Control

To ensure that the server has the necessary resources to deliver a requested presentation object, it performs admission control. This process examines the bandwidth required by the object and compares it to the system's remaining capacity. The disk admission control algorithm uses minRead as its only system-dependent parameter. It is therefore independent of the mechanism used to lay out blocks on the disk or any other disk management technique such as striping. The approach does not preclude using disk layout algorithms or striping as an independent method of increasing the bandwidth of the disk. A highly optimized disk management system that uses such techniques may have a higher value for minRead.
In that case a higher level of service can be guaranteed, making it possible to accept and schedule more simultaneous clients. The admission scheme itself, however, is not affected by the details of these optimizations.

The disk admission control algorithm addresses the allocation of the disk resource among client requests. At least two other resources have the potential to be in short supply at the server: processor cycles and network bandwidth. CPU scheduling for real-time applications, including continuous media, has been explored by many other researchers ([93] as one example), and their results are generally applicable to either dedicated continuous media servers or general workstation operating system environments. Since the rate at which processor speeds are increasing is faster than the rate at which disk access times are decreasing, a system can easily be configured so that it is not CPU bound. The major tasks of the CMFS that are performed at the server node are the admission control calculations and the protocol processing for the network packets. The performance analysis of Chapters 4 and 5 shows under which specific circumstances the CPU has extensive work to do. In general, for medium-length streams, the CPU requirements for admission control are moderate if requests for delivery are relatively infrequent. With respect to protocol processing, it is most often the network card that restricts the processing, but the CPU cycles required are proportional to the number of packets and total bandwidth transmitted.

With regard to the network bandwidth, an admission control algorithm is also required to ensure that the accepted connections do not over-utilize the capacity of the network.
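The basic shape of slot-based admission can be sketched in a few lines. This is a deliberately minimal illustration under the simplifying assumption that no read-ahead is exploited (the CMFS algorithms evaluated in later chapters are more sophisticated); the names committed, new_schedule, and min_read are mine, not the server's:

```python
def admit(committed, new_schedule, min_read):
    """Slot-based disk admission sketch: accept the new stream only if, in
    every future slot, the blocks already committed to accepted streams plus
    the new stream's blocks stay within the guaranteed reads per slot
    (minRead). committed must cover at least len(new_schedule) slots."""
    for t, need in enumerate(new_schedule):
        if committed[t] + need > min_read:
            return False          # reject without touching the schedule
    for t, need in enumerate(new_schedule):
        committed[t] += need      # commit only after the whole check passes
    return True
```

A rejected request leaves the committed schedule untouched, so guarantees already issued to admitted streams are never disturbed.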
The situation is slightly different from the disk bandwidth situation in that the receiving end of the data transmission is beyond the control of the server, and so the server can only ensure that it limits the amount of data sent from the network interface. It has no direct influence on the rate at which the client can receive the data.

Server memory is also a limited resource which may be the cause of a performance bottleneck. The requirement for server memory in a particular scenario of streams is directly related to the disk and network bandwidth. If the disk system is permitted to read ahead, server buffers can be used to store the data that is read earlier than required. Any admission control algorithm which incorporates the ability to read data early, thus reducing the bandwidth needed to service the remaining streams, can take advantage of extra server memory for disk buffers. This is because more streams can be accepted, as the total future disk requirements at admission time are less than they would be in a system where no read-ahead was achieved. This can be further enhanced if client buffer space and network bandwidth permit data to be sent early as well. The client buffer space is not a resource that is directly under the server's control, so it is not given a great deal of consideration. It will be shown later that the server buffer space is indirectly managed and accounted for by the disk admission control algorithm used in the CMFS.

When a client stores an object, a presentation unit vector (or playout vector) is created which contains the number of bytes for each presentation unit of the stream. Because the stream may be VBR-encoded, the values in the vector may vary considerably. This playout vector is used to provide a bandwidth characterization of the stream, either in fine detail or in summary form.
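One such summary form can be sketched directly: converting a per-frame playout vector into the number of disk blocks needed in each slot. This is an illustrative reconstruction (the function and parameter names are mine, not the CMFS API), using 500 ms slots and 64 KByte blocks as in the experimental configuration described later in this chapter; working from cumulative byte counts keeps per-slot rounding from overstating the total.

```python
import math

def slot_schedule(playout_bytes, fps, slot_ms=500, block=64 * 1024):
    """Convert a per-frame playout vector (bytes per presentation unit)
    into the number of blocks that must be read in each slot."""
    frames_per_slot = int(fps * slot_ms / 1000)
    sched, cum, prev_blocks = [], 0, 0
    for i in range(0, len(playout_bytes), frames_per_slot):
        cum += sum(playout_bytes[i:i + frames_per_slot])
        need = math.ceil(cum / block)      # blocks needed so far, cumulative
        sched.append(need - prev_blocks)   # blocks newly needed this slot
        prev_blocks = need
    return sched
```

For a 30 fps stream, 15 frames fall in each slot; a run of small frames followed by a run of large ones yields the uneven per-slot schedule that the admission control algorithm must accommodate.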
The bandwidth required does not necessarily match the playout vector, since different data is transferred if an alternate speed of delivery is requested or sequences are to be skipped. The exact bandwidth schedule (known as the stream schedule) is calculated on demand when the request for playback is received.

2.4 System Architecture

The design of the file server is based on an administrator node and a set of server nodes, each with a processor and disk storage on multiple local I/O buses. Each node is connected to a high-performance network (we currently use both ATM and Fast Ethernet) for delivering continuous media data to the client systems (see Figure 2.2).

Figure 2.2: Organization of System

Disk drives are attached to the I/O buses in a configuration that will provide bandwidth sufficient to fully utilize the network interface. Multiple server nodes can be configured together to increase the capacity of the server. Since the operation of each server node is independent of the other server nodes, any number of nodes can be added dynamically (without requiring the other nodes to be reset), subject only to the capacity of the network switch configuration. Thus, a server can be made up of different types of server nodes, ranging from powerful computers with many large disks, possibly in RAID configurations, to smaller computers with 2 or 3 disks and somewhat slower network interfaces. Configurations can even consist of server nodes which are simple processor cards interconnected via an I/O bus such as PCI. In the latter case, the nodes can communicate over the I/O bus rather than a dedicated network.
This network does not need to be high-speed, since the traffic between the administrator node and the server nodes is control information, which is comparatively small, so an Ethernet is sufficient. Depending on the physical configuration of the server components, control information could be transmitted on the same network as is used for the continuous media data itself. A similar architecture is used in other scalable, high-performance video servers [6, 47].

When a client wishes to retrieve a presentation object, the initial client open request goes from the client to the administrator node. This node determines which of the server nodes has the requested object and forwards the request to the appropriate server node. The node obtains the playout vector and other attributes necessary for presentation from the administrator before accepting the open request. Communication from then on takes place directly between a particular server node and the client. The server restricts each object to be totally contained on a single server node, although it may span several disks. If a very large object exceeds the capacity of a single server node, then it is up to the client application which stores the object to define an appropriate division into multiple objects. From the server's point of view, there is no relationship between objects which have been split in this fashion.

2.5 Data Delivery

Continuous media streams must be delivered before real-time deadlines to preserve the continuity of presentation to the user. Data delivery can be controlled by the client requesting data packets (pull model), or by the server sending packets to the client as it has resources (push model). In the push model, if the transport layer is incapable of receiving data at the rate it is being sent, a flow control mechanism (such as in TCP/IP) is often implemented to prevent the sender from flooding the receiver.
In a continuous media server using the simplest push model, the server transmits bits at a constant negotiated rate and trusts the receiver to decode and present them to the user at the appropriate time. This is unacceptable for VBR streams because the client's requirements vary over time and the server would be unaware of when the client had resources (buffers) available to accept the data. If the server sent at the maximum bit rate allowed, then the client buffer utilization would grow over time until all the data had been sent. This is because when the server sends at the maximum transmission rate, for every k bits sent by the server, at most k bits (almost always fewer) are freed by displaying. Alternatively, sending data at the average bit rate could result in starvation of a stream whenever the cumulative bit rate necessary for display is above the average. A larger transfer rate than the precise average is necessary to avoid starvation. This rate can be easily computed. If this larger rate is used, the problem of buffer build-up occurs again, because for the portion of the stream following the time when the peak rate is required, buffers are released by the client at a slower rate than they are received across the network. Some of these issues are discussed in greater detail by Feng [28].

A receiver-based flow control model is equally undesirable, since the round trip delay in sending the request for more data may result in an underflow at the client. If the client correctly anticipates its needs for data and has sufficient buffering capabilities, it could request data early, avoiding this problem. This requires the server to be ahead in both reading and sending in order to be able to respond to the client's requests. As well, the request traffic in the reverse direction could be significant if sufficiently detailed granularity is desired.
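The trade-off can be made concrete with a toy, slot-granularity simulation (my own illustration, not CMFS code): a sender that knows the client's per-slot consumption schedule and buffer size can greedily stay as far ahead as the buffer and a per-slot transmission bound allow, and a sending plan is infeasible exactly when the client buffer would underflow during a peak.

```python
def plan_send(consume, client_buf, max_xmit):
    """Schedule-aware pacing sketch: each slot, send as much as fits in the
    client's remaining buffer space, capped by the per-slot transmission
    bound. Returns per-slot send amounts (in blocks), or None if the
    consumption schedule cannot be met under these bounds."""
    sends, buf = [], 0
    for c in consume:
        s = min(max_xmit, client_buf - buf)  # greedy: keep client buffer full
        buf += s - c                         # buffered data after this slot
        if buf < 0:
            return None                      # client would starve here
        sends.append(s)
    return sends
```

With the consumption schedule [2, 2, 6, 6] blocks per slot (average 4), a per-slot bound of 4 and an 8-block client buffer succeed by building up data ahead of the peak, while a bound of 3 — close to the constant average rate — starves the client at the peak, illustrating why a fixed rate is insufficient for VBR delivery.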
The design of the CMFS utilizes an alternative method that provides flow control in the sense that the server never sends data faster than the client can handle it, but does not require explicit client requests. The server has knowledge of the exact presentation requirements from the playout vector, stored at the administrator node. Thus, it can send data at precisely the rate needed every second. This information, plus knowledge of the client buffering capabilities and the rate at which the client can handle incoming packets, permits the server to send data in order to keep client buffers from starvation or overflow. Details of this flow control mechanism are provided in Section 3.4.

2.6 Design Limitations

In addition to the previous categories, design decision trade-offs were identified in the design of the CMFS. The decisions made in these particular cases are described in the remainder of this section.

• The time between the client request for delivery and the return of control to the client for the initiation of presentation has an upper limit. This is the longest amount of time needed to read and transmit the first slot of data, plus round-trip message latencies. Other servers delay requests by a variable amount of time, ranging from several seconds to minutes, in order to optimize server resource usage. One of these resources is disk bandwidth, which may be saved by pre-fetching data at a slower rate than required and beginning to send when a sufficient buffer has been retrieved so that constant rate retrieval and transmission will satisfy the client's requirements [37, 60]. As well, requests may be batched so that multiple requests for the same object that arrive within a small window of time are treated as one object from the disk retrieval point of view. It is further advantageous to wait in the hope that more disk bandwidth will be available in the near future. Bounding the response time, however, has advantages of its own.
For one, limiting the response time for the delivery request allows playback to begin almost instantaneously. As well, precise knowledge of the time at which data will arrive permits a client to request streams from different server nodes, or different servers, which possibly have differing delay latencies, and present them to the user in synchrony. If the response from one of the streams can be delayed an unknown amount of time, this mode of interaction becomes impossible.

• All data for each stream connection is treated independently, and multiple users requesting the same stream concurrently could create a situation in which multiple buffers containing the same data block may exist at the server. Other work on optimizing disk access [43] is tightly integrated into the server design.

2.7 Stream Characterization

A video-on-demand service can provide audio and video streams of many differing characteristics, such as average bit-rates, stream playback durations, and peak rates of data transmission. Typical environments for such video servers can range from movie-on-demand, where the average length of a stream is 100 minutes, to news-on-demand, where many stories may be quite short in duration (i.e. less than one minute), and nearly all will be shorter than 10 or 15 minutes. Between these extremes may be environments such as video juke-boxes, with typical stream playback durations of between 2 and 5 minutes, distance learning servers with lessons or seminars that could range from several minutes to an hour in length, or live transmissions of indefinite duration. The quality of encoding can also differ greatly, depending on the intended viewing environment. Full-motion, high-resolution video would almost certainly be required for movies, while viewers of news may accept half-motion or lower resolution, or both.
While it is possible for a general server to handle all of these types of streams at once, the admission algorithms and resource usage will be more efficient if tailored to a particular type of viewing environment. The types of streams that have been chosen as the test bed for the admission algorithms developed for the CMFS are full-motion, medium-to-high quality VBR video streams. Most audio streams are constant-bit-rate if encoded with standard pulse-code-modulation (PCM) encoding techniques. Voice quality audio objects are also reasonably low bit-rate. Audio can also be encoded at a variable bit rate, at varying levels of quality, but does not typically produce bit rates of similar size.

A large number of reasonably short video streams were digitized and compressed using a Parallax MJPEG encoder/decoder card attached to a Sun Sparc 10. The bandwidth requirements of the streams are summarized in Table 2.1. This card is capable of capturing VHS video at 30 frames per second at a resolution of 640x480. After compression, the average bit rates of the streams ranged from 1.9 Mbps to 7.28 Mbps. Some streams were in black and white, but most were in colour, and the frame rates for the video were either 20, 24, or 30 frames per second. Multiple audio formats were available, but their low bit-rate and the fact that most encoders produce constant bit-rate streams make them uninteresting from a performance point of view. Some streams were also encoded using an Ultimotion MJPEG encoder card installed in an IBM RS/6000 running the AIX operating system. The version of the card available was not capable of encoding at full resolution at 30 frames per second, so it was used sparingly.
Stream  Frames  FPS  Time (s)  Bandwidth Required (Blocks/Slot): Min  Max  Ave  Stdev  Cov
Ads - MM  3596  30  119.867  2  9  3.82  1.01  0.26
Aerosmith - MM  11682  30  389.4  1  6  3.6  0.851  0.24
Akira  3822  30  127.4  2  8  4.745  1.24  0.26
Annie Hall  4503  30  150.1  3  6  3.711  0.84  0.225
Aretha Franklin - BB  12535  30  417.833  2  7  3.71  0.68  0.184
The Arrow - CBC  4446  30  148.2  2  7  3.31  0.933  0.28
Baseball - SI  2570  30  85.667  1  7  3.65  1.1  0.3
Basketball - SI  4072  30  135.733  3  13  6.69  2.23  0.354
Beach Boys - MM  4039  30  134.633  2  6  3.93  0.66  0.17
Beatles - MM  6448  30  214.933  1  5  3.19  0.7  0.22
Bengals - SI  10498  30  349.933  2  7  4.62  0.98  0.212
Boxing - SI  10775  30  359.167  2  11  5.17  1.3  0.25
Buddy Holly - MM  5887  30  196.233  2  5  3.92  0.68  0.18
The Cars - MM  6473  30  215.767  2  7  3.83  1.12  0.29
Cartoon Trailers  1791  20  89.55  2  9  5.26  1.59  0.302
Cat In The Hat  1040  20  52  1  6  4.664  1.067  0.223
Clinton - CBC  3710  30  123.67  2  6  4.4  0.829  0.186
Country Music - CBC  4718  30  157.267  3  9  3.75  0.84  0.224
Chases - SW  9924  30  330.8  1  9  4.14  1.03  0.249
Coaches - SI  11023  30  367.43  2  8  4.95  1  0.202
Criswell - Plan-9  1517  20  75.85  1  3  1.71  0.49  0.287
Dallas Cowboys - SI  3805  30  126.83  2  11  5.62  1.9  0.34
Due South  16824  30  560.8  1  4  2.63  0.53  0.2
Eric Clapton  6246  30  208.2  3  6  4.43  0.59  0.13
Evacuation - Empire  13888  30  462.93  1  7  3.33  0.77  0.23
FBI Men - Raiders  5201  30  173.36  2  6  4.45  0.896  0.201
Fender Anniversary - MM  7637  30  254.57  2  5  2.94  0.6  0.2
Fires - CBC  4663  30  155.433  2  8  4.34  0.973  0.224
Fleetwood Mac - MM  6447  30  214.9  2  7  3.19  0.74  0.23
Forever Rivals - CBC  12177  30  405.9  2  8  3.67  1.18  0.32
George of the Jungle  1192  20  59.6  2  9  5.95  1.32  0.222
Island of Whales  2798  20  139.9  1  5  2.86  0.776  0.272
Joe Greene - SI  8804  30  293.47  3  8  5.09  0.94  0.185
John Elway - SI  3117  30  103.9  3  10  5.93  1.77  0.299
The Kinks - MM  3994  30  133.13  2  5  2.9  0.541  0.19
Christian Laettner - SI  9973  30  332.43  2  11  5.944  1.86  0.312
John Major - CBC  4306  30  143.53  2  6  3.66  0.821  0.224
Maproom - Raiders  10843  30  361.43  1  9  4.24  1.46  0.34
Minnesota Twins - SI  5476  30  182.53  3  14  6.03  2.11  0.35
Moody Blues  6565  30  218.83  2  5  2.96  0.575  0.19
Moriarty - Star Trek  10906  30  363.533  2  5  2.73  0.578  0.212
Montreal Canadiens - SI  3490  30  116.33  3  9  5.86  1.36  0.23
Mr. White - RD  2086  20  104.3  1  3  2.163  0.51  0.237
NFL Deception - SI  17245  30  574.83  2  8  4.62  1.08  0.235
NFL Football - SI  13332  30  444.4  2  11  5.6  1.64  0.29
1998 Olympics  9123  30  304.1  2  8  3.59  0.79  0.221
Pink Floyd - MM  5922  30  197.4  1  7  3.5  1.06  0.3
Plan-9  3186  20  159.3  2  4  2.285  0.459  0.20
Raiders - Raiders  12048  20  602.4  1  6  2.9  0.75  0.258
Ray Charles - BB  8491  30  283.033  5  9  7.28  0.86  0.119
Bloopers 93 - SI  10810  30  360.33  2  9  4.52  1.29  0.287
Intro - SI  10560  30  352  2  9  4.74  1.35  0.285
Spinal Tap - MM  10199  30  339.967  2  9  5.41  1.3  0.241
Sports Page Hilites  4520  30  150.667  2  8  4.42  1.16  0.26
Star Trek - Voyager  3129  30  104.3  1  4  2.5  0.621  0.25
Death Star - SW  18455  30  615.167  2  9  4.17  0.93  0.22
Princess Leia - SW  18774  30  625.8  1  6  3.55  0.736  0.207
Rescue - SW  9299  30  309.967  2  6  3.38  0.71  0.209
Snowstorm - Empire  7037  30  234.57  1  9  3.38  1.46  0.43
Summit Series 1972  14450  30  481.667  2  7  3.81  1.175  0.308
Super Chicken  599  20  29.95  2  6  3.86  0.99  0.258
Tenchi Muyo  2853  30  95.1  3  11  5.89  1.29  0.22
Tom Connors - MM  4097  30  136.567  3  7  4.63  0.715  0.154
Toronto Blue Jays - SI  4245  30  141.5  2  7  3.92  1.3  0.33
X-Files  14798  30  493.267  1  5  2.39  0.63  0.26
Yes Concert (24 fps) - MM  10253  24  427.21  1  5  2.54  0.701  0.275
Yes Concert (30 fps) - MM  13077  30  435.9  2  8  3.75  0.944  0.251

Table 2.1: Stream Characteristics

Variability in the bit-rate of the compressed video signal comes from differences in motion within a scene or complexity differences between images. Since MJPEG encoding produces only intra-coded frames, no motion detection is performed. The MPEG compression standard incorporates motion detection, and can thus achieve higher compression in scenes with very little motion (i.e. newscaster footage). The variability in the data rate of Motion-JPEG video objects comes from the difference in the complexity of each frame, and by extrapolation, each scene. To achieve streams with substantial variability, clips were chosen that contained alternating scenes of complex action and primarily solid-colour low motion. Even though the server is designed to accommodate multiple heterogeneous formats, hardware limitations restricted the choice of formats readily available for performance testing. Only the two MJPEG hardware encoders were available. A software encoder for MPEG was available, but its performance on the existing systems made it unsuitable for encoding reasonable-length video streams. Short clips of Quicktime, MPEG-1, and MPEG-2 video are available on the Internet, but they were not used in the experiments because either the bit-rate was too low, the playback duration was too short, or the streams lacked audio.
All of these formats have been used successfully in the CMFS for demonstration purposes, validating the heterogeneity of the system.

The video streams were chosen to be representative of a News-On-Demand environment. Thus, many clips have alternating scenes of narration, followed by news footage with higher complexity. Other clips are short scenes from movies, or music videos that exhibit similar scene changes, although some of them are not as dramatic. The most significant variability is found in the sports highlight clips and movie scenes.

The bandwidth requirements in Table 2.1 are stated in blocks per slot. In the version of the server used for the performance experiments of this dissertation, blocks are 64 KBytes in size and a slot is 500 msec in length. Thus, 1 block/slot is equivalent to 1 Megabit per second (1 Megabit = 1048576 bits), which is just slightly more than 1 Mbps.

Various abbreviations are used in identifying the source of the streams. These are listed in Table 2.2.

Abbreviation  Source
BB            Blues Brothers
CBC           Canadian Broadcasting Corporation
MM            MuchMusic
RD            Reservoir Dogs
Raiders       Raiders of the Lost Ark
SI            Sports Illustrated
SW            Star Wars
Empire        The Empire Strikes Back

Table 2.2: Stream Sources

The length of the video streams ranges from 30 seconds (a cartoon theme song) to 10:25 (a scene from the movie Star Wars). The number of blocks required in any one slot ranges from a low of 1 (in many streams) to a high of 14 (in the sports highlight clip, Minnesota Twins). Variability can be measured in three ways: the standard deviation of the block schedule, the coefficient of variation (which normalizes the variability) of the block schedule, and the ratio of peak to average bit-rates. These give different rankings for the streams, so the coefficient of variation was used when dividing streams according to their variability.
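The unit conversion and the variability measure used here can both be stated in a few lines (an illustrative sketch; the function names are mine):

```python
import math

BLOCK_BITS = 64 * 1024 * 8   # 64 KByte blocks
SLOT_SECONDS = 0.5           # 500 msec slots

def blocks_per_slot_to_mbps(blocks):
    """1 block/slot = 2^20 bits per second, i.e. 1 'Megabit'/s in the
    convention above (1 Megabit = 1048576 bits), or about 1.049 Mbps."""
    return blocks * BLOCK_BITS / SLOT_SECONDS / 1e6

def coefficient_of_variation(schedule):
    """Standard deviation of the per-slot block schedule divided by its
    mean; unlike the raw standard deviation, this does not automatically
    rank high-bandwidth streams as more variable."""
    mean = sum(schedule) / len(schedule)
    var = sum((x - mean) ** 2 for x in schedule) / len(schedule)
    return math.sqrt(var) / mean
```

Dividing by the mean is what makes the coefficient of variation comparable across streams whose average bit-rates differ by a factor of three or more.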
This measure captures the long-term variability and is not biased in favour of the large bandwidth streams. Streams with these characteristics were chosen because they are typical of a News-On-Demand environment: moderately high-quality video objects with a typical playback duration of less than 15 minutes. The average bit-rates are common for full-motion, 30 frame per second MJPEG video and moderate quality MPEG-2 video. The variability within the streams themselves covers a wide range. Again, this would be reasonable in a news-on-demand environment, with some stories consisting of only newscaster footage (low variability), while others may have rapid scene changes of varying complexity (high variability).

Chapter 3

System Design and Implementation

The system model defined in the previous chapter provides the context in which the Continuous Media File Server was built. The purpose of this chapter is to describe the particular design decisions made in the construction of the CMFS and thereby to validate the feasibility of the system model. The structure of the server components is first described, with respect to client-server and server node-administrator communication. The client interface to the server is presented, which enables the facilities of the server to be tested in a practical environment. The description of the interface includes the corresponding server action that is triggered by each particular request, providing more insight into the organization of the server components and their utilization of system resources. Issues relating to the delivery of data from both the retrieval and the storage points of view are discussed, along with the implications for the mechanisms/protocols necessary to implement the designed functionality. Finally, an example of how the server's flexible interface permits a complex client application to access presentation objects is provided, complete with code fragments.

Evaluation of a real server enables more directly applicable and convincing results than simulation modeling alone. The hardware configurations available limited the extent to which testing could be performed on an actual server, but the design objectives were verified in the construction of the server.

3.1 System Initialization and Configuration

As mentioned in Section 2.4, the server is designed as a distributed system, consisting of at least two components with different network identities: an administrator node and a server node. Server nodes register and de-register with particular administrator nodes on a particular network interface address. The network interface address is identified by an IP address/port number pair. The server node must identify which network address it wishes to use to communicate with continuous media clients. It is possible to use a different interface to communicate with the administrator. On the other side of the network are the client applications. The basic structure of the software on a typical client and the organization of the server is shown in Figure 3.1. The operations performed and the relationships between the tasks are described in the remainder of this section and in Section 3.2.

The administrator node's primary function is to receive messages that request the establishment of the real-time connection between a server node and a client. The administrator also functions as the centralized database server for metadata/attributes associated with presentation objects, some of which are necessary for retrieval and admission scheduling. The administrator waits for requests from its clients, which include server nodes. The first message that must be received prior to the opening of any connections is the registration of server nodes. Each server node must register with the administrator to enable the administrator node to forward the appropriate open connection requests.

When a server node begins its processing, it must allocate main memory buffer space for each of its disks. Memory must also be allocated for connection management, and for the disk allocation vector (i.e. superblock) of each disk, which is retrieved from the administrator. A permanent copy of the superblock for each disk
When a server node begins its processing, it must allocate main memory buffer space for each of its disks. Memory also must be allocated for connection management, and the disk allocation vector (i.e. superblock) for each disk which is retrieved from the administrator . A permanent copy of the superblock for each disk 43 Schedules Figure 3.1: Software Structure of Server Node is kept in the administrator database to prevent a disk crash or other node failure from leaving the server node with, an inconsistent view of what data is stored on its disks. Then , the server node registers its availability to the administrator, and is ready to accept requests from clients. The organization of tasks and flow of control in each component of the server and client can be modeled by interdependent threads of execution which exchange messages and share resources, such as server buffers and network bandwidth, and coordinate processing based on the availability of those resources. In Figure 3.1, the dotted lines represent message transfer between the client, the administrator and a server node. These messages contain requests for continuous media transfer or attr ibute data and must delivered wi th a reliable protocol . The thin solid lines indicate the manipulation of shared state by the threads at the server node. The 44 thick solid lines represent real-time transfer of media data, which must be real-time for the retrieval process. Wr i t i ng objects to the server must utilize a reliable protocol, and may be near real-time. The software components involved in storing and retrieving an object are also shown in Figure 3.1. The major thread at the server node is the node manager, which receives client and administrator requests. For retrieval, the network manager apportions credit for sending data to client applications. Each disk has a disk manager thread which manages the disk buffers allotted to that disk and enqueues blocks for transmission. 
Finally, each opened stream has a stream manager thread which dequeues buffers that have been read off the disk and sends them on the network connection according to the credit issued by the network manager. When storing an object, the write manager is responsible for receiving the media data from the client and storing it on the disk, as well as storing the object attributes at the administrator. More detail on the interactions and activities at the server as the result of user requests is given in the following section.

3.2 User Interface to CMFS Facilities

The client interface calls can be categorized as follows: calls which relate to objects, calls which relate to connections for delivery of continuous media, calls which relate to metadata for objects, and calls which involve directory service functionality, as summarized in Table 3.1. The details of the more significant interface calls are provided in the remainder of this section. Appendix A contains a complete description of the API, including the interpretation of each parameter required in each call.

The most significant interface call is CmfsPrepare, which results in four major activities at the server: 1) the block schedule is calculated, 2) disk and network admissibility is determined, 3) the respective disk manager is informed of the new stream's disk block schedule, and 4) the transfer of the real-time data is initiated. Control only returns to the client when an initial buffer of data has been sent. The initial buffer is sufficient to ensure that the client will always have sufficient data for presentation of the object to the user at the requested rate.
Task                                       Interface Routines
Object Manipulation                        CmfsCreate, CmfsWrite, CmfsComplete,
                                           CmfsRemove, CmfsMigrate, CmfsReplicate
Stream Delivery and Connection Management  CmfsOpen, CmfsClose, CmfsPrepare,
                                           CmfsReprepare, CmfsStop, CmfsRead, CmfsFree
MetaData Management                        CmfsPutAttr, CmfsGetAttr
Directory Service                          CmfsLookup, CmfsRegister

Table 3.1: Cmfs Interface Procedures

Object Creation and Removal. Most client interaction with the server is in retrieval mode. It is necessary, however, to store CM objects in the server before they can be retrieved. Over the course of time, these objects may also be moved, replicated, or deleted in response to user requests or server load-balancing needs. When an object is created, the client application uses the CmfsCreate call. Initially, a message is sent to the administrator to set up the identification and location of the object. A server node on which to place the real-time data is chosen by the administrator. In turn, the server node chooses the disk device(s) for the media data. The client receives a UOI which is to be used in all further queries concerning the object.

The server must know about the normal display rate of the presentation object in order to calculate the rates at which data must be transferred to the client. This is one of the parameters provided by the client application in the CmfsCreate call. Since many media types do not have a rate that can be expressed as an integer number of presentation units per second, a ratio of presentation units to milliseconds is used to allow specification of arbitrary display rates. For example, MPEG audio can be encoded at approximately 19.14 frames per second, but the specification for the encoding is precisely 49 frames per 2560 milliseconds.

The interface procedure CmfsWrite stores a sequence of continuous media at the server node. An individual sequence is stored in a contiguous fashion on the disk.
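The ratio representation of the display rate can be illustrated with exact rational arithmetic (a sketch of mine, not the CMFS implementation): storing the rate as 49 units per 2560 ms preserves timing exactly, where the approximate 19.14 frames per second would accumulate rounding error over a long stream.

```python
from fractions import Fraction

def display_time_ms(unit_index, units, per_ms):
    """Exact display timestamp (in ms) of presentation unit `unit_index`,
    for a rate given as `units` presentation units per `per_ms` ms."""
    return Fraction(unit_index * per_ms, units)

def rate_fps(units, per_ms):
    """Equivalent display rate in presentation units per second."""
    return Fraction(units * 1000, per_ms)
```

For MPEG audio specified as 49 frames per 2560 milliseconds, the exact rate is 19.140625 units per second, and the 49th unit falls at exactly 2560 ms; no integer frames-per-second value can express this.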
Segmenting the object in this manner allows a client application to choose to only retrieve a certain portion of the stream in order to achieve fast-motion display at similar bandwidth levels to that required for full-motion display. This is further elaborated on in the section on CmfsPrepare. One interesting possibility is to store an MPEG video object in the following manner: each I-frame could be a sequence, and all the B and P frames which rely on that I-frame for interpretation could be another sequence. A client requesting every other sequence would then be able to retrieve I-frames only. Another possibility could be storing one video frame per sequence (in a purely intra-coded video object) so that retrieving every other sequence results in perfectly smooth fast forward at twice the normal frame rate. These details only affect the relationship between the client applications which store and retrieve the media data. After the last sequence has been stored, the client issues a CmfsComplete call, which informs the server node that it can commit the changes to the administrator database that are associated with this presentation object. This includes the attributes that are defined by the server node which are necessary for stream retrieval. These are: 1) the location of the raw data, 2) the sequence map (an array of sequence beginning and end points, with associated display unit information), and 3) the presentation unit sizes for the entire object. In addition, the revised copy of the disk block layout (superblock) is stored at the server to ensure consistency in the event of a failure of the server node part-way through storing an object. Objects can also be removed from a server node via CmfsRemove. This could happen as a result of migration or direct removal by a client application. All attributes are removed from the administrator database and space on the disk device is reclaimed.

Object Replication and Migration.
Each disk device has a limited bandwidth that restricts the number of independent retrievals of high-quality, full-motion video streams to a relatively small number (i.e. fewer than 10). The access patterns of all types of media, including video rentals [36], show that at any given time, some objects are much more popular than others. If a server is to be capable of supporting dozens or hundreds of simultaneous users requesting presentation objects with a realistic distribution pattern, the bandwidth available for these objects must be greater than that provided by an individual disk. Replication provides the most benefit to a server like the CMFS. Replication can be done on at least three levels: between disks within a server node, between server nodes on an individual server, or between servers. The first two levels of replication increase the number of simultaneous users of an individual object at an individual server. Replicating between nodes in a single server has the added benefit of load-balancing within the server and increases the reliability and availability of objects. The final level of replication also increases the availability of objects should a server failure occur and may reduce the overall cost of retrieving remote objects by copying them closer to the location where they are frequently accessed. The main benefits of migration are load balancing and reducing remote retrieval costs, but it does not increase availability, since the same number of copies of each object exist. Several servers may be installed on a particular wide-area network or throughout an internet. A location service can be added which allows a client application to determine the existence and location of the objects it wishes to present to the user [46]. This location service is independent of the structure of the CMFS.
The only enhancement needed is that an administrator node must register with the Location Server if it is willing to export the objects stored on that server via CmfsRegister. If a client wishes to retrieve an object in a system with a location server without consideration of which instance is returned, the client can perform a CmfsLookup request. This call will contact the location service and return the location(s) of all copies of the object. Replication and Migration are achieved via the CmfsReplicate and the CmfsMigrate interface calls, respectively. These can be initiated manually or performed automatically, based on some threshold of use of a particular stream or a threshold of load on a particular server. An unfortunate consequence of migrating due to heavy load may be that this condition of heavy load is prolonged by the migration process itself. If automatic moving of objects is enabled, a load monitoring facility is activated in each administrator node to determine when to initiate the copy operation. Replication and Migration take place on-line by utilizing server resources which are in excess of those required to perform the delivery for requested streams. During periods of heavy use, this may result in a very slow migration procedure. The analysis and implementation of location service and migration functionality are given a complete discussion in Kraemer [46].

Connection Establishment and Teardown. For connection maintenance, client applications have two interface calls: CmfsOpen and CmfsClose. CmfsOpen establishes a transport layer connection from the server to the client for delivery of the stream data of an object. The caller provides the UOI for the object that it wishes to receive and sends a message to the administrator node. The request is then forwarded to the server node that contains the object.
If the object which is to be opened does not exist in the directory, a corresponding failure status is returned immediately. A connection identifier (cid) is returned for use in all further communication with the server node regarding the object that it has just opened. In this respect, a connection identifier is analogous to a UNIX file descriptor. The other useful information returned from CmfsOpen is an upper bound on the amount of time that a call to prepare a stream for delivery (CmfsPrepare) will take. It is based on the time necessary to perform the admission control and transfer an initial buffer of data over the network connection. The client application uses this information to coordinate the playback of multiple streams. If the client knows how long the preparation of streams A, B, and C will take, it can then determine the proper times to issue these prepare requests so that the reading and synchronized presentation of these streams can be accomplished with minimal buffering at the client application. The Node Manager initializes an entry in the connection table and creates a Stream Manager thread for the object. This thread actually establishes the transport layer connection to the appropriate port on the client machine. The parameter list for CmfsOpen includes a callBack procedure which is executed at the client before accepting the connection. The callBack procedure evaluates the bandwidth parameters of the connection and the amount of client buffer space to be dedicated to this connection. If the client has more resources than required, it informs the server of this fact, so that the delivery of data can use those extra resources. If the client has fewer resources than necessary, the connection is not established and a failure status is returned. If and only if the client and the server node can accept the connection parameters, the connection request is granted.
The granting or refusal of the connection is relayed back to the client via the administrator node. The client must have at least enough buffer space to store the largest two consecutive slots' worth of data for the opened presentation object. This is because the client library performs double buffering. During the playout of the current slot, the next slot is transferred into client memory across the network interface. Due to the variable bit-rate nature of the data to be displayed, all the data for a slot must be present at the client before playback of the slot can be initiated, and this space must remain available for decoding for the duration of the slot. The client does not necessarily know exactly how many bytes are strictly necessary before beginning playback to avoid starvation at the client. It is possible that 50% or more of the bytes in a slot is for the first presentation unit, or equivalently, that 50% is for the last presentation unit. In the former case, starvation would result in jitter within the slot as the next video frame could not be displayed or the audio device would run out of data. In the latter case, a large amount of buffer build-up for the last frame would occur. If this space was needed by the transport layer for the data to be displayed in the next slot, then buffer overflow would result. Therefore, at worst, this requires the maximum amount of data that must be presented in the largest two consecutive slots. When the delivery of data is no longer required for the object, a client application invokes CmfsClose on the connection. All the resources allocated at the server are released and the transport level connection is gracefully torn down. It is possible that a malfunctioning client or disconnected network could result in a lost CmfsClose request. Therefore, the server implements a timeout mechanism that tears down the connection if there has been no traffic for a certain amount of time (this timeout value can be set differently for each system configuration).

Data Delivery.
For stream delivery, the client interface is CmfsPrepare. This call requests that a certain portion of the stream be delivered at a specific rate, and provides a guarantee that the client will begin consumption of the data within a specific amount of time, given as a delay bound. This constitutes a "contractual obligation" by the client to retrieve the data at the prescribed rate. When a client issues CmfsPrepare, the request is sent from the client directly to the server node. The client request is put on the Admissions Queue for the appropriate Disk Manager thread. CmfsPrepare allows the client application to achieve all the "virtual VCR" support implemented by the server. The four parameters which thus empower the client are: start, stop, speed, and skip. The start and stop positions in the stream are given as sequence identifiers. This defines the portion of the stream to be transferred. If start is later than stop, the stream is delivered in rewind mode. Fast-motion or slow-motion display can be accomplished by the selection of speed and skip parameters. A value of 100 for speed and 0 for skip indicates that the stream is to be delivered at full speed (the same speed at which it was recorded) and that no sequences are to be skipped. Increasing the value of speed to a value greater than 100 implies that more server bandwidth will be necessary to obtain the desired display rate. Fast motion is more easily obtained by altering the skip parameter, which will cause the CMFS to only retrieve a subset of the sequences in a stream (i.e. skip=1 indicates that one sequence will be skipped for every one read, skip=2 indicates 2 skipped for every one read, etc.). Given the selection of parameters, the appropriate stream schedule is constructed and the request is presented for admission control.
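The speed and skip semantics just described can be sketched as follows (function names are illustrative, not part of the CMFS interface): skip=k retrieves one of every k+1 sequences, and speed is a percentage of the recorded rate, so the apparent display rate scales with both.

```c
/* Illustrative only: number of sequences actually read from a span of
 * total sequences when every (skip+1)-th sequence is retrieved. */
int sequences_read(int total, int skip)
{
    return (total + skip) / (skip + 1);
}

/* Apparent display-rate factor, scaled by 100 to stay integral:
 * speed=100, skip=0 gives 100 (normal rate);
 * speed=100, skip=1 gives 200 (double-rate fast motion). */
int apparent_rate_x100(int speed, int skip)
{
    return speed * (skip + 1);
}
```

This makes concrete why skipping is the cheaper route to fast motion: doubling the apparent rate via skip=1 reads half the sequences, whereas speed=200 reads them all at twice the bandwidth.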
The Disk Manager constructs the stream schedule for the requested portion of the object and performs admission control for the request. Details of the disk admission control algorithms for the CMFS are given in Section 4.1. If the object can be scheduled from both the disk and the network point of view, a positive response is sent to the node manager and the schedule is updated. Control returns to the client when a sufficient quantity of data has been sent that the client is guaranteed not to encounter starvation. There is no other provision for start-up latency in the CMFS. If the object cannot be scheduled for immediate transmission, the request for delivery is refused. If the request is accepted, however, the server continues to read and transmit blocks subject to buffering constraints at the server and client. The delivery of data is guaranteed in the sense that the server will always send data ahead of time, or just in time to allow presentation of the data to the user. The correct arrival of this data cannot be guaranteed, but lost data can be compensated for by client applications. Starvation is prevented by sending the first slot of data before returning from the call to prepare a stream. At the server, this requires scheduling the disk reads for the entire stream, completing the disk reads for the first slot, and sending the bytes of data across the network. This is shown in Figure 3.2. On a lightly loaded system, this may happen in a very small amount of time, and prepare could return as early as time T1 (if the scheduling and reading operation was done so quickly that buffers were available for send-ahead at that time), although the data is not guaranteed to arrive until T2 (the end of slot n+2). If the client begins reading at T1, then later in time the system may become heavily loaded, preventing transmission of data until the end of the guaranteed slot. This results in starvation for the client application.
Therefore, the protocol waits until time T2 before returning from CmfsPrepare.

[Figure 3.2: Prepare Timings]

When control is returned from the CmfsPrepare operation, the client is ready to read and process the media stream. This is done via CmfsRead requests. The first call to CmfsRead informs the server that processing of the stream has begun. This is done via the sending of a "start packet". The start packet tells the server at what time the client began reading. No further communication from the client to the server is necessary, because the server then assumes that the client will continue to consume data at the rate which was specified in the prepare call. There is delay in the transmission of the start packet, so the client sends the local time (assuming synchronized clocks) inside the packet. This allows the server to get an estimate of network delay (Ta - Ts). Additionally, the server calculates the proportion of a slot that has been consumed at the client at the exact time of a slot boundary. On the first timer interrupt after the receipt of the start packet (at Tc1 in Figure 3.3), a fraction of a slot proportional to the time Tc1 - Ts is added to the client buffer capacity and thereafter, complete slots are used. This is known as the Total Client Credit (TCC) schedule, which is calculated as the stream is delivered.

[Figure 3.3: First Read Operation]

The server utilizes this information in the data delivery flow control mechanism. Every subsequent call to CmfsRead is a local client operation which simply passes the data from the network buffers to the application.
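The start-packet bookkeeping above might look like the following sketch (names and units are illustrative; times are in milliseconds and clocks are assumed synchronized, as in the text):

```c
/* Illustrative only: on receipt of the start packet the server
 * estimates network delay (Ta - Ts) and, at the first timer
 * interrupt Tc1, credits a fraction of a slot proportional to
 * Tc1 - Ts; thereafter complete slots are used. */
typedef struct {
    long delay_est_ms;         /* estimated network delay, Ta - Ts */
    long initial_credit_bytes; /* fractional-slot credit at Tc1 */
} StartCredit;

StartCredit on_start_packet(long Ts, long Ta, long Tc1,
                            long slot_ms, long slot_bytes)
{
    StartCredit c;
    c.delay_est_ms = Ta - Ts;
    c.initial_credit_bytes = ((Tc1 - Ts) * slot_bytes) / slot_ms;
    return c;
}
```

For example, a start packet stamped at Ts=1000 ms, received at Ta=1020 ms, with the first timer interrupt at Tc1=1250 ms and a 500 ms slot, yields a half-slot of initial credit.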
Once CmfsPrepare has returned, the client must begin reading within a designated interval of time determined in the prepare request by the buffering allocated at the client. CmfsFree returns that storage to the system when the client application has finished using it. During the delivery of an object, the client application may find it necessary to alter the delivery parameters. The parameters which may be adjusted are speed and skip. This can be accomplished by calling CmfsReprepare. The circumstances and mechanisms for implementing CmfsReprepare are given in Section 3.4. The call to terminate delivery of data is CmfsStop. A request is sent directly to the Node Manager thread at the server node, which causes the Disk Manager to remove the object from its active list, as well as the related disk block requests. Queued server buffers are flushed without sending them across the network. Finally, control is returned to the client. Before returning control to the application, the client code for CmfsStop also throws away buffers that have been received at the client, but not yet consumed by CmfsRead operations. The identifier of the last sequence successfully processed is returned to the client application, so that display can resume at approximately the same place within the stream.

Metadata storage and retrieval. The administrator node contains information about each presentation object. This information is written by server nodes or client applications to be retrieved later. CmfsPutAttr stores an attribute, while CmfsGetAttr retrieves an attribute. The server node stores attributes of this nature during the creation of an object. Client applications may also make use of the attribute facility and store arbitrary metadata regarding an object. Some examples of attributes that a client might find useful are: date of creation, copyright owner, and encoding format. Client applications can also utilize the server attributes in a read-only fashion.
There are limited directory functions in this simple database that allow a client application to determine which UOIs are stored in the database. It is also possible to view the attributes that are associated with each UOI. Since the attribute values are arbitrary bit strings which can be written by various client applications, some attribute values may not be useful to other client applications. The formats of the attributes written by the server node are known to all applications.

3.3 Slot Size Implications

The choice of slot sizes has many implications for server resource usage. One such resource is the amount of memory needed for disk blocks to be buffered at the server and the client. The server performs at least double buffering of the data for each stream. While the data for slot n is being retrieved, the data for slot n-1 is transferred to the client. If excess server bandwidth and buffer space exist, slots n+1, n+2, ... may be retrieved at the same time, but the minimum amount of buffer space required for each stream is two slots' worth, because the disk system fills one set of buffers while the network system empties the other set of buffers. The network system empties the buffers and sends across the network at the negotiated bit rate until all the data it is required or allowed to send has been sent. The same process of buffering is performed at the client, where one set of buffers is used to read data from the network and the other set is used by the display system to decode and present the data. The CMFS has chosen to make the required client buffers the size of the two largest slots. In this case, there is space in which to receive all the data for the largest slot before decoding and presentation to the user, as well as space to receive the next slot of data.
For variable bit-rate data, it is possible that a large percentage of the data in a slot is required for a particular presentation unit, and so the entire slot's worth of data must be present before decoding, since the data arrives at a constant rate and may not be available in time if decoding starts early. Additionally, if the large amount of data was required for the last presentation unit, buffer space at the client would still be in use for decoding when the data needed to be read for the next slot. This would manifest itself in intra-slot jitter in the former case, and buffer overflow in the latter case. Therefore, a slot size of several seconds would require multiple Megabytes of client buffer space for a moderate bandwidth video stream. For a stream with a peak rate of 10 Megabits per second (1.2 MBytes/second) and a slot size of 5 seconds, this would be approximately 12 MBytes. For the same stream with a 500 msec slot, the client buffer space would be 10% of this value. The disk system keeps a schedule of the number of read operations which are required for every slot for the active streams on each disk. For a two-hour schedule and 500 msec slots, this is 14,400 slots. Each active stream also has a particular delivery schedule that indicates which bytes within each disk block are to be delivered per slot, since not all the data must be delivered to the client in every case. With small slot sizes, the amount of bookkeeping information that must be stored at the server is quite significant. Larger slot times are better from a disk performance point of view, since a greater amount of contiguous reading is possible (assuming the data for a stream is stored contiguously on the disk). Smaller slot times may increase the relative amount of time the disk spends seeking, since the read operations for a slot correspond to a shorter playback duration.
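The buffer arithmetic above reduces to a one-line calculation (a sketch only, using integer KBytes to match the 10 Mbps, roughly 1.2 MBytes/second, figure in the text):

```c
/* Illustrative only: the required client buffer is roughly two slots'
 * worth of data at the stream's peak rate. */
long client_buffer_kbytes(long peak_kbytes_per_sec, long slot_msec)
{
    return 2L * peak_kbytes_per_sec * slot_msec / 1000;
}
```

A 1200 KByte/second stream with 5-second slots needs about 12000 KBytes of client buffer; with 500 msec slots, only 1200 KBytes, i.e. 10% of that value.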
3.4 Data Delivery and Flow Control

A potential problem for the client is that the negotiated bandwidth for the individual connection may not be available for the entire duration of retrieval. If significant loss is experienced in the presentation for some reason (perhaps the network becomes overloaded with unrelated traffic), a human user may become dissatisfied with the presentation. A few options exist for solving this problem. One option is to have the server detect this loss (either directly or by negative acknowledgments from the client), and automatically adjust data delivery to eliminate the network congestion. This assumes that the client can still decode enough of the residual portions of the stream which will be sent and/or that the server can intelligently determine what to send and what to discard. As well, the server would still be reading all the data originally requested, utilizing disk resources for data that cannot be sent. A solution based on this principle is given in [82], where the client and server co-operate on defining the order of the units to be delivered to the client and the server continues to send complete presentation units. Not all units are sent and thus, a lower frame rate for video display is presented to the user. Another possibility is for the server to transcode the data which is read off the disk to provide a lower data rate for the stream. This would require extensive server CPU resources, or hardware support for every encoding format. As well, disk bandwidth is used to extract data from the disk which cannot be sent to the client. In this case, a stream which cannot send all the data it is reading off the disk may prevent other stream requests from being accepted due to this wasted bandwidth. With point-to-point network connections between the server and the client, it is a better use of resources to accept streams which can be successfully delivered as well as retrieved.
In keeping with the design philosophy of the CMFS, which disregards stream encoding details, the best place to handle the degradation of the quality is in the client application. The client can issue a request to prepare the stream with different delivery parameters, while maintaining as much continuity of presentation as possible. The interface to this facility is CmfsReprepare. If the server is able to support the new request, the new block schedule is used and buffers belonging to the original prepare request may be flushed and/or sent to ensure the continuity of presentation. Whether a block that is buffered at the server is sent or discarded depends on when it is required at the client. If a stream with low bandwidth is re-prepared, it would be more appropriate to discard blocks which are queued at the server, since it is conceivable that many seconds of data could have been read ahead, and these buffers no longer correspond to requested data. It would be very awkward to adjust the offsets in existing queued data so that only the bytes appropriate for the new request are transmitted. In the case of video, if only a small number of seconds of video are buffered, then continuing to send them would be more appropriate. The client must also have some way to determine when resources have been freed up so that the quality of transmission can be resumed. An exponential back-off timing mechanism could be used to incrementally request more bandwidth. The design of the CMFS makes this entirely a client issue. The client is responsible for determining when to issue a CmfsReprepare with an increase in bandwidth requirements. The server translates this into a pseudo-CmfsPrepare request where the "stream" to be admitted is simply the difference between the new schedule for the stream and the existing schedule. If that stream can be accepted, then CmfsReprepare returns successfully with increased presentation quality.
Otherwise, delivery continues according to the previous schedule. To ensure that client buffers do not overflow, the CMFS implements a mechanism for flow control based on a credit. Credit is defined to be the number of bytes that the server is allowed to send to the client during the current slot. This value is determined based on the knowledge of client buffer space and network bandwidth associated with both the connection and the entire node. When there is ample buffer space at the client, credit is issued so that the server can send at the full rate of the connection to fill client buffers and reduce future network utilization. Once the client buffer has been filled, credit is issued based solely on the number of bytes of data that have been presented to the user in that slot time, and thus freed at the client. The mechanism is implemented by the Network Manager thread. This thread knows the rate of each connection and the amount of buffer space at each client as well as the amount of data to be displayed per slot. Without flow control of some kind, the Stream Manager would send as fast as the network would allow or as fast as the disk could read, causing overflow at one or more of the following locations: 1) network buffers at the server, 2) buffers in the network, or 3) buffers at the client. The flow control prevents overflow or starvation by having the Stream Manager wait for credit from the Network Manager before sending data across the network. Buffers are queued between the disk and the Stream Manager until the system runs out of buffer space. No network communication from the client is required once the start packet from CmfsRead has been received. Further details of the implementation of this mechanism can be found in Section 5.1.

3.5 Real-Time Writing

The server allows reading and writing to be performed at the same time.
When the server is able to achieve a read bandwidth greater than minRead, the extra bandwidth can be used for additional read-ahead, subject to buffer availability. This portion of the retrieval read-ahead is not guaranteed by the server, so the server can postpone this read-ahead in favour of writing an object to the server. The size of this "bonus" read-ahead varies according to the block location and seek activity on the disk for retrieving the requests of the accepted streams. If the amount of reading required in the current slot is less than minRead, the remaining bandwidth could also be used for writing. Designing the system so that real-time writing can take place is possible. This reservation may be in vain, however, because the server cannot require the client to provide the network packets at the appropriate time in order to keep up with the promised rate of writing to the disk. Reservation of bandwidth could be done in a similar manner to that done in a CmfsPrepare request, if the size of each presentation unit is known ahead of time. In this respect, read and write operations are exact inverses of each other. If the client is slower than the reserved rate, then server resources are allocated which are not being used properly. Retrieval may be denied when resources are actually available. The major reason that real-time writing is infeasible for continuous media is that writing must be done reliably. This requires retransmissions of lost or corrupt data. The round-trip latencies involved prevent guaranteed real-time delivery.

3.6 Implementation

3.6.1 Environment and Calibration

The CMFS has been implemented and tested on several hardware and software platforms. Most of these are UNIX-based workstation environments.
In particular, versions of the server exist for IBM RS/6000 computers using AIX, SUN Sparcstations running SunOS 4.1 or Solaris 2.5, and Pentium-based PCs running Linux, FreeBSD, Solaris, or Windows NT. Client applications have been written on all of these platforms, as well as Windows 95. The client applications range from simple directory listing programs to complete writing utilities and several display clients. The display client for the SUN Sparc architecture utilizes a Parallax MJPEG decoder card; the IBM AIX client utilizes the Ultimotion MJPEG decoder card. Both of these clients request independent audio and video streams and synchronize them at the client. The Parallax decoder is capable of displaying NTSC quality video (640 x 480) at 30 frames per second, as is the Ultimotion decoder card. Unfortunately, as mentioned previously, the Ultimotion card is not capable of encoding at 30 frames per second. Client applications that decode MPEG video in software on UNIX have also been written, but sustain much lower frame rates and resolutions. A Windows client uses a Real Magic MPEG decoding card as well as software decoding. As well, various dummy clients have been implemented and used for stress testing the server. These clients discard all the data and are used to keep various statistics on the delivery of the data. The network environment for initial testing consisted of a Newbridge ATM network switch connecting the clients, the administrator, and the server nodes via 100 Mbps multi-mode fibre. This provided a small-scale CMFS with one administrator and as many as two nodes. Another server environment has been established on a 100 Mbps switched Ethernet network. Several Pentium-based machines are connected to this network.
Currently, the hardware platforms on which the CMFS has been implemented utilize the raw disk interface as provided in the UNIX operating system in configurations that have dedicated disks. These disks are attached by a SCSI-2 Fast/Wide adapter providing a bandwidth of 20 MBytes/second. The configuration of such nodes contains four disks with 2 GBytes capacity each. The low-level I/O facilities of some versions of UNIX provide an asynchronous mechanism for reading and writing of data blocks. This feature is utilized wherever possible. The server node issues requests in groups so that the disk controller (typically SCSI) and lower-level software/firmware can reorder the requests for the best performance. When buffer space is available, minRead requests are issued simultaneously. They are guaranteed to complete within a slot time. When fewer than minRead buffers are available, requests are made simultaneously for the number of available buffers, as there is no point in delaying the disk requests unnecessarily so that the asynchronous parallelism can be achieved. The initial calibration of the disk utilized the raw interface for AIX connected to a Seagate Barracuda model ST32550W on an IBM RS/6000 model 250. A bandwidth of 40 blocks per second was achieved in every test of the calibration program, suggesting 20 as the value for minRead. The read requests were each for one 64 KByte block when the blocks were spaced evenly across the surface of the disk. When asynchronous facilities were used in the same test, 23 blocks was the largest number of evenly spaced requests that could be satisfied within 500 msec. The worst case read time for 23 blocks was 508 msec. Given timing granularities and the fact that this is a worst case example, 23 is a more accurate value for minRead. A third method of calibration utilized the CMFS to calibrate the disk performance.
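The calibration programs described above issue reads at evenly spaced positions across the disk surface to force worst-case seek activity. A sketch of computing those positions (illustrative only, not the actual calibration code):

```c
/* Illustrative only: n evenly spaced block positions across a disk of
 * total_blocks 64 KByte blocks -- the worst-case seek pattern used to
 * measure how many reads complete within one slot (minRead). */
void spaced_positions(long total_blocks, int n, long *out)
{
    for (int i = 0; i < n; i++)
        out[i] = (long)i * total_blocks / n;
}
```

Issuing one read at each such position per slot, and lowering n until every test round finishes within the slot time, yields the worst-case figure that a static datasheet analysis would miss.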
Simultaneous requests for several CBR streams were submitted to the server to determine the worst-case disk performance. The server was capable of supporting 23 streams which were spread out across the entire surface of a single disk and required an average of 1 block per slot, so the level of seek activity was high. During this calibration phase, an anomaly regarding disk performance was observed. One of the disks was capable of reading 26 blocks per slot if it occupied a certain position on the SCSI chain. If the disks were physically rearranged, then it was able to achieve only 23 blocks per slot. Some of the examples in Chapter 4 use 26 as a value for minRead, as some of the initial experimental work was carried out on that disk. Such anomalies highlight the importance of using a calibration program to calculate minRead rather than static analysis based on the disk drive's technical characteristics.

3.6.2 Implementation Environment

The CMFS is implemented in C and utilizes a user-level threads library (RT Threads [62]), developed at UBC to support real-time distributed software systems. RT Threads provides mechanisms for co-ordinating access to shared data within a UNIX process via semaphores, and uses a Send/Receive/Reply message-passing mechanism that can be used between threads in the same address space or between threads in different address spaces. On operating systems (such as Windows NT and Solaris) which are already threads-based, or have system-level threads available, an RT Threads application can run within a single thread, or RT Threads can be mapped one-to-one onto system-level threads. Detailed performance analysis in Mechler [62] shows that in nearly all cases, RT Threads performance is comparable to that provided by host operating systems. In some cases, the primitives provided by RT Threads significantly outperform those of the host operating system.
In particular, the real-time features of Solaris Threads and AIX Threads either require special privileges (such as running as root) in order to achieve real-time performance, or create situations in which overloading the system makes the entire machine unusable [69].

3.6.3 Transport Protocol Implementations

The details of the transport protocol are beyond the scope of this dissertation, as the low-level delivery of data does not influence the internal design of the server. All the server requires is: 1) request/response messages and continuous media storage messages must be delivered reliably, using a bounded quantity of server resources, and 2) continuous media data must be delivered to client applications on time. The overhead associated with protocol stack processing does affect the amount of remaining processor resources at the server node and client, however, so a few words are in order. The minimum requirements of a transport protocol with respect to the CMFS are that: 1) retransmissions of lost CM data do not affect the timing of delivery at the presentation device (i.e. the client application is unaware of retransmissions), 2) lost data can be detected, and 3) quality of service parameters can be specified for the connection. Retransmissions need not be excluded, but should only use excess bandwidth, and then only to send packets that will be delivered before the application's deadline [82]. In other words, for the CMFS, if it is possible for the transport layer to detect missing data for a stream that has several seconds of data enqueued at the client, retransmitting that packet will be invisible to the client (in terms of delay) as long as the packet arrives before the application requests it. The server must make some decision as to whether or not it has the bandwidth to re-send the data.
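A minimal sketch of this just-in-time retransmission decision is shown below. The structure and function names are illustrative assumptions, not the CMFS source: a packet is worth resending only when excess bandwidth exists and the retransmission can arrive before the client drains its playout buffer.

```c
#include <assert.h>

/* Hypothetical just-in-time retransmission check (names illustrative). */
typedef struct {
    double buffered_sec;  /* CM data already enqueued at the client      */
    double resend_delay;  /* est. time to retransmit and deliver (sec)   */
    int    spare_bw;      /* nonzero if excess bandwidth is available    */
} RetransCtx;

int should_retransmit(const RetransCtx *c)
{
    if (!c->spare_bw)
        return 0;  /* never consume bandwidth reserved for other streams */
    /* resend only if it arrives before the application needs the data */
    return c->resend_delay < c->buffered_sec;
}
```

A client with several seconds of enqueued data easily satisfies the deadline test; a client reading near the edge of its buffer does not, and the loss is simply accepted.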
In order to be capable of performing retransmissions, the server must retain packets previously sent on a connection for some length of time in anticipation of retransmission requests. The server must also decide how much buffer space it can devote to these packets. In some cases, the packet may have expired at the server before the retransmission request is received. Since many existing hardware environments do not efficiently support quality of service specification, it is possible to relax the third condition in some cases. Where quality of service is not supported in the network, the server can be installed so as to provide service for low- to medium-bandwidth requests that are below the bandwidth of the entire network. The network bandwidth itself becomes the limiting performance factor, and neither the disk bandwidth nor the network interface limits can be reached. A server configured in this manner can still be used effectively for these types of requests (such as audio or lecture slides, etc.). The first choice for a transport-level protocol for both messages and raw data was XTP [83]. It provides the ability to have reliable or non-reliable data flows on either side of a connection, along with other QoS parameters. Initial experiences indicated that the implementation of XTP in our environment incurred significant queuing and processing overhead, which limited the amount of throughput and thus the number of simultaneous users. Another protocol was then developed for raw data transfer which utilized the basic features of UDP/IP with some added sequence number checking. TCP/IP was used to implement the reliable message transfer protocol. The data transfer protocol is called MT (Media Transport) and is described in [63]. In certain network environments, the round-trip time for messages sent via TCP/IP was often unacceptably high.
Thus, a UDP-based reliable request/response protocol was introduced for the transmission of reliable messages which fit into a single UDP packet. Additionally, another protocol which has been used for data transfer is RTP (Real-time Transport Protocol) [79]. This protocol provides timestamps which a client application may utilize in sending data to decoders and/or display devices. RTP operates on top of MT. MT provides sequence numbers for detecting holes and just-in-time retransmission. There are at least two advantages to the client in using RTP. First, the latency in receiving timing information for the stream is eliminated, since this information is placed in the stream. Previous client applications obtained this timing information from attributes stored in the administrator database. For low-bandwidth clients (such as those accessing data across a modem link), this delay is unacceptable. The timing information placed in an RTP packet is the display time of the first presentation unit contained within the packet. Timing information about subsequent presentation units in the packet can then be determined either by parsing the data itself (depending on the format) or by using an RTP payload type which includes such information. The second major advantage is that using a standard RTP payload type to transmit CMFS data allows non-CMFS client applications to be the ultimate recipient of the data, without the need for parsing the CMFS header information on-the-fly at some intermediate location. The control of the transmission would need to be performed by some proxy client, but the raw data could be sent directly to a different client. This discussion of protocol implementations has been restricted to unicast point-to-point communication. This can be extended to multicast transmission [21]. It is quite possible that several clients could request the same object at approximately the same time.
A simple enhancement to the server would be to treat this as a single request. It could do this by creating a multicast group for the receivers, retrieving only one copy of the object, and sending it out on the multicast address. It is also possible to have a proxy client represent the multiple recipients, so this functionality could be provided in a manner transparent to the server.

3.6.4 Server Memory Requirements

One of the uses of memory at the server is to keep state information. Large data structures are needed at two levels: per-disk and per-stream. Each of the prepared connections has a significant amount of storage dedicated to recording the precise blocks which must be read and the bytes which must be delivered. If a large number of simultaneous streams are permitted, this memory usage may be very large. The first data structure stored is the schedule for each disk. This is used in admission control and contains one integer per slot. This is 14,400 * 4 = 57,600 bytes per disk for a two-hour circular schedule. The more resource-intensive data structure is the specific block list and corresponding offsets into each block that must be stored for each prepared stream. This is also influenced by the size of the sequences used in storing the object and whether or not the sequences to be delivered are stored contiguously. If some of the stream is to be skipped, there will be discontiguities in the disk block locations for a stream. For each stream in each slot, there is a blockDescriptor structure. This contains the starting block number, the number of blocks to be read, and the ending offset byte pointer, as well as some other counters and flags. For large-bandwidth streams that are stored contiguously on the disk, this amounts to 28 bytes per slot. For a 10-minute stream, this is 33,600 bytes. A server which is capable of supporting 100 streams of this length requires 3.3 MBytes just for the block descriptor array.
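The sizing arithmetic above can be sketched as follows. The constants mirror the 500 msec slots and 28-byte descriptors described in the text, but the helper names are illustrative, not part of the CMFS source.

```c
#include <assert.h>

/* Illustrative per-stream and per-disk state sizing, using the
 * figures from the text: 500 msec slots (120 per minute), 28-byte
 * blockDescriptor entries, and 4-byte schedule entries. */
enum { SLOTS_PER_MIN = 120,   /* 60 sec / 0.5 sec slot duration */
       DESC_BYTES    = 28 };  /* per-slot blockDescriptor size  */

long stream_state_bytes(int minutes)
{
    return (long)minutes * SLOTS_PER_MIN * DESC_BYTES;
}

long disk_schedule_bytes(int hours)
{
    /* one 4-byte integer per slot in the circular disk schedule */
    return (long)hours * 60 * SLOTS_PER_MIN * 4;
}
```

For example, a 10-minute stream needs 1,200 slots of descriptors, giving the 33,600 bytes quoted above, and a two-hour circular schedule occupies 14,400 slots of 4 bytes each.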
For 100-minute streams, this would be 33 MBytes. For the network system to properly apportion credit to the connection for each stream, the playout vector is stored as part of the connection state, so this adds 4 bytes per slot as well. The situation is worse if small sequences are used and a non-zero value of skip is provided in the prepare request. A linked list of fragment descriptors is kept for each sequence that must be retrieved within a slot. This could add as much as 12 * 14 = 168 bytes per slot for the 14 extra sequences with 500 msec slots, 30 frame-per-second video objects, and a sequence size of 1 frame. In this worst case, approximately 200 bytes are required per slot, and this would be 2,400,000 bytes for a 100-minute stream. This consideration in scalability must be taken into account when configuring the CMFS. It should be noted that in order to support such a large number of streams from a single server node, either the number of disks must be large, or the individual stream bandwidth must be small. In the former case, server memory for disk buffers would also be great and the total memory requirements would be very large. In the latter case, it could be that less buffer space is needed to support the variable bit-rate streams at lower bit-rates. Thus, more of the server memory could be allocated to connection state management. Another use of memory at the server is for buffer structures (QueueBuffers) which are manipulated by the network sending process. A pointer to the disk buffer along with offsets is stored for every block that is read off the disk. Thus, this amount of memory has a lower bound. If a buffer is to be shared for more than one slot, then a new QueueBuffer structure is created to indicate the starting and ending offset into the block, since only part of the block is to be sent during that particular slot time. Some obvious optimizations could be made which would reduce the total memory usage, but not by a significant factor. A single-disk server node supporting up to 10 large-bandwidth video streams and 10 associated audio streams (of playback length not more than 10 minutes) would require a modest 672,000 bytes, at a minimum. If small sequences were used, the memory needed for schedule and connection state management could reach as high as 5 MBytes. Although the server is capable of handling many different types of streams at differing bit-rates, it may be advantageous, purely from a performance/cost point of view, to configure systems differently for low bit-rate audio streams of low variability (or constant bit-rates) than for highly variable, high-quality video streams. A server that must store both types of streams could use a hybrid approach. The CBR stream server could have more memory associated with connection state, since fewer buffers are needed to smooth out peaks, while the highly variable video streams could limit the number of simultaneous connections, thereby freeing more space for use as disk buffers.

3.7 Example Client Application

The flexible API of the CMFS allows a client application to arbitrarily combine multiple streams for presentation to the user. In this section, an example is developed which illustrates this flexibility. Some of the details involved in accessing data from different servers are omitted. Consider a video display client with associated audio and text players. The user requests a particular copy of the audio, namely the Japanese language with classical music background, the closed-captioned text in English, and a 30 frame-per-second video object of the news story. The user does not care which copy of the video or text object is retrieved. Sample code fragments in the C programming language are shown in the following description for each major interaction with the server.
The first operation is to identify the server node to which the client first wishes to make contact. Since the client knows the identity of the server with the Japanese audio, this is the first server contacted.

    CmfsInit(jap_admin_addr, ADMINPORT);

Then, the objects associated with the presentation must each have a connection opened for them. Some method of determining the UOIs is required, and then the following open calls are performed.

    if (CmfsOpen(LL_audioUOI, myCallback, &audioPrepBound, &a_cid,
                 localIp, 0) != STREAMOK) {
        printf("CmfsOpen failed for Audio Stream\n");
        return (-1);
    }
    if (CmfsOpen(HL_videoUOI, myCallback, &videoPrepBound, &v_cid,
                 localIp, 0) != STREAMOK) {
        printf("CmfsOpen failed for Video Stream\n");
        return (-1);
    }
    if (CmfsOpen(HL_textUOI, txt_Callback, &textPrepBound, &t_cid,
                 localIp, 0) != STREAMOK) {
        printf("CmfsOpen failed for Text Stream\n");
        return (-1);
    }

If the CmfsOpen call fails because the object does not exist on that server, then a call to CmfsLookup could be performed. The next operation is the prepare of each stream. Assume that it is possible to structure the application as a set of threads which independently request transfer of the presentation object. If vPrepBound is 2.3 seconds, aPrepBound is 1.6 seconds, and tPrepBound is 0.7 seconds, then the following code fragments could be used as the bodies of each mono-media player. Each thread waits a different amount of time before issuing CmfsPrepare so that it can more easily schedule the CmfsRead operations at the proper time.

Audio Thread:

    WakeUpAt(now + 0.7 seconds);
    if ((status = CmfsPrepare(a_cid, scheduletime, STARTOFSTREAM,
                              ENDOFSTREAM, 100, SkipFactor, delay))
            != STREAMOK) {
        fprintf(stderr, "CmfsPrepare failed, status = %d\n", status);
        return (-1);
    }
    do {
        status = CmfsRead(a_cid, (void **)&buf, (int *)&numRead);
        switch (status) {
            /* .... put data into device queue .... */
        }
        /* free data buffer if all data is accounted for */
        if (allDataUsed)
            CmfsFree(buf);
    } while (!done);

Video Thread:

    if ((status = CmfsPrepare(v_cid, scheduletime, STARTOFSTREAM,
                              ENDOFSTREAM, 100, SkipFactor, delay))
            != STREAMOK) {
        fprintf(stderr, "CmfsPrepare failed, status = %d\n", status);
        return (-1);
    }
    do {
        status = CmfsRead(v_cid, (void **)&buf, (int *)&numRead);
        switch (status) {
            /* .... put data into device queue .... */
        }
        if (frameIsComplete)
            sendDataToDisplayDevice();
        if (allDataUsed)
            CmfsFree(buf);
    } while (!done);

Text Thread:

    WakeUpAt(now + 1.6 seconds);
    if ((status = CmfsPrepare(t_cid, scheduletime, STARTOFSTREAM,
                              ENDOFSTREAM, 100, SkipFactor, delay))
            != STREAMOK) {
        fprintf(stderr, "CmfsPrepare failed, status = %d\n", status);
        return (-1);
    }
    do {
        status = CmfsRead(t_cid, (void **)&buf, (int *)&numRead);
        switch (status) {
            /* .... put data into device queue .... */
        }
        /* free data buffer if all data is accounted for */
        if (allDataUsed)
            CmfsFree(buf);
    } while (!done);

The synchronization of these streams could be performed by another group of threads which wait on a barrier, or other synchronization primitive, and then display the appropriate data on the respective device. The primary display client (the Parallax SUN MJPEG client), which has been used for demonstration purposes and for some performance testing, uses the audio device as the master and re-synchronizes the streams once a second. It is important to note that in rewind mode, the sequences are sent in reverse order, but the data in each sequence is sent forwards. This allows some of the contiguity of the placement to be used in disk retrieval. It also permits the server to be unaware of presentation unit boundaries. Although this information is present at the server, the effort involved to retrieve and send the presentation units in reverse order was not considered a wise use of processor time.
In the case of MPEG video data, this would completely confuse any decoder, because it would require the video frames in forward order for proper decoding of the inter-coded frames. It is the client's responsibility to determine in what order to present the video frames to the user. The Parallax video client places the decoded video frames on a software stack as they are received across the network and then pops the stack once an entire sequence has been received.

Chapter 4

Disk Admission Control

The second major contribution of this dissertation is the development of a detailed disk admission control algorithm that explicitly considers the variability in the bandwidth requirements of each presentation object. This algorithm examines both the raw disk bandwidth and the server buffer space available when determining if the server has enough disk and memory resources to retrieve the data stream associated with each new request. This is in contrast to other approaches which consider one of these two resources in isolation, or provide a coarse-grained characterization of the stream bandwidth over time. The disk admission algorithm emulates the disk reading of all the blocks required for the set of streams presented to the server when a new request arrives, and so it is called the vbrSim algorithm [68]. This is a somewhat misleading name, as the algorithm does not perform a simulation, but rather a worst-case emulation of which disk reads would be performed during each slot time. The purpose of this chapter is to examine all aspects of the disk admission control question. Several alternative approaches to disk admission control are presented. They are compared with the vbrSim algorithm in terms of complexity and accuracy of admission results.
Next, a series of performance tests is analyzed which shows that the algorithm performs well on real data that is representative of a News-On-Demand environment with high-quality, full-motion video streams. These experiments expand on the initial findings reported in Makaroff et al. [58]. If the request pattern has a stagger of several seconds between requests, enough streams can be accepted that requests for nearly all of the disk bandwidth can be reserved at the same time. Even in situations where requests arrive simultaneously, the vbrSim algorithm accepts stream requests for up to 20% more bandwidth than the next best deterministic-guarantee algorithm. As a conclusion to the chapter, an analytical discussion of the asymptotic behaviour of the vbrSim algorithm is presented. This shows that the admission performance compared with an optimal algorithm degrades linearly as the estimate of disk performance (minRead) differs from the observed rate of disk performance.

4.1 Admission Control Algorithm Design

There are several possible approaches to disk admission control. They can be deterministic-guarantee algorithms or statistical-guarantee algorithms. Providing a deterministic guarantee ensures that there will be no loss of continuity because the server's resources were over-subscribed. Such admission control algorithms may be too conservative and admit too few streams, thereby under-utilizing the available resources. On the other hand, statistical-guarantee admission policies can typically admit more streams, resulting in better utilization. It is possible that such an algorithm admits too many streams, resulting in over-utilization, which manifests itself as delay or loss of data at the client. Although probabilistic methods exist to amortize the cost to the clients of this failure [7, 87], this is undesirable in general.
This is a tradeoff that must be evaluated when designing a CMFS, and indeed any system that provides quality of service guarantees. Deterministic-guarantee algorithms tend to consider peak bandwidth requirements to prevent overload situations, while the aggressive algorithms use average requirements and summary characterizations. In this section, five distinct approaches to VBR disk admission algorithms are considered. Only four of these can be implemented. Each algorithm represents a class of admission approaches which provide generally similar results. They are examined analytically and quantitatively in terms of admission performance and buffer utilization for a realistic set of stream requests. Three deterministic-guarantee algorithms are considered: Simple Maximum, Instantaneous Maximum, and vbrSim. One algorithm provides a statistical guarantee: Average. The results will show that the vbrSim algorithm can efficiently make correct admission decisions and that its admission performance approaches that of an optimal algorithm. Of the three deterministic algorithms, vbrSim is provably the best in admission performance. In order to accept more streams, significant buffer space is required to accommodate read-ahead. It will also be shown that under realistic server conditions, vbrSim also outperforms the Average algorithm.

4.1.1 Experimental Setup and System Measurements

A set of stream requests submitted to a CMFS as a unit is defined as a scenario. Scenarios can consist of simultaneous request arrivals, in which case all acceptance decisions are made during the same slot time. They may also consist of staggered arrivals, modeling a more realistic workload for a single disk in a CMFS. To analyze the disk admission algorithms, all scenarios considered in this chapter are for streams located on the same disk. The scenarios which are used for testing are described in detail in Appendix B.
When requests are staggered, a uniform stagger is used. This method was chosen partly because it was easy to implement and enforce with the client application software available, but also because it provided the best performance. The benefit of staggered arrivals comes from contiguous reading: with even amounts of time spent reading each stream, the results show that the system is able to read from only one stream at a time until all streams are active. Enough read-ahead is achieved during the start-up time for each stream that no more blocks are needed while the next stream is attempting to catch up. The worst case would be if most of the streams arrived together, with some arriving a long time later. This situation would suffer the seek penalty of having multiple streams start at approximately the same time. It could also have the effect of producing a larger amount of read-ahead for a single stream if the delay between the first and the second stream was long. Although it is unclear what the exact effect of a non-uniform stagger would be on read-ahead, these tests assume that a uniform stagger of n seconds would not be significantly different from staggers randomly or normally distributed with a mean value of n seconds. To understand the relevant differences between the disk admission algorithms, three resources are measured: the CPU cycles used in determining admissibility, the disk read bandwidth, and the number of buffers available. The number of machine instructions required to execute the admission control algorithm is important, because a very accurate algorithm that cannot make a decision in a timely manner is not useful in a real-time system such as the CMFS. The bandwidth measure itself has three components: the first is the bandwidth guaranteed by the system (previously defined as minRead). This estimate is used by all algorithms as the capacity of the disk subsystem.
The second component is the bandwidth requested by the set of stream requests (via CmfsPrepare) that comprise a typical workload submitted to the server. This is measured as the sum of the average bandwidths of each stream, and represents a more realistic measure of the service provided than simply the number of simultaneous users. The third component is the actual bandwidth achieved in the delivery of a scenario. An algorithm is considered to perform well if it can accept a scenario with average requirements that approach or exceed minRead and approach the actual bandwidth achievable. The number of buffers available to the algorithms is limited by the amount of main memory at the server. An algorithm which makes use of significant buffer space is more costly than one which does not, and will reject streams if the buffer requirements exceed the available capacity, even though the disk bandwidth limit may not impose a restriction. Before the differences in each algorithm are described, the common activities within each approach will be identified. In some algorithms, the result of one or more of the steps described may be precomputed off-line and the results stored for use at admission time. Whenever a client makes a prepare request for a portion of a media stream at a particular display rate, a block schedule for the stream is created that contains one entry per slot for the duration of the stream playout. Each entry in the schedule is the number of disk blocks (in this instance, 64 KByte blocks) that must be read and delivered for the stream in that slot to ensure continuous client playout. These values are influenced by the speed and skip parameters of the prepare request. The input for the block schedule calculation is the playout vector that was stored when the stream was written to the server, as well as the start, stop, speed, and skip parameters.
For a constant bit rate stream, each value in the block schedule would be the same (modulo disk block granularity). The values would vary for VBR streams in a manner dependent on the encoding. For instance, Figure 4.1 presents an excerpt of the block schedule from one of our sample streams. The number of blocks in the schedule may actually provide more data than required for a particular slot, because blocks are always read in their entirety to maximize performance. A specific block schedule for an entire stream (Maproom - Raiders) is shown in Figure 4.2. This particular schedule is from a six-minute scene from the movie Raiders of the Lost Ark.

    Slot    1  2  3  4  5  6  7  8  9  10  11  12
    Blocks  2  3  6  6  6  7  6  6  7   7   6   8

Figure 4.1: Typical Stream Block Schedule

Figure 4.2: Stream Block Schedule (Entire Object) [plot of blocks per slot against disk slot number for the entire Maproom object]

As blocks are read from the disk, they are stored into buffers which are then passed to the network for transmission to the clients. The speed at which buffers are filled is dependent on how fast the server reads blocks from the disk (it will be at least as fast as minRead). The speed at which the buffers are freed depends on how quickly the network can transmit the data to the client. This latter speed is itself dependent on the speed of the network and the number of buffers that the client has allocated to receive the data. The network management system is assumed to transmit data only as fast as the client can consume it. The cumulative block schedules are combined into a server block schedule. All server block schedules that have the disk in a state where the requirements do not exceed the resources are said to be valid schedules, corresponding to valid scenarios. This characteristic is independent of whether any of the algorithms admits all the streams in a scenario.
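The construction of a block schedule from the stored playout vector can be sketched as follows. This is an illustrative helper, assuming the playout vector is expressed as bytes required per slot; because blocks are read whole, each slot schedules just enough additional 64 KByte blocks to cover the cumulative byte requirement, which is how a schedule entry can deliver slightly more data than a slot strictly needs.

```c
#include <assert.h>

/* Illustrative block-schedule derivation (names are assumptions). */
enum { BLOCK_BYTES = 64 * 1024 };

void block_schedule(const long *playout_bytes, int nslots, int *blocks)
{
    long need = 0;     /* cumulative bytes that must be delivered */
    int scheduled = 0; /* cumulative whole blocks scheduled       */
    for (int i = 0; i < nslots; i++) {
        need += playout_bytes[i];
        /* ceiling of the cumulative requirement in whole blocks */
        int total = (int)((need + BLOCK_BYTES - 1) / BLOCK_BYTES);
        blocks[i] = total - scheduled;  /* extra blocks this slot */
        scheduled = total;
    }
}
```

A real implementation would also apply the speed and skip parameters before this step; they simply reshape the byte requirements per slot.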
4.1.2 Simple Maximum

The most straightforward characterization of a stream is to reduce the bandwidth requirement description to a single number. In the Simple Maximum algorithm, the maximum number of reads required in any slot is chosen. This is referred to as peak allocation [24, 89] in other research. If the sum of this maximum value for the new stream plus the current sum of the values for the accepted set of streams is greater than minRead, the new stream must be rejected. Using the block schedule in Figure 4.1, for example, 8 would be chosen as the value for the stream. If the current sum was 17 and minRead equaled 23, the new sum of 25 would result in a rejection. A clear advantage of this algorithm is its simplicity. If the variation in the stream's block schedule is small, then this is a reasonable algorithm. In fact, it has been used in several CBR file systems [32, 73, 74]. Another advantage of this algorithm is that it produces deterministic guarantees for reading from the disk. Unfortunately, it significantly under-utilizes the resources as block schedule variation increases, rejecting streams which could be delivered. If the peak is twice the average, as is the case in most of the streams digitized for the performance tests, no requests for total bandwidth greater than 50% of minRead could be accepted. In one particular study [8], twelve video samples were used where the peak-to-mean ratio ranged from 6.6 to 13.4. In such an environment, Simple Maximum would accept a very small number of streams and waste a large amount of bandwidth.

4.1.3 Instantaneous Maximum

The next admission control algorithm considered keeps the sum of all of the currently admitted stream block schedules in a vector called the server block schedule. When a new stream is to be admitted, its block schedule is added to the current server block schedule.
If the value in any slot in the resulting schedule is greater than minRead, the new stream is rejected; otherwise it is accepted. A variation of this algorithm is described in Chang and Zakhor [16]. The following example illustrates this method. Again, assume that minRead = 23. Figure 4.3 shows the current server block schedule and the new stream's block schedule. The entire block schedule for an individual stream is combined with the server block schedule for streams which are already admitted. In this case, the i+2nd slot would have a value of 26, which is higher than the minimum number of blocks the server can read (23). The new stream must be rejected.

    Current Server Schedule     13  15  19   2   7   9   3   6
    New Stream Block Schedule    4   4   7   3   2   0   0   0
    Combined Server Schedule    17  19  26   5   9   9   3   6

Figure 4.3: Server Schedule During Admission

It is possible that delaying the acceptance of the new stream by shifting the block schedule for the new stream into the future by a small number of slots could eliminate peaks that caused a rejection. It is equally likely that this shift could produce a peak of equal or greater magnitude than without shifting. Regardless of the effect on the shape of the resulting server block schedule, allowing such shifting prevents the server node from guaranteeing a bound on the time that CmfsPrepare can take before the client can begin reading. It also increases the worst-case execution time of the admission algorithm by a constant factor, namely the number of slots into the future that the user is willing to wait. This analysis assumes that this number is zero. A complete server block schedule is shown in Figure 4.4. This scenario was a simultaneous request for six of the streams from Table 2.1. This is scenario 101 with the initial set of streams from Appendix B. It is clear that there are many peaks in bandwidth above the average of approximately 22 blocks per slot.
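The two peak-based admission checks described so far can be sketched as follows. These functions are illustrative, not the CMFS implementation: Simple Maximum compares the sum of per-stream peaks against minRead, while Instantaneous Maximum checks the combined block schedule slot by slot and commits the addition only on acceptance.

```c
#include <assert.h>

/* Simple Maximum (illustrative): admit if the sum of per-stream peak
 * blocks-per-slot values stays within minRead. */
int simple_max_admit(int current_peak_sum, const int *sched, int nslots,
                     int min_read)
{
    int peak = 0;
    for (int i = 0; i < nslots; i++)
        if (sched[i] > peak)
            peak = sched[i];
    return current_peak_sum + peak <= min_read;
}

/* Instantaneous Maximum (illustrative): admit, and fold the stream
 * into the server block schedule, only if no combined slot exceeds
 * minRead. */
int inst_max_admit(int *server_sched, const int *stream_sched,
                   int nslots, int min_read)
{
    for (int i = 0; i < nslots; i++)
        if (server_sched[i] + stream_sched[i] > min_read)
            return 0;                        /* reject: slot overflow */
    for (int i = 0; i < nslots; i++)
        server_sched[i] += stream_sched[i];  /* commit the admission  */
    return 1;
}
```

With the Figure 4.3 numbers and minRead = 23, the combined slot value of 26 causes `inst_max_admit` to reject, leaving the server schedule unchanged.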
For the Instantaneous Maximum algorithm to accept the scenario, a minRead value of 32 is required.

    [Figure 4.4: Server Block Schedule — blocks required per disk slot for this scenario, plotted against the cumulative average bit rate]

This algorithm also provides a deterministic guarantee of delivery, and it can do no worse than Simple Maximum, since it performs a more fine-grain evaluation of the schedules. It is still rather conservative and also may reject streams the disk system could deliver.

Theorem 1 The set of streams accepted by Simple Maximum is a subset of the set of streams accepted by Instantaneous Maximum.

Proof. Let the blocks required by stream s in slot k be defined as Blocks_{s,k}. Assume there is some scenario X which is accepted by Simple Maximum, but is rejected by Instantaneous Maximum. Thus, there must be some slot j in which

    Σ_{s=S0..Sn} Blocks_{s,j} > minRead.    (4.1)

Since Simple Maximum accepts the stream scenario, it must be the case that

    Σ_{s=S0..Sn} B_s <= minRead,    (4.2)

where B_s is the peak rate for stream s. But for every slot j, Blocks_{s,j} <= B_s, so

    Σ_{s=S0..Sn} Blocks_{s,j} <= Σ_{s=S0..Sn} B_s <= minRead,    (4.3)

and the Instantaneous Maximum algorithm must also accept the stream, contradicting (4.1). •

The Instantaneous Maximum algorithm is equivalent to the Simple Maximum algorithm if the block schedule used contains the peak rate in each slot, rather than the actual variable bit rate profile. The Instantaneous Maximum algorithm is capable of accepting more streams than Simple Maximum if there is at least one slot where the value in one stream's block schedule is less than the maximum. The actual difference in acceptance behaviour for real streams depends on the specific value of minRead, the bandwidth of the streams in relation to minRead, and the juxtaposition of peaks from some streams and valleys in others.
If the value of minRead is such that a large reduction in the peak is necessary to accept an additional large-bandwidth video stream, then very little performance difference between the approaches will be observed.

4.1.4 Average

One problem with both of the previous algorithms is that they do not take into account the amount of read-ahead possible during slots that require fewer than minRead blocks. If the server is permitted to read data earlier than required, then some of the peaks in a VBR schedule can be smoothed by reading early. One way to utilize the fact that read-ahead occurs is to consider the average blocks per slot for each stream as the bandwidth characterization. If the average is used instead of the peak value, the same process can be used as for the Simple Maximum algorithm. Call this the Average algorithm. This algorithm will admit more VBR streams than the previous algorithms, because the average bit rate is less than the maximum of each stream and also less than the possible bandwidth peaks in combinations of streams. Unfortunately, this algorithm does not provide deterministic guarantees that it will be able to deliver each block on time.

Theorem 2 The set of streams accepted by Instantaneous Maximum is a subset of the set of streams accepted by Average.

Proof. Assume scenario X is accepted by I-Max, but not by Average. For I-Max to accept the scenario, this would require that for all slots k,

    Σ_{s=S0..Sn} Blocks_{s,k} <= minRead.    (4.4)

The cumulative requirement of scenario X over k slots is

    Σ_k Σ_{s=S0..Sn} Blocks_{s,k} <= k * minRead,    (4.5)

since each slot value is less than or equal to minRead. This can be rearranged as

    Σ_{s=S0..Sn} Σ_k Blocks_{s,k} <= k * minRead.    (4.6)

But

    Σ_{s=S0..Sn} Σ_k Blocks_{s,k} = Σ_{s=S0..Sn} k * μ_s,    (4.7)

where μ_s is the average bit rate of stream s. Thus,

    k * Σ_{s=S0..Sn} μ_s <= k * minRead.    (4.8)

Dividing by k,

    Σ_{s=S0..Sn} μ_s <= minRead.    (4.9)

The Average algorithm must also accept this scenario.
•

Many slots will require more bits than the average for a video stream with alternating scenes of varying complexity. Only the most extreme cases would have a small number of very high peaks, and all the remainder below the average. Even accounting for the aggregation into large disk blocks, a large proportion of the slots will likely have requirements above the average. If there is insufficient data buffered for these high-bandwidth periods, the server will be unable to deliver the data to the client. Interestingly, as will be shown later, there are circumstances in which the Average algorithm is also conservative in its admission decisions and thereby under-utilizes the server resources.

There are many variants of the Average algorithm. Some of them take into account the shape and relative occurrence of bandwidth peaks. The characteristic data rate histogram is used by Biersack and Thiesse [8]. Here, the probability of overload is computed and a stream is rejected if the probability is too high. Vin et al. [87] use a Gaussian distribution to model the video trace, and Chang and Zakhor [13] calculate a histogram for the amount of buffer needed per video stream. Even with the more complicated characterizations, admission control is based on calculating an overflow probability and accepting all requests that provide an acceptable level of failure probability. They are all more sophisticated than the simplistic version of Average which is given in this section, because they provide quantitative ways to estimate when and by how much the system will fail to meet its commitments. Since these algorithms consider more than just the simple average, the ability to ensure a low probability of overload indicates that they will be more conservative in admission decisions than Average.

4.1.5 vbrSim

The next algorithm uses both read-ahead provided by buffer space at the server and the detailed disk block schedule for each stream.
The admission process emulates the execution of the disk management system for the duration of the server block schedule. In order to admit more streams, the server reads ahead as fast as it can. By permitting the server to read ahead, a stream can be admitted even if the total number of blocks required within a single slot for all accepted streams is greater than minRead.

Read-ahead Simulation. Consider the block schedules shown in Figure 4.3. Recall that slot i+2 had a value of 26, which is higher than minRead (23). If the server is permitted to read more than the required number, however, some of slot i+2's blocks (6 of them) would be read during slot i. Likewise, in slot i+1, 4 more blocks would be read early. Only 16 blocks then remain which are required in slot i+2. This read-ahead permits the given schedule to be accepted. By the time the server reads the data for slot i+2, it will still be in slot i+1 in terms of real time, assuming the server reads no faster than the guarantee. The extra blocks are held in buffers at the server for transmission at the appropriate time.

If a slot requires more blocks than can be accounted for by both minRead and the accumulated read-ahead, then the scenario must be rejected, as the disk cannot guarantee that all the data will be read off the disk. It may be the case that the disk is capable of reading all the data in time, but a deterministic guarantee cannot be provided. The server, in practice, often reads more than the minimum number of blocks per slot. While this cannot be counted on for the future, vbrSim takes advantage of read-ahead that has already been accomplished in the past when making admission decisions. The algorithm starts with the number of blocks already read ahead and then continues the simulation assuming the server will read minRead blocks in each future slot.
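Ignoring buffer limits for the moment, the simulation just described can be sketched as follows. This is an illustrative Python sketch under the stated assumption of unlimited buffers; the names are not from the CMFS source. The running read_ahead total banks the surplus (minRead minus the slot's requirement) from each simulated slot, and a request is rejected as soon as a peak cannot be covered.

```python
def vbrsim_admit_unbounded(server_schedule, new_stream, min_read, read_ahead=0):
    """Simulate future slots assuming the disk reads exactly minRead
    blocks per slot; read_ahead is the number of blocks already banked."""
    schedule = list(server_schedule)
    for i, blocks in enumerate(new_stream):   # add the new stream's demand
        schedule[i] += blocks
    for required in schedule:
        read_ahead += min_read - required     # surplus, or deficit on a peak
        if read_ahead < 0:
            return False                      # peak cannot be smoothed: reject
    return True

# The Figure 4.3 schedules with minRead = 23: the slot demanding 26 blocks
# is covered by surplus banked in earlier slots, so the request is accepted.
vbrsim_admit_unbounded([13, 15, 19, 2, 7, 9, 3, 6], [4, 4, 7, 3, 2, 0], 23)
```

Passing a non-zero read_ahead models the read-ahead already accomplished before the request arrived, as described above.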
Thus, when a request arrives, there may be many slots in the current server block schedule that have values of zero due to read-ahead.

So far in this discussion, it has been assumed that there is an arbitrary number of buffers. This permits the simulation to read ahead at minRead indefinitely without fear of running out of buffers. Clearly this is not the case, even though a large portion of the server memory is dedicated to a pool of disk block buffers. Therefore, the simulation of unbounded read-ahead stops when there are no free buffers. At this point, blocks can only be read as quickly as buffers are released at the server by transmission across the network.

For purposes of buffer consumption, it is assumed the server reads minRead blocks per slot. As buffers are transmitted on the network, they are freed. The number of buffers being freed is factored into the admissions algorithm. Data is transferred to the client, and the corresponding buffers freed, however, at a rate which depends on the amount of buffer space available at the client and the negotiated bit rate of the network connection. Another vector, called the buffer allocation vector, is maintained for this purpose. This vector is initially the same as the server schedule vector. As data is transferred to the client and buffers are freed, the values in the vector are decremented. Note that there is a one-slot delay in recovering buffers. Buffers containing data to be sent to a client in slot i are not reclaimed until at least the start of slot i+1. It may be the case that buffers are not available until much later than one slot for low-bandwidth streams (i.e., 64 Kbps audio or 20 Kbps video) that have one disk block providing data for many slots. Thus, the admission algorithm simulates the availability of buffers in the actual slot during which they will be released.

For both the server schedule and the buffer allocation vector there is a current slot index.
For the server schedule, this index identifies the next slot to be read. For the buffer allocation vector it indicates the point of division between buffers that have been filled and buffers that will be required for future reads. As well, there is a should be index, which indicates the current real time, in slots, from the beginning of the execution of the server. This index is incremented by one after each slot time. The difference between current slot and should be is the number of slots that the disk system has read ahead. As the server reads a slot, it sets the value in the corresponding entry in the server schedule to zero, showing that the slot has been read. Figure 4.5 illustrates the manipulation of these values. Notice in this example that in the first three slots, the network was able to transmit slightly earlier than required by the playout vector. As a result, five extra buffers were freed and added to the pool.

    [Figure 4.5: Example of Server Schedule and Buffer Allocation Vectors — the slots between the should be and current slot indices have been read ahead and are zeroed in the server schedule]

Figure 4.6 describes the vbrSim algorithm in detail. In this algorithm, newStream is the block schedule for the stream to be added and slotCount is the size of newStream. The total number of free buffers at the start of running the admissions test is given in the variable totalFreeBuffs. The serverSchedule and bufferAllocate vectors are the disk and buffer allocations per slot. The variables shouldBe and currentSlot refer to the should be and current slot indices respectively. For illustration purposes, these vectors are arbitrarily long.

Theorem 3 The set of streams accepted by Instantaneous Maximum (I-Max) is a subset of the set of streams accepted by vbrSim.

Proof. By definition, Instantaneous Maximum accepts any stream scenario where, for every slot j,

    Σ_{s=S0..Sn} Blocks_{s,j} <= minRead.    (4.10)
Assume there is a scenario X that Instantaneous Maximum accepts that vbrSim rejects. Then it must be the case that there is some slot k where Σ_s Blocks_{s,k} > minRead, or there are insufficient buffers to read Σ_s Blocks_{s,k}. The first case is impossible, because then Instantaneous Maximum would have rejected the scenario. Only the second case is possible. This can happen only if the number of blocks read ahead is less than Σ_s Blocks_{s,k} and there are fewer than Σ_s Blocks_{s,k} − ReadAhead buffers available. In other words, there are fewer than Σ_s Blocks_{s,k} buffers available in the system. If the server has read ahead by at least Σ_s Blocks_{s,k}, then no blocks are required to be read and the stream would be accepted. Since Instantaneous Maximum accepts the stream (i.e., Σ_s Blocks_{s,k} <= minRead), any system with at least minRead buffers using the vbrSim algorithm must also accept scenario X. •

    AdmissionsTest(newStream, slotCount)
    begin
        totalReadAhead = 0
        for i = 0 to slotCount do
            serverSchedule[shouldBe + i] = serverSchedule[shouldBe + i] + newStream[i]
            bufferAllocate[shouldBe + i] = bufferAllocate[shouldBe + i] + newStream[i]
        end
        for i = shouldBe to MAX VECTOR SIZE do
            { Note that readAhead may be negative. }
            readAhead = minRead − serverSchedule[i]                — (1)
            if (totalFreeBuffs < minRead) then
                totalReadAhead = totalReadAhead − serverSchedule[i]
            else
                totalFreeBuffs = totalFreeBuffs − minRead
                totalReadAhead = totalReadAhead + readAhead
            if totalReadAhead < 0 then ABORT
            { Buffers released from previous slot. }
            totalFreeBuffs = totalFreeBuffs + bufferAllocate[i − 1]
        end
    end

    Figure 4.6: Admissions Control Algorithm

Theorem 4 The vbrSim algorithm accepts a stream request only if that request is part of a valid scenario.

Proof. For a total scenario that lasts k slots, the disk can read at least k * minRead blocks.
Let a valid state of the disk system be any state where the number of blocks read from the disk is greater than the number required by the schedule. Proof by induction:

Base case: If the disk is idle, the disk system is in a valid state. This is trivially true as long as minRead is greater than zero.

Inductive case: If the disk is in a valid state, then accepting a new request will leave the disk in a valid state only if the new request is supportable by the disk system. Proof by contradiction: Assume that there is an invalid request that is accepted by vbrSim. This means there is a slot for which the bandwidth required from the disk exceeds the system's ability to read. The disk reads Read_k = min(minRead, bufAvail_k) during every slot. Let k be the first invalid slot. For every slot j < k, the admission algorithm adds to the read-ahead by (minRead − Blocks_j). At slot k, the read-ahead must be decremented by Blocks_k − Read_k. Since this is an invalid slot, readAhead_k < Blocks_k − Read_k. Processing the next slot will result in readAhead_k becoming less than zero. But the algorithm clearly states that any time the read-ahead value goes below zero, the new stream request is rejected. Since this is the only possible way that a schedule can be invalid, the algorithm would not be able to accept the new stream request. Thus, only valid requests are accepted by the vbrSim algorithm. •

Buffer Reclamation. If the server reads ahead arbitrarily far, it will use up all of the buffers. This will be the case in the steady state when the system is servicing a set of requests and there has not been a new request for some time. Buffers will build up at the server, because vbrSim cannot accept a stream if the cumulative bandwidth in the future exceeds minRead blocks per slot. If a new request arrives during steady state, the vbrSim algorithm may easily reject clients due to insufficient buffers for the first few slots of the new stream.
A simple approach to this problem would be to always keep in reserve some number of buffers for new client requests. How many buffers to withhold for this purpose, however, would be difficult to determine. Another approach would be to free some number of buffers that contain data with the latest "deadline" (i.e., data that will not be needed for the longest time). The blocks corresponding to these buffers could be added back to the server schedule, and then the admissions test could be repeated. It is easy to see that either freeing too few buffers or freeing too many buffers would cause the admissions algorithm to fail. Freeing too few buffers (in the limit, 0) would not provide sufficient read-ahead on the new stream to smooth out the peaks of its data rate requirements. Freeing too many (in the limit, all of them) would eliminate read-ahead that streams which have already been accepted are relying on to smooth out their peak requirements.

A simple implementation of this scheme would be the following: run the admissions control algorithm with the current number of free buffers (i.e., 0); if it succeeds, then report success. If it fails, move currentSlot back by one and simulate the freeing of the buffers indicated in the buffer allocation vector at that slot, taking buffers from the streams which are read ahead, and try again. If it fails again, then back up another slot and try again. This process could be repeated with some back-off strategy (either linear or exponential) until either the algorithm succeeds or currentSlot has been moved all the way back to shouldBe, in which case the new stream is not admissible.

The simulation of freeing buffers in the future can be done dynamically. This is done by modifying the processing when there are not enough free buffers. The details of this modification are given in Figure 4.7, which replaces the code identified by (1) in Figure 4.6.
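A self-contained sketch of the test with this dynamic buffer-freeing folded in is given below. This is illustrative Python in the spirit of Figures 4.6 and 4.7, not the CMFS code; the vector handling and the one-slot-late release of buffers are simplified assumptions.

```python
def admissions_test(server_schedule, buffer_allocate, new_stream,
                    min_read, total_free_buffs, should_be, current_slot):
    """vbrSim test with buffer stealing.
    server_schedule[i]: blocks still to read for slot i (slots already read
    ahead are zero); buffer_allocate[i]: buffers released after slot i is
    transmitted; current_slot: next slot the disk will read (> should_be
    when the disk has read ahead)."""
    sched = list(server_schedule)           # simulate on copies
    bufs = list(buffer_allocate)
    for i, blocks in enumerate(new_stream):
        sched[should_be + i] += blocks
        bufs[should_be + i] += blocks
    total_read_ahead = 0
    for i in range(should_be, len(sched)):
        # Steal buffers holding the latest-deadline data when short (Fig. 4.7).
        while total_free_buffs < min_read and current_slot > i + 1:
            current_slot -= 1
            reclaim = min(min_read - total_free_buffs, bufs[current_slot])
            total_free_buffs += reclaim
            sched[current_slot] += reclaim  # stolen blocks must be re-read
        read = min(total_free_buffs, min_read)
        total_free_buffs -= read
        total_read_ahead += read - sched[i]
        if total_read_ahead < 0:
            return False                    # schedule infeasible: reject
        if i > 0:
            total_free_buffs += bufs[i - 1] # buffers freed one slot late
    return True
```

With ample free buffers the behaviour reduces to the unbounded simulation; with none free and no read-ahead to steal from, a request is rejected in its first slot.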
If this modified algorithm rejects a new stream, there are not enough buffers to free that would allow this stream to be accepted. The intuition behind this claim is that the new algorithm never wastes disk bandwidth due to a lack of buffers, and at every step aggressively reclaims buffers containing data with the latest deadline and fills them with data with the earliest deadline. If such an approach cannot find enough bandwidth to service the combined set of streams, then no approach can do better. In terms of complexity, this approach is still O(n), since each slot in the schedule is examined exactly once by the read-ahead calculation part of the algorithm and at most once by the buffer-freeing calculation part.

    while (totalFreeBuffs < minRead) and (currentSlot > i + 1) do
        currentSlot = currentSlot − 1
        reclaim = min(minRead − totalFreeBuffs, bufferAllocate[currentSlot])
        totalFreeBuffs = totalFreeBuffs + reclaim
        serverSchedule[currentSlot] = serverSchedule[currentSlot] + reclaim
    end
    read = min(totalFreeBuffs, minRead)
    totalFreeBuffs = totalFreeBuffs − read
    totalReadAhead = totalReadAhead + read − serverSchedule[i]

    Figure 4.7: Modified Admissions Control Algorithm

Let the state of the server be described in terms of the number of buffers read ahead by the disk (Ahead) and the server block schedule (Blocks). The components of the server block schedule are denoted as Blocks_k for every slot k. If there is a configuration (state) corresponding to a possible execution of the server, while reading and delivering the previously accepted streams, in which the new stream can be accepted, then stealing buffers will find an accepting state.
A transition from one state (s) to another state (s′) is accomplished by one of the following actions: (a) reading a block from the disk, (b) reclaiming a block from a stream (and correspondingly adding it back into the schedule), or (c) transmitting a block across the network. All other components of the server configuration are held constant. For the purposes of this discussion of admission approaches, transition (b) is the only transition considered.

Define the set of all possible states of the server to be S. Let r be a new stream request which arrives at the server in some state s ∈ S during slot k. Let the set of states which can accept r be defined as A. These states have enough buffer space and bandwidth to successfully accept and deliver the data for stream r, while still supporting the existing streams.

Theorem 5 A server receives request r while in state s, with (Ahead, Blocks), during slot k. The buffer stealing approach from Figure 4.7, which steals buffers from the end of the schedule, will find an accepting state of the server if an accepting state exists.

Proof. Define two approaches to reclaiming buffers: E (Early) and L (Late), as shown in Figure 4.8. Assume that there is a state α ∈ A which is reachable from state s by only reclaiming transitions, which accepts r, but does not steal buffers from the end of the schedule. Assume also that all states β which are reachable from state s and steal buffers from the end of the schedule are states which reject r. Assume the disk is currently reading at slot j > k. As long as approaches E and L choose the same buffer to reclaim, they will arrive at the same decision. Let m be the first slot at which method E chooses a different block to reclaim, in that E chooses to reclaim a block at slot m, while L chooses a block from the end of the schedule. Method E places the server into state α_i, and method L places the server into state β_i. Obviously, at slot m, Blocks(α_i)_m > Blocks(β_i)_m. The same number of blocks have been freed and the same number of blocks must be read in total. This implies that there exists a slot n > m where Blocks(α_i)_n < Blocks(β_i)_n. Slot n may be different for every rejecting state β.

At this point, each method has the same number of free blocks; the resulting block schedule for E requires Blocks(α_i)_m blocks, while for state β, Blocks(β_i)_m blocks are required. Eventually, method E will reach a state α where the stream can be accepted. If the server can read Blocks(α_i)_m blocks in slot m from state α and accept the stream, then it can also read Blocks(β_i)_m blocks in slot m in some state β (arrived at by method L). It could also read and store Blocks(α_i)_m − Blocks(β_i)_m blocks as read-ahead for slot n, since the operation is on the same disk. Thus, the schedule must also be acceptable from state β. This is a contradiction. Thus, if there is any accepting state α, then method L will find an accepting state. •

    [Figure 4.8: Buffer Reclamation — the original and remaining schedules, showing the slots from which methods E and L reclaim buffers over time]

The actual reclaiming of buffers is not performed at admission time. When the disk blocks are read according to the schedule, each read operation attempts to locate a free buffer. If there are no free buffers, the buffer with the latest deadline is reclaimed. Not all of the buffers predicted for reclamation may end up being used in this way, since buffers may have been released in another way. Transmission may occur more quickly than anticipated, a user may stop playback or close a connection on an unrelated stream, or other similar circumstances may occur. The admission algorithm need only determine that, even in the worst case, sufficient buffers are available to be reclaimed to make the schedule feasible.
4.1.6 Optimal Algorithm

In order to calibrate the algorithms, they are compared to an optimal algorithm that is allowed complete knowledge of the future in making its admission decisions. This algorithm can predict the bandwidth that will be achieved for every slot time in the future and thus can be thought of as performing the same read-ahead simulation as the vbrSim algorithm, but using a different value for minRead in each slot, namely the number of blocks actually read. This will always be equal to or greater than minRead whenever there is sufficient buffer capacity for the set of reads. The set of streams accepted by the optimal algorithm is called the valid set. Of course, it is not possible to realize this optimal algorithm; it is included for comparison purposes only.

4.2 Analytical Evaluation of Disk Performance

There is an arbitrarily large number of possible combinations of stream requests that could be presented to a continuous media file server. They can be characterized in the following way: all possible scenarios, all valid scenarios, and all scenarios accepted by a particular algorithm. This is depicted in Figure 4.9 for the Average and vbrSim algorithms. The scenarios accepted by the Simple Maximum and Instantaneous Maximum algorithms are subsets of those accepted by vbrSim (as proven in Theorems 1 and 3) and so are not shown. This diagram allows us to examine only the qualitative differences¹ between the scenarios accepted by these algorithms. Quantitative comparisons of the admission results are presented in Section 4.4. For the present discussion, assume minRead = 26. The example scenarios are chosen from the streams in Table 2.1. In this section, the name of the algorithm is used to refer to the set of streams accepted by that algorithm.

¹The diagram is not "to scale", i.e., it does not represent quantitative differences.
    [Figure 4.9: Streams Accepted by Admission Algorithms]

There are several cases of inclusion to consider:

Case 1: vbrSim ⊆ Valid. All streams which are accepted by the vbrSim algorithm are valid streams. This was proven in Theorem 4.

Case 2: Average − Valid ≠ {}. The Average algorithm accepts invalid scenarios. A scenario which contains 12 simultaneous requests for portions of six different streams (bloop93, chases, rescue, si-intro, coaches, and aretha) at a speed of 50 is shown in Figure 4.10. Each stream was requested twice, but with different starting points. Six requests started 25 seconds into the stream, while the other six started 75 seconds into the stream. The average bandwidth needed was approximately 26 blocks per slot, calculated by adding up the sums of the individual average bandwidths. This scenario would be accepted by the Average algorithm, but the disk could not support this scenario, as too many blocks were required in the early part of the scenario.

    [Figure 4.10: Simultaneous Requests — Invalid Scenario: scenario bit rate per disk slot with the cumulative average]

Case 3: Valid − (Average ∪ vbrSim) ≠ {}. There are valid scenarios that neither Average nor vbrSim accepts. This occurs if a 7-stream scenario (aretha.avi, bloop93, chases, si-intro, maproom, coaches and boxing) is presented to the CMFS with a 5-second stagger between request times. The sum of the average bandwidths is 27.13 blocks per slot. This is higher than minRead, so the Average algorithm would reject the scenario. The vbrSim algorithm also rejects this set of streams, but the scenario is valid, since the achieved bandwidth is higher than the guaranteed bandwidth.

Case 4: (Valid − Average) ∩ vbrSim ≠ {}. There are valid scenarios which vbrSim accepts while Average rejects.
Consider a two-stream scenario where Stream A has a constant bit rate of 12 blocks per slot for 50 slots and Stream B arrives 3 seconds later and has a requirement of 20 blocks per slot for 20 slots. The Average algorithm rejects this scenario because the average is 32. The vbrSim algorithm accepts this set of streams because it will have achieved read-ahead of (at least) 84 blocks by the time the request for the second stream arrives (assuming sufficient buffer space). The disk will read the new stream only at a rate of minRead blocks per slot for 4 slots, adding to the amount of read-ahead by 6 blocks per slot, because it was only necessary to read 20 blocks in each slot. For the next 7 slots, 32 blocks are needed, so this uses up 6 read-ahead blocks per slot. 108 blocks have been read ahead in the previous slots, however, and these streams can be accepted. In practice, such an exact scenario is unlikely, but possible. There are many similar scenarios in which the read-ahead achieved by contiguous reading on a subset of the active streams will allow a set of streams with average requirements greater than the guarantee to be supported by the disk system.

Case 5: (Valid − vbrSim) ∩ Average ≠ {}. There are valid stream scenarios which Average accepts and which vbrSim rejects. This is observed in many experimental scenarios. The sum of the average bandwidths of the streams is less than minRead, but there is insufficient read-ahead guaranteed early in the scenario. Thus, vbrSim rejects the scenario, but the server is capable of delivering all the data on time, due to greater-than-minRead performance from the disk.

4.3 Disk Admission Algorithm Execution Performance

One of the important factors in the performance of the server is the time consumed by the admission control algorithm. If this is a significant length of time, the additional benefit in accuracy of admission is reduced by the complexity of determining admissibility.
For the Simple Maximum and the Average algorithms, the admission process takes constant time. The single-valued characterization of the new stream is added to the sum for the currently accepted streams and compared with minRead. Success or failure is returned immediately.

For the Instantaneous Maximum and the vbrSim algorithms, there are two components to the admission control process: the construction of the block schedule and the evaluation of the resulting server block schedule. The number of bytes for each presentation unit and each sequence is the input to the algorithm that constructs the block schedule. A large amount of time is consumed in creating the block schedule. The block schedule creation process is linear in both the number of sequences and the number of slots in the stream. Table 4.1 contains the results of timing experiments with block schedule creation for streams that have 30 presentation units per second and 500 msec slots. Therefore, 15 presentation units are required per slot. The execution time results show that sequence size is a large factor in the time required to create the block schedule. Moving from a sequence size of 30 to a sequence size of 1 increases the schedule creation time by a factor of more than 2 for the longer streams.

When the sequences are larger than the slot size (as in the 30-frame sequences), skipping sequences does not significantly affect the time required to create the schedule. On the other hand, with a sequence size of 1 frame the schedule creation time is approximately 50% greater for skip = 1 than for skip = 0.

    Stream Length   skip = 0                skip = 1
    (frames)        Sequence Size           Sequence Size
                    1      10     30        1      10     30
     5000           10     6      6         15     7      6
    10000           18     8.5    7.5       27     10     9
    15000           38     16     14        51     19     15
    50000           70     30     25        97     33     26

    Table 4.1: Block Schedule Creation Timings (msec)

In terms of the computation required to evaluate the server block schedule, both algorithms are similar in complexity.
In the Instantaneous Maximum algorithm, each value in the server block schedule is compared with minRead. This comparison takes O(n) time, where n is the number of slots in the stream block schedule. Every other slot in the schedule was acceptable before this new request was made and does not need to be examined again. The time required to evaluate the schedule in the vbrSim admission algorithm is linear with respect to the number of slots in the entire schedule (i.e., not just the stream block schedule). In particular, the server must check every slot from the end of the new stream forward in time, until there are no pending disk requests, to see if there is a peak which cannot be smoothed. The new stream request may push bandwidth peaks further into the future; it may also create peaks nearer in the future than in the previous schedule. Typically, the request furthest in the future is somewhat sooner than the end of the schedule, so the algorithm is actually linear with respect to the non-zero slot furthest into the future. The server block schedule is compared against minRead and the read-ahead value is updated. Some slots are examined once in order to check the bandwidth requirements, while others are checked a second time to reclaim buffers. The execution time is still O(n).

Table 4.2 shows the results of timing tests for the vbrSim admission control algorithm on PC-Solaris on a Pentium Pro 200. At 30 frames/sec, 108000 frames provide a 1-hour stream. It can be seen that the amount of time needed to perform the admission decision is reasonably small, relative to the slot size. If no other processing in the server were needed, more than 20 requests could be serviced per slot.
  Stream Length   Sequence        skip = 0                skip = 1
  in Frames       Size       Determine    Admit      Determine    Admit
                             Schedule                Schedule
  108000            1           144        27.7         200        31
  108000           10            60        22.7          65        21.7
  108000           30            50        22.7          52        20

              Table 4.2: Admission Control Timings (msec)

These results indicate that the server must take the prepare parameters, the sequence size, and the total length of the stream into account before beginning the schedule calculation. If CmfsPrepare cannot complete before the end of the current slot, the request must be rejected. With current processor technology, a request for a 100,000-frame clip with a sequence size of 1 and a skip value of 1 takes 40% of a slot time just to calculate the block schedule. It would not be possible to accept more than two such stream requests per slot, due to the admission test alone. If there is not enough time to execute the admission algorithm, the server should immediately return a failure status to the client. Alternatively, it is possible to delay the admission test until the next slot, but this conflicts with the general model, which does not include variable start-up latency for delivery of streams. Allowing this possibility could result in a situation in which admission control tests are regularly delayed, so that the bound on the time CmfsPrepare can take (see Section 3.2) cannot be enforced. There is a low probability that multiple prepare requests for long streams will arrive in the same slot or subsequent slots for the same disk. Assuming that the arrival time within a slot is randomly distributed, there is also a small probability that requests for streams of any length will arrive just at the end of the slot, when there is no time to perform the admission test. Thus, if a CmfsPrepare request fails for this reason, it is likely that it will not fail for the same reason in the near future. The client can simply reissue the request.
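A guard of the kind just described could be sketched as follows. This is hypothetical: the cost constants are rough fits to the timings in Tables 4.1 and 4.2, not CMFS measurements, and the function name is an assumption:

```python
# Hypothetical sketch of a prepare-time guard (illustrative; the per-frame
# cost constants are rough fits to Tables 4.1 and 4.2, not CMFS measurements).
# If the estimated schedule-creation time exceeds the time remaining in the
# current slot, the CmfsPrepare request is failed immediately.

# Approximate creation cost in msec per 1000 frames,
# keyed by (skip > 0, sequence smaller than a slot).
COST_PER_1000_FRAMES = {
    (False, False): 0.5,    # skip = 0, large sequences
    (False, True):  1.4,    # skip = 0, sequence size 1
    (True,  False): 0.5,    # skip = 1, large sequences
    (True,  True):  2.0,    # skip = 1, sequence size 1 (worst case)
}

def prepare_allowed(num_frames, seq_size, skip, ms_left_in_slot):
    key = (skip > 0, seq_size < 15)
    estimate_ms = COST_PER_1000_FRAMES[key] * num_frames / 1000.0
    return estimate_ms <= ms_left_in_slot
```

With these constants, a 100,000-frame request with sequence size 1 and skip = 1 is estimated at about 200 msec, i.e. 40% of a 500 msec slot, in line with the discussion above.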
Further analysis in a production-level system is necessary in order to determine the probability of delaying an admission calculation more than one slot. If it is sufficiently small for a highly active system, then the prepare bound can be extended by one slot to accommodate a long schedule creation time. As previously mentioned, the probability that many requests from different clients will arrive in the same slot for the same disk is very small. It is quite possible, however, that a small number of requests for long streams could arrive from a single client application (1 video, 1 audio, 1 text). One possible optimization to reduce the amount of time CmfsPrepare takes for long streams is to precalculate a single version of the stream schedule which corresponds to a prepare of the entire stream at a speed of 100 and a skip of 0. In many environments, this would be the expected mode of interaction. If storing the schedule requires too much disk space, then the schedule could be calculated during the CmfsOpen call, during the time in which the connection parameters are being established. For non-zero values of skip, the number of presentation units retrieved is smaller (in some cases, only a small fraction of the original stream), so the time required to calculate the schedule is shorter and would not impose as great a CPU scheduling problem (assuming relatively large sequences). It is tempting to use part of the precalculated stream schedule in these cases. This would be incorrect, however, because the VBR nature of the stream is unique to every possible start and stop position and every value of speed and skip.

4.4 Performance Experiments

It can be shown that the vbrSim algorithm is optimal when the future performance of the disk is exactly equal to minRead. The proof is given in Section 4.6. This is not the case, however, in a real disk system. Data transfer rates differ depending on many factors.
To begin with, the location on the drive can affect the bandwidth by up to a factor of two, with faster retrieval from the outside of the disk. Additionally, when multiple streams are actively being read, seeking from track to track introduces additional latency and reduces the bandwidth. In this section, several performance experiments are conducted to investigate how the characteristics of the streams presented to the disk admission control algorithm affect the number of simultaneous video users and the bandwidth which is sustainable by the system. Sustainable accepted bandwidth is a more precise measure of system utilization because of the variability in the rates and lengths of the video objects in the samples. Requesting multiple copies of the same object and counting the number of simultaneous users of that object is not reasonable, as the correlation effects of bandwidth peaks would make the system perform very poorly.

In this section, the relative performance of the algorithms is compared. In order to isolate disk bandwidth issues from buffer space, we model a server with an unlimited amount of buffer space. Three specific values for minRead are used, as obtained by calibration in Section 3.6.1: 20, 23, and 26. Those scenarios having cumulative average bandwidth near minRead blocks per slot and which would be accepted by the optimal algorithm are the most interesting to consider, since requests of very low bandwidth are accepted by all algorithms. As well, scenarios that cannot be supported by the disk are not considered, because they will be rejected by all the deterministic-guarantee algorithms. For the vbrSim algorithm, server buffer space is required for the blocks which are read ahead. This requirement is examined separately from the admission performance in terms of guaranteed bandwidth.
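For comparison with the experiments that follow, the two constant-time admission tests can be sketched as below. The names are illustrative, not from the CMFS source:

```python
# Illustrative sketch of the constant-time admission tests (not the CMFS
# source). Simple Maximum characterizes a stream by its peak slot demand,
# Average by its mean slot demand; admission adds the new stream's value to
# the running sum for the disk and compares against minRead.

def characterize(block_schedule, use_peak):
    if use_peak:                                        # Simple Maximum
        return max(block_schedule)
    return sum(block_schedule) / len(block_schedule)    # Average

def single_value_admit(current_sum, new_value, min_read):
    return current_sum + new_value <= min_read
```

Because each stream is reduced to one number, both tests run in constant time per request, at the cost of either severe conservatism (peak) or no deterministic guarantee (mean).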
The results for the tests performed on each algorithm are described in the remaining subsections, after the scenarios used in the tests are themselves characterized. The admission tests confirm that the deterministic-guarantee algorithms perform quite poorly in terms of admission of high-bandwidth scenarios. Only rarely is a request above 50% of the achievable disk bandwidth accepted when using Simple Maximum. Instantaneous Maximum accepts some scenarios with up to a 30% higher cumulative bandwidth request than Simple Maximum. Average accepts larger scenarios than either of these algorithms, but shows a steep decline in acceptance once the request bandwidth reaches minRead. For all types of requests, vbrSim exceeds the other deterministic-guarantee algorithms. For simultaneous requests, Average performs better, but the set of scenarios which Average accepts is not affected by inter-request stagger. When arrivals are staggered, vbrSim accepts requests with much larger bandwidth than minRead, and so outperforms Average.

4.4.1 Scenario Descriptions

For all of the experiments conducted, each disk was loaded with streams ranging from 1.7 Mbps to 7.2 Mbps in average bandwidth and between 1 and 10 minutes in duration. The number of streams required to fill the disk ranged from 9 to 11. The first tests used 9 streams on a disk, which filled the disk, while the remainder of the tests had 11 streams on a disk, because the high-variability streams filled the 2 GByte disk used for the experiments. Each disk configuration utilized between 70% and 90% of the disk capacity. At these bandwidths, up to seven streams can be supported simultaneously off of a single disk without regard for admission control of any kind. When requests were tendered simultaneously, very few requests of seven streams could be supported, but a stagger of as little as 10 seconds greatly increased the achievable bandwidth.
With larger values of stagger, such as 20 seconds, more streams can be supported, but rarely are more than 8 of the streams of this length requesting data simultaneously. In this environment, these large scenarios have sufficiently large values of stagger that all the data for the first stream has been read by the time the request for the last stream is received. Stream scenarios with fewer than four streams never requested more than 50% of the disk bandwidth and were always accepted by the vbrSim algorithm. These scenarios were almost always accepted by Instantaneous Maximum as well. Some scenarios of three streams were barely rejected by Instantaneous Maximum with the lower values of minRead, since some peaks were quite large in some scenarios. This range of stream selection provided scenarios that enabled fair comparisons of the algorithms. Since each scenario used different disk resources, the scenarios were grouped in bands corresponding to the percentage of the disk bandwidth requested. The request band was calculated as the sum of the average bandwidth of each stream in the scenario divided by the actual disk bandwidth achieved during the execution of the scenario. Most of the scenarios had the requests arrive simultaneously, while in other scenarios, requests arrived with a stagger of 5 seconds or 10 seconds.

4.4.2 Simple Maximum

The results for the Simple Maximum algorithm were disappointing, as expected. The tests showed that very few scenarios requesting more than 49% of the achievable bandwidth were accepted for the highest level of minRead. The results are given in Figure 4.11. In fact, no scenario requesting above 30% of the disk bandwidth was accepted with minRead set to its lowest value. With minRead = 26, less than 10% of the requests in the 50-54% range were accepted. These scenarios did request more than 50% of minRead. This usage level results in unacceptable performance for the disk system.
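The request bands used on the horizontal axes of the figures in this section can be computed as follows. This is an illustrative sketch, not the analysis scripts used for the experiments:

```python
# Illustrative sketch of the request-band grouping (not the CMFS analysis
# scripts). A scenario's band is the sum of its streams' average bandwidths
# divided by the disk bandwidth actually achieved while the scenario ran,
# bucketed into 5% bands.

def request_band(stream_avg_mbps, achieved_mbps):
    pct = 100.0 * sum(stream_avg_mbps) / achieved_mbps
    low = int(pct // 5) * 5
    return (low, low + 4)       # e.g. (50, 54) for a 52% request
```

Because the denominator is the bandwidth achieved during that particular execution, the same set of streams can fall into different bands under different arrival patterns.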
An anomaly appears for minRead = 20 and the 30-34% request range. Here, there were no scenarios accepted. That is because of the low number of scenarios in that range. Only three scenarios had that request range, and all three were rejected.

[Figure 4.11: Acceptance Rate - Simple Maximum. Acceptance percentage versus percentage of disk requested, for minRead = 26, 23, and 20.]

4.4.3 Instantaneous Maximum

Significantly more scenarios were accepted by Instantaneous Maximum than by Simple Maximum. The actual peak values for Simple Maximum and Instantaneous Maximum differed by 20-30%, but in terms of admission results, Instantaneous Maximum could accept scenarios at a much higher rate. This almost always permitted an extra stream to be accepted, and this increased the utilization of the disk by 30% or more, as a three-stream scenario could be accepted by Instantaneous Maximum, while only two of those streams would be accepted by Simple Maximum. Figure 4.12 shows the acceptance rates of stream scenarios for the three values of minRead.

An interesting anomaly was observed in the values for Instantaneous Maximum with minRead = 20. As the request level increases one percentage band (from 30-34 to 35-39), the percentage of scenarios accepted actually increased. This was due to the fact that there were very few scenarios in the lower request band. Two of three were accepted (66%), whereas in the next highest band, 13 out of 14 were accepted (93%). Finding more scenarios in the 30-34% request range was difficult, because it would have required moving from three streams to two. This would have reduced the overall bandwidth required by 30%, which would have the remaining streams requesting less than 25% of the disk bandwidth.
Figure 4.12 shows that for low levels of disk utilization, all stream scenarios were accepted. With minRead = 20, no scenarios above 54% disk utilization were accepted, and some scenarios that requested less than 35% of the available disk bandwidth were rejected. With minRead = 26, no scenarios were accepted whose bandwidth request was greater than 69% of available disk bandwidth.

[Figure 4.12: Acceptance Rate - Instantaneous Maximum. Acceptance percentage versus percentage of disk requested, for minRead = 26, 23, and 20.]

4.4.4 Average

The results of admission for the Average algorithm are shown in Figure 4.13. Since all requests that had cumulative bandwidth below minRead were accepted, regardless of their time-varying requirements, the acceptance was expected to drop off suddenly immediately above minRead. This was observed in general. All requests up to 90% were accepted for minRead = 26. This was to be expected, since the actual performance of the disk ranged from 27 to 30 blocks per slot. The graph does not drop completely to 0 in the next request range, precisely because of this range in the achieved bandwidth from the disk.

[Figure 4.13: Acceptance Rate - Average. Acceptance percentage versus percentage of disk requested, for minRead = 26, 23, and 20.]

One scenario out of 7 was accepted which requested 95% or more of the disk bandwidth. The request was 25.9 blocks per slot, and the achieved disk bandwidth was 27.4, which was exactly 95%. The graph for minRead = 20 shows a more reasonable drop-off. All requests below 65% were accepted, and all requests above 75% were rejected.
All the scenarios of these types of streams which the Average algorithm accepted were valid. Using minRead as an estimate with large-bandwidth video objects was a conservative decision, because minRead was calculated with smaller-bandwidth streams requiring a physical seek for every block in every slot. In the case of video streams, 7 or fewer seeks were required, so the access time was dominated more by the transfer time than by the seek time. As well, the initial portions of each stream were located further to the outside of the disk, where transfer rates are faster than on the inside of the disk. Thus, the peaks above minRead in the block schedule could be read by the disk. There is no quantitative indication as to how much buffer space was required by these scenarios. It is obvious from the detailed disk system log that there were slots in which fewer blocks were read than required by the schedule, but it is not certain at what previous time those blocks were indeed read.

4.4.5 vbrSim

The vbrSim algorithm performed much better, as shown in Figure 4.14. It is interesting to note that the same anomaly occurred with minRead = 20 as occurred in the Instantaneous Maximum algorithm. In this case, 14 of 17 scenarios were accepted in the 60-64% range and 16 of 17 were accepted in the 65-69% range. This was due to the small sample size of scenarios. Two of the three rejected scenarios in the lower request range contained three of their 4 streams in common, and the shape of the block schedule did indicate a significant peak approximately 50 slots into the schedule. Below 60%, the vbrSim algorithm accepted all stream scenarios for all selected values of minRead. When minRead = 20, the acceptance rate started to decline in the 60 to 64 percent band, which is very close to the percentage that minRead was of averageRead.
The vbrSim algorithm with minRead = 20 accepted some streams whose disk utilization request was over 70%, due to the combination of read-ahead smoothing in the past (prior to the arrival of some streams) and read-ahead in the future (at minRead). When minRead = 23, almost all scenario requests below 75% were accepted, and a steady drop-off was observed as the requested level of disk utilization was increased. The pattern was similar for minRead = 26. The range of minRead/averageRead is .79 to .96, and some stream scenarios were accepted in the 90 to 94% range. This shows that vbrSim does come quite close to accepting streams with cumulative average bandwidth very near the level of minRead/averageRead. For some of these scenarios, the cumulative average bandwidth required did indeed exceed the value of minRead.

[Figure 4.14: Acceptance Rate - vbrSim. Acceptance percentage versus percentage of disk requested, for minRead = 26, 23, and 20.]

This initial test shows that the vbrSim algorithm is quantitatively superior to the Instantaneous Maximum algorithm. Since the Simple Maximum algorithm performed very poorly, even compared with Instantaneous Maximum, it is not considered in the remaining performance tests in this chapter. When the results from Average and vbrSim are compared directly, it can be seen that for this set of stream requests, Average is very close to vbrSim in terms of admission performance. For minRead = 20, vbrSim rejects some scenarios of smaller bandwidth, but accepts more scenarios in the 70-74% request range. For minRead = 23 and minRead = 26, a similar shape is shown, with Average accepting a somewhat larger percentage of larger scenarios.
The algorithms considered in the remainder of the tests are Instantaneous Maximum, Average, and vbrSim.

Stream Variability

To determine the sensitivity of the disk admission control to variability in the disk bandwidth requirements of the streams, three disks were loaded with different types of streams and the admission results were compared. Each disk contained 11 streams. One disk contained synthetically generated constant-bit-rate streams which had almost all slots containing the same value in the block schedule. The only differences occurred where block boundaries were crossed at the end of a slot in such a way that one slot needed to read one fewer block. The second disk had streams whose coefficient of variation averaged .19. On the third disk, the average coefficient of variation was .32. A list of which streams were in each group is given in Table 4.3. These are hereafter referred to as the low-variability streams and high-variability streams, respectively.

  Low Variability        High Variability
  Aretha                 Football Coaches
  YES30                  Rescue
  Maproom                Joe Greene
  Bloop93                Ray Charles
  Laettner               FBI Men
  Snowstorm              Plan-9
  Twins                  CBC - Country Music
  Basketball             CBC - Fires
  Dallas Cowboys         Tom Connors
  Akira30E               CBC - Clinton
  John Elway

          Table 4.3: Stream Groupings

Multiple scenarios that requested different streams and different numbers of streams were created and then used as input to the CMFS. The scenarios were selected by the following process. First, a size for the scenario was chosen, measured by the number of streams to be selected. Each scenario size had a specific number of scenarios, which were determined using a uniform distribution among the streams located on that disk. Summary results of the selection process are given in Table 4.4. As an example, there were 33 scenarios chosen that consisted of 7 streams.
These scenarios included each stream 21 times, as there were 33 * 7 streams selected with 11 streams to choose from. The selection process chose unique scenarios from the C(11,7) = 330 possible combinations of these streams. The same process was used to select scenarios for each subsequent scenario size. Scenarios of 3 streams or fewer were not considered, for reasons mentioned previously.

  Streams per Scenario    Number of Scenarios
          7                       33
          6                       33
          5                       33
          4                       44

          Table 4.4: Selection of Stream Scenarios

The first test of variability examined the relative performance of the three deterministic-guarantee algorithms on the scenarios of low-variability streams. The admission results for simultaneous arrivals are given in Figure 4.15. They show that the Instantaneous Maximum algorithm rejects many more streams than the vbrSim algorithm for the low-variability streams. The Average algorithm admits more valid stream scenarios, based on percentage of disk bandwidth achieved, than either of the deterministic-guarantee algorithms, but shows the steady drop once the requests become larger than minRead. The deterministic-guarantee algorithms reject more because they can never exceed minRead, either in any slot or in any cumulative manner. The difference in admission performance is even more striking when staggered arrival patterns are considered separately.

[Figure 4.15: Algorithm Performance Comparison - Simultaneous Arrivals. Acceptance percentage versus percentage of disk bandwidth requested, for Instantaneous Maximum, vbrSim, and Average.]

This is shown in Figures 4.16 and 4.17, for staggers of 5 and 10 seconds respectively. When requests are staggered, the Average and Instantaneous Maximum algorithms perform more poorly, while vbrSim accepts scenarios which have a higher cumulative request bandwidth than for simultaneous arrivals.
The two former algorithms decline in performance because the staggering of arrivals increases the disk bandwidth achieved, but has no effect on admitting more scenarios. The performance of vbrSim improves due to its ability to use the read-ahead to smooth out the schedule. It is true that each scenario is longer with stagger between arrivals (i.e., 60 seconds longer for a 7-stream scenario with 10-second stagger), and thus the disk is reading the same number of blocks in a longer period of time.

[Figure 4.16: Algorithm Performance Comparison - 5 second stagger. Acceptance percentage versus percentage of disk bandwidth requested, for Instantaneous Maximum, vbrSim, and Average.]

The Average algorithm does not consider any effect of staggering arrivals, and Instantaneous Maximum is equally likely to have peaks in requirements at any point in the stream. Since it is clear that vbrSim is superior, this test was not performed on the high-variability streams, as Instantaneous Maximum would have given even worse results and Average would be unchanged, while vbrSim would show the same type of improvement.

[Figure 4.17: Algorithm Performance Comparison - 10 second stagger. Acceptance percentage versus percentage of disk bandwidth requested, for Instantaneous Maximum, vbrSim, and Average.]

vbrSim Admission Performance

The results for simultaneous arrivals of stream requests are shown in Figure 4.18. Constant-bit-rate streams that request a cumulative average less than minRead are always accepted. The disk accepts scenarios of low-variability streams with smaller amounts of cumulative disk bandwidth, compared to CBR stream scenarios. Finally, the scenarios with high-variability streams are observed to have the poorest admission performance.
This confirms the intuition that there is some difference in acceptability of scenarios due to the variability in the stream bandwidth requirements. Since high-variability streams have the largest peaks, it makes sense that there should be more of these scenarios that cannot be smoothed by the vbrSim smoothing techniques. With simultaneous arrivals, all scenarios (of every variability factor) that request more than 80% of the disk bandwidth are rejected. With constant-bit-rate streams, all requests below 80% are accepted. This is reasonable to expect, since minRead is approximately 80% of the achieved bandwidth in the execution of these scenarios. The average disk bandwidth was 29.24 Mbps and ranged from 28.3 to 30.5 Mbps.

[Figure 4.18: Stream Variability: Acceptance Rates for Simultaneous Arrivals. Acceptance percentage versus percentage of disk bandwidth requested, for CBR, low-variability, and high-variability streams.]

Low-variability streams do not perform as well with requests below 80%. Approximately 50% of the requests can be supported in the 70-74% range. Even fewer of the high-variability requests are accepted: only approximately 15% of the scenarios of high-variability streams in that request range are accepted. In the 60-64% range and the 65-69% range, there are still a significant number of scenarios rejected. Only below 60% are virtually all scenarios accepted. The more noticeable differences between the types of scenarios are seen when stagger is introduced to the arrival pattern. The acceptance rates of the same scenarios under staggers of 5 seconds and 10 seconds are shown in Figures 4.19
The abili ty to buffer the data when only a single stream is active has the effect of smoothing out the peaks of the remainder of the variable bit rate scenario more significantly than in the simultaneous arrivals case. W i t h stagger = 5 seconds, all of the low-variabili ty stream scenarios below 85% of the disk capabil i ty are accepted, and over half of those between 85% and 89%. This indicates that the admission algorithm is effective in making use of the stagger. It is also the case that, while the peaks are generally of the same size in the simultaneous arrival and the staggered arrival case, they occur later in the scenario. This is because fewer streams are active during the first several seconds of the scenario. There has been more time elapsed for read-ahead, so that the scenario can now be accepted. Th i s addit ional smoothing effect is less pronounced for the high-variability streams as very few scenarios in the 85% to 89% range are accepted. Since both the percentage accepted and the achieved bandwidth are increased by a significant amount, the total bandwidth supportable increases to a level beyond minRead in many cases. For the C B R streams, many scenarios are accepted in the 95% - 100% range. This is because of the effect that the achieved read-ahead in the past has in reducing the overall level of bandwidth required for the remainder of the scenario. B y the time the last stream is admit ted, enough buffer .space has been used to reduce the maximum slot to below minRead. Th is is restricted to streams of moderately short length as have been chosen for this experiment. The stagger would not have any significant effect on very long C B R streams as the average requirement of the remainder of the scenario would st i l l be significantly greater than minRead. W i t h a larger stagger value of 10 seconds, very few stream scenarios were rejected by the vbrSim a lgori thm. 
This is due to the fact that the high disk performance enabled a very large amount of data to be read ahead.

[Figure 4.19: Stream Variability: Acceptance Rates for Stagger = 5 Seconds. Acceptance percentage versus percentage of disk bandwidth requested, for CBR, low-variability, and high-variability streams.]

[Figure 4.20: Stream Variability: Acceptance Rates for Stagger = 10 Seconds. Acceptance percentage versus percentage of disk bandwidth requested, for CBR, low-variability, and high-variability streams.]

The buffer space utilized for this read-ahead is quite significant, but from the bandwidth point of view, the vbrSim algorithm effectively uses nearly all available disk bandwidth. In fact, there are a few scenarios that request over 100% of the average bandwidth achieved, yet are still accepted. The highest actual bandwidth accepted was 37.2 Mbps, which is 161% of minRead (and 117% of the achieved bandwidth in that particular disk scenario). This appears to be impossible at first glance. A closer look at the execution of the scenario shows that this can indeed happen, due to the fact that the bandwidth achieved varies over time. The granularity at which the server measures the achieved bandwidth showed some slots with very large numbers of blocks read. The maximum value observed was 55. Previous performance tests done with sequential reading only have shown that the disk cannot read this fast. Thus, this number measured the number of disk reads completed during the slot, although some of them were initiated in the previous slot. The cumulative average bandwidth measures disk performance more accurately, but has its drawbacks as a performance measure as well. Figure 4.21 shows this in detail for one particular scenario. It can be seen that the cumulative average bandwidth steadily decreases over time, after an initial adjustment.² There are two factors contributing to the decrease.
First, as more streams become active, the amount of seek activity increases, reducing the number of blocks that can be read. Second, the disk bandwidth decreases as the blocks requested are closer to the centre of the drive. Since the large video streams are most often stored contiguously on disks, the bandwidth is smaller for the later portions of the streams, because the blocks are closer to the inside of the disk for all streams. The bandwidth necessary for the earlier part of the scenario is available; later, when fewer streams are reading, less bandwidth is achievable due to this factor. Less bandwidth is needed at that point, however, to keep up with the requirements, because of the large amount of read-ahead previously achieved.

² This is due to measuring inaccuracies, whereby the very first slot may be shorter in duration than all remaining slots.

[Figure 4.21: Observed Disk Performance: Stagger = 10 Seconds. Cumulative bandwidth versus disk slot number.]

When long staggers are used, the variability of individual streams appears to have only a minute effect on the acceptance rate of scenarios. This is because a great deal of smoothing takes place when many slots' worth of data are buffered for the earlier streams. Thus, the earlier streams contribute only a small amount to the load remaining in the schedule. This must, by definition, require a significantly larger amount of buffer space. The detailed analysis of buffer space is considered in the next experiment.

Buffer Space Utilization

In order to take advantage of the read-ahead in any of the preceding scenarios, there must be sufficient buffer space at the server. The buffer space needed is expected to be greater for the high-variability streams in order to smooth out peaks which are above minRead. Additional buffers are used for blocks which are read at a faster rate than minRead, but since all streams have already been admitted, this does not affect the admission process.
As the previous results show, a small amount of stagger with moderately short video streams is enough to increase the accepted bandwidth to nearly the level of actual disk bandwidth when the server is modeled with unlimited buffer space. Most of the increase in bandwidth is due to contiguous reading when only one stream is actively reading, during the catch-up phase immediately after stream acceptance. In this situation, no seeks are required and bandwidth is very high. Some tests showed that the bandwidth was more than 2 * minRead. When the next stream request arrived, the schedule for the existing streams was reduced to 0 for several slots into the future. This reduction served to smooth peaks in the remainder of the schedule.

The amount of buffer space required to accept a scenario was calculated by a static examination of the schedule. The largest contiguous area of the scenario's requirements above minRead is found. The blocks referred to by that area in the scenario schedule above minRead must be in server buffers. Otherwise, the server cannot guarantee the delivery of the blocks to the clients, because the disk can only be guaranteed to read at minRead. This can be seen in Figure 4.22, which shows a small portion of one particular scenario (Scenario 90 with high-variability streams). The rectangle below minRead, where the bandwidth requirement is above minRead, accounts for the blocks which can be guaranteed to be read during those slots. The blocks above the rectangle must be transmitted as well. If they cannot be guaranteed to come from the current set of disk reads, they must have been read earlier, and the transmission is satisfied from the read-ahead buffers.
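The static examination just described can be sketched as a single pass over the scenario schedule. This is an illustrative sketch, not the analysis code used for the thesis:

```python
# Illustrative sketch of the static buffer analysis of Figure 4.22 (not the
# CMFS analysis code). Scan the scenario's block schedule for contiguous runs
# of slots whose demand exceeds minRead; the blocks above minRead in the
# largest such run must already be in read-ahead buffers, since the disk is
# only guaranteed minRead blocks per slot.

def buffers_required(schedule, min_read):
    worst = run = 0
    for demand in schedule:
        if demand > min_read:
            run += demand - min_read    # excess blocks in the current run
            worst = max(worst, run)
        else:
            run = 0                     # run of peaks ended
    return worst
```

For example, with minRead = 23, a schedule fragment of [20, 30, 30, 10, 40] blocks per slot has two runs above minRead, of 14 and 17 excess blocks, so 17 buffers must hold read-ahead data before the larger peak begins.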
[Figure 4.22: Buffer Space Analysis Technique]

The buffer space required for the simultaneous arrivals of low-variability streams and high-variability streams is shown in Figure 4.23. For requests of bandwidth significantly below minRead, a small amount of buffer space was needed. Even larger requests needed only slightly larger amounts of buffer space. The largest amount of buffer space needed by low-variability streams was 75 buffers (5 MBytes), when an average of 22 Mbps (or 97% of minRead) was requested. For scenarios of high-variability streams, the largest buffer request required 90 buffers (6 MBytes) and had an average bandwidth of approximately 20 Mbps. Some requests for larger bandwidth required fewer buffers, due to the shape of the request. The buffer space requirements are not significant when requests arrive simultaneously, as admission into the system is limited by the instantaneous bandwidth required at slots early in the schedule.

[Figure 4.23: Buffer Space Requirements: Simultaneous Arrivals]

In the case of staggered arrivals, the pattern of buffer usage is much different, because the order of reading is significantly changed. Buffers are needed to accommodate the high bandwidth achieved when only one stream is actively being read off the disk. The same static analysis procedure can be used for these scenarios, but it must be adjusted in some cases. The analysis assumes a constant disk reading rate of minRead blocks per slot. With staggered requests, the vbrSim algorithm accounts for the extra blocks read during times when the bandwidth has been more than was guaranteed. Thus, a larger amount of read-ahead is achieved and all the buffers are filled by the time they are needed.
The analysis technique cannot model the effect of this increased bandwidth on buffer allocation without knowing the exact number of buffers read in each slot. An approximation can be performed, because the bandwidth achieved in the early part of the scenario is substantially above minRead. Thus, a value of 4 * minRead is used in the simulation when only one or two streams are actively reading. This is a much higher rate than the disk can read, but it does not simulate reading into buffers which do not exist in the particular server configuration. It simply ensures that all the necessary read-ahead is available before buffers are required for the bandwidth peaks.

The buffer usage for staggered arrivals is shown in Figures 4.24 and 4.25. The figures show that most requests for bandwidth below minRead use a very modest amount of buffer space. As the request bandwidth increases, the buffer space required increases somewhat linearly. Requests with bandwidth greater than minRead use steadily more buffer space, with the maximum buffer space needed being 4491 buffers (287 MBytes) for a scenario which had a staggered arrival interval of 10 seconds and requested 33.5 Mbps (104% of the achieved bandwidth and 150% of minRead). This is an enormous amount of memory. This scenario was comprised of the 7 longest streams from the high-variability streams, and there was a long, substantial peak in the bandwidth required. Since this scenario was accepted, this peak occurred late in the scenario, when a great deal of read-ahead had been achieved. Most of the requests with a 5 second stagger can be satisfied with fewer than 1500 buffers. These requests use close to 100% of the disk bandwidth. For the requests with a 10 second stagger, 3000 buffers is enough for nearly all the scenarios which can be accepted. Again, these scenarios are all those that request less than 100% of the disk bandwidth.
The largest buffer requirement for a scenario that requested less than 100% of the disk bandwidth was 2699 buffers. This scenario requested 99% of the measured disk bandwidth. The scenario which required 4491 buffers had a cumulative request of over 103% of the disk bandwidth.

[Figure 4.24: Buffer Space Requirements: Stagger = 5 seconds]

The graphs thus indicate that a server with substantial, but not exorbitant, memory for disk buffers can accommodate requests that require a very high percentage of the achievable disk bandwidth. Scenarios with larger values of stagger can achieve more read-ahead and take advantage of the transient higher bandwidth that accompanies contiguous reading of the disk, and can even accept scenarios that request more bandwidth than is nominally available.

The scenarios composed of low-variability streams had only slightly different buffer usage patterns than those composed of high-variability streams. All of the accepted requests for bandwidth below minRead required fewer than 200 buffers.

[Figure 4.25: Buffer Space Requirements: Stagger = 10 seconds]

For constant bit-rate streams, requests in the same bandwidth range required even fewer buffers, approximately 2 * minRead, which is the minimum necessary for the server's double buffering.
The scenarios with high-variability streams required more buffers than the low-variability stream scenarios for both the 5 second stagger and the 10 second stagger situations. Those scenarios that requested below minRead blocks per slot required up to 400 buffers with a 5 second stagger and up to 500 buffers with a 10 second stagger, approximately twice what the low-variability streams needed. This is because the size of the peaks is definitely larger, and the duration may be longer, for scenarios with high-variability streams; thus, more buffer space is required to smooth out these peaks.

For requests above minRead, the linear relationship between request size and buffers required continues. Slightly fewer buffers are needed for low-variability streams than for high-variability streams when the request level is just above minRead. It does not appear that there is much difference in the buffer requirements between low-variability streams and high-variability streams when the request level approaches the limit of disk bandwidth. As well, the value of stagger does not seem to cause much change in the number of buffers required for requests of the same size. One of the low-variability requests for 27 Mbps at a stagger of 5 seconds requires approximately 1300 buffers. The requests of similar size with a 10 second stagger require approximately the same number of buffers. A 10 second stagger results in higher achieved disk bandwidth; this enables more streams to be accepted and requires more buffer space for those scenarios. These scenarios would not be accepted at a smaller stagger value.

Client Buffer Space. The next experiment examined the effect of client buffer space on admission performance. The extra buffer space at the client permits the server to send ahead at the maximum rate for a longer period of time, based on the rate-based sender-side flow control outlined in Section 3.4.
The maximum rate is the rate that is established when the connection is opened. Recall that this policy attempts to send data at the maximum rate until the client buffer is full, subject to the availability of bandwidth at the server, and ensures that client buffers do not overflow. In a single-disk server, the network bandwidth is always sufficient. If the server can send at the faster rate for a longer period of time, then more buffers are available at the server for reading the remaining streams. This read-ahead may provide enough smoothing for additional streams to be accepted by the disk.

The client buffer sizes were set at two different values. As mentioned in Section 3.2, the smallest allowable client buffer is the number of bytes required to be transmitted in two consecutive slots. Thus, the values chosen were the minimum required and 32 * minimum. For the medium-rate video streams that were being tested, the actual number of bytes in the minimum client buffer space ranged from 750 KBytes to 4.5 MBytes. Client buffer sizes of 32 * minimum are much larger than can reasonably be provided by the client machines (24 MBytes to 124 MBytes). Reasonably priced client machines, such as set-top boxes, are likely to have memory capacities somewhat smaller than this range, not likely more than 16 MBytes. Therefore, the tests which were performed exercised the limits of reasonable client buffer sizes and beyond.

The scenarios were presented to the CMFS with two values of stagger: 5 seconds and 10 seconds. Simultaneous arrival scenarios were not tested because any send-ahead in the scenario is achieved after all admission decisions have been made, so send-ahead has no effect on the admission decision. The 143 scenarios of low-variability streams and 143 scenarios of high-variability streams were submitted to a CMFS.
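The Section 3.2 minimum-buffer rule mentioned above can be sketched in a few lines (an illustration only; we interpret the rule as the maximum over any pair of consecutive slots, and the function name and sample figures are ours):

```python
def min_client_buffer(slot_bytes):
    # Sketch of the Section 3.2 rule quoted above.  We interpret "the
    # number of bytes required to be transmitted in two consecutive
    # slots" as the maximum over all consecutive slot pairs; this
    # interpretation is an assumption, not the thesis's exact code.
    return max(a + b for a, b in zip(slot_bytes, slot_bytes[1:]))

# Hypothetical per-slot transmission sizes (bytes) for one stream:
print(min_client_buffer([300_000, 400_000, 350_000, 500_000]))  # 850000
```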
Two separate server configurations were used: the first contained 64 MBytes for disk block buffers (1000 buffers), and the second contained 128 MBytes (2000 buffers). The results showed that in every case, exactly the same streams were accepted, regardless of the client buffer configuration. Even very large buffer sizes at the client could not change the acceptance rates. This is because the client buffer is too small to hold a large enough percentage of the stream to make any substantial difference to the server. At a maximum bandwidth rate of 10 blocks per slot, the minimum client buffer is 20 buffers. A client buffer space of 32 * minimum is 640 buffers (40 MBytes). This is a very large amount of buffer space, but still less than 15% of what would be required for a 10 minute stream of this average bandwidth (4800 blocks).

Request Inter-Arrival Time. In this section, the effect of different values of stagger between arrivals of stream requests is evaluated. The previous tests showed that the effective disk bandwidth which could be achieved by the CMFS depended significantly on the amount of contiguous disk reading that could be achieved. Contiguous reading occurs whenever a new stream is admitted, because the data for the new stream has an earlier deadline than the next blocks of data which must be read for existing streams. Thus, data from only the new stream is read for several seconds after acceptance. Data for existing streams is already in buffers at the server as a result of read-ahead. The benefit of having a longer time between stream arrivals is that the steady state of the disk is reached before a new arrival occurs. In the steady state, all streams are equally read ahead a significant amount of time (between 5 and 20 seconds) for a reasonably configured system servicing moderate bandwidth video streams.
For example, Section 4.4 shows that a server equipped with 128 MBytes of buffer space per disk can support approximately 6 or 7 streams at an average rate of 4 Mbps each, depending on the shape of the server schedule. That translates into approximately 30 to 40 seconds of read-ahead per stream. The configuration and arrival pattern that would allow the disk to always read at the maximum transfer rate would be a server with unlimited buffer space and arrivals such that the entire stream was read into server buffers before the next request arrived. The disk would never have to perform a seek operation during a slot.

As a concrete example, consider video streams with a bit rate which averages 4 Mbps. On a disk which can read at 5 MBytes per second, the transfer rate is 10 times the required rate of playback. Thus a 5 minute stream can be read off disk in 30 seconds, totaling 150 MBytes of data. It would occupy buffer space corresponding to the remaining 270 seconds of playback. This 270 seconds would require 135 MBytes, because the data for the first 30 seconds of presentation would have been transmitted during that period of time. A system would be capable of supporting 10 simultaneous streams (the theoretical maximum) with arrival staggers of 30 seconds. The buffer space required would be at its maximum immediately after all 10 streams have been read. Since the earlier streams will have transmitted most of their data, the occupancy can be calculated by multiplying 15 MBytes times the number of 30-second intervals that remain in the transmission for each stream. (Footnote: The steady state is when the reading rate is limited by the number of buffers returned to the system as a result of transmission.) The total buffer space required to support such a retrieval pattern is 675 MBytes. This is more than 30% of the capacity of the disk used in this server configuration. It is not reasonable to devote so much memory to server buffers.
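The arithmetic in this example can be checked directly (a quick sketch; all units and figures are those quoted above):

```python
# Re-deriving the example's figures: 10 five-minute streams at 4 Mbps,
# arriving every 30 s, on a disk reading 5 MBytes/s (10x playback), so
# each 150 MByte stream is read in 30 s and each 30 s of playback is
# 15 MBytes.  Just after stream 10 is fully read, stream i (1-based)
# still holds (i - 1) thirty-second intervals in server buffers.
occupancy_mbytes = sum(15 * (i - 1) for i in range(1, 11))
print(occupancy_mbytes)  # 675, matching the text
```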
Since buffer space is likely to be limited to a much smaller fraction of the disk capacity, the disk system will reach the steady state because moderate length streams cannot be stored in their entirety in server buffers. If steady state is reached very quickly and each stream still has most of its data resident on disk, then all the benefit of read-ahead with limited memory at the server can be achieved with a small value of stagger. In this case, increasing the arrival stagger will not enable more streams to be admitted. If steady state is achieved more slowly, then an increased stagger will allow reading to continue based only on the limitations caused by seek activity. More streams may be accepted with a longer stagger.

The next set of experiments used three stagger values: 5 seconds, 10 seconds, and 20 seconds. These values were considered reasonable because they provide a substantial amount of time between user requests. Smaller values of stagger were not considered because there would be fewer than 10 slots during which read-ahead could take place. Longer stagger values were not considered because, according to the previous analysis, a 5 minute stream can be read in 30 seconds, but requires over 128 MBytes of buffer space. Since this is more than the memory that was available in the hardware configuration, it seems certain that steady state will occur before 30 seconds of contiguous reading. Several server configurations were used in the experiments as well, in an attempt to see how the total buffer space at the server affected the ability of stagger to influence acceptance decisions.

The streams comprising the scenarios were grouped according to the length of playback time to see if shorter streams were able to take advantage of the increase in stagger more than long streams. The short streams ranged from 50 seconds to 3 minutes in length, while the long streams ranged from 6 minutes to 10 minutes in length.
Ten randomly selected scenarios were used from the short streams, as well as ten scenarios from the long streams. Tables 4.5 and 4.6 show the admission results for the short streams and the long streams respectively. The results show that, for the short streams, increasing the length of the stagger allowed more scenarios to be accepted. With minimal client buffer and 64 MBytes at the server, moving from a 5 second stagger to a 20 second stagger allowed every scenario to have at least one more stream accepted. With 128 MBytes at the server, only 3 of the 10 scenarios could be accepted at a stagger interval of 5 seconds, while with 20 second staggers, all scenarios were accepted.

                         64 MB Server Buffer              128 MB Server Buffer
 Scen Stms  B/W      B/W Acc         Stms Acc        B/W Acc         Stms Acc
      Req   Req    5s   10s   20s   5s 10s 20s     5s   10s   20s   5s 10s 20s
  1    7   34.5   24.8  24.8  34.5   5   5   7    24.8  29.2  34.5   5   6   7
  2    7   37.6   27.1  33    33     5   6   6    33    37.6  37.6   6   7   7
  3    7   37.4   26.9  32.7  32.7   5   6   6    26.9  37.4  37.4   5   7   7
  4    6   33     27.1  27.1  27.1   5   5   5    27.1  33    33     5   6   6
  5    6   31.7   27    31.7  31.7   5   6   6    31.7  31.7  31.7   6   6   6
  6    6   31.6   26.9  26.9  31.6   5   5   6    31.6  31.6  31.6   6   6   6
  7    6   31.7   26.1  26.1  26.1   5   5   5    26.1  31.7  31.7   5   6   6
  8    5   27.5   27.5  27.5  27.5   5   5   5    27.5  27.5  27.5   5   5   5
  9    4   21.2   21.2  21.2  21.2   4   4   4    21.2  21.2  21.2   4   4   4
 10    7   34     29.3  29.3  34     6   6   7    29.3  29.3  34     6   6   7

Table 4.5: Short Streams - Admission Results - Staggered Arrivals

For long streams, there was no difference in acceptance in any of the scenarios.

                         64 MB Server Buffer              128 MB Server Buffer
 Scen Stms  B/W      B/W Acc         Stms Acc        B/W Acc         Stms Acc
      Req   Req    5s   10s   20s   5s 10s 20s     5s   10s   20s   5s 10s 20s
  1    7   29.1   24.2  24.2  24.2   6   6   6    24.2  24.2  24.2   6   6   6
  2    7   31.4   24.1  24.1  24.1   5   5   5    28.2  28.2  28.2   6   6   6
  3    7   25.9   22.2  22.2  22.2   6   6   6    22.2  22.2  22.2   6   6   6
  4    6   25.8   22.3  22.3  22.3   5   5   5    22.3  22.3  22.3   5   5   5
  5    6   25.8   22.1  22.1  22.1   5   5   5    22.1  22.1  22.1   5   5   5
  6    6   22.1   22.1  22.1  22.1   6   6   6    22.1  22.1  22.1   6   6   6
  7    6   25.6   20.0  20.0  20.0   5   5   5    20.0  20.0  20.0   5   5   5
  8    5   21.1   21.1  21.1  21.1   5   5   5    21.1  21.1  21.1   5   5   5
  9    4   14.2   14.2  14.2  14.2   4   4   4    14.2  14.2  14.2   4   4   4
 10    7   30.1   25.9  25.9  25.9   5   5   5    25.9  25.9  25.9   5   5   5

Table 4.6: Long Streams - Admission Results - Staggered Arrivals

This is not particularly surprising. A 128 MByte server can hold approximately 4 minutes of video data if the display rate averages 4 Mbps. This can be read in 24 seconds. Thus, most certainly, a stagger of more than 20 seconds would not influence admission. When a second stream of similar bandwidth profile is added, the buffer space will be split among the streams, giving 2 minutes to each stream. Thus, the second stream can read at maximum for 12 seconds, stealing buffers from the original stream to do so, until both streams have been read ahead 2 minutes. This process of sharing the server buffer space continues as additional streams are added. When 6 streams are active in the steady state, between 30 and 40 seconds of data are stored in server buffers per stream. This amount can be read in between 3 and 4 seconds at maximum transfer rates. If a new stream were accepted in this state, it would catch up to the existing streams in terms of read-ahead in about the same amount of time. From then on, the reading rate would be limited by buffer space considerations. For a 64 MByte server, the playback lengths of data which can be stored at the server must be divided by 2, so steady state is achieved much earlier.

Why, then, is there a performance difference when short streams are submitted? This is a case where a large percentage of the stream can be read contiguously and stored at the server in buffers. In a 128 MByte server, all of a 3 minute stream can be stored. It takes approximately 18 seconds to read the entire stream.
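The figures just quoted follow from simple arithmetic (a sketch using the chapter's assumed 4 Mbps average rate and roughly 5 MBytes/s of contiguous disk bandwidth):

```python
# Checking the 3-minute-stream figures above (units as assumed in the
# chapter): 180 s at an average of 4 Mbps is 90 MBytes, which fits in
# a 128 MByte buffer pool, and at roughly 5 MBytes/s of contiguous
# disk bandwidth takes about 18 seconds to read in full.
stream_mbytes = 180 * 4 / 8   # Mbit -> MByte
read_seconds = stream_mbytes / 5
print(stream_mbytes, read_seconds)  # 90.0 18.0
```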
Thus, a stagger value of greater than 18 seconds would be superfluous, because there would be no more data to read before the next request arrived, during the time the server is lightly loaded. When two short streams (i.e. 3 minute streams) share the server buffer space, 2 minutes of each stream can be stored, as in the previous case. So far, the analysis is identical to the previous case. A reasonable amount of data is left to be read for each stream. It changes, however, when 5 or 6 streams are read and a short stream request (less than 1 minute in duration) is submitted late in the scenario. The data for a 60 second stream can be read in 6 seconds. A scenario with 6 existing streams can buffer up to 40 seconds' worth of data per stream. With a small stagger of 5 seconds, all 6 streams are introduced in 25 seconds and occupy all server buffer space. The new stream would arrive after 30 seconds. The new stream can be guaranteed to read at approximately 6 times the playout rate (23/4 ≈ 6), so it would take about 7 seconds to read 40 seconds' worth of data, and thereby catch up to the existing streams. At this point, all 7 streams occupy the server buffer space. As well, all 7 streams have a considerable amount of data left to read. The first 3-minute stream has read at most 70 seconds' worth of data, 30 of which have been transmitted and 40 of which have been buffered; there are still 110 seconds of data left to read. It may be the case that the 7th stream cannot be accepted due to the overall bandwidth still required in the future of the scenario.

If this stagger is increased to 10 seconds, then the first 6 streams are not all completely active until 50 seconds into the scenario. The earlier streams have transmitted approximately twice as much data by the time the 7th stream arrives at 60 seconds into the scenario.
While this is insignificant for a 10 minute stream, for a 3 minute stream it means that over half the first stream has been read and/or transmitted, as only 80 seconds remain to be read. The data in the remainder of the schedule is much less than in the case where the stagger was 5 seconds. Shorter streams have an even greater percentage of their data already read off disk. This affects the size and duration of the remaining peaks, so that in more of the scenarios, the 7th stream can be accepted.

4.4.6 Summary

In this section, quantitative performance differences between the algorithms were identified. As well, the effects of different traffic patterns on the admission performance and buffer space requirements were carefully examined with respect to the vbrSim algorithm.

The difference between the performance of the deterministic-guarantee algorithms depends on the characteristics of the streams which are submitted to the CMFS. Performance tests on real streams have shown clear quantitative differences. Streams with very low variability which are requested simultaneously produce bandwidth schedules which are devoid of significant peaks. In this situation, with simultaneous arrivals, all three algorithms accept almost the same scenarios. Since the disk is capable of higher performance, all give conservative decisions compared with the optimal algorithm. With variability in the stream bandwidth profiles, there is a substantial difference in acceptance behaviour. The vbrSim algorithm accepts approximately 20% more bandwidth than the Instantaneous Maximum algorithm for mixed-variability streams.

Valid scenarios were rejected by the Average algorithm, due to the amount of contiguous reading that was achieved by the disk system.
When the average achieved bandwidth was much greater than minRead, Average provided conservative admission decisions, while vbrSim could accept scenarios that requested substantially more than minRead blocks per slot. With 10 second staggers between arrivals, vbrSim accepted scenarios that requested nearly 150% of minRead, since the read-ahead achieved in the past was incorporated by the algorithm.

There is a reasonable percentage of valid scenarios which vbrSim rejects but which request below minRead in terms of total bandwidth, especially when the requests arrive simultaneously. If the Average algorithm is based on minRead, then the disk system can support such scenarios, because disk performance is greater than minRead, as relatively few seeks are required. There is no read-ahead achieved above minRead between arrivals for vbrSim, so the upper bound on acceptance is minRead. The Average algorithm accepted all scenarios with a request below minRead.

With simultaneous arrivals, the most bandwidth could be sustained with CBR stream requests, and the admission performance degraded as the variability increased. When stagger was introduced to the arrival pattern, the difference in admission performance between stream types became negligible. Buffer space requirements grew linearly with the size of the request for scenarios that requested more than minRead blocks per slot. For requests slightly below minRead, the high-variability streams needed approximately double the buffer space that low-variability streams required. The experiments designed to test the effect of client buffer space showed no effect on admission performance. Increasing the inter-arrival time permitted more short streams to be accepted, thus elevating the sustainable accepted bandwidth from the disk system.

4.5 Full-length streams

All of the tests performed in this chapter were performed with short streams which are typical of a News-On-Demand environment.
It is reasonable to assume that a Video-On-Demand environment, with feature-length video streams of an hour or more in duration, would place somewhat different demands on a continuous media file server. The general shape of the bandwidth schedules of each individual stream would likely be the same, regardless of clip length. When long streams are combined into scenarios, the scenarios have the same general shape. Figure 4.26 shows a selection of 3 minutes of a bandwidth schedule of a scenario with simultaneous arrivals. There are a number of peaks that are of moderate size and duration.

[Figure 4.26: Short Stream Scenario]

In Figure 4.27, a 2000 second (33 minute) scenario is presented which is comprised of the same streams as in Figure 4.26, but concatenated together. A three minute selection of the bandwidth requirements is shown in Figure 4.28. The peaks are of relatively the same size and duration. The three-minute selection taken from the full-length streams was analyzed for buffer space requirements. This segment would require 2462 buffers to smooth out the peaks, which is similar to what is required without repetition. Unfortunately, from the buffer space point of view, the scenario does not end at this point. In particular, the entire half hour of the scenario required 36527 buffers (2.22 GBytes). Since the graph of the entire schedule does not seem to have many large peaks, it must be the case that the peaks are of longer duration. This scenario indicates that, while the bandwidth is not significantly different between long scenarios and short scenarios, the buffer space requirements may be extremely large for scenarios that have requirements somewhat above minRead, but possibly below the actual performance of the disk.
This scenario could not be executed on the server, due to the size limitation of the disk and the lack of a testing client flexible enough to masquerade this loop as a single request. Therefore, it is unclear whether the scenario could be supported, but the average bandwidth of the scenario is approximately 30 blocks per slot, and most scenarios achieved close to 30 blocks per slot. This one data point is not enough to provide conclusive evidence regarding the characteristics of long streams in general.

[Figure 4.27: Looped Scenario]

[Figure 4.28: Short Stream Excerpt]

In the moderate length streams, several tests showed that an overall request pattern that is above minRead can be accepted if enough stagger is introduced to the arrival pattern so that the remaining portion of the schedule can be smoothed. Most of the slots during which all streams are active will have bandwidth requirements above minRead for only a portion of the scenario length. The read-ahead achieved during the start-up of the scenario may often be enough to smooth out peaks for a schedule of 10 minutes or less. With long streams, a scenario that has a cumulative bandwidth request above minRead may keep the bandwidth over minRead for many minutes. This would result in a need for a massive number of buffers to supply the long period of over-subscription. The simple extrapolation from this one scenario indicates that the ability to read scenarios with cumulative requests above minRead is not sustainable for long scenarios.

Admission performance with long streams has not been evaluated, due to limitations in the capacity of the disks. Storing a one-hour video stream would use more than an entire 2 GByte disk at the bit rates of the streams being studied. Simulations can give some intuition, but an enhanced hardware environment is necessary to provide more definite conclusions. The behaviour of long streams is included as a potential area of further study.

4.6 Analytical Extrapolation

The vbrSim algorithm has exactly the same performance as the optimal algorithm when the observed disk performance (actualRead) is exactly equal to minRead for every slot. If disk performance is not constant, then the admission decisions are conservative, because the actual disk bandwidth is greater than the worst-case estimate. As the difference between minRead and actualRead increases, the admission decisions of vbrSim diverge from those provided by the Optimal algorithm. The purpose of this section is to provide some analytical discussion of the relationship between this difference and the admission performance of vbrSim. Note that actualRead varies from scenario to scenario, but is within a defined range for a particular type of stream on the disks used in the experiments.

There are 3 cases to consider: 1) actualRead = minRead, 2) minRead approaches 0, and 3) 0 ≪ minRead < actualRead. The first two cases are the extreme boundary conditions and would never be true in a real server. They provide the limits for the algorithm.

• Case 1: The disk performance can be characterized by a single number, the number of blocks read per slot. Define this number to be N. The vbrSim algorithm simulates the reading of N blocks in every slot k. Let the bandwidth requirements of the disk in each slot be defined as req_k for all k. At the end of every slot, the read-ahead is appropriately adjusted as in Figure 4.6. If the read-ahead value is always greater than 0, the set of streams can be accepted.
Since the disk will always read N blocks when buffers are available, the read-ahead prediction is exact and no additional read-ahead can be achieved. If the read-ahead drops below 0, the new stream must be rejected. The Optimal algorithm would, however, make the same decision.

• Case 2: If the value of minRead is sufficiently low, then no streams can be accepted by the vbrSim algorithm. Choose minRead = 0. The simulation of the first slot adjusts the read-ahead by 0 - req_k. This is negative for the first slot where req_k > 0, and would result in rejection of the new stream.

• Case 3: 0 ≪ minRead < actualRead. Consider a server which has B buffers for disk blocks and is servicing moderate length, high-bandwidth video streams with an average bandwidth of S_av blocks per slot. The experimental results show admission performance for a situation in which actualRead ≈ 1.5 * minRead.

With simultaneous arrivals, the admission performance is determined by bandwidth alone. No scenarios that request over 80% of the achieved bandwidth are accepted by vbrSim. With minRead = 23, this is approximately 21.5 blocks per slot (or 21.5 Mbps). Since the average bandwidth of the streams (S_av) is approximately 4 Mbps, the system can accept between 4 and 6 streams simultaneously. If actualRead were much larger than minRead, the number of streams accepted would be identical (as would the total bandwidth accepted), but the percentage of disk utilization would decrease linearly.

With staggered arrivals, the acceptance decision is determined by a combination of bandwidth and buffer space limitations. The experimental environment has scenarios where the maximum acceptable bandwidth is equivalent to 100% of the disk bandwidth, given moderately large buffer space. The approximate number of streams acceptable in this situation is actualRead/S_av = n streams. The amount of buffer space per stream is B/n = b buffers.
This buffer space contains the data for an average of b/s_av slots. If the observed bandwidth is doubled, but minRead remains the same, there may be a slight change in the number of streams accepted. As achieved bandwidth increases, the amount of time necessary to fill the available buffer space decreases, allowing the same number of streams to be accepted at smaller stagger values.

Under what circumstances could an additional stream be accepted? An additional stream could be accepted if the new schedule had an overall bandwidth requirement less than minRead for the remainder of the schedule. That is, totalRequirements - B < minRead × scheduleLengthInSlots. As more streams (of the same approximate length) are accepted, only totalRequirements changes. If totalRequirements does not change drastically, the new stream may be acceptable. This implies that the new stream must be of a short duration or low bandwidth (i.e., anything that creates a smaller object).

In the steady state, each stream has approximately b/s_av slots' worth of data in server buffers. The value of b = B/n decreases inversely with n. Thus, less of each stream is in server buffers and more remains to be read at the time a new request arrives, compared with a smaller value of n.

Consider the following example. Let t be the average length in slots for each stream. If each stream in a particular scenario has an average bandwidth of 4 blocks per slot (s_av), and there is 128 Mbytes of server buffer space (B = 2000), approximately 500 slots of data can be stored in server buffers. If the length of the stream averages 5 minutes (t = 600), then each stream is composed of 2400 blocks. With current disk speeds and reasonable stagger (less than 10 seconds), Section 4.4 shows that the system can accept between 6 and 7 streams of this size, providing for an average of 500/7 = 71 slots of read-ahead per stream.
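The acceptance condition just stated can be transcribed directly. The function below is a hypothetical sketch whose names mirror the text; when totalRequirements counts only blocks not yet read into server buffers, the B term can be dropped by passing 0.

```python
def may_accept(total_requirements, b_buffers, min_read, schedule_len_slots):
    """Admissibility test from the text: the blocks that cannot sit in
    server buffers must be readable at the worst-case rate over the
    remainder of the schedule."""
    return total_requirements - b_buffers < min_read * schedule_len_slots
```

With the worked numbers used later in this section (16920 remaining blocks over 600 slots, minRead = 23) the 8th long stream fails the test, while the short-stream case (6640 blocks over 300 slots) just passes.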
These 71 slots allow slightly over 12% of the data to be stored in server buffers. There are 14400 total blocks of data to be read for 6 streams and 16800 for 7 streams. After 6 streams have been accepted with a 10 second stagger between them, approximately 13000 blocks remain to be read. The number of blocks which would have been transmitted is:

sum_{i=1}^{6} 20 × 4 × (i - 1) = 80 × sum_{i=0}^{5} i = 1200

If 7 streams were successfully accepted, then about 15000 blocks would remain immediately after the 7th stream was accepted. Adding an 8th stream of similar characteristics at a 10 second stagger would inject another 2400 blocks into the schedule and impose a schedule length of 600 slots. In the meantime, 480 additional buffers would have been transmitted. Thus, the new schedule contains 16920 blocks for the next 600 slots, for an average requirement of 28.2 blocks per slot. The 8th stream cannot be accepted, no matter how high the actual disk performance was in the past. So, if the disk performance is doubled, then the percentage of the disk achievable is divided by 2. Disk utilization becomes a decreasing linear function of actualRead/minRead.

For shorter streams, this situation is slightly different, since a larger percentage of the stream can be buffered at the server node. Using the same analysis as in the previous paragraph, 71 slots out of 300 are buffered for 2.5 minute streams (24% of required data) after 7 streams have been admitted with a 10 second stagger between arrivals. 5920 blocks remain for the existing streams. An additional stream request 10 seconds later means that 5920 + 1200 - 480 = 6640 blocks remain for the next 300 slots. The average requirement is 22.1 blocks per slot, just enough to be accepted.

The effect would be greater as the average playback length of the streams decreases. If streams are 2 minutes long, 48% of the data is held in server buffers. Almost 10 streams can be accepted.
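The arithmetic above can be checked mechanically. This snippet re-derives the text's rounded figures (a 0.5 second slot time is assumed, so a 10 second stagger is 20 slots):

```python
stagger_slots, s_av = 20, 4   # 10 s stagger at 0.5 s slots; 4 blocks/slot

# Blocks transmitted by the time the 7th request arrives (6 streams):
sent = sum(stagger_slots * s_av * (i - 1) for i in range(1, 7))

# Long (5 minute) streams: ~15000 blocks remain for 7 streams; an 8th
# stream adds 2400 blocks while 480 more are sent during the stagger.
avg_long = (15000 + 2400 - 480) / 600     # blocks/slot over 600 slots

# Short (2.5 minute) streams: 5920 blocks remain after 7 admissions.
avg_short = (5920 + 1200 - 480) / 300

print(sent, avg_long, round(avg_short, 1))   # 1200 28.2 22.1
```

The long-stream average (28.2) exceeds minRead = 23, so the 8th stream is rejected; the short-stream average (22.1) just fits.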
This shows that a small increment in the number of streams is possible for short, but high-bandwidth video streams for very large values of actualRead. Thus, the amount of bandwidth and the number of streams accepted by vbrSim can increase a small amount if the actual performance of the disk (actualRead) increases from just above minRead to a great deal more than minRead for certain types of streams. These are the smaller and shorter streams which can have most of the stream data in buffers when a new stream arrives, and thus, the remaining requirements with the new stream added are acceptable by vbrSim. If the streams are large with respect to the buffer size, or have long playback duration, increasing actualRead cannot increase the acceptance by vbrSim. When actualRead increases, more scenarios become valid, and the Optimal algorithm would accept them. Thus, the percentage of the valid scenarios that vbrSim can accept decreases linearly with the increase in actualRead.

Chapter 5

Network Admission Control and Transmission

The second major resource in limited supply in a CMFS is network bandwidth. To manage the bandwidth, a method of reserving bandwidth to prevent overflow while permitting high utilization is desired. In this chapter, the server flow-control mechanism which permits the server to push data to clients without requiring explicit feedback is described in detail. An admission algorithm is developed for the network system that ensures that the requirements of the streams never exceed the capacity of the network interface of an individual server node. In order to have such an algorithm, it is necessary to provide a measurement of the network bandwidth requirements over time. The development and integration of the network characterization and admission control algorithms is the third major contribution of this dissertation.
This measurement of network bandwidth requirements is similar to, but distinct from, the measurement of disk bandwidth for reasons which will be described later in the chapter. The options available to solve the network bandwidth allocation problem are more restricted than for the disk, as the network bandwidth estimate is a fixed upper bound (maxXmit) on performance, while the disk admission utilizes a fixed lower bound (minRead) on performance. It is not possible to send data at a faster rate than maxXmit. The concept of send-ahead cannot be used to take advantage of excess system bandwidth to the same degree as read-ahead was used for the disk.

Also in this chapter, bandwidth smoothing techniques are introduced which alter the resource reservation needs of streams while still providing a guarantee that the data will be sent from the server prior to its deadline. Finally, performance experiments are described that show the benefits of the smoothing methods and the admission algorithm in the context of the CMFS.

5.1 Server Transmission Mechanism and Flow Control

The network component of the CMFS is modeled as a single outflow per active connection from the server node. It is assumed that a fixed percentage of the nominal network interface is available for sending data. From the server's point of view, this is a maximum value (maxXmit), which cannot be exceeded in any circumstance.

Another assumption regarding network transmission is that the rates established by the client and the server are guaranteed rates. Once a rate is established for a particular length of time, the server transmits at that rate and the network interface on the client receives at the specified rate. If the network is incapable of supporting that rate, the algorithms presented in this chapter do not efficiently use the server resources. The server continues to assume that all the bits transferred have value to the client.
Appropriate responses under these circumstances require the client application to reduce the requested bandwidth to a level that can be supported by the network. The details of client policies to solve this problem are outside the scope of this dissertation.

There may be different underlying physical network structures which implement the delivery of data to the client. Much research has been done in characterizing traffic patterns and analyzing the performance of the network itself. The server has no control, however, over any aspect of this performance, except as to its own patterns and volume of sending. From the network point of view, the value of maxXmit must be a guaranteed minimum, as the factors which influence the throughput of the network are beyond the scope of this dissertation.

When the connection is established between the server node and the client, a transmission rate is negotiated. This rate is based on the fact that, in the worst case, all the data for a specific disk slot may need to be sent during that disk slot, rather than at any previous time. The largest amount of bandwidth needed is for the largest disk slot and this is the minimum rate used. If the bandwidth is not available, then no connection can be established.

When data is actually transferred, however, there may not be enough buffer space at the client to receive this amount of data at the maximum rate. Therefore, the sender and receiver require methods to deal with overload at either the sender interface, the network itself, or the receiver. This is performed in most network environments via a flow-control mechanism. This flow control is often implemented by a protocol executed between the sender and the receiver. If the data is being transmitted too quickly, the receiver informs the sender that it cannot receive at the current rate. The sender either stops transmitting or reduces the rate appropriately until informed otherwise by the receiver.
In the CMFS, a method which uses requests from the client to implement flow-control is impractical, because of the unavoidable latency of responses to the requests. These requests are also unnecessary, since the server knows precisely how much data the client will be presenting to the user in each display period, once the presentation has begun. The server is informed of the beginning of the presentation by the start packet (see Section 3.1). Flow-control can then be implemented exclusively at the server.

The basic goals of the mechanism were introduced in Section 3.4. This section provides a more detailed explanation of the server operations which implement flow control. A rate-based connection is used, but the server only sends data based on credit issued by the network manager. Therefore, the channel is not fully utilized if the amount of credit issued is less than what could be sent at the full rate for the entire slot.

Once the start packet has been received, the server begins computation of how many bytes have been displayed. A timer thread generates a timing signal once per disk slot. This causes the network manager to examine all the active streams and perform the following actions:

1. Increase the server's estimate of the client buffer space capacity by the amount of data displayed, and therefore consumed from the client's buffers, in the previous disk slot time.

2. Decrease the server's notion of available client buffer space by the amount of data required to be sent in the current disk slot.

3. Issue credit to the Stream Manager for the current disk slot if data must be sent at this time in order to maintain continuity.

4. While there is excess bandwidth at the network interface, find a stream which has unused connection bandwidth, available client buffer space, and the next disk slot of data read ahead. Decrease client buffer space by the amount of data in the next disk slot and issue credit for the stream manager.
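A schematic sketch of this per-slot procedure, with step 4 repeating across the streams, is given below. The class and field names are invented for illustration and do not correspond to the CMFS implementation; connection-rate limits are omitted.

```python
from dataclasses import dataclass, field

@dataclass
class Stream:
    client_free: int            # server's estimate of free client buffer
    displayed_last_slot: int    # data consumed by the client last slot
    due_this_slot: int          # data that must be sent for continuity
    read_ahead: list = field(default_factory=list)  # sizes of slots read ahead
    credit: int = 0             # credit issued to the stream manager

def on_slot_tick(streams, max_xmit):
    budget = max_xmit
    for s in streams:
        s.client_free += s.displayed_last_slot   # step 1
        s.client_free -= s.due_this_slot         # step 2
        s.credit += s.due_this_slot              # step 3
        budget -= s.due_this_slot
    made_progress = True                         # step 4: send-ahead
    while budget > 0 and made_progress:
        made_progress = False
        for s in streams:
            if s.read_ahead and 0 < s.read_ahead[0] <= min(s.client_free, budget):
                size = s.read_ahead.pop(0)       # one disk slot at a time
                s.client_free -= size
                s.credit += size
                budget -= size
                made_progress = True
```

The send-ahead loop stops when either the slot's bandwidth budget is spent or no stream has both read-ahead data and room in its client buffer.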
This is repeated for each stream until either the server bandwidth is exhausted or no stream has sufficient client buffer space to accept more data. This step achieves what is termed "network send-ahead". In most cases, the network will send ahead to fill up the client buffer space and streams will have no work to do for steps 2 and 3. Send-ahead is limited by the amount of client buffer space; the minimum required amount is the sum of the largest two disk slots.

The network manager determines how many presentation units have been consumed by the client application when deciding the amount of credit that should be issued to the stream manager for each connection. It does this based on the timestamp in the start packet. By knowing the starting time, the server can determine the exact buffer occupancy at the client, assuming the client is processing the stream according to the contract established at CmfsPrepare.

It is important to note that no stream will be given credit to send beyond the minimum required as long as there are other streams that are sending their required data. Step 4 in the above procedure attempts to send ahead by an equal amount of time for each stream, within the constraints of client buffer space, server bandwidth, and individual connection bandwidth. In the description of step 4, credit is issued for one disk slot at a time. Credit is only issued for a stream if there are buffers queued for transmission. This is because all of the credit must be used up by the stream manager in the current disk slot. If credit is left over for a stream because there was no data to send, in some future slot, the connection rate may be exceeded.

5.2 Network Admission Algorithm Design

There are a small number of possible approaches to determining network admissibility of a continuous media stream. Much previous work has been done on the statistical multiplexing of variable bit-rate data streams on networks such as ATM [45].
Constant bit-rate connections have also been used to transport variable bit-rate data [60]. Statistical approaches cannot provide absolute guarantees, since the entire concept of multiplexing is based on providing low, but non-zero probabilities of transient network switch overload. Knightly et al. [45] prove that their deterministic-guarantee admission control method significantly improves the network utilization above what is available using peak rate allocations. This approach is only able to achieve a maximum utilization of about 30% in tests of full length video objects, which is still a fairly low network utilization. This realization has led researchers to investigate statistical admission guarantees or to utilize constant bit-rate channels and provide smoothing or startup latency. This avoids peak-rate allocation of constant bit-rate channels or failures to establish VBR connections with acceptable qualities of service.

In keeping with the philosophy of disk admission control and resource usage characterization, time can be divided into network slots, and a detailed schedule of the bandwidth needed can be constructed in terms of network slot values. A network slot is an even multiple of the disk slot time (e.g. one network slot could be defined as 20 disk slots). By using large network slots, the system can transmit data at a constant rate during a network slot in accordance with the bandwidth values in the schedule. This mechanism is known in other literature as Piecewise Constant Rate Transmission and Transport (PCRTT) [29, 60].

In network environments that provide point-to-point connections with guaranteed bandwidth, establishing a connection requires negotiation between the sender, the receiver and the network components in between. In PCRTT, the transmission rate varies during the lifetime of the connection.
This requires renegotiation to ensure that the new parameters of the connection are still acceptable to all relevant entities [37].

There are two uses for network slots in the transmission subsystem in the CMFS. The first is for establishing the rates for individual connections. PCRTT maintains a constant rate for a period of time, at the end of which a new rate can be negotiated. It is reasonable to make this period of time a constant length and utilize it for the second purpose: admission control. The amount of data required in each network slot can be characterized by some process and used as input to the admission control algorithm.

The size of a network slot is significantly larger than a disk slot for two main reasons. The first is the overhead of renegotiation. A renegotiation takes a non-trivial amount of time, and therefore, should be effective for a substantial amount of time. The second is the ability to smooth out the data delivery by sending data at an earlier time in the network slot than is absolutely required, making use of client buffer space. This client buffer space can be significant, allowing more send-ahead.

Other research has experimented with network slots ranging from 10 seconds to 1 minute in length [37, 95]. Zhang and Knightly [95] suggest that renegotiations at 20 second intervals provide good performance. The initial tests in this chapter use 20 seconds as the size of the network slot. Additional experiments are conducted at the end of this chapter to determine an optimal network slot size.

The approach taken by the network admission control in the CMFS is to provide a deterministic guarantee at the server and use constant-bit-rate network channels. Data is transmitted at a constant bit-rate, subject to buffering constraints at the client, for the duration of a network slot.
These constraints are used by the network manager to provide credit to each stream manager for actual transmission of the data.

The algorithms that were considered for the disk admission control are also candidates for the network admission control. Obviously, as shown in Chapter 4, Simple Maximum is inferior to Instantaneous Maximum, and thus, does not deserve serious consideration. Algorithms such as Average will not perform well in terms of correctness, because the estimate of network performance is a fixed upper bound. If admission is based on the average being less than the bound, then it is highly probable that there will be data to be transmitted in slots that oversubscribe the network, causing the server to fail to meet its obligations. It is possible that all slots could be under the bandwidth boundary, but this is very unlikely with variable bit-rate streams.

An intriguing possibility is to use vbrSim for the network. This has a number of advantages, including the fact that a uniform admission policy could be used for both resources. The smoothing effect enabled by sending data early could eliminate transient network bandwidth peaks. One of the necessary conditions is that largely differing amounts of data must be sent in each slot for each connection, corresponding to the particular needs of each stream. This requires that either the network or the server polices the use of the network bandwidth.

One major benefit of vbrSim for the disk system is the ability to use the server buffer space to store the data which is read ahead. This buffer space is shared by all the streams and thus, at any given time, one connection can use several Megabytes, while another may use only a small amount of buffer space. The server buffer space is typically large enough to hold the data for dozens of disk slots per stream, when considering large bandwidth video streams.
For scenarios with cumulative bandwidth approaching minRead, significant server buffer space is required to enable acceptance. If the same relative amount of buffer space was available at each client, then vbrSim's send-ahead method for the network system could be effective. Unfortunately, the server model only requires two disk slots' worth of buffer space to handle the double buffering. With only the required client buffer, very little send-ahead is possible. Even this amount of buffer space is large compared with the minimum required by an MPEG decoder. According to the MPEG-2 specifications [38], only three or four frames is required as buffer space. Many Megabytes of client buffer would be needed to provide space for on the order of a dozen disk slots' worth of video data. It is not practically reasonable to assume or require client applications to have this level of memory.

Another factor to consider is the disk performance guarantee. In order to accommodate the variable send-ahead, the disk must have enough data read ahead to be able to send. In this way, the client buffers can be looked upon as extensions to the server buffer space. The admission control on the network must consider the disk's ability to have the data buffered in time to send ahead. The disk admission only guarantees that the data will arrive in the slot in which it is needed. Thus, an intimate integration between the two algorithms is required.

Since the bandwidth of the network is known with certainty, the vbrSim
Based on its knowledge of the client buffer space and the guaranteed data delivery rate from the disk, the disk system knows precisely which data it can transmit during every sJot. Unfortunately, this requires an extra level of bookkeeping.. Upon admission, the system must keep track of the latest possible moment at which each data block can be read off the disk and st i l l keep the commitment, so that the network admission control knows how much send-ahead it can perform. A l l of these enhancements could be made wi th a reasonable amount of ef-fort, having the network and disk decisions both made on a disk-slot granularity. Unfortunately, this would require a network renegotiation on a very frequent basis. Keshav et al . [37] indicate that renegotiation intervals should be relatively long. W i t h larger network slots, the amount of client buffer space for send-ahead shrinks in relative importance. It is most likely that the client buffer would be much less than could be transmitted in a network slot. A n y attempt at send ahead would fill the client buffer during the first network slot and then the credit that could be issued would resort to being a function of the client buffer space freed during each slot by presentation to the user. F rom then on, very li t t le addit ional smoothing would be possible. Final ly , the other significant performance benefit that vbrSim for the disk provides is contiguous reading, which allows a much greater disk bandwidth to be achieved, especially at the beginning of streams on the outer edge of the disk. The network has no analogous variabil i ty in performance. Thus, although vbrSim could theoretically be used as the network admission control algori thm, the l imited amount of smoothing between network slots and the constant bandwidth nature of the network interface restricts the performance im-151 provements possible. 
In light of this analysis, a straightforward adaptation of the Instantaneous Maximum algorithm is utilized for network admission control. When combined with the network characterization algorithm described in the following section, sufficient send-ahead is achieved even with the large network slots.

The network admission control algorithm is shown in Figure 5.1. Each hardware configuration has a maximum number of bytes that it can transmit per second. This value can be easily converted to the number of blocks per disk slot, previously defined as maxXmit, which is the only configuration-specific component of the algorithm. The input is the network bandwidth characterization of each stream (the netBW array) and the length of the new stream (networkSlotCount). It is only necessary to examine the values for the slots in the new stream, using the loop counter netwSlot, since the comparison results for each slot are independent. All slots were below the threshold before the admission test and so the cumulative network bandwidth schedule for the server beyond the end of the new stream is irrelevant to the admission decision.

NetworkAdmissionsTest( netBW[][], networkSlotCount )
begin
    for netwSlot = 0 to networkSlotCount do
        sum = 0
        for i = firstConn to lastConn do
            sum = sum + netBW[netwSlot][i]
        end
        if (sum > maxXmit) then
            return ( REJECT )
    end
    return ( ACCEPT )
end

Figure 5.1: Network Admissions Control Algorithm

For each network slot considered, the network bandwidth requirements for each stream are summed to provide a total network utilization for that slot. If the sum of every slot is less than maxXmit, the scenario is accepted. One particular problem in admission is that a stream request may arrive during the middle of a network slot. Streams must be admitted in the disk slot that the request arrives. Thus, the first network slot for the new stream is made shorter, so that the network slots end at the same time for all streams.
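For reference, the test of Figure 5.1 translates almost line-for-line into Python. In this sketch net_bw is indexed [slot][connection] and covers only the new stream's slots, as in the figure.

```python
def network_admissions_test(net_bw, max_xmit):
    # Reject if any network slot's summed demand exceeds the interface bound.
    for slot_demands in net_bw:
        if sum(slot_demands) > max_xmit:
            return "REJECT"
    return "ACCEPT"
```

Because each slot's comparison is independent, the test can return as soon as one oversubscribed slot is found.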
5.3 Network Bandwidth Schedule Creation

The network admission control algorithm is susceptible to spurious peaks in bandwidth requirements of individual streams. If such peaks occur in the same network slot for many streams, then a scenario may be rejected, when it could easily be supported. These peaks cannot be easily smoothed via sending ahead, so it is important to provide a bandwidth characterization that has a small amount of variability. Both the overall level of the bandwidth reservation and the variability should be minimized if possible.

Two simple measures of network bandwidth requirements are the peak and the average bandwidths. Peak allocation reserves bandwidth that is significantly higher than the average. Average allocation is very likely to fail under heavy utilization with variable bit-rate streams. While it is possible that peaks in one stream offset valleys in others, a complex probability analysis is necessary to ensure that the risk of failure is sufficiently minimal. Another reason this method is insufficient is that sending at the average rate for the entire duration of the stream does not ensure that enough data will be present in the client buffer to handle peaks in the bandwidth which occur early in the stream. In this situation, the data arrives late at the client, causing starvation and preventing continuous presentation to the user.

Average allocation, combined with a method to calculate overflow probabilities, is the approach taken by Vin et al. [87]. During periods of overload, their system does not attempt to send all the required data. Rather, the server determines how to judiciously spread the overflow among the active streams so that all users observe an equal degradation in service.

Using average allocation and constant transmission rates, starvation can be prevented by prefetching enough media data to allow continuous playback.
This introduces start-up latency and requires a large client buffer. Both client buffer size and start-up latency have been parameters in previous research [80]. Significant reductions in either buffer space or latency can be achieved at the expense of increasing the other component, so design trade-offs must be considered. If both of these values are to be kept to a minimum, then an approach which utilizes the VBR profile of the network bandwidth schedule and sends at differing rates is essential.

The tightest upper bound characterization on network bandwidth requirements is the empirical envelope [45], which has been used in much of the previous work in this area. It results in a somewhat conservative, piece-wise linear function, specified by a set of parameters, but requires O(n^2) time to compute, where n is the number of frames in the stream. Approximations to the empirical envelope using multiple-leaky-bucket packet delivery mechanisms have been used, providing either deterministic or statistical guarantees of data delivery across the network. Results indicate that the deterministic algorithms improve utilization above peak-rate allocation, but are still overly conservative.

In this section, three algorithms for constructing a network allocation schedule (called a network bandwidth schedule) are presented. The following section compares their effect on admission performance. The first algorithm (denoted Peak) uses the peak disk bandwidth value in each network slot as the network bandwidth schedule value. This value is very easy to calculate via a linear pass of the disk block schedule, selecting the maximum within each network slot.

The second algorithm, hereafter called Original, considers only the number of bytes that are required to be sent in each network slot independently of every other network slot. The server knows this amount when the disk schedule is created.
The algorithm proceeds as follows: Each network slot is processed in order. Within the network slot, each disk slot is examined in order. The number of bytes in each disk slot is added to the total number of bytes for the network slot. This value is divided by the number of disk slots encountered in the current network slot so far. This provides the cumulative average bandwidth required for the network slot up to this point. This process calculates the cumulative average for every disk slot within the network slot. The maximum of the cumulative averages is chosen as the bandwidth value for the network slot (rounded up to the next highest integer number of 64 KByte blocks).

This method enables some peaks to be absorbed, because the server can send at the negotiated rate for the duration of the network slot as long as client buffer space is available. Peaks which occur late in the network slot have marginally less influence in the cumulative average and are absorbed easily. This is shown in Figure 5.2, for a network slot size of 20 seconds and the minimum required client buffer space. Here, the first three large peaks in disk bandwidth are at slots 68, 94, and 136. These peaks do not increase the minimum bandwidth needed at all. Unfortunately, if a peak in disk bandwidth occurs early in a network slot, then the maximum cumulative average for the slot is near this peak, as evidenced by the peaks at disk slots 201 and 241. Overall, this method does reasonably well in reducing the average network bandwidth required, but depends on fortuitous slot boundaries.

The server-based flow control policy (see Section 5.1) takes advantage of the client buffer by sending data to the client as early as possible. Since the value used in the network bandwidth schedule is the maximum cumulative average, more bytes can be sent in a network slot than will be displayed at the client.
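The Peak and Original characterizations can be sketched as below. The input is a disk block schedule in blocks per disk slot (whole blocks stand in for the 64 KByte rounding), and the code is an illustrative reading of the description above rather than the server's implementation.

```python
import math

def windows(schedule, slots_per_net):
    """Split a disk block schedule into network-slot windows."""
    return [schedule[i:i + slots_per_net]
            for i in range(0, len(schedule), slots_per_net)]

def peak_schedule(disk_blocks, slots_per_net):
    # Peak: reserve the largest disk-slot value in each network slot.
    return [max(w) for w in windows(disk_blocks, slots_per_net)]

def original_schedule(disk_blocks, slots_per_net):
    # Original: reserve the maximum cumulative average within each
    # network slot, rounded up to a whole block.
    out = []
    for w in windows(disk_blocks, slots_per_net):
        total, best = 0, 0.0
        for n, blocks in enumerate(w, start=1):
            total += blocks
            best = max(best, total / n)
        out.append(math.ceil(best))
    return out
```

A peak late in the window ([4, 4, 12, 4] → reservation 7) is largely absorbed, while the same peak arriving first ([12, 4, 4, 4] → 12) dominates its slot, matching the behaviour described above.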
Therefore, at the beginning of every network slot except the first, it is likely that some data will be present in the client buffer. The leftover bandwidth may be larger for streams with highly variable disk block schedules, filling the client buffer to a larger extent.

[Plot: Original network bandwidth schedule and disk block count per disk slot, TWINS stream]
Figure 5.2: Network Bandwidth Schedule - Original (Minimum Client Buffer Space)

The final network bandwidth characterization algorithm improves on the second by explicitly accounting for the fact that sending excess data to the client reduces the amount of data that must be sent during the next network slot. Each slot "carries forward" a credit of the number of bytes already in the client buffer at the beginning of a slot. In nearly all of these scenarios with 20 second network slots, there is sufficient excess bandwidth to fill the client buffer in the first network slot. If the precise average bandwidth and the cumulative average (rounded to the next highest block boundary) are very close, however, the amount of send-ahead achieved may be minimal. This improvement reduces the amount of bandwidth that must be reserved for each subsequent slot by the amount of data already in the
Due to the "negative bandwidth" from the send-ahead, the cumulative average is often also negative for the first few disk slots of the network slot. A peak in disk bandwidth that occurs very early in a network slot can thus be merged with the previous network slot, reducing the cumulative average in the current slot. Figure 5.3 shows the smoothed network bandwidth schedule for the stream in Figure 5.2. In some cases, the reduction of the bandwidth schedule value in this manner means that after a particular slot, the client buffer is less full than it was at the beginning of that slot.

A larger client buffer space enables smoothing to be more effective at reducing both the peaks and the overall bandwidth reservation necessary [29]. With more client buffers, it is less likely that a full client buffer will restrict the server's ability to send at the negotiated rate. As well, the fact that network bandwidth peaks can be smoothed by send-ahead means that subsequent network slots will have reduced resource needs. This frees up bandwidth reservation on the network for subsequent requests, but has the potential disadvantage of not fully utilizing the client buffer at the end of every slot. For the purposes of this dissertation, making conservative use of network bandwidth takes precedence over completely utilizing the client buffer, although the more aggressive use of the client buffer is an intriguing area for future study. This would be equivalent to vbrSim for the network, where every byte of the client buffer is used for send-ahead.
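The carry-forward credit can be expressed as a small extension of the cumulative-average computation. The sketch below is a hypothetical reconstruction: in particular, the buffer-occupancy update is a simplified model of the server-based flow control, assuming the server always sends ahead as far as the negotiated rate and the client buffer allow.

```python
import math

BLOCK = 64 * 1024  # 64 KByte transfer block

def smoothed_schedule(disk_slot_bytes, slots_per_net_slot, client_buffer):
    """Like the Original algorithm, but each network slot starts with a
    credit (negative requirement) for bytes already sent ahead into the
    client buffer; buffer occupancy is capped at client_buffer bytes."""
    schedule, carried = [], 0   # bytes in the client buffer at slot start
    for s in range(0, len(disk_slot_bytes), slots_per_net_slot):
        window = disk_slot_bytes[s:s + slots_per_net_slot]
        total, peak_avg = -carried, 0.0   # send-ahead counts as negative bytes
        for i, b in enumerate(window, 1):
            total += b
            peak_avg = max(peak_avg, total / i)
        rate = math.ceil(peak_avg / BLOCK)   # blocks per disk slot this network slot
        schedule.append(rate)
        # Simplified flow-control model: the server may send up to the
        # negotiated rate; whatever exceeds consumption stays in the buffer.
        sendable = rate * BLOCK * len(window)
        carried = max(0, min(client_buffer, carried + sendable - sum(window)))
    return schedule

# With a 4-block client buffer, the second occurrence of the same early
# peak needs only half the reservation of the first.
B = BLOCK
demo = smoothed_schedule([8 * B, B, B, B] * 2, 4, 4 * B)  # -> [8, 4]
```

With a zero-byte client buffer the carry is always zero and the result degenerates to the Original algorithm, which matches the dependence of smoothing on client buffer space noted above.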
Figure 5.3: Network Bandwidth Schedule - Smoothed (Minimum Client Buffer Space)

One major assumption that makes send-ahead smoothing possible is that the disk system has achieved sufficient read-ahead such that the buffers are available at the server for sending. The vbrSim disk admission control algorithm only guarantees that disk blocks will be available for sending in the slot in which they are required to be sent. In other words, all the peaks in disk bandwidth are properly accounted for at the end of each disk slot.

Many of the streams included in the scenarios used in these experiments have several disk bandwidth peak values in each network slot greater than the network bandwidth schedule value. Only a stream that has strictly increasing bandwidth during a network slot would not exhibit this characteristic. For a disk which is under heavy load, it is possible that the disk peaks which are smoothed by the network bandwidth schedule creation algorithm will not be read off the disk in time to send early. In other words, all of the bytes which are required for a particular disk slot must be read and transmitted in the next disk slot. If this is the case, the network bandwidth schedule value must be increased in order to transmit this peak amount in a single disk slot. For example, if the value in the disk block schedule for a disk slot was D and the value for the network slot was N, such that D > N, then the server would need to have sent D - N of those blocks early in order to keep up with the delivery guarantee. There are two cases to consider: either the disk has a current read-ahead of less than minRead - D buffers, or the disk has a current read-ahead of greater than or equal to minRead - D buffers.
In the first case, if the disk was under heavy load and had not read minRead - D blocks before the current disk slot, it is possible that all D blocks would be read in the current slot and only N of them could be sent in the following slot. This is unacceptable and could lead to starvation at the client, as the server does not know exactly which of the minRead blocks it will read during the current slot. Therefore, the network bandwidth schedule value for the current slot must be increased to D to accommodate the fact that all D of the blocks must be sent in the current disk slot. In the second case, fewer than D blocks of the current disk slot must be read in the current disk slot time, so the D - N blocks in question must have been available in the previous slot time to be sent early. No adjustment is necessary. To summarize, if the disk has read ahead by at least minRead - D buffers, then the disk is far enough ahead and data can be sent early. Thus, any individual stream which has a disk bandwidth peak which is higher than the current network bandwidth will not have to deliver any blocks which may be read in the current disk slot. It is necessary that both the bandwidth and buffer space are sufficient to accommodate this read-ahead in every case.

The probability of insufficient read-ahead is very slim, because of the manner in which read-ahead is achieved by the disk subsystem. The disk admission algorithm guarantees that in steady state, the guaranteed bandwidth from the disk is always sufficient to service the accepted streams. In fact, the achieved disk bandwidth is greater than this value, because disk performance is variable and the average performance is somewhat above minRead, as long as there are empty buffers. If a new request fails, the accepted scenario will always have a somewhat lower bandwidth request than the capacity of the disk, due to the large granularity of video objects.
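The two cases reduce to a simple per-slot check, sketched below. This is an interpretation of the rule stated above with hypothetical names; the actual CMFS adjustment logic may differ in detail.

```python
def adjusted_net_value(D, N, read_ahead, min_read):
    """D: disk block schedule value for the slot; N: network slot value;
    read_ahead: blocks currently read ahead; min_read: blocks the disk is
    guaranteed to read per slot.  If the disk is not far enough ahead, all
    D blocks may only appear in the current slot, so N must be raised to D."""
    if D > N and read_ahead < min_read - D:
        return D   # first case: the whole peak must be sendable in one slot
    return N       # second case (or no peak): no adjustment necessary

# Illustrative values (minRead = 23 matches the configuration in Section 5.7).
adjusted_net_value(20, 15, read_ahead=2, min_read=23)  # -> 20
adjusted_net_value(20, 15, read_ahead=5, min_read=23)  # -> 15
```

As the surrounding text argues, the first branch is rarely taken in practice, because steady-state read-ahead normally exceeds the threshold.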
The disk system will read as fast as permitted by the buffer space, and in steady state all buffers are filled. Steady state occurs shortly after a new stream is accepted, as shown in Chapter 4. While this appears to be a tight integration between the disk admission and network admission algorithms, this process merely adjusts the connection rate in the rare case where the disk system emulation has not been successful in achieving guaranteed read-ahead.

A further consideration in smoothing is the time granularity of the network slot. If very short network slots are used, then only a small amount of averaging within a slot is possible. The network bandwidth schedule would still contain a great deal of variability, which would result in a greater chance of peak values exceeding maxXmit. If the network slot is too long, then the bandwidth required approaches the average of the entire stream. This prevents the network system from sending at reduced bandwidths during periods of low bandwidth requirements.

5.4 Network Bandwidth Schedule Creation Performance

In this section, the admission results of the Peak, Original, and Smoothed network bandwidth schedule creation algorithms are compared. The initial performance experiments were implemented with single disk servers and the results were combined
The Peak algorithm reserves almost 40% more bandwidth than required. It is also reasonable to expect that very few scenarios wi l l be accepted b^y the Peak a lgori thm. Average of all Scenarios Required B / W Requested B / W Peak Requested B / W Original Requested B / W Smoothed 96.5 M b p s 136.5 M b p s . .12.2.8 M b p s 113.3 M b p s Table 5.1: Network Bandwid th Character izat ion Summary A more detailed evaluation was obtained by examining these scenarios wi th respect to the relative amount of disk bandwidth ' they requested and the corre-sponding admission decisions. The scenarios were grouped according to the sum of the average bit-rates of each stream. 193scenarios were presented to a C M F S as if configured with 4 disks and attached to a 100 M b p s network interface. In the first test, each disk had a similar request pattern that issued requests for delivery of all the stream simultaneously. The system with four disks was able to achieve between 110 and 120 M b p s in cumulative bandwidth. The scenario wi th the largest cumulative bandwidth that the Smoothed algori thm could accept was 93 M b p s , as compared with 87.4 M b p s for the Original a lgori thm. One major advantage of the vbrSim algori thm is the ability to take advantage of read-ahead achieved when the disk bandwidth exceeded the minimum guarantee. 161 This is greater when only some of the streams are actively reading off the disk, reducing the number of seeks needed per slot. Thus, more simultaneous video clients can be supported by each disk. When the scenarios are submitted to the C M F S with stagger between arrivals, a greater cumulative load is presented to the network, as almost all of the scenarios can be supported by the disk system. The achieved bandwidth of the disk increases by approximately 10%, resulting in cumulative performance between 125 M b p s and 133 M b p s . Due to the achieved read-ahead, only 9 of the 193 scenarios are rejected by the respective disk systems. 
This is greater when only some of the streams are actively reading off the disk, reducing the number of seeks needed per slot. Thus, more simultaneous video clients can be supported by each disk. When the scenarios are submitted to the CMFS with stagger between arrivals, a greater cumulative load is presented to the network, as almost all of the scenarios can be supported by the disk system. The achieved bandwidth of the disk increases by approximately 10%, resulting in cumulative performance between 125 Mbps and 133 Mbps. Due to the achieved read-ahead, only 9 of the 193 scenarios are rejected by the respective disk systems.

The results of admission for simultaneous arrivals and staggered arrivals are shown in Tables 5.2 and 5.3. Only those scenarios that were valid from both the network and the disk point of view were considered. Even though 184 scenarios with staggered arrivals were accepted by the disk, only 130 of them requested less than 100 Mbps. Figures 5.4 and 5.5 show the same information in a visual form. It is quite clear that smoothing did change the number of streams and bandwidth that could be accepted by the network admission algorithm.

The results show that smoothing is an effective way to enhance the admission performance of the network admission algorithm. A maximum of 80% of the network bandwidth can be accepted by the Original algorithm on simultaneous arrivals, although most of the scenarios in that range are accepted. The smoothing operation allows almost all scenarios below 85% request to be accepted, along with a small number with slightly greater bandwidth requests. As expected, the Peak algorithm accepts very few scenarios, none of which required over 70% of maxXmit.

Figure 5.4: Simultaneous Arrivals: Network Admission

In Table 5.3, we see that combining smoothing with staggered arrivals has a compounding effect in increasing the bandwidth supportable by the server. None of these high bandwidth scenarios are accepted by any of the network admission algorithms. A few scenarios with a request range of between 80% and 90% can be accepted with the Original algorithm, which is a slight improvement over the simultaneous arrivals case. The Smoothed algorithm accepts nearly all requests below 90% of the network bandwidth. The reason for this increase is that only a few streams are reading and transmitting their first network slot at the same time.
The first network slot is the only one that cannot benefit from pre-sending data and cannot be smoothed. Thus, it is more likely that the peaks in bandwidth for the entire scenario with simultaneous arrivals occur in the first network slot. With this stagger, the existing streams are sending at smoothed rates when a new arrival enters the system, meaning lower peaks for the entire scenario. The Peak algorithm shows no improvement whatsoever, which is as expected.

Figure 5.5: Staggered Arrivals: Network Admission

The results of this section show that the simplest smoothing technique (Original algorithm) reduces the peaks and provides a substantial improvement over the Peak algorithm. The network bandwidth schedule that it produces still has substantial variability. The allocation required is significantly above the average bandwidth of the streams in the scenarios. The Smoothed algorithm provides an even better network characterization that increases the maximum network utilization up to 90%.

Pct Band   # of Valid   # Accepted   # Accepted   # Accepted
           Scenarios    Peak         Original     Smoothed
95-100          0            0            0            0
90-94           5            0            0            0
85-89           4            0            0            2
80-84          18            0            1           17
75-79          32            0           19           32
70-74          19            0           18           19
65-69          11            6           11           11
60-64           2            2            2            2
Total          91            8           51           85

Table 5.2: Network Admission Performance: Simultaneous Arrivals (% of Network)

Pct Band   # of         # Accepted   # Accepted   # Accepted
           Scenarios    Peak         Original     Smoothed
95-100          5            0            0            0
90-94          19            0            0            2
85-89          15            0            2           14
80-84          27            0            3           27
75-79          29            0           18           29
70-74          22            0           22           22
65-69          11            5           11           11
60-64           2            2            2            2
Total         131            7           59          106

Table 5.3: Network Admission Performance: Staggered Arrivals (% of Network)

5.5 Stream Variability Effects

In this section, the influence of stream variability on the network admission control algorithm and the bandwidth schedule creation algorithms is examined. To evaluate this factor, three configurations of streams were utilized: mixed variability streams, low variability streams, and high variability streams. Each configuration simulated the storing of 44 moderate bandwidth video streams on a number of disks. The video streams were chosen from the list of streams given in Table 2.1. In the high and low variability cases, some duplication of streams was performed in order to have a sufficient number of streams in each configuration. The first configuration consisted of 44 unique streams that had a mix of variability in the scenarios (denoted MIX). The second configuration contained streams with low variability (denoted LOW).
Aga in , 193 scenarios were submitted to the simulation of a multi-disk, single-node C M F S . These scenarios generated between 58 and 135 M b p s of bandwidth (as measured by the sum of the average bit-rate of each stream). The admission results for simultaneous arrivals are shown in Table 5.4. The results for the Peak algori thm are not shown, since it was previously shown to be incapable of accepting stream scenarios above 70% of maxXmit for the ini t ia l set of streams. For the low variabil i ty streams, the acceptance rate of complete scenarios in the 80-84% range increased from 52% acceptance to 72% acceptance, by using the Smoothed a lgori thm. The high variability streams did not have any scenarios accepted in the 80-84% range with the Original a lgori thm, but this increased to 29% with the Smoothed a lgori thm. In the 75-79% range, the acceptance rate increased from 19% to 93% for high variabili ty streams. The network could admit scenarios of low variabili ty streams at a higher network uti l izat ion overall, but smoothing had a more drastic effect on the acceptance rate with high variabil i ty streams. Higher variabili ty streams have more peaks to smooth, so this test shows that the Smoothed 166 algorithm is effective in achieving this smoothing. Network M I X E D L O W H I G H Pet Original Smoothed Original Smoothed Original Smoothed 95-100 0/5 0/5 0/22 0/22 0/7 0/7 90-94 0/19 0/19 0/10 0/10 0/10 0/10 85-89 0/15 4/15 0/3 0/3 0/17 0/17 80-84 3/27 27/27 13/25 18/25 0/21 6/21 75-79 18/29 • 29/29 24/25 24/25 5/27 25/27 70-74 21/22 22/22 21/21 21/21 16/22 22/12 65-69 11/11 11/11 22/22 22/22 25/27 27/27 60-64 2/2 2/2 30/30 30/30 .5/5 5/5 Table 5.4: Network Admiss ion Performance: Simultaneous Arr iva ls (% of Network) Table 5.5 shows the admission performance for arrivals which are staggered by 10 seconds. When arrivals are staggered and the Smoothed a lgori thm is used, a majority of requests below 90% are granted admission. 
Some scenarios of mixed-variabili ty and high-variability streams are accepted in the 90-94% request range. A g a i n , the effect of smoothing is confirmed to be greater for the high variabili ty streams than for the low variabili ty streams (from 5% to 95%, rather than from 56% to 100% in the 80-84% range). The acceptance rate with low variabil i ty streams is maintained at a higher level than the mixed variability streams for the 85-89% request range. Correspondingly, scenarios'of mixed-variabil i ty streams have higher acceptance rates than the scenarios of high-variabili ty streams in terms of acceptance rates. A n interesting observation is that the 90-94% request range has the best performance with high-variability streams. 5.6 Network Slot Granularity Whi le the disk admission control is based on relatively short disk-reading slots, the network admission is based on slots which are significantly longer.. Selecting an appropriate slot length may substantially affect the network bandwidth schedule and 167 M I X E D L O W H I G H Pet Original Smoothed Original Smoothed Original Smoothed 95-100 0/5 0/5 0/22. 0/22 0/7 0/7 . 90-94 0/19 4/19 0/11 0/11 0/10 5/10 85-89 3/15 13/15 0/3 2/3 0/17 9/17 80-84 9/27 27/27 14/25 : 25/25 1/20 20/21 75-79 20/29 329/29 25/25 . 25/25 10/27 27/27 70-74 22/22 22/22 21/21 21/21 16/22 22/12 65-69 10/10 10/10 22/22 22/22 : 26/27 27/27 60-64 2/2 2/2 28/28 28/28 5/5 5/5 Table 5.5: Network Admiss ion Performance: Staggered Arr iva ls (% of Network) subsequently the admission performance of the network admission control a lgori thm. Since Section 5.5 showed that the Smoothed algori thm outperforms the Original algori thm, results are only shown for the Smoothed a lgori thm. The network admission experiments in section 5.4 used 40 disk slots (20 seconds) as the length of the network slot. 
This value was chosen as a result of other work which used network slots of 20 seconds and 1 minute in duration [95] and suggested 20 seconds was a reasonable tradeoff between smoothing and network renegotiation frequency. To establish the validity of the selection of 20 seconds as an optimal (or at least reasonable) choice for the network slot size, the admission performance is compared for network slot sizes of 1/2 second, 10 seconds, 30 seconds, and 600 seconds in addition to the original choice of 20 seconds. The first and the last case are the extreme boundary conditions. The 1/2 second slot is identical to the Instantaneous Maximum disk admission control algorithm. The 600 second slot case is very similar to allocating bandwidth on an average bit-rate basis. This average is not, however, the average of the stream but the maximum of the cumulative average bit-rates from the initial frame (aggregated into disk slot groups). This is influenced greatly by the first significant disk bandwidth peak, as described in Section 5.3. This is shown in more detail by Figure 5.6, where the first peak is at disk slot 6 and a later peak is at slot 248. Only the first peak significantly affects the cumulative average.

Figure 5.6: Network Slot Characterization

Summary results for the network bandwidth schedule creation algorithms in these cases are given in Table 5.6. As the network slot size increases, each schedule becomes smoother, but the average requirement increases. Note that the averages of the 20 second and 30 second slots are extremely close. The effect of this difference in characterization on the admission results for simultaneous arrivals is shown in Table 5.7. The best performance is given by the 10 second network slot, but there is a very small difference in performance, especially for the high variability streams.
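The dominance of an early peak in a long network slot can be illustrated with a small computation over a synthetic disk block schedule (hypothetical numbers, using the max-of-cumulative-averages characterization described in Section 5.3):

```python
import math

BLOCK = 64 * 1024  # 64 KByte transfer block

def max_cum_avg_blocks(disk_slot_bytes):
    """Maximum cumulative average over a window, in 64 KByte blocks: the
    per-network-slot value of the cumulative-average characterization."""
    total, peak = 0, 0.0
    for i, b in enumerate(disk_slot_bytes, 1):
        total += b
        peak = max(peak, total / i)
    return math.ceil(peak / BLOCK)

# An early peak (disk slot 2) followed by a long quiet stretch: with one
# long network slot, the early peak keeps the cumulative average high for
# the whole stream, while short slots confine its cost to the first slot.
B = BLOCK
stream = [B, 8 * B] + [B] * 38
long_slot = max_cum_avg_blocks(stream)                  # one 40-slot window -> 5
short_slots = [max_cum_avg_blocks(stream[i:i + 4])      # ten 4-slot windows
               for i in range(0, 40, 4)]                # -> [5, 1, 1, ..., 1]
```

Under the long slot, 5 blocks per disk slot are reserved for all 40 slots; with short slots the same peak inflates only the first window, which mirrors the boundary-case behaviour of the 600 second slot described above.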
Network Slot Length (in Seconds)    1/2     10      20      30      600
Bandwidth Schedule Average         4.065   4.249   4.555   4.551   5.176
Bandwidth Schedule Std. Dev        1.020    .813    .659    .61     0

Table 5.6: Network Bandwidth Schedule Summary for Different Slot Lengths

For this set of streams, no scenario requesting greater than 90% of the network bandwidth can be accepted. It is clear that the worst admission performance occurs for the 600 second slot. The network bandwidth schedules are greatly affected by peaks early in the stream. The most interesting range of requests is the 85-89% range. For the mixed variability and high variability streams, there are several scenarios in that range. It appears that the worst performance, rather than the best, occurs for a network slot of 20 seconds (excluding the 600 second boundary case). The requests in this range are very sparse for the low variability streams. This is unfortunate, since all requests in the next higher range are rejected and most of the requests in the next lower range are accepted. For the 80-84% range, nearly all of the mixed variability streams are accepted, excluding the 600 second slot column. For low variability streams, fewer scenarios get accepted than in the mixed variability case.

The table shows that the best admission performance is for 1/2 second slots and that acceptance gets steadily worse as the network slot size is lengthened. The major reason for this behaviour is the simultaneous arrivals of the streams. With 1/2 second slots, there is a possibility that the peaks in some of the first few disk slots will be offset by valleys in others, which will reduce the peaks in the scenario. When a longer network slot is used, the value for each stream is very close to that of the first peak, but that value is in effect for the entire duration of the network slot. Since valleys at the beginning of the schedule are not created in the network schedule, there is no possibility for a valley to offset that peak. For many streams, the largest network slot value occurs in the first slot, as it is not possible to smooth the bandwidth from prior network slots. When all stream requests are submitted simultaneously, this nearly always results in the highest bandwidth request for the entire scenario occurring in the first network slot.

Low Variability
Pct      1/2 sec   10 sec   20 sec   30 sec   600 sec
95-100    0/22      0/22     0/22     0/22     0/22
90-94     0/10      0/10     0/10     0/10     0/10
85-89     2/3       0/3      0/3      0/3      0/3
80-84    25/25     20/25    18/25    16/25    15/25
75-79    21/21     21/21    21/21    21/21    21/21

Mixed Variability
Pct      1/2 sec   10 sec   20 sec   30 sec   600 sec
95-100    0/6       0/6      0/6      0/6      0/6
90-94     6/19      1/19     0/19     0/19     0/19
85-89    13/15      9/15     4/15     5/15     0/15
80-84    27/27     26/27    27/27    25/27    16/27
75-79    29/29     29/29    29/29    29/29    29/29

High Variability
Pct      1/2 sec   10 sec   20 sec   30 sec   600 sec
95-100    0/7       0/7      0/7      0/7      0/7
90-94     0/10      0/10     0/10     0/10     0/10
85-89     5/17      3/17     0/17     0/17     0/17
80-84    11/21     14/21     5/21     4/21     1/21
75-79    27/27     27/27    25/27    26/27    17/27

Table 5.7: Network Admission Granularity: Simultaneous Arrivals (% of Network)

A clearer benefit of a longer network slot is seen with staggered arrivals. With 1/2 second slots, approximately the same occurrence of peaks and valleys would occur regardless of any particular staggering effect. In the longer network slot cases, the Smoothed algorithm generates smaller values for all but the first slot. This should produce a reduction in peaks when some of the streams are sending at smoothed data rates. The results for a stagger of 10 seconds are shown in Table 5.8.

The only request bracket where the admission performance shows significant
For high variabil i ty streams, the 20 second slot is better than the 30 second slot, but for the mixed variabil i ty streams, the 30 second slot outperforms the 20 second slot. These results are quite inconclusive, part ial ly due to the small number of scenarios in that request band. A more exhaustive analysis should show some clearer trends. Low Var iabi l i ty Pet 1/2 sec 10 sec 20 sec 30 sec 600 sec 95-100 0/22 0/22 0/22 0/22 0/22 90-94 0/10 7/10 0/10 0/10 0/10 85-89 2/3 . 3/3 3/3 2/3 0/3 80-84 25/25 ,25/25 ;•• 25/25 25/25 15/25 75-79 21/21 21/21 ; 21/21 21/21 21/21 M i x e d Var iabi l i ty Pet 1/2 sec 10 sec 20 sec 30 sec 600 sec 95-100 0/6 0/6 0/6 0/6 0/6 90-94 6/19 13/19 4/19 6/19 0/19 85-89 13/15 15/15 13/15 13/15 0/15 80-84 27/27 26/27 27/27 26/27 16/27 75-79 29/29; 29/29 29 /29 , . 29/29 29/29 H i g l Var iab i i ty Pet 1/2 sec 10 sec 20 sec 30 sec 600 sec 95-100 0/7 . 0/7 . 0/7 0/7 0/7 90-94 0/10 5/10 5/10 1/10 0/10 85-89 5/17 14/17 9/17 9/17 0/17 80-84 . 11/21- 20/21 20/21 21/21 1/21 75-79 27/27 27/27 27/27 26/27 17/27 Table 5.8: Network Admiss ion Granular i ty : Staggered Arr iva ls (% of Network) It should be noted, however, that the results for the 1/2 second slot case show no improvement wi th staggered arrivals. Since there is no smoothing performed in this case, the individual peaks in bandwidth of each stream have the same probabil i ty 172 of occurring in any slot. The shape and peak values of the server network bandwidth schedule should not be very different between the staggered and the non-staggered case. To verify this conjecture, the network bandwidth schedules for a small num-ber of scenarios were compared. Twenty-six scenarios from different configurations were analyzed. The difference in the total cumulative network block requests was very small . A total of .13 more blocks were required in the staggered case, which is an increase in bandwidth required. 
When this was averaged over 208 streams which comprise the 26 scenarios, it amounted to a .06 block per network slot addit ion. A l -though this did not indicate exactly which scenarios were accepted, it did indicate that the number of accepted scenarios in each band wil l not change significantly if stagger is introduced in the 1/2 second slot case. F rom these results, it appears that a 10 second network slot provides a good balance between acceptance rate with both simultaneous arrivals and staggered arrivals and the need to minimize the overhead of-re-negotiating network bandwidth reservations. Further work to determine an opt imal network slot size for each stream type could be a promising area of refinement. The ini t ia l results for these stream types indicate, however, that the admission performance wi l l be better with smaller network slot sizes, rather than larger ones. Th is is somewhat surprising, given the ini t ia l intui t ion that longer streams could be smoothed more effectively. A reasonable conclusion is that smoothing is indeed effective, and that the sooner in the schedule that the smoothing takes effect, the better the performance results. 5.7 Network Admission and Scalability The results of Section 5.4 enable an addit ional aspect of the C M F S design to be evaluated: scalability. Al though this was not a goal of that particular, test, it was observed that the ini t ia l configuration of the server with four disks could not saturate the network interface. One aspect of scalability is the manner in which components 173 can be added to achieve scalable performance. It is desirable that the disk and network bandwidth scale together. In the configuration tested, 4 disks with minRead = 23 provided 96 M b p s of bandwidth with a network interface of 100 M b p s . A t this level of analysis, it would seem a perfect match, but the tests with simultaneous arrivals did not support this conjecture. 
The tests showed that with simultaneous arrivals, a system configured' with guaranteed cumulative disk bandwidth approximately equal to nominal net-work bandwidth was unable to accept enough streams at the disk in order to use the network resource fully. There were no scenarios accepted by the disk that requested more than 94% of the network bandwidth. In Table 5.2, there are only 4 scenarios in the 85-89% request range, that were accepted by the disk system. In Table 5.3, there were 15 such scenarios. Th is increase is only due to the staggered arrivals as the same streams were requested in the same order. When staggered arrivals were simulated, the network admission control be-came the performance l imi ta t ion, as more of the scenarios were accepted by the disk. There were no scenarios that requested less than 100 M b p s that were rejected by the disk. Th i s arrival pattern would be the common case in the operation of a C M F S . Thus, equating disk bandwidth with network bandwidth is an appropriate design point which maximizes resource usage for moderate bandwidth video streams of short duration if the requests arrive staggered in t ime. 174 C h a p t e r 6 Related Work Recent work in mult imedia computing systems has been very extensive. T w o large surveys by Adie [2, 3] indicate the wide-spread interest in distributed mult imedia, both in the research and commercial communities, and specify the scope and focus of many projects. Several of these projects encompass a very wide focus and transcend the issues involved in continuous media server design. The existing research can be categorized as follows: server implementations, operating system level support for multimedia, data storage layout optimizations and simulation and/or analyt ical evaluation of resource reservation mechanisms for the disk and network systems. There are several server design and implementation issues raised in previous research efforts. 
A key question to consider when evaluating the existing work and its appropriateness to the design of a general, heterogeneous, continuous media file system is the level of abstraction of certain components of the system. In some work, the details of the user interface are completely ignored, while in others the details of disk block layout are not discussed in any detail and the focus may be rather on network issues. Each decision is appropriate for the specific environment considered, but at the expense of an accurate and realistic model that incorporates the heterogeneity and scalability that is desired. The remainder of this chapter discusses the approaches and contributions of other research to the issues involved in the support and development of multimedia systems, and in particular, servers designed for continuous media.

6.1 Complete System Models

One deficiency in most of the previous work is the integration of the specific algorithms or hardware features in a complete system model. Complete system models have been derived in only a small portion of the previous work. In particular, Anderson et al. [5], Kumar et al. [47], Hsieh et al. [56], Heybey et al. [42] and Little and Venkatesh [55] have provided model descriptions which are general enough to accommodate a scalable, heterogeneous server. There have also been several complete server systems implemented, most notably IBM's Tiger Shark File System [40] and Microsoft's NetShow Theater Server [64], which is based on the Tiger Video Fileserver [9]. Even at that level, some important aspects of a complete model, such as scalability or variable bit-rate awareness, are left out. Complete models for the lower-level support of multimedia are also provided. These models do not correspond to particular server implementations, but do discuss relevant design issues.
Little and Venkatesh focus primarily on the user query interaction and do not consider the real-time interface component. Anderson et al. base their work on a model that does not consider scalability, although they do recognize the need for a powerful stream control interface, which has similar expressibility to that provided by the CMFS described in this dissertation. Hillyer and Robinson have a system model that matches Anderson's in many ways, but their focus is on more general system issues, including, but not limited to, continuous media. A server which has a system model similar to the CMFS is Calliope [42]. It contains facilities for extensibility and uses off-the-shelf hardware to achieve a scalable system. Their server design consists of a co-ordinator and multiple media storage units (MSUs). This corresponds well to the design of an administrator and server nodes as in this dissertation. The system developed by Kumar and Kouloheris [47] has many architectural features similar to the CMFS. They focus on storing network packets at the disks and bypassing the CPU entirely once transmission has begun on a stream. They do not focus on admission control specifically and include only a very brief model for the user's interaction with the system. Largely differing sizes of streams and fast-motion delivery complicate their scheduling process significantly. Message passing via a reliable connection is used to establish rates for bandwidth consumption by the client application for delivery of presentation objects. An important component in Calliope is the interleaving of the delivery schedule and the media content in a single file. Thus, this model is incapable of handling fast-forward and reverse on-line because the delivery schedule is not available separately. Off-line filters are provided which create a fast-forward version of the stream.
The model does not deal with the user's interface or the details of admission control beyond the specified bandwidth consumption rate, which is constant for a stream, even if it contains variable bit-rate data. Hsieh et al. [56] provide considerable detail on the specific implementation of their model, but do not clearly distinguish between the principles behind the model and the instance provided by their particular hardware environment. They perform extensive experiments on the number of supportable clients, but do not describe any mechanism by which bandwidth can be reserved and admission to the system controlled. In the Tiger Shark File System [40], support for continuous-time data is provided by admission control and disk scheduling. They give no detail on the admission control and use deadline approaches to retrieving disk blocks. Striping for increased bandwidth is essential, and retrieval across the network is performed by the client "pulling" data using traditional filesystem calls. Each file can be striped across a very large number of disks (possibly all the disks in the system). Bandwidth is reserved for clients reading at a fixed rate, implying only constant bit-rate clients can be supported. In this system, an end-to-end model is provided, including mechanisms for replication and fault tolerance, but it lacks the flexibility needed for efficiently dealing with complex user interactions with VBR streams. In the Tiger Video Fileserver [9] by Microsoft, a similar end-to-end model is provided. The components which make up the server entities (tiger controllers and cubs) are similar to the notion of an administrator and server nodes. They claim that the server scales arbitrarily by striping each file across each disk in the entire system. This provides the ability for extremely high bandwidth on popular objects, if the access is staggered in time.
For example, over 7000 users can view the same two-hour movie on a system with over 7000 disks, if they are spaced equally distant from each other in time (i.e., request delivery with a one-second stagger). Their distributed scheduling mechanism [10] ensures that admission of requests for constant bit-rate streams will not overload the system. Unfortunately, they are extremely conservative in requiring that no two users ever request data from the same disk during the same slot, eliminating seeks during slots. The major focus of this system is high availability and low-cost scalability, but it is, in fact, quite over-engineered. Although the system uses off-the-shelf PC hardware, it requires more resources than necessary because of the manner in which it performs allocation and scheduling. The fault-tolerant capabilities also increase the amount of hardware required. The CMFS attempts to maximize the number of users that can retrieve objects from a particular disk by examining the detailed requirements of each stream, and by intelligently allocating the disk bandwidth resource in a time-varying manner. It is unclear what kind of fragmentation problems and differing utilizations occur on either Tiger Shark or NetShow when highly variable streams are utilized. NetShow is heterogeneous in that it does not matter what kind of encoding is used in the streams, but the clients that have been written only provide MPEG encoding. Other formats have been used in testing, but no performance figures are given for the other formats. System models for server support are also described in reasonable detail. While these systems do not describe servers, they give detailed analysis of the relevant performance and data delivery mechanism issues. The goal in these systems is to provide an infrastructure at the operating system level that permits a number of differing multimedia applications to be implemented. Tierney et al.
have developed a system [84, 85] whose major use is the storage/display of large images. It is a lower-level approach that is also claimed to be capable of supporting continuous media. The storage system is capable of transferring data at an aggregate throughput rate of hundreds of Mbps, and is designed for the support of visualization environments in conjunction with the MAGIC gigabit testbed. This configuration is used as the basis for exploring system design issues. This design model is similar to the Zebra striped file system [41], because of the distributed nature of the organization and striping of the data, but contains more special-purpose design. The main goal is to enhance throughput, but not to provide redundancy. An image server that distributes tiles of an image across video data servers on a network is the most significant application considered. This application is somewhat similar to the CMFS, so the general system design issues are relevant. Some applications considered require reliable transmission of the images they request, so images which are incorrect or misordered are continually requested until their real-time deadline has passed. This requirement is quite different than in the CMFS. A general model for distributed multimedia support is described in Mullender et al. [66, 52]. Mullender et al. [66] provide a holistic approach to scheduler support for processing multimedia data in real-time by using a special micro-kernel called Nemesis. Resources can be allocated to applications within small time windows, but generally, the application must adapt to the resources it is given. Entities similar to user processes (called domains) are activated based on a weighted scheduling discipline. If resources remain at the end of a particular scheduling interval (analogous to a slot), they are shared among the domains.
Earliest-deadline-first is the scheduling algorithm used for the remaining resources. This model considers other workload in a system besides continuous media, so it does not provide the strict guarantees desired by a CMFS, but supports the operational models that would be used in most continuous media applications. The UBC CMFS provides a total system model, but at a higher level of abstraction, and stays away from low-level operating system details. The real-time scheduling algorithm used in the system is earliest-deadline-first, which has been shown to be optimal if the requirements of the tasks are less than the capacity of the system.

6.2 Synchronization of Media Streams

A system that provides flexible user access to continuous media data must permit synchronization of streams at either the client or the server. Synchronization of multiple media streams has been a large topic of research which is addressed at various levels. Models of synchronization which deal with multimedia hyper-documents involve complex temporal relationships between continuous and non-continuous objects and are outside the scope of this dissertation. Detailed discussions can be found in Li and Georganas [53] and Bulterman and van Liere [11]. The level of client synchronization addressed by the CMFS is that which is needed for synchronization of a single video stream (or multiple resolutions of a single video clip stored as separate streams in the case of scalable video encoding) and a single audio stream, with optional synchronized text streams. In Anderson and Homsy [4], the synchronization mechanism is a logical time system (LTS), which provides time-stamps for the data units.
Software processes or hardware devices at client workstations deal with time skew by skipping presentation units to speed up a stream which is slow in decoding/displaying and/or pausing the presentation of one or more of the streams while waiting for data for a tardy stream. This enables peripheral presentation to be kept in synchronization. The server provides data in an earliest-deadline-first manner to support the synchronization effort at the client station. Since data is time-stamped, the client knows what the display time of each presentation unit should be and attempts to keep all media streams as close to the presentation rate as possible. The client application is capable of specifying an allowable skew so that the server's delivery requirements may be somewhat relaxed when the skew tolerance is larger. Rangan et al. [75] address the problem of synchronization by proposing a feedback mechanism to inform the server of inter-media asynchrony at the client machine so the server can adjust the delivery rate. The temporal relationships are stored as relative timestamps, and one stream performs as a master of the other streams (slaves) which are concurrently presented. This causes the slave streams to pause presentation or skip presentation units to remain synchronized. Significant detail is provided regarding the server's interpretation of this feedback. The concept of master-slave streams can be used by client applications of the CMFS, but the lack of a feedback channel eliminates the server's direct involvement in this part of the synchronization. In Chiueh and Katz [18], multiple resolution files are stored and retrieved separately so that the desired bandwidth can be achieved. This introduces the need for synchronization of the components of a video stream, but the mechanism for providing the synchronization is not discussed in detail.
It is acknowledged that the retrieval of the data for the same display period must be done in a parallel fashion for reassembly and decoding at the client. The Continuous Media Player [78] uses a time-ordered queue to synchronize audio/video packets at the client and utilizes adaptive feedback to match the data delivery with the client station's capability. This method calculates penalty points for the clients and has the server adjust the frame rate according to accumulated penalty points. Synchronization is a particularly difficult problem in systems that delay stream acceptance in order to reduce bandwidth needs (such as [77, 61, 33]). These systems also attempt to limit the startup latency. If multiple streams must be co-ordinated at a client, then small time bounds on latency are necessary so that the detailed synchronization can be achieved. For example, a 30-second start-up latency for a video stream may make it impractical to retrieve a corresponding audio stream at the same time. Extra scheduling mechanisms are necessary to know when to request that audio stream so that it arrives at the proper time for synchronization with the video stream. As mentioned previously, the CMFS does not utilize feedback for synchronization. Once the presentation has started, the client must deal with asynchrony. The real-time guarantees of delivery ensure that the data will be sent in time. Once the client knows the latency of the network connection, it can adjust prepare times so that the presentation can begin at the appropriate time for synchronization.

6.3 User Interaction Models

With the underlying support for synchronization, the user interaction models can be further developed. A subset of VCR functions is typically provided, which allows the user to request continuous media streams. The simplest systems provide only playback and stop.
A more sophisticated model that includes pause, fast motion and slow motion (in both forward and reverse directions) and random start positions is more useful for interacting with continuous media. Some system descriptions provide playback only [89, 86, 56, 14, 74, 70, 57]. This playback may be at a different rate than the recorded rate, so some amount of variable speed is considered, but not in any detail. In much of the work that focuses on disk layout, the ability to fit a maximum number of requests on a certain number of disks assumes a certain speed of playback (i.e., full motion). One user cannot alter the playback rate in a scenario without affecting the data retrieval for all other users. In systems that provide a little flexibility in terms of playback rates, the number of frames per second desired/achievable is used to inform the server of the data rate required by the stream. Yavatkar and Lakshman [94] use a rate-adjustable priority scheduling mechanism to provide an average frame delivery rate to a client application. Thus, variable speed playback directly affects the data rate required from both the disk and the network. Little and Venkatesh [55] provide fast forward and rewind temporal access control in the user interface, but do not describe how the server implements these functions. Dey-Sircar et al. [24] consider the implementation of fast motion by sending data at a higher rate if possible, or by adjusting the rates of all fast-motion users in a synchronized fashion so that the bandwidth constraints of the server are not violated. One of the options considered is to deny fast-motion service to a user until the bandwidth is available, but to continue normal-motion data delivery in the meantime. This is also the viewpoint taken by Lau and Lui [50], whereby the user selects the amount of time for which fast motion is desired. The resumption of normal display is then considered as a new request.
Fast-motion options are given, but not elaborated upon. Providing fast-motion display by skipping data segments is an increasingly common alternative. With this approach, the average data rate required is not significantly altered by the fast-motion request. Chen et al. [17] provide a disk layout procedure to balance the load on the disks while retrieving and transmitting some percentage of the segments stored at the server, where segments are defined to be media-type-specific amounts of continuous media data. The unique requirements associated with fast motion and its relation to batching and buffer sharing are addressed by Kamath et al. [43]. They propose skipping segments as well and consider the effect of skipping data on their sharing schemes. Ozden et al. [71] consider sending complete independent sequences (MPEG I-frame sequences in their discussion), but require the server to be aware of when a new I-frame has been encountered in the data stream. They give a considerably detailed discussion on how to improve the performance during fast-motion retrieval, including the effect on buffer space and the possibility of storing a fast-motion version of the stream. Rangan and Vin [73] distinguish between a destructive pause and a non-destructive pause operation. The non-destructive pause stops delivery and reading of continuous media, but still reserves buffers and bandwidth at the server for the anticipated resumption of playback. The CMFS implements a destructive pause because of the uncertainty regarding the amount of time that the display may be paused. The CMFS allows fast motion both by an increased presentation rate, which increases the bandwidth used at the server, and by skipping data segments (called sequences in this dissertation). The option of providing a non-destructive pause has not been considered, because the pause length would have to be very short to avoid causing buffering problems at the server.
In particular, a non-destructive pause of indeterminate length cannot work in the model of the CMFS, because it changes the timing of when buffers are made available for read-ahead at the server. The availability of these buffers is relied upon by other streams. It would also adversely affect the server's send-ahead flow control mechanism. In the steady state, this would simply reduce the level of read-ahead, but it may invalidate a previous admission decision if that buffer space was required for any existing stream.

6.4 Scalability

Since the performance of individual hardware components makes a centralized file server impractical for meeting the needs of a large and diverse user population, methods of achieving scalability have been considered by many research projects. The major prototype systems that have been developed are of a specific scale, and cannot be incrementally expanded. In particular, the server by Starlight Networks [86] is built up of a collection of disks in a personal computer environment and is capable of serving twenty 1.2 Mbps video users. None address the ability to combine server components into a server that scales arbitrarily. Commercial video server products by manufacturers such as Oracle [51], SGI [67], and others have provided high-speed super-computer hardware technology and parallel disk drives for the purpose of delivering high-bandwidth video streams, but have not provided evidence that they have addressed the fundamental issues of server design from the point of view of the variable bit-rate requirements of the streams. In the existing literature, several simulation experiments have considered the issues of performance and admission guarantees in large systems ([50] uses 200 disks, [75] uses 120 disks). These show the levels of bandwidth required to support a large number of users with a large selection of streams, but do not address the difficulties in building a system of that size.
Some systems provide a design based on a server array. In Bernhardt and Biersack [6], a single video is striped over a subset of the server nodes. They claim that a server should be capable of storing several thousand objects and attempt to deal with the load-balancing issue by evenly striping the data across a large percentage of the disks. It appears that load-balancing operations may dominate the activity in such a system because the reorganization task is reasonably complex. Another method of providing scalability is to have tertiary archival storage that increases the content available in a system. Systems that incorporate this storage are known as video jukeboxes. Systems such as these attempt to limit start-up latency by retrieving part of the stream directly to server buffers for transmission and the remainder to disk. Archival storage is not an issue which is directly examined in this dissertation, but the model does incorporate the ability to perform migration of continuous media streams. Migration could be initiated from an archive server to keep the contents of server nodes up-to-date with recent request patterns. The work that deals with operating system enhancements has shown that file system facilities are scalable (a goal of terabyte file capacity in [66]). The simulations in [24] indicate a large system (that would require scaling of smaller systems), but offer no mechanisms by which these smaller systems can be combined. Anderson's CMFS [5] performs some simulation experiments that indicate that the limit of scale they are willing to consider is in the range of several dozen simultaneous users. When Crimmins studied video conference performance over Token Ring [19], the limits of the system were quickly reached by using all of the physical network medium, and no concept for scaling beyond that is considered.
Little and Venkatesh [55] mention the issue of scale, but their work does not specifically provide any solutions. Linear scalability is achieved in Tiger Shark [40] and NetShow [64] by simply adding disks or processing nodes. The concept of scalable server nodes, called MSUs, is also provided in Calliope [42] and by Kumar et al. [47]. The CMFS uses both of these methods to increase storage capacity and bandwidth. The number of disks on a server node is limited by the network interface, but then server nodes can be added until the administrator database is saturated. Then servers can be confederated with a location service [46].

6.5 Real-Time Scheduling/Guarantees

By definition, continuous media requires real-time constraints on the retrieval and delivery of the media data. The approaches to providing the real-time semantics vary from providing statistical guarantees that a certain percentage of data will arrive correct and on-time, to hard real-time guarantees for some classes of data and soft real-time guarantees for others. Real-time scheduling methods are desired by many researchers but only implemented by some. Tierney et al. [85] recognize the need for OS support for deadline-driven data delivery at higher rates of success than UNIX-based systems, which cannot provide this facility. Their initial system does not provide such guarantees. The simulations and prototypes by Rangan and Vin [74, 75] provide statistical real-time guarantees of delivery of data, as does the admission control algorithm described by Vin et al. [87]. In such a system, it suffices to allocate the data loss equitably among the users within a given service round, as well as over longer periods of time. If data loss is small enough, the clients will be satisfied, as this loss will be indistinguishable from network loss and will not appear as a chronic inability of the server to deliver data.
This requires the server to be able to distinguish data which causes great disruption at the client (i.e., MPEG I-frames) from less important data, in an effort to distribute the effective frame loss to the application. Vin et al. [87] claim a 200% increase in the number of admissible streams with this method over the conventional worst-case assumptions. Chiueh and Katz [18], as well as Lau and Lui [50] and Tobagi et al. [86], provide systems that provide strict guarantees of delivery of continuous media data, but do so by delaying the servicing of a stream until the bandwidth can be guaranteed. Start-up latency can be significant. Average waiting time is the performance metric used, but the order in which streams are accepted is unclear. If streams are treated in a First-Come-First-Served manner, then a large-bandwidth and/or long stream may prevent several smaller streams from immediate admission. A bandwidth-based policy (perhaps minimum bandwidth first) could indefinitely starve those large-bandwidth streams. Both of these policies distort the waiting time measure as an accurate reflection of system performance. The resource server of Little and Venkatesh [29] establishes real-time connections for streams "...to ensure they can support movie playout for the entire duration...." and engages in QoS negotiations with a client application. This implies a hard real-time guarantee, but explicit consideration of graceful degradation of service is given, so it is unclear what type of guarantees are provided. Real-time scheduling of disk and network activity is done in many systems [40, 45, 52, 66]. The most common method is Earliest-Deadline-First, which is shown to be optimal if the resource requirements can be met. The simulations in Reddy and Wyllie [76] compare the performance of hybrid techniques, including SCAN-EDF. The disk scheduling algorithm in the CMFS is very similar to SCAN-EDF.
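The SCAN-EDF ordering can be illustrated with a short sketch (the stream names, deadlines, and track numbers below are invented for illustration and are not taken from the CMFS implementation): requests are served primarily in earliest-deadline order, and requests sharing a deadline are reordered by track position so the disk arm can service them in one SCAN sweep.

```python
# Hypothetical sketch of SCAN-EDF ordering; names and values are illustrative.
from collections import namedtuple

Request = namedtuple("Request", ["stream", "deadline", "track"])

def scan_edf_order(requests):
    # Primary key: deadline (EDF); secondary key: track number, so that
    # requests with equal deadlines are serviced in a single SCAN sweep.
    return sorted(requests, key=lambda r: (r.deadline, r.track))

reqs = [
    Request("A", deadline=2, track=900),
    Request("B", deadline=1, track=500),
    Request("C", deadline=1, track=100),
    Request("D", deadline=2, track=300),
]
print([r.stream for r in scan_edf_order(reqs)])  # ['C', 'B', 'D', 'A']
```

Requests B and C share the earliest deadline, so they are served in track order (C, then B) before the deadline-2 requests, which are likewise swept in track order.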
The group of requests with the earliest deadline is sent as a group to the disk controller, and the controller uses a method beyond the control of the application to perform this group of requests asynchronously in the most efficient manner. Systems that provide support for both real-time and non-real-time data traffic clearly distinguish between the types of guarantees they provide. The general kernel-level support system by Lougher and Shepherd [57] provides hard real-time guarantees of delivery of data, as well as supporting non-real-time access. The hard real-time scheduler ensures all continuous media requests meet their deadlines using a round-robin scheme. This task is simplified as the system assumes very similar data rates for streams and constant rates within a stream. The scheduler uses worst-case estimates of processor execution time and data delivery, and is therefore conservative in its utilization of the hardware resources. A soft real-time scheduler is used during the slack time to provide additional streams which are of reduced quality, and it does not provide deterministic guarantees for these streams. In Crimmins [19], a statistical guarantee is used to classify the transmission of a continuous media stream as successful. A specific threshold of 98% packet delivery for continuous media is used in his experiments, which are designed to model both synchronous and asynchronous traffic. The CMFS utilizes a real-time scheduler which can be implemented on top of a real-time or non-real-time operating system. The environment for each server component is intended to be a dedicated machine for the sole purpose of running the CMFS. On a non-real-time OS, such as UNIX, there is no mechanism to support hard real-time deadlines, but performance tests by Mechler [62] indicate that missed deadlines rarely occur if only one user application is active. If other programs are run on the server simultaneously, the real-time guarantees cannot be enforced.
These real-time guarantees are primarily for the software tasks that run on the processor.

6.6 Encoding Format

Much existing work on continuous media servers has considered particular details of the encoding format for disk layout and transmission mechanisms. The early work focused on video-on-demand playback and on providing methods that could increase the overall bandwidth of the server ([70, 72, 18]). Vin et al. [87] utilize knowledge of the encoding format in dealing with transient network overload. Servers have been designed that take specific characteristics of MPEG video streams into consideration in both data transmission and storage layout policies. The majority of the related literature does not consider any aspect of the encoding method. Since these papers analyze performance via simulation, the details of the encoding method are not significant except as they distinguish between CBR and VBR streams. The data format details are ignored by the CMFS as well. It is possible within the model of the CMFS to distinguish between more important (or essential) presentation units and less important presentation units as an enhancement to the data delivery process during periods of time when the network bandwidth is not being fully used. For the systems that do describe specific syntax, the main focus has been on MPEG video encoding [17, 18, 48]. Although the authors explicitly state that their methods may be extended to other encoding formats, it is not made clear that any instance of a system would be capable of efficiently supporting more than one format simultaneously. Chen et al. [17] and Chiueh and Katz [18] specify that their techniques can only be used on MPEG-conforming models of encoding. MPEG is a good instance of an encoding method to use as an example, since it incorporates both intra-frame and inter-frame dependencies.
Unfortunately, no consideration is given to the combination of encoding formats on the same server, or on the same storage devices. If a system uses the unique characteristics of the encoding in optimizing disk layout (as in [18]), this would conflict with other encoding formats. Several of the simulations have restricted their focus to only one media syntax. Most typically, this has been MPEG-1, with increasing emphasis on MPEG-2. The spatial resolution and constant bit-rate of MPEG-1 make it straightforward to study, but not useful for systems that must grow with developments in compression technology or be commercially viable. Although hardware extensions exist to provide hardware MPEG decoding, present and future generation continuous media systems will need to deal with more efficient and higher-quality compression schemes, most likely in an incremental fashion. Thus, the ability to support multiple formats, as in the CMFS, is essential.

6.7 Data Layout Issues

The detailed allocation of continuous media fragments to specific locations on disk storage devices occupies a great proportion of the attention of previous system designers. The primary motivation is to increase the bandwidth in general, or to reduce the potential interference that occurs when multiple requests are serviced from the same disk. As well, striping tends to focus on the details of a particular encoding algorithm, making it unsuitable for a heterogeneous system like the CMFS. On the other hand, some systems do not provide any details of disk block layout. The server of Anderson et al. [5] makes use of the raw disk interface of the operating system (in this particular case, UNIX) without consideration for striping. No detail is given on a mapping between objects and their locations on disk.
A general file server for continuous media must abstract the object-to-disk-block layout policy to the level of the object itself, due to the varying sizes of presentation units and the need to map them to specific disk block locations for efficient retrieval. Additionally, if significant computational energy is used to determine the optimal locations of the various components of an object, this may compete with system resources available for playback. For an environment which encourages reading and writing concurrently, the optimal allocation of existing streams may prevent new streams from having the same kind of optimal layout without reorganization of the entire disk. Complete on-line reorganization is totally unacceptable from a performance point of view. Taking the system off-line for reorganization is equally undesirable. In all these methods, the existence of multiple, independent, concurrent requests affects the usefulness of the stream-specific disk layouts for individual files. As well, if the effective bandwidth is increased because careful placement and subsequent careful retrieval patterns reduce seek activity for an expected retrieval pattern, then a user request pattern that makes regular use of varying speed/skip modes will negate this enhancement. The effort involved in the placement and striping does not bring a performance benefit, and thus is not worth the complexity. In Anderson [5], contiguous disk blocks were allocated to a stream to reduce seek times. This provided the ability to achieve higher bandwidth when relatively few streams are active, but is not a necessary condition for the CMFS, as this level of contiguity is not assumed. A method of positioning data on the disk known as multimedia "strands" and "ropes" was developed by Vin and Rangan [89].
The goal was to ensure the relative spacing between successive segments of a stream and the careful selection of appropriate data to place between those segments to guarantee continuous playback. This is only applicable to a specific set of streams, each with a constant bit-rate, retrieved in a specific order. The careful allocation does not perform well with arbitrary skipping of segments and the retrieval of data streams which are out of phase with each other. In particular, the retrieval patterns associated with retrieving only one or two of the accepted video streams in reverse or slow-motion would be very resource intensive, greatly reducing the bandwidth achieved due to extra seek activity. The CMFS handles this transparently as it makes no assumptions regarding stream request patterns.

Tierney et al. [84] propose a scheme which clusters data in 2-dimensional "tiles" for each particular image. This is a form of striping which can greatly increase the bandwidth for retrieving a single stream. The general concept of RAID striping is used by Oyang et al. [70] to design a system from a hierarchical-disk point of view. Chen, Kandlur, and Yu [17] propose striping on a segment level for the purposes of load balancing during variable speed retrieval of streams. In the case of fast-motion, entire segments of a stream are skipped to provide the fast motion. If the segments retrieved are not evenly spaced across disk devices, this could alter the relative utilization of the disks in a significant manner, causing hot spots on some disks. This pattern of use would not allow the system to support additional users with the bandwidth that is freed up by skipping data on some of the disks.
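The hot-spot effect of fast motion over round-robin placement can be illustrated with a small sketch (the fragment and disk counts are hypothetical, not taken from any of the systems above):

```python
from collections import Counter

def round_robin_place(num_fragments, num_disks):
    """Assign equally-sized fragments to disks in round-robin order."""
    return [i % num_disks for i in range(num_fragments)]

placement = round_robin_place(8, 2)                    # disks: 0 1 0 1 0 1 0 1
normal = Counter(placement[i] for i in range(8))       # normal playback
fast = Counter(placement[i] for i in range(0, 8, 2))   # fast motion: skip every 2nd

# Normal playback touches both disks equally, but a skip distance that is a
# multiple of the disk count lands every read on disk 0, leaving disk 1 idle.
```

Normal playback yields a balanced count per disk, while the skipped pattern concentrates all reads on one disk, so the bandwidth "freed" on the idle disk cannot be handed to additional users of the loaded one.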
The actual effect on the number of users would be quite complicated and could provide a further area of research. The preliminary study by Sincoskie [81] looks at striping data across devices to achieve the ability to retrieve the same stream in parallel for users who are offset in time. A similar approach is taken by Ozden et al. [71], where the buffer space and disk bandwidth necessary to retrieve data for different phases of the same movie are analyzed.

Lau and Lui [50] allocate files into equally-sized fragments and place the fragments on disks in a round-robin manner. This distributes the load across the disks, but since VBR compression results in differing frame sizes, the number of fragments required per second can vary as well. This system does not attempt to split frames across fragment boundaries. The method of disk block allocation is reasonably generic, as it does not depend on the encoding format. It is unclear what performance benefit is realized by the striping when a group of disks is scheduled to service a number of requests for different media objects.

A detailed examination of storage allocation for multi-resolution video is provided by Chiueh and Katz [9]. A compression method which utilizes Laplacian pyramids intelligently divides the data in two ways: reference data, which is needed to display the video stream at the lowest level of resolution, and motion data, which has separate components for each resolution level. Thus, one reference file and n−1 motion files are created for a video with n levels of resolution, and they are located on disk so that a reference file and the motion files do not occupy the same portion of the disk array. A method which spreads the reference files evenly across the disks could increase the number of low-resolution viewers that could be supported in this kind of a system.
It could be adapted into the CMFS model, but then the block schedule for an object would need entries for every disk on which the data for the object was stored.

The details of block allocation are not considered in the CMFS. If striping can increase the guaranteed minimum number of blocks which can be read, then it could be an effective lower-level optimization. The admission mechanism does not use this information. Whenever possible, presentation objects are stored contiguously on an individual disk so that seeking is not necessary when reading only one stream. This resulted in a great performance benefit in the execution of the server with large bandwidth video streams.

The massively parallel striping methods of the Tiger system [9] can achieve enormous bandwidth for a particular stream by striping across all disks in the system. This is only effective if an appropriate stagger between arrivals occurs.

6.8 Disk Admission Algorithms

The admission control question for both CBR and VBR streams has been extensively examined, from both the network and the disk point of view. Kamath et al. [43] perform disk admission control by examining the second-by-second variable bit-rate bandwidth needs of a set of streams, but they do not take advantage of slack-time read-ahead at the server to smooth out data rate peaks. This is the same as the Instantaneous Maximum algorithm from Chapter 4. A variant of this algorithm is also described in Chang and Zakhor [15]. Vin et al. [87] grant admission based on average bit-rate and deal with peaks in resource usage by equitably distributing the loss among the active streams. A model for CBR streams based on data-placement details and disk retrieval deadlines (QPMS) is given in Vin and Rangan [88]. The various algorithms for disk admission considered in Chapter 4 have been simulated or implemented in several systems.
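In outline, a test of the Instantaneous Maximum kind checks every slot of the summed block schedules against the guaranteed per-slot disk rate, giving no credit for read-ahead. A minimal sketch, with block schedules represented as blocks-per-slot lists (the names are illustrative, not the dissertation's code):

```python
def instantaneous_maximum_admit(schedules, min_read):
    """Admit only if, in every slot, the combined block demand of all
    streams (active plus candidate) fits within the guaranteed number
    of blocks the disk can read per slot.  Read-ahead earns no credit."""
    horizon = max(len(s) for s in schedules)
    return all(
        sum(s[t] for s in schedules if t < len(s)) <= min_read
        for t in range(horizon)
    )

# A single-slot peak forces rejection even when the average demand is
# comfortably below the disk guarantee.
```

This per-slot peak test is what makes the algorithm sensitive to momentary demand spikes that slack-time read-ahead could otherwise absorb.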
The early server implementations are capable of dealing adequately only with admission mechanisms for constant bit-rate continuous media streams. In particular, a number of systems/simulations assume there is a specific rate of consumption for each stream ([18, 31, 73, 70, 76, 86, 94]). Vin and Rangan [89] and Gemmell [31] consider systems in which there may be differences between streams, but not within a stream. This assumption drives the remainder of the design of their systems, making them unsuitable for efficient use with variable bit-rate streams unless the peak rate is used for resource reservation.

One of the first explicit considerations of variability was made by Dey-Sircar et al. [24] with planned bandwidth allocation when supporting multiple streams, but they do not provide a mechanism for allocating the bandwidth. Lau and Lui [50] also consider variable bit-rate retrieval to provide data for the client application. Their algorithms utilize a client-provided time bound on start-up latency (i.e. deadline) to determine if a stream can be scheduled before the requested time, given a limited set of resources. The admission test considers the peak rate needed to determine if a stream is admissible, delays the starting time, and readjusts the disk tasks to minimize other measures of resource usage. This approach explicitly considers the anticipated length of time required for disk-reading tasks, which may vary over the lifetime of a stream. Performance analysis via simulation, based on statistical models of arrival rates, was done with the conservative, deterministic algorithm. The analysis of Chapter 4 shows that for streams with a reasonable difference between their average and their peak rates, this method provides an unacceptably low level of utilization.

The admission control and data delivery mechanisms in Tobagi et al. [86] utilize the mechanical characteristics of particular disk drives to derive their model.
This is then used to predict the number of CBR users that can be guaranteed to be supported, given a bound on start-up latency. Dan et al. [20] follow a similar model of guaranteeing block delivery while batching requests for the same stream, thereby delaying the acceptance of streams until suitable points in time, known as batching intervals. No explicit consideration of VBR in the streams is made.

Recent work by Dengler et al. [23] and Biersack and Thiesse [8, 7] builds on the work of Knightly et al. [45] and Chang and Zakhor [16], describing admission control methods which provide statistical and deterministic guarantees of disk service for VBR streams. The major focus is data placement strategies, and the use of traffic constraint functions is prominent. Constant Time Length (CTL) placement with deterministic guarantees is investigated in [23], while statistical admission control with Constant Data Length is examined in [7].

In Vin et al. [87], a statistical admission control algorithm is presented, which considers not only average bit rates but the distributions of frame sizes and probability distributions of the number of disk blocks needed during any particular service round. They acknowledge that the algorithm fails (i.e. over-subscribes the disk) in certain circumstances referred to as overflow rounds. In overflow rounds, the system has the complex task of dealing with the inability to read enough data. A greedy disk algorithm attempts to reduce the actual occurrence of overflow rounds, and the system attempts to judiciously distribute the effective frame loss among the subscribed clients. This requires some knowledge of the syntax of the data stream, at least to the point of knowing where display unit (i.e. video frame) boundaries exist and which presentation units are more important than others (i.e. MPEG I-frames vs. MPEG B-frames).
Their system is able to give priority to these more important presentation units.

Chang and Zakhor are also among those who have directly experimented with more complicated versions of algorithms based on average bit-rates and distribution of frame sizes. A more complex version of the Average algorithm is given in [13] for Constant Time Length (CTL) video data retrieval. They also investigate Constant Data Length retrieval methods which introduce buffering for the purposes of prefetching portions of the stream and incorporate a start-up latency period. In further work [16], they show via simulation that a variation of deterministic admission control admits 20% more users than their statistical method for a small probability of overload.

In the CMFS, the placement of data on the disk has no effect on the admission control algorithm. It is not always possible to allocate blocks to the streams in such a way as to get higher bandwidth from individual disks. For example, when large bandwidth video objects are stored on a disk, the estimate of performance can be increased if a certain access pattern is assumed which has fewer seeks. Unfortunately, the ability to deliver streams at varying values of speed and skip makes these kinds of assumptions invalid in some cases.

6.9 Network Admission Control Algorithms

The problem of allocating network resources for Variable Bit-Rate audio/video transmission has been studied extensively. Zhang and Knightly [95] provide a brief taxonomy of the approaches, from conservative peak-rate allocation to probabilistic allocation using VBR channels of networks such as ATM. The levels of network utilization with VBR channels are still below what many system designers would consider acceptable. Thus, the use of CBR channels with smoothed network transmission schedules has also been examined.
This affects the nature of the admission control algorithms which are used in these systems [60, 95]. Also, since the resource requirements vary over time, renegotiation of the bandwidth [37] is needed in most cases to police the network behaviour.

Knightly et al. [45] perform a comparison of different admission control tests in order to determine trade-offs between deterministic and statistical guarantees. The streams used in their tests are parameterized by a traffic constraint function, known as the "empirical envelope". It describes the bandwidth needed at various points during stream transmission, so it is somewhat similar in form and function to the block schedule as presented in this dissertation, although much less detailed. This characterization is used in a system-wide admission control. This is then combined with different packet transfer schemes, and the results do not particularly isolate each subsystem. The results are applied primarily to the network transmission subsystem. The CMFS applies admission control to the disk and network in series, so that admission performance bottlenecks can be isolated.

The empirical envelope is the tightest upper bound on the network utilization for VBR streams, as proven in Knightly et al. [45], but it is computationally expensive. This characterization has inspired other approximations [91, 35] which are less accurate and less expensive to compute, but which still provide useful predictions of network traffic. In particular, Wrege and Liebeherr [91] utilize a prefix of the video trace as an aid in characterizing the traffic pattern. When combined with statistical multiplexing in the network, high levels of network utilization can be achieved [45]. Zhang et al. [96] have worked on network call admission control methods with smoothed video data, which can take advantage of client buffering capabilities.
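The empirical envelope can be stated compactly: E(w) is the maximum amount of data the stream ever produces in any window of w consecutive frames. A brute-force sketch follows (illustrative only; Knightly et al. give more efficient formulations, and real envelopes are over continuous time):

```python
def empirical_envelope(frame_bits, max_window):
    """E[w-1] = the most data produced in any w consecutive frames.
    Non-decreasing in w; the tightest deterministic traffic bound."""
    n = len(frame_bits)
    return [
        max(sum(frame_bits[s:s + w]) for s in range(n - w + 1))
        for w in range(1, max_window + 1)
    ]
```

The quadratic cost of sliding every window over a long trace is exactly the computational expense noted above, which motivates the cheaper approximations.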
This reduces the amount of buffering needed at the server and increases potential network utilization.

Four different methods of smoothing bandwidth which can be used for constructing bandwidth schedules for admission control are compared by Feng and Rexford [29]. The four main methods are: critical bandwidth allocation, minimum changes bandwidth allocation, minimum variability bandwidth allocation, and piecewise constant rate transmission and transport (PCRTT). The critical bandwidth allocation is simpler than the next two because it attempts to keep the same rate as long as possible. This may cause more changes later on in the stream, but the process is more efficient. The other algorithms they use attempt to minimize the number of bandwidth changes and the variability in network bandwidth. The simplest computational technique is PCRTT, which is very similar to the Original network bandwidth characterization scheme. They do not integrate this with particular admissions strategies other than peak-rate allocation. The smoothing methods and performance implications are discussed in more detail in Feng [28].

Bandwidth renegotiation over CBR channels and smoothing are used by Kamiyama and Li [44] in a Video-On-Demand system. McManus and Ross [60] analyze a system of delivery that pre-fetches enough of the data stream to allow end-to-end constant bit rate transmission of the remainder without starvation or overflow at the client, but at the expense of substantial latency in start-up. Sen et al. [80] indicate that minimum buffer space can be realized with a latency of between 30 seconds and 1 minute. The model used in the CMFS has a fixed bound on start-up latency of a very small number of seconds (with 500 msec slots, this bound is less than 2 seconds), but requires that the data is sent at varying bit-rates to avoid starvation or buffer overflow at the client.
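The PCRTT idea — one constant rate per fixed-length interval, chosen so transmission never falls behind consumption — can be sketched as follows. This is a simplified version that ignores client buffer limits, not Feng and Rexford's exact formulation:

```python
def pcrtt_rates(frame_bits, frames_per_interval):
    """For each fixed-length interval, pick the smallest constant rate
    (bits per frame-time) that keeps cumulative transmission at or
    above cumulative consumption at every frame boundary."""
    rates, sent, consumed = [], 0.0, 0.0
    for start in range(0, len(frame_bits), frames_per_interval):
        chunk = frame_bits[start:start + frames_per_interval]
        rate, cum = 0.0, 0.0
        for i, bits in enumerate(chunk, 1):
            cum += bits
            # Rate must cover the deficit by the end of frame i of the interval.
            rate = max(rate, (consumed + cum - sent) / i)
        rates.append(rate)
        sent += rate * len(chunk)
        consumed += cum
    return rates
```

The resulting schedule changes rate only at interval boundaries, which is what makes PCRTT-style plans amenable to CBR channels with periodic renegotiation.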
6.10 Network Transmission

The CMFS and nearly all of the research efforts in continuous media servers have taken place in the context of high-speed networks where bandwidth requirements can be specified to the network and the network can support some type of reservation/guarantee of the resource. The first network environment where this was investigated was ATM networks with VBR channels. As mentioned in the previous section, recent effort has been concerned with CBR connections. One reason for this is that the burstiness of a video stream can be predicted, allowing CBR channels to be used effectively. Another reason is that specifying the VBR connection parameters for the minimum acceptable requirements results in consistent data loss during overload times. This loss can be prevented by using CBR channels.

In either VBR or CBR channels, smoothing can be used to handle the burstiness problems associated with transmission of VBR data. Many smoothing techniques have been provided which absolve the network of the responsibility of dealing with congestion caused by variable bit-rate streams. Lam et al. [48] smooth the required data rate by explicitly examining the data ahead in the stream. Statistical properties of live video can be used to predict the bit-rates of the short-term future of the stream to enable smoothing on live video as well. Encoding-specific techniques are required at the server to act as filters to enable smooth network transmission rates. As well, the problems associated with variable bit-rate disk access are not addressed.

Yavatkar and Manoj [94] require variations in the amount of data sent across the network connection to increase the Quality of Service provided to client applications.
The variation in bit-rate of the encoding algorithm is not considered, but selective transmission, rate-based flow control, and selective feedback are used in the simulation of the Quasi-Reliable Multicast Transmission Protocol (QMTP). This varies its use of the network according to the availability of bandwidth, but not in a planned way that corresponds to the data rate of the stream itself. In the Heidelberg Transport Protocol (HeiTP) [22], congestion at a network interface due to variable bit-rate traffic is handled by dynamically reducing the amount of data transmitted/read off disk, which in turn leads to degradation in service to the client. Crimmins [19] uses a variable frame size in his simulation study but has a limited scope of analysis, considering only the effect on the data transmission over a token ring network.

Network protocol considerations have a considerable effect on performance. Previous work has been done using TCP/IP ([5] and [85, 84]). This has the advantage of being a standard, well-understood protocol that has reliable transmission characteristics. In an environment that is more concerned with timeliness than accuracy, a protocol that only performs retransmissions on control requests would be very desirable. It has been pointed out that protocols which require retransmission of video data for reliability are not useful in continuous media environments [49]. Thus the exploration of other protocols is necessary. In particular, the idea of selective retransmission with extra bandwidth [82] can increase reliability of the data stream. This is useful as long as no latency is introduced. Most systems use some variation of unreliable data transfer in which missing data can be identified.

6.11 Summary

In summary, the related work has been very extensive in the design and simulation analysis of continuous media servers.
Constraining the problem has enabled significant results to be achieved. The bandwidth available from magnetic disks has reached the point where multiple video streams can be supported from a single disk. Techniques for careful block layout tend to restrict the flexibility and heterogeneity of the system unnecessarily when varying request patterns (of speed and skip) are used.

Scalability of a system has been explored, and several designs accommodate the ability of a system to grow. None have obtained actual performance results from a large-scale server. The early implementations were not even capable of such expansion. The complete models of servers are quite similar in general design to the CMFS described in this dissertation but do not describe admission control in sufficient detail. The discussions of admission control algorithms range from specific dependence on detailed characteristics of the disk device to abstractions that only considered peak-rate allocations. The CMFS design has used a complete model and considered the variability of the data stream itself. This is the most important aspect of the design, since delivering variable bit-rate media streams to client applications is the aim of the server.

The systems and their associated issues are given in Table 6.1. Much other work is not included in this matrix because it deals with only a single issue. Thus, most of this work has a fairly complete system model to back up the subset of issues that are discussed. There has been very little discussion of synchronization of multiple streams in the more comprehensive systems. Some of the issues discussed in the existing literature have not been specifically considered in this dissertation, but have been incorporated into the model. For example, fault tolerance could be built in, and replication issues are discussed in Kraemer [46].
Chapter 7

Conclusions and Future Work

7.1 Conclusions

In this dissertation, it has been demonstrated that a Continuous Media File Server (CMFS) for Variable Bit-Rate Data, based on the principles of scalability, heterogeneity, and an abstract model of storage resources and performance, can be implemented to provide a human user with flexible access primitives for synchronization, while making near-optimal use of the disk and network resources available. The most significant aspect of the server is that it incorporates the variable bit-rate nature of compressed continuous media (particularly video) into the entire system design, from the user interface to the resource reservation mechanisms at the server and the data delivery method.

This server has been implemented and tested on a variety of hardware platforms. The performance testing has utilized a wide range of video streams that are typical of a news-on-demand environment. The range of video frame rates provides near full-motion to full-motion (from 20 to 30 fps) at high resolution (640 × 480 pixels). Multiple encoding formats were used, but the primary encoding format was Motion JPEG, due to limitations in the encoding hardware available. The playback duration ranged from 40 seconds to 10 minutes per clip, which is in the appropriate range for news stories, sports highlights, or music videos.

Three major contributions have been identified: 1) a complete system model and a working, efficient server implementation; 2) the disk admission algorithm that incorporates variable bit-rates and send-ahead into the admission decision; 3) the network smoothing characterization and admission algorithm. Both admission algorithms have been integrated into the server and analyzed with respect to efficiency and the amount of the resource that can be allocated to VBR streams while still providing deterministic data delivery guarantees.
The system model is more comprehensive than most existing models in that it can accommodate more specialization in terms of heterogeneity, scalability, reliability and flexibility, while not focusing on any one of these aspects to the detriment of the others. While replication and migration are extensions to the research of this dissertation which have already been implemented, the mechanisms to introduce these facilities were present in the existing model. The modular design permits systems of varying scale to be implemented. As well, the design of the user interface to the server facilities enables simple client applications or complicated continuous media presentations to be developed independently of most of the server details. These applications may even make use of objects which are located on different servers.

The resource allocation/reservation schemes were integrated into a complete server that allows client applications the flexibility of storing individual mono-media streams and retrieving them in almost arbitrary manners for synchronized presentation to a human user. The abstract disk model permits streams of widely differing VBR profiles and different encoding formats to be stored on the same server node without adverse effect on the guaranteed performance of the server.

The most significant contribution is the development of a disk admission control algorithm that explicitly utilizes a detailed bit-rate profile for each stream, simulating the use of server buffer space for the reading ahead of data. This algorithm is named vbrSim, and it emulates the variable bit-rate retrieval for the stream during the admission process, using a worst-case guarantee of disk bandwidth.

The vbrSim algorithm significantly outperforms the deterministic algorithms presented in Chapter 4 in terms of admission performance and is linear in execution time.
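The core of such an admission emulation can be sketched in a few lines. This is a simplification under stated assumptions — a single shared buffer pool, demand expressed in blocks per slot, and the disk credited with exactly minRead blocks per slot — not the actual vbrSim implementation:

```python
def vbrsim_admit(schedules, min_read, buffer_blocks, read_ahead=0):
    """Slot-by-slot emulation: credit the disk with min_read blocks per
    slot (reads may run ahead of deadlines, capped by the buffer pool)
    and reject if any slot's cumulative demand overtakes cumulative
    guaranteed reads."""
    horizon = max(len(s) for s in schedules)
    cum_read, cum_need = read_ahead, 0
    for t in range(horizon):
        # Read-ahead cannot exceed consumption plus the buffer pool.
        cum_read = min(cum_read + min_read, cum_need + buffer_blocks)
        cum_need += sum(s[t] for s in schedules if t < len(s))
        if cum_read < cum_need:          # deadline miss: reject the scenario
            return False
    return True

# A peak above min_read is rejected on a cold start, but the same scenario
# is admitted once a little read-ahead has already accrued for the stream.
```

The read-ahead credit is what distinguishes this style of test from the per-slot peak tests: a bandwidth spike is acceptable as long as earlier slack has already pulled enough blocks into the buffer pool.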
The performance tests evaluated the algorithms for scenarios of large bandwidth video streams of differing variability and arrival patterns, ranging from simultaneous requests to a stagger of 20 seconds between requests. For most of the scenarios with staggered arrivals, the stream requests were ordered from longest to shortest to ensure that all streams were reading contiguously for some portion of the scenario.

In terms of admission performance, the Simple Maximum algorithm accepted very few simultaneous stream requests and was incapable of accepting any scenarios requesting more than 50% of the disk capacity. Even smaller scenarios were regularly rejected. The Instantaneous Maximum algorithm accepts scenarios which are approximately 20% larger than Simple Maximum, but the vbrSim algorithm regularly accepts scenarios that are at least another 20% larger. The admission performance of the first two algorithms was unaltered by introducing inter-arrival stagger to the request pattern. The algorithms could not accept scenarios which were larger, and the relative percentage of the disk decreased, due to the higher achieved bandwidth from the disk.

The admission performance of the vbrSim algorithm improved with staggered requests, due to the incorporation of achieved read-ahead and guaranteed future read-ahead. With reasonably small values of stagger, scenarios that request a cumulative bandwidth which is greater than minRead and close to the actual bandwidth achieved by the disk system can be accepted. The largest scenarios accepted by the vbrSim algorithm with a stagger of 10 seconds sustained more than 100% of the long-term disk bandwidth achieved for the execution of that scenario.

Assuming the guaranteed level of disk performance in the admission algorithm enabled a large degree of smoothing of the peaks of the variable bit-rate disk schedule.
The observations from the performance tests showed that the vbrSim algorithm is sensitive to the variability of the bit rate within the stream. Even the high-variability streams achieve high utilization of resources while providing delivery guarantees. Constant bit-rate streams would enable the system to achieve perfect utilization of the network bandwidth. As well, the most complete usage of the disk occurs with a disk that achieves a constant bandwidth. With simultaneous arrivals, the vbrSim algorithm accepted the largest percentage of the disk resource when CBR streams were stored on the server. With variable bit-rate streams, the existence of bandwidth peaks did cause many scenarios to be rejected. With staggered arrivals, there was not much difference in the acceptance rate between the different types of streams.

Buffer space requirements for the vbrSim algorithm were found to be large, but not excessive. The scenarios which requested more than minRead blocks per slot required significant buffer space to guarantee delivery from the server for every stream. Most scenarios that were acceptable in terms of bandwidth required less than 200 MBytes of server buffer space on the disk. The required space appeared to grow linearly with the cumulative bandwidth of the scenario for requests above minRead. High-variability streams require more buffer space than low-variability streams while the cumulative request is below minRead; if the request is above minRead, there is no significant difference in buffer space requirements between the two types of streams.

Adding more client buffer space did not affect the number of high-bandwidth video streams that could be accepted. As well, increasing the amount of time between request arrivals allowed more streams to be accepted only when those streams were of short playback duration. This is because a significant percentage of each stream could be held in the server buffer space.
The final major contribution is the development and integration of a network admission control algorithm and a network bandwidth smoothing technique. This enables the network subsystem to transmit data on each real-time data connection using renegotiated constant bit-rates for reasonably long periods of time.

While the vbrSim algorithm was shown to be technically feasible for the network bandwidth as well as the disk bandwidth, it was not used because it depended too much on the disk system having significant read-ahead and very large client buffer space in order to achieve any substantial guaranteed send-ahead. A slightly modified Instantaneous Maximum algorithm with the Smoothed network bandwidth characterization can accept scenarios with over 90% of the network interface limit requested. This is 10-15% better than using the Original network characterization method and far superior to the Peak characterization. With respect to different stream types, the Smoothed algorithm showed more performance improvements for the high-variability streams, because there are more peaks to smooth.

Most of the performance testing was conducted using 20-second network slots for renegotiation and admission control purposes. More detailed study showed that, for the particular type of streams, using a 10-second network slot resulted in better admission performance. Some of the results are not conclusive, due to the small number of data points upon which to base a conclusion.

7.2 Future Work

There has been a great deal of work exploring the issues involved in storing and retrieving continuous media. Hardware limitations often restricted the more significant theoretical work to the context of simulation studies. Recent economics have made implementations more practical and thus provided a more tangible environment for evaluation. The implementation in this dissertation is one of the first VBR servers that has been examined in detail.
This has opened up a number of possibilities for future work.

7.2.1 Long Streams

The results in Chapter 4 showed that the benefits of read-ahead and staggered arrivals were more significant for shorter streams. One of the reasons behind this observation is that streams utilize an increasing amount of buffer space as they increase in total size. When staggered arrivals permitted a significant percentage of a stream's data to be read ahead into server buffer space or sent ahead to be stored in client buffer space, the utilization of the disk was able to be increased. More work on longer streams is necessary to find the point at which the sophisticated admission algorithms do not improve performance and server buffer space becomes the primary limitation.

7.2.2 Disk and Network Configurations

As an extreme point on the server design continuum, a CMFS could be implemented with one video clip per disk, and one disk per server node. Thus, the number of simultaneous users per server node would be a direct function of the popularity of the video clip. Such a configuration would be very expensive and lead to poor utilization, due to the phenomenon of locality of reference.

There has been work done on storage allocation of movies, based on popularity, and this work could be adapted to the context of the CMFS to determine efficient configurations of servers. In particular, this extension could help determine the optimal number of objects to be stored on a particular disk, levels of replication within the disks on a server node, and the number of disks that should be attached to a server node as a function of its network interface capability.

7.2.3 Relaxing the value of minRead

Additional empirical studies with more aggressive values of minRead would lead to further indications of precisely how conservative the vbrSim algorithm is for typical sets of streams.
If minRead can be set to the actual average of the recent past, what is the potential that a bad decision could be made?

7.2.4 Variants of the Average Algorithm

In the performance tests, the Average algorithm used minRead as its estimate of disk performance. No stream that caused the average request bandwidth to exceed minRead was accepted. This process provided admission decisions that were too conservative in some cases and too aggressive in others. This was because it did not factor the shape of the bandwidth requirements into the acceptance decision. One option would be to use the observed average as the estimate of disk performance. Obviously, this would tend to eliminate the possibility of being too conservative but increase the potential for making aggressive admission decisions. A more careful study of the sensitivity to these parameters may give additional insight into the overall benefit of the vbrSim algorithm.

7.2.5 Reordering Requests

One of the performance benefits of the system is that contiguous reading off the disk increases the achieved bandwidth. This often permits additional streams to be accepted. With large amounts of server buffer space, many streams have dozens of disk slots of buffer space at the server. Once the server is in steady state, buffers are released at a slower rate than the disk can read. When the disk system is approaching steady state, it is likely ahead on most of the existing streams. Thus, it could perform even better by readjusting the deadlines of the data for some streams, so as to perform more contiguous reading. This would enable steady state to be reached more quickly, and may have the benefit of enabling more streams to be accepted.

Bibliography

[1] ISO/IEC JTC1 CD 10918. MJPEG. Digital compression and coding of continuous-tone still images. Technical report, ISO, 1993.
[2] Chris Adie. A Survey of Distributed Multimedia Research.
Technical Report RARE Project OBR (92) 046v2, Réseaux Associés pour la Recherche Européenne, January 1993.
[3] Chris Adie. Network Access to Multimedia Information. Technical Report RARE Project OBR (93) 015, Réseaux Associés pour la Recherche Européenne, August 1993.
[4] D. P. Anderson and G. Homsy. A Continuous Media I/O Server and Its Synchronization Mechanism. IEEE Computer, 24(10):51-57, October 1991.
[5] D. P. Anderson, Y. Osawa, and R. Govindan. A File System for Continuous Media. ACM Transactions on Computer Systems, 10(4):311-337, November 1992.
[6] C. Bernhardt and E. Biersack. The Server Array: A Scalable Video Server Architecture. In O. Spaniol, W. Effelsberg, A. Danthine, and D. Ferrari, editors, High Speed Networking for Multimedia Applications, chapter 5, pages 103-125. Kluwer Publ., March 1996.
[7] E. W. Biersack and F. Thiesse. Statistical Admission Control in Video Servers with Constant Data Length Retrieval of VBR Streams. In Third International Conference on Multimedia Modeling, Toulouse, France, November 1996.
[8] E. W. Biersack and F. Thiesse. Statistical Admission Control in Video Servers with Variable Bit Rate Streams and Constant Time Length Retrieval. In Euromicro '96, Prague, Czech Republic, September 1996.
[9] W. J. Bolosky, J. S. Barrera III, R. P. Draves, R. P. Fitzgerald, G. A. Gibson, M. B. Jones, S. P. Levi, N. P. Myhrvold, and R. F. Rashid. The Tiger Video Fileserver. In 6th International Workshop on Network and Operating Systems Support for Digital Audio and Video, pages 29-35, Zushi, Japan, April 1996.
[10] William J. Bolosky, Robert P. Fitzgerald, and John R. Douceur. Distributed Schedule Management in the Tiger Video Fileserver. In SOSP 16, pages 212-223, St. Malo, France, October 1997.
[11] Dick C. A. Bulterman and Robert van Liere. Multimedia Synchronization and UNIX.
In 2nd International Workshop on Network and Operating Systems Support for Digital Audio and Video, Heidelberg, Germany, November 1991.
[12] W. C. Chan and E. Geraniotis. Near-Optimal Bandwidth Allocation for Multi-Media Virtual Circuit Switched Networks. In INFOCOMM, pages 749-757, San Francisco, CA, October 1996.
[13] E. Chang and A. Zakhor. Admissions Control and Data Placement for VBR Video Servers. In 1st IEEE International Conference on Image Processing, pages 278-282, Austin, TX, November 1994.
[14] E. Chang and A. Zakhor. Disk-based Storage for Scalable Video. In Unknown, web page down, pages 278-282, Austin, TX, November 1994.
[15] E. Chang and A. Zakhor. Variable Bit Rate MPEG Video Storage on Parallel Disk Arrays. In 1st International Workshop on Community Networking Integrated Multimedia Services to the Home, pages 127-137, San Francisco, CA, July 1994.
[16] E. Chang and A. Zakhor. Cost Analyses for VBR Video Servers. In IST/SPIE Multimedia Computing and Networking, pages 381-397, San Jose, January 1996.
[17] M. Chen, D. D. Kandlur, and P. S. Yu. Support for Fully Interactive Playout in a Disk-Array-Based Video Server. In ACM Multimedia, pages 391-398, San Francisco, CA, October 1994.
[18] T. Chiueh and R. H. Katz. Multi-Resolution Video Representation for Parallel Disk Arrays. In ACM Multimedia, Anaheim, CA, June 1993.
[19] S. Crimmons. Analysis of Video Conferencing on a Token Ring Local Area Network. In ACM Multimedia, Anaheim, CA, June 1993.
[20] A. Dan, D. Sitaram, and P. Shahabuddin. Scheduling Policies for an On-Demand Video Server with Batching. In ACM Multimedia, pages 15-23, San Francisco, CA, October 1994.
[21] S. E. Deering. Multicast Routing in a Datagram Internetwork. PhD thesis, Stanford University, December 1991.
[22] L. Delgrossi, C. Halstrick, D. Hehmann, R. G. Herrtwich, O. Krone, J.
Sandvoss, and C. Vogt. Media Scaling for Audiovisual Communication with the Heidelberg Transport System. In ACM Multimedia, Anaheim, CA, June 1993.
[23] J. Dengler, C. Bernhardt, and E. W. Biersack. Deterministic Admission Control Strategies in Video Servers with Variable Bit Rate Streams. In Interactive Distributed Multimedia Systems and Services, European Workshop IDMS'96, Heidelberg, Germany, March 1996.
[24] J. K. Dey-Sircar, J. Salehi, J. Kurose, and D. Towsley. Providing VCR Capabilities in Large-Scale Video Servers. In ACM Multimedia, pages 25-32, San Francisco, CA, October 1994.
[25] E. Dubois, N. Baaziz, and M. Matta. Impact of Scan Conversion Methods on the Performance of Scalable Video Coding. In IST/SPIE Proceedings, San Jose, CA, February 1995.
[26] S. El-Henaoui, R. Coelho, and S. Tohme. A Bandwidth Allocation Protocol for MPEG VBR Traffic in ATM Networks. In IEEE INFOCOMM, pages 1100-1107, San Francisco, CA, October 1996.
[27] Anwar Elwalid, Daniel Heyman, T. V. Lakshman, Debasis Mitra, and Allan Weiss. Fundamental Results on the Performance of ATM Multiplexers with Applications to Video Teleconferencing. In ACM SIGMETRICS '95, pages 86-97. ACM, May 1995.
[28] Wu Chi Feng. Video-On-Demand Services: Efficient Transportation and Decompression of Variable-Bit-Rate Video. PhD thesis, University of Michigan, 1997.
[29] Wu Chi Feng and Jennifer Rexford. A Comparison of Bandwidth Smoothing Techniques for the Transmission of Prerecorded Compressed Video. In IEEE INFOCOMM, pages 58-66, Los Angeles, CA, June 1997.
[30] D. Finkelstein, R. Mechler, G. Neufeld, D. Makaroff, and N. Hutchinson. Real-Time Threads Interface. Technical Report 95-07, University of British Columbia, Vancouver, B.C., March 1995.
[31] D. J. Gemmell. Multimedia Network File Servers: Multi-channel Delay Sensitive Data Retrieval.
In ACM Multimedia, pages 243-250, Anaheim, CA, June 1993.
[32] J. Gemmell and S. Christodoulakis. Principles of Delay-Sensitive Multimedia Storage and Retrieval. ACM Transactions on Information Systems, 10(1), 1992.
[33] S. Ghandeharizadeh, S. Ho Kim, W. Shi, and R. Zimmerman. On Minimizing Startup Latency in Scalable Continuous Media Servers. In IST/SPIE Multimedia Computing and Networking, San Jose, CA, February 1997.
[34] Pawan Goyal, Harrick M. Vin, and Prashant J. Shenoy. A Reliable, Adaptive Network Protocol for Video Transport. In IEEE INFOCOMM, San Francisco, CA, October 1996.
[35] Marcel Graf. VBR Video over ATM: Reducing Network Requirements through Endsystem Traffic Shaping. In IEEE INFOCOMM, pages 48-57, Los Angeles, CA, June 1997.
[36] Carsten Griwodz, Michael Bar, and Lars C. Wolf. Long-term Movie Popularity Models in Video-on-Demand Systems. In ACM Multimedia, pages 349-357, Seattle, WA, November 1997.
[37] M. Grossglauser, S. Keshav, and D. Tse. RCBR: A Simple and Efficient Service for Multiple Time-Scale Traffic. In ACM SIGCOMM, pages 219-230, Boston, MA, August 1995.
[38] ISO/IEC JTC1/SC29/WG11 Editorial Group. MPEG-2 DIS 13818-7 - Video (Generic Coding of moving pictures and associated audio information). Technical report, International Standards Organization, Geneva, Switzerland, 1996.
[39] ITU-T Recommendation H.263. Video Coding for low bitrate communication. Technical report, CCITT, 1995.
[40] R. L. Haskin and F. B. Schmuck. The Tiger Shark File System. In IEEE Spring Compcon, Santa Clara, CA, February 1996.
[41] John Hartman and John K. Ousterhout. ZEBRA: A Striped Network File System. Technical Report UCB/CSD 92/683, University of California, Berkeley, Berkeley, CA, 1992.
[42] Andrew Heybey, Mark Sullivan, and Paul England.
Calliope: A Distributed, Scalable Multimedia Server. In USENIX Annual Technical Conference, San Diego, CA, January 1996.
[43] M. Kamath, K. Ramamritham, and D. Towsley. Continuous Media Sharing in Multimedia Database Systems. Technical Report 94-11, Department of Computer Science, University of Massachusetts, Amherst, MA, 1994.
[44] N. Kamiyama and V. Li. Renegotiated CBR Transmission in Interactive Video-on-Demand Systems. In IEEE Multimedia, pages 12-19, Ottawa, Canada, June 1997.
[45] E. W. Knightly, D. E. Wrege, J. Liebeherr, and H. Zhang. Fundamental Limits and Tradeoffs of Providing Deterministic Guarantees to VBR Video Traffic. In ACM SIGMETRICS '95. ACM, May 1995.
[46] Oliver Kraemer. A Load Sharing and Object Replication Architecture for a Distributed Media Fileserver. Master's thesis, Universität Karlsruhe, January 1997.
[47] M. Kumar, J. L. Kouloheris, M. J. McHugh, and S. Kasera. A High Performance Video Server for Broadband Network Environment. In IST/SPIE Multimedia Computing and Networking, San Jose, CA, January 1996.
[48] Simon S. Lam, Simon Chow, and David K. Y. Yau. An Algorithm for Lossless Smoothing of MPEG Video. In ACM SIGCOMM, London, England, September 1994.
[49] Bernd Lamparter, Wolfgang Effelsberg, and Norman Michl. A Movie Transmission Protocol for Multimedia Applications. In 4th IEEE ComSoc International Workshop on Multimedia Communications, Monterey, CA, 1992.
[50] S. W. Lau and J. C. S. Lui. A Novel Video-On-Demand Storage Architecture for Supporting Constant Frame Rate with Variable Bit Rate Retrieval. In 5th International Workshop on Network and Operating System Support for Digital Audio and Video, Durham, NH, 1995.
[51] Andrew Laursen, Jeffrey Olkin, and Mark Porter.
Oracle Media Server: Providing Consumer Based Interactive Access to Multimedia Data. In ACM SIGMOD '94, pages 470-477, April 1994.
[52] Ian M. Leslie, Derek McAuley, and Sape J. Mullender. Pegasus - Operating-System Support for Distributed Multimedia Systems. Technical Report 282, University of Cambridge, 1992.
[53] Lian Li and Nicolas Georganas. MPEG-2 Coded and Uncoded Stream Synchronization Control for Real-time Multimedia Transmission and Presentation over B-ISDN. In ACM Multimedia, San Francisco, 1994.
[54] C. J. Lindblad, D. J. Wetherall, W. F. Stasios, J. F. Adam, H. H. Houh, M. Ismets, D. R. Bacher, B. M. Phillips, and D. L. Tennenhouse. ViewStation Applications: Intelligent Video Processing Over a Broadband Local Area Network. In High-Speed Networking Symposium, Oakland, CA, August 1-3 1994. USENIX Association.
[55] T. D. C. Little and D. Venkatesh. Client-Server Metadata Management for the Delivery of Movies in a Video-On-Demand System. In First International Workshop on Services in Distributed and Networked Environments, Prague, Czech Republic, 1994.
[56] J. C. L. Liu, J. Hsieh, and D. H. C. Du. Performance of a Storage System for Supporting Different Video Types and Qualities. IEEE Journal on Selected Areas in Communications: Special Issue on Distributed Multimedia Systems and Technology, 14(7):1314-1341, September 1996.
[57] P. Lougher and D. Shepherd. The Design of a Storage Server for Continuous Media. The Computer Journal (Special Issue on Multimedia), 36(1):32-42, February 1993.
[58] D. Makaroff, G. Neufeld, and N. Hutchinson. An Evaluation of VBR Admission Algorithms for Continuous Media File Servers. In ACM Multimedia, pages 143-154, Seattle, WA, November 1997.
[59] R. Marasli, P. D. Amer, and P. T. Conrad.
Retransmission-Based Partially Reliable Transport Service: An Analytical Model. In IEEE INFOCOMM, pages 621-629, San Francisco, CA, October 1996.
[60] J. M. McManus and K. W. Ross. Video on Demand over ATM: Constant-Rate Transmission and Transport. In IEEE INFOCOMM, pages 1357-1362, San Francisco, CA, October 1996.
[61] Jean M. McManus and Keith W. Ross. Video on Demand over ATM: Constant-Rate Transmission and Transport. IEEE Journal on Selected Areas in Communication, 14(6), August 1996.
[62] R. Mechler. A Portable Real Time Threads Environment. Master's thesis, University of British Columbia, April 1997.
[63] Roland Mechler. CMFS Data Stream Protocol. Unpublished UBC Tech Report, 1997.
[64] Microsoft. NetShow overview. http:///overview.htm, 1998.
[65] ISO/IEC JTC1/WG11 MPEG. International Standard ISO 11172: Coding of moving pictures and associated audio for digital storage media up to 1.5 Mb/s. Technical report, ISO, Geneva, Switzerland, 1993.
[66] S. J. Mullender, I. M. Leslie, and D. McAuley. Operating-System Support for Distributed Multimedia. In USENIX High-Speed Networking Symposium Proceedings, pages 209-219, Oakland, CA, August 1-3 1994. USENIX Association.
[67] Michael N. Nelson, Mark Linton, and Susan Owicki. A Highly Available, Scalable ITV System. In SOSP 15, pages 54-67, April 1994.
[68] G. Neufeld, D. Makaroff, and N. Hutchinson. Design of a Variable Bit Rate Continuous Media File Server for an ATM Network. In IST/SPIE Multimedia Computing and Networking, pages 370-380, San Jose, CA, January 1996.
[69] J. Nieh and M. S. Lam. SMART UNIX SVR4 Support for Multimedia Applications. In IEEE Multimedia, pages 404-414, Ottawa, Canada, June 1997.
[70] Yen-Jen Oyang, Meng-Huang Lee, and Chun-Hung Wen. A Video Storage System for On-Demand Playback.
Technical Report NTUCSIE94-02, National Taiwan University, Taiwan, 1994.
[71] B. Ozden, A. Biliris, R. Rastogi, and A. Silberschatz. A Low-Cost Storage Server for Movie on Demand Databases. In 20th VLDB Conference, pages 594-605, Santiago, Chile, 1994.
[72] Seungyup Paek, Paul Bocheck, and Shih-Fu Chang. Scalable MPEG2 Video Servers with Heterogeneous QoS on Parallel Disk Arrays. In Proceedings of 5th International Workshop on Network and Operating System Support for Digital Audio and Video (NOSSDAV'95), Durham, NH, April 1995.
[73] P. V. Rangan and H. M. Vin. Designing File Systems for Digital Video and Audio. In Proceedings 13th Symposium on Operating Systems Principles (SOSP '91), Operating Systems Review, volume 25, pages 81-94, October 1991.
[74] P. V. Rangan and H. M. Vin. Efficient Storage Techniques for Digital Continuous Multimedia. IEEE Transactions on Knowledge and Data Engineering, Special Issue on Multimedia Information Systems, August 1993.
[75] P. V. Rangan, H. M. Vin, and S. Ramanathan. Designing an On-Demand Multimedia Service. IEEE Communications Magazine, 1992.
[76] A. L. Reddy and J. Wyllie. Disk Scheduling in a Multimedia I/O System. In ACM Multimedia, Anaheim, CA, June 1993.
[77] A. Narasimha Reddy. Improving Latency in an Interactive Video Server. In IST/SPIE Multimedia Computing and Networking, San Jose, CA, February 1997.
[78] Lawrence A. Rowe and Brian C. Smith. A Continuous Media Player. In 3rd International Workshop on Network and Operating Systems Support for Digital Audio and Video, San Diego, CA, November 1992.
[79] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson. RFC 1889: RTP - A Transport Protocol for Real-Time Applications. IETF Audio/Video Working Group, 1996.
[80] Subhabrata Sen, Jayanta Dey, James Kurose, John Stankovic, and Don Towsley.
CBR Transmission of VBR Stored Video. In SPIE Symposium on Voice Video and Data Communications: Multimedia Networks: Security, Displays, Terminals, Gateways, Dallas, TX, November 1997.
[81] W. C. Sincoskie. System Architecture for a Large-Scale Video on Demand Multimedia Service. Computer Networks and ISDN Systems, 22(1):155-162, 1991.
[82] B. C. Smith. Implementation Techniques for Continuous Media Systems and Applications. PhD thesis, University of California, Berkeley, 1994.
[83] W. T. Strayer, B. J. Dempsey, and A. C. Weaver. XTP: The Xpress Transfer Protocol. Addison Wesley Publishing, October 1992.
[84] B. Tierney, W. Johnston, H. Herzog, G. Hoo, G. Jin, and J. Lee. System Issues in Implementing High Speed Distributed Parallel Data Storage Systems. In USENIX High-Speed Networking Symposium, 1994.
[85] B. Tierney, W. Johnston, H. Herzog, G. Hoo, G. Jin, J. Lee, L. T. Chen, and D. Rotem. Distributed Parallel Data Storage Systems: A Scalable Approach to High Speed Image Servers. In ACM Multimedia, San Francisco, CA, October 1994.
[86] F. A. Tobagi, J. Pang, R. Baird, and M. Gang. Streaming RAID - A Disk Array Management System For Video Files. In ACM Multimedia, pages 393-400, June 1993.
[87] H. M. Vin, P. Goyal, Alok Goyal, and Anshuman Goyal. A Statistical Admission Control Algorithm for Multimedia Servers. In ACM Multimedia, pages 33-40, San Francisco, CA, October 1994.
[88] H. M. Vin and P. V. Rangan. Admission Control Algorithms for Multimedia On-Demand Servers. In 3rd International Workshop on Network and Operating Systems Support for Digital Audio and Video, 1992.
[89] H. M. Vin and P. V. Rangan. Designing a Multi-User HDTV Storage Server. IEEE Journal on Selected Areas in Communication: Special Issue on High Definition Television and Digital Video Communication, 11(1), August 1993.
[90] J. W.
Wong, D. Evans, N. Georganas, J. Brinskelle, G. Neufeld, and D. Makaroff. An MBone-based Distance Education System. In International Conference on Computer Communications, Cannes, France, 1997.
[91] Dallas E. Wrege and Jorg Liebeherr. Video Traffic Characterization for Multimedia Networks with a Deterministic Service. In IEEE INFOCOMM, pages 537-544, San Francisco, CA, March 1996.
[92] CCITT Study Group XV. CCITT Rec. H.261 Video Codec for Audiovisual Services at px64 kbit/s. Technical report, CCITT, Geneva, Switzerland, 1990.
[93] D. Yau and S. Lam. Adaptive Rate-Controlled Scheduling for Multimedia Applications. In ACM Multimedia, Boston, MA, November 1996.
[94] R. Yavatkar and L. Manoj. Optimistic Strategies for Large-Scale Dissemination of Multimedia Information. In ACM Multimedia, Anaheim, CA, June 1993.
[95] H. Zhang and E. W. Knightly. A New Approach to Support Delay-Sensitive VBR Video in Packet-Switched Networks. In 5th International Workshop on Network and Operating Systems Support for Digital Audio and Video, pages 381-397, Durham, NH, April 1995.
[96] Z. Zhang, J. Kurose, J. D. Salehi, and D. Towsley. Smoothing, Statistical Multiplexing and Call Admission Control for Stored Video. IEEE Journal of Selected Areas in Communications, Special Issue on Real-Time Video Services in Multimedia Networks, 15(6), August 1997.

Appendix A

CMFS Application Programmer's Interface

The Distributed Continuous Media File System provides the client with the following interface. These routines use the underlying Send/Receive/Reply IPC mechanism supported via the UBC Real-Time Threads (RTT) kernel [30]. To use the CMFS as it is currently implemented, the client must be using the RTT kernel.
To make use of the API, an application must have the following statement:

#include <cmfs.h>

This file contains data structures and data types for further use of the interface. In particular, status values are defined for the return codes of API calls. They are implemented as the enumerated type CmfsStatus.

enum CmfsStatus {
    STREAMOK = 0, NOTFOUND = -1, NOTACCESSIBLE = -2,
    NETWORKUNABLE = -3, SERVERUNABLE = -4, CLIENTREFUSED = -5,
    DATALOST = -6, NOTYETIMPL = -7, COMMERROR = -8,
    ENDOFDATA = -9, INVALIDREQUEST = -10, NOSPACE = -11,
    CLIENTUNABLE = -12
};

Other data structures are described with the procedures that reference them.

A.1 Object Manipulation

A.1.1 CMFS Storage API

Creation of presentation objects must be performed by a client application. When the client makes the decision to save an object to the CMFS, the following interface is provided:

CmfsStatus CmfsCreate( u_long dispUnits, u_long timeVal, u_long length, u_long *cid, UOI *uoi )

This procedure creates a new presentation object at the server. The parameters dispUnits and timeVal provide a context in which to interpret time values. These are to be interpreted as a ratio where dispUnits is specifically the number of presentation units and timeVal is the length of time (in milliseconds) that this number of units comprises. For example, this may be given as frames/second for video (e.g. 30/1000), sampling rate in kHz for audio (e.g. 8000/1000), frame length for MPEG audio (1/24 for 24-millisecond frames), or viewgraphs (or captioned text) per second (e.g. 1/1000). Length is the approximate total length (in bytes) of the presentation object to be stored. The CMFS needs to be able to determine if it has the resources available to store the object at the time requested. If it is possible to store the presentation object, the connection on which to transmit the data (cid) and the uoi of the object are returned as result parameters along with the status of CMFSOK.
Otherwise, an error status is returned, indicating the type of error.

To store a sequence of an object in the CMFS, the following call is provided:

CmfsStatus CmfsWrite( u_long cid, u_long length, char *buffer, int units, u_long sizes[] )

This call provides the method of storing the data for an object on the CMFS. The length parameter (specified in bytes) and buffer parameter refer to the actual data being sent. The data written in one call to CmfsWrite is defined as a sequence. These sequences are defined as small units of continuous media data (typically up to one second's worth of data), for the primary purpose of accommodating the implementation of displaying in various modes (fast forward, rewind, and slow motion). Sequence boundaries are also points at which the retrieval process can begin. They can be used as the beginning of a scene, or other related logical division of the object. Thus, several calls to CmfsWrite would be made during the creation of a particular presentation object.

The units parameter refers to units of time which are compatible with the dispUnits parameter from CmfsCreate. For example, if the sequence consists of 2 1/3 seconds of video and the dispUnits parameter previously had been set to 30, then the units parameter would be 70. The sizes parameter is an array containing the size in bytes of each display unit that is being written. The server requires this information so that it can properly select the data blocks which are required for retrieval during a particular time interval.

NOTE: It is expected that software at the application will be provided to convert a stream (possibly encoded) into sequences that would be stored by the CMFS. This would be different for each media type and would provide sequences with appropriate characteristics for storage.

CmfsStatus CmfsComplete( u_long cid )

This call indicates that the object has been completely written to the CMFS and the connection is closed.
A.1.2 Moving and Deleting

CmfsStatus CmfsRemove( UOI uoi )

This call allows an application to remove a presentation object from the server.

CmfsStatus CmfsMigrate( UOI uoi )

The details of this call and its functionality are provided in [46].

CmfsStatus CmfsReplicate( UOI uoi )

The details of this call and its functionality are provided in [46].

A.2 Stream Delivery and Connection Management

Each interface call returns a status code. The interface is as follows:

CmfsStatus CmfsInit( u_long ipAddr, u_long port )

This procedure takes the address and port number of the administrator as arguments and initializes any client-wide data structures.

The client initially establishes contact with the CMFS via the CmfsOpen procedure. The parameters passed by the client to CmfsOpen include the UOI for the object.

CmfsStatus CmfsOpen( UOI uoi, int (*callBack)(SD *sd), RttTimeValue *prepareBound, u_long *cid, u_int ipAddr, u_int port )

typedef struct StreamDescriptor {
    u_long init;        /* initial buffer reservation request */
    u_long avgBitRate;
    u_long avgPeriod;   /* time over which avg bit rate is calculated */
    u_long maxBitRate;
} SD;
A stream de-scriptor is passed to callBack which is used to convey quality of service parameters for this stream. The fields are computed at the server and/or client as appropriate. The first value, init, is the amount of buffer space that must be allocated for the connection to be able to support a prepare request. Th is is server-determined, based on the network latency and the maximum bandwidth of the stream. The major task of callBack is to ensure that the client has the required resources to accommodate any subsequent CmfsPrepare request and that it informs the server of the amount of buffer space that it is wil l ing to allocate to this connection. In order to refuse the entire connection, callBack should return the value C L I E N T R E F U S E D . O t h -erwise, S T R E A M O K should be returned and the contents of the stream descriptor structure are returned to CmfsOpen. The value in init is used as the amount of buffer reservation in the client. Th is value is (possibly) modified and passed back to the server so the server can perform send ahead. A client application needs to be aware of the possible configurations in which this connection could be prepared in the future, as these impact the amount of buffer space that must be allocated. The new value of init must consider the buffer space needed in fast motion mode as more data is transmitted per time unit. A s well, if the client allows for delay in ini t ia l reading of the stream (via delayTime in CmfsPrepare), the buffer space needed for that amount of time must be included. CmfsOpen takes care of cleaning 223 up the connection details at the server, in case of a failure of any kind. The param-eter prepareBound is returned from the server to the client. It specifies the upper bound on the amount of time that a call to CmfsPrepare may take. 
This allows a client to determine if multiple presentation objects can be opened in sufficient synchronization wi th each other for an effective presentation to the user. The parameters ipAddr and port identify the machine and portnumber (es-sentially, a process on the client machine which is to listen for continuous media data to be sent from the server) to which a real-time data connection is to be estab-lished. This allows a different network interface to be used for the control and the data connections. The server initiates the establishment of this connection, which is unreliable in both directions. The following two calls are provided for the convenience of client applications wishing to have separate processes (perhaps on separate processors), perform the control functions and the real-time data transfer operations. CmfsStatus CmfsProxyOpen(UOI uoi.RttTimeValue *prepareBound, u_long * c i d , u_int ipAddr, u_int port, RttThreadld c l i e n t P i d ) ; This call performs all the work of CmfsOpen except that which deals with the setting up of the real-time connection at the client. Note that the actual thread identifier (clientPid) of the real-time client must be communicated to this interface call in order for the data connection to be properly established. This must be accomplished by higher level software. CmfsStatus CmfsListen (u_long * c i d , int (*) (SD *callBack) , u_int iPaddr, u_int Port); CmfsListen establishes a transport-level connection for the stream that was opened by CmfsProxyOpen. A l l the parameters have the same semantics as de-scribed in CmfsOpen. 224 The client may, terminate this connection at any time: by issuing the following close request. CmfsStatus CmfsClose( u_long c i d ) CmfsClose takes a single parameter - the connection id (cid) - and closes the session. If the stream is not in the stopped, state, the C M F S stops the connection data transfer before performing the close. 
A.2.1 Stream Contro l Once a connection has been opened, the client must request that the server prepare the stream for data transfer. Th is involves determining if the server has sufficient resources available to display the portion of the-stream requested at the moment. The request to provide real-time : delivery of the media is made with CmfsPrepare. CmfsStatus CmfsPrepare( u_long c i d , pos startPos, pos stopPos, u_int speed, u_int skip, RttTimeValue delayTime ) This routine returns a status code indicating success if the server can deliver the data requested by the client with the required quality of service specified by the speed and skip parameters. It also specifies the maximum amount of t ime (delayTime ) that a client can delay reading data from the connection without any (implicit) adverse action being taken by the C M F S . Bandwid th is reserved at the server and data is delivered across the network into client buffers for impending calls to C'mfsRead. If the delay in the ini t ia l call to CmfsRead exceeds delayTime seconds, the connection wil l be terminated. Th i s may happen at the application or the network level of the server and calls to CmfsRead wil l indicate this data loss. The server continues to send data at the prescribed rate, and if the client does not read quickly enough, additional data may be dropped. The startPos and stopPos parameters are of the opaque datatype pos, and are interpreted as offsets into the stream. If startPos is greater than stopPos, the 225 display of the video is in rewind mode. The values of these position parameters correspond to places where the data transfer can start (i.e. sequence boundaries c f . CmfsWrite). Otherwise, CmfsPrepare wil l fai l . To display an entire object, the constants bf S T A R T O F S T R E A M and ENDOFSTREAM are provided. The parameter speed indicates at what rate the client wishes to retrieve the stream in percentage of normal speed. 
For example, a value of 100 indicates normal speed (i.e. the same as the recording speed), whereas 50 would indicate slow retrieval at half the normal display rate and 200 would be fast retrieval at twice the recorded rate. Because this may affect the network bandwidth required, some parameters of the network connection may be altered as a result. A client requesting data to be delivered at a speed of 50 while displaying at normal speed (i.e. 100) will most certainly starve.

The parameter skip tells the CMFS how many sequences to skip when retrieving the data from the disk. This would allow for an efficient implementation of fast-motion display. A value of 0 means that no data is to be skipped. A value of 1 means that one sequence is to be skipped for every sequence sent. A value of 2 indicates 2 skipped sequences for every sequence sent.

Before CmfsPrepare returns control to the client application, sufficient data is sent over the network connection previously established via CmfsOpen so that CmfsRead operations (see below) will not be delayed. The amount of data that is initially sent is defined during CmfsOpen.

The parameter delayTime is given as an RttTimeValue and is the maximum amount of delay the client can cause by postponing the initial call to CmfsRead. This is necessary because per-stream buffer memory is reserved at the server during CmfsPrepare. If these buffers accumulate beyond a threshold value during playback, some action must be taken at the server. If that read is not issued within delayTime of the return of CmfsPrepare, then the connection should be terminated.

Once the presentation object has been "readied" via the CmfsOpen and CmfsPrepare requests, the client can issue the CmfsRead request to obtain the stream data.

CmfsStatus CmfsRead( u_long cid, void **buffer, int *length )

This procedure returns a pointer to the data read from the connection.
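A typical client consumption loop over this call can be sketched as follows. The transport is stubbed here to deliver a fixed number of blocks; the stub_ names, the block size, and the status values are assumptions, not the CMFS library itself.

```c
#include <stdlib.h>

typedef int CmfsStatus;
#define CMFS_OK     0
#define END_OF_DATA 1

static int remaining = 3;   /* stub transport: three blocks queued */

/* Stub of CmfsRead: hands back one 4096-byte block per call until the
   queue is drained, then reports END_OF_DATA. */
CmfsStatus stub_CmfsRead(unsigned long cid, void **buffer, int *length) {
    (void)cid;
    if (remaining == 0) return END_OF_DATA;
    remaining--;
    *length = 4096;                     /* assumed block size */
    *buffer = malloc((size_t)*length);  /* freed by the client via CmfsFree */
    return CMFS_OK;
}

/* Stub of CmfsFree: releases a buffer obtained from stub_CmfsRead. */
CmfsStatus stub_CmfsFree(char *buffer) { free(buffer); return CMFS_OK; }

/* Typical client loop: read until END_OF_DATA, freeing every buffer. */
int drain_stream(unsigned long cid) {
    void *buf;
    int len, blocks = 0;
    while (stub_CmfsRead(cid, &buf, &len) == CMFS_OK) {
        /* ...hand buf/len to the decoder or display process here... */
        stub_CmfsFree((char *)buf);
        blocks++;
    }
    return blocks;   /* number of blocks consumed */
}
```

The pairing of every successful read with a matching free mirrors the buffer-ownership rule stated in this section.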
If the return status is ENDOFDATA, then there is no more data on the prepared stream. The caller of this procedure is responsible for freeing the returned buffer. This can be done via CmfsFree (see below). If data is lost by the network, a return value of DATALOST is given, and the length parameter is set to the length of the missing data.

The connection will be aborted by the server if the client does not issue a sufficient number of CmfsRead operations quickly enough (as given by the data rate values in the StreamDescriptor parameter in CmfsOpen). The rate of data reading must keep up with the number and sizes of the display units requested in the CmfsPrepare call. The client must also perform the reads of data so that the order and timeliness of the data make sense to the presentation application.

Because of the transport layer implementation of buffer allocations, the buffer that is returned in a call to CmfsRead must be freed in accordance with the allocation. This is accomplished by the call:

CmfsStatus CmfsFree( char *buffer )

so that the details of this mechanism are invisible to the client application.

The delivery of the continuous media stream to the client can be terminated at any time by the following call:

CmfsStatus CmfsStop( u_long cid )

Once control returns from CmfsStop, any subsequent calls to CmfsRead on that connection will return the status value ENDOFDATA.

A.3 Meta Data Management

The CMFS needs to store some of its own meta-information about the stream. The following interface allows attributes about a presentation to be stored and retrieved. There is a set of system-defined attributes for every object that the CMFS needs for its own internal operation. Additionally, client applications can define their own attributes. The system-defined attributes are accessible to the application, but as read-only.
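The behaviour of this attribute interface can be mimicked with a tiny in-memory table, shown below as a sketch. The stub_ names, the table capacity, and the integer return codes are assumptions; only the CmfsDatum shape (a length and a byte string) follows the interface described in this section.

```c
#include <string.h>

typedef struct {
    unsigned int length;
    unsigned char *key;
} CmfsDatum;

#define MAX_ATTRS 16
static CmfsDatum attr_keys[MAX_ATTRS], attr_vals[MAX_ATTRS];
static int nattrs = 0;

/* Two datums are equal when their byte strings match exactly. */
static int datum_eq(CmfsDatum a, CmfsDatum b) {
    return a.length == b.length && memcmp(a.key, b.key, a.length) == 0;
}

/* Stub of CmfsPutAttr: insert or overwrite an attribute-value pair. */
int stub_PutAttr(CmfsDatum attrKey, CmfsDatum value) {
    for (int i = 0; i < nattrs; i++)
        if (datum_eq(attr_keys[i], attrKey)) { attr_vals[i] = value; return 0; }
    if (nattrs == MAX_ATTRS) return -1;   /* table full */
    attr_keys[nattrs] = attrKey;
    attr_vals[nattrs] = value;
    nattrs++;
    return 0;
}

/* Stub of CmfsGetAttr: look up the value stored under attrKey. */
int stub_GetAttr(CmfsDatum attrKey, CmfsDatum *value) {
    for (int i = 0; i < nattrs; i++)
        if (datum_eq(attr_keys[i], attrKey)) { *value = attr_vals[i]; return 0; }
    return -1;   /* attribute not found */
}
```

Because keys and values are arbitrary byte strings, any interpretation of their contents is left entirely to the application, exactly as in the interface above.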
CmfsStatus CmfsPutAttr( UOI uoi, CmfsDatum attrKey, CmfsDatum value )

struct {
    u_int length;
    u_char *key;
} CmfsDatum;

This procedure inserts the value into the list of attribute-value pairs for the given object. Correspondingly, an interface to retrieve the value of the attributes is provided.

CmfsStatus CmfsGetAttr( UOI uoi, CmfsDatum attrKey, CmfsDatum *value )

Any application that desires more than one of these attributes in a given call must provide a wrapper function to do so.

CmfsStatus CmfsListAllUOIs( CmfsDatum *uoiValue )

This call allows a user to obtain a list of the objects which are stored at the particular administrator to which a client is connected. One attribute of every UOI is the list of all the attributes that have been stored for that UOI. This allows a client application to get a detailed listing of all information on an object. Because attribute keys and values are arbitrary bit strings, a client may or may not be able to intelligently decipher the meaning of these attributes.

A.4 Directory Service

A.5 Miscellaneous

A.5.1 Conversions and Stream Display Information

This call determines the position in the stream that corresponds to the given time value. This time value is the real time that has transpired since the first read operation took place. It will be close to indicating the exact amount of data that has been displayed by the client.

CmfsStatus CmfsTime2Pos( u_long cid, RttTimeValue time, pos *position, RttTimeValue *offset )

CmfsTime2Pos returns the position of the particular point in time of the stream identified by the cid. Note that it is impossible to make this call for an arbitrary uoi; it can only be used on an opened stream. The returned position value is the nearest (previous in time) valid starting position in the stream.
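As a rough sketch of this time-to-position correlation, the following hypothetical arithmetic maps elapsed display time to a stream position. The function name, the unit of position, and the linear model are all assumptions; the real mapping also depends on the actual sizes of the stored sequences.

```c
/* Hypothetical arithmetic for mapping elapsed display time back into a
   stream position under the speed and skip parameters of CmfsPrepare.
   One position unit is assumed to correspond to one second of recorded
   material at normal speed. */
long time_to_pos(long startPos, long elapsedSec, int speed, int skip) {
    /* speed is a percentage of the recorded rate; with a skip value of k,
       k + 1 recorded sequences pass for every sequence displayed. */
    return startPos + (elapsedSec * speed * (long)(skip + 1)) / 100;
}
```

Under this model, ten seconds of display at speed 100 with skip equal to 1 starting at position 12 would correspond to position 32, since two recorded seconds pass for every displayed second.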
The parameter offset indicates the difference in time between the actual position calculated and the one returned. The position which is returned is influenced by the parameters to the previous CmfsPrepare request. In particular, the start position, the speed, and the skip value determine how real-time display time correlates to movement in the stream itself. For example, if the stream display started at position 12, with speed equal to 100 and skip equal to 1, then the position returned would be 12 greater than the amount of time the client had been displaying due to not starting at the beginning, and it would also be greater due to the fact that every other sequence had been skipped. The effect of the skip value would depend on the actual sizes of the sequences which had been written. If there has been no CmfsPrepare request, this procedure assumes that the time referred to is the time from the beginning of the stream at speed = 100 and skip = 0.

Appendix B

Stream Scenarios

B.1 Stream Groupings

This section of the appendix shows which streams were grouped together for the experiments in Chapters 4 and 5. The first grouping is for the per-disk tests in Chapter 4 and is shown in Table B.1.

First Group | Low Variability | High Variability | Short Playback | Long Playback
bloop93 | aretha | football | trailers | leia
bengals | coaches | yes30 | cat in hat | raiders
chases | rescue | maproom | tomconnors | evac
rescue | Joe Greene | bloop93 | country music | football
si-intro | Ray Charles | laettner | elway | aretha
maproom | FBI Men | snowstorm | clinton | yes30
coaches | Plan-9 | twins | twins | coaches
boxing | Country Music | tenchi30E | trailers | maproom
aretha | Fires | dallas | dallas | football
Tom Connors | akira30E | Montreal Canadiens | leia | Clinton
John Elway | FBI Men | yes30 | |

Table B.1: Disk Admission Control Stream Groupings

The long streams had repeat selection of streams, although they were not stored on the disk more than once.
This is because there was not enough disk capacity to store a stream more than once. Since the experiment where this was used was investigating the effect of stagger and client buffer space, it is reasonable to use a stream more than once, since the two requests are offset in time.

The next three tables show the pseudo disk-configurations used for the network admission control tests. The first 143 scenarios used streams in the same relative position on each disk (i.e. streams 0, 3, 4, and 9), as shown in Table B.6. The last 50 scenarios were selected afterwards and were comprised of a different scenario from each disk, providing a different cumulative load to the network.

Disk 1 | Disk 2 | Disk 3 | Disk 4
nfldeception | evac | leia | deathstar
aretha | football | raiders | yes30
bloop93 | boxing | coaches | maproom
chases | rescue | spinal-tap | laettner
snowstorm | twins | Joe Greene | Ray Charles
country music | fires | FBI men | Plan-9
iw | Tom Connors | Annie Hall | John Major
Tenchi30E | Clinton | basketball | Akira30E
Dallas | Montreal Canadiens | trailers | John Elway
Mr. White | George of Jungle | Summit Series | Criswell
John Major | Bengals | CatInHat | Green Eggs

Table B.2: Network Admission Control Stream Groupings - MIXED

B.2 Scenario Selection

B.2.1 Algorithm Comparison

For the set of tests in Section 4.4, a selection of streams was made in a random fashion, selecting streams in the order of longest to shortest. Most of the scenarios were instantaneous arrivals. Table B.5 shows the scenarios for the first tests which compare the various algorithms (Sections 4.4.2, 4.4.3, 4.4.4, and 4.4.5).
Disk 1 | Disk 2 | Disk 3 | Disk 4
leia | Olympics | deathstar | aretha
due south | fender | aretha | moriarty
coaches | beatles | joe greene | rescue
eric | tomconnors | fender | rescue
beach boys | Ray Charles | Kinks | deathstar
FBI Men | Plan-9 | country music | fires
Annie Hall | John Major | Plan-9 | Tom Connors
Buddy Holly | Buddy Holly | Beatles | Clinton
Joe Greene | John Major | Tenchi30E | moriarty
moriarty | Tom Connors | Ray Charles | Leia
Kinks | coaches | John Major | Bengals

Table B.3: Network Admission Control Stream Groupings - LOW

Disk 1 | Disk 2 | Disk 3 | Disk 4
rivals | x-files | maproom | yes24
arrow | raiders | football | football
hilites | maproom | bloop93 | si-intro
pink floyd | laettner | snowstorm | pink floyd
ads | bloop93 | si-intro | twins
cars | twins | maproom | arrow
jays | dallas | iw | John Elway
basketball | akira30E | dallas | criswell
trailers | John Elway | chicken | jays
summit series | criswell | criswell | hilites
iw | baseball | laettner | baseball

Table B.4: Network Admission Control Stream Groupings - HIGH

[Table B.5: Stream Selection into Scenarios (First Tests). For each of scenarios 1-250, the table gives the stagger value (0, 5, or 10 seconds) and marks which of streams S0-S8 were selected: scenarios 1-50 each contain seven streams, 51-100 six, 101-150 five, 151-200 four, and 201-250 three.]

B.2.2 All Remaining Comparisons

For the remaining comparisons on the vbrSim algorithm, the following selections of streams into scenarios were performed. There are 143 scenarios, selected as described in Section 4.4.5. Only a small number of scenarios were used in some of the tests, such as those examining client buffer space effects and inter-request arrival times.
[Table B.6: Stream Selection into Scenarios (Remaining Tests). For each of scenarios 1-143, the table marks which of streams S0-S10 were selected: scenarios 1-11 each contain seven streams, 12-44 six, 45-77 five, 78-121 four, and 122-143 seven.]

For the network tests on stream variability, an additional 50 scenarios were created by combining individual disk scenarios which were accepted by each disk. The twenty largest disk requests accepted by each disk for the mixed-variability streams were selected and combined in various manners to get network scenarios which maximized the stress on the network admission control algorithm. This selection did not necessarily provide the most aggressive scenarios for the low-variability and high-variability streams, but it did give more data points in total. These are shown in Tables B.7 and B.8.
[Table B.7: Stream Selection into Additional Network Scenarios (streams S0-S21)]

[Table B.8: Stream Selection into Additional Network Scenarios (streams S22-S43)]

