Performance Issues in an Interactive Video-on-Demand Server

by

Maryam Najafian Razavi

B.Sc., Sharif University of Technology, Tehran, Iran, 1995

A thesis submitted in partial fulfilment of the requirements for the degree of Master of Applied Science in the Faculty of Graduate Studies (Department of Electrical and Computer Engineering)

We accept this thesis as conforming to the required standard

The University of British Columbia
November 2000

© Maryam Najafian Razavi, 2000

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Abstract

This research project is an experimental investigation of resource management techniques that can improve the performance of Interactive Video-On-Demand (IVOD) servers. Examples of IVOD can be found in digital libraries, hypermedia, education, entertainment and telemarketing applications. Interactive video programs have some particular characteristics which need to be considered when designing an IVOD server. They consist of a series of short video segments which are created by on-line user interactions and need to be presented to the user in a particular order. The IVOD server must thus try to keep the latency before start-up and between video segments to a minimum. Furthermore, user arrivals and departures are frequent in most IVOD applications, which causes the server to be in the transient state most of the time. Thus, the adverse effect of frequent transients on the performance of the server must also be addressed.

In this research, we identify IVOD performance issues and propose several resource management strategies to improve the performance of our IVOD server in various respects. An IVOD server is designed and implemented to aid in evaluating the suggested techniques.

Contents

Abstract
Contents
List of Tables
List of Figures

1 Introduction
1.1 Video-On-Demand Service
1.2 VOD System Model
1.3 Motivation
1.4 Contribution
1.5 Outline of the Thesis

2 Background and Related Work
2.1 Resource Management Issues in an IVOD System
2.1.1 Disk Scheduling Policies
2.1.2 Data Placement Schemes
2.1.3 Buffer Management Options
2.1.4 Admission Control Policies

3 System Design and Performance Issues
3.1 System Design
3.1.1 Server Resources
3.1.2 Server Processes
3.2 Performance Issues
3.2.1 Steady-state Stream Throughput
3.2.2 Response Time
3.2.3 Jitter
3.2.4 Performance in the Transient State

4 System Implementation
4.1 System Requirements
4.2 The Server Program
4.3 The Client Program

5 Network Interface Considerations for Improving Throughput
5.1 The Implications of Changing Socket Buffer Size
5.2 Increasing the Size of 'send buffer'
5.3 Experiment Design
5.4 Results and Discussion

6 Order of Service Considerations for Improving Jitter
6.1 Vertical Scheduling vs. Horizontal Scheduling
6.2 Experiment Design
6.3 Results and Discussion
6.3.1 In-request Delay
6.3.2 Between-request Delay
6.3.3 Response Time
6.3.4 System Throughput

7 Improving the Response Time under Transient State
7.1 The Effect of Frequent Transients on Response Time
7.2 Experiment Design
7.3 Results and Discussion

8 Conclusions and Future Work
8.1 Summary of Results
8.2 Suggestions for Future Work

References

List of Tables

6.1 Comparing the Variance in Response Time for the Round-robin, CSCAN, and Vertical Scheduling Methods
7.1 Comparing the Response Time for the Round-robin Policy: Using In-sequence Service and Out-of-sequence Initial Service
7.2 Comparing the In-request Delay for the Round-robin Policy: Using In-sequence Service and Out-of-sequence Initial Service
7.3 Comparing the Between-request Delay for the Round-robin Policy: Using In-sequence Service and Out-of-sequence Initial Service
7.4 Comparing the Response Time for the CSCAN Policy: Using In-sequence Service and Out-of-sequence Initial Service
7.5 Comparing the In-request Delay for the CSCAN Policy: Using In-sequence Service and Out-of-sequence Initial Service
7.6 Comparing the Between-request Delay for the CSCAN Policy: Using In-sequence Service and Out-of-sequence Initial Service
7.7 Comparing the Response Time for the Vertical Policy: Using In-sequence Service and Out-of-sequence Initial Service
7.8 Comparing the In-request Delay for the Vertical Policy: Using In-sequence Service and Out-of-sequence Initial Service
7.9 Comparing the Between-request Delay for the Vertical Policy: Using In-sequence Service and Out-of-sequence Initial Service

List of Figures

1.1 Video System Architecture
1.2 Logical Data Flow in an IVOD Server
4.1 The Structure of the Global Queue
5.1 The Data Path from the Server to the Client
5.2 Effect of Socket Send Buffer Size on System Throughput
6.1 How the Average Percentage of Delays Are Calculated
6.2 Effect of Order of Service on In-request Delay
6.3 Effect of Order of Service on Between-request Delay
6.4 Effect of Order of Service on Response Time
6.5 Effect of Order of Service on System Throughput

Chapter 1

Introduction

This chapter reviews the thesis research topic. It introduces the Video-On-Demand (VOD) system model and the problems associated with it. The motivation and contributions of the research are also discussed. Finally, an outline of the thesis is provided.

1.1 Video-On-Demand Service

Advances in computing and networking are generating a significant number of multimedia-enabled applications in computer systems. Online access to multimedia information has become both possible and cost-effective, and audio and video are becoming integrated into standard user interfaces. Among the different media of a multimedia system, audio and video are classified as continuous media (CM). This means they consist of a sequence of media quanta (e.g. audio samples or video frames) which do not convey meaning if not presented in the proper time intervals. Conventional file servers cannot guarantee the timely delivery of successive CM data samples and thus cannot guarantee to prevent annoying effects in presentation, such as audio pops or video hiccups. For this reason, a special class of file servers known as CM servers has emerged for serving CM data requests.
The CM server designed to provide video retrieval service is referred to as a Video-On-Demand server (VOD server). When requests for CM data are generated from the interactive activities of on-line clients, the VOD server is called an Interactive VOD server (IVOD server). The main responsibility of an IVOD server is to provide timely delivery of video data to its clients. To this end, the IVOD server must be equipped with:

• Proper resources, such as processor time, network bandwidth, memory buffers and storage devices.

• Efficient algorithms and policies for scheduling the server resources.

The IVOD server resources need to be collectively scheduled in order to provide timely delivery of CM data. Traditional resource scheduling techniques are not directly applicable, mainly because they are designed for only a single resource, or because they are not designed to meet real-time constraints. In this research, an IVOD server is designed and implemented to serve requests for video files from multiple clients over the network. The server serves the clients by extracting the requested video data for each client and transferring the data to the client via the network. The main objective is to optimize the performance of the IVOD server by applying resource management techniques designed particularly for an IVOD server.

1.2 VOD System Model

A typical VOD system mainly consists of various clients connecting to the server via a network, as shown in Figure 1.1. Each client generates requests for video data over time. In IVOD applications, each request consists of a series of video segments which must be presented to the client in the proper order. The network is used as a transmission medium for sending clients' requests to the server, as well as for sending video data back from the server to the clients in response to those requests. The clients thus need to be equipped with a buffer to be able to tolerate variations in network delays and data consumption rate. We do not exclude the possibility of having both a client and a server on the same physical workstation, as long as the cumulative resource requirements are satisfied. The VOD server must be equipped with a storage device to store the video files, a pool of memory buffers, and a CPU to execute the server processes.

A typical scenario in an IVOD system starts with clients trying to make a connection to the server. At the time of connection, each client goes through an admission test. The role of the admission test is to make sure that accepting a new client will not prevent the requirements of already-admitted clients from being satisfied. Once admitted, a session is established between the client and the server, through which all of the client's requests will be served. The client's requests then generate a continuous delivery of CM data, which is often referred to as a stream. The time from when a request is released until the first quanta are played back is called the start-up latency. The requests received from different clients are put in a queue, waiting to be served. At this point, a disk scheduler determines the order of service to the clients by changing the order of requests in the queue. Since the disk data rate is much higher than the individual data rates of clients' requests, the IVOD server can serve multiple clients by reading ahead for each client. Memory buffers are thus needed at the server to hold data read ahead for consumption by each client.
The requests are then served by transferring the data of the requested video file from the storage device to the client's associated buffer on the server. The final step is the transmission of data from the clients' buffers to the client programs via the network. The server is said to complete a round of service by giving one service to refill the buffers of each stream. A logical flow of the CM data in the server is shown in Figure 1.2.

The time between two consecutive reads for a client is called the service period, and the amount of data read for each client in a round is called the read-size. The condition necessary to guarantee real-time retrieval is that the data buffered for a stream must be sufficient to sustain consumption until the next time the stream receives service. This prevents buffer under-run, or starvation. Most known multimedia disk scheduling strategies give each stream one service in each service round. A widely accepted sufficient condition for non-starvation is that the designated amount of data read for each stream is never less than that consumed by the stream in the service period. This is often referred to as the buffer conserving [1], or work-ahead augmenting [2], requirement: there is never a net decrease in the amount of samples buffered for each stream in any service round. However, uncontrolled accumulation of read-ahead data will result in buffer over-run. Disk accesses are normally non-preemptive. To ensure that buffer over-run does not occur, the size of each disk read must not exceed the buffer vacancy level. If the read-size can be varied, the server can simply truncate the read to match the buffer vacancy level. On the other hand, if the read-size is fixed, a read may need to be postponed until the buffer vacancy level matches the read-size. Whether a read can be varied depends very much on the data placement scheme and buffer model chosen. We shall discuss these issues further in later sections.
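Stated formally (our notation; the thesis gives these two constraints in prose only): if stream i consumes data at rate r_i, has buffer capacity B_i with current occupancy Q_i, and receives a read of size s_i in a round with service period T, then

    % buffer-conserving / work-ahead augmenting requirement (non-starvation):
    s_i \;\ge\; r_i \, T

    % over-run avoidance for a non-preemptive read
    % (truncate the read, or postpone it, if this cannot be met):
    s_i \;\le\; B_i - Q_i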
In this research, we aim to elaborate on this matter.

1.3 Motivation

User-driven interactivity imposes some constraints on the performance of the IVOD server, and eases some others. Video files used in IVOD applications are usually short in length (compared to movie-on-demand) and have a fairly low play-out rate. A request usually consists of a series of video segments presented in a particular order. Requests for video data are generated by the interactions of on-line clients. On-line composition of video files might also be needed. In contrast to the delivery of movies, interactive video programs are seldom linear. They usually consist of short video branches separated by user interactions. IVOD users will not be satisfied with long latency at start-up or between the play-out of video branches. Therefore, the resource management strategies for an IVOD server must address performance issues such as start-up latency, the delay between video branches, and data throughput. The adverse effect of frequent arrivals and departures of clients on start-up latency and robustness must also be considered.

Most of the work in the area has focused on the theoretical aspects of VOD servers. Less attention has been paid to the practical issues of performance. Also, the particular performance issues that arise in hypermedia applications, where a request consists of a series of short, connected video segments, have not been fully addressed before.

1.4 Contribution

In this thesis, we identify IVOD performance issues and exploit the particular characteristics of IVOD to present resource management techniques for improving the performance of IVOD servers. An experimental IVOD server has been designed and implemented under the Linux operating system using the C programming language, and has been used as the test-bed for our experiments and evaluations. At the networking level, we have investigated the effect of socket send buffer size on the throughput of the server and how to change this parameter to achieve the optimum throughput. At the scheduling level, we have proposed a new approach to minimize the delay experienced by the clients during playback of the video segments contained in a request. To improve the performance of the system in the case where clients' arrivals and departures are frequent, we have proposed an out-of-sequence initial service strategy to minimize the variance in the response time of clients without sacrificing the quality of service of the server.

All the proposed techniques are verified by comprehensive performance evaluations on our implemented system. The results confirm that the proposed strategies can improve IVOD service in multiple dimensions of performance.

1.5 Outline of the Thesis

Following the introduction of the thesis topic in the first chapter, Chapter 2 explains the issues specific to IVOD service. It discusses the different options available from various perspectives when designing an IVOD server and explains the advantages and drawbacks of each one. The existing work in each area is then reviewed. Chapter 3 covers the design of our system in detail. We discuss the design decisions made regarding each aspect and the reasons why they have been chosen. We also discuss the performance issues in IVOD service in various dimensions and introduce some performance metrics to be used in later chapters. In Chapter 4, some important points regarding the implementation of the system are presented. We go briefly over the implementation of the various parts of our system and explain the challenges and trade-offs involved. Chapters 5 through 7 cover our proposed techniques for improving the performance of the IVOD server. The experiments designed to evaluate each technique are presented in detail, along with the results of our evaluations. Finally, the results of the research in this thesis and possible directions for future work are summarized in Chapter 8.

Chapter 2

Background and Related Work

In this chapter, the issues of concern in an IVOD server are discussed. We investigate the alternative options available in each dimension. Throughout our discussion, we provide an overview of some of the existing approaches in the literature regarding each issue and discuss their performance trade-offs.

2.1 Resource Management Issues in an IVOD System

In an IVOD system, the disk scheduling policy dictates the order of disk accesses and the sequence of service to clients' requests. The buffer model depicts how the server's memory is shared and synchronized to support the concurrent consumption and production events. The data placement scheme determines how the video data is stored on the storage device. Scheduling of the server's CPU, although a less demanding issue, is also necessary to invoke the execution of various server functions at the proper time.
The way chosen to schedule and manage a particular kind of resource in an IVOD server often imposes constraints on the way other resources are used. Therefore, resource constraints must be considered along with time constraints when designing an IVOD server. In this section, we discuss the alternative options available regarding each resource.

2.1.1 Disk Scheduling Policies

A disk scheduling policy is a policy for deciding the order in which several disk access requests are served. Servers usually use disk scheduling policies to achieve the following goals [3]:

• Reduce seek time and rotational latency
• Achieve high throughput
• Provide fair access to each stream
• Avoid starvation for each stream

We now discuss some of the disk scheduling policies most widely used in the literature.

2.1.1.1 FCFS (First-Come-First-Served)

In this method, the requests in each round are served in the order of their arrival. This is a simple, traditional approach to disk scheduling. The drawback of this policy is that it cannot guarantee fair access to the disk for each stream (e.g. a stream might not receive service during a round), and thus starvation may happen for one or more streams.

2.1.1.2 Round-robin

In this method, the requests are served in a fixed order that does not vary from one round to the next. Many of the earlier VOD schedulers adopt a round-robin algorithm as their scheduling policy for sequencing the order of service to streams [4] [5] [6] [7] [8]. This approach is easy to implement and deterministic, and can easily support heterogeneous play-out rates. However, a round-robin scheduler can only produce a moderate stream throughput because it does not minimize disk overheads. Further, its static behavior does not allow it to take advantage of run-time information to improve the efficiency of disk accesses or to tolerate fluctuations in play-out rates.

2.1.1.3 SCAN

In this method, the disk head is scanned back and forth across the disk's surface (from the outermost to the innermost track or vice versa) and requested blocks are retrieved in the order of the scan direction. That is, requests are served on both the inward and outward sweeps of the disk head over the disk surface.

Seek-reducing approaches such as SCAN have been adopted primarily to minimize disk seek overheads. However, the reversal of the service sequence by SCAN in alternating service rounds increases the worst-case inter-service time (the time between two consecutive services to one stream) to span two service periods, forcing the streams to have buffers holding enough data to last for two service rounds. Therefore, while the service period of the SCAN policy is shorter than that of round-robin, the buffer requirement of SCAN is not necessarily lower. In general, SCAN-based seek-reducing approaches reduce the length of the service period at the expense of an increased buffer requirement to tolerate the large variation in inter-service time [2] [9]. Another point is that the start-up latency of SCAN is likely to be longer than with the round-robin approach, since the starting of new streams can be delayed by up to two service periods.

An alternative to the SCAN scheduling policy, CSCAN, serves the requests in circular rounds. That is, streams are not serviced on the return trip of head travel. Because each round includes a return sweep during which no streams are serviced, the total seek overhead of CSCAN is larger than that of SCAN. On the other hand, with the tighter bound on the inter-service times of CSCAN, streams only need buffers to cover consumption for slightly more than one service period. Therefore, a CSCAN scheduler can be more efficient in buffer usage than a SCAN one [10].
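The following C fragment is a sketch of our own (not code from the thesis) contrasting the two service orderings just discussed; the request structure and its fields are hypothetical. Round-robin ignores disk geometry entirely, while CSCAN sorts one round's queue by track and always sweeps in the same direction:

    #include <stdlib.h>

    /* Hypothetical pending request: one per stream in this round. */
    struct request {
        int stream_id;
        int track;      /* starting track of the stream's next read */
    };

    /* Round-robin: a fixed order per stream, regardless of disk position. */
    static int by_stream(const void *a, const void *b)
    {
        return ((const struct request *)a)->stream_id -
               ((const struct request *)b)->stream_id;
    }

    /* CSCAN: ascending track order; after the last request the head sweeps
     * back to the start without servicing anyone, which is the extra seek
     * that makes CSCAN's total seek overhead larger than SCAN's. */
    static int by_track(const void *a, const void *b)
    {
        return ((const struct request *)a)->track -
               ((const struct request *)b)->track;
    }

    void order_round(struct request *reqs, size_t n, int use_cscan)
    {
        qsort(reqs, n, sizeof *reqs, use_cscan ? by_track : by_stream);
        /* The caller issues reads in array order; under CSCAN every round
         * visits tracks in the same direction, so the inter-service time
         * is bounded by roughly one service period. */
    }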
2.1.1.4 Advanced Seek-Reducing Schemes

To alleviate the problem of large variations in the inter-service time of SCAN, advanced seek-reducing schemes have been proposed in the Group Sweeping scheme [2] [11] and the Pre-Seeking Sweep scheme [4]. In both of these schemes, streams are divided into G groups, within each of which the streams are serviced in SCAN order. In each round of service, the relative service ordering among successive groups does not change. The amount of data read need only sustain (1 + 1/G) of the round period, compared to the two round periods needed by SCAN. The drawback of these methods is that the total seek time is G times that of the generic SCAN method. The variation in the inter-service time of streams is smaller when there are more groups, so the buffer requirement can be decreased; however, the cumulative seek overheads increase with the number of groups. Thus, finding the optimum number of groups becomes a trade-off between buffer requirement and seek overhead. Another major point is how to assign streams to each group. Although dynamic redistribution of streams at run-time is possible, the suggested procedures are quite complex. Further, as streams are completed or admitted, regrouping and migration of streams among groups are recommended. The overhead involved in the needed run-time computation can outweigh the benefits of the optimization and can also delay the starting of new streams.

2.1.1.5 EDF (Earliest Deadline First)

EDF is a deadline-driven approach which has also been proposed for the real-time retrieval of CM data in the literature. In this method, each request is assigned a deadline, and the request with the earliest deadline is scheduled for the next retrieval. A major point here is how to assign deadlines to requests. The approach mostly used in the literature is to set the deadline to the request release time plus a multiple (m) of the request period (normally, m = 1). It has been proven that the EDF algorithm is optimal if requests' release times are known in advance [14]. However, because in practice this is not the case, assigning deadlines to requests becomes a major issue. Because EDF does not exploit the relative position of the disk head, it may incur excessive seek and rotational latency and poor disk head utilization. It also assumes that disks are preemptable with zero preemption costs; however, current disks are not preemptable [14]. Another point regarding the EDF policy is that if the order of deadlines does not change from round to round, then EDF operates the same as round-robin.

2.1.1.6 SCAN-EDF

This is another deadline-driven approach, which is a combination of SCAN and EDF [13] [14]. It serves the requests in each round in EDF order, but when several requests have the same deadline, they are served according to their track location on the disk (SCAN order). Obviously, the efficiency of this method depends on how often streams have the same deadlines. If streams have constant play-out rates and read sizes are fixed, the deadlines of the streams will be aligned with their release times.
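As an illustration (again our own sketch with hypothetical names, not the thesis's code), SCAN-EDF ordering reduces to a two-key sort, with the deadline as the primary key and the track position as the tie-breaker among equal deadlines:

    #include <stdlib.h>

    /* Hypothetical pending read: deadline plus disk track. */
    struct pending {
        long deadline_ms;   /* EDF primary key */
        int  track;         /* SCAN tie-breaker for equal deadlines */
    };

    static int scan_edf_cmp(const void *a, const void *b)
    {
        const struct pending *x = a, *y = b;
        if (x->deadline_ms != y->deadline_ms)       /* EDF order first */
            return x->deadline_ms < y->deadline_ms ? -1 : 1;
        return x->track - y->track;                 /* then SCAN order */
    }

    /* Sort the round's queue; reads are then issued in array order. */
    void order_scan_edf(struct pending *q, size_t n)
    {
        qsort(q, n, sizeof *q, scan_edf_cmp);
    }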
However, with variable play-out rates or different read sizes, it becomes difficult to keep the deadlines of the streams aligned. Extra buffers might be needed in order to keep the read sizes constant and the stream deadlines aligned. This approach has been claimed to achieve the same throughput as CSCAN, but at the expense of a higher buffer requirement and a longer start-up latency [14].

2.1.2 Data Placement Schemes

Storage of multimedia objects is mainly concerned with the optimal placement of media blocks on the storage device (generally a disk), in order to make the synchronous playback of multiple streams more efficient. Before discussing the different data storage strategies, we should describe some terminology used in this area [4]:

• A data placement strategy is called contiguous if each CM file is stored in a physically contiguous area on disk.

• A data placement strategy is called scattered if data samples are split up into large, fixed-size chunks called blocks, and placed in various locations on the disk; the blocks comprising a CM file can thus reside anywhere on disk.

• A data placement strategy is called interleaved if all samples of data from different sources that have been read at the same time (or, alternatively, are anticipated to be retrieved concurrently) are merged together, making a large block to be placed on the disk [4].

Unless reads are restricted to fall within a single cylinder, a read as small as two sectors can span cylinder boundaries and incur seek and rotational penalties. The disk overhead incurred within a read is referred to as intra-stream overhead, as opposed to inter-stream overhead, which is incurred in switching service between streams. We now discuss the strengths and drawbacks associated with each scheme.

2.1.2.1 Contiguous Scheme

The main advantage of this approach is that it is easy to implement and does not have the overhead of keeping extra structures (such as indexes) to track the location of different blocks of data on disk.

The disadvantage of this approach is that under the contiguous placement scheme, frequent file updates may cause the remaining disk space to become fragmented. Although in data processing systems file updates are very frequent, most CM data accesses generated by VOD applications are read-only in nature. In addition, real-time constraints usually need not apply to write activities, if any. Therefore, compaction can be done as a background activity to provide storage space efficiency. Further, the problem of fragmentation is less crucial if the server is bandwidth-bounded rather than space-bounded.

2.1.2.2 Scattered Scheme

The main advantage of this scheme is that the storage space fragmentation problem of contiguous placement can be avoided. To safeguard disk efficiency and avoid seeks within a transfer, the transfer size must be large and fixed. Each disk access must be an integral multiple of these large-sized blocks. This imposes a strong restriction on the variability of the read-size. Further, how to pre-determine the block size to support a wide range of unknown play-out rates is an issue. The main drawback of the scattered placement scheme is that it disallows the scheduler from fine-tuning the amount of data in each transfer. There is also the overhead of keeping indexes to track the location of the various blocks of data on the disk.

2.1.2.3 Interleaved Scheme

The advantage of this scheme is that it eliminates inter-stream switching overhead.
The interleaved scheme can provide the highest CM data bandwidth and storage efficiency [15] [16] [17]. It is especially valuable for read-only optical disks because of their large access overhead.

The disadvantage of this scheme is that the on-line merging of a new CM file into the empty gaps interleaved with existing CM files can become an issue, especially when the system needs to support CM files with different play-out rates. Besides, synchronization information must exist in advance in order to do the interleaving, and it cannot change without rewriting the data. As the retrieval schedule is also pre-determined by the placement, the rate of data transfer is also fixed. The issue of buffer over-run can arise when the server needs to support variable-bit-rate streams or pausing operations. Data replication might also be needed for video data that are going to be played more than once.

Considering the above definitions, data placement schemes can fall into one of the following categories: contiguous noninterleaved, contiguous interleaved, scattered noninterleaved, and scattered interleaved. Other data placement schemes used in the literature include constrained placement and log-structured placement.

Constrained placement is a version of the scattered method which enforces an upper bound on the separation between successive blocks of data. This bound is generally not enforced for each pair of successive blocks but on average over a finite sequence of blocks. This method is usually useful when block sizes are small (e.g. block sizes tailored for text). But, to be effective, the method should retrieve all the data blocks for one stream before switching to another one. So, it is not suitable for use with disk scheduling algorithms such as SCAN, which order the data blocks regardless of the stream they belong to.

In log-structured placement, modified blocks of data are not stored in their original position. Instead, all the modified blocks of all streams are written contiguously in a large free space. It is thus a good approach for applications which involve a large amount of editing. This approach leads to better performance during recording, because only one seek is done for a batch of writes. Obviously, it does not make any improvement in playback performance, so it is not a good choice for read-only applications (such as Video-On-Demand).

Despite all the above issues, contiguous placement is believed to be the most efficient placement method for utilizing disk bandwidth [8] [9] [18] [19], simply because long seeks will not be incurred within a transfer regardless of the size of the transfer. With the flexibility of supporting a variable read-size, the efficiency of disk accesses can be dynamically optimized under the contiguous placement scheme.

2.1.3 Buffer Management Options

In every VOD system, one important challenge for the server is to prevent each client from starvation. Although it is conceivable that the server might transfer data directly from the storage device to the clients' buffers just in time to be played, in practice retrieval is known to be bursty [3]. This brings up the need for a buffer per client at the server, so that the data can be transferred from the disk to the buffers
before being transferred to the client via the network at the proper time. A significant fraction of the hardware investment in a VOD server is spent on its memory buffers. Efficient use of buffers can thus reduce the cost of a VOD server. The issue of buffer management should be considered from two perspectives: buffer size and buffer model.

2.1.3.1 Buffer Size

In order to prevent each client from starvation, the amount of data retrieved for that client within any time interval must be proportional to the client's consumption rate. Obviously, if we prefetch all the data for a client before starting its playback, starvation will never happen for that client, but this brings up the need for excessive buffer space and also a long start-up latency. So, the main point here is to put a lower bound on the amount of buffer space needed to prevent all the clients from starvation. It has been shown in [4] that minimizing buffer space also minimizes the start-up latency.

There is a trade-off associated with buffer space allocation. Generally, it is expected that a real-time request is served before the next request is released. With larger buffer space, the server can handle larger request sizes, so the requests' periods can be longer. On the other hand, if the available buffer space is small, the request streams can ask for smaller pieces of data during each round, with correspondingly smaller periods between requests. This trade-off is usually important because disk service efficiency depends on the request period. When requests' periods are longer, there is extra time available for serving a request. Hence, the system can apply seek-optimization techniques more frequently to the request queue, so that the disk arm can work more efficiently.

Another issue of concern is that when a server is designed to support only streams with homogeneous play-out rates, the system's buffer can be pre-divided into equal constant shares, each allocated to an admitted stream. On the other hand, when the heterogeneous play-out rates of streams are only known at run-time, buffers need to be managed dynamically. Considering the above, we can now discuss the policies used to decide on buffer size.

Minimum Buffer Allocation Scheme: In this scheme, each stream is allocated the minimum amount of buffers prescribed by the admission test, while the remainder of the system's buffers are left idle. When streams depart, the excess buffers are retrieved by the system. While this scheme reserves vacant buffers for immediate allocation to new streams, the utilization of buffers is low.

Exhaustive Buffer Allocation Scheme: In this scheme, introduced in [1], all available buffers are allocated to admitted streams. While this scheme improves the utilization of buffers, a global redistribution of buffers needs to be performed upon
the departure and admission of streams. A potential issue is that the retrieval of sufficient vacant buffers for new streams may not be immediately possible unless the data in some buffers can be discarded. However, discarding buffered data can cause discontinuity if reading of the succeeding data has already started.

2.1.3.2 Buffer Model

The two most used buffer models in the literature are the FIFO and the double-buffer model. We now discuss these models and where each is most suitable.

FIFO (First-In-First-Out): Buffers allocated to a stream can be modeled as non-overlapping sets or as one single recirculating FIFO. This model is suitable for use with contiguous placement, where the size of each disk read can be varied. In this case the buffers of each stream can be topped up in each service [3], so we do not need to give each stream more buffers than necessary to cover its consumption for the worst-case inter-service time. Further, if the sequence of service is not reversed in successive rounds (as it is under the SCAN scheduling policy), the worst-case inter-service time is only slightly longer than the service period.

Double-Buffer: In this model, buffers allocated to a stream are partitioned into two distinct equal-sized sets. While the consumption process is emptying one set, the other set can be filled by the production process. By such mutually exclusive use of the two buffer sets, implicit synchronization of the production and consumption processes is enforced. The buffers are swapped when the consumer's set is completely emptied. To guarantee non-starvation, the producer must complete the filling of its buffer set before each swap. However, the producer must also wait for the depletion of data in the consumer's buffer set before each swap. If, for some reason, one of the consumption processes slows down or pauses, the service period can be lengthened and on-time service to other streams is put at risk. It should be noted that if either the scattered or the interleaved placement scheme is used, double or multiple buffer sets are necessary, because the size of each disk read must be an integral multiple of the block size. With fewer buffers, mutually exclusive accesses to buffers can still be enforced as long as the size of a disk read can be truncated to the buffer vacancy level. The obvious problem with the double-buffer model is that the use of double or multiple buffer sets significantly increases the total buffer requirement.

2.1.4 Admission Control Policies

As in any multimedia system, if clients are admitted without any control, there might be a point where the system load gets so large that the server cannot satisfy the time constraints for some clients. An admission control policy is thus needed to impose some limitations on the process of accepting new clients. The factors used to determine the admission policy are usually based on the system's limitations. They may also depend on the requirements of the new client, as different clients might have different requirements. The steady-state stream throughput of an IVOD server is essentially governed by the admission control policy used. Without an appropriate admission control policy, the potentially high throughput of a good disk scheduler can be underestimated and not utilized to the fullest extent.

Usually, serving a real-time client is equivalent to guaranteeing that all the real-time deadlines for that client are met. However, in some cases clients can tolerate missing a few deadlines during any specific period of time. In such cases, the client is provided with a lower-quality service. So, in general, the admission control policy depends on the quality of service needed for new and current clients. A server's quality of service can be in one of the following categories [3]:

Deterministic: For this level of service, all real-time deadlines must be guaranteed to be met. In this case, the admission control policy should be based on worst-case scenarios for all resources (e.g. buffer utilization, disk utilization, and so on).

Statistical: For this level of service, real-time deadlines are guaranteed to be met with a certain probability. In this case, the admission control policy should be based on the system's statistical behavior.
Background: For this level of service, there is no guarantee of meeting real-time deadlines. In this case, the admission control policy can accept new clients anyway, because they will be served only when there is spare time after serving the deterministic and statistical services.

Video-On-Demand applications usually need deterministic service, although in some cases it is possible to provide users with statistical service at a lower cost to compensate for the lower quality of service. Another concern is that during each round the deterministic clients must be guaranteed service before any statistical service, and the statistical clients must be guaranteed service before any background service. In addition, missed deadlines should be distributed fairly, so that the same client does not miss its deadline every time.

The admission policies of many existing approaches [1] [2] [4] [5] [7] [15] [17] [20] use worst-case estimates of disk parameters and are therefore unnecessarily over-constraining. In addition, some admission policies [5] [6] [7] [10] [20] assume a double-buffer model in the system. Although the use of double or multiple buffer sets is sometimes necessary to support the placement or scheduling techniques used, the implicit increase in buffer requirement effectively tightens admission control. In some approaches, statistical admission control policies have been proposed to exploit a server's underlying service capabilities [2] [4] [8]. However, some degradation in the quality of service is usually expected and needs to be tolerated.

Chapter 3

System Design and Performance Issues

In this chapter, we describe the overall design of our system and how it fits into the general VOD model. The aspects selected for our system in each dimension are discussed, along with the reasons why they have been chosen. Later, issues pertaining to the different performance aspects of an IVOD server are discussed. Along the way, some performance metrics are established for use in later chapters.

3.1 System Design

The basic model used in the design of the system is the client-server model. Clients send their requests for video data to the server via the network. Each single request consists of several video segments to be displayed in a particular order. The requested video data is extracted from the storage device and put in the client's buffer, to be sent to the client later via the network. The elements responsible for performing this scenario can be described from two perspectives: server resources and server processes. In the following, we describe each in detail.

3.1.1 Server Resources

The resources required on an IVOD server include a storage device, memory buffers and a CPU.

3.1.1.1 Storage Device

As mentioned earlier, contiguous placement is believed to be the most efficient data storage scheme, because no disk seek will occur within a read regardless of the read-size. Furthermore, this scheme prevents disk fragmentation. That is why we have decided to use this scheme as our storage scheme. The Linux system currently uses the ext2 file system, which reduces disk seek time by having all the free blocks in the disk space pointed to by the inode structure; however, files are not guaranteed to be saved on contiguous blocks of the disk. In our system, a separate partition of the Linux file system has therefore been assigned as the storage device. To support the contiguous placement of video data on the disk, a disk driver module has been developed to manage the storage and retrieval of the video files on this device. All physical read and write actions from/to the disk are done only through the interface routines provided by the disk driver.

A file manager module supports the storage process. All the video files to be saved must be put in a particular directory. The file manager then opens the directory and writes all the files' data contiguously on the device, one after another. This scheme maximizes disk utilization and also helps to minimize disk seeks at the time of retrieval. A directory of file names, together with each file's size and start address on the disk, is also built during the storage process. This directory is called the system directory, and it is used later during data retrieval (one plausible layout is sketched below). It must be mentioned that the file manager process is supposed to run prior to starting the video server. Once started, the server expects all the video files and the directory to exist.
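As a sketch of one plausible shape for a system-directory entry and the contiguous storage step (our illustration, not the thesis's code; write_blocks stands in for a disk driver interface routine, and all identifiers and the name-length limit are hypothetical):

    #include <string.h>

    #define MAX_NAME 64          /* hypothetical file-name length limit */

    /* One system-directory entry, recorded as the file manager lays the
     * files out contiguously on the raw partition. */
    struct dir_entry {
        char name[MAX_NAME];
        long start_block;        /* first block of the file on the device */
        long size;               /* length in bytes */
    };

    /* Stand-in for the disk driver's write routine (hypothetical). */
    extern void write_blocks(long start_block, const char *data, long size);

    /* Write one file immediately after the previous one and record where
     * it landed; returning the next free block keeps the following file
     * contiguous with this one. */
    long place_file(struct dir_entry *e, const char *name,
                    const char *data, long size,
                    long next_free, long block_size)
    {
        strncpy(e->name, name, MAX_NAME - 1);
        e->name[MAX_NAME - 1] = '\0';
        e->start_block = next_free;
        e->size = size;
        write_blocks(next_free, data, size);
        return next_free + (size + block_size - 1) / block_size;
    }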
To support the contiguous placement 28 of video data on the disk, a disk driver module has been developed to manage storage and retrieval of the video files on this device. A l l the physical read and write actions from/to disk wi l l be done only through the interface routines provided by the disk driver. A file manager module supports the storage process. A l l the video files to be saved must be put in a particular directory. The file manager w i l l then open the directory and write all the files data contiguously on the device, one after another. This scheme wi l l maximize disk uti l ization and also help to minimize disk seeks at the time of retrieval. A directory of file names plus the size and start address on the disk is also made during the process of storage. This directory is called system directory, which wi l l be used later during the process of data retrieval. It must be mentioned that the file manager process is supposed to run prior to starting the video server. Once started, the server expects al l the video files and the directory to exist. 3.1.1.2 Memory Buffers The module responsible for handling clients memory, buffers is called data buffer module. Buffer model (i.e. Circular vs. Double), buffer allocation scheme (minimum vs. exhaustive), and the buffer size allocated for each client are the main buffer management factors that can affect the performance of the I V O D server. In our system, these factors have been chosen according to the specific requirements of I V O D applications. 29 In I V O D applications, client requests consist of a sequence of video segments, wi th rather homogeneous play-out rates. A s mentioned in the previous chapter, the best allocation scheme to support homogeneous play-out rate is the min imum buffer allocation scheme. That is why we have chosen this scheme as the allocation scheme for our server. In this scheme, server's memory is divided into equal constant chunks (1 M B in our system), each to be assigned to each client at the time of admission. Regarding buffer model, circular F I F O buffer scheme has been adopted, as it is the most suitable scheme to accompany a contiguous placement scheme. This scheme also provides a better ut i l izat ion of server memory, compared to double buffer scheme. So, unless there is a specific need to use double buffer scheme (dictated by scheduling policy, for example, using S C A N algorithm), it is more beneficial to use circular scheme. Considering the nature of an I V O D request, each single request can be assumed to ask for several video segments, each of which stored on a different block of the disk. The starting block of each video segment is saved in the system directory. A t the time of retrieval, each of the video segments is supposed to be retrieved as a whole and put into the buffer. That is, each time a client's buffer is visited, the server w i l l try to read one whole video segment into the buffer. If there is not. enough space, though, the amount read wi l l be truncated to the amount of empty space in the buffer. In other words, the buffer w i l l be topped off. So, we can assume that the maximum 30 possible read-size is the size of the biggest video segment in the system. 3.1.1.3 A CPU The C P U is needed to run different server processes. The scheduling of the server processes is also done by the C P U . In our system, a l l of the server processes have the same priority and the C P U scheduling of the server processes is done on a time sharing basis. 
3.1.2 Server Processes The actual service is provided by several server processes. The server processes in our system include a listener and a disk scheduler, plus request handler and responder processes which are created and terminated dynamically, one per each active client. In this section, we describe each server process in details. 3.1.2.1 The Listener In order to serve clients requests, a connection must be first established between the client and the server. The listener process is responsible for starting the server and accepting the clients connections. After star up, the listener listens to a pre-determined port wait ing for the clients to connect. A client connecting to the port wi l l receive a communication handle in response. This communication handle wi l l later be 31 used for al l the data transmission between the client and the server. First , the listener wi l l put the client through an admission test to decide i f the client is acceptable. We have adopted a statistical approach for our admission test. This means that our server does not guarantee to meet all the real-time deadlines. Instead, based on the capabilities and l imitat ion of the system, we can determine how many clients the server w i l l be able to accommodate according to a reasonable quality of service. We wi l l discuss the supported quality of service in more detail in later chapters, when we provide an evaluation of the system's behavior. Considering the above, the two following factors have been used in our system for deciding on accepting a new client: • The number of current active clients in the system: The maximum number of clients supported is an static parameter in our system (50). If the server is currently serving maximum number of clients, the new client w i l l be rejected. • The amount of available memory in the system: Each client needs a memory buffer to keep the data read for that client before being sent. Failure in allocat-ing the necessary memory buffer wi l l result in rejection of the client. So, The total amount of server memory is a l imi t ing factor in accepting new clients to the system. If both of the above conditions are satisfied, then the new client wi l l be accepted 32 into the system. The server then assigns an I D to the client and passes this ID to the client to be later included in al l the clients requests. A session w i l l be established for the new client through which a l l the clients requests w i l l be served. The session wi l l be terminated only as the response to the explicit disconnect command from the client. It should be noted that once admitted, a client w i l l be provided wi th a reserved bandwidth in the system unti l it voluntarily disconnects from the server and gives up its reserved bandwidth. Even i f the client stays idle for a while, the server w i l l not use the unused bandwidth to admit another client. This is because we are considering an interactive application, where the amount of bandwidth required by each client can not be determined by the server at the time of connection, or even while the server is serving current requests from the client. This brings up the need for the server to save the bandwidth for the client, assuming that the client w i l l use its bandwidth effectively. So, having idle clients in the system can degrade the ut i l izat ion of system resources. 3.1.2.2 The Request Handler This is the process responsible for accepting requests from the clients. 
Every active client in the system communicates with the server through a request handler process. This means that there is one request handler per each active client in the system. This 33 process receives client's requests for video data v ia client's communication handle. Every request contains client ID accompanied by a list of video file names. There is a global queue in the system, in which each client has an entry. The client ID determines client's entry in the global queue. This entry w i l l hold al l the information about the client's active requests, including video file names, and the address and size of each one extracted by the request handler from system directory. So, from this point on, a request consists of the file name, the start address and the file size. Each client can have up to 300 active (unserved) requests at any point in time. A client may also send a request to disconnect instead of asking for video data. In this case, the request handler marks the client for termination, so that the client can be terminated after a l l its active requests are served. The request handler process wi l l then terminate itself. 3.1.2.3 T h e Disk Scheduler After requests are put in the global queue, they are ready to be served by the disk driver. A t this point, the scheduler determines the order by which the requests are to be served. A s mentioned earlier, the order of service is a dominant factor in the performance of an I V O D server. This is where different scheduling policies can be applied. In our system, we have experimented with 3 different scheduling policies and compared the results with one another. 34 The first, policy used is the round-robin method. This policy has been used as a reference for the other two policies to be compared wi th . To evaluate the effect of seek optimization on the performance of our server, the C S C A N policy has been used as the second model. Final ly, we have presented a th i rd approach, called vertical scheduling, which is tailored to the specific characteristics of I V O D applications. The details of this approach along with the comparative results of evaluating the three approaches are presented in the next chapters. It should be noted that deadline-driven approaches, such as E D F , have been excluded from our evaluation mainly because according to the interactive nature of the I V O D applications, there can not be any particular assumptions on the requests' periods. In this case, request deadlines can be aligned only according to their release time. Bu t , having variable read-size as we have in our system, makes this alignment of deadlines invalid. In the absence of any practical way to align the requests' deadlines, we wi l l not be able to evaluate any deadline-driven policy. After ordering the requests in the global queue, the requested video data wi l l be extracted from the storage device and transferred to the clients' buffers in the dictated order. For this matter, the entries in the global queue are examined in a loop. For each entry wi th an active request, the address and size of the video file (previously extracted from system directory) w i l l be used to read the data form the disk to the client's buffer. A s mentioned earlier, al l the read (and write) accesses to 35 the disk wi l l be done by the disk driver module, which wi l l attempt to read a l l the data for a video file in one turn. Bu t the amount actually read wi l l be truncated to the amount of current empty space in the client's buffer. 
If al l the data for the specified request is read, The current request w i l l be removed from the client's list of active requests. Otherwise, the request wi l l remain in the list to be served in the next round again. 3.1.2.4 The Responder The responder is responsible for transmission of data from client's buffer at the server, to the client program. Like request handler processes, there is one responder per each active client in the system. It performs its job by examining the client's buffer in an infinite loop. Whenever there is data in the buffer, it w i l l be sent to the client v ia network using client's communication handle. 3.2 Performance Issues Performance is measured by a number of metrics which determine the overall quality of service provided by the I V O D server. Different I V O D applications might have different views of an acceptable quality of service according to their needs, but there are general rules that are widely accepted. Here we wi l l address the metrics through which we have measured the performance of our I V O D server. 36 3.2.1 Steady-state Stream Throughput In most of the V O D applications, streams' admissions and departures are infrequent. The playtime of streams is much longer than the duration of streams' admissions and departures. Therefore, the server is in its steady-state mode of operation most of the time. Even in the case of I V O D applications where stream play times are not very long, i f the clients' admissions and departures are infrequent compared to the duration of clients' activity, and if clients' requests are being generated fast enough to keep the request queue full and the server fully loaded most of the time, the server wi l l be in the steady-state mode most of the time. The steady-state stream throughput is, thus, a dominant performance indicator. The steady-state stream throughput of an I V O D server depends directly on the number of clients admitted to the system. Given the same disk parameters and stream rates, the main objective of an I V O D server is to serve as many clients as possible under a given buffer capacity. A s more clients are admitted to the system, inter-service time increases for each client. This may result in a degraded quality of service for the clients (i.e. hiccups). Thus, the performance metric here is in fact the number of clients supported according to a particular quality of service. Most of the work reported in the literature concentrates on disk scheduling to achieve high data throughput as the main performance objective. Another factor effecting the steady-state stream throughput is the admission 37 control policy. A n I V O D server performs admission control upon arrival of clients to prevent over-committing itself to real-time stream services. It may not be possible to increase stream throughput without putt ing real-time guarantees at risk. Howev-er, in enforcing admission control, a conservative estimation of the cumulative disk overheads is often used. If the estimation is too pessimistic (the test is too con-servative), the server wi l l be admit t ing less streams that it can safely accommodate. Therefore, apart from using disk-scheduling algorithms that minimize disk overheads, steady-state stream throughput can also be increased by some extent by easing the admission test. 3.2.2 Response Time In an interactive environment, the user perceived response time is the time span-ning from request generation at the client site to the time at which play out starts. 
As new requests for video segments are generated by on-line users and sent to the I V O D server, the response time is necessarily a factor affecting the user's perceived performance. Al though the response time can be considered for every request sent from client to the server, it is only the response time to the very first request that is apprehended by the client. So, it is desirable to keep this time interval as short as possible, specially as more clients are admitted to the system. 38 The response time to a request is consisted of four components: • Network latency, which is the total time it takes for the request to be trans-mitted to from the client to the server, and later for the video data to be trans-mitted from server to the client v ia the network. It depends on the network characteristics as well as network load at the time of transmission. • Queuing time, which is the amount of time that the request spends in the request queue for its turn to be served. A l l the former requests for that client must be served before serving the new request. • Sequence delay, which is the time it takes for the scheduler to complete the current round of service and sequencing the new round (which includes the new request) according to the scheduling policy. • Buffer-fill delay, which is the time it takes to read the video data into the client's buffer. When the server and the network are lightly loaded, the network latency and the queuing time are comparatively small. Bu t the performance of the server is generally evaluated in the steady-state, where the server is assumed to be fully loaded. Because it is assumed that in a fully loaded server clients tend to use their bandwidth completely, the queuing time wi l l be much longer than the other components and wi l l be the dominant factor for the response time. The only exception is the very first 39 request where there is no queuing time. In this case, sequence delay and buffer-fill delay are the dominant factors in the response time. A s mentioned above, it is only the response time to this request that matters to the client. So in order to achieve a fast response time, an I V O D server should focus on minimiz ing these two factors. It is usually expected that the response time be of the order of one service period. 3.2.3 Jitter A n I V O D server is said to provide a high quality of service (QoS) i f real-time guaran-tees to streams are seldom violated. In practical terms, this means that interruption, or jitter, to the continuous delivery of C M data should seldom occur. Jitters can be caused by unpredictable network latencies and transmission rates. Issues on real-time delivery of C M data over networks are themselves a large research topic and are be-yond the scope of this work. Here we only address the source of j i t ter in the context of the I V O D server. In every I V O D system, the number of clients admitted to the system wi l l be determined by the admission test. In our system, only static factors have been con-sidered as the criteria for accepting new clients (i.e. available memory buffers, number of current active clients in the system). Thus, it is expected that as more and more clients are accepted to the system, the clients face a degradation in the quality of service provided by the server. This happens because at some point the amount of 40 video data required to be extracted for the clients to support their consumption rate exceeds the disk bandwidth. In other words, the server has been over-committed. 
3.2.3 Jitter

An IVOD server is said to provide a high quality of service (QoS) if real-time guarantees to streams are seldom violated. In practical terms, this means that interruptions, or jitter, in the continuous delivery of CM data should seldom occur. Jitter can also be caused by unpredictable network latencies and transmission rates; issues in the real-time delivery of CM data over networks are themselves a large research topic and beyond the scope of this work. Here we only address the sources of jitter in the context of the IVOD server.

In every IVOD system, the number of clients admitted is determined by the admission test. In our system, only static factors are considered as the criteria for accepting new clients (i.e., available memory buffers and the number of currently active clients). It is therefore expected that as more and more clients are accepted, they will face a degradation in the quality of service provided by the server. This happens because at some point the amount of video data that must be extracted to support the clients' consumption rates exceeds the disk bandwidth; in other words, the server has been over-committed. In this situation, clients will experience some jitter in receiving video data.

Jitter can happen within a video segment, between two consecutive video segments within one request, or between two consecutive video segments in two consecutive requests. Video segments in our application are assumed to be small video files, each stored contiguously and later retrieved as a whole; the only exception is when there is not enough space in the buffer for the entire segment. Because the amount of buffer allocated to each client is much larger than the size of a single video segment, jitter within a video segment is expected to happen very rarely. The other two kinds of jitter may happen more often, and the server should focus on minimizing them in order to provide a high quality of service.

3.2.4 Performance in the Transient State

The lifetime of an admitted stream can be modeled as three distinct phases:

• Start-up phase: begins with the admission of the stream and lasts until play-out starts. The length of this phase is usually referred to as the start-up latency.

• Play phase: begins when play-out starts and lasts until the stream is completely read.

• Departure phase: the period during which data are no longer read and service involves the delivery of buffered data only.

When all streams are in their play phase, the server is said to be in the steady state. When there are streams in their start-up or departure phase, the server is said to be operating under transients.

The steady-state stream throughput has often been used as the dominant performance indicator in VOD applications. In interactive Video-On-Demand applications, however, requests for video data are generated by interactive user activities and the average play time of streams is much shorter, so stream admissions and departures become more frequent. As the play phase of streams becomes shorter, more streams are in their start-up or departure phase, and a fully loaded server will be operating under transients much of the time. This results in variance in the response time seen by clients: the process responsible for extracting the data for the clients is the disk scheduler process, while in the transient state the server's CPU is utilized most of the time by the processes responsible for creating and destroying clients, namely the listener and responder processes. It is therefore important for an IVOD server to provide a mechanism for minimizing the variance in the response time without sacrificing the quality of service.

Chapter 4

System Implementation

This chapter covers some points regarding the implementation of our IVOD system. The discussion focuses only on the important aspects of the implementation of both the server and the client programs.

4.1 System Requirements

The target platform for the video server is a Pentium MMX 200 MHz with 32 MB RAM and a 4.3 GB hard drive. 1 GB of disk space has been used for the Linux OS and for the video server; another 1 GB has been used as the storage device holding the video data files.

Our client program uses the Berkeley MPEG tools (version 2.3, release 2, August 95), which play MPEG or MPG video clips. Modifications have been made to the MPEG tools so that they support networking facilities. The client programs can run on the Linux, SunOS, or Solaris operating systems.
4.2 The Server Program

As mentioned in the previous chapter, there are four kinds of server processes in the system: the listener, the request handler, the disk scheduler, and the responder. The server program is implemented as a single process, within which the listener and the disk scheduler are implemented as Posix threads, running concurrently and with the same priority. The request handler and responder programs are also implemented as Posix threads; they are created and terminated dynamically at each client's arrival and departure, respectively.

While there is just one listener and one scheduler in the system, there is one instance of the request handler and responder threads for each active client. The listener thread is responsible for creating an instance of each of these two threads for every client it admits. The request handler then receives all the client's requests and puts them in the client's entry in the global queue, from which they are later served by the scheduler along with the requests of other clients. The scheduler puts the data for each client's request in the client's buffer. Each client's buffer is examined by the client's associated responder thread in an infinite loop; whenever there is data available in the buffer, the responder thread sends it to the client's program via the network.

A library of Posix socket operations has been implemented to support communication between the server and the clients, with TCP adopted as the primary platform for the socket operations. At connection time, each client is assigned a unique socket id through which it performs all of its communication with the server. A client that passes the admission test is also assigned a buffer at connection time. All clients are assigned a fixed 1 MB buffer, which is assumed to be enough to hold the data for at least 10 requests at any point in time, given that the largest video file in the system is smaller than 100 KB. The C function call malloc is used to acquire a section of memory to be used as the buffer; the Linux system call mlock is then used to lock the buffer in physical memory. This prevents the assigned pages of memory from being swapped out and minimizes disk activity other than accessing the video data. An id is also given to the client at this time, specifying its assigned position in the global request queue.

There are two essential data structures in the system: the system directory and the global request queue.

The system directory is constructed before the server starts and holds all the data required for accessing the video files. Each entry in the directory contains the information for one video file, including the file name, its starting block on the disk, and its length in bytes. The directory is sorted by file name to make later searches faster and is saved in a well-known file; upon start, the server reads this file into the in-memory system directory.

Each client's request has the form of a client id accompanied by a series of video file names that the client wants to play. Upon arrival of each request, the client's request handler thread extracts the start position and length of all the requested video files and includes them in a new data structure that is later put in the global request queue.
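A minimal C sketch of the directory entry and the name lookup is given below. The layout and the names dir_entry, dir_lookup, and MAX_NAME are hypothetical; the sketch assumes only what is stated above, namely that each entry records a file name, a starting block, and a length in bytes, and that the directory is kept sorted by name so a binary search (here the standard bsearch) can be used.

    #include <stdlib.h>
    #include <string.h>

    #define MAX_NAME 64   /* assumed upper bound on file name length */

    /* One entry of the in-memory system directory. */
    struct dir_entry {
        char          name[MAX_NAME];  /* video file name            */
        unsigned long start_block;     /* starting block on the disk */
        unsigned long length;          /* file length in bytes       */
    };

    static int cmp_entry(const void *key, const void *elem)
    {
        return strcmp((const char *)key,
                      ((const struct dir_entry *)elem)->name);
    }

    /* Binary search over the name-sorted directory; returns NULL if the
     * requested file does not exist. */
    const struct dir_entry *dir_lookup(const struct dir_entry *dir,
                                       size_t n, const char *name)
    {
        return bsearch(name, dir, n, sizeof *dir, cmp_entry);
    }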
Currently, there are 100 video files in the system. The average size of the video files is on the order of 10 KB, and the average play-out time of each file is on the order of seconds.

The global request queue is a data structure shared between the different threads of the system. Each entry in the queue consists of some administrative information, such as the client's socket id and the address of the client's buffer, plus a list of the client's unserved requests; the start position and length of each requested video file are also included. The structure of the global request queue is shown in Figure 4.1.

[Figure 4.1: The Structure of the Global Queue. Each client entry holds general information (lock, synch flag, current number of files, etc.) together with a list of pending requests, each of which names a series of video files f1 ... fn.]

The global request queue is accessed by all threads in the system: by the listener thread when creating an entry in the queue for a new client, by the request handler threads to put each client's requests in their respective entries, by the disk scheduler thread to get the requests to serve, and by the responder threads to access the clients' buffers and put data into them. To provide mutual exclusion between the different threads accessing the queue, a pthread mutex variable QLock is used. Each thread must acquire the lock before using the queue and release it when it is done with the structure. Failure to acquire the lock (when it is currently held by another thread, for example) causes the attempting thread to suspend execution until the lock is available again. The lock is thus in effect also a synchronization mechanism between the threads in the system, which all run with the same priority.
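The access discipline on the queue is the standard pthread mutex pattern; a minimal sketch follows, in which everything except the pthread calls and the QLock name is hypothetical.

    #include <pthread.h>

    struct request;      /* a parsed client request (details elided) */
    struct queue_entry;  /* one client's entry in the global queue   */

    /* Hypothetical queue manipulation, defined elsewhere. */
    void append_request(struct queue_entry *client, struct request *req);

    pthread_mutex_t QLock = PTHREAD_MUTEX_INITIALIZER;

    /* Every thread brackets its use of the global request queue with
     * QLock; a thread that finds the lock held suspends until the lock
     * is released again. */
    void enqueue_request(struct queue_entry *client, struct request *req)
    {
        pthread_mutex_lock(&QLock);
        append_request(client, req);
        pthread_mutex_unlock(&QLock);
    }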
Client buffers are also data structures shared between the scheduler thread, which is the producer of a buffer's data, and the client's responder thread, which is its consumer. Mutual exclusion on a client's data buffer is likewise provided through a pthread mutex variable, called BufLock. There is a difference, however, in the way the scheduler uses this mutex compared to QLock. As mentioned before, when the scheduler thread's (or any other thread's) attempt to acquire QLock fails, the attempting thread is suspended until the lock becomes available. For BufLock, this mechanism could cause the scheduler thread to suspend for an arbitrarily long time, for the following reason.

The responder threads, which are responsible for transmitting data from a client's buffer to the client program, do so through a write system call on the client's socket. The call copies the buffered data to the specified buffer in the kernel of the server machine, from where it is later sent to the client program over the network. If for any reason there is not sufficient space in this kernel buffer, the write system call blocks until space becomes available. (This is why we have chosen to send data to the clients through separate, independent threads rather than in an infinite loop over the active clients: to prevent a blocked client from affecting the service to other clients.) There might be several reasons for insufficient buffer space in the server program's kernel. One or more client programs might be slow in retrieving their data, which can easily happen under a heavy load on the client machine's CPU, or the specified size of this kernel buffer might be small relative to the rate at which the server program provides data. We discuss this in more detail in the next chapter.

For this reason, the scheduler thread adopts another approach to BufLock. For each active client in the global request queue, the BufLock variable is first examined when it is time to serve the client. If the buffer is already locked by the client's responder thread, the scheduler simply proceeds to the next client rather than giving up the CPU. This is accomplished through the use of the pthread trylock call. This way, if one of the client programs is slow in retrieving its data, it cannot affect the service to the other clients.
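A sketch of this non-blocking pattern is shown below; the client structure and the fill_client_buffer helper are hypothetical names standing in for the corresponding pieces of the server.

    #include <pthread.h>

    struct client {
        pthread_mutex_t BufLock;   /* guards the client's data buffer */
        /* ... buffer, socket id, pending requests ... */
    };

    /* Hypothetical helper: read the client's next request into its buffer. */
    void fill_client_buffer(struct client *c);

    /* One round of service: skip, rather than block on, any client whose
     * buffer is currently held by its responder thread. */
    void serve_round(struct client *clients, int nclients)
    {
        for (int i = 0; i < nclients; i++) {
            if (pthread_mutex_trylock(&clients[i].BufLock) != 0)
                continue;                   /* responder holds it; move on */
            fill_client_buffer(&clients[i]);
            pthread_mutex_unlock(&clients[i].BufLock);
        }
    }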
4.3 The Client Program

The client program is a single-process video player. It consists of a video decoder program, which decodes MPEG video streams and displays the video data. The main body of the client program is a modification of the MPEG video player; changes have been applied to the MPEG player program to make it work with our server.

The most important change is support for the client program to communicate with the server via the network. Upon start, the client program connects to the server and receives a communication handle (socket id) in response. It then composes requests from video file names and sends them to the server using the communication handle provided by the server. All the video data in the client program are likewise received from the server via the network, whereas in the original MPEG player program the data are read from a local file.

Another major change to the MPEG player program is support for playing multiple video files. In the original program, play-back stops after reading the sequence end code of the video file being played. In our program, however, each request contains several MPEG files that are supposed to be played one after another. Changes have been made to the source code to make the program wait for another sequence start code after receiving the sequence end code of the current video file.

If the client program decides to exit, it sends a special disconnect command to the server and waits for the server to close the communication handle. The client program then terminates itself.

Chapter 5

Network Interface Considerations for Improving Throughput

In this chapter, we investigate how changing the network interface parameters can affect the performance of our IVOD server. One of the potential bottlenecks in any VOD system is failure to transfer data between the clients and the server at the proper speed. In the absence of unlimited buffer space, the server might have to stall for a while until the data (or part of it) has been transferred to the client, to make room for any further data to be provided. Reducing the duration of this stall period thus becomes a dominant factor in the throughput of the server. The duration of the stall period depends on various factors, including the network transfer rate, the communication protocol used for transmission, and the size of the buffers used in the kernel as the medium for holding data before it is sent.

For any particular network platform, the data transfer rate is pre-determined by the characteristics of the platform and is not a dynamic parameter that can be changed according to the requirements of the application. The communication protocol and the kernel buffer size, however, can be selected according to the nature of the application to provide a better overall sense of performance.

The basic model used in our system is the client-server model, which consists of a server and a number of clients communicating with each other over the network using socket interface routines. The client-server model is a commonly used paradigm in constructing distributed applications: a server process offers services to a network, and a client process uses those services. The client and server require a well-known set of conventions before service can be rendered and accepted. This set of conventions comprises a protocol that must be implemented at both ends of a connection. Depending on the situation, the protocol can be connection-oriented (asymmetric) or connectionless (symmetric).

In a connection-oriented protocol, one side is always recognized as the server and the other as the client. The server binds a socket to a well-known address associated with the service and then passively listens on its socket. The client requests service from the server by initiating a connection to the server's socket; the server accepts the connection, after which server and client can exchange data. For applications where large amounts of data are exchanged and the sequence in which the data arrives is important, connection-oriented communication is preferable. An example of a connection-oriented protocol is the Transmission Control Protocol (TCP). TCP is reliable, meaning that it ensures accurate delivery of data by re-sending any data lost en route. It also provides a mechanism to ensure that data arrive at the receiving application in the correct sequence.

In a connectionless protocol, on the other hand, either side can play the server or client role. The client does not establish a connection with the server; instead, it sends a datagram to the server's address. Similarly, the server does not accept a connection from a client; rather, it issues a recvfrom system call that waits until data arrive from a client. Where small amounts of data are exchanged and sequencing is not vital, connectionless communication works well. An example of a connectionless protocol is the User Datagram Protocol (UDP). UDP provides no mechanism to guarantee that messages reach the receiving application in the correct order, or that they get there at all.

In an IVOD system, the quality of service acceptable to the clients dictates that video frames be received in the correct order. Reliability is also an issue, as the client will end up with degraded service if many video frames are dropped. Considering these requirements, we have chosen TCP as the communication protocol between the server and the clients in our system.
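The client's side of this exchange reduces to the usual TCP connection setup. The sketch below is generic; SERVER_PORT is a placeholder rather than the port our server actually listens on.

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    #define SERVER_PORT 5000   /* placeholder port number */

    /* Connect to the server; the returned descriptor is the client's
     * communication handle for all further requests and video data. */
    int connect_to_server(const char *server_ip)
    {
        struct sockaddr_in addr;
        int sock = socket(AF_INET, SOCK_STREAM, 0);  /* TCP socket */
        if (sock < 0) {
            perror("socket");
            return -1;
        }
        memset(&addr, 0, sizeof addr);
        addr.sin_family = AF_INET;
        addr.sin_port   = htons(SERVER_PORT);
        inet_pton(AF_INET, server_ip, &addr.sin_addr);
        if (connect(sock, (struct sockaddr *)&addr, sizeof addr) < 0) {
            perror("connect");
            close(sock);
            return -1;
        }
        return sock;
    }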
5.1 The Implications of Changing Socket Buffer Size

Figure 5.1 shows the actual path that data must traverse to be transferred from the server to the client.

[Figure 5.1: The Data Path from the Server to the Client. Server buffer (user space) to send buffer (server kernel), over the network to the receive buffer (client kernel), and finally to the client buffer (user space).]

As the figure shows, in order for the server to send data to a client, the data must first be copied from the server buffer to the send buffer in the server's kernel, from where it is transferred to the receive buffer in the client's kernel and then to the client buffer in the client program.

UNIX I/O (and similarly, Linux I/O) is termed synchronous. This means that a read or write system call does not return to the calling process until the operation is complete and the requested data have been copied by the kernel, either from the receive buffer into the client buffer for a read, or from the server buffer into the send buffer for a write [22]. Consequently, for any client to receive data there must be sufficient space in the send buffer, since this buffer serves as the staging area for all data before transmission over the network. Sending data to the client programs is done by the clients' corresponding responder threads, which issue a write system call; the call returns once the data have been copied from the server buffer to the send buffer. If the send buffer is full, however, the write system call blocks until the data in the send buffer (or at least part of it) has been transmitted to some client program. It is thus possible for a client to experience jitter because the data has not been sent to it, even though the server provided the data in the server buffer in time.
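The responder itself simply issues write on the client's socket; for illustration, a defensive version of that send step is sketched below. On a blocking TCP socket, write suspends the caller while the kernel send buffer is full and may also accept fewer bytes than requested, so the remainder is retried. The function name is ours.

    #include <errno.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* Hand 'len' bytes to the kernel send buffer, blocking as needed.
     * Returns the byte count on success or -1 on a socket error. */
    ssize_t send_all(int sock, const char *buf, size_t len)
    {
        size_t sent = 0;
        while (sent < len) {
            ssize_t n = write(sock, buf + sent, len - sent);
            if (n < 0) {
                if (errno == EINTR)
                    continue;     /* interrupted by a signal; retry */
                return -1;        /* real error                     */
            }
            sent += (size_t)n;
        }
        return (ssize_t)sent;
    }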
If there are one or more clients in the system who are slow in reading their data from the server, they can effectively occupy space in send buffer for a relatively long time, thus preventing other clients from receiving their data in time so that they experience jitter. This situation can be improved by increasing the size of send buffer, so that it has room to hold data from more clients. 57 5.2 Increasing the Size of 'send buffer' The size of the send buffer is specified by the socket option SOSNDBUF. The default value for this option on Red Hat L I N U X 6.0 is 4096 bytes. A n application can override the default T C P send buffer size using the setsockoption system cal l . The call must be obtained after creating the socket, prior to establishing the connection. The largest size that can be specified is l imited by the kernel variable sb.max. The current default for this variable is 256 K B . The actual socket buffer size used for a T C P connection can be larger than the specified value. This situation occurs when the specified socket buffer size is not a multiple of the TCP Maximum Segment Size (MSS) to be used for the connection. T C P determines the actual size, and the specified size is rounded up to the nearest multiple of the negotiated M S S . It should be noted that a socket buffer is not really a buffer in the traditional sense. Rather, it is a list of fixed sized buffers shared by various protocol stacks, called mbuf. T o avoid fragmentation of kernel memory, common buffer pools are shared by the various layers of the communication subsystem. The mbuf pool is a pool of small buffers (256 bytes each) which are controlled- by the mbuf management facility. The mbuf pool consists of locked pieces of kernel vir tual memory; this means that they always reside in physical memory and are never paged out. The result is that the real memory available for paging in application programs and data has been decreased by the amount that the mbuf pools have been increased. In addition to avoiding 58 duplication, sharing the mbuf pool allows the various layers to pass pointers to one another, reducing mbuf management calls and copying of data. When we set the socket send buffer size, the protocol stack wi l l keep track of how many bytes of mbuf space are in the send buffer, not the number of actual bytes. This approach is used because the resource that we are actually controlling is really how many mbufs is used, not how many bytes are being held in the socket buffer. 5.3 Experiment Design In this section, we wi l l study the effect of increasing the socket send buffer size on the throughput of our I V O D server. We have evaluated the throughput for the case that there are 10 active clients in the system. We have considered the server in the steady-state mode of operation, with client arrivals modeled as a Poisson distribution wi th a mean of 1.0. Dur ing the experiment, each client generates requests for video data at a constant rate of one request per second. The number of video segments in each request created by each client is uniformly distributed in the range 4 to 6. The requested video segments are homogeneous in size (ranging from 30 K B to 80 K B ) and play-out time (ranging from 1.5 to 8 seconds). Dur ing the experiment, each client generates 100 requests (each containing 4-6 video segments) for video data. We have evaluated the throughput of the server according to the size of the socket send buffer size on the server machine. 
5.3 Experiment Design

In this section, we study the effect of increasing the socket send buffer size on the throughput of our IVOD server. We evaluated the throughput for the case of 10 active clients in the system, with the server in the steady-state mode of operation and client arrivals modeled as a Poisson distribution with a mean of 1.0. During the experiment, each client generates requests for video data at a constant rate of one request per second. The number of video segments in each request is uniformly distributed in the range 4 to 6. The requested video segments are homogeneous in size (ranging from 30 KB to 80 KB) and play-out time (ranging from 1.5 to 8 seconds). Each client generates 100 requests (each containing 4-6 video segments). We evaluated the throughput of the server as a function of the socket send buffer size on the server machine, experimenting with different sizes of socket send buffer, from its default value of 4 KB up to its upper limit of 256 KB (specified by the kernel variable sb_max), in doubling steps.

[Figure 5.2: Effect of Socket Send Buffer Size on System Throughput. Throughput in KB/s versus send buffer size in KB, for the round-robin and CSCAN policies.]

5.4 Results and Discussion

Figure 5.2 shows the throughput of the server in KB/s as a function of the socket send buffer size for the two scheduling policies used in the system, round-robin and CSCAN. The graph shows that, as expected, the system throughput improves as the size of the socket send buffer increases, under both scheduling policies. For smaller send buffer sizes, the improvement in throughput per change in buffer size is more noticeable. Around a certain point (roughly 128 KB), the server appears to reach its maximum throughput under both policies; beyond this point, the server is less sensitive to the send buffer size, although some small improvement in throughput remains as the buffer grows further. The graph shows that under either scheduling policy, the throughput of the server can be improved significantly by increasing the socket send buffer size.

Chapter 6

Order of Service Considerations for Improving Jitter

The nature of IVOD applications raises some particular scheduling issues that have not been fully addressed before. In this chapter, a new approach to scheduling for IVOD applications is introduced. The technique is then verified by a comprehensive performance evaluation against basic scheduling techniques.

6.1 Vertical Scheduling vs. Horizontal Scheduling

In IVOD applications, every single request from a client is composed of a batch of requests for video segments, which have to be presented to the client in their proper order to form a whole. Examples are video sequences represented by different paths in a hypermedia graph [21], or a text-to-video application, where every request is a sentence consisting of a series of small video segments, each representing a single word of the sentence.

Each client has an entry in the global system queue, where all the client's requests are placed while waiting to be served. As Figure 4.1 shows, within each request for each client there are individual requests for video segments, which makes the structure of the global queue look like a matrix of requests. Traditional scheduling techniques (round-robin, CSCAN, etc.) traverse this matrix in horizontal order: in every round of service, the scheduler thread extracts the data for the video segment at the top of the list for each client and places it in the client's buffer. The responder threads work in parallel with the scheduler process and transfer the data from the client buffers to the client programs whenever data are available.

As mentioned in Chapter 3, as more clients are admitted to the system, the probability of experiencing interruptions in the continuous delivery of video data, or jitter, goes up for each client, because the total amount of video data required to support the clients' consumption rates might exceed the disk bandwidth.
In our system, jitter that happens between two video segments within a single request is called an in-request delay, as opposed to jitter between two consecutive video segments of two different requests, which is called a between-request delay. From the scenario above, it is easy to see that under a traditional scheduling method, in-request and between-request delays are equally likely to happen. From the user's point of view, however, these two kinds of jitter are not equally important. Client requests are generated as the result of user interaction, so it is reasonable to expect some unpredictability between them. The video segments contained in each request, however, are hidden within the request, and the distinction between them is thus hidden from the user. In other words, the user sees each request as a whole and expects to receive its data smoothly, while reasonable between-request delays are tolerable. It is therefore conceivable to design a scheduling algorithm that works in favor of minimizing in-request delays at the expense of making between-request delays worse.

To this end, we propose the vertical scheduling method. Our approach traverses the request matrix in a vertical fashion, as opposed to the horizontal fashion used by traditional scheduling algorithms: in each round of service, the scheduler reads the data for all the video segments contained in a request for the current client before proceeding to the next client. The whole data set is then sent to the client in one single transfer. This is equivalent to prolonging the service period of each client, but the larger amount of data read for the client in each service should compensate for that. Using this method, we expect to eliminate in-request delays, as the data for a whole request is sent to the client in one shot, but we also expect between-request delays to increase as a result of the longer service period.
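A sketch of one vertical round follows; the structures and the names read_segment and send_to_client are again hypothetical stand-ins for the real server routines.

    struct segment { unsigned long start_block, length; };
    struct request { struct segment *seg; int nseg; };
    struct client  { struct request *head; /* oldest unserved request */ };

    /* Hypothetical helpers, defined elsewhere in the server. */
    void read_segment(struct client *c, const struct segment *s);
    void send_to_client(struct client *c);  /* one transfer per request */

    /* Vertical scheduling: read every segment of the current client's
     * request before moving to the next client, then ship the whole
     * request in a single transfer, eliminating in-request gaps. */
    void vertical_round(struct client *clients, int nclients)
    {
        for (int i = 0; i < nclients; i++) {
            struct request *r = clients[i].head;
            if (!r)
                continue;                        /* nothing pending */
            for (int j = 0; j < r->nseg; j++)
                read_segment(&clients[i], &r->seg[j]);
            send_to_client(&clients[i]);
        }
    }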
A s mentioned in Section 4.2, our server assigns each client with a fixed, 1 M B circular buffer at the time of admission, which, considering the small sizes of video segments in the system, provides room for holding the data for the maximum video segments that might be contained in a request (10). The same buffer size is thus sufficient for the 65 buffer requirements of the vertical scheduling method, too. Another point regarding the buffer requirement in the vertical approach is that once the min imum buffer requirement is satisfied, increasing the buffer size wi l l not have any effect on improving the performance of the server, because before reading the data for the next request, the data read for the current request wi l l be sent to the client and the buffer w i l l be emptied. This is accomplished through the explicit synchronization provided between the scheduler thread and the responder thread, a variable Synch has been used to provide the synchronization. It is stored in the client's entry in the global request queue. The scheduler thread w i l l read the data for the client only i f Synch is 0, and the responder thread wi l l transfer the data only i f Synch is 1. The flag is in i t ia l ly set to 0, which enables the scheduler thread to read the data for the client. The scheduler wi l l set the flag to 1 only after it has read al l the data for the entire request, so as to enable the responder thread to send the data. The responder thread wi l l reset the flag to 0 once it has sent the data to the client's program, so that the scheduler thread can start reading the data for the next request of the client. 6.2 Experiment Design In this section, we wi l l study the run-time behavior of vertical scheduling method wi th reference to round-robin and C S C A N method. For each algorithm used, we 66 have evaluated the steady-state throughput, response time, in-request delays, and between-request delays according to the number of active clients in the system. We have modeled the arrival of clients as a Poisson distr ibution wi th a mean of 1.0. Our client programs are running on 5 different machines using Solaris operating system. We start by running one client on every machine, then adding more clients to each machine as new clients are added to the system. A l l the clients are running wi th the same priority and share the C P U on a time-sharing basis. Dur ing the experiment, each client generates requests for video data at the con-stant rate of one request per second. The number of video segments in each request created by each client is uniformly distributed in the range of 4 to 6. A t such a high arrival rate, the request queue is kept full and the server is fully loaded at al l times. The behavior of the server for this case (when the request queue is full) can also be an indication of how the server w i l l perform under a bursty arrival pattern of the clients' requests. The requested video segments are rather homogeneous in size and play-out time. Their size is in the range 30 K B to 80 K B , and the play-out time varies between 1.5 to 8 seconds. Dur ing the experiment, each client generates 100 requests (each containing 4-6 video segments). We have conducted the experiment wi th the number of active clients in the system increasing from 1 to 15. Considering the distribution parameters used for the arrival rate of clients, it can be seen that al l the clients w i l l be entered into the system shortly after the experiment starts. 
The 67 server w i l l , thus, be receiving and serving the clients' requests for most of the duration of the experiment with only occasional transients. So, the experiment is evaluating the steady-state performance of the server. The experiment has been repeated under the same settings for the round-robin and C S C A N algorithms. 6.3 Results and Discussion In this section, we have presented the result of evaluating the vertical scheduling algorithm compared to round-robin and C S C A N algorithms. Our goal is to show the effect of change in the order of service on the average in-request and between-request delays that each client experiences. Each graph covers the evaluation of our algorithm according to one performance issue in the system. 6.3.1 In-request Delay Figure 6.1 shows the average in-request delay experienced by each client for each of the three scheduling methods as the number of clients in the system increases. The y axis shows the average in-request delay experienced by each client. The numbers shown are in fact the percentage of total delay time to the total play time, averaged over al l the clients, as shown in Figure 6.1. To measure the delays, we have probed the read activities in the client programs. 68 Total Play Time i play play play play delay 1 delay 2 delay 3 Percentage of Delay = ( delay 1 + delay 2 + delay 3)1 Total Play Time The "Average Percentage of Delay" is calculated by averaging the "Percentage of Delay" over all the clients participating in the experiment. Figure 6.1: How the Average Percentage of Delays are Calculated Each client has a communication handle (a socket id) through which it receives data from the server. A l l the data sent from the server to the. client is accumulated in this handle. The client reads this data into a local buffer, parses and decodes the data, and plays it in a periodic fashion, which means that when the amount of data in the client's local buffer reaches a threshold, the client attempts to read more data from the socket. The delays happen when the client attempts to read the data from the socket and finds it empty. This means that the server has failed to provide the data in time for the client to process. When this happens, the client program wi l l block on the read system call unt i l there is data available in the socket. To measure the delays, we have measured the duration of each blocking time for the client. The sum of these blocking times indicates the total time that the client has experienced delay 69 Number of Clients Figure 6.2: Effect of Order of Service on In-request Delay during its whole play time. To show the amount of in-request delays, we must only consider the delays that happen between two video segments within a single request. For this matter, the client has to be able to keep track of the current status of the requests (whether we are in the middle of a request or in between two) when a delay happens. To achieve this goal, the server must somehow inform the client of the start of a new request. The server does this by sending a special message to the client at the beginning of each new request. Upon receiving this message, the client w i l l disregard the last delay registered, as it has happened before the start of a new request. A l l other delays wi l l 70 be considered. A s the graph shows, for both horizontal scheduling algorithms, as the number of active clients in the system increases, so does the average in-request delay. 
This is because the server is providing service to more clients and thus, more real-time guarantees are put at risk. As expected, the vertical method practically eliminates the in-request delays as the data for a l l the video segments inside a request are sent to the client in one transfer. Thus, the client w i l l receive the data at its socket altogether and wi l l not experience any delay wi thin receiving the data. However, this is not the case for the round-robin and C S C A N methods and clients experience different amount of delay during receiving data for a single request. Compar ing the two horizontal methods, it can be seen that the amount of delay experienced under C S C A N method is lower than in rqund-robiri. This is because of seek optimization applied to the disk in C S C A N which makes the disk arm work more efficiently and provides a better overall performance for the server. The maximum amount of delay registered is less than 3.5. This means that the server can easily serve up to 15 client while maximum in-request delay experienced on average by each client does not exceed the %3.5 of client's total play time. 71 Q 4 h ST 0) ? > < i a 'round-rbbin' 'CSCAN' - H -'vertical' fe-6 8 10 Number of Clients Figure 6.3: Effect of Order of Service on Between-request Delay 6.3.2 Between-request Delay Figure 6.2 shows the average between-request delays experienced by each client for each of the three scheduling methods as the number of clients in the system increases. The numbers shown on the y axis have been derived in the same manner as described before for in-request delays, with the only difference that this time for between-request delay, only delays before receiving the start-of-request message from the server have been counted. The graph shows that the average amount of between-request delay is higher for the vertical method compared to each of the horizontal 72 methods. This is reasonable, since by applying the vertical method, the service pe-r iod of requests has been increased which results in larger delay between each two consecutive services to each client. Compar ing the two horizontal methods, the C S C A N algorithm outperforms the round-robin algorithm again, for the same reason described in the previous section. Al though the amount of average between-request delay is higher for the vertical method, it is not unreasonably high so that the client finds it intolerable. In fact, even wi th the maximum of 15 clients in the system, the average amount of between-request delay experienced on average by each client is less than %6 of the total play time of the client. It should be mentioned that the graphs for in-request delay and between-request delay can alternatively be used to provide a mean for measuring system capacity according to a particular quality of service, under a particular scheduling method. From the graphs, one can easily determine i f the system wants to guarantee a par-ticular quality of service under either of the scheduling algorithms, how many clients it w i l l be able to accept. For example, i f the clients are not supposed to tolerate more than 1 percent in-request delay during their playtime, the graphs indicate that under round-robin method, no more than 10 clients should be accepted, while under C S C A N , the server can safely accept 13 clients. The vertical method is assumed to provide %0 in-request delay under any load. 
73 6.3.3 Response Time Figure 6.3 shows the average response time experienced by each client as a function of the number of active clients in the system. The graph shows that there can be a large variance in the average response time experienced by the clients for the round-robin and C S C A N methods, but not for the vertical method. This is likely due to the fact that for the vertical method, the sequence delay element has been eliminated from the response time. The values of variance in the response time for each of the three scheduling methods is shown in Table 6.1. The experiment has been repeated 5 times so as to provide a more accurate stimation of the results. Each line in the table shows the variance in the response time between the 5 runs of the experiment for the given number of clients. 6.3.4 System Throughput Figure 6.4 shows the system throughput as the number of clients increases. The throughput has been measured in ki lo bytes per seconds. A s we can see, for each of the three algorithms, the throughput increases with the number of clients unt i l a certain point. This is because before this point, the server is under-loaded and has not reached its full potential. The maximum point on each graph indicates the maximum throughput that the server can achieve using the par-ticular scheduling algorithm. The graph shows slight difference in the throughput for 74 Round-robin C S C A N Vert ical 1 Client 0.048926 0.052438 0.034562 2 Clients 0.028950 0.085868 0.014906 3 Clients 0.041970 0.069052 0.036510 4 Clients 0.032040 0.070662 0.028448 5 Clients 0.041921 . 0.059290 0.039560 6 Clients 0.027903 0.132643 0.232561 7 Clients 0.057871 0.046516 0.033976 8 Clients 0.027650 0.055479 0.055196 9 Clients 0.046183 0.053251 0.043122 10 Clients 0.463534 0.049334 0.021085 11 Clients 0.091086 0.068644 0.052226 12 Clients 0.072591 0.056097 0.043513 13 Clients 0.056728 0.982870 0.079142 14 Clients 0.974014 0.384147 0.090145 15 Clients 0.850425 0.723360 0.130198 Table 6.1: Comparing the Variance in Response T ime for Round-robin, C S C A N , and Vertical Scheduling methods 75 0 2 4 6 8 10 12 14 16 Number of Clients Figure 6.4: Effect of Order of Service on Response T ime 76 Figure 6.5: Effect of Order of Service on System Throughput 77 the three methods, with vertical method achieving a better performance than round-robin, but not better than C S C A N . The C S C A N method has higher throughput than the other two methods, mostly because of applying seek optimization techniques that are not used by the other two methods. However, the slightly lower throughput of vertical method in comparison to the C S C A N is compensated by its abili ty to elim-inate the in-request delays and providing the clients wi th lower variance in response time. There is a slight degradation of system throughput after reaching the max point for a l l three methods. This is likely due to the fact that the max point is where the server is using its full potential. After this point, the server is serving more clients than it can safely accommodate and is thus being over-committed. 78 Chapter 7 Improving the Response Time under Transient State In this chapter, we wi l l address the effect of frequent transients on the response time experienced by each client in the system. We wi l l discuss how having frequent transients in the system causes large variance in the response time experienced by different clients and provide a solution to minimize this variance without sacrificing the other performance issues. 
79 7.1 The Effect of Frequent Transients on Response Time In Chapter 3, we discussed the different elements of the system and how the actual service is provided by several server processes. After establishing a connection, the steps that are taken by various server processes from the moment a client's request is released unt i l the video data is available at the client site include: 1. The request is received by the client's associated request handler thread, who wi l l put the request in the global request queue after performing some minor processes on it . 2. The disk scheduler thread serves a l l the requests in the global queue in rounds. Before a new request wi l l be included in a round of service, the scheduler thread must complete the current round of service, and do the sequencing (according to the disk scheduling algorithm being used) for the new round. The scheduler thread wi l l then serve the request by transferring the requested video data from the storage device to the client's buffer. 3. The client's associated responder thread wi l l later transfer the data from client's buffer in the server to the client program via the network. After this step, the video data wi l l be available at client's site for display. 80 The above scenario shows that depending on the position of the scheduler thread in the current round at the time of the new request arrival, the duration of wait period for the request in the second step (before it can be served) might vary. A s mentioned in Chapter 3, this wait period along wi th buffer fill period (the time that it takes for the scheduler to read that request data) are the dominant factors constructing the response time for the first request, which is the only response time that can be sensed by the client. A l l the other requests following the first one wi l l most probably be served while the client program is busy wi th play-back of some previously served request (because a l l the successive video data are provided to the client as a stream) and thus, their response time wi l l not really matter to the client. When the server is working in the steady state, stream arrivals and departures are more rare. The server's C P U is mainly util ized by the scheduler thread. Sequence delay and the in i t ia l fill delay are thus the only factors contributing to the response time. However, in the presence of frequent transients, there are lots of arriving and departing streams at any point in time. A s each stream has a pair of request handler and responder threads associated with i t , there wi l l be lots of active incidents of these threads in the system at any point of time. Thus, there wi l l be more threads sharing the server's C P U and the scheduler thread wi l l not be the one ut i l iz ing it most of the time. In this case, the number of currently arriving and/or departing streams in the system can be another factor contributing to the variations in the clients' response 81 times. To minimize the adverse effect of frequent transients on response time, we have decided to give an out-of-sequence initial service to the first video segment for each client. In our approach, the first request received by every request handler thread wi l l be served immediately by the request handler thread itself rather than going through the steps mentioned above. For every client in the system, there is a tag in the client profile indicating weather the server has received the client's first request yet. 
Every request handler thread wi l l examine this tag upon arrival of a request. If the tag shows that the current request is the first request received, the request handler wi l l extract the first video segment wi thin the request, read its data immediately, and then discard it . The rest of the video segments wi th in the request wi l l then be treated in the usual way. We expect the video data for the first video segment to sustain consumption unti l data for the remainder of the video segments in a request becomes available. This w i l l eliminate the sequence delay phase for the first small request, thus improving its response time. Further, the first video segment does not have to wait for the disk scheduler thread to receive C P U in order to receive its data. In other words, the first request w i l l be served as soon as it is received by the server. 82 7.2 Experiment Design To evaluate the effect of out-of-sequence ini t ia l service technique on the response time, we have investigated a case with 100 clients. The arrival rate of the clients is modeled as the Poisson distribution wi th mean 0.5, to provide a sense of frequent arrivals. Each client generates 5 requests for video data, each containing 4 to 6 video segments. The clients' requests are generated at the constant rate of 1 request per second. The clients w i l l thus stay in the system for a very short time, and there are always clients in their arrival or departure phases. The same video files as in our previous experiments have been used here again, which are homogeneous in size and play-out time wi th their size ranging from 30 K B to 80 K B and their play-out time ranging from 1.5 to 8 seconds. We have experienced with two scenarios. One is when al l the video segments (including the first segment in the first request) are served by the disk scheduler thread after they have been sequenced according to the scheduling policy. In the other scenario, the first video segment is served out of sequence by the request handler thread. To evaluate our approach, we have measured the response time as well as the in-request and between-request delays in each case to compare the effect of out-of-sequence in i t ia l service on both response time and quality of service. We have evaluated these parameters for each of the three scheduling policies (round-robin, C S C A N , and vertical), in our system. 83 7.3 Results and Discussion Improving the response time comes at the cost of sacrificing the quality of service in terms of j i t ter to some degree. A s there are lots of streams in the arrival phase at each point in time, there wi l l be as many active request handler threads as well, reading data from disk. This causes the disk scheduler thread to experience some interruptions, thus degrading the performance of the disk scheduler. In general, we expect our approach to cause more in-request and between-request delays for the clients compared to the first scenario where a l l the requests are served by the disk scheduler. Table 7.1 compares the response time for the two scenarios i n the case where round-robin has been used as the disk scheduling policy. The first line shows the results for the in-sequence service (first scenario) and the second line shows the results for the case that out-of-sequence in i t ia l service has been used. In each line, the min imum and maximum value of response time experienced by the clients has been reported. The variance of response time values experienced by 100 clients has also been reported. 
We can see from the table that using the out of sequence in i t ia l service effectively minimizes the gap between the minimum and maximum values of response time by a great deal. Comparing the values of variance for the two cases indicates that this approach has been successful in minimizing the variance in the response time. 84 M i n Resp. T ime (Sec) M a x Resp. T ime (Sec) Variance In-sequence 0.058 . 5.708 0.642 Out-of-sequence 0.037 1.622 0.347 Table 7.1: Comparing the Response T ime for Round-robin policy: Using In-sequence Service and Out-of-sequence Init ial Service M i n In-req. Delay (Sec) M a x In-req. Delay (Sec) Mean In-sequence 0.000 0.858 0.379 Out-of-sequence 0.000 0.926 0.626 Table 7.2: Comparing the In-request Delay for Round-robin Pol icy: Using In-sequence Service and Out-of-sequence Init ial Service Tables 7.2 and 7.3 compare the in-sequence delay and between-sequence delay for each of the two cases for the round-robin scheduling, respectively. A s expected, the average amount of both delays experienced by the clients goes up in the second scenario, but the amount of increase in these two metrics is small and can be tolerated considering the improvement that is gained for the response time. Tables 7.4, 7.5, and 7.6 show the result of experiment wi th the C S C A N scheduling policy for response time, in-request delay, and between-request delay, respectively. M i n Between-req. Delay (Sec) M a x Between-req Delay (Sec) Mean In-sequence 0.000 1.037 0.169 Out-of-sequence 0.000 1.802 0.240 Table 7.3: Comparing the Between-request Delay for Round-robin Policy: Using In-sequence Service and Out-of-sequence Init ial Service 85 M i n Resp. T ime (Sec) M a x Resp. T ime (Sec) Variance In-sequence 0.062 5.366 0.563 Out-of-sequence 0.084 1.600 0.344 Table 7.4: Comparing the Response T ime for C S C A N Policy: Using In-sequence Service and Out-of-sequence Init ial Service M i n In-req. Delay (Sec) M a x In-req. Delay (Sec) Mean In-sequence 0.000 0.635 0.243 Out-of-sequence 0.000 1.426 0.944 Table 7.5: Comparing the In-request Delay for C S C A N Policy: Using In-sequence Service and Out-of-sequence Init ial Service We can see that there is an improvement in the variance in the response time and a drop in the jit ter, very similar to what we experienced wi th the round-robin policy. However, If we compare these results with the results for the round-robin for the second scenario, we can see that clients have been experiencing longer delays under C S C A N than round-robin. This is because under C S C A N policy, we have the extra overhead of sequencing the requests according to the policy. In the in-sequence case, this overhead is well compensated by the better disk arm ut i l izat ion provided by the C S C A N policy. Under the second scenario, however, the frequent out of sequence service to the newly arrived clients causes the disk arm to move to the arbitrary blocks of the disk, reducing the positive effect of C S C A N policy. Thus, it is not appropriate to use the out-of-sequence in i t ia l service wi th the C S C A N policy. Tables 7.7, 7.8, and 7.9 demonstrate the results of our experiments wi th the ver-86 M i n Between-req. Delay (Sec) M a x Between-req. Delay (Sec) Mean In-sequence 0.000 0.986 0.122 Out-of-sequence 0.000 1.925 0.478 Table 7.6: Comparing the Between-request Delay for C S C A N Policy: Using In-sequence Service and Out-of-sequence Init ial Service M i n Resp. T ime (Sec) M a x Resp. 
T ime (Sec) Variance In-sequence 0.028 5.623 0.518 Out-of-sequence 0.079 1.186 0.324 Table 7.7: Comparing the Response T ime for Vert ical Policy: Using In-sequence Service and Out-of-sequence Init ial Service tical scheduling policy. We have similar improvement in the variance for the response time along wi th slight degradation in in-request and between-request delays. How-ever, as the tables show, under the vertical scheduling the delays are less affected by the out-of-sequence in i t ia l service compared to the other two scheduling policies, showing that the out-of-sequence service technique is best suited for use with the vertical scheduling policy. M i n In-req. Delay (Sec) M a x In-req. Delay (Sec) Mean In-sequence 0.000 0.000 0.000 Out-of-sequence 0.000 0.042 0.006 Table 7.8: Comparing the In-request Delay for Vertical Policy: Using In-sequence Service and Out-of-sequence Init ial Service 87 M i n Between-req. Delay (Sec) M a x Between-req. Delay (Sec) Mean In-sequence 0.000 1.342 0.432 Out-of-sequence 0.000 2.001 0.475 Table 7.9: Comparing the Between-request Delay for Vert ical Policy: Using In-sequence Service anH Out-of-sequence Init ial Service 88 Chapter 8 Conclusions and Future Work This chapter provides a summary of the results of the work developed in this thesis. It also suggests some ideas for the future work in the area. 8.1 Summary of Results Providing a high quality Video-On-Demand service requires on-time delivery of con-tinuous data streams to the clients. V O D services and mult imedia applications further require access to the storage devices to be shared among multiple concurrent stream-s. In contrast to the long length of movies, the length of videos in interactive V O D applications such as digital libraries are typically short. In these applications, each request for video data consists of a series of small video segments which are generated by user interactions and should be provided to the user in a particular order. Long 89 latencies before new requests and between video segments are, therefore, highly un-desirable. The frequent arrivals and completion of streams can also seriously degrade the quality of service provided by an I V O D server. In this work, an Interactive Video-On-Demand server is designed and developed. We identified performance issues pertaining to different aspects of I V O D applications and used our I V O D system as a test bed for experiments on these issue wi th the goal of improving the performance of the I V O D server in various dimensions. O n network level, we have investigated the impact of the changing the socket send buffer size on the throughput of the server. We discussed why this parameter affects the performance of the server, and how our server can achieve a higher throughput by increasing the socket send buffer size to its maximum possible value, which is specified by the kernel. O n disk scheduling, we have proposed the vertical scheduling method, which serves the video segments contained in one single user request before proceeding to give service to the requests from the next users. We have shown that the vertical scheduling method effectively minimizes the delay between the video segments wi thin one single request, at the cost of slight degradation in the delay between the video segments from two individual requests. The method is also more stable in terms of response time, compared to the round-robin and C S C A N methods. There are two drawbacks associated wi th the vertical method. 
On disk scheduling, we proposed the vertical scheduling method, which serves all of the video segments contained in a single user request before proceeding to serve the requests of the next user. We showed that vertical scheduling effectively minimizes the delay between the video segments within a single request, at the cost of a slight increase in the delay between the video segments of two separate requests. The method is also more stable in terms of response time than the round-robin and CSCAN methods.

There are two drawbacks associated with the vertical method. One is that it requires more buffer space than horizontal methods such as round-robin and CSCAN. We discussed that it is within the capability of our server to provide the higher buffer requirement of the vertical method. The other drawback is that it achieves slightly lower throughput than the CSCAN method, because it lacks the disk seek optimization that CSCAN uses. We discussed that this difference is small and practically negligible for interactive applications, where response time and in-request delays are far more visible to the user than the underlying data throughput.

Frequent arrival and departure of clients is a common characteristic of IVOD applications. We showed how this characteristic causes large variability in the response time experienced by different clients, and proposed a technique called out-of-sequence initial service to decrease the variance of the response time in the transient state. We showed that there is a trade-off between the improvement in response time and the in-request and between-request delays, but the gain in response time compensates for the slight degradation in quality of service in terms of jitter.

Finally, we showed that out-of-sequence initial service performs best under the vertical scheduling method. Used together, these two techniques provide shorter delays and less sensitivity to transients, and form a suitable solution for better IVOD performance.
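As a rough illustration of the service order this summary describes, here is a hedged sketch of one round of vertical scheduling, with hypothetical types and helpers rather than the thesis implementation:

```c
/* Hypothetical per-client request: an ordered list of segments. */
typedef struct segment { long disk_block; int length; } segment_t;
typedef struct request {
    int        num_segments;
    segment_t *segments;     /* in the order the user must see them */
} request_t;

void read_and_send(int client_sock, const segment_t *seg); /* hypothetical */

void vertical_round(request_t *reqs, int *socks, int num_clients)
{
    for (int i = 0; i < num_clients; i++) {
        /* All segments of one request are served back to back,
           driving the in-request delay toward zero (cf. Table 7.8)... */
        for (int s = 0; s < reqs[i].num_segments; s++)
            read_and_send(socks[i], &reqs[i].segments[s]);
        /* ...while clients later in the list absorb the wait as
           between-request delay (cf. Table 7.9), and the server must
           buffer up to a whole request's worth of data per client. */
    }
}
```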
8.2 Suggestions for Future Work

The work presented in this thesis can be extended in several directions:

• A hybrid scheduling method could be developed by combining the vertical and CSCAN scheduling methods, in which video segments are served in CSCAN order, but no data is sent to a client before the server has served that client's whole request. Such an approach is expected to combine the advantages of both CSCAN and the vertical method.

• Our results show that the vertical scheduling method introduces a trade-off between in-request delay and between-request delay. This trade-off is especially important if the number of video segments in a request can be large, in which case some clients might experience long between-request delays. For this reason, it is conceivable to design a scheduling method that lies between the horizontal and vertical methods. In each round, up to a certain number of video segments from a single request are served according to the vertical scheduling policy; if the request contains more segments, the rest are left for the next round, and the disk scheduler proceeds to the next client as dictated by the horizontal policy. With this method, the in-request delays experienced by the clients are no longer zero as in the vertical scheduling policy, but there is an upper limit on both the buffer requirement and the between-request delays. The number of video segments per request served vertically in each round should be determined by the pattern of the clients' requests, i.e., the maximum possible number of video segments in a request and how often clients send requests containing a large number of segments.

• An amortizing technique might be developed that enables the server to amortize the cost of service across clients who require a common video segment in a round of service; a sketch of this idea follows the list. Instead of reading the video segment once for each client, the server reads the common segment once for all clients, keeping the data in a shared buffer that is later delivered to each of them. The gain obtained by this approach depends on how many clients share a video segment and how often such sharing occurs in the system.

• Popular video segments can be cached, that is, kept in memory instead of on disk, to reduce the cost of frequent access to them. However, a criterion for deciding on a video segment's 'popularity' can only be based on past information about the frequency of access to the segment, which does not necessarily hold for future accesses.

• The server can be extended to support a variety of video formats in addition to MPEG. Integrating audio into the system is another interesting direction for extending the server's capabilities.
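To make the amortizing idea in the third item concrete, the following is a speculative sketch of an unimplemented technique; all names and types are hypothetical. One disk read is fanned out to every client subscribed to the same segment in the current round.

```c
#include <stddef.h>

#define MAX_SHARERS 64

/* Hypothetical: all clients waiting for the same segment this round. */
typedef struct shared_seg {
    int segment_id;
    int client_socks[MAX_SHARERS];
    int num_clients;
} shared_seg_t;

size_t read_segment_from_disk(int seg_id, char *buf, size_t cap); /* hypothetical */
void   send_to_client(int sock, const char *buf, size_t len);     /* hypothetical */

void serve_shared_segment(const shared_seg_t *req,
                          char *shared_buf, size_t buf_cap)
{
    /* One disk read, regardless of how many clients want the segment. */
    size_t n = read_segment_from_disk(req->segment_id, shared_buf, buf_cap);

    /* Fan the shared buffer out to every subscriber, amortizing the
       read cost over num_clients; the saving grows with the degree
       of sharing, as noted above. */
    for (int i = 0; i < req->num_clients; i++)
        send_to_client(req->client_socks[i], shared_buf, n);
}
```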
