Open Collections

UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

A novel scheduling algorithm for video flows in high-rate WPANs Moradi, Shahab 2007

Your browser doesn't seem to have a PDF viewer, please download the PDF to view this item.

Notice for Google Chrome users:
If you are having trouble viewing or searching the PDF with Google Chrome, please download it here instead.

Item Metadata

Download

Media
831-ubc_2007-0195a.pdf [ 4.13MB ]
Metadata
JSON: 831-1.0101142.json
JSON-LD: 831-1.0101142-ld.json
RDF/XML (Pretty): 831-1.0101142-rdf.xml
RDF/JSON: 831-1.0101142-rdf.json
Turtle: 831-1.0101142-turtle.txt
N-Triples: 831-1.0101142-rdf-ntriples.txt
Original Record: 831-1.0101142-source.json
Full Text
831-1.0101142-fulltext.txt
Citation
831-1.0101142.ris

Full Text

A Novel Scheduling Algorithm for Video Flows in High-rate W P A N s by Shahab Moradi B.Sc, Electrical Engineering, University of Tehran, Iran, 2004 A THESIS S U B M I T T E D IN PARTIAL F U L F I L M E N T O F T H E R E Q U I R E M E N T S F O R T H E D E G R E E O F Master of Applied Science in The Faculty of Graduate Studies (in Electrical and Computer Engineering) The University of British Columbia April 2007 © Shahab Moradi 2007 Abstract The emerging high-rate wireless personal area network (WPAN) technology is capable of supporting high-speed and high-quality real-time multimedia applications. In particular, video streams are deemed to be a widespread traffic type, and require quality of service (QoS) support. However, in the current I E E E 802.15.3 standard for M A C (media access control) of high-rate WPANs, the implementation details of some key issues such as scheduling and QoS provisioning have not been addressed. Moreover, the hierarchical structure of video streams calls for special measures at the M A C layer in order to improve the QoS. In the first part of this thesis, we propose a frame-decodability aware (FDA) tech-nique to make the scheduling algorithms aware of the hierarchical structure and decoding dependencies in video streams. Simulation results show that the F D A technique can sig-nificantly improve the performance of F-SRPT [1] and E D D + S R P T [2] schedulers by up to 61% and 60%, respectively. We also compare two common performance metrics and investigate which one is a more accurate indicator of the QoS given to video streams. In the second part of this thesis, we first propose a mathematical model for the optimal scheduling scheme for video flows in high-rate WPANs. Using this model, we then propose ii Abstract a scheduler that incorporates reinforcement learning (RL). Simulation results show that our proposed scheduler is nearly optimal and performs 42%, 49%, and 53% better than E D D + S R P T [2], PAP [3], and F-SRPT [1] schedulers, respectively. iii Table of Contents Abstract ii Table of Contents iv List of Tables viii List of Figures ix List of Acronyms xiii List of Symbols xvii Acknowledgements xxii 1 Introduction 1 1.1 Motivations and Objectives 2 1.2 Contributions 3 1.3 Structure of the Thesis 3 2 Background and Related Work 5 iv / Table of Contents 2.1 Physical Layer 5 2.1.1 Impulse-based U W B 6 2.1.2 Multiband O F D M (MB-OFDM) 7 2.2 I E E E 802.15.3 Standard for M A C 9 2.2.1 Communication Structure 10 2.2.2 Piconet Initiation, Maintenance and Termination 10 2.2.3 The Superframe Structure 12 2.2.4 Data Communication Between DEVs 14 2.3 QoS Issues 16 2.3.1 Characteristics of Video Streams 16 2.3.2 Performance Metrics 18 2.4 Related Work on M A C Enhancement 19 2.4.1 Scheduling Proposals for Impulse-based U W B 19 2.4.2 Scheduling Proposals for I E E E 802.15.3 Standard for M A C . . . 22 General Schemes 23 Application-aware Schemes 30 Other Schemes 31 2.5 Markov Decision Process (MDP) 32 2.6 Reinforcement Learning (RL) 35 2.6.1 Curse of Modeling 37 Model-free Q-Learning 38 Table of Contents S M A R T Algorithm 40 2.6.2 Curse of Dimensionality 41 State Aggregation and Features 41 Generalization and Function Approximation 43 2.7 Summary 46 3 Frame Decodability Aware (FDA) Technique 47 3.1 Motivations and Assumptions 47 3.2 F D A Technique 48 3.3 Performance Evaluation 54 3.3.1 Simulation Model 54 3.3.2 Simulation Results 55 3.4 Summary 61 4 Scheduling Algorithm for Video Flows 63 4.1 Introduction and Motivations 63 4.2 Problem Formulation 64 4.2.1 Assumptions and System Model 64 4.2.2 Decision Epochs 64 4.2.3 States 65 4.2.4 Actions 66 4.2.5 Reward Function and Gain 68 Table of Contents 4.2.6 State Transitions 70 4.3 Algorithm Implementation 72 4.3.1 Features and State Space Representation 73 4.3.2 Function Approximation: Kanerva Coding 76 4.3.3 R L Scheduler 80 4.4 Simulation Results 82 4.5 Summary 88 5 Conclusions and Future Work 89 5.1 Future Work 90 Bibliography 92 List of Tables 2.1 Time Frequency Codes (TFC) used in M B - O F D M . The first four TFCs are used for the 3-band groups, and the last two TFCs are for the 2-band group 9 2.2 Number of frame dependencies for each video frame 18 3.1 Simulation Parameters 55 4.1 Quantitative comparison among RL scheduler and conventional schedulers. The minimum average D F R given by F-SRPT, EDD+SRPT, and PAP is used for comparison 86 viii List of Figures 2.1 Multiple access using time-hopping with Na = 4 and Nh = Tj/Tc = 8. Users A and B use 0, 3, 1, 2 and 3, 7, 5, 4 T H codes, respectively 7 2.2 Diagram of the sub-bands and band groups of M B - O F D M [4] 8 2.3 Elements of I E E E 802.15.3 piconet 11 2.4 Superframe structure. The presence of C A P is optional. The number, duration and order of the CTAs and M C T A s can vary from one superframe to another 13 2.5 A C K policies defined in I E E E 802.15.3 Standard for M A C : (a) no-ACK, (b) Imm-ACK, (c) Dly-ACK 15 2.6 G O P structure for (N=9,M—3). The arrows show the decoding depen-dencies 17 ix List of Figures 2.7 Flaws of one-byte solution for updating the status of the queues at the P N C [1]. In superframe n from time i i to t2 is allocated to flow i. Using the one-byte approach, PNC has the queue status of flow i up to time t2. At time £ 3 and £ 4 , new data arrive for flow i and change its queue status. Therefore, P N C schedules the flows in superframe (n + 1) at time £ 5 based on incorrect information about flow i. Flow i cannot update its queue status sooner than £ 6 25 2.8 Standard R L model 37 2.9 This figure shows a 2-D plane, in which a robot tries to find a target. Robot and target are depicted in three different positions, which are all equivalent with respect to the value of features fx and / 2 . These different states can be aggregated 42 2.10 A n example of memory-based function approximator. The input point is the point in the state-action space that its Q-value is desired. All the prototypes within the activation radius of the input are activated. The weighted sum of the values of the activated prototypes is the approximated value of the input 45 3.1 Pseudo-code of the F D A technique 50 3.2 An illustration of how the F D A technique updates the scheduling eligibility table. The table after being updated at the beginning of (a) superframe n, (b) superframe n + 1 52 x List of Figures 3.3 Logical interaction between F D A module and the scheduler 54 3.4 The effect of time separation <j> on failure rate (F = 9) 56 3.5 Effect of F D A technique on F-SRPT scheduling algorithm versus time separation 4> 58 3.6 Effect of F D A technique on F-SRPT scheduling algorithm versus the num-ber of flows F 58 3.7 Effect of F D A technique on E D D + S R P T scheduling algorithm versus time separation 4> 59 3.8 Effect of F D A technique on E D D + S R P T scheduling algorithm versus num-ber of flows F 60 3.9 Effect of F D A technique on PAP scheduling algorithm versus time sepa-ration qb 60 3.10 Comparison of F-SRPT, EDD+SRPT, and PAP schedulers when F D A is used 61 4.1 Pseudo-code of the RL scheduler 81 4.2 Structure of the RL scheduler 82 4.3 Comparison of RL scheduler and optimal scheduler for <f> = 0. R L scheduler is nearly optimal in this case 84 4.4 Effect of time separation on average D F R for F = 9. The R L scheduler has the smallest average D F R for (j> > 8 ms 85 4.5 Comparison of R L scheduler and other schedulers 86 xi List of Figures 4.6 Comparison of the exact and the estimated average D F R (F = 9) 87 List of Acronyms A C K Acknowledgement A T P Association Timeout Period BcstID Broadcast Identifier B E R Bit Error Rate C A P Contention Access Period C C A Clear Channel Assessment C M A C Cerebellar Model Articulation Controller C S M A / C A Carrier Sense Multiple Access with Collision Avoidance C T A Channel Time Allocation C T A P Channel Time Allocation Period D C Direct Current D C M Darken Chang Moody List of Acronyms D E V Device DEVID Device ID D F R Decoding Failure Rate Dly-ACK Delayed Acknowledgement DP Dynamic Programming DstID Destination Identification DS-UWB Direct Sequence Ultra-wide Band E C M A European Computer Manufacturers Association E D D Earliest Due Date F C T A Feedback Channel Time Allocation F D A Frame-decodability Aware F-SRPT Fair Shortest Remaining Processing Time G O P Group of Pictures ID Identification IE Information Element I E E E Institute of Electrical and Electronics Engineers xiv List of Acronyms I F F T Inverse Fast Fourier Transform Imm-ACK Immediate Acknowledgement J F R Job Failure Rate M A C Media Access Control M B - O F D M Multi-band Orthogonal Frequency Division Multiplexing M C T A Management Channel Time Allocation M D P Markov Decision Process MIFS Minimum Inter-frame Spacing MUI Multiuser Interference PAP Pre-assigned Priority P N C Piconet Coordinator QoS Quality of Service USB Universal Serial Bus U W B Ultra-wide Band R F Radio Frequency R L Reinforcement Learning xv List of Acronyms SIFS Short Inter-frame Spacing SINR Signal to Interference and Noise Ratio S M A R T Semi-Markov Average Reward Technique SMDP Semi-Markov Decision Process SrcID Source Identification SRPT Shortest Remaining Processing Time T D M A Time Division Multiple Access T G Task Group T F C Time-frequency Coding T H Time Hopping T H - U W B Time Hopping Ultra-wide Band W L A N Wireless Local Area Network W P A N Wireless Personal Area Network xvi List of Symbols | • | cardinality of a set [•] floor function \ set difference <— assignment = definition A set of all actions As set of all possible actions in state s ^ scheduling decision about flow i a a possible action B similarity radius B B frame type of video flows Ci(s) state-dependent cost function for flow List of Symbols CR cumulative reward CT 8 cumulative time decodability of the frame of flow i decodability vector deadline of the frame of flow i deadline of the newly arrived frame for flow i vector of deadlines dist(x, z) distance of feature vectors x and z D" maximum deadline D F R average D F R that policy n obtains superframe size eligibility of the frame of flow i /(s,a) feature vector of state s and action a number of flows in the system F set of integers from 1 to F 7 inter-arrival time of video frames List of Symbols gi offset of the frame of flow i with respect to the beginning of the frame's G O P g vector of g^s hk /eth prototype of Kanerva coding Hx set of prototypes activated by feature vector x A(x, z) similarity between feature vectors x and z li channel time that flow i requires l™ew channel time required by the newly arrived frame for flow i ^partial amount of channel time allocated for partial transmission of a frame 1 vector of channel time requirements L™ a x maximum frame size of flow i I I frame type of video flows K minimum number of prototypes that should be activated by a feature vector u-o, u-r parameters of D C M for learning rate u-n learning rate at superframe n xix List of Symbols M I-to-P distance of a video flow Mi I-to-P distance of flow i n superframe number (N, M) GOP pattern N I-to-I distance of a video flow Ni I-to-I distance of flow i O(-) big O function 4> start time separation p index of the flow that partially transmit its frame P P frame type of video flows qo, qr parameters of D C M for exploration probability qn exploration probability at superframe n Q(x) value of feature vector x estimated by Kanerva coding p" gain of policy TT r(s, a) reward of taking action a in state s R set of real numbers xx List of Symbols s a possible state S set of all the states txi(x) channel time that flow i requires for transmitting x bits of data T last decision epoch U{.y indicator function Wk weight of the /eth prototype Ui type of the frame of flow i Ui(g) type of the frame of flow i that has offset g yfew type of the newly arrived frame for flow i xxi Acknowledgements It is difficult to overstate my gratitude to my Master's supervisor Dr. Vincent Wong for his guidance, patience, and support during my graduate studies. I sincerely appreciate the amount of time and effort he invested in helping me developing the ideas, improving the results, and refining the structure of my thesis. I am indebted to Dr. Richard Sutton for his insightful comments, Dr. Pentti Kanerva for providing me with his publications, and Hamed Mohsenian for his helpful suggestions. I would also like to thank my fellow colleagues and friends who assisted and encouraged me along the way. I am especially grateful to Adrienne Peltonen, Amir Moghimi, Farshid Farhat, Babak Fallah, Samaneh Ashoori, Enrique Stevens-Navarro, Stephen Ney, Jessica Raposo, Mohammad Maysami, Ali Soltanzadeh, Hamidreza Ahmadian, Ashkan Hesh-mati, Bijan Azadi, Mehran Shaghaghi, and Nasim Arianpoo. Lastly, and most importantly, I wish to thank my parents, Mohammad-Taghi and Marzieh Moradi, as well as my sister and her spouse, Sholeh and Mohammad Adimi, for their love, support, and encouragement. To them I dedicate this thesis. This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) under grant number C R D P J 320552-04. xxii Chapter 1 Introduction In the past few years, ultra-wide band (UWB) technology has received increasing at-tention in the wireless world. It provides short-range connectivity, low transmit power levels, and high-data rates, which make U W B the physical layer of choice for high-rate wireless personal area networks (WPANs). UWB-enabled WPANs can offer many new applications, such as home entertainment, real-time multimedia streaming, and wireless USB, to name a few. In order to fully exploit U W B technology in high-rate WPANs, upper layers, in-cluding the media access control (MAC) layer, must be properly designed for high-rate applications. Video transmission is one such application for high-rate WPANs, which is predicted to constitute a major traffic load. Real-time video flows are delay-sensitive and require quality of service (QoS) guarantee. However, in the I E E E 802.15.3 standard for M A C [5], which is designed for WPANs, details of scheduling and QoS support are left to the developers. Consequently, in this thesis, we aim to design an application-aware scheduling algorithm for M A C layer to provide the required QoS for video traffic. 1 Chapter 1. Introduction 1.1 Motivations and Objectives Similar to other real-time traffic, video is delay sensitive and its frames are dropped at the receiver if their delay exceeds the maximum tolerable delay. However, video stream has a few unique characteristics that make QoS support more challenging than other real-time traffic. It has large peak-to-average ratio of the frame sizes and hierarchical structure with dependency among its frames [3]. The related work on the video traffic schedulers in high-rate WPANs is limited. The few existing proposals try to enhance the scheduling performance by using some kind of application layer information about video streams. This information is mainly the type of the video frames. However, the information about frame type is not used efficiently because it can be readily used along with the decoding dependencies in video streams to further improve scheduling performance. We aim to design a technique that enables the scheduler to be aware of the decoding dependencies, and thus, use the resources more efficiently. Furthermore, this technique must 1) be simple to implement, 2) not incur signaling overhead in the system, and 3) be independent of the scheduling algorithm details. Besides not fully using application layer information, all the current proposals are heuristics that describe how application layer information can be combined with a con-ventional scheduling algorithm for better performance. As a result, we are motivated to devise an analytical framework to provide us with a foundation to design a systematic scheduling algorithm for video traffic. 2 Chapter 1. Introduction 1.2 Contributions Our contributions in this thesis can be summarized as follows: • We propose a frame decodability aware (FDA) technique that the scheduler can use to determine whether a video frame is decodable at the receiver. The F D A technique uses the information about frame type to minimize the channel time wastage caused by scheduling undecodable frames. This technique is applicable to different schedulers and can improve the QoS [6]. • We provide a mathematical framework for finding the optimal scheduler of video flows in high-rate WPANs. This framework takes into account the number and pattern of video flows, and their hierarchical structure. • Based on the above framework, we design a scheduler that provides significantly better QoS to video flows when compared to some other schedulers [7]. 1.3 Structure of the Thesis In Chapter 2, we give an overview of U W B physical layer and I E E E 802.15.3 standard for M A C . It also summarizes the hierarchical structure of video streams and the related work. In addition, Chapter 2 introduces Markov decision process (MDP) and reinforcement learning (RL), which lays the foundation for Chapter 4. In Chapter 3, we describe our proposed F D A technique which can efficiently use application layer information to 3 Chapter 1. Introduction improve performance. The mathematical formulation for optimal video scheduler, as well as our proposed scheduling algorithm for video traffic, are presented in Chapter 4. The conclusions in Chapter 5 outline the main contributions of the thesis and explain possible future works in the area. 4 Chapter 2 Background and Related Work In recent years, many researchers have addressed the scheduling and QoS support issues in high-rate WPANs, which are not specified in the standard for M A C [5]. The different types of physical layer for U W B transmission, as well as attention to different details by which performance improvements can be achieved, have resulted in a rich literature in this area. In this chapter, we first give an overview of the types of U W B physical layer, followed by a brief description of the I E E E 802.15.3 standard for M A C . Since real-time traffic is envisioned to constitute a major traffic load in WPANs, challenges to provide QoS to this traffic type are also addressed in this chapter. After that, we summarize the key proposals for M A C scheduling algorithms for WPANs. Since we use Markov decision process (MDP) and reinforcement learning (RL) in Chapter 4, we introduce the basics of M D P and R L in this chapter as well. 2.1 Physical Layer U W B signal is defined as a signal having a —10 dB fractional bandwidth greater than 20%, or occupying at least 500 MHz of bandwidth. This section introduces the main 5 Chapter 2. Background and Related Work types of U W B physical layer. The first type is impulse-based U W B which can further be classified as time-hopping (TH)-UWB and direct sequence (DS)-UWB. The former is used for low-rate WPANs, while the latter is suitable for high-rate WPANs. The second type of U W B physical layer, which is a contender of DS-UWB for physical layer of high-rate WPANs, is multi-band orthogonal frequency-division multiplexing (MB-OFDM). We briefly describe these types in the following subsections. 2.1.1 Impulse-based U W B Impulse-based U W B systems utilize pulses with duration ranging from a few tens of picoseconds up to a few nanoseconds, which have frequency components from near di-rect current (DC) to a few giga Hertz [8]. The typical waveforms used in impulse-based U W B are Gaussian monocycle (the first derivative of Gaussian pulse with the form aiexp(—i 2 /r 2 ) , where t is the time, r determines the pulse duration and a is the amplitude coefficient) and its derivatives. Standard modulation schemes such as pulse amplitude, position or phase modulation can be used. The two types of impulse-based U W B systems are T H - U W B and DS-UWB. Using the notations in [8], a user of DS-UWB sends a bit in an interval with duration Tj. Within this period, each user uses a pseudo random spreading code with length N„ and chip duration Tc to spread its data. Thus, the data rate for each user is 1/Tf. On the other hand, a user of T H - U W B spreads one bit to Ns intervals each with duration Tf. The pseudo random time-hopping code assigned to the user determines the pulse position 6 Chapter 2. Background and Related Work A B A B A B A B • time Figure 2.1: Multiple access using time-hopping with Ns = 4 and Nh = Tf/Tc = 8. Users A and B use 0, 3, 1, 2 and 3, 7, 5, 4 T H codes, respectively. within each interval. Therefore, the data rate for each user is l/NsTf. The use of pseudo random codes can be used for multiple access. Figure 2.1 illustrates this for T H - U W B . Impulse-based U W B system has two main advantages. First, it is robust to multipath propagation as most of the multipaths are resolvable. Second, pulses with short dura-tion can give accurate timing information that can be used for ranging and positioning [9]. However, very short duration of pulses requires very long acquisition time; thus, re-ducing synchronization overhead becomes a challenge. Besides, suppressing narrowband interference requires notch filters, which cause additional complexity to compensate the distortion of the pulses caused by the notch filters [10]. 2.1.2 Mult iband O F D M ( M B - O F D M ) As its name suggests, M B - O F D M divides the available spectrum into 14 sub-bands with 528 MHz bandwidth, and uses O F D M modulation within each sub-band. The spectrum is divided into five band groups. Each of the first four band groups has three sub-bands, whereas the fifth one has only two sub-bands (see Figure 2.2). The M B - O F D M system utilizes a time-frequency code (TFC) to interleave coded data over sub-bands of each 7 Chapter 2. Background and Related Work band group. As shown in Table 2.1, there are four TFCs available to the 3-band groups, and two T F C s available to the only 2-band group. Together, these band groups and the TFCs define eighteen separate logical channels that can be used by independent piconets. TFCs are designed to have good collision properties for all possible asynchronous shifts among the simultaneously operating piconets. All devices should be able to operate in the band group #1, whereas support for the other band groups is optional and can be added in the future [4]. Since U W B regulations may vary worldwide, the M B - O F D M approach gives a high regulatory flexibility for worldwide operation by simply turning different bands on or off. Furthermore, M B - O F D M is flexible in coexisting with other systems in uncoordinated environments, by selectively using available bands according to the in-band interference level. For instance, U W B systems are likely to receive interference from I E E E 802.11a operating at 5 GHz. This interference can be reduced by not using the sub-bands overlap-ping with the operating spectrum of I E E E 802.11a. In addition, it can efficiently capture multipath energy in a single radio frequency (RF) chain [11]. However, the M B - O F D M transmitter is slightly complex because it uses an inverse fast Fourier transform (IFFT) [12]. Band Group #1 Band Group #2 Band Group #3 Band Group #4 Band Group #5 #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12 #13 s—\' #14 | 3432 3960 4488 5016 5544 6072 6600 7128 7656 8184 8712 9240 9768 10296 Frequency (MHz) Figure 2.2: Diagram of the sub-bands and band groups of M B - O F D M [4]. 8 Chapter 2. Background and Related Work T F C Number Length 6 T F C 1 1 2 3 1 2 3 2 1 3 2 1 3 2 3 1 1 2 2 3 3 4 1 1 3 3 2 2 5 1 2 1 2 1 2 6 1 1 1 2 2 2 Table 2.1: Time Frequency Codes (TFC) used in M B - O F D M . The first four TFCs are used for the 3-band groups, and the last two TFCs are for the 2-band group. 2.2 IEEE 802.15.3 Standard for M A C Personal operating space is a space around a person or an object with a typical radius of up to 10 m [5]. WPANs are used to convey information within personal operating space. As a result, unlike wireless local area networks (WLANs), link robustness at long range is not a primary concern in the development of W P A N standards. This allows the standard to primarily focus on issues such as cost, size, power consumption and data rate. The I E E E 802.15.3 Task Group has designed the M A C standard for WPANs, which aims to provide low cost, low power consumption, and high data rate within the personal operating space. In the following subsections, we describe the main features of the standard based on [5]. 9 Chapter 2. Background and Related Work 2.2.1 Communication Structure According to [5], a piconet is defined as a "wireless ad hoc data communications sys-tem which allows a number of independent data devices (DEVs ) to communicate wi th each other." The size of a piconet is generally confined to the personal operating space which is much smaller than the coverage area of local or metropolitan area networks, and hence the name. Figure 2.3 shows the different elements (DEVs and P N C ) of an 802.15.3 piconet, hereafter briefly referred to as a piconet. A s i l lustrated, communication wi th in a piconet is based on central coordination and peer-to-peer data transfer. One D E V called the piconet coordinator ( PNC ) is responsible for providing basic t iming, per-forming scheduling, managing QoS, and controling access to the piconet. M A C overhead, including headers and acknowledgement frames, is normally transmitted at a lower rate, called the base rate, to guarantee reliable detection at the receiver. 2.2.2 Piconet Initiation, Maintenance and Termination Since a piconet is a k ind of ad hoc network, and thus infrastructure-less, a procedure is required to create a piconet. In this procedure, a D E V , which is capable of performing the roles of P N C , scans the available channel. The number of available channels depends on the underlying physical layer and is specified in the standard. If the P N C finds an unused channel, it starts sending beacons in that channel. Otherwise, it can associate wi th an existing piconet. This starting procedure does not guarantee that the most capable D E V wi l l be the P N C . Therefore, a P N C handover procedure is also defined which is described 10 Chapter 2. Background and Related Work DEV Figure 2.3: Elements of I E E E 802.15.3 piconet later in this section. A D E V can join or leave a piconet by using associate or disassociate procedures, respectively. After association, D E V is provided with a unique one-octet ID, called DEVID. This DEVID is used instead of DEV's 8-octet address in order to reduce the overhead. This imposes a limit on the maximum number of DEVs in a piconet. When a new D E V associates with the piconet, the P N C checks its capabilities. If it is more capable than the P N C itself and the current security policy also allows handover, the P N C can handover the control of the piconet to that D E V . Several criteria, such as being able to support security, are considered to determine the capabilities. When a D E V disassociates from the piconet, its DEVID is no longer valid. The P N C does not re-assign it until expiration of a waiting period. When the P N C wants to shutdown, it handovers the control to another PNC-capable D E V in the piconet; otherwise, it sends a shutdown 11 Chapter 2. Background and Related Work information element (IE) to inform the members of the piconet. If the PNC abruptly leaves, the piconet terminates its operation for the duration of association timeout period (ATP). After expiration of ATP, another PNC-capable DEV can start a piconet using the procedure described earlier in this section. 2.2.3 The Superframe Structure The timing in the piconet is based on the superframe, which consists of three parts (see Figure 2.4) as follows: 1. Beacon - It announces timing allocations and is also used to communicate manage-ment information for the piconet. It contains piconet synchronization parameters, such as superframe duration and contention access period end time. It may also include several IEs, such as channel time allocation, DEV association, PNC shut-down, or PNC handover. 2. Contention Access Period (CAP) - The presence of this part is optional. It is used to communicate commands and asynchronous data. It is stated in the beacon the types of command and data that are allowed to be transmitted during CAP. Some examples of command are association request /response, PNC handover re-quest/response, and channel time request/response. The access method within CAP is carrier sense multiple access with collision avoidance (CSMA/CA) with binary exponential backoff. The length of CAP is determined by the PNC and may vary from one superframe to another. 12 Chapter 2. Background and Related Work Superframe #m-1 Superframe #m Superframe #m+1 Beacon #m CAP CTAP MCTA 1 MCTA .2 CTA 1 CTA 2 • • • CTA n Figure 2.4: Superframe structure. The presence of C A P is optional. The number, duration and order of the CTAs and M C T A s can vary from one superframe to another. 3. Channel Time Allocation Period (CTAP) - This period is composed of channel time allocations (CTAs), and may include management CTAs (MCTAs). When a D E V is assigned a C T A , it can use it for commands, isochronous streams, and asynchronous data transfer. Channel access during C T A P is time division multiple access (TDMA) and there is no contention within this time. At the beginning of the superframe, the P N C announces the start time and length of each C T A as well as the DEVs that are allowed to use it (typically, source and destination of a flow, which are referred to as SrcID and DestlD. In the case of broadcast or multicast, more DEVs are involved.) This will allow DEVs to operate in normal power con-sumption mode only during their C T A and switch to low power consumption mode throughout the rest of CTAP. This scheme also allows the P N C to support QoS. If either the source or destination of a C T A is the PNC, that C T A is called M C T A . The P N C has the option of using M C T A s instead of C A P for sending or receiving command frames. Like other CTAs, the number and position of M C T A s may vary from superframe to superframe and the appropriate number is determined by the 13 Chapter 2. Background and Related Work PNC. M C T A s can be assigned to a specific source or destination, and function as direct uplink or downlink, respectively. For example, in the superframe following the successful association of a new D E V , the P N C can allocate an M C T A to that D E V . Thus, it is guaranteed that the newly associated D E V can communicate its traffic needs with the P N C as soon as possible, which in turn, results in fast con-nection to the piconet. It is also possible that an M C T A be shared among several DEVs, as in open and associate MCTAs. An M C T A is open if its SrcID is the broadcast identifier (BcstID). Therefore, all associated DEVs can send command frames to the P N C during open M C T A . On the other hand, the SrcID of associate M C T A is UnassocID, which means that all DEVs that are not currently associated in the piconet can attempt to send association request to the P N C in order to become a member of the piconet. Channel access during shared M C T A is slotted A L O H A . 2.2.4 Data Communication Between DEVs As mentioned in Section 2.2.1, all data exchange is peer-to-peer. Furthermore, C A P can be used to send asynchronous data, and C T A P is used for both isochronous streams and asynchronous data transfer. Isochronous streams require channel time on a regular basis, whereas in the case of asynchronous data transfer, only the total amount of required channel time is important. The P N C can split this amount among several superframes in any manner, as long as they sum up to the requested amount. According to its current 14 Chapter 2. Background and Related Work (a) no-ACK 2 2 2 Frame 1 T I w Frame 2 T I W o o o Frame n T I C/J (b) Imm-ACK Frame 1 SIFS ACK SIFS Frame 2 SIFS ACK SIFS o o o Frame n SIFS ACK SIFS (c) Dly-ACK with busrt size n 2 2 Frame 1 T I W Frame 2 T I </> O O O Frame n SIFS ACK SIFS Figure 2.5: A C K policies denned in I E E E 802.15.3 Standard for M A C : (a) no-ACK, (b) Imm-ACK, (c) Dly-ACK. needs, each D E V can send a C T A request to the P N C to modify the length or number of its allocated CTAs. However, the P N C may or may not be able to fulfill all the requests of the DEVs. In any case, the P N C will notify the D E V of its decision, through the channel time response command. Three acknowledgement (ACK) policies are defined for M A C (see Figure 2.5): no-A C K , immediate A C K (Imm-ACK) and delayed A C K (Dly-ACK). The No-ACK policy is used for applications that do not require guaranteed delivery, and two successive frames are separated by minimum interframe space (MIFS). On the other hand, when Imm-A C K is used, every frame should be acknowledged upon reception. The short interframe space (SIFS) separates the transmitted frame and A C K . Dly-ACK is a hybrid of the two previous policies. In this mechanism, the receiver groups the A C K s of a burst of frames and sends them all to the source in one A C K frame. Adjusting the burst size can balance between overhead and transmission reliability. 15 Chapter 2. Background and Related Work 2.3 QoS Issues Providing guaranteed QoS is a crucial issue in the design of networks that are aimed to serve different types of traffic including multimedia and real-time traffic. For the QoS traffic classes, networks should be able to, deterministically or statistically, satisfy some performance criteria, such as minimum bandwidth allocation or bounded transmission delay. QoS management methods include resource reservation, admission control, QoS routing, and packet scheduling, many of which are related to M A C protocol [13]; hence, M A C design is of vital importance to provide QoS. Multimedia traffic, and more specifically video traffic, requires QoS support in high-rate WPANs. Besides typical challenges to meet QoS requirements of multimedia traffic, such as strict delay and loss bounds, video stream introduces other complexities. It has large peak-to-average ratio of the frame sizes and hierarchical structure with dependency among its frames [3]. Consequently, it is essential to study the performance of proposed scheduling algorithms for I E E E 802.15.3 under video flows. In the following subsections, we first describe different video frame types and hierar-chical structure of video streams. We then introduce the performance metrics that will be used in Chapters 3 and 4. 2.3.1 Characteristics of Video Streams A typical video encoder generates a sequence of three types of compact frames: intra-coded (7), predictive (P), and bidirectional (B) frames [14]. Because of the different 16 Chapter 2. Background and Related Work N = 9 F i g u r e 2 .6: G O P structure for (TV—9,M=3). The arrows show the decoding dependen-cies. compression schemes used to encode the different frame types, / frames tend to be larger (less compressed) than P and B frames, and P frames tend to be larger than B frames. Video encoders generate the three frame types according to a predefined pattern called group of pictures (GOP). This pattern is characterized by two parameters (N, M), where N is the I-to-I frame distance, and M is the I-to-P frame distance [14]. This pattern is generally fixed for a given video sequence, and N is a multiple of M. Figure 2.6 illustrates the hierarchical structure of video streams, as well as decoding dependencies among the frames. In order for a frame to be decodable at the receiver, all other frames that it depends on, including the frame itself, must be available at the receiver. A frame is undecodable if it is directly or indirectly lost [15]. Direct loss happens when the receiver does not receive the frame completely, either because of channel error or deadline expiration. A frame is indirectly lost when some frames that it depends on are directly 17 Chapter 2. Background and Related Work Frame type Number of frames depending on it I N + (M-1) Pki k = 1, . . . , — 1 N-l-(k-l)M B 1 Table 2.2: Number of frame dependencies for each video frame lost. As an example, in Figure 2.6, all the frames in a G O P depend on the only I frame in that GOP; therefore, if that I frame does not meet its deadline, the whole G O P is considered undecodable (direct loss of the I frame and hence, indirect loss of the rest of the frames). For instance, with iV = 12 and frame rate of 30 frames per second (fps), this would result in impaired quality for about 360 ms, which is fairly obvious to a user. Similarly, loss of P frames can also degrade the quality of video streams. Table 2.2 shows the number of frames depending on each frame inside a GOP. The larger the number of dependencies on a frame, the more important that frame is for decoding process at the receiver. Consequently, video frame types can be ranked, from the most important to the least important, as I, P, and B. 2.3.2 Performance Metrics Video frames, like other real-time traffic, are delay sensitive. The video frames will be dropped at the receiver if their delay exceeds the maximum tolerable delay. This is the base of job failure rate (JFR) criterion for evaluating performance of schedulers in the M A C layer. For a delay sensitive flow, J F R is defined as the fraction of frames that 18 Chapter 2. Background and Related Work do not meet their transmission deadlines, and are hence useless and get dropped. An important consequence of hierarchical structure of video stream is that JFR, by itself, cannot accurately reflect the QoS given to a video flow at the M A C layer. In other words, low J F R does not necessarily indicate high QoS. As a result, another performance metric is required for comparing the performance of different schedulers. We use the decoding failure rate (DFR) criterion, which takes frame dependencies into account. It is defined as the ratio of the total number of undecodable frames to the total number of frames [15]. D F R can be viewed as an objective measure of user-perceived degradation of quality, and is the main performance metric in this thesis. 2.4 Related Work on MAC Enhancement This section gives an overview of the related work on scheduling algorithms for M A C in WPANs. In Section 2.4.1, we introduce the proposals that incorporate the unique properties of impulse-based U W B physical layer, especially the possibility of simultaneous transmissions. In Section 2.4.2, we summarize those proposals that are applicable to various types of U W B physical layer. 2.4.1 Scheduling Proposals for Impulse-based UWB Both DS-UWB and T H - U W B have the potential to support concurrent transmissions, by using time hopping and spreading codes, respectively. Many researchers have tried to take advantage of this property and improve the efficiency of resource allocation. The 19 Chapter 2. Background and Related Work scheduling scheme proposed in [16] chooses the appropriate scheduling method out of single-link transmission policy and concurrent-links transmission policy, based on the power level of the active links. Since transmission power in U W B networks is very low, two links with a large separation in space will cause negligible interference to each other and thus can be active at the same time. To clarify such large space separation, a concept, called exclusive region, is defined in [17] and [18], which is used to choose between concurrent and single transmission policies. Another approach is the resource allocation algorithm proposed in [19]. It uses /^-coloring problem from graph theory to perform scheduling. This problem is that, given a graph G with V vertices and E edges, find the minimum number of colors for coloring the vertices in such a way that no adjacent vertices (i.e., connected with an edge) have the same color. Links, time slots, and the number of time slots in a superframe can be mapped to the vertices, colors, and the number of colors in the ^-coloring problem. The edges also represent heavy interference between two links. The output of this algorithm is the list of links that are allowed to transmit in each of the time slots. If more than one link is scheduled in a time slot, they will transmit concurrently. Shen et al. used a different approach in [20]. They define a weight for each link that is a measure of potential interference on that link. In other words, the larger the weight is, the more the links that can interfere with that link will be. These weights are then used by an algorithm called maximal weighted matching to group flows into concurrent transmission groups. They also consider the fact that not all the flows that are scheduled to transmit concurrently during a time slot use the whole 20 Chapter 2. Background and Related Work time slot. By taking the duration of each transmission into account, they reuse these idle times to further enhance the throughput and channel reuse. Concurrent transmission causes multi-user interference (MUI), which can limit the capacity of the system. Similar to other spread spectrum systems such as C D M A , power allocation can be used in impulse-based U W B systems to combat MUI. However, in U W B systems, under some conditions, power control is not an efficient way to guarantee the required signal to interference and noise ratio (SINR) at the receiver [11]. An effective way to compensate for the shortcomings of power control is rate control. There are two main ways to adjust the rate of a node in an impulse-based U W B system. The first method is to use adaptive channel coding [17], where the code rate is adapted to the channel condition. Based on the feedback information that the sender obtains from the receiver, the sender uses a lower (higher) rate channel code if the channel becomes worse (better). An efficient way of implementing this adaptive channel coding is to use codes that provide incremental redundancy. In these codes, a high-rate code is a subset of lower rate codes. Therefore, additional protection can be provided to high-rate codes by only transmitting the extra redundancy and not the whole data encoded with the stronger code. The second method is to alter T H sequence parameters, i.e. number of pulses for each bit (Ns), maximum time hopping shift (Nh), and time hopping unit (Tc) [19]. This is a unique control method inherent in impulse-based U W B networks. Combined with rate allocation, the power allocation problem can be formulated as a joint optimization problem so as to minimize the total power consumption [19] or 21 Chapter 2. Background and Related Work maximize the total throughput of the system [16]. Bit-level QoS constraints can also be included in the optimization problem to guarantee QoS. This QoS is expressed in terms of an upper bound on the bit error rate (BER), which can be translated into SINR threshold. The powers and rates should be allocated in such a way that the SINR of all the links be above the threshold. Similarly, rate guarantee can also be implemented to provide QoS based on rate requirements [21]. However, the optimal scheduling problem for peer-to-peer concurrent transmissions is NP-hard [22]. Since P N C itself is an ordinary device in terms of computational capacity, optimal scheduling would be a significant burden on it. To reduce the computation load, Liu et al. proposed two simple heuristic scheduling algorithms with polynomial time complexity [22]. In [23], Jiang et al. tried to solve this problem with an effective distributed resource allocation. Using this method, the burden of resource allocation is distributed among all the nodes. They have also proposed a novel message exchange procedure to convey control messages required for the distributed algorithm. 2.4.2 Scheduling Proposals for IEEE 802.15.3 Standard for M A C The proposed channel time allocation algorithms for piconet generally consist of two parts: signaling and scheduling. As the central coordinator, P N C is responsible for scheduling. Depending on the scheduling algorithm, P N C requires certain information such as the number of flows, their reserved rates, their queues' status, type and deadline 22 Chapter 2. Background and Related Work of the frames in their queues, in order to allocate CTAs to different flows. Therefore, a signaling scheme is necessary so that DEVs can inform P N C of the required information. As specified in the standard, DEVs can use C A P for this purpose; however, this scheme has limitation in providing QoS for delay sensitive traffic. The devised frame format is also limited in the diversity of information that can be conveyed to the PNC. Hence, many proposed channel time allocation algorithms propose their own signalling scheme, so that the scheduler receives all the necessary information in a timely manner. In this subsection, we focus on the scheduling and signaling schemes for piconet. General Schemes It has been shown that the Shortest Remaining Processing Time (SRPT) scheduling algorithm minimizes the number of pending jobs in the system, and thus minimizes the average waiting time of the jobs [24]. It schedules different jobs in the system in the order of their remaining processing time, from the shortest to the longest. In the preemptive case, SRPT switches to a newly arrived job if the processing time of that job is shorter than the job currently being served. (In an 802.15.3 piconet, the C T A requirements of a flow can be regarded as a job.) In systems with heterogeneous flows that have different average data rates, SRPT naturally favors the flows with smaller average data rate since they tend to have smaller frames. Motivated by merits of SRPT, Mangharam et al. proposed a solution to this natural unfairness of SRPT. In [1], they proposed a slight variation of SRPT, which 23 Chapter 2. Background and Related Work maintains fairness among flows with different data rates. This scheduling algorithm, called Fair-SRPT (F-SRPT), first normalizes the size of the frames belonging to a flow to the mean rate of that flow. This normalized size is then used by SRPT to sort the frames. As a result of the normalization in the first step, flows with smaller mean data rate will not dominate channel access. Hence, network resources are allocated in a fair manner. The channel time allocation algorithm proposed in [1] guarantees that all the flows receive at least as much as their reserved amount of C T A . If a flow does not require all of its reserved CTAs (i.e., under-loaded), the excess is added to the idle channel time. This idle channel time is shared among the overloaded flows, which require more C T A than their reservation. It is highlighted in [1] that the subtle variations in allocation of idle channel time to overloaded flows can significantly affect the overall scheduling per-formance. Their simulation results show that using F-SRPT gives a lower J F R compared to the I E E E 802.15.3 standard. The F-SRPT scheduling algorithm proposed in [1] includes a signaling scheme, called the one-byte solution. As the name suggests, a' field with one byte is added to the M A C header. A D E V can use this field to inform the P N C of its current queue size. Despite its simplicity, this approach has a few drawbacks. First, this scheme works best when P N C is the destination of all the flows in the system, so that the P N C would be able to extract the extra one-byte from the M A C header and use it for scheduling. This is generally not the case, since P N C is not responsible for packet forwarding. Second, 24 Chapter 2. Background and Related Work U fe fe U fe fe fe < f i l l < ' 1 B B e a CTA of e a CTA of flow / c 0 flow /' c 0 n n Superframe n Superframe n+1 Figure 2.7: Flaws of one-byte solution for updating the status of the queues at the P N C [1]. In superframe n from time t\ to t2 is allocated to flow i. Using the one-byte approach, P N C has the queue status of flow i up to time f2- At time t3 and £ 4 , new data arrive for flow i and change its queue status. Therefore, P N C schedules the flows in superframe (n + 1) at time £ 5 based on incorrect information about flow i. Flow i cannot update its queue status sooner than 7 j 6 . the last opportunity for a D E V to update the P N C about its queue status is the end of its allocated C T A period (see Figure 2.7). Therefore, if new data frames arrive for one of the D E V s flows and change its queue status, it cannot update P N C sooner than its allocated C T A within the next superframe. As a result, P N C schedules the flows based on inaccurate information about the status of the D E V s queues. Lastly, most of the one-byte-information units sent to the P N C are not necessary and are not used by the scheduler. As far as the scheduling algorithm is concerned, the only useful piece of information is the most recent one, which is sent to the P N C in the header of the last M A C frame. In [25], Liu et al. proposed a signaling scheme to mitigate the flaws of the one-byte solution. It allocates one M C T A to each flow at the end of the superframe. This M C T A is used to send control information to the PNC. The main advantage of this scheme is 25 Chapter 2. Background and Related Work that it does not alter the I E E E 802.15.3 standard. Furthermore, since the M C T A s are at the end of the superframe, more recent information is given to the PNC. This scheme reduces the average queueing delay of the system. The optimality of SRPT, in that it minimizes the average waiting time of the jobs in the system, holds only when the jobs have no delay constraint. Thus, in the systems with delay sensitive traffic, SRPT need not be the best scheduler, and using deadline information can yield better performance. One scheduler that considers the deadline of the frames is the earliest due date (EDD). It schedules the frames in the order of their deadlines, from the earliest on. It is proven that E D D policy minimizes the probability of packet dropping in the system due to delay violation [26]. However, in wireless systems like WPANs, where the P N C should be informed of the internal state of the system queues, it is not practical to convey the exact deadline. This information is represented in a quantized form, usually with respect to the superframe size. For instance, a D E V may inform the P N C that the frame in its queue will expire after j full superframes. As a result of quantization, many frames can have equal deadlines, hence a tie-breaking method is needed. In [2], Torok et al. used E D D along with SRPT as the tie-breaking method. They mainly studied the performance of this scheduler under real-time traffic with variable frame sizes. They modeled this traffic with batch arrival process. Each batch is the M A C fragments of a frame and all have the same deadline. In addition, the inter-arrival time of the batches is equal to the inter-arrival time of the frames and can be constant 26 Chapter 2. Background and Related Work or variable. For each flow, P N C has the information about the size and deadline of its first (head-of-the-queue) frame, as well as the total size of its queue. E D D + S R P T scheduler first selects the flows based on the deadline of the first frame in their queue (i.e., according to EDD). If the first frame of several flows have equal deadline, then they are scheduled in ascending order of their frame size (i.e., according to SRPT). After serving the frames closest to expiration, the remaining channel time, if any, is allocated to the flows according to their total queue size using SRPT algorithm. A new signaling scheme is also presented in [2]. When control information is piggyback onto the data, it requires a complementary method to signal new arrival during the off period of data. Reserving fields in the M A C header for piggyback control information also incurs excessive overhead. The signaling method in [2] provides a solution. In this scheme, when a D E V finishes the transmission of its burst, a time slot is assigned to the D E V for sending its control packet. If the D E V has no data to send, the P N C allocates one time slot to the D E V in each superframe so that it can signal new arrival. Reference [2] also describes burst eligibility method which improves the scheduling efficiency. Channel time wastage can be caused by partial transmission of bursts. When a burst expires before being completely transmitted, the transmitted portion is useless for the receiver and wastes channel time. Using burst eligibility scheme, the P N C determines if a burst can be fully transmitted by its deadline. If this is not possible, the burst is not scheduled at all. Thus, channel time can be utilized more efficiently. Channel time wastage can also be caused by incomplete usage of CTAs. Allocating 27 Chapter 2. Background and Related Work the exact instantaneous required C T A to the flows in the system incurs a lot of overhead caused by the control messages that convey the status of the queues. Furthermore, the variable size of the frames makes it even more difficult to allocate the exact required C T A . Also, if a D E V does not receive a beacon correctly (because of channel error), it cannot use the CTAs allocated to it, and those CTAs would be unused. An idle timeslot reuse scheme is proposed in [27] to increase channel utilization. This scheme consists of three phases: detection, cancelation and reuse. The detection of idle C T A can be done in two ways. First, the P N C can determine if a C T A is idle using clear channel assessment (CCA) procedure. Second, when a D E V partially uses its C T A , it may send a cancellation request to the P N C and let the P N C know that it is not using the rest of its C T A . After detecting the idle C T A , the P N C broadcasts a cancellation message. After that, all the DEVs can contend to use the idle part of the C T A . The result is higher channel utilization and lower average queueing delay. According to the current I E E E 802.15.3 standard, once a C T A request is received from a D E V , the P N C shall remember that as the outstanding request for that stream for every superframe until another C T A request for that stream is received from the D E V . Since C T A P is shared between different priority classes, it can fill up very fast, and new connections will be rejected regardless of their priority class. This can cause undesirably long waiting time and poor QoS for high priority streams. A QoS degradation policy is proposed in [28] to mitigate this problem. It is suggested that the lower priority traffic lend some channel time to the newly arrived traffic with higher priority so that they 28 Chapter 2. Background and Related Work do not get rejected. An M/M/c/oo queueing model is used to analyze the impact of changing c (number of available CTAs) on the average waiting time. By getting some share of low priority traffic in favor of the higher priority class, some CTAs are released (i.e., c increases), and consequently the average waiting time reduces. As a result of reducing the average waiting time, more DEVs get the chance to use C T A P . This causes more C T A requests and other commands to be transmitted during CAP, and the limited capacity of C A P may partly cancel out the benefit of the proposed scheduling algorithm for CTAP. In order to increase the number of successful requests (or other commands in general) during CAP, the overhead should be decreased. The command frames have very long headers and preamble, compared to their actual pay load. The command-aggregation scheme proposed in [28] divides C A P into two parts: CAP1 and CAP2. During CAP1, each D E V aggregates all of its commands in a single command and sends it to the P N C within one access to the channel. During CAP2, the P N C aggregates the responses to all the commands, again in a single message. Aggregating the commands significantly reduces the overhead of headers and preamble. It also reduces the number of contentions within CAP. Thus, the collision probability is decreased. The result is the overhead reduction and capacity improvement of the C A P period, so that the enhancement in the C T A P can be fully utilized. 29 Chapter 2. Background and Related Work Application-aware Schemes In this section, we summarize the scheduling proposals that are specifically designed to provide QoS to video streams. They use application layer information for scheduling video frames, namely the type of the frames. Kim and Cho suggested that each video frame type be scheduled with a pre-assigned priority (PAP) based on its importance, i.e., in / , P, and B order [3]. E D D and SRPT are used to break ties among frames of the same type, when they cannot be served together in one superframe. They also proposed using mini packets with short duration for signaling purposes. This method divides the C T A P part of the superframe into two parts. The first part is C T A and is used for transmission of real-time traffic. The second part is Feedback CTAs (FCTAs). The P N C allocates this part for transmission of mini packets. There are two classes of information contained in the mini packets. The first class is used for C T A allocation to video flows in the system. This includes data rate, A C K policy and fragmentation threshold between source and destination of the flow, as well as video frame type and queue status information of the source of the flow. These pieces of information are contained in the mini packet only when necessary (i.e., when their value changes). The second class of information is used for deciding the allocation of F C T A for mini packets. In order not to waste channel time with unused F C T A , it is only allocated to the DEVs that need to send mini packet to the P N C (i.e., when a new video frame arrives at the DEV) . Therefore, this class of information includes the time of the 30 Chapter 2. Background and Related Work next arrival. By using that information, the P N C can allocate FCTAs only to the flows that need it. After receiving all the mini packets from the DEVs, the P N C performs scheduling based on the received information. In the scheme proposed in [29], each D E V informs the P N C about the maximum size of its 7, P and B frames, as well as its G O P pattern. Thus, the P N C can determine the type of each frame, and allocates channel time to it as much as its maximum size. Since even one frame type of video stream has high maximum to mean size ratio, this scheme suffers from low channel utilization. Energy efficiency is considered by the scheduler proposed in [30]. It defines different service categories in order to balance between energy efficiency and QoS. It also uses the application layer information to assign priorities to the buffered frames at the source of a flow. Furthermore, a simple flow admission control is proposed to admit new flows of different service categories to the system. Other Schemes Choosing a proper A C K policy for data communication is not addressed in the I E E E 802.15.3 standard for M A C . In [31], Xiao et al. studied optimal A C K policies under U W B channel error condition. For a given A C K policy and error channel condition, they analytically determined the optimal payload size, which maximizes the throughput. Since C T A P and C A P have different channel access mechanism, the optimal A C K policies are studied separately for each case. In C T A P , the analytical model gives the optimal 31 Chapter 2. Background and Related Work payload size for each A C K policy, given a fixed (allocated) C T A . In CAP, however, a model similar to Bianchi's model [32] is used to calculate C A P throughput for each A C K policy. The optimal payload size is calculated by maximizing the throughput. The proper burst size of Dly-ACK policy is also not covered in the standard. The overhead of Imm-ACK policy is very high because A C K frame is sent at the base rate, which can be much less than the data rate. Furthermore, the interframe spacing of Imm-ACK (SIFS) is longer that that of Dly-ACK (MIFS). Therefore, using Dly-ACK can potentially reduce the overhead caused by A C K frames. Liu et al. [33] present an adaptive delay acknowledgement algorithm for video traffic. It dynamically adjusts the burst size of the Dly-ACK policy in response to the channel quality. The results show that this scheme gives a higher throughput and lower JFR, compared to Dly-ACK with fixed burst size or Imm-ACK. 2.5 Markov Decision Process (MDP) In real life, as well as laboratory and simulation settings, there are many cases where one should make decisions while accounting for the relationship between present and future decisions. In other words, decisions are not made in isolation. Such decision making problems can formally be analyzed using the model for probabilistic sequential decision making under uncertainty. In this model, a decision maker or an agent observes the system state at any decision making epoch, based on which it chooses an action. As a result of the chosen action, the agent receives (incurs) an immediate reward (cost) 32 Chapter 2. Background and Related Work and the system evolves to a new state according to a probability distribution. Both the reward and the transition probabilities depend on the state and the chosen action. At the next decision epoch, the agent faces a similar decision making problem. However, the system may be in a different state and there may be a different set of actions to choose from [34]. At any decision epoch, the action is chosen according to a decision rule, which may depend on the current state alone, or on all previous states and actions. A policy provides the agent with a set of decision rules so that the agent can choose an action in any possible system state. In other words, the agent uses the policy as a prescription to choose an action in any situation. When an agent implements a policy, it receives a sequence of rewards over time. The sequential decision problem is: given the system information (i.e., the set of decision epochs, states, actions, rewards and transition probabilities), what the optimal policy is with respect to a predetermined performance criterion. This criterion can be a function of the reward sequence yielded by the policy. Possible choices for this function include the expected total discounted reward or the long-run average reward [34]. Markov decision process (MDP) is a special case of sequential decision making model, where the set of available actions, the rewards, and the transition probabilities depend only on the current state and action, and not on the visited states and chosen actions in the past. Any decision making problem modeled by M D P has the following elements: decision epochs, system states, available actions, transition probabilities, and immediate 33 Chapter 2. Background and Related Work rewards. Decision epochs are the points of time at which the agent makes decisions. The set of decision epochs can be either a discrete set or a continuum. It can also be classified into finite or infinite set, based on its size. The MDPs based on the former are finite horizon, as opposed to infinite horizon MDPs that are based on the latter. In this chapter, we consider the discrete set of decision epochs for infinite horizon MDPs. Thus, the set of decision epochs is the set of integers. At each decision epoch, the system is in a state. This state describes the status of the system at that epoch. We denote the set of possible states by S. Hence, the underlying Markov chain of an M D P with discrete epochs set is given by X = {Xn : Xn € S, n = 0, 1, ...}, where Xn denotes the system state at the nth decision epoch. Depending on the system state s 6 S at any decision epoch, there is a set of available actions As that the agent can choose from. When the agent chooses an action a € As, the system state evolves according to the transition probability p(s'\s,a), which represents the probability of moving from state s to state s' under action a in one step. Furthermore, the agent receives an immediate reward r(s, a). This reward function is defined as r: S x A —> It, where R denotes the set of real numbers, and r(s, a) is the expected reward for taking action a in state s. That is, r(s, a) — ^2sieS r(s, a, s')p(s'\s, a), where r(s, a, s') is the immediate reward in transition from state s to state s' under the action a. Sequential decision problem consists of stages of decision making and receiving reward. This allows the use of of dynamic programming, where the whole multi-stage problem 34 Chapter 2. Background and Related Work of finding the optimal value can be reduced to a sequence of simpler inductively defined single-stage "choosing action and receiving reward" problems. When the decision making agent follows policy it up to the decision epoch T , it yields a bivariate discrete-time reward process given by [34] {(Xn,r(XniYn))\Xn€S,YneAXn,n = 0, 1, . . . , T } . The first component is the state of the system at decision epoch n which was described before. The second component is the reward received when action Yn is chosen in state Xn according to policy it. The average reward or gain of policy it is defined to be p ' ( S )4 r l im i £ 7 r | ^ r ( X n , y n ) | ^ n=l ' where E*{-} is the expectation over all the possible state-action sequences beginning with s that the policy can yield: {(Xi,Yi,X2,Y2, • • •) \Xi = s}. Dynamic programming (DP) methods can find a stationary deterministic policy it* (which is a mapping it*: S —> A), that is optimal with respect to the average reward criterion (i.e., optimal gain policy). Value iteration and modified value iteration [34] are examples of DP technique. 2.6 Reinforcement Learning (RL) In many real-life sequential decision making problems, DP techniques are not practi-cal because of two main reasons. First, the agent may not have complete knowledge about dynamics of the system or environment. Second, when the system state and/or action space is huge (e.g., continuous state spaces), it is not feasible to exploit DP. The 35 Chapter 2. Background and Related Work former problem is called curse of modeling, while the latter is called curse of dimension-ality. Reinforcement learning (RL) combines DP methods with other methods including stochastic approximation and function approximation to tackle these problems [35]. RL is the problem faced by a decision making agent that must learn from its own experience of interacting with a dynamic environment. The agent should learn a policy (i.e., how to map situations to actions) through trial and error. The agent is not provided with the policy, but instead it must discover which actions yield the most reward by trying them. Figure 2.8 illustrates the standard RL model. In this model, the agent has an indication i of the system state s (e.g., if the agent is a robot, this indication can be the signals that the robot receives from its different sensors). The agent takes action a, which leads the system to another state. The value of this state transition is communicated to the agent through a scalar reinforcement signal (or reward) r. The agent should choose actions that tend to increase some function of r. Similar to DP, this function can be average discounted reward, average reward, or other reward functions. Unlike DP, however, the agent should learn to maximize this function over time by systematic trial and error, guided by a wide variety of algorithms [35] [36]. The system environment can be either real or simulated. If a simulator is used, complete knowledge of the random variables that govern the system dynamics is required [37]. In order to obtain a lot of rewards, the agent should be greedy; i.e., it should choose the actions that it has discovered to produce a lot of reward. In other words, it should exploit its current experience. However, there might exist better actions that the agent 36 Chapter 2. Background and Related Work ^— Real or Simulated Model j System Figure 2.8: Standard R L model has not yet discovered or explored. As a result, if the agent occasionally deviate from the greedy manner (i.e., perform exploration), chances are that it can find better actions. The dilemma in R L is that neither exploration nor exploitation should be pursued exclusively. In fact, one of the challenges of RL is to balance the trade-off between exploitation and exploration. The agent must try a variety of actions and progressively favor those that appear to be best. On a stochastic task, each action must be tried many times to gain a reliable estimate of its expected reward [36]. In the next section, we elaborate more on how RL overcomes the challenges that render DP to be impractical. 2.6.1 Curse of Modeling As mentioned in Section 2.5, DP requires the exact information about all the components of the M D P that it tries to solve. The component that gives the system model is the transition probabilities p(s'\s,a) for all states and actions. In many applications, this model is not available. In this section, we summarize an algorithm to overcome this 37 Chapter 2. Background and Related Work problem, which is proposed in [38]. Model-free Q-Learning When the agent does not have the system model, it should build a knowledge base ac-cording to its interaction with the system. In Q-learning, this knowledge base is made up of factors called Q-values. For each state s € S and each action a G A3, the agent saves a Q-value Q(s, a) and updates these values as the agent learns. Before the learn-ing begins, all the Q-values are set to the same value. At a decision epoch, when the system is in state 5, the agent chooses an action a that maximizes the Q-value (i.e., a = argmax Q / g A 3 Q(s,a')). The simulator will then simulate the chosen action a in the current state s and evolve to the new state s'. It also gives the accrued reward and the time spent during the state transition. The learning algorithm embedded in the agent will then use this information to update Q(s,a). Basically, Q(s,a) increases if a was a good action (i.e., good actions are rewarded), and decreases otherwise (i.e., bad actions are punished). As mentioned before, an exclusively greedy learning algorithm is not the best. Consequently, the agent should occasionally explore and choose an action other than the one with the maximum Q-value. Over time, as the agent explores and tries out all the state-action pairs often enough, it learns the best action in each state. In other words, it learns an optimal (or near-optimal) policy. At this point, the agent should cease exploration and follow the learned optimal policy in a greedy manner. Q-learning is a value iteration based RL. It learns the Q-value of taking action a in 38 Chapter 2. Background and Related Work state s using the following updating procedure [37]: Q(s,a) — r(s, a)p(s'\s, a) max Q(s', a') a'eA., s'es 3 r(s, a, s') + max Q(s', a') a'eA,, (2.1) s'es where the last equality follows from the definition of r(s, a). There are at least two reasons that RL, including Q-learning, requires the updating in equation (2.1) to be gradual [37]. First, R L is asynchronous (i.e., unlike DP, the Q-factors of state-action pairs are not all updated simultaneously at each iteration) and requires a proper learning rate to converge. Second, the major goal of RL is to avoid computation of transition probabilities, which also requires a learning rate. For a learning rate p G (0,1], we can write the equation (2.1) as: Q(s,a) = (1 - p)Q(s,a) + p^2p{s'\s,a) r(s, a, s') + max Q(s', a') a'eA,, (2.2) s'es Since equation (2.2) uses the probability distribution p(s'\s,a), it gives the updating equation for model-based Q-Learning algorithm, which still suffers from the curse of modeling. Stochastic approximation schemes can be used to circumvent this problem. One such scheme is proposed by Robbins and Monro [39], which is used to approximate the mean of a random variable with methods such as simulation. Given a stochastic function f(Q), the updating procedure Q *— (1 — p)Q + p[f(Q)\ with a suitable learning rate produces an averaging effect (i.e., converges to the average of /(Q)) if updating is performed frequently enough. This updating procedure is very similar to equation (2.2). Since the term [r(s,a,s') + maxa,eAs, Q(s',a')] is averaged anyway, one can replace the 39 Chapter 2. Background and Related Work expectation over s' in equation (2.2) by a sample. Thus, the Robbins-Monro version of Q-learning can be written as follows [40]: Q(s, a) <— (1 - p)Q(s, a) + p r(s, a, s') + max Q(s', a') a'eA., (2.3) which is free of transition probabilities. Equation (2.3) describes the model-free coun-terpart of equation (2.1). Since the average reward value iteration can be numerically unstable [34], the numerically stable version of model-free Q-learning should be based on relative value iteration. We describe this in the next section. SMART Algorithm Most of the R L algorithms in the literature, including Q-learning, are based on the discounted reward optimality criterion. Moreover, these algorithms cannot automatically extend to the average reward criterion [41]. As we will show later in Section 4.2.5, the average reward criterion is more suitable for our scheduling problem. We use an algorithm called S M A R T (Semi-Markov Average Reward Technique) [38] [42] [40] that is designed for average reward RL. The convergence analysis of this algorithm is given in [42] and it has been successfully applied to different problems including production inventory [38], airline seat allocation [40], and QoS provisioning in wireless cellular networks [41]. The difference between Semi-MDP (SMDP) and M D P is that for SMDP, the time that it takes the system to transit from one state to the other is stochastic. M D P is a special case of SMDP when the state transition duration is fixed. Since in Section 4.2 we formulate the scheduling problem in the form of an MDP, we can use this algorithm with fixed state 40 Chapter 2. Background and Related Work transition duration. S M A R T is an R L version of relative value iteration algorithm [34] and subtracts the average reward in each update. The M D P version of the update procedure for S M A R T is as follows [38]: Q{s,a) <— (1 - p)Q(s,a) + p (r(s,a, s') - np + m&xQ(s',a')j = Q(s, a) + r(s, a, s') — rjp + max Q(s', a') —Q(s,a)^ (2.4) where n is the fixed state transition time, p is the average reward (see Section 4.3.3), and p is the learning rate. S M A R T is the base of our scheduler proposed in Chapter 4. 2.6.2 Curse of Dimensionality Besides a lack of knowledge about the system model, the agent may also face a system with vast state and/or action space to interact with. In this section, we address the measures for tackling this challenge. State Aggregation and Features In the systems with large state space, it is sometimes possible to aggregate similar states with slight degradation in accuracy. For instance, consider a problem where a robot needs to learn how to find a moving object in a rectangular 2-D plane. The obvious choice of state space is the coordinates of the robot and the moving object. However, this state space has continuous variables and its size is not finite. To use aggregation in 41 Chapter 2. Background and Related Work • Robot X Target D imens ion 1 Figure 2.9: This figure shows a 2-D plane, in which a robot tries to find a target. Robot and target are depicted in three different positions, which are all equivalent with respect to the value of features fx and / 2 . These different states can be aggregated. this problem, the 2-D plane can be divided to small rectangular tiles. The center of each tile can then represent the coordinates of the robot or the target when they are in that tile. Thus, all the points of the state space that lie inside a tile are aggregated to the center of the tile. Depending on the learning problem, there might exist some features in the states that can fully capture the important aspects of system that influence the learning process. For instance, consider the same problem as in the previous paragraph, where the state space is the coordinates of the robot and the moving object. Using the concepts of simple geometry, it can be verified that as far as the target finding problem is concerned, this state space only has two important features: / i , the distance between the robot and the target, and / 2 , the angle between the robot-target line and the horizontal line passing from the target (see Figure 2.9). Since all the points in the state space that yield the 42 Chapter 2. Background and Related Work same value for features ( / i , / 2 ) can be aggregated and represented by that value, features can be viewed as a form of aggregation. In general, the choice of suitable features is not a straightforward task, and it requires good understanding of the problem, intuition, and experimentation. For large discrete state spaces, state aggregation can reduce the memory requirement of look-up tables and make it feasible to use. However, as we will describe in the next section, memory requirement is not the only challenge in problems with large (or continuous) state spaces. Therefore, state aggregation, by itself, may not solve the curse of dimensionality. Note that in the case of Q-learning, where the agent deals with state-action pairs, aggregation should be used in the state-action space. Generalization and Function Approximation In Q-learning, the knowledge base includes a Q-value for each state-action pair. This is only practical for problems with small state and action spaces. One reason is that the memory requirement for the lookup table is 0(|5| • |v4|), which grows very large for many practical problems. In addition, the lookup table method cannot be directly applied to the problems with continuous state and/or action spaces. Besides memory requirement, another problem with large lookup tables is the amount of time required to fill them up accurately [36] because lookup tables use the gained experience inefficiently. For instance, lookup table does not take advantage of the fact that in a large smooth state space, we generally expect similar states to have similar optimal actions [35]. In many tasks, most encountered states have never been tried 43 Chapter 2. Background and Related Work exactly before. This is usually the case in complex problems or when the state or action spaces include continuous variables [36]. As a result, a more compact representation that allows transfer of experience between similar state-action pairs should be used. In other words, this compact form should generalize from previously experienced state-action pairs to the ones that have never been experienced. The kind of generalization that is required in Q-learning is often called function approximation, because it tries to generalize the Q-values that are experienced to approximate the entire Q-function. Generalization techniques that are successfully applied to R L include artificial neural network methods [43], fuzzy logic [44], cerebellar-based (CMACs) or tile coding [45] [46], and memory-based methods [47]. A class of function approximators called sparse coarse-coded function approximators nicely provides generalization with reasonable memory requirement [48]. This class is the most basic model of associative memory, where several memory locations are used to give the content associated with an input. This value would be similar for two different inputs, if they share many memory locations. In fact, generalization is the result of inputs sharing memory locations. Tile coding and memory-based techniques fall in this class. In memory-based techniques, each memory location (or prototype) corresponds to a group of state-action pairs that the agent has experienced before. In other words, each prototype covers a region of state-action space and is activated by any point in that region. Furthermore, each state-action pair can activate more than one prototype. The set of prototypes can be fixed (e.g., uniformly distributed in the state-action space), or 44 Chapter 2. Background and Related Work • Input point X Act ive prototypes x Inactive prototypes Dimension 1 Figure 2.10: An example of memory-based function approximator. The input point is the point in the state-action space that its Q-value is desired. All the prototypes within the activation radius of the input are activated. The weighted sum of the values of the activated prototypes is the approximated value of the input. dynamically expand as the agent explores new regions of the state-action space. One method for dynamic expansion of prototypes is to add enough new prototypes in order to maintain a minimum density of prototypes along the explored trajectory. In the case of Q-learning, the prototypes are used to approximate the entire Q-function. When the agent tries a new state-action pair, its Q-value is calculated as the weighted sum of the prototypes' values (i.e., contents of the memory locations) that are activated by this pair. In order to determine what prototypes are activated by an input, a similarity metric is required. For instance, in Figure 2.10, all the prototypes within an activation radius of an input are similar to and activated by it. In Chapter 4, we use a memory-based function approximator for our proposed scheduler. 45 Chapter 2. Background and Related Work 2.7 Summary In this chapter, we first introduced different physical layers based on U W B transmission, which enable WPANs to support high-rate applications, such as video streaming. We also investigated the characteristics of video streams, as one of the major traffic types in future high-rate WPANs. It was noted that for video flows, which have inter-frame dependencies, it is more appropriate to use D F R as the performance metric for scheduling. In addition, we described the related work on scheduling at M A C layer, many of which were based on the impulse-based U W B and took advantage of concurrent transmission allowed by this type of physical layer. A few application-aware scheduling proposals, which use application layer information for scheduling video frames, were also mentioned. However, the related work on this class of M A C schedulers is limited. Furthermore, the current proposals do not use the application layer information effectively. An introduction to MDP, challenges of solving large-scale MDPs, and R L were also presented. In Chapter 3, we propose a new technique for video schedulers to incorporate application-layer information more efficiently. We show that this simple technique can significantly improve the scheduling performance measured in terms of DFR. 46 C h a p t e r 3 F rame Decodab i l i t y A w a r e ( F D A ) Techn ique In this chapter, we propose a method to improve video traffic schedulers in I E E E 802.15.3 WPANs, called the frame decodability aware (FDA) technique [6]. We first state the motivations and assumptions, followed by the description of our F D A technique. We then present the simulation results for the impact of the F D A technique on several scheduling algorithms, namely F-SRPT [1], EDD+SRPT [2], and PAP [3] algorithms. 3.1 Motivations and Assumptions Similar to some of the previous work described in Chapter 2, our proposed scheme uses information such as frame size, deadline, and type of the frame. However, one can also determine the decodability of the queued frames in the system at no extra signalling cost. To our knowledge, none of the previous work have explored this opportunity, which is readily available at the PNC, for improving scheduling performance under video traffic. This is the main motivation of our proposed F D A technique, which will be described in the next section. This technique has the following properties: 47 Chapter 3. Frame Decodability Aware (FDA) Technique • It provides a simple way of exploiting the information about video frame type, hi-erarchical structure, and decoding dependencies in order to improve the scheduling performance. As in [3] and [49], we assume there is enough cross-layer interaction to provide the information about frame type at the M A C layer. • It is independent of the scheduling algorithm implementation, which is important from design point of view and allows more flexibility. For simplicity, we assume that the transmission deadline of video frames is constant, and is equal to the frame inter-arrival time. Thus, at any given time, the queue of each flow holds no more than one video frame. We also assume that the length of superframe is a constant. In general, since the size of a video frame is larger than the maximum size of the data frame passed from the upper layers, fragmentation is performed at the M A C layer. 3.2 F D A Technique Let T denote the set of flows. In F D A , the P N C creates a scheduling eligibility table. The number of entries is equal to the number of flows F = \F\. The entry for flow i € F includes the following information: • The remaining amount of channel time k that should be allocated to flow i\ • The deadline di, no later than which lt should be allocated to flow i; 48 Chapter 3. Frame Decodability Aware (FDA) Technique • The type of the video frame in the queue of flow i, which is denoted by j/*. The unit of li is in milliseconds and is locally determined by the D E V to which flow i belongs. It depends on the length of the video frame, as well as the A C K policy and channel rate used by the source and destination of the flow [5]. Since the length of superframe is constant, the unit of di is in terms of the number of superframes left until the deadline expires. The type of a video frame yi € {/, B, P}. The F D A technique determines the decodability of each frame based on the deadline and type information. For each flow i € !F, we define a binary variable Si. The value of 5i is set to 0 if any I or P frame on which the current frame of flow i depends is directly lost; otherwise, Si is set to 1. According to the hierarchical structure of video encoding, a frame depends on its preceding I and P frames up to the most recent I frame. Therefore, when yi = I or P, Si indicates whether the frame of flow i is decodable, (i.e. neither directly nor indirectly lost). If Si is equal to 0, the subsequent frames up to the next / frame (i.e., beginning of the next GOP) are indirectly lost. However, when yi = B, Si only indicates that the frame of flow i is not indirectly lost. The entry of each flow also includes a variable e,, which shows whether or not the frame of flow i € T is eligible for being scheduled. The P N C updates the scheduling eligibility table at the beginning of each superframe before making the scheduling decisions. The pseudo-code of the table update for entry i is shown in Figure 3.1. The table is updated by the following procedures: If some channel time is allocated to flow i € J7 in the previous superframe, then ^ is reduced 49 Chapter 3. Frame Decodability Aware (FDA) Technique For each flow i e F, the scheduling eligibility table maintains the following info: • k is the remaining amount of channel time that should be allocated to flow i. • di is the deadline of allocating li to flow i. • yi is the type of the video frame in the queue of flow i, yi € {/, P, B}. • 6i is set to 0 if any I or P frame, on which the current frame of flow i depends, is directly lost; and is equal to 1 otherwise. • a indicates whether the current frame in the queue of flow i is eligible for being scheduled. (A) At the beginning of each superframe, perform the following steps for each flow i € T\ k *— k — (time allocated to flow i in the previous superframe); di <- (di - 1); yi is not changed; if (di = 0) and (k > 0) then if (yi = I) or (yi = P), then Si <— 0; else Si is not changed. if the DEV corresponding to flow % has sent information about a new arrival in the previous superframe, then /. . mew. A. , jnew. ... , ..new. Li li i ui ai i til iii i where, l^ew, d™™ and yfew denote the required CTA, deadline and type of the new frame, respectively. This information is contained in the message sent to the PNC from the DEV. if y™» = / , then S{ <- 1; else 5i is not changed. if (k > 0) and (<k > 0) and (Si = 1), then set a = 1 (eligible for being scheduled); else set ei = 0 (ineligible for being scheduled). (B) After updating the scheduling eligibility table, feed all the eligible flows to the scheduler. Figure 3.1: Pseudo-code of the F D A technique. 50 Chapter 3. Frame Decodability Aware (FDA) Technique by that amount. The value of /, becomes 0 when the channel time request of flow i is fully accommodated (i.e., when the queue of flow i becomes empty). In any case, di is reduced by one in order to indicate that one superframe has elapsed from the deadline. If di reaches 0 and the queue of flow i is not empty (k > 0), then the frame in that queue has expired and is no longer eligible for transmission. This would impact the decodability of the subsequent frames of flow i if yi is either I or P. In either case, we set <5» to 0 in order to indicate that an I or P frame on which the upcoming frames of the current G O P depend is directly lost. Note that if the expired frame is of B type, then Si is not changed. The reason is that the decodability of subsequent frames is not affected by the loss of a B frame. In case of new frame arrivals, the P N C should also update the entries of the flows that have new frames according to the information sent from the corresponding DEVs. The parameters U, di and yi are set to the required C T A , deadline and type of the new frame arrived at flow i, respectively. If the new frame is of type I, then a new G O P has begun. The decodability status of the frames preceding the new I frame does not affect the decodability of the I frame and its subsequent frames. Therefore, when a new / frame arrives for flow i, we set Si to 1 regardless of the previous value of Si. Finally, after updating the values of li, di, yi and Si, the F D A technique determines if the frame of flow i is eligible for being scheduled. The frame is marked as eligible (i.e., e» = 1) if the following three conditions are valid: (1) it is not completely transmitted (k > 0), (2) the frame has not expired (di > 0), and (3) the frame is not indirectly lost 51 Chapter 3. Frame Decodability Aware (FDA) Technique (Si = 1). When the scheduling eligibility table has been updated by F D A , the scheduling algo-rithm will be applied to the eligible frames. Note that the F D A technique is independent of the type of scheduling algorithm being invoked. It only gives the list of eligible frames to the scheduler. By using the F D A technique, the scheduler does not need to allocate channel time to those frames that are indirectly lost. As a result, some CTAs are freed up for scheduling other frames that are decodable at the receiver. In Section 3.3.2, we study the impact of the F D A technique on several scheduling algorithms. Figure 3.2 shows an example of updating the scheduling eligibility table with 5 video flows in the piconet. At the beginning of superframe n (see Figure 3.2(a)), the frames of flows 3 and 4 expire. The flows 1, 2 and 5 are eligible for being scheduled. Suppose that in superframe n, 15 ms and 32 ms of channel time is allocated to flows 2 and 5, respectively. At the beginning of the next superframe n + 1 (see Figure 3.2(b)), the i h di Vi Si e* i U di Vi Si e» 1 18 1 B 1 1 1 18 0 B 1 0 2 23 1 P 1 1 2 8 0 P 0 0 3 28 0 I 0 0 3 21 3 B 0 0 4 14 0 B 0 0 4 37 3 I 1 1 5 32 2 P 1 1 5 0 1 P 1 0 (a) (b) Figure 3.2: An illustration of how the F D A technique updates the scheduling eligibility table. The table after being updated at the beginning of (a) superframe n, (b) superframe n + 1. 52 Chapter 3. Frame Decodability Aware (FDA) Technique frame of flow 1 expires. Since this frame is of type B, it does not affect 5\. The frame of flow 2 also expires and is directly lost. Since this frame is of type P, 52 becomes 0. A new frame arrives for flow 3 with C T A requirement, deadline and type equal to 21 ms, 3 and B, respectively. This frame is indirectly lost because its previous I frame is directly lost, as indicated by the value of S3. A new I frame arrives at flow 4. The value of 84 will be set to 1. Finally, the queue of flow 5 becomes empty as its channel time request is completely granted in superframe n. The frame of flow 4 is the only frame eligible for being scheduled in superframe (n + 1). Although d3 > 0, the frame of flow 3 is not eligible for being scheduled because it is indirectly lost. By preventing the undecodable frames being scheduled, the F D A technique improves the utilization of the channel time. The implementation of F D A requires a signaling scheme to pass the required info from DEVs to the P N C . For example, F D A can use the signaling method proposed in [25] which is compatible with the 802.15.3 standard. It allocates one M C T A to each flow, at the end of the superframe. Compared to using C A P or allocating M C T A s at the beginning of C T A P , this scheme gives more recent information to the P N C and reduces the average response time of the system. It is important to note that F D A does not depend on (or affect) the implementation details of the scheduler. It merely modifies the input to the scheduler (see Figure 3.3). Thus, F D A technique can be used in any scheduler to provide information about the decodability of video frames. 53 Chapter 3. Frame Decodability Aware (FDA) Technique Schedu l ing dec is ions T i m e a l loca t ion requests o f f l ows Scheduler F D A M o d u l e Schedu l ing e l i g ib i l i t y table Figure 3.3: Logical interaction between F D A module and the scheduler. 3 . 3 Performance Evaluation In this section, we evaluate the performance of various scheduling algorithms with (and without) the use of the F D A technique. The scheduling algorithms that we studied are the F-SRPT [1], E D D + S R P T [2], and PAP [3] algorithms. 3.3.1 Simulation M o d e l The implementation of our proposed F D A technique is done in M A T L A B . We use the real video trace of Jurassic Park movie with G O P pattern of (N=12, M=3) [50]. The frame rate is 30 frames per second. The average rate of each flow is 8 Mbps. We assume that the maximum tolerable delay for video frames is equal to the frame inter-arrival time, which is 1/30 s. The number of flows vary from 2 to 10. The M A C overheads such as headers and inter-frame spacings are set as specified in the I E E E 802.15.3 standard [5]. The channel data-rate is equal to 100 Mbps, the superframe size is 8 ms, and the simulation time is equal to 500 s. Table 3.1 summarizes the simulation parameters. 54 Chapter 3. Frame Decodability Aware (FDA) Technique Table 3.1: Simulation Parameters Parameter Value Channel data rate 100 Mbps Superframe duration 8 ms Number of video flows 2 - 1 0 Video GOP pattern (12, 3) Mean data rate of video flows 8 Mbps Frame inter-arrival time 1/30 s Maximum tolerable delay 1/30 s A C K policy Imm-ACK Max M A C fragment size 2048 bytes Simulation duration 500 s 3.3.2 Simulation Results The start time of different flows in the system affects the burstiness of traffic load, and thus influences the overall performance. In order to show this fact, we assume that there is (f> seconds between the start times of successive flows. In other words, flow i starts at time i<j> [1]. Figure 3.4 compares the average decoding failure rate (DFR) and job failure rate (JFR) achieved by F-SRPT and E D D + S R P T scheduling algorithms when <j> varies from 0 to 30 ms, and the number of video flows is fixed to 9. In our simulation model, for small values of 4> the traffic peaks of each flow coincides with that of other flows. Therefore, the overall load becomes more bursty and both J F R and D F R increase significantly. Thus, we conclude that cf> can affect the performance. Another observation 55 Chapter 3. Frame Decodability Aware (FDA) Technique - S -DFR of F -SRPT •e-DFR of EDD+SRPT - * - J F R of F - S R P T - ^ - J F R of EDD+SRPT 10 15 20 Time Separation [ms] 25 30 Figure 3.4: The effect of time separation 4> on failure rate (F = 9). in Figure 3.4 is the large gap between D F R and J F R for both schedulers. For most values of <f>, D F R is more than 3 times as much as JFR. This validates our claim in Chapter 2 that for video flows, low J F R is not an accurate indicator of good QoS. As mentioned in Section 2.3.1, B frames are likely to be smaller that I and P frames, so they are favored by the SRPT scheduler. Similarly, P frames have a better chance to be scheduled, compared to I frames. In other words, SRPT functions in favor of less important frames. As a result, many of the completed jobs (i.e., scheduled frames) by SRPT are undecodable at the receiver, because they rely on other more important frames that were deprived of channel time in the presence of small B (or P) frames. Therefore, there exists a significant gap between D F R and JFR for SRPT. This gap is the sign of channel time wastage by scheduling undecodable frames. F-SRPT also has the similar drawback, because it is essentially the same as SRPT (see Figure 3.5). It only maintains 56 Chapter 3. Frame Decodability Aware (FDA) Technique fairness among high and low rate flows. Using the F D A technique, F-SRPT scheduler does not waste channel time by scheduling undecodable frames. Since these frames are mainly of the less important types, this would free up some channel time for scheduling frames of more important types. In effect, none of the undecodable frames are scheduled, and any scheduled frame is decodable. Therefore, J F R and D F R are identical when F D A technique is used. As mentioned before, for small values of <f>, the traffic load is very bursty and D F R is thus high. However, since P N C is in charge of admitting new flows to the system, it can alleviate this issue to some extent. For instance, it can reduce the burstiness of traffic by delaying the initiation of a flow for a few milliseconds, which is negligible from user's point of view. Therefore, we focus on the cases when <f> is larger than superframe size, and present the performance evaluations averaged over 8 ms < cf> < 30 ms. Figure 3.6 shows that dropping undecodable frames can reduce D F R of F-SRPT scheduler up to 61%. Note that F D A technique improves the objective measure of video quality, despite causing J F R to increase. As explained before, we are not concerned about an increase of JFR, as long as D F R decreases. Using E D D along with SRPT, the E D D + S R P T scheduler gives higher priority to the frames with more stringent deadline. This approach reduces both JFR and DFR, compared to SRPT. The nature of SRPT, however, again causes a large gap between J F R and DFR. The F D A technique improves D F R of EDD+SRPT, basically in the same manner as described for SRPT. Figure 3.7 depicts the effect of <fi and shows that 57 Chapter 3. Frame Decodability Aware (FDA) Technique 0.9 0 i . 1 1 . 1 1 0 5 10 15 20 25 30 Time Separation [ms] Figure 3.5: Effect of F D A technique on F-SRPT scheduling algorithm versus time sep-aration (j>. 0.45 0.4 B0.35 co rr 0.3 C D M 0.25 co LL C B 0.2 O ) C O a> 0.15 * 0.1 0.05 0 - • - D F R -#-DFR with FDA -0-JFR with FDA -+-JFR • 5 6 Number of Video Flows Figure 3.6: Effect of F D A technique on F-SRPT scheduling algorithm versus the num-ber of flows F. 58 Chapter 3. Frame Decodability Aware (FDA) Technique -H-DFR -*- DFR with FDA JFR with FDA -+-JFR 0 25 30 5 10 15 20 Time Separation [ms] Figure 3.7: Effect of F D A technique on E D D + S R P T scheduling algorithm versus time separation <j>. F D A technique can significantly reduce DFR. Figure 3.8 depicts this improvement versus the number of video flows in the system, which is up 60% reduction in DFR. Unlike the previous two schedulers, PAP scheduler treats video frame types differently, and uses the information about the type of frames. It serves I frames with the highest priority, followed by P and B frames, respectively. These priorities are based on the importance of each frame type. The F D A technique has the least impact on the PAP scheduler, and does not affect D F R for <p > 5 ms (see Figure 3.9). Since PAP schedules I and P frames before B frames, most of the I and P frames are scheduled, and they have low rate of missing their deadline. As explained in Section 2.3.1, indirect loss is the result of expiry of I or P frames. Therefore, PAP scheduler results in a small number of undecodable frames, which reduces channel time wastage due to scheduling undecodable 59 Chapter 3. Frame Decodability Aware (FDA) Technique OA 0.35 a) 0.3 ca rr co 0.25 _=? i£ 0.2 CD ^0.15 CD < 0.1 0.05 0 -E -DFR -*-DFR with FDA -0-JFR with FDA -+-JFR 6 7 Number of Video Flows 0 0 10 Figure 3.8: Effect of F D A technique on E D D + S R P T scheduling algorithm versus num-ber of flows F. QI i i i i I I 0 5 10 15 20 25 30 Time Separation [ms] Figure 3.9: Effect of F D A technique on PAP scheduling algorithm versus time separa-tion <j>. 60 Chapter 3. Frame Decodability Aware (FDA) Technique 0.2 0.05 -0-PAP with FDA - B - F - S R P T with FDA -0-EDD+SRPT with FDA 10 25 30 15 20 Time Separation [ms] Figure 3.10: Comparison of F-SRPT, EDD+SRPT, and PAP schedulers when F D A is used. frames. Thus, as shown in Figure 3.9, the gap between D F R and J F R for PAP scheduler is less than the other two schemes. So there is little room for improving PAP, as the F D A technique uses the wasted channel time to reduce DFR. Figure 3.10 illustrates the D F R of the three studied scheduling algorithms with F D A technique. It shows that none of the schedulers is the best for all values of <j>. 3.4 Summary In this chapter, we proposed an F D A technique to improve the performance of schedulers for video traffic. Our F D A technique can be applied to various types of schedulers and does not incur any extra signalling overhead between DEVs and the PNC, except for exchanging the information about frame type. We used both D F R and JFR as 61 Chapter 3. Frame Decodability Aware (FDA) Technique performance metrics when evaluating video traffic schedulers, and emphasized that low D F R is a more appropriate indicator of the QoS given to video traffic. We analyzed the effect of F D A technique on F-SRPT, E D D + S R P T and PAP schedulers, and showed up to 61%, 60% reduction in DFR, for the first two schedulers. 62 C h a p t e r 4 Schedu l ing A l g o r i t h m for V i d e o F lows 4.1 Introduction and Motivations In Chapter 3, we proposed the F D A technique for performance improvement. In addition, it was emphasized that D F R is a suitable performance metric for scheduling algorithms, and reflects the user-perceived QoS. We also presented the performance of three schedul-ing algorithms under the D F R metric. A natural question that may arise is that what is the best possible performance that we can obtain in terms of D F R metric. We shall address the answer to this question in this chapter. The mathematical techniques that we used in this chapter were summarized in Sec-tions 2.5 and 2.6. The rest of this chapter is organized as follows. In Section 4.2, we describe the M D P formulation for our scheduling problem [7]. Algorithm implementa-tion and simulation results are presented in Sections 4.3 and 4.4, respectively. Finally, summary is given in Section 4.5. 63 Chapter 4. Scheduling Algorithm for Video Flows 4.2 Problem Formulation In this section, we formulate the problem of finding the scheduling policy with minimum average D F R in the form of a Markov decision process (MDP) with average reward criterion [34]. We next relate the gain of the scheduling policy with the average D F R that it yields. As a result of the linear relationship between gain and average DFR, we can use reinforcement learning (RL) techniques [35] [36] that find optimal gain policy, in order to find the scheduling policy which minimizes the average DFR. 4.2.1 Assumptions and System M o d e l We assume that the number of video flows, denoted by F, is fixed. For simplicity, it is assumed that the deadline of video frames is constant, and is equal to frame inter-arrival time 7. As a result, the queue of each flow holds no more than one frame at any time. We denote the GOP pattern of the ith flow (i G T = {1, 2 , . . . , F}) by [Nu Mi), where Nt and Mi are the I-to-I and I-to-P frame distance, respectively. The G O P pattern of all flows is assumed to remain constant during the flows' lifetime. We model the scheduling problem for video flows as an MDP. The scheduler is the decision making agent. 4.2.2 Decision Epochs The scheduler should make a decision at the beginning of each superframe, and decide to schedule which flows in that superframe. The superframe size 77 is assumed to be fixed and is less than 7. For simplicity, we show the decision epochs in terms of superframe 64 Chapter 4. Scheduling Algorithm for Video Flows size. For example, the actual time of decision epoch n, which is the beginning of n superframe, is (n — l)n. Therefore, the set of decision epochs is {1,2,... , T}. 4.2.3 States Let the function txj(x) give the amount of channel time that flow i requires to send x bits of data. The simplest form of this function is tx^ (a;) = x/channeLdatajratej. However, depending on the acknowledgement policy, inter-frame spacing times, and maximum M A C fragment size, this function can have a different form. Moreover, assume that the maximum frame size of flow i is L™ax. We define the set of possible system states as follows: S 4 | s = (l,d,g,5) | V z 6 ^ : 0 < < t^ iL?"™), ek e {0,.. . ,D m a x}, gie{o,...,Ni- l}, 6t e {o, 1} j (4.1) where • 1 = [Zi Z2 • • • IF], a n d h is the channel time required by the frame in the queue of flow i, in milliseconds. • d = [di d2 • • • dp], and d\ is the number of full superframes left until the arrival of the next frame. Since the maximum tolerable delay for video frames is the frame inter-arrival time, d\ can alternatively be interpreted as the deadline of the frame in the queue of flow i, in terms of superframe size. 65 Chapter 4. Scheduling Algorithm for Video Flows • g — [5i 92 • • • 9F], and & is the offset of the frame in the queue of flow i, with respect to the beginning of the frame's GOP. For the first frame of the G O P Qi = 0, and for the last one gi = Ni — 1. • 8 = [5i 52 • • • 8p], and Si is set to 0 if any I or P frame, on which the current frame of flow i depends, is directly lost; and is equal to 1 otherwise (see Section 3.2). • jjmax A |^J ^ j g ^.jjg m a x i m u m frame deadline in units of superframe size 77, where [•J is the floor function. The scheduler incorporates the F D A technique, in order to determine the value of <5 based on 1, d and g. 4.2.4 Actions Let A be the set all possible actions: A = | a = ( a i , a 2 , ••• aF, p)\p € ^ n{0}; a* e {0,1}, Vi e T\ ap = 0 if p ^ o|. (4.2) where a» is equal to 1 if the scheduler allocates enough channel time to flow i so that it can fully transmit its frame. Otherwise, it is equal to 0. The parameter p is the flow that can only transmit parts of its frame during the channel time that the scheduler allocates to it. If no such flow exists, then p = 0. Here we assume that in each superframe, the scheduler allows at most one partial frame transmission, and the rest are full frame transmissions. The last conditional statement in equation (4.2) implies that the frame of flow p cannot be both fully and partially transmitted simultaneously. 6 6 Chapter 4. Scheduling Algorithm for Video Flows At each decision epoch, i.e., the beginning of each superframe, the scheduler should choose an action depending on the current state. The chosen action must satisfy a few constraints. First, the channel time required to transmit scheduled frames should not be more than the superframe length n: F X > i • li) < V- (4-3) i= i Second, all the scheduled flows must be eligible: at < ei, Vi € T (4.4) dp > 1 and ep = 1 if p 0 (4.5) where is the eligibility of the flow i, defined in Section 3.2. Equation (4.4) implies that ai = 0, if ei = 0. In other words, the ineligible flows are never scheduled. Equation (4.5) implies that p should be eligible and have a deadline greater than 1. Partial transmission of a frame with deadline one is wasting channel time, since the frame will expire at the end of the superframe. After the scheduler makes its decision about which frames should get fully transmitted in a superframe, there may still remain some channel time in that superframe that is not enough for full transmission of any unscheduled frame. This remaining time is given by: F partial = r]_J2(ai-li). i=l The fact that l P a r t t a l is not enough for full transmission of any eligible frame that is not scheduled, can be formally expressed as the following constraint: h > l^™1, V i € ei = 1, a i = 0. (4.6) 67 Chapter 4. Scheduling Algorithm for Video Flows The idle channel time is allocated to the flow p. If no such flow exists, then p is set to 0. Consequently, the set of possible actions in state s G S, denoted by As, is the largest subset of A, whose members satisfy all the constraints (4.3)-(4.6). These constraints guarantee that the scheduler accommodates as many eligible frames as possible. Note that As is stationary and only depends on the system states. 4.2.5 Reward Function and G a i n As mentioned in Section 2.3.1, because of the hierarchical structure and interdependency of video frames, some frames may get indirectly lost. The reward function that we choose should account for this fact. We give a reward of one unit when a frame is scheduled. On the other hand, if the deadline of a frame expires, the scheduler receives a penalty (negative reward) of W units, where W is the number of frames that depend on the expired frame. Table 2.2 shows the number of dependencies between G O P frames of a (N,M) video flow. Let Q ( S ) be the state-dependent penalty (or cost) function for flow i: Ci{s) (Ni + {Mi - 1)) • a • U{di=1}, if yi{gi) = I, ei-U{di=i], \iyi(gi) = B, (Ni-l-iJl- l)Mi) • &i • U{di=1}, if yi(gi) = P, 68 Chapter 4. Scheduling Algorithm for Video Flows where the function yi(g) : {0,..., Ni — 1} — > { / , B, P} maps g to frame types as follows: I, itg = 0, Vi(9) = { B, if g mod Mt > 1, P, if g mod Mi — 0 and p ^ 0. Furthermore, U{.y is the indicator function and is equal to 1 if its argument is true, and is 0 otherwise. The product ej(n) x U{di(n)=i} indicates if the flow i has an urgent eligible frame. Hence, the scheduler should receive Cj(s(n)) units of penalty if it does not schedule the frame of flow i in the current superframe. As a result, we can express the state- and action-dependent reward that the scheduler receives at superframe n by: F r(s(n),a(n)) = ^ i=l Oi(n) - Cj(s(n))(l - Oi(n)) (4.7) In order to show the merits of the reward function in equation (4.7), we study the policy gain that it yields. The gain of policy n under the average reward criteria is the average accumulated reward. In our formulation, the policy gain is given by: P* = ^X>(s(n),a(n)) n=l E L i ai(n) ~ E L i Cj(s(n))(l - Oi(n)) (4.8) The total number of frames for each flow is total-frames = . t t o t a l ; t i ' ° e — = 2 l . Further-mter_arnval_time 7 more, the terms En=i ai(n) a n d En=i c i ( s ( n ) ) ( l — a » ( n ) ) m equation (4.8) are in fact the total number of scheduled frames and the total number of undecodable frames for each 69 Chapter 4. Scheduling Algorithm for Video Flows flow, respectively. In addition, these two terms sum up to the total number of frames for flow i. Consequently, equation (4.8) can be rewritten as: ij ^—\ / ioiai_sciieuuieu — LUiai_uiiuecuuauie \ 7 ^ V totalJrames / t=i n ^ - v / 2 x totaLundecodable \ 7 ^ V total-frames / F ^ ( 1 - 2 D F R * ) 7 where DFR^ is the decoding failure rate of flow i under the policy n. Let DFR7" = F 2Ji=i F>FR" denote the average DFR under the policy n. Using equation (4.9), we have 1 7 DFR = - - rr-=p*. (4.10) Equation (4.10) shows that in our formulation, the average DFR is a linear function of gain. Therefore, we conclude that: arg max p* = arg min DFR . (4-11) 7t 7t Hence, an optimal (maximum) gain policy yields the optimal (minimum) average DFR, which is what we aim to find. 4.2.6 State Transitions Assume that at superframe n, the system is in state s(n) and chooses the action a(n). In this section, we determine the system state at superframe n + 1, s(n + 1). This state 70 Chapter 4. Scheduling Algorithm for Video Flows depends on s(n), a(n) and new frame arrivals. We determine the state transitions per flow, i.e., we show how the state variables related to each flow change. The whole system state is updated by performing the same procedure for all the flows: When di(n) > 1, no new video frame will arrive for flow i. So, the remaining time until the next arrival is reduced by one superframe. If the flow is scheduled, its length will become 0. And if it is partially transmitted, its length will reduce by l P a r t i a l . Otherwise, the length remains unchanged. The frame offset within G O P also does not change. The value of 5i changes only when an I or P frame of flow i expires, or when a new G O P starts, which is indicated by arrival of a new I frame for flow i. None of these happens when di(n) > 1. Hence, if di(n) > 1, then the state deterministically changes as follows: di(n + 1) = di(n) — 1 k(n + l) = h(n)(l - Oi(n)) - l^^U^y 5i(n + l) = &(n) 6i(n + l) = Si{n). When di(n) = 0, it means that the frame in the queue of flow i has expired, if it has not already been sent, i.e., if k(n) ^ 0. It also means that a new frame will arrive for flow i, within superframe n. We denote the deadline and channel time requirement of the newly arrived frame for flow i within superframe n by d?ew(n) and l™ew(n), respectively. 71 Chapter 4. Scheduling Algorithm for Video Flows Thus, when di(n) = 0 the state changes with probability pzohi(l1lew(n)) as follows: di(n + l) = <%ew{n) k(n+l) = lTw{n) gi(n + 1) = (gi(n) + 1) mod Nt 1, if yi(gi(n + 1)) = I, 0, if yi(gi{n)) = I,P and k(n) ^ 0, <5,(n), . otherwise, where prob i(Z"ei"(n)) is the probability of the channel time request of flow i be lfew(n). This probability is simply related to the frame size distribution probability of flow i. 5i(n + l) = { 4.3 Algorithm Implementation In this section, we discuss how to find the optimal gain policy defined by the M D P given in the previous section. Our formulation has modeling and dimensionality issues, so DP cannot be used. First, we do not have the knowledge of the state transition probabilities, because the scheduler does not know the frame size distribution of the video flows beforehand. We use the S M A R T algorithm [38] to overcome the curse of modeling. Second, in the state space defined in equation (4.1) we have one continuous variable, i.e. the frame length. Even if we quantize this variable to several levels, the state space would still be huge. For instance, suppose that Dmax = 4 and Ni = 12 for all i £ and we quantize k's to 8 levels. Thus, \S\ = (8 x 5 x 12 x 2 ) F w 1000F. Furthermore, it is easy to verify that \A\ — F • 2F. According to equation (2.4), S M A R T 72 Chapter 4. Scheduling Algorithm for Video Flows learns the Q-function Q : S x A - » R , which has impractical memory requirement and convergence rate given the size its domain. We solve the problem in two phases. In Section 4.3.1, we describe a compact representation of the state-action space, which is a mapping from S x A to a smaller set f 2 . Then, in Section 4.3.2, a memory-based function approximator is used to map fi to the set of real numbers R. 4.3.1 Features and State Space Representation We use an aggregation scheme [36], which gives a more compact representation of the state-action space. This scheme is useful when we have knowledge of an initial part of the environment's dynamics but not necessarily of the full dynamics. For instance, in our formulation, the effect of chosen action on the system state can be deterministically simulated by the scheduler. On the other hand, the new frame arrivals cannot be deter-ministically simulated. Using this scheme, we obtain a more efficient learning method by breaking the environment's dynamics into the immediate effect of the action, which is deterministic and perfectly known, and the arrival process of new frames with unknown size distribution [36]. We refer to the system state after the effect of the taken action as afterstate [36]. The actual system state in the next superframe is the afterstate updated according to the new frame arrivals. In order to explain how afterstates can cause aggregation within the state-action space, consider the case when at superframe n the scheduler can accommodate all the eligible frames in the system (i.e., when the system is under-loaded). Examples of this 73 Chapter 4. Scheduling Algorithm for Video Flows case are: • Flow 1 has a frame with li = 2, d\ — 2, g\ = 0, and 6i = 1. The rest of the flows have no frame and their queues are empty; • Flow 2 has a frame with l2 = 1, d2 = 3, g2 = 5, and 52 — 1. The rest of the flows have no frame and their queues are empty; • Flows 1 and 3 have a frame with li = 0.5, di = 2, gi — 3, #i = 1, l3 = 2.5, c/3 = 1, g3 = 7, and £3 = 1. The rest of the flows have no frame and their queues are empty. In any of the above examples, the scheduler's action is that all the eligible frames be transmitted in superframe n. The effect of this action is that the system will become empty. Thus, the afterstate (i.e. the system state after the effect of the action) corre-sponds to an empty system. As a result, all of the state-action pairs, which correspond to an under-loaded system becoming empty, have the same afterstate and can be ag-gregated. Similarly, any afterstate represents all the state-action pairs that result in that afterstate. Using afterstates is in fact a mapping from S x A to S. Reference [51] successfully applies this idea to admission control problem in cellular networks. The state space of our M D P has binary variables in S, continuous variables in 1, and integer variables in d and g. Even after using afterstates, the scheduler should learn the Q-function over S, which is still very large. Therefore, we need to specify the important features in each afterstate, which most affect the scheduling decision. As mentioned before, there is no systematic way of choosing the features. We approach this task by 74 Chapter 4. Scheduling Algorithm for Video Flows studying the behavior of the SRPT, E D D and PAP schedulers. All of these schedulers can be viewed as single-stage decision making process, where the scheduler chooses an action merely based on its immediate effect, i.e., based on the resulting afterstate. SRPT uses information about length. Since it minimizes the number of pending frames in the system, it chooses an action that yield an afterstate with minimum number of frames. In other words, the only feature of the afterstate that is important for SRPT is the number of frames in the system. Similarly, E D D + S R P T uses the number of frames in the system, but groups them according to their deadlines. It chooses an action that leads to the afterstate with the minimum number of frames in the group with the smallest deadline. If this value is the same for the afterstates of two actions or more, the size of the group with second smallest deadline is used, and so on. Therefore, the features that E D D + S R P T used are the number of frames in each deadline group. PAP, which is an application-aware scheduler, further partitions the frames in the system according to their type. In other words, SRPT, EDD+SRPT, and PAP schedulers mainly focus on the information about length, deadline, and type, respectively. With the addition of F D A technique, they can take advantage of decodability as well. Since the optimal scheduler should focus on all the information, it is reasonable to assume that it uses all the features used by those schedulers. In light of these similarities, the following features are chosen: * Fy^ non.urgent {i | k > 0, yi{gi) = y, Si = 1, di > 1} is the number of all the frames of type y in the system that have a deadline greater that 1. 75 Chapter 4. Scheduling Algorithm for Video Flows * Fy, urgent {i\k> 0, yi(gi) = y, 5i = 1, = 1} is the number of all the frames of type y in the system that have a deadline equal to 1. . i is the Py, lost {i | k > 0, yi(gi) = y, di = 0} U {i \ k > 0, yi(gi) = y, Si = 0} number of all the frames of type y in the system that are either expired or indirectly lost. The parameter y can be substituted with I, P, and B. Thus, in total we represent each afterstate using 9 (i.e., 3x3) features. A vector whose elements are the features is referred to as a feature vector. Since in our formulation every feature can vary from 0 to F, the set of all feature vectors is fl — {0, 1, ..., F}9. So far in this section, we have performed two aggregation steps. The first one is the use of afterstates, which is a function from S x A to S. The second aggregation is feature extraction, which is a function from S to fi. Using the procedures described in this subsection, we can define the function / : S x A —> £1, where x — /(s, a) is the feature vector of the afterstate resulted by taking action a in state s. In the rest of this section, we describe how we can perform generalization (i.e., function approximation) over the set of feature vectors ft. 4.3.2 Function Approximation: Kanerva Coding As the dimension of the state space grows, the complexity of many function approxima-tors, such as tile coding, increases exponentially with it. However, the complexity of the target function can be unrelated to the size and dimensionality of the state space, and 76 Chapter 4. Scheduling Algorithm for Video Flows need not grow exponentially with it [36]. For instance, we might have a 100-dimensional state space where only one of the dimensions happens to affect the target function. Kan-erva coding is a memory-based function approximator that allows us to deal with the complexity of a reasonable approximation of the target function, and not the dimen-sionality of the state space [36]. It was originally presented in the context of sparse distributed memory [52]. Like other memory-based techniques, it has nice memory re-quirement and generalization properties, and also allows dynamic memory allocation. The strength of Kanerva coding increases with the number of its memory locations (i.e., prototypes), which can be set independent of the dimensionality of the state space. The more the number of prototypes, the more the complexity of the target function that it can approximate. Kanerva coding is successfully applied in the context of R L in [53] and [54]. In our scheduling problem, we use Kanerva coding to approximate the function Q: fi —• R, which gives the value of a feature vector. The prototypes of Kanerva coding are feature vectors in Q. The value of any feature vector in f2 is then approximated using the values of those prototypes. In order to do so, a similarity metric should be defined. We define the distance of two feature vectors x,z e fi as the number of unequal features in the two vectors, i.e., dist(x,z) = Y^fiiU{Xj^Zj} [54]. Furthermore, we define the similarity of two feature vectors x and z as follows [55]: A(x,z)= { l - ^ ^ l , dist(x,z)</3 0, otherwise 77 Chapter 4. Scheduling Algorithm for Video Flows where 3 is the activation radius. In other words, two feature vectors have no similarity if they have more than 3 different features. Let be the /eth prototype in Kanerva coding. To calculate the value of a feature vector x, we first determine the set of prototypes that are activated by x, which are similar to it. We denote this set by Hx and formally define it as Hx = {k | Afe > 0}, where A*, = A(x, h )^ is the similarity between x and hfc. Let be the value corresponding to h^. Then the value of x is given by Q(x) = ^ . (4.12) Thus, the function Q(-) is a linear combination of the value of prototypes. We now give the update procedure that should be used in place of equation (2.4). Using the standard gradient descent algorithm [36] for linear approximation, the update equation becomes [55]: Wi<—Wi + p-, • | [r(s, a) - Vp + max Q(/(s', a'))] - Q(/(s, a)) j, Vz G Hx. (4.13) Since in our formulation, the immediate reward does not depend on s', we have used r(s, a) instead of r(s, a, s'). The distribution of prototypes in the state space is crucial for the approximation accuracy. In our problem, it is not efficient to cover the whole state space, because the scheduler does not necessarily explore all of it. Instead, we use a dynamic prototype addition method, which covers the vicinity of the trajectory that the scheduler explores within the state space. We follow the approach given in [55]. The dynamic addition algorithm starts with an empty set of prototypes, and builds up over time. If the set is too 78 Chapter 4. Scheduling Algorithm for Video Flows sparse around an encountered feature vector, enough prototypes are added. Furthermore, the prototypes should not be too close (similar) to each other. Therefore, the following condition is imposed on the similarity of prototypes: for any pair of memory locations hj and hj. K is the minimum number of prototypes that we aim to be activated for a feature vector. In other words, we want to have at least K prototypes in a sphere of radius B centered at any encountered feature vector. As a result, K and B determine the minimum density of prototypes along the scheduler's trajectory in the state space. The more the density, the more prototypes we have, and the more accurate the approximation will be. The dynamic prototype addition algorithm has the following rules: Rule 1: If fewer than K prototypes are activated by the input feature vector x, add x to the set of prototypes, if its addition does not violate the condition (4.14). Rule 2: If after applying Rule 1, the number of active prototypes (i.e., |#x|) is K0 < K, then (K — K0) prototypes are randomly added within the neighborhood of x. The neigh-borhood of x is the set of all feature vectors that are similar to it, i.e., {z | A(x, z) > 0}. Condition (4.14) is enforced when adding new prototypes. The value of a newly added prototype is set to the value of the function Q evaluated at that prototype according to equation (4.12). For discussion on how random selection of prototypes can give good approximation, (4.14) 79 Chapter 4. Scheduling Algorithm for Video Flows see [56]. Successful application of this algorithm to R L tasks can be found in [53]. 4.3.3 RL Scheduler Since we use average reward optimality criterion, S M A R T [38] is a suitable model-free learning algorithm to find the optimal gain policy. Using SMART, we determine the scheduling policy that achieves the minimum average DFR. We call this scheduler as the RL scheduler. The pseudo-code of the RL scheduler is given in Figure 4.1. At superframe n = 0, i.e., when the R L scheduler begins operating, cumulative reward CR and average reward p are both set to 0. The cumulative time CT is the duration over which C R is accumulated. We approximate the Q-value of actions with Kanerva coding. The set of prototypes of Kanerva coding is also empty at the beginning. Af-ter initialization, the RL scheduler makes scheduling decisions at the beginning of each superframe based on the following procedure. It first determines the exploration proba-bility and the learning rate based on Darken-Chang-Moody (DCM) search-then-converge procedure [57]. Using D C M method, the exploration probability and learning rate at su-perframe n, i.e., qn and pn, are given by q0/[l + ( ^ ) ] and p0/[l + ij^)], respectively. The parameters qo, qr, p,Q, and pr are constants. Afterwards, with probability 1 — qn, the R L scheduler chooses the greedy action; with probability qn, a random exploratory action (i.e., any action other than the greedy one) is chosen. In Step 3, the RL scheduler executes the chosen action and calculates the immediate reward (see equation (4.7)) and the next system state s'. In step 4, the scheduler updates the set of prototypes according 80 Chapter 4. Scheduling Algorithm for Video Flows Initialize the superframe number n = 0, cumulative reward CR = 0, cumulative time CT = 0, and the average reward p = 0. In addition, the set of prototypes is initialized as the empty set. The Superframe size is rj. Suppose that the system starts in state s. while n < MAX.STEPS do 1. Calculate exploration probability qn and learning rate \xn using the DCM method. 2. With probability 1 — qn, choose the greedy action a € As that maximizes Q(/(s,a)); otherwise, choose a random exploratory action from the set {As \ a}. 3. Execute the chosen action. Let the system state at the next superframe be s', and the immediate received reward be r(s,a). 4. Apply Rule 1 and Rule 2 of Section 4.3.2 to the feature vector x = /(s,a). 5. Update the weight of the prototypes according to: Wi <- Wi + Hn^ ^ . {[r(s,a) - r?p +maxQ(/(s', a'))] - Q(/(s,a))l, Vi € Hx 6. if a greedy action was chosen in Step 2, then Update CT <-CT + m Update CR<-CR + r(s, a) and p «- CR/CT; else go to Step (2). 7. Go to the next superframe, i.e., update n <— n + 1 and s <— s'. Figure 4.1: Pseudo-code of the RL scheduler. to Rule 1 and Rule 2 of Section 4.3.2. The weights of the prototypes are updated in Step 5. This update is the same as equation (4.13). In step 6, the cumulative reward and average reward are updated only if a greedy action was chosen. Finally, in step 7, the scheduler goes to the next superframe and repeat the same procedure again. Figure 4.2 illustrates the structure of our proposed R L scheduler. 81 Chapter 4. Scheduling Algorithm for Video Flows Find action Identify Next set As state s superframe £ Aggregation and feature extraction Retrieve the value of all actions Choose greedy or exploratory action based on qn Action value representation Action value update Figure 4.2: Structure of the R L scheduler. 4 . 4 Simulation Results In Section 4.2, we presented an M D P formulation for finding the scheduling policy with minimum average DFR. In Section 4.3, we described a few simplification and approx-imation steps to make the problem solvable by RL techniques in practical time. As a result, we cannot make any rigorous claim on the optimality of the final solution given by the S M A R T algorithm. Nevertheless, we show through simulation that the final policy, hereafter referred to as the R L scheduler, has a significantly lower D F R than F-SRPT, EDD+SRPT, and PAP schedulers. In addition, we show that the R L scheduler is near-optimal in one special case. The simulator was implemented using M A T L A B . In our simulations, the first iteration (or simulation run) corresponds to a 500 s long scenario with F video flows and with start time separation <f>. In the subsequent iterations, the scheduler performs in the same scenario, but this time it can do a better job based on its experiences from the previous iterations. With superframe size n = 8 ms, each iteration would consist of 8 ^ - 3 = 62,500 superframes. The parameters 82 Chapter 4. Scheduling Algorithm for Video Flows qo and po of the S M A R T algorithm, depend on how fast the scheduler can learn the optimal policy. The more the value of these parameters, the longer the scheduler should search and explore to find the optimal policy. In simulations, we set qo = fio = 0.1 and qr — pr = 10 1 0, as it gives the RL scheduler enough time to explore and find the optimal policy without too many iterations. We end the simulation when the learning rate \in and the exploration probability qn fall below 0.005. Hence, there is no need to explicitly set MAX_STEPS in Figure 4.1. Like the parameters of D C M , the parameters of dynamic prototype addition for Kanerva coding can be found with experimentation. We found that 8 — 3 and K = 8 provides a sufficient prototype density for a reasonable approximation of the value-function corresponding to the feature vectors. The rest of the simulation parameters are the same as in Chapter 3 and are summarized in Table 3.1. As mentioned in the previous chapter, the start time separation of the flows in the system can greatly affect the average DFR. The special case of interest is when cf> — 0, i.e., when all the flows in the system start at the same time. As a result, at the beginning of every superframe, all the frames in the system have the same deadline and type. Therefore, the deadline and type information is of no use for the scheduler. The scheduler has as much as one inter-arrival time to schedule the frames that arrive together. Obviously, the optimal scheduler in this case must leave the least possible number of frames behind at the end of this period. As mentioned earlier in Section 2.4.2, SRPT scheduler has this property, and is the optimal scheduler for <fi — 0. Consequently, we know exactly the optimal policy for = 0, and can compare our R L scheduler 83 Chapter 4. Scheduling Algorithm for Video Flows 0.6j—1 1 1 1 1 1 1 r Number of Video Flows Figure 4 .3 : Comparison of R L scheduler and optimal scheduler for <f> = 0. RL scheduler is nearly optimal in this case. with the optimal scheduler. Figure 4.3 shows that RL scheduler is nearly optimal for all the simulated values of F and achieves the minimum DFR. For <f> = 0, we do not expect any improvement beyond the performance of the optimal scheduler. However, for other values of <f>, we cannot pursue the same approach, as the optimal scheduling policy is not known. Instead, we compare the performance of the R L schedulers with the other three schedulers (F-SRPT, EDD+SRPT, PAP). Similar to the approach in Chapter 3, the subsequent results given for average D F R are the mean of the results for 8 ms < <f> < 30 ms. Figure 4.4 shows this effect on F-SRPT, EDD+SRPT, and PAP schedulers with FDA. As one can see, for some values of 4> PAP has the smallest average DFR, while for some other values E D D + S R P T has the smallest average DFR. On the other hand, the R L 84 Chapter 4. Scheduling Algorithm for Video Flows 0.2 rr 0.15 LL Q CD CO CO L _ CD < 0.1 h 0.05 I 1 1 -£-PAP with FDA - B - F - S R P T with FDA -e-EDD+SRPT with FDA -«-RL 1 1 1 10 25 30 15 20 Time Separation [ms] F i g u r e 4.4: Effect of time separation on average D F R for F = 9. The RL scheduler has the smallest average D F R for 0 > 8 ms. scheduler performs better than the other three for all values of (f>, and is less sensitive to <t>. Figure 4.5 illustrates the average D F R for all four schedulers, including the RL sched-uler. Similar to previous results, the R L scheduler has the smallest average D F R for all values of F. It has up to 42%, 49%, and 53% less average D F R when compared to EDD+SRPT, PAP, and F-SRPT schedulers, respectively. Table 4.4 shows that reduction of the average D F R can also be translated to system capacity enhancement. Suppose that the acceptable user perceived quality is equivalent to the average D F R being less than 5%. Thus, the capacity of the system can be defined as the number of video flows that can be admitted to the system, while the average D F R is less than the maximum allowable value of 5%. Using this definition, the system capacity 85 Chapter 4. Scheduling Algorithm for Video Flows 0.2 0.15 LT LL. o CD 0.1 JO o3 > < 0.05 P A P with FDA -e- F - S R P T with FDA •e-EDD+SRPT with FDA -*-R|_ 8 9 Figure 4.5: Comparison of R L scheduler and other schedulers. 5 6 7 Number of Video Flows is 7 flows for the conventional schedulers, as opposed to 8 flows for the R L scheduler. Consequently, in this example, the RL scheduler increases the system capacity by 14.3%. As mentioned in Section 4.2.5, the policy gain and D F R have a linear relationship. We can verify the validity of equation (4.10) as follows. First, the estimated average Number of Flows F 6 7 8 9 10 Minimum Average D F R (%) 1.5 4.0 7.2 10.7 14.4 R L Average D F R (%) 0.9 2.3 4.4 7.2 10.5 Absolute Reduction (%) 0.6 1.7 2.8 3.4 3.9 Relative Reduction (%) 42 42 38 32 27 Table 4.1: Quantitative comparison among R L scheduler and conventional schedulers. The minimum average D F R given by F-SRPT, EDD+SRPT, and PAP is used for comparison. 86 Chapter 4. Scheduling Algorithm for Video Flows 0.085 0.06 i i i -B-Exact Average DFR -*- Estimated Average DFR 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 Iteration Number Figure 4.6: Comparison of the exact and the estimated average D F R (F = 9). D F R is calculated by substituting the policy gain p in equation (4.10). R L schedulers calculates the value of p as part of its algorithm. Second, the exact average D F R is measured by counting the number of scheduled frames. Figure 4.6 compares these two values. As one can see, D F R in equation (4.10) under-estimates the exact average DFR, because the gain is only updated when the scheduler takes a greedy action. However, both greedy and exploratory actions affect the exact average DFR. Over time, with more iterations, as the R L scheduler learns the optimal policy and the exploration probability decays, the exact and estimated average D F R converge together. This result verifies the fact that the optimal gain policy yields the minimum average DFR. 87 Chapter 4. Scheduling Algorithm for Video Flows 4.5 Summary In this chapter, we presented an M D P framework for finding the optimal scheduling algorithm with respect to average DFR. However, due to curse of modeling and curse of dimensionality, we appealed toward RL techniques. A model-free RL algorithm, called SMART, was used to overcome the modeling issue. We further used afterstates and features to reduce the dimensions of the state space and represent it in a more compact and suitable form. Moreover, Kanerva coding along with a dynamic prototype addition algorithm was used to the approximate the value-function of feature vectors. The final policy, i.e., the RL scheduler, yields a lower D F R and higher system capacity compared to its conventional counterparts. 88 Chapter 5 Conclusions and Future Work In this thesis, we examined the problem of scheduling video flows in high-rate WPANs. Our aim was to improve QoS measured by low DFR, since we have shown that D F R is more appropriate than JFR. Our contributions in this thesis can be summarized as follows: • We proposed a simple and efficient F D A technique for using the information about frame type at the M A C layer. Using the F D A technique, the scheduler can de-termine whether the receiver of a video frame has all the required information to decode that frame. This technique minimizes the channel time wastage caused by scheduling undecodable frames, and consequently, improves the QoS. • The F D A technique is independent of the scheduler and can be used along with dif-ferent scheduling algorithms. Simulation results showed that when F D A technique is applied to the F-SRPT [1] and E D D + S R P T [2] schedulers, there are up to 61% and 60% reduction in DFR, respectively. • We formulated the scheduling of video flows in high-rate WPANs as an M D P prob-lem. This model incorporates the decodability information extracted by the F D A 89 Chapter 5. Conclusions and Future Work technique. It also takes into account the number and pattern of video flows, and their hierarchical structure. The solution to this M D P problem is the optimal scheduling policy that minimizes the average DFR. • Using compact state space representation and function approximation, we simplified the M D P problem in order to solve it with RL. Our proposed RL scheduler is the solution given by an RL technique called SMART. Simulation results showed that the RL scheduler reduces the average D F R by 42%, 49%, and 53% when compared to E D D + S R P T [2], PAP [3], and F-SRPT [1] schedulers, respectively. 5.1 Future Work Given that most of the current work in the literature related to scheduling in WPANs are either based on impulsed U W B or do not use application layer information, our work can pave the way for many future works. Due to deadlock in the standardization process of the physical layer for high-rate WPANs in I E E E 802.15 Task Group 3a (TG3a), the supporters of each scheme have developed their own standards for M A C . More specifically, the standard that is envisioned to be dominantly used for M A C in high-rate WPANs is ECMA-368 [58], which is partly similar to the I E E E 802.15.3 standard for M A C . Yet, because of some major differences, extending our work in the context of ECMA-368 M A C is a potential future work. With the variety of applications that a high-rate W P A N should be able to support, 90 Chapter 5. Conclusions and Future Work it will serve different classes of traffic with different QoS requirements. We focused on the video traffic class in this thesis. Therefore, integrating the F D A technique and/or the RL scheduler with a more versatile scheduler that can handle and recognize different traffic types can be a possible extension to our work. Moreover, the issue of fairness among the scheduled video flows is also a critical point, which is not considered in our work. A possible future work is to account for fairness by formulating the problem as an M D P with constraints that enforce the fairness criteria. Using fuzzy logic for function approximation can also be of interest. 9 1 Bibliography [1] R. Mangharam, M . Demirhan, R. Rajkumar, and D. Raychaudhuri, "Size matters: size-based scheduling for MPEG-4 over wireless channels," in Proc. of SPIE/ACM Multimedia Computing and Networking (MMCN), Santa Clara, C A , Jan. 2004, pp. 110-122. [2] A. Torok, L. Vajda, A. Vidacs, and R. Vida, "Techniques to improve scheduling performance in I E E E 802.15.3 based ad hoc networks," in Proc. of IEEE Globecom, St. Louis, M O , Nov. 2005. [3] S. M . Kim and Y . J. Cho, "Scheduling scheme for providing QoS to real-time mul-timedia traffics in high-rate wireless PANs," IEEE Transactions on Consumer Elec-tronics, vol. 51, no. 4, pp. 1159-1168, Nov. 2005. [4] "IEEE P802.15-03/268r3, Multi-Band O F D M Physical Layer Proposal for I E E E 802.15 Task Group 3a, Mar. 2004. Project: I E E E P802.15 Working Group for Wire-less Personal Area Networks (WPANs)." [5] "IEEE Std 802.15.3-2003: Wireless medium access control (MAC) and physical layer (PHY) specifications for high rate wireless personal area networks (WPANs)," Sept. 2003. 92 Bibliography [6] S. Moradi and V. W. S. Wong, "Technique to improve MPEG-4 traffic schedulers in I E E E 802.15.3 WPANs," in Proc. of IEEE International Conference on Communi-cations (ICC), Glasgow, Scotland, June 2007. [7] S. Moradi, V . W. S. Wong, and A. H. Mohsenian, "A novel scheduling algorithm for video traffic in high-rate WPANs," submitted to IEEE Global Telecommunications Conference (GLOBECOM), Nov. 2007. [8] X. Gu and L. Taylor, "Ultra-wideband and its capabilities," vol. 21, no. 3, pp. 56-66, July 2003. [9] R. Cardinali, L. D. Nardis, M . G. Benedetto, and P. Lombardo, "UWB ranging accuracy in high- and low-data-rate applications," IEEE Transactions on Microwave Theory and Techniques, vol. 54, no. 4, pp. 1865-1875, June 2006. [10] G. R. Aiello and G. D. Rogerson, "Ultra-wideband wireless systems," IEEE Mi-crowave Magazine, vol. 4, no. 2, pp. 36-47, June 2003. [11] X. Shen, W. Zhuang, H. Jiang, and J. Cai, "Medium access control in ultra-wideband wireless networks," IEEE Transactions on Vehicular Technology, vol. 54, no. 5, pp. 1663-1677, Sept. 2005. [12] A. Batra, J . Balakrishnan, G. R. Aiello, J . R. Foerster, and A. Dabak, "Design of a multiband O F D M system for realistic U W B channel environments," IEEE Trans-actions on Microwave Theory and Techniques, vol. 52, no. 9, pp. 2123-2138, Sept. 2004. 93 Bibliography [13] J. H. Reed, An introduction to ultra wideband communication systems. Prentice Hall, Apr. 2005. [14] "MPEG-4 part 2: Visual (IS 14496-2), doc. N2502," Oct. 1998. [15] A. Ziviani, B. E . Wolfinger, J. F. Rezende, O. C. Duarte, and S. Fdida, "Joint adoption of QoS schemes for M P E G streams," Multimedia Tools and Applications, vol. 26, no. 1, pp. 59-80, May 2005. [16] C. Y. Zou and Z. Haas, "Optimal resource allocation for U W B wireless ad hoc networks," in Proc. of IEEE 16th International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC), Berlin, Germany, Sept. 2005, pp. 452-456. [17] J. Y . LeBoudec, R. Merz, B. Radunovic, and J. Widmer, "A M A C protocol for U W B very low power mobile ad hoc networks based on dynamic channel coding with interference mitigation," EPFL-DI-ICA, Tech. Rep. IC/2004/02, Jan. 2004. [18] B. Radunovic and J. Y . L. Boudec, "Optimal power control, scheduling, and routing in U W B networks," IEEE Journal on Selected Areas in Communications, vol. 22, no. 7, pp. 1252-1270, Sept. 2004. [19] Y . C. Chu and A. Ganz, "Joint scheduling and resource control for QoS support in UWB-based wireless networks," in Proc. of IEEE Military Communications Confer-ence (MILCOM), Monterey, C A , Nov. 2004, pp. 1100-1106. 94 Bibliography [20] J. Shen, I. Nikolaidis, and J. Harms, "Dynamic 802.15.3 W P A N scheduling us-ing maximal matching," in Proc. of IEEE Global Telecommunications Conference (GLOBECOM), San Francisco, C A , Nov. 2006. [21] F. Cuomo, C. Martello, A. Baiocchi, and F. Capriotti, "Radio resource sharing for ad hoc networking with UWB," IEEE Journal on Selected Areas in Communications, vol. 20, no. 9, pp. 1722-1732, Dec. 2002. [22] K. H. Liu, L. Cai, and X. Shen, "Performance enhancement of medium access con-trol for U W B WPAN," in Proc. of IEEE Global Telecommunications Conference (GLOBECOM), San Francisco, C A , Nov. 2006. [23] H. Jiang, K. H. Liu, and X. S. W. Zhuang, "Distributed medium access control in pulse-based time-hopping U W B wireless networks," in Proc. of IEEE Global Telecommunications Conference (GLOBECOM), San Francisco, C A , Nov. 2006. [24] L. Schrage, "Proof of the optimality of the shortest remaining processing time dis-cipline," Operations Research, vol. 16, pp. 678-690, 1968. [25] X. Liu, Q. H. Dai, and Q. F. Wu, "Scheduling algorithms analysis for MPEG-4 traffic in UWB," in Proc. of IEEE Vehicular Technology Conference (VTC-Fall), Los Angeles, C A , Sept. 2004, pp. 5310-5314. [26] S. S. Panwar, D. Towsley, and J. K. Wolf, "Optimal scheduling policies for a class of queues with customer deadlines to the beginning of service," Journal of the ACM, vol. 35, no. 4, pp. 832-844, 1988. 95 Bibliography [27] E . Kwon, D. Hwang, and J. Lim, "An idle timeslot reuse scheme for I E E E 802.15.3 high-rate wireless personal area networks," in Proc. of IEEE Vehicular Technology Conference (VTC-Fall), Dallas, T X , Sept. 2005, pp. 715-719. [28] R. Zeng and G. S. Kuo, "A novel scheduling scheme and M A C enhancements for I E E E 802.15.3 high-rate WPAN," in Proc. of IEEE Wireless Communications and Networking Conference (WCNC), New Orleans, C A , Mar. 2005, pp. 2478-2483. [29] S. H. Rhee, K. Chung, Y . Kim, W. Yoon, and K. S. Chang, "An application-aware M A C scheme for I E E E 802.15.3 high-rate WPAN," in Proc. of IEEE Wireless Com-munications and Networking Conference (WCNC), Atlanta, G A , Mar. 2004, pp. 1018-1023. [30] X. Chen, Y . Xiao, Y. Cai, J. Lu, and Z. Zhou, "An energy Diffserv and application-aware M A C layer scheduling for multiple V B R video streaming over high-rate WPANs," Elsevier Computer Communications, vol. 29, no. 17, pp. 3516-3526, Nov. 2006. [31] Y. Xiao, X. Shen, and H. Jiang, "Optimal A C K mechanisms of the I E E E 802.15.3 M A C for ultra-wideband systems," IEEE Journal on Selected Areas in Communi-cations, vol. 24, no. 4, pp. 836-842, Apr. 2006. [32] G. Bianchi, "IEEE 802.11-saturation throughput analysis," IEEE Communications Letters, vol. 2, no. 12, pp. 318-320, Dec. 1998. 96 Bibliography [33] X. Liu, Q. H. Dai, and Q. F. Wu, "Adaptive delayed acknowledgement algorithm for MPEG-4 traffic in U W B networks," Tsinghua Science and Technology, vol. 11, no. 3, pp. 278-286, June 2006. [34] M . L. Puterman, Markov decision processes: discrete stochastic dynamic program-ming. Wiley-Interscience, 1994. [35] L. P. Kaelbling, M . L . Littman, and A. P. Moore, "Reinforcement learning: A survey," Journal of Artificial Intelligence Research, vol. 4, pp. 237-285, 1996. [36] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cam-bridge, MA: MIT Press, 1998. [37] A. Gosavi, "Reinforcement learning for long-run average cost," European Journal of Operational Research, vol. 155, pp. 654-674, 2004. [38] T. K. Das, A. Gosavi, S. Mahadevan, and N. Marchalleck, "Solving semi-markov decision problems using average reward reinforcement learning," Journal of Man-agement Science, vol. 45, no. 4, pp. 560-574, 1999. [39] H. Robbins and S. Monro, "A stochastic approximation method," Annals Mathe-matical Statistics, vol. 22, pp. 400-407, 1951. [40] A. Gosavi, N. Bandla, and T. K. Das, "A reinforcement learning approach to a single leg airline revenue management problem with multiple fare classes and overbooking," HE Transactions, vol. 34, pp. 729-742, 2002. 97 Bibliography [41] F. Yu, V. Wong, and V. Leung, "A new QoS provisioning method for adaptive multimedia in cellular wireless networks," in Proc. of IEEE Infocom, Hong Kong, China, Mar. 2004, pp. 2130-2141. [42] A. Gosavi, "An algorithm for solving semi-markov decision problems using rein-forcement learning: convergence analysis and numerical results," Ph.D. dissertation, University of South Florida, 1999. [43] D. E . Rumelhart and J . L . McClelland, Parallel Distributed Processing: Explorations in the micro structures of cognition. Cambridge, M A : MIT Press, 1986, vol. 1. [44] C. Y . Liao, F. Yu, V. Leung, and C. J. Chang, "Reinforcement-learning-based self-organization for cell configuration in multimedia mobile networks," European Trans. Telecommunications, vol. 16, pp. 385-397, 2005. [45] J. S. Albus, Brains, Behavior, and Robotics. Peterborough, New Hampshire: B Y T E Books, Subsidiary of McGraw Hill, 1981. [46] R. Sutton, "Generalization in reinforcement learning: Successful examples using sparse coarse coding," Advances in Neural Information Processing Systems, vol. 8, pp. 1038-1044, 1996. [47] A. W. Moore and C. G. Atkeson, "The parti-game algorithm for variable resolution reinforcement learning in multidimensional state-spaces," Machine Learning, vol. 21, no. 3, pp. 199-233, 1995. 98 Bibliography [48] J. C. Santamaria, R. S. Sutton, and A. Ram, "Experiments with reinforcement learning in problems with continuous state and action spaces," Adaptive Behavior, vol. 6, no. 2, pp. 163-218, 1998. [49] J. Gross, J. Klaue, H. Karl, and A. Wolisz, "Cross-layer optimization of O F D M transmission systems for MPEG-4 video streaming," Computer Communications: Applications and Services in Wireless Networks, vol. 27, no. 11, pp. 1044-1055, July 2004. [50] F. Fitzek and M . Reisslein, "MPEG-4 and H.263 video traces for network perfor-mance evaluation," IEEE Network, vol. 15, pp. 40-54, 2001. [51] J. M . Gimenez-Guzman, J. Martinez-Bauset, and V. Pla, "An afterstates reinforce-ment learning approach to optimize admission control in mobile cellular networks," Lecture Notes in Computer Science: Wireless Systems and Network Architectures in Next Generation Internet, vol. 3883, pp. 115-129, 2006. [52] P. Kanerva, Sparse distributed memory. Cambridge, MA: MIT Press, 1988. [53] B. Ratitch, S. Mahadevan, and D. Precup, "Sparse distributed memories in reinforce-ment learning: Case studies," in Proc. of the Workshop on Learning and Planning in Markov Processes - Advances and Challenges, San Jose, C A , July 2004, pp. 85-90. [54] K. Kostiadis and H. Hu, "KaBaGe-RL: Kanerva-based generalisation and reinforce-ment learning for possession football," in Proc. of IEEE/RSJ International Confer-ence on Intelligent Robots and Systems, Maui, HI, Nov. 2001, pp. 292-297. 99 Bibliography [55] B. Ratitch and D. Precup, "Sparse distributed memories for on-line value-based reinforcement learning," in Proc. of the European Conference on Machine Learning (ECML), Pisa, Italy, Sept. 2004, pp. 347-358. [56] R. S. Sutton and S. D. Whitehead, "Online learning with random representations," in Proc. of the International Conference on Machine Learning, Amherst, M A , June 1993, pp. 314-321. [57] C. Darken, J. Chang, and J. Moody, "Learning rate schedules for faster stochas-tic gradient search," in Proc. of IEEE Workshop on Neural Networks for Signal Processing, Copenhagen, Denmark, Sept. 1992. [58] "ECMA-368 standard: High rate ultra wideband P H Y and M A C standard," Dec. 2005. 100 

Cite

Citation Scheme:

        

Citations by CSL (citeproc-js)

Usage Statistics

Share

Embed

Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                        
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            src="{[{embed.src}]}"
                            data-item="{[{embed.item}]}"
                            data-collection="{[{embed.collection}]}"
                            data-metadata="{[{embed.showMetadata}]}"
                            data-width="{[{embed.width}]}"
                            data-media="{[{embed.selectedMedia}]}"
                            async >
                            </script>
                            </div>
                        
                    
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:
https://iiif.library.ubc.ca/presentation/dsp.831.1-0101142/manifest

Comment

Related Items