UBC Theses and Dissertations


H.265/HEVC video transmission over 4G cellular networks Jassal, Aman 2016



H.265/HEVC video transmission over 4G cellular networks

by

Aman Jassal

Dipl.Ing., Ecole Supérieure d'Ingénieurs en Informatique et Génie des Télécommunications, 2008

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

MASTER OF APPLIED SCIENCE

in

The Faculty of Graduate and Postdoctoral Studies
(Electrical and Computer Engineering)

THE UNIVERSITY OF BRITISH COLUMBIA
(Vancouver)

January 2016

© Aman Jassal 2016

Abstract

Long Term Evolution has been standardized by the 3GPP consortium since 2008, with 3GPP Release 12 being the latest iteration of LTE Advanced, which was finalized in March 2015. High Efficiency Video Coding has been standardized by the Moving Picture Experts Group since 2012 and is the video compression technology targeted to deliver High-Definition video content to users. With video traffic projected to represent the lion's share of mobile data traffic in the next few years, providing video and non-video users with high Quality of Experience is key to designing 4G systems and future 5G systems.

In this thesis, we present a cross-layer scheduling framework which delivers video content to video users by exploiting encoding features used by the High Efficiency Video Coding standard such as coding structures and motion compensated prediction. We determine which frames are referenced the most within the coded video bitstream to determine which frames have higher utility for the High Efficiency Video Coding decoder located at the user's device and evaluate the performances of best effort and video users in 4G networks using finite buffer traffic models. We look into throughput performance for best effort users and packet loss performance for video users to assess Quality of Experience.
Our results demonstrate that there is significant potential to improve the Quality of Experience of best effort and video users using our proposed Frame Reference Aware Proportional Fair scheme compared to the baseline Proportional Fair scheme.

Preface

I hereby declare that I am the author of this thesis. This thesis is an original, unpublished work under the supervision of Dr. Cyril Leung. In this work, I played the primary role in designing and performing the research, doing data analysis and preparing the manuscript under the supervision of Dr. Cyril Leung.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
List of Acronyms
Acknowledgements
Dedication
1 Introduction
2 Basics of H.265/HEVC
  2.1 Syntax Structures and Syntax Elements
  2.2 Coding Structures and Reference Picture Lists
    2.2.1 Coding Structures
    2.2.2 Reference Picture Lists
  2.3 Motion Compensated Prediction
  2.4 Operation with Networking Layers
3 Cross-Layer Frame Reference Aware Scheduling Framework
  3.1 Mathematical Formulation of the Shared Resource Allocation Problem
  3.2 Solution to the proposed Shared Resource Allocation Problem
4 System Model
  4.1 H.265/HEVC Video Content Generation
  4.2 LTE-Advanced System Model
    4.2.1 Network Model
    4.2.2 Traffic Model
    4.2.3 Channel Model
    4.2.4 Feedback Model
5 Simulation Results and Analysis
  5.1 Simulation Assumptions
  5.2 Simulation Results and Discussion
    5.2.1 Results for video users
    5.2.2 Results for Best Effort users
6 Conclusions and Future Work
  6.1 Contributions
  6.2 Future Work
Bibliography

List of Tables

2.1 Generic NAL unit syntax, adapted from [3]
2.2 Reference Picture Sets for the Hierarchical-B Coding Structure of GOP-size 8
2.3 Reference Picture Lists for the Hierarchical-B Coding Structure of GOP-size 8
4.1 H.265/HEVC Table of Video Test Sequences
4.2 H.265/HEVC Parameters
4.3 FTP Traffic Model 1
4.4 H.265/HEVC Traffic Model
5.1 LTE-Advanced Parameters
5.2 Offered Load and corresponding Resource Utilization

List of Figures

2.1 Frame dependencies in the reference coding structure
2.2 Uni- and bi-predictive inter-prediction illustration from adjacent pictures, adapted from [4]
2.3 RTP Single NAL unit packet structure
2.4 H.265/HEVC system layer stack
4.1 Hexagonal Network Grid Layout
4.2 Wrap Around of Hexagonal Network
4.3 LTE Downlink PRB allocation illustration
5.1 Video users' active download time
5.2 Satisfied Video User Percentage
5.3 CRA LDU Loss Ratio
5.4 Average throughput for Best Effort users
5.5 Coverage throughput for Best Effort users
5.6 Illustration of the outer 10% of the coverage area
5.7 Average BE user throughput in Cell-Edge region

List of Acronyms

3GPP Third Generation Partnership Project.
ADT Active Download Time.
BE Best Effort.
CB Coding Block.
CDF Cumulative Distribution Function.
CQI Channel Quality Indicator.
CSI Channel State Information.
CVS Coded Video Sequence.
DASH Dynamic Adaptive Streaming over HTTP.
EESM Exponential Effective SNR Mapping.
FDD Frequency Division Duplex.
GOP Group of Pictures.
H.264/AVC Advanced Video Coding.
H.265/HEVC High Efficiency Video Coding.
HTTP Hypertext Transfer Protocol.
IETF Internet Engineering Task Force.
IP Internet Protocol.
ITU-R International Telecommunications Union Radiocommunications Sector.
JCT-VC Joint Collaborative Team on Video Coding.
KPIs Key Performance Indicators.
LDU Logical Data Unit.
LTE Long Term Evolution.
LTE-A LTE Advanced.
MANE Media Aware Network Element.
MIESM Mutual Information Effective SNR Metric.
MIMO Multiple Input Multiple Output.
MOS Mean Opinion Score.
MPEG Moving Picture Experts Group.
MU-MIMO Multi User Multiple Input Multiple Output.
NAL Network Abstraction Layer.
NGMN Next Generation Mobile Networks.
OFDMA Orthogonal Frequency Division Multiple Access.
OSI Open Systems Interconnection.
PB Prediction Block.
PLR Packet Loss Ratio.
PMI Precoding Matrix Indicator.
POC Picture Order Count.
PRB Physical Resource Block.
QAM Quadrature Amplitude Modulation.
QoE Quality of Experience.
QoS Quality of Service.
QPSK Quaternary Phase Shift Keying.
RBSP Raw Byte Sequence Payload.
RI Rank Indication.
RTP Real-time Transport Protocol.
RU Resource Utilization.
SINR Signal to Interference and Noise Ratio.
SNR Signal to Noise Ratio.
SRST Single RTP stream on a single media transport.
SU-MIMO Single User Multiple Input Multiple Output.
TCP Transmission Control Protocol.
UDP User Datagram Protocol.
UMTS Universal Mobile Telecommunications System.
VCL Video Coding Layer.
Wi-Fi Wireless Fidelity.

Acknowledgements

I would like to take this opportunity to express my utmost gratitude and sincerest thanks to my supervisor, Dr. Cyril Leung, who has given me great support, encouragement and guidance throughout my work and my M.A.Sc. program. My discussions with him were a constant source of inspiration and his insights helped make this research work more valuable. Without his invaluable knowledge and understanding in this research area, this thesis would have never been possible.

I would also like to thank Dr. Ahmed Saadani for his guidance and support throughout my engineering program and at Orange Labs where he gave me the opportunity to do research work on 4G systems. My former colleagues, Mr. Sebastien Jeux and Dr.
Sofia Martinez Lopez, and more generally all the research community involved in research and standardization with the 3GPP, have had a great influence on me and without their inspiration I would have never undertaken my program at the University of British Columbia.

All of the work that has been done in this thesis was supported in part by the Natural Sciences and Engineering Research Council (NSERC) of Canada under Grant RGPIN 1731-2013.

Dedication

To my parents and my sister

Chapter 1
Introduction

With the emergence of Long Term Evolution (LTE) and its subsequent iterations standardized by the Third Generation Partnership Project (3GPP) consortium, video services are fast becoming the dominant data services in 4G mobile networks and mobile video traffic is projected to account for 72% of the total mobile data traffic by 2019 [1]. The transmission of video services over cellular networks is challenging due to the large bandwidth requirement, the low latency required due to protocol stack inter-operation and the effect of error propagation within the video sequence in the event of packet losses. The current dominant standard for video coding is Advanced Video Coding (H.264/AVC) [2] and is used to deliver a wide range of video services. However, H.264/AVC requires extremely high bandwidth, making the delivery of High-Definition (HD) video services impractical. Its successor, High Efficiency Video Coding (H.265/HEVC) [3], was standardized by the Moving Picture Experts Group (MPEG) in 2012 and is expected to reduce the bit rate compared to H.264 High Profile by about 50% while maintaining comparable subjective quality [4]. Therefore H.265/HEVC is a more practical choice for delivering HD and Ultra High-Definition (UHD) video content to consumers using wired and wireless networks.
As we move towards 5G, one of the key targets that we need to achieve is to provide a more consistent user experience across the whole network as well as higher Quality of Experience (QoE) [5]. Cross-layer QoE-aware resource allocation schemes have been proposed for Orthogonal Frequency Division Multiple Access (OFDMA) systems [6], where the scheduling algorithm uses the Mean Opinion Score (MOS) as its measure of QoE. Other attributes that the research community has been focusing on in order to improve the QoE of video users are the playback buffer status and the rebuffering time [7]-[8]. One of the limitations of these works is their reliance on video traces that were generated for low-definition video sequences encoded using H.264/AVC, which are not representative of the targets that 5G networks are supposed to satisfy: 5G networks are aimed at delivering HD or UHD video services anywhere, anytime. Other works have considered H.265/HEVC video streaming over Wi-Fi wireless networks and shown that the QoE of video sequences, reflected through the use of MOS, is very sensitive to network impairments such as packet losses. Nightingale et al. [9] assumed that packet losses are random; however, in cellular networks this assumption is rarely valid, as the combination of traffic load, the characteristics of the video sequence and the individual user's link quality dictates the overall performance that can be achieved.

In this thesis, we focus on the use-case of H.265/HEVC video transmission over 4G networks. Existing works have not used the compression properties of H.265/HEVC, specifically in terms of exploiting the temporal inter-dependence between frames within coding structures, or evaluated how well video services can be delivered in 4G/beyond-4G networks with dynamic user arrivals.
We use performance evaluation methodologies based on Key Performance Indicators (KPIs) that have been recommended by the Next Generation Mobile Networks (NGMN) Alliance for 5G networks [5].

The main novel contributions of this thesis are as follows:

1. The definition of a cross-layer scheduling framework exploiting frame referencing to deliver video content
2. The evaluation of capacity for the delivery of H.265/HEVC video services over beyond-4G networks
3. The joint assessment of the QoE of video users and Best Effort users

The remainder of this thesis is organized as follows. Chapter 2 outlines the basics of the H.265/HEVC standard that are relevant to this work. Chapter 3 presents the proposed cross-layer scheduling framework for video content transmission. The simulation model is presented in Chapter 4. Simulation results, analysis and discussions are provided in Chapter 5. Conclusions and future work are presented in Chapter 6.

Chapter 2
Basics of H.265/HEVC

In this chapter, we describe the features of the H.265/HEVC standard that are directly relevant to this thesis and to the problem formulation that will be presented and developed in Chapter 3. Specifically, we present the high-level syntax used to represent the video data, the motion prediction techniques used for video compression and the coding structures and reference picture lists used to perform the motion compensated prediction task in H.265/HEVC [3]. The main point to understand is that the encoder knows the specifics of the coding structure and has to provide the decoder with the information needed to reconstitute it. This is done by using a given coding order (which is implicitly embedded in the way LDUs are ordered) and by using Reference Picture Sets and Reference Picture Lists (the former are explicitly transmitted and the latter are derived during the decoding process).
In this chapter we will explain how all of these features work.

2.1 Syntax Structures and Syntax Elements

H.265/HEVC uses so-called syntax structures to represent the encoded video data. An H.265/HEVC encoder generates syntax structures encapsulated inside logical data units called Network Abstraction Layer (NAL) units. An H.265/HEVC decoder decapsulates NAL units and consumes syntax structures to reconstitute a given picture (in this thesis, we use the terms "picture" and "frame" interchangeably). The sequence of NAL units can be viewed as a text written in a specific language with a syntax and semantics that the decoder can read and understand. The syntax is the set of words the decoder knows and the semantics tells the decoder how the syntax is to be used. The information conveyed by the combination of the syntax and the semantics is recovered through the decoding process, which is fully specified in [3].

Table 2.1: Generic NAL unit syntax, adapted from [3]

    nal_unit(NumBytesInNalUnit) {
        forbidden_zero_bit
        nal_unit_type
        nuh_layer_id
        nuh_temporal_id_plus1
        NumBytesInRbsp = 0
        for (i = 2; i < NumBytesInNalUnit; i++)
            if (i + 2 < NumBytesInNalUnit && next_bits(24) == 0x000003) {
                rbsp_byte[NumBytesInRbsp++]
                rbsp_byte[NumBytesInRbsp++]
                i += 2
                emulation_prevention_three_byte  /* equal to 0x03 */
            } else
                rbsp_byte[NumBytesInRbsp++]
    }

Table 2.1 illustrates the syntax structure of a generic NAL unit and the syntax elements it carries; the syntax elements are set in bold in the original table. Syntax elements have associated descriptors which are used for parsing purposes, but these are not covered in this thesis and the interested reader is invited to refer to [4] (Chapter 5) for more details. Every NAL unit carries NumBytesInNalUnit bytes, which break down into a 16-bit header made of 4 syntax elements and a payload, the Raw Byte Sequence Payload (RBSP) data structure, carrying NumBytesInRbsp bytes.
The first syntax element is the forbidden zero bit (forbidden_zero_bit), which occupies 1 bit. The second syntax element is nal_unit_type, which is written over 6 bits and carries the type of the RBSP contained in the NAL unit. The values that it can take are specified in Table 7-1 of [3]. NAL unit types are either Video Coding Layer (VCL) or non-VCL: VCL types comprise all NAL units that contain coded video data whereas non-VCL types contain parameter information. The third syntax element is the layer identifier, nuh_layer_id, which is written over 6 bits. Its value is always 0, although other values can be specified by future recommendations of ITU-T that relate to future scalable or 3D video coding extensions of [3]. The fourth and final syntax element of the header is the temporal identifier, nuh_temporal_id_plus1, which is written over 3 bits. Its value is typically 1, which means that there is only one temporal layer. We assume that this is the case throughout the thesis. The temporal identifier for the NAL unit, TemporalID, is obtained as:

    TemporalID = nuh_temporal_id_plus1 - 1    (2.1)

The payload of NAL units is the RBSP, denoted as the rbsp_byte syntax element, where rbsp_byte contains NumBytesInRbsp bytes and rbsp_byte[i] is the ith byte of the RBSP. Because there are various types of NAL units, the RBSP itself can be viewed as a syntax structure carrying syntax elements. For each nal_unit_type, the H.265/HEVC standard provides the description of the associated syntax structure. For instance, the RBSP of a Video Parameter Set has a dedicated syntax structure (Section [3]), the RBSP of a Clean Random Access NAL unit has a dedicated syntax structure further broken into a slice segment header, a slice segment data and trailing bits (Section of [3]), etc.
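The header layout and the payload loop of Table 2.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `NalHeader`, `parse_nal_header` and `extract_rbsp` are hypothetical helpers introduced here, and the sketch ignores everything outside the generic NAL unit syntax (for example, it does not interpret the RBSP).

```python
from typing import NamedTuple


class NalHeader(NamedTuple):
    """The four syntax elements of the 16-bit NAL unit header."""
    forbidden_zero_bit: int
    nal_unit_type: int
    nuh_layer_id: int
    nuh_temporal_id_plus1: int

    @property
    def temporal_id(self) -> int:
        # Equation (2.1): TemporalID = nuh_temporal_id_plus1 - 1
        return self.nuh_temporal_id_plus1 - 1


def parse_nal_header(nal: bytes) -> NalHeader:
    """Split the first two bytes of a NAL unit into 1 + 6 + 6 + 3 bits."""
    if len(nal) < 2:
        raise ValueError("a NAL unit needs at least its 2-byte header")
    word = (nal[0] << 8) | nal[1]
    return NalHeader(
        forbidden_zero_bit=(word >> 15) & 0x1,  # 1 bit
        nal_unit_type=(word >> 9) & 0x3F,       # 6 bits
        nuh_layer_id=(word >> 3) & 0x3F,        # 6 bits
        nuh_temporal_id_plus1=word & 0x7,       # 3 bits
    )


def extract_rbsp(nal: bytes) -> bytes:
    """Follow the loop of Table 2.1: copy the payload bytes and skip every
    0x03 byte that directly follows two 0x00 bytes."""
    rbsp = bytearray()
    i, n = 2, len(nal)  # the payload starts after the 2-byte header
    while i < n:
        if i + 2 < n and nal[i:i + 3] == b"\x00\x00\x03":
            rbsp += nal[i:i + 2]  # keep the two zero bytes
            i += 3                # drop the third byte (0x03)
        else:
            rbsp.append(nal[i])
            i += 1
    return bytes(rbsp)
```

For example, the header bytes 0x2A 0x01 decode to nal_unit_type 21, nuh_layer_id 0 and nuh_temporal_id_plus1 1, which gives TemporalID 0.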
In order to guarantee that the start code prefix that delimits NAL units in a byte stream cannot be emulated inside a NAL unit payload, the H.265/HEVC standard inserts dedicated bytes called emulation_prevention_three_byte, each equal to 0x03. During the decoding process, these bytes are discarded. In this thesis, we assume that a bitstream is only made of generic VCL NAL units and from this point onwards, a NAL unit will be referred to as a Logical Data Unit (LDU).

2.2 Coding Structures and Reference Picture Lists

An H.265/HEVC bitstream is made up of several entities called Coded Video Sequences (CVSs). A CVS is the coded representation of a sequence of pictures which can be decoded using pictures within that sequence. Similarly, a coded picture is the coded representation of a picture, which typically consists of multiple LDUs. A coded picture is embedded in a so-called access unit which contains all the LDUs associated with that picture. In this section we will present some of the tools used by the H.265/HEVC standard for motion compensated prediction: coding structures and reference picture lists.

2.2.1 Coding Structures

H.265/HEVC relies on temporal coding structures to perform its video compression task. A coding structure designates a set of consecutive pictures with clearly defined dependencies between pictures and a given coding order. The purpose of having pictures depend on others is prediction, which can be done from one picture or two pictures (called uni-prediction and bi-prediction respectively). Coding structures define a coding order, which is different from the output order: the coding order is the order in which pictures are encoded while the output order is the order in which pictures are displayed on the screen. Because of this difference, the H.265/HEVC standard uses a Picture Order Count (POC) to uniquely identify a given picture in output order.
From this point onwards and for the sake of convenience, we will refer to the picture whose POC is equal to n as poc_n.

The definition of a coding structure bears a strong similarity to that of a Group of Pictures (GOP) in H.264/AVC. In earlier video compression standards such as H.264/AVC, a GOP designates a set of consecutive pictures with clearly defined dependencies where the first picture is an intra-coded picture (or equivalently an I-Frame). The difference between a GOP and a coding structure is that the first picture in a coding structure does not have to be an I-Frame. When the pictures that belong to a coding structure only reference other pictures within that coding structure for prediction purposes, the coding structure is called a closed GOP. The H.265/HEVC standard also allows cases where a picture within a coding structure references a picture from another coding structure, in which case the coding structure is called an open GOP. Throughout this chapter, we will use the hierarchical-B coding structure that was used by the Joint Collaborative Team on Video Coding (JCT-VC) for the Main Profile Random Access encoder configuration as described in [10]. All figures and tables will refer to that specific coding structure. For simplicity, throughout the remainder of this thesis, we will refer to this coding structure simply as the reference coding structure.

Figure 2.1: Frame dependencies in the reference coding structure.

Fig. 2.1 depicts four illustrations of frame dependencies in the reference coding structure. Referenced pictures are denoted by a (*) and arrows point from the referenced picture to all of its directly dependent pictures. Dependent pictures can be either before or after the referenced picture in display order. The reference coding structure is actually an open GOP coding structure and by design it operates with a GOP size of 8.
We can see the open side of the reference coding structure in Fig. 2.1 in the examples where poc_0, poc_4 and poc_6 are the referenced pictures. They are referenced by pictures beyond the GOP size: poc_0, poc_4 and poc_6 are all referenced by poc_16. The reference coding structure uses I-Frames and B-Frames. The coding order of this coding structure is defined as {poc_n, poc_{n-4}, poc_{n-6}, poc_{n-7}, poc_{n-5}, poc_{n-2}, poc_{n-3}, poc_{n-1}}. poc_0 is a special case and constitutes a GOP on its own since there are no pictures before poc_0. Using this definition, we can easily identify that after poc_0, the next GOP is comprised of {poc_8, poc_4, poc_2, poc_1, poc_3, poc_6, poc_5, poc_7}. The reference coding structure is then applied periodically on the succeeding pictures throughout the video sequence. The encoder can change the coding structure if it yields better performance but we assume that it remains unchanged throughout the encoding of a video sequence. The decoder at the receiver side will extract the information regarding the referenced pictures from Reference Picture Lists, which we describe in the next section.

2.2.2 Reference Picture Lists

Table 2.2: Reference Picture Sets for the Hierarchical-B Coding Structure of GOP-size 8

    Reference Picture Set   Reference POCs
    0                       poc_{n-8}, poc_{n-10}, poc_{n-12}, poc_{n-16}
    1                       poc_{n-4}, poc_{n-6}, poc_{n+4}
    2                       poc_{n-2}, poc_{n-4}, poc_{n+2}, poc_{n+6}
    3                       poc_{n-1}, poc_{n+1}, poc_{n+3}, poc_{n+7}
    4                       poc_{n-1}, poc_{n-3}, poc_{n+1}, poc_{n+5}
    5                       poc_{n-2}, poc_{n-4}, poc_{n-6}, poc_{n+2}
    6                       poc_{n-1}, poc_{n-5}, poc_{n+1}, poc_{n+3}
    7                       poc_{n-1}, poc_{n-3}, poc_{n-7}, poc_{n+1}

Coding structures specify the coding order and the dependencies between a given set of pictures. The decoder does not have any knowledge about the coding structure that was used by the encoder; it must derive this information from the LDUs that carry the encoded video data.
In this section, we explain how the encoder transmits the information regarding the dependencies between pictures.

At the receiver end, as a picture gets decoded, it is either displayed on the screen or stored in the Decoded Picture Buffer until it is eventually output. Any picture located in the Decoded Picture Buffer can be reused as reference for prediction. Pictures that are available for inter prediction are listed in a so-called Reference Picture Set. The Reference Picture Set is sent in the Sequence Parameter Set and each picture indexed in there is explicitly identified using its POC value. Table 2.2 lists the different Reference Picture Sets defined for the reference coding structure that was used by the JCT-VC for the Main Profile Random Access encoder configuration as described in [10]. Eight Reference Picture Sets are defined and, for a given picture poc_n, the corresponding referenced POCs are given. Since poc_0 is the first picture of a video sequence, there can be no negative POC; therefore, if a picture poc_i with i < 0 were to appear in a Reference Picture Set, it would simply not be included.

The LDUs of a given picture carry a header that specifies which Reference Picture Set to activate. H.265/HEVC uses two Reference Picture Lists for inter prediction, called List0 and List1. The decoder reconstructs these lists from the Reference Picture Sets that were supplied in the Sequence Parameter Set and this process is specified in Section 8.3.4 of [3]. The main difference between a Reference Picture Set and a Reference Picture List is that a Reference Picture List is a subset of the Reference Picture Set which is actually used for inter prediction. For uni-predicted frames (P-Frames) only List0 is activated while for bi-predicted frames (B-Frames) both List0 and List1 are activated. Motion compensated prediction is then performed using the activated lists.
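The coding order of Section 2.2.1 and the Reference Picture Sets of Table 2.2 are both expressed as offsets from the current POC n, so they are easy to tabulate. The sketch below is illustrative only: the names `RPS_OFFSETS`, `gop_coding_order` and `reference_picture_set` are hypothetical, and the only rule it applies beyond the table is the one stated above, namely that pictures with a negative POC are left out.

```python
# Offsets relative to the POC n of the current picture, copied from Table 2.2.
RPS_OFFSETS = {
    0: (-8, -10, -12, -16),
    1: (-4, -6, 4),
    2: (-2, -4, 2, 6),
    3: (-1, 1, 3, 7),
    4: (-1, -3, 1, 5),
    5: (-2, -4, -6, 2),
    6: (-1, -5, 1, 3),
    7: (-1, -3, -7, 1),
}

# Coding order inside one GOP, as offsets from its highest POC n (Section 2.2.1).
CODING_ORDER_OFFSETS = (0, -4, -6, -7, -5, -2, -3, -1)


def gop_coding_order(n):
    """POCs of the GOP whose highest POC is n, listed in coding order."""
    return [n + offset for offset in CODING_ORDER_OFFSETS]


def reference_picture_set(n, rps_index):
    """POCs in the given Reference Picture Set for poc_n, without negative POCs."""
    return [n + offset for offset in RPS_OFFSETS[rps_index] if n + offset >= 0]
```

With these helpers, `gop_coding_order(8)` gives the GOP {8, 4, 2, 1, 3, 6, 5, 7} quoted earlier, and `reference_picture_set(8, 0)` keeps only poc_0 because the three other candidates would have negative POCs.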
The resulting prediction can be made either from one picture only or from a combination of pictures. Using these lists, the hierarchy between pictures can be recovered. Table 2.3 lists the Reference Picture Lists for the hierarchical-B coding structure of size 8 that was used by the JCT-VC for the Main Profile Random Access encoder configuration as described in [10]. This is the reference coding structure that we use throughout this thesis for all our video sequences. For each picture, we provide the Reference Picture Set that is used and the POCs of the pictures in the Reference Picture Lists. The first picture of a coded video sequence is usually an I-Frame and I-Frames do not use inter prediction. Therefore it does not have any associated Reference Picture Set and its associated Reference Picture Lists are empty. poc_8 and poc_16 both use the same Reference Picture Set; however, for poc_8, three of the pictures do not exist, therefore poc_8 only references poc_0. By combining the information in Table 2.2 and Table 2.3, one can easily reconstitute the direct dependencies that we illustrated earlier in Fig. 2.1 for the reference coding structure.

Table 2.3: Reference Picture Lists for the Hierarchical-B Coding Structure of GOP-size 8

    POC   RPS used   List0 POCs    List1 POCs
    0     -          N/A           N/A
    8     0          0             0
    4     1          0, 8          8, 0
    2     2          0, 4          4, 8
    1     3          0, 2          2, 4
    3     4          2, 0          4, 8
    6     5          4, 2          8, 4
    5     6          4, 0          6, 8
    7     7          6, 4          8, 6
    16    0          8, 6, 4, 0    8, 6, 4, 0
    12    1          8, 6          16, 8
    10    2          8, 6          12, 16
    9     3          8, 10         10, 12
    ...   ...        ...           ...

2.3 Motion Compensated Prediction

There are two types of prediction used in video compression: Intra(-frame) Prediction and Inter(-frame) Prediction. Intra prediction is used for intra-coded frames (I-Frames) whereas inter prediction is used for all other frames, which can be uni-predicted frames (P-Frames) or bi-predicted frames (B-Frames). Inter prediction in H.265/HEVC relies on Motion Compensated Prediction in order to perform efficient compression.
The main idea behind inter prediction is that a given picture uses another picture as reference, searches for the block in that reference picture that best matches the predicted area and encodes the information of the motion of that block between both pictures. In H.265/HEVC, a given picture may use one or two pictures as reference for inter prediction; this is achieved using the coding structures that we introduced in Section 2.2.1. Fig. 2.2 illustrates the concept of uni-predictive and bi-predictive inter prediction. In the figure, the current picture poc performs uni-prediction from picture poc - 2 and bi-prediction from its adjacent pictures poc - 1 and poc + 1. Note that bi-prediction does not require the pictures to be adjacent to poc: one CB from poc uses poc - 2 and poc - 1 for bi-prediction.

Figure 2.2: Uni- and bi-predictive inter-prediction illustration from adjacent pictures, adapted from [4]

The H.265/HEVC standard operates on a block basis. The most basic block used in H.265/HEVC is called a Coding Block (CB). Each picture is partitioned into multiple CBs. Each CB is further partitioned into smaller blocks called Prediction Blocks (PBs). After the picture has been partitioned into PBs, the encoder performs prediction on a PB basis from the reference pictures whose POCs are given in the Reference Picture Lists. For each PB, the encoder searches the reference pictures for the area that best matches the PB according to a rate-distortion criterion. Once it finds the area with the lowest rate-distortion cost, it encodes the information of the shift as the tuple of the motion vector and the reference picture's POC. The motion vector is the shift between the area corresponding to the PB and the area in the reference picture which presented the lowest rate-distortion cost.
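To make the search step concrete, here is a minimal block-matching sketch. It is not the encoder's actual algorithm: it swaps the rate-distortion criterion for a plain sum of absolute differences (SAD), searches whole-pixel positions exhaustively in a single reference picture, and represents frames as lists of rows of luma samples. The names `sad`, `crop` and `motion_search` are hypothetical.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def crop(frame, top, left, size):
    """Return the size x size block whose top-left sample is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def motion_search(current, reference, top, left, size, search_range):
    """Exhaustively search the reference frame around (top, left).

    Returns ((dx, dy), cost) for the candidate with the lowest SAD, where
    (dx, dy) is the horizontal and vertical shift of the best match.
    """
    target = crop(current, top, left, size)
    height, width = len(reference), len(reference[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall partly outside the reference frame.
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue
            cost = sad(target, crop(reference, y, x, size))
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    return best[1], best[0]
```

A real encoder would add the cost of signalling the motion vector to the distortion term and would typically use a faster search pattern; the exhaustive loop is kept here only because it is the easiest form to read.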
The basic idea behind rate-distortion optimization is that the encoder looks for the coding mode that offers the best trade-off between the loss of video quality, i.e. the distortion, and the bit rate required to encode that area, i.e. the rate. It is beyond the scope of this thesis to delve into rate-distortion algorithms and their specifics and the interested reader is invited to refer to [11] and to [4] (Chapter 2) for more details on the application of rate-distortion in video compression.

2.4 Operation with Networking Layers

Video compression techniques such as H.264/AVC and H.265/HEVC operate at the Application layer, which sits at the highest level in the Open Systems Interconnection (OSI) model [12]. The encoder generates LDUs which are then sent to the lower layers for transmission over packetized networks based on the Internet Protocol (IP). One of the commonly used solutions for delivering video content over IP networks is to use the Real-time Transport Protocol (RTP). The Internet Engineering Task Force (IETF) has formulated RFC 6184, which details the operation of RTP for delivering H.264/AVC content [13]. Similarly, the IETF has formulated a draft RFC for the operation of RTP for delivering H.265/HEVC content [14]. We will look into the specifics of RTP operation for delivering H.265/HEVC content. In this thesis, we assume that for all users we have a Single RTP stream on a single media transport (SRST) and all LDUs are sent in RTP packets that use the Single NAL unit packet structure. Fig. 2.3 shows the structure of such an RTP packet. The PayloadHdr field is the bit-exact copy of the LDU header. The DONL field is optional and carries the 16 least significant bits of the Decoding Order Number; we assume that this field does not exist. The NAL unit payload data field is the payload of the LDU. The last field is also optional and is included for the purpose of padding. We assume that all RTP packets have a padding field occupying 10 bytes.
Given that the RTP specification for H.265/HEVC is still at draft level at the time of writing, we allow ourselves to make some modifications and introduce a new field in the Single NAL unit packet structure: the RefCount field. Since the encoder knows exactly what coding structure is used to compress a video sequence, it can also keep track of the number of times a given picture is referenced within the video sequence and propagate that information to the RTP packets. We assume that the RefCount field occupies 2 bytes.

Figure 2.3: RTP Single NAL unit packet structure

For live streaming services, RTP is used in conjunction with the User Datagram Protocol (UDP) to supply packets to IP. Another solution that has been developed by MPEG for buffered streaming services is Dynamic Adaptive Streaming over HTTP (DASH). DASH performs video streaming over the Hypertext Transfer Protocol (HTTP) using adaptive bit rate and is codec-agnostic. Since this solution is based on HTTP, packets are supplied to IP using the Transmission Control Protocol (TCP). IP packets can then be supplied to different wireless access technologies, such as LTE or Wireless Fidelity (Wi-Fi). Fig. 2.4 gives an illustration of how the protocol stacks are set up. In this thesis, we focus on delivering video streaming services to cellular users. We assume the use of RTP and UDP to supply packets over IP, using the modified Single NAL unit packet structure for the RTP payload, and LTE-A as the air interface.

Figure 2.4: H.265/HEVC system layer stack

Chapter 3
Cross-Layer Frame Reference Aware Scheduling Framework

In the previous chapter, we presented some of the features of the H.265/HEVC standard that are relevant for video compression. We presented coding structures, syntax structures and syntax elements, which are used to encode video content.
We also presented motion compensated prediction for more bandwidth-efficient encoding and reference picture lists for helping the decoder track which pictures to use as reference when doing motion prediction. Building on these features, we define a cross-layer scheduling framework that delivers video content according to the dependencies between frames. In this chapter, we propose a mathematical formulation of the shared resource allocation problem for delivering video content and derive the optimal solution to this problem.

3.1 Mathematical Formulation of the Shared Resource Allocation Problem

Let $S$ be the set of users actively sharing resources. Consider a user $k$ and let the channel capacity of user $k$ for time-slot $n$ be denoted by $C_k(n)$. Kelly [15] has provided a mathematical formulation of the shared resource allocation problem, which has been widely used by the research community for tackling rate control problems in communication networks. This shared resource allocation problem, which we will call SRAP, is formulated as the following constrained optimization problem and solved at the beginning of every time-slot $n$.

SRAP:
\[ \text{maximize} \quad F(\vec{r}(n)) \triangleq \sum_{k \in S} U_k(r_k(n)) \tag{3.1} \]
\[ \text{subject to} \quad r_k(n) < C_k(n), \quad r_k(n) \geq 0, \quad k \in S \tag{3.2} \]

$F$ is the objective function that we are trying to maximize, $U_k(r_k(n))$ denotes the utility function of user $k$ and $r_k(n)$ is the average throughput of user $k$ up to time-slot $n$. Constraint (3.2) ensures that the rate of the user does not exceed the channel capacity $C_k(n)$ that user $k$ is experiencing during time-slot $n$. Under the assumptions that the objective function $F$ in (3.1) is strictly concave and differentiable and that the feasible region in (3.2) is compact, we know from Nonlinear Programming Theory [16] that an optimal solution exists for SRAP, and Kelly has provided an explicit optimal solution to this problem using Lagrangian methods [15].
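As a quick check of the concavity assumption, consider the logarithmic utility $U_k(r_k) = \log(r_k)$, which is the choice used later in this chapter. The short derivation below is only an illustration for that particular utility; SRAP itself admits any utility for which the assumption holds.

```latex
\begin{align*}
F(\vec{r}) &= \sum_{k \in S} \log(r_k), \\
\frac{\partial F}{\partial r_k} &= \frac{1}{r_k} > 0, \qquad
\frac{\partial^2 F}{\partial r_k^2} = -\frac{1}{r_k^2} < 0, \qquad
\frac{\partial^2 F}{\partial r_k \, \partial r_j} = 0 \quad (j \neq k).
\end{align*}
```

The Hessian is therefore diagonal with strictly negative entries, so $F$ is strictly concave and differentiable wherever $r_k > 0$ for all $k \in S$.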
In wireless networks, the channel capacity and the number of users actively sharing resources vary with time. This is due to the random nature of the wireless channel and of the network's traffic. As a result, the optimal solution to SRAP also varies with time. Hosein [17] proposed a solution to SRAP by observing that finding the optimal solution consists in finding the user which maximizes the gradient of the objective function. Hosein developed his solution by introducing update equations that use exponential smoothing filters to keep track of each user's throughput:
\[
r_k(n+1) =
\begin{cases}
\left(1 - \frac{1}{\tau}\right) r_k(n) + \frac{d_k(n)}{\tau} & \text{if user } k \text{ is served}, \\
\left(1 - \frac{1}{\tau}\right) r_k(n) & \text{otherwise}.
\end{cases}
\tag{3.3}
\]
Here $d_k(n)$ is the throughput of user $k$ estimated for time-slot $n$ in bits per second, $\tau > 1$ is the time constant of the exponential smoothing filter and $r_k(n)$ is the average throughput of user $k$ up to time-slot $n$. Because the objective function is strictly concave, Hosein showed that all we need to find is the direction, i.e. the user, which maximizes the gradient of the objective function. If we denote this user as user $k^*$ then
\[ k^* = \arg\max_{k \in S} \{ \nabla F(\vec{r}) \}. \tag{3.4} \]
As an example, if the utility function $U_k$ of each user $k$ is defined as the logarithmic function of the rate of that user, $\log(r_k)$, then the maximum gradient direction, i.e. the user maximizing the gradient function, is given by:
\[ k^* = \arg\max_{k \in S} \left\{ \frac{d_k(n)}{r_k(n)} \right\} \tag{3.5} \]
(3.5) is the well-known Proportional Fair metric, widely used for scheduling in cellular networks such as Universal Mobile Telecommunications System (UMTS) and LTE. An alternate way of finding this result is as follows². The utility function $U_k$ of each user $k$ is defined as the logarithmic function of the rate of user $k$ and we know how the rate of each user is computed.
Let us assume that user $i$ is selected at time-slot $n$. The new utility value will be
\[ \sum_{k \in S,\, k \neq i} \log\left((1 - \tau^{-1})\, r_k(n)\right) + \log\left((1 - \tau^{-1})\, r_i(n) + \tau^{-1} d_i(n)\right). \tag{3.6} \]
By adding and subtracting $\log((1 - \tau^{-1})\, r_i(n))$ in Eq. (3.6), the sum can be taken over all users and Eq. (3.6) becomes
\[ \sum_{k \in S} \log\left((1 - \tau^{-1})\, r_k(n)\right) + \log\left( \frac{(1 - \tau^{-1})\, r_i(n) + \tau^{-1} d_i(n)}{(1 - \tau^{-1})\, r_i(n)} \right). \tag{3.7} \]
After some simplifications, Eq. (3.7) boils down to
\[ \sum_{k \in S} \log\left((1 - \tau^{-1})\, r_k(n)\right) + \log\left( 1 + \frac{1}{\tau - 1} \frac{d_i(n)}{r_i(n)} \right). \tag{3.8} \]
The first sum in Eq. (3.8) does not depend on the selected user and the logarithm is increasing, so the overall utility is maximized if user $i$ maximizes $d_i(n)/r_i(n)$, which is the Proportional Fair metric.

Hosein [17] also proposed the use of barrier methods in order to account for Quality of Service (QoS) constraints. In nonlinear programming, barrier methods are used on optimization problems in order to force the solutions to remain in the interior of the feasible region. An alternative to barrier methods is penalty methods, which force the solutions to remain in a certain area of the feasible region by imposing large penalties on solutions that lie outside of that area. In this thesis, we propose to use barrier functions in order to deliver video content by exploiting frame references. For a detailed discussion of penalty and barrier methods, the interested reader is invited to refer to Chapter 13 of [16].

² The author of this simple and elegant proof is Dr. Cyril Leung.

In order to deliver video content, we extend the formulation of SRAP to account for frame reference awareness and call this new problem SRAP-FRA. We introduce a new constraint on the frame reference count of user $k$, $c_k(n)$, to account for the fact that the network does not hold transmission queues of infinite size. This also prevents the scenario where a video user watches an infinitely long video sequence. This aspect is modelled through Finite-Buffer traffic models, which will be discussed in greater detail in Chapter 4.
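As a numerical aside on the Proportional Fair result in (3.3) to (3.8), the sketch below picks the user with the largest ratio $d_k(n)/r_k(n)$ and compares it with a brute-force search for the user whose selection yields the largest sum of logarithmic utilities after the update (3.3). The rates, the time constant and the function names are made-up values and hypothetical helpers chosen for illustration; they are not taken from the simulations in this thesis.

```python
import math
from typing import List


def pf_select(d: List[float], r: List[float]) -> int:
    """Index of the user maximising d_k / r_k, i.e. the rule in (3.5)."""
    return max(range(len(d)), key=lambda k: d[k] / r[k])


def update_rates(r: List[float], d: List[float], served: int, tau: float) -> List[float]:
    """Exponentially smoothed throughput update from (3.3)."""
    return [
        (1.0 - 1.0 / tau) * r[k] + (d[k] / tau if k == served else 0.0)
        for k in range(len(r))
    ]


def sum_log_utility(r: List[float]) -> float:
    """Objective (3.1) with U_k = log(r_k)."""
    return sum(math.log(x) for x in r)


# Illustrative values in bits per second (assumed, not from the thesis).
d = [4.0e6, 1.0e6, 2.5e6]   # estimated throughput d_k(n) if served now
r = [2.0e6, 0.4e6, 2.0e6]   # average throughput r_k(n) so far
tau = 100.0                 # smoothing time constant, must exceed 1

pf_user = pf_select(d, r)
best_by_utility = max(
    range(len(d)),
    key=lambda i: sum_log_utility(update_rates(r, d, i, tau)),
)
print(pf_user, best_by_utility)
```

Both selections agree on the second user (index 1), whose ratio of 2.5 is the largest even though its instantaneous throughput is the smallest, which is the fairness behaviour the metric is known for.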
Just like SRAP, SRAP-FRA is also solved at the beginning of every time-slot $n$. The expression of SRAP-FRA is as follows.

SRAP-FRA:
\[ \text{maximize} \quad F(\vec{r}(n), \vec{c}(n)) \triangleq \sum_{k \in S} U_k(r_k(n), c_k(n)) \tag{3.9} \]
\[ \text{subject to} \quad r_k(n) < C_k(n), \quad r_k(n) \geq 0, \tag{3.10} \]
\[ \phantom{\text{subject to}} \quad c_k(n) < C'_k(n), \quad c_k(n) \geq 0. \tag{3.11} \]

$C'_k(n)$ is the constraint on the number of frame references the transmission queue of user $k$ can hold at any given time-slot $n$, $c_k(n)$ is the average number of frame references that user $k$ has been receiving up to time-slot $n$ and $U_k(r_k(n), c_k(n))$ is the combined utility function of user $k$ that we introduce for our frame reference aware scheduling framework. For our scheduling framework, we need to track, for each user $k$, whether its transmission queue holds any frame that is referenced within the video sequence that user is watching, and to base scheduling decisions on that information. Essentially, we are building a scheduling framework where users watching video content are sent the content that the decoder needs to perform its task as efficiently as possible while incurring as little playback delay as possible. To that end, we use barrier functions and express the combined utility function for each user $k$ as
\[ U_k(r_k(n), c_k(n)) = U_{k,1}(r_k(n)) + U_{k,2}(c_k(n)), \tag{3.12} \]
where
\[ U_{k,1}(r_k(n)) \triangleq \log(r_k(n)), \qquad U_{k,2}(c_k(n)) \triangleq -\lambda \exp\left(-\mu (c_k(n) - c_{\min})\right). \tag{3.13} \]
In (3.13), $U_{k,2}$ is a generalized expression of a barrier function, and $\lambda$ and $\mu$ are positive-valued parameters for adjusting the penalty for leaving the feasible region. Hosein [17] has proposed the use of such functions for delivering QoS, though there is no indication in the literature to suggest that this type of approach is the optimal way of accounting for QoE constraints. Other approaches and methodologies should definitely be investigated for addressing such issues. Our motivation for using a barrier function based
approach is to provide a simple scheduling framework.

In parallel to the update equation of the rate of user $k$, we also introduce an exponentially smoothed update equation for keeping track of the frame reference count of user $k$:
\[
c_k(n+1) =
\begin{cases}
\left(1 - \frac{1}{T}\right) c_k(n) + \frac{t_k(n)}{T} & \text{if user } k \text{ is served}, \\
\left(1 - \frac{1}{T}\right) c_k(n) & \text{otherwise},
\end{cases}
\tag{3.14}
\]
where $c_k(n)$ is the frame reference count of user $k$ at the beginning of time-slot $n$, $c_{\min}$ is the minimum number of frame references that we force the system to provide to each video user, $T > 1$ is the time constant of the exponential smoothing filter and $t_k(n)$ is the number of frame references being transmitted to user $k$ at time-slot $n$. Due to the assumptions that we made regarding the proposed combined utility function, the formulation of SRAP-FRA can be rewritten as

SRAP-FRA:
\[ \text{maximize} \quad F(\vec{r}(n), \vec{c}(n)) \triangleq \sum_{k \in S} \left( U_{k,1}(r_k(n)) + U_{k,2}(c_k(n)) \right) \tag{3.15} \]
\[ \text{subject to} \quad r_k(n) < C_k(n), \quad r_k(n) \geq 0, \tag{3.16} \]
\[ \phantom{\text{subject to}} \quad c_k(n) < C'_k(n), \quad c_k(n) \geq 0. \tag{3.17} \]

This is the cross-layer scheduling problem that we address in this thesis and for which we derive a solution in the following section.

3.2 Solution to the proposed Shared Resource Allocation Problem

In this section, we derive the solution to the proposed optimization problem SRAP-FRA (3.15). We need to find the user that maximizes the gradient of the objective function. Since we constructed our combined utility function as the sum of two separate utility functions (3.12), the gradient of the objective function splits into the gradient of $\sum_{k \in S} U_{k,1}(r_k(n))$ and the gradient of $\sum_{k \in S} U_{k,2}(c_k(n))$, which can be derived separately. We already know the solution for the sum of the first utility function $U_{k,1}(r_k(n))$: it is the Proportional Fair metric (3.5). We therefore focus on deriving the solution for the sum of the second utility function $U_{k,2}(c_k(n))$. Let us call $j$ the user selected to be served at time-slot $n$ by the network.
Let us call $\beta$ the parameter with which we parameterize the movement of the sum of the second utility functions in the direction of serving user $j$. This part of the objective function can then be written as:
\[
F_{j,2}(\beta) = U_{j,2}\left(c_j(n) + \beta\,(c_j(n+1) - c_j(n))\right) + \sum_{k \in S,\, k \neq j} U_{k,2}\left(c_k(n) + \beta\,(c_k(n+1) - c_k(n))\right)
\tag{3.18}
\]
User $j$ is served and all other users are not. Given the update equations of the frame reference count (3.14), (3.18) simplifies to
\[
F_{j,2}(\beta) = U_{j,2}\left(c_j(n) + \beta\,\frac{t_j(n) - c_j(n)}{T}\right) + \sum_{k \in S,\, k \neq j} U_{k,2}\left(c_k(n) - \beta\,\frac{c_k(n)}{T}\right).
\tag{3.19}
\]
Taking the partial derivative of $F_{j,2}$ with respect to $\beta$ and setting $\beta$ to 0, we get:
\[
\frac{\partial F_{j,2}}{\partial \beta} = \frac{t_j(n) - c_j(n)}{T}\, U'_{j,2}(c_j(n)) - \sum_{k \in S,\, k \neq j} \frac{c_k(n)}{T}\, U'_{k,2}(c_k(n)).
\tag{3.20}
\]
Eq. (3.20) can be rewritten as:
\[
\frac{\partial F_{j,2}}{\partial \beta} = \frac{t_j(n)}{T}\, U'_{j,2}(c_j(n)) - \sum_{k \in S} \frac{c_k(n)}{T}\, U'_{k,2}(c_k(n)).
\tag{3.21}
\]
Since we are looking to maximize $\partial F_{j,2} / \partial \beta$ over the choice of $j$, we can ignore the second term of (3.21), which is a sum over all users and therefore does not depend on which user is served. We also know the expression of $U_{k,2}$, so the expression of the maximum gradient direction is
\[
k^* = \arg\max_{k} \left\{ \frac{\lambda \mu}{T}\, t_k(n) \exp\left(-\mu (c_k(n) - c_{\min})\right) \right\}.
\tag{3.22}
\]
Essentially, this means that the system will maximize the utility of users by prioritizing the transmission of referenced frames ahead of unreferenced frames. As we saw in Section 2.2.1, there is a clear hierarchy in the way frames depend upon each other in video sequences. If a video user is provided with frames which the decoder can always decode, or if the decoder does not have to wait for other frames before being able to decode those frames, then video users can watch video sequences with no perceptible delay, which enhances their Quality of Experience. This sort of procedure helps counter error propagation within the video decoding process; therefore the proposed cross-layer scheduling framework can be seen as a form of error resilience.
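The selection rule (3.22) and the update (3.14) can be sketched in a few lines. The parameter values below are the ones reported for the simulations in Section 5.1; the per-user state is made up for illustration, and the function names are hypothetical. The sketch interprets $t_k(n)$ as the reference count carried by the frame that would be sent to user $k$, which is an assumption about how the quantity is obtained in practice.

```python
import math
from typing import List

# Parameter values reported in Section 5.1 of the thesis.
LAMBDA = 25.0   # barrier weight (lambda)
MU = 1.0        # barrier steepness (mu)
T_CONST = 25.0  # time constant T of the smoothing filter
C_MIN = 50.0    # minimum frame reference count c_min


def update_ref_count(c: float, t: float, served: bool) -> float:
    """Exponentially smoothed frame reference count, as in (3.14)."""
    decayed = (1.0 - 1.0 / T_CONST) * c
    return decayed + t / T_CONST if served else decayed


def fra_term(t: float, c: float) -> float:
    """Frame reference term that is maximised in (3.22)."""
    return (LAMBDA * MU / T_CONST) * t * math.exp(-MU * (c - C_MIN))


def select_user(t_all: List[float], c_all: List[float]) -> int:
    """Index of the user with the largest frame reference term."""
    return max(range(len(t_all)), key=lambda k: fra_term(t_all[k], c_all[k]))


# Made-up state for three video users at one time-slot.
t_now = [4, 0, 2]            # references carried by each user's pending frame
c_now = [50.0, 48.0, 49.0]   # smoothed frame reference counts c_k(n)

chosen = select_user(t_now, c_now)
c_next = [
    update_ref_count(c, t, k == chosen)
    for k, (c, t) in enumerate(zip(c_now, t_now))
]
print(chosen, c_next)
```

With these numbers the third user (index 2) is selected even though the first user's frame carries more references, because its smoothed count sits below $c_{\min}$ and the exponential term grows accordingly. The second user's pending frame is unreferenced, so its term is zero.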
Using (3.5), (3.12) and (3.22), the final expression of the metric for the proposed scheduling framework (3.15) can then be expressed as:
\[
\frac{d_k(n)}{r_k(n)} + \frac{\lambda \mu}{T}\, t_k(n) \exp\left(-\mu (c_k(n) - c_{\min})\right).
\tag{3.23}
\]
For the rest of this thesis, we shall refer to our proposed scheduling scheme as Frame Reference Aware Proportional Fair (FRA-PF).

Chapter 4
System Model

In this chapter, we describe our system model and simulation methodology for evaluating the performance of our proposed scheduling framework. Our evaluation methodology is centered upon using system-level simulations. In this chapter we will cover the components that are of utmost relevance to this thesis. More in-depth and complete descriptions of system-level simulation methodologies can be found in [18], [19] and [20].

4.1 H.265/HEVC Video Content Generation

Analytical traffic models have been proposed for near-real time video streaming in [18], where the packet sizes and packet inter-arrival times are based on truncated Pareto distributions. While this analytical model captures the variability in the packet sizes coming from the video source, it is agnostic to the specifics of the H.265/HEVC standard and therefore cannot be relied on for generating realistic video traffic. Moreover, our objective is to evaluate the application level experience of H.265/HEVC video users and to this end, we use HM 14.0 to generate video bitstreams [21]. We use different video test sequences which were used for development and testing purposes by MPEG: FourPeople, Johnny, KristenAndSara, SlideShow and SlideEditing. The characteristics of these video test sequences are given in Table 4.1.

Table 4.1: H.265/HEVC Table of Video Test Sequences

| Sequence | Sequence length (frames) | Frame rate (fps) | Resolution (px x px) |
|---|---|---|---|
| FourPeople | 600 | 60 | 1280x720 |
| Johnny | 600 | 60 | 1280x720 |
| KristenAndSara | 600 | 60 | 1280x720 |
| SlideShow | 200 | 20 | 1280x720 |
| SlideEditing | 300 | 30 | 1280x720 |
For each of these video sequences, we generate the corresponding bitstream and trace files using HM 14.0 [21], from which we extract the information of the Reference Picture Lists, as defined in Section 2.2.2, for all frames in order to determine the frame reference dependence structure.

For simplicity, we assume that each frame consists of only one slice segment (see Section 2.1), so that each frame is encoded inside one LDU. The GoP size is set to 8. The Intra-Period is defined as the interval, in frames, between two consecutive I-Frames and is always set so that an I-Frame can be found approximately every second. Its value depends on the frame rate of the video sequence: for a frame rate of {20, 24, 30, 50, 60} fps, the Intra-Period is set to {16, 24, 32, 48, 64} (respectively). Aside from I-Frames, we use B-Frames only. Using the bitstreams generated from the video sequences we selected, we create a custom Traffic Model for each video sequence and use it as input to our LTE-A simulator, which is described below. The H.265/HEVC parameters used to generate the bitstreams are summarized in Table 4.2. Other parameters needed to run HM 14.0 are left to their default values as in [10].

Table 4.2: H.265/HEVC Parameters

| High Efficiency Video Coding Parameter | Value |
|---|---|
| Video Sequence Length | 10 seconds |
| SliceMode | 0 |
| Coding Unit size | 64 pixels x 64 pixels |
| GoP size | 8 |
| Quantization Parameter | 32 |
| Frame Structure | IBB...BIBB...B |
| Decoding Refresh Type | Clean Random Access |

4.2 LTE-Advanced System Model

In order to evaluate the performance of our proposed scheduling framework, we use system-level simulations based on openWNS and IMTAphy [22]-[23]. The performance evaluation methodology is based on the simulation methodology described in Annex A of the 3GPP Technical Report 36.814 [19] and in the Evaluation Methodology Document of IEEE 802.16m [18]. In this section, we will describe some of the components and features that we use in our performance evaluation.
Evaluation methodologies based on system-level simulations require many components to capture aspects of the physical layer and the protocols implemented at the link layer.

4.2.1 Network Model

We consider a downlink LTE Advanced (LTE-A) system using Frequency Division Duplex (FDD) with N = 19 base stations. Each base station is assumed to have three sectors in order to provide coverage, thus there is a total of 57 sectors in the network. An illustration of the hexagonal grid layout is provided in Fig. 4.1.

Figure 4.1: Hexagonal Network Grid Layout

To ensure that all cells experience similar interference and that we accurately model the impact of outer cells, we implement a wrap-around technique. The full system is actually modelled as a network consisting of 7 clusters, where each cluster is made of N = 19 base stations. The central cluster is where the users are created and where all of the statistics are collected. Fig. 4.2 illustrates the concept of wrap-around. Virtual clusters are depicted in grey while the central cluster is depicted in white; the central base station of each cluster is depicted in yellow. The surrounding clusters are virtual clusters in the sense that no user is actually dropped there. All the cells in the virtual clusters are copies of the original cells in the central cluster. The virtual cells are identical to the original cells in terms of antenna configuration, traffic and fast-fading, the only difference being their location. Users are dropped independently at uniformly random locations in the central cluster.

Figure 4.2: Wrap Around of Hexagonal Network

For all base stations, we assume that each sector uses 4 transmit antennas and each user uses 2 receive antennas. This corresponds to a 4x2 Multiple Input Multiple Output (MIMO) system. The system bandwidth B is assumed to be 10 MHz. Resource Allocation
Type is assumed to be 0, i.e. we allocate groups of Physical Resource Blocks (PRBs) to users. For a system bandwidth of 10 MHz, the 3GPP standard specifies that users are allocated groups of 3 contiguous PRBs. Fig. 4.3 depicts a PRB allocation with 4 users in a system with 10 MHz of bandwidth. Note that at 10 MHz, the last group only contains 2 PRBs as the total number of PRBs at 10 MHz of bandwidth is 50.

Figure 4.3: LTE Downlink PRB allocation illustration

4.2.2 Traffic Model

We model two types of traffic: Best Effort (BE) traffic and video traffic. Each user is assigned BE or video traffic with probability 0.5 each. Usually users are assumed to be active for the entire duration of the simulation, i.e. they are created at the beginning of the simulation and dropped at the end of the simulation, as stated in [18]. In this thesis, we decided to use more realistic traffic models. Users are created at random time instants according to a Poisson distributed random process. Users remain in the network until they have completed their session or until they are dropped from the network. For the BE traffic model, we use FTP Traffic Model 1 defined in the 3GPP Technical Report [19], whose parameters are summarized in Table 4.3.

Table 4.3: FTP Traffic Model 1

| Parameter | Statistical Characterization |
|---|---|
| File size | 2 Megabytes |
| User arrival rate λbe | Poisson distributed process with rate λbe |
| Number of downloads | 1 (each user downloads a single file) |

Similarly, we define a traffic model for video users; in this thesis we use our own custom traffic model. Because we need information about frame reference dependencies, we turn to HM 14.0 to generate realistic video bitstreams for use in our performance evaluation. Section 4.1 covers the actual generation of the video bitstreams in more detail. We wrap the video bitstreams around six times as each bitstream individually carries 10 seconds' worth of video data.
This helps us generate video traffic representing one minute's worth of video data. Video users remain in the network until there are no more packets left for them to receive. The parameters of our video traffic model are summarized in Table 4.4.

Table 4.4: H.265/HEVC Traffic Model

| Parameter | Statistical Characterization |
|---|---|
| Video duration | 1 minute |
| User arrival rate λv | Poisson distributed process with rate λv |
| Number of sessions | 1 (each user watches a single video once) |

4.2.3 Channel Model

For every user in the network, we need to model the effects of the large-scale and small-scale fading. Depending on the simulated scenario, the propagation and fading characteristics of the channel may be different. In this thesis, we focus on the Urban Macrocell scenario, also referred to as Case 1 by the 3GPP, as defined in Table A.2.1.1-1 of [24]. It should be noted that Urban Macrocell is also a scenario defined by the International Telecommunication Union Radiocommunication Sector (ITU-R) in report M.2135 [25]. The ITU-R scenario defines users traveling at vehicular speeds (30 km/h) whereas the 3GPP Urban Macrocell scenario defines users as traveling at pedestrian speeds (3 km/h). We use the 3GPP Urban Macrocell scenario because we consider services which require high data rates, which are more practical if the users are moving at pedestrian speed. System-level simulations typically rely on stochastic channel models such as the Spatial Channel Model [26] to capture these aspects. Typically, channel models capture the number of clusters (in this thesis, we use the terms "Cluster" and "Tap" interchangeably) and their spatial characteristics such as the delay spread, the angular spread and the power carried by each cluster. The original implementation of the system-level simulation tool we used, IMTAphy, uses the channel model specified by the ITU-R in report M.2135 [25].
In [25], the channel model for the Urban Macrocell scenario is defined as a 20-tap model, whereas the channel model we decided to use is the Spatial Channel Model [26], which is a 6-tap model. There are two reasons for choosing the 3GPP Spatial Channel Model. The first reason is that although the ITU-R Channel Model is more accurate, it requires a large memory footprint for storing cluster and ray specific information. It also requires high computational power because a large number of clusters must be summed for every link, for every subcarrier and for every time-slot. The second reason is that we are looking to do a fair comparison between two different scheduling schemes. For this purpose, what matters is that the channel model accurately captures statistical characteristics of the channel such as Delay Spread and Angular Spread, rather than that it provides accurate performance predictions in real environments.

The radio channel can typically be described through its large-scale and small-scale characteristics. Large-scale characteristics are captured through the path-loss and the shadow fading distribution. The deterministic path-loss formula used for the Urban Macrocell scenario is defined in [24] as follows
\[ PL(d) = 128.1 + 37.6 \log_{10}(d) \tag{4.1} \]
where $PL$ denotes the mean path loss in dB between a given user and a given base station and $d$ denotes the distance between the user and the base station in kilometers. This mean path-loss formula is valid for carrier frequencies around 2 GHz. The distance between a user and a base station must always be at least 35 meters. The short-term statistics are characterized by small-scale parameters. Let us denote the number of clusters in a link by $N$. The generation of the parameters required to compute the channel coefficients is documented in [26] and [20].
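The large-scale part of the model is simple enough to sketch directly. The snippet below evaluates the mean path loss (4.1) and enforces the 35 m minimum distance by rejecting closer positions. Rejecting is an assumption made for this illustration (a simulator could instead re-drop the user), and the function name is hypothetical. Shadow fading is not included.

```python
import math

MIN_DISTANCE_KM = 0.035  # minimum user to base station distance (35 m)


def path_loss_db(distance_km: float) -> float:
    """Mean path loss in dB from (4.1), with the distance given in kilometres."""
    if distance_km < MIN_DISTANCE_KM:
        raise ValueError("user must be at least 35 m away from the base station")
    return 128.1 + 37.6 * math.log10(distance_km)


for distance in (0.035, 0.1, 0.5, 1.0):
    print(f"{distance * 1000:6.0f} m -> {path_loss_db(distance):6.1f} dB")
```

At a distance of 1 km the logarithm vanishes and the loss equals the 128.1 dB constant, and every tenfold reduction in distance lowers the loss by 37.6 dB.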
The eventual channel impulse responses account for the aspects of modelling a MIMO channel and are given for a given pair of antennas $s$ and $u$ (resp. station and user) and a given cluster $n$:
\[
h_{u,s,n}(t) =
\begin{cases}
\sqrt{\frac{1}{K_R + 1}}\, h^{NLoS}_{u,s,n}(t) + \sqrt{\frac{K_R}{K_R + 1}}\, h^{LoS}_{u,s,n}(t) & n = 1, \\
\sqrt{\frac{1}{K_R + 1}}\, h^{NLoS}_{u,s,n}(t) & 2 \leq n \leq N,
\end{cases}
\tag{4.2}
\]
where $K_R$ is the Ricean factor, $h^{NLoS}_{u,s,n}$ is the non line-of-sight component of the channel and $h^{LoS}_{u,s,n}$ is the line-of-sight component of the channel, which is applied only to the first cluster. The way the Spatial Channel Model is designed, the first cluster is the cluster for which the delay is the shortest. The non line-of-sight channel component is expressed for a given cluster and for a given pair of transmit-receive antenna elements as follows [26]:
\[
\begin{aligned}
h^{NLoS}_{u,s,n}(t) = \sqrt{\frac{P_n}{M}} \sum_{m=1}^{M}
&\begin{bmatrix} F_{rx,u,V}(\theta_{n,m}) \\ F_{rx,u,H}(\theta_{n,m}) \end{bmatrix}^{T}
\begin{bmatrix} \exp(j\Phi^{vv}_{n,m}) & \sqrt{\kappa^{-1}} \exp(j\Phi^{vh}_{n,m}) \\ \sqrt{\kappa^{-1}} \exp(j\Phi^{hv}_{n,m}) & \exp(j\Phi^{hh}_{n,m}) \end{bmatrix}
\begin{bmatrix} F_{tx,s,V}(\phi_{n,m}) \\ F_{tx,s,H}(\phi_{n,m}) \end{bmatrix} \\
&\times \exp\left(j d_s 2\pi \lambda_0^{-1} \sin(\phi_{n,m})\right) \exp\left(j d_u 2\pi \lambda_0^{-1} \sin(\theta_{n,m})\right) \exp\left(j 2\pi \nu_{n,m} t\right)
\end{aligned}
\tag{4.3}
\]
where $P_n$ is the power of the $n$th cluster, $M$ is the number of rays within the cluster, $F_{rx,u,V}$ and $F_{rx,u,H}$ are the field patterns of the $u$th antenna element
at the receiver side in the vertical and horizontal polarizations respectively, $F_{tx,s,V}$ and $F_{tx,s,H}$ are the field patterns of the $s$th antenna element at the transmitter side in the vertical and horizontal polarizations respectively, $\theta_{n,m}$ and $\phi_{n,m}$ are the arrival and departure angles of the $m$th ray in the $n$th cluster, $d_s$ and $d_u$ are the distances between antenna elements at the transmitter and receiver side respectively, $\nu_{n,m}$ is the Doppler frequency component of the $m$th ray of the $n$th cluster and $t$ is the time instant. $\Phi^{vv}_{n,m}$, $\Phi^{vh}_{n,m}$, $\Phi^{hv}_{n,m}$ and $\Phi^{hh}_{n,m}$ are uniformly generated random phases used for initialization purposes.

In a similar fashion to the non line-of-sight channel component, the line-of-sight channel component for a given pair of transmit-receive antenna elements is expressed as follows [26]:
\[
\begin{aligned}
h^{LoS}_{u,s,n}(t) =
&\begin{bmatrix} F_{rx,u,V}(\theta_{LoS}) \\ F_{rx,u,H}(\theta_{LoS}) \end{bmatrix}^{T}
\begin{bmatrix} \exp(j\Phi^{vv}_{LoS}) & 0 \\ 0 & \exp(j\Phi^{hh}_{LoS}) \end{bmatrix}
\begin{bmatrix} F_{tx,s,V}(\phi_{LoS}) \\ F_{tx,s,H}(\phi_{LoS}) \end{bmatrix} \\
&\times \exp\left(j d_s 2\pi \lambda_0^{-1} \sin(\phi_{LoS})\right) \exp\left(j d_u 2\pi \lambda_0^{-1} \sin(\theta_{LoS})\right) \exp\left(j 2\pi \nu_{LoS} t\right)
\end{aligned}
\tag{4.4}
\]
where $F_{rx,u,V}$ and $F_{rx,u,H}$ are the field patterns of the $u$th antenna element at the receiver side in the vertical and horizontal polarizations respectively, $F_{tx,s,V}$ and $F_{tx,s,H}$ are the field patterns of the $s$th antenna element at the transmitter side in the vertical and horizontal polarizations respectively, $\theta_{LoS}$ and $\phi_{LoS}$ are the arrival and departure angles of the line-of-sight ray, $d_s$ and $d_u$ are the distances between antenna elements at the transmitter and receiver respectively, $\nu_{LoS}$ is the Doppler frequency component of the line-of-sight ray and $t$ is the time instant. $\Phi^{vv}_{LoS}$ and $\Phi^{hh}_{LoS}$ are uniformly generated random phases used for initialization purposes.

The channel impulse responses given by (4.2) are expressed in the time domain. Since we are considering an LTE-A air interface, which is based on OFDMA, we need frequency domain channel coefficients.
The frequency domain channel coefficients are obtained by applying a Fast Fourier Transform to the time domain channel impulse responses. The equivalent frequency domain channel matrix at the $k$th subcarrier for a 4x2 MIMO system is given as:
\[
H(k) = \begin{bmatrix} H_{1,1}(k) & H_{1,2}(k) & H_{1,3}(k) & H_{1,4}(k) \\ H_{2,1}(k) & H_{2,2}(k) & H_{2,3}(k) & H_{2,4}(k) \end{bmatrix}, \quad k \in \{1, 2, \ldots, N_{FFT}\}
\tag{4.5}
\]
where $N_{FFT}$ is the Fast Fourier Transform size. Let us denote the Fast Fourier Transform by $\mathcal{F}$. Each individual component of the channel transfer function $H(k)$ at a given time instant $t$ is a function of the channel impulse responses given by (4.2) and is expressed as follows [20]
\[
H_{u,s}(k) = \mathcal{F}\left[h_{u,s,1}(t), h_{u,s,2}(t), \ldots, h_{u,s,N}(t)\right], \quad k \in \{1, 2, \ldots, N_{FFT}\}.
\tag{4.6}
\]
In the specific case of LTE, the subcarrier spacing is defined as 15000 Hz. For a system bandwidth of 10 MHz, we need a sampling rate that is higher than 10 MHz and that is a multiple of the subcarrier spacing, i.e. 15000 Hz. Since Fast Fourier Transforms are optimized for lengths that are integer powers of 2, we use a Fast Fourier Transform of size $N_{FFT} = 1024$, which corresponds to a sampling rate of 1024 × 15000 Hz = 15.36 MHz.

4.2.4 Feedback Model

Critical to the performance of most wireless communications systems are mechanisms for delivering Channel State Information (CSI) to the transmitter. It is shown in Chapter 8 of [27] that with CSI knowledge at the transmitter, one can extract the maximum performance available from MIMO systems. The 3GPP standard has outlined several control signalling mechanisms for each of the transmission modes it defines. In this thesis, we use Transmission Mode 10 with 4-Tx Release 12 linear precoding matrices [28]. The 3GPP standard defines an implicit feedback mechanism to operate the Uplink control signalling. What is meant by "implicit" is that instead of sending information about the channel matrix itself, the user sends quantized information about different channel statistics that can help the network make appropriate scheduling decisions.
The 3GPP standard defines the content of the control signalling through 3 indicators [28]:
• Rank Indication (RI),
• Precoding Matrix Indicator (PMI),
• Channel Quality Indicator (CQI).

The RI is the rank of the channel matrix, i.e. the number of degrees of freedom that it can carry. The PMI is the index of the Precoding Matrix that maximizes the received power at the receiver and the CQI is the spectral efficiency that the receiver would be able to achieve. The PMI and CQI reports are conditioned upon the value of the RI. The reporting mode we use in this thesis is the Aperiodic CSI Reporting Mode 3-1, as defined in Section 7.2.1 of [28]. Other reporting modes are also defined by the 3GPP [28].

Aperiodic CSI Reporting Mode 3-1 consists of a single RI report, a single PMI report and several subband CQI reports. The size of a subband is specified by the 3GPP standard to be 6 PRBs for a system bandwidth of 10 MHz in [28]. Thus, a single CSI report from the user will contain one value for the RI, one value for the PMI and nine values for the CQI (one CQI value per subband, since the 50 PRBs are split into eight subbands of 6 PRBs and a last subband of 2 PRBs). In this thesis, we assume that the periodicity of the CSI reports is set to 5 ms. The RI is typically a statistic that is reported less frequently than the PMI or the CQI and its periodicity is set to 20 ms. For the subband CQI reports, we assume non-ideal channel estimation, which is obtained by modelling a noisy sample of the interference covariance matrix in the equalizer vector using the complex Wishart distribution [29].

Chapter 5
Simulation Results and Analysis

Some of the key targets specified by the NGMN Alliance for 5G networks can be broadly summarized as providing consistent user experience and enhanced Quality of Experience. These targets are defined and outlined in [5]. As an example, one target is for the network to be able to provide a certain user throughput for 95% of the time across 95% of the coverage area.
This is typically referred to as the 5th percentile of the Cumulative Distribution Function (CDF) of the user throughput. We also look at the average user throughput as an indicator of the overall user experience.

In this chapter, we describe our simulation assumptions and results, including the insights gained from them. So far, all the works in the field of video transmission over wireless networks use Full Buffer methodologies to evaluate performance. The main problem with Full Buffer methodologies is that they only capture performance metrics (for instance user throughput and served cell throughput) in a range where the network is operating at full load. Since cellular networks experience different loads depending on the time of day, it is useful for carriers to have a more complete view of performance at different traffic load points. One motivation for using traffic models where user arrivals follow a Poisson process is to capture performance at traffic load points that are meaningful to carriers.

Intuitively, we expect performance to be good at low traffic load points because there is a small number of users in the network, which results in low interference and high user throughputs. Users that enter the network are therefore served quickly and leave quickly. This scenario is not attractive to carriers because, although the Quality of Experience is excellent, they earn little revenue from the small number of users. Conversely, we expect performance to be poor at high traffic load points because there is a large number of users in the network, which results in high interference and low user throughputs. This scenario is also unattractive to carriers because, although revenues are high due to the large number of users accessing their spectrum, the Quality of Experience is mediocre and this will lead to customer dissatisfaction.
The desirable scenario for carriers is an intermediate traffic load, where the number of users on the network yields reasonable revenue for the carrier, the resulting moderate interference leads to acceptable throughputs, and users can enjoy reasonably good Quality of Experience.

5.1 Simulation Assumptions

In this section, we outline some of the assumptions made in our simulations. The main components of our system model are described in Chapter 4. Here, we describe some of the other assumptions made. We assume that the base station in our LTE-A network is a Media Aware Network Element (MANE). A MANE is a network node which has the ability to parse an encoded video bitstream and identify specific LDUs. Since our LTE-A base stations can parse video bitstreams, they can specifically look for each user's LDUs and keep track of the RefCount field in the LDUs. Using the information carried by the RefCount field, the LTE-A system can then keep track of the referenced frames being sent to each video user, using the exponential smoothing update equations (3.14), and allocate resources accordingly. In the simulation of our proposed scheduling framework, the following parameter values are used: λ = 25, µ = 1, T = 25 and c_min = 50.

Our motivation in this work is to model a realistic 4G/beyond-4G system. Although several research projects on 5G have been initiated, there is no air interface specified yet for a 5G system. Therefore we use a 4G air interface with as many up-to-date features as possible and carry out our performance evaluation using metrics which have been proposed for 5G systems. For our LTE-A system, we model a 4x2 MIMO system. We also assume the use of Single User Multiple Input Multiple Output (SU-MIMO), as opposed to Multi User Multiple Input Multiple Output (MU-MIMO).
It is shown in Chapter 7 of [27] that in MIMO systems, the availability of both multiple transmit antennas and multiple receive antennas can provide additional spatial dimensions for communication. These additional degrees of freedom can be exploited by spatially multiplexing different data streams onto the MIMO channel. The main difference between SU-MIMO and MU-MIMO is that SU-MIMO sends multiple data streams towards the same user whereas MU-MIMO sends data streams towards spatially separate users. We also assume the use of Transmission Mode 10 with 4-Tx Release 12 Precoding Matrices [28]-[30]. Transmission Mode 10 is a mode where the system allows the use of so-called non-codebook based precoding with up to 8 layers. It is beyond the scope of this thesis to describe the physical layer procedures and processing features that are relevant for the operation of Transmission Mode 10. A more detailed description of Transmission Mode 10 and the associated physical layer procedures is provided in [28] and [31].

For system-level simulations, we need link-to-system models that can accurately translate an instantaneous Signal to Noise Ratio (SNR) value into a corresponding instantaneous block error rate value. Several methods exist in the literature, such as Exponential Effective SNR Mapping (EESM) [32] and Mutual Information Effective SNR Metric (MIESM) [33]. In this thesis, we use EESM. The basic idea behind EESM is as follows: let us assume a user received a transmission over $N_{sc}$ subcarriers with instantaneous SNR value $\gamma_k$ at the kth subcarrier. The instantaneous effective SNR $\gamma_{eff}$ using EESM is obtained as:

\[
\gamma_{eff} = -\beta \ln\left(\frac{1}{N_{sc}} \sum_{k=1}^{N_{sc}} \exp\left(-\frac{\gamma_k}{\beta}\right)\right), \tag{5.1}
\]

where β is a correction parameter tuned for a specific modulation. The resulting $\gamma_{eff}$ is then mapped to a corresponding block error rate. The values of the β parameter depend on the modulation and the code rate, e.g. β = 1.49 for Quaternary Phase Shift Keying (QPSK) with a code rate of 1/3 or β = 7.68 for 16-point Quadrature Amplitude Modulation (16-QAM) with a code rate of 4/5. These values can be found in Table 19.13, Chapter 19 of [20]. Several sources exist for the values of β that can be applied in an LTE or LTE-A system; for our simulations we use the β values given in [32]. Parameter values for our LTE-A simulations are summarized in Table 5.1 and reflect those used in study items that 3GPP technical groups have used for 3GPP Release 12.

Table 5.1: LTE-Advanced Parameters

Parameter                  Value
System Bandwidth           10 MHz
Channel Model              Spatial Channel Model [20]
Scenario                   Urban Macro-cell [24]
Carrier Frequency          2 GHz
Link-to-System Interface   Exponential ESM
Traffic Model              Finite Buffer
Receiver Type              Wishart-IRC [29]
MIMO Scheme                4x2 SU-MIMO
Transmission Mode          TM 10
Precoding Codebook         4-Tx Release 12 [30]
CSI Reporting Mode         Aperiodic Mode 3-1 [28]

As discussed in Section 4.2.2, we use traffic models that generate user arrivals according to Poisson processes. The traffic assignment probability is 0.5 for each traffic type and, in our simulations, the user arrival rates for the two traffic models, i.e. BE and video, are equal. This ensures that the average number of users generated for each traffic type is the same. The length of the simulation is chosen such that we generate at least 8000 users for each traffic type. This was done to ensure that all the metrics reported in this thesis are obtained within a 95% confidence interval of ±10% around the mean value.

We use offered load per sector and Resource Utilization (RU) as our reference points. This is because, for finite buffer traffic models, the 3GPP consortium decided to evaluate performance based on the RU values a cellular network goes through, and we decided to align our methodology with those assumptions.
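As an illustration of the link-to-system mapping described earlier in this section, the EESM expression (5.1) can be sketched in a few lines of Python. This is a minimal sketch rather than the simulator's implementation: the function name `eesm_effective_snr`, the sample SNR values, and the assumption that SNRs are expressed on a linear scale (not in dB) are illustrative choices; only the formula and the value β = 1.49 are taken from the text.

```python
import math

def eesm_effective_snr(snrs_linear, beta):
    """Effective SNR following (5.1): -beta * ln(mean(exp(-gamma_k / beta))).

    snrs_linear: per-subcarrier SNRs, assumed here to be linear (not dB).
    beta: correction parameter for the modulation and code rate in use.
    """
    if not snrs_linear:
        raise ValueError("need at least one subcarrier SNR")
    mean_exp = sum(math.exp(-g / beta) for g in snrs_linear) / len(snrs_linear)
    return -beta * math.log(mean_exp)

# Hypothetical per-subcarrier SNRs (illustrative values, not thesis data).
flat = [4.0, 4.0, 4.0, 4.0]    # frequency-flat channel
faded = [1.0, 3.0, 5.0, 7.0]   # frequency-selective, same arithmetic mean

beta_qpsk_r13 = 1.49           # value quoted in the text for QPSK, rate 1/3

eff_flat = eesm_effective_snr(flat, beta_qpsk_r13)
eff_faded = eesm_effective_snr(faded, beta_qpsk_r13)
print(f"flat channel:  {eff_flat:.3f}")
print(f"faded channel: {eff_faded:.3f}")
```

On a flat channel the effective SNR equals the common per-subcarrier SNR, while on a frequency-selective channel it lies between the worst subcarrier SNR and the arithmetic mean, which is why the weakest subcarriers dominate the predicted block error rate. The final lookup from effective SNR to block error rate is not shown, since it depends on link-level curves that are not reproduced here.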
RU is defined as the ratio of the aggregated number of radio resource blocks allocated for data traffic to the total number of radio resource blocks in the system bandwidth available for data traffic [19]. We first ran simulations using the Proportional Fair (PF) scheme and determined the offered loads corresponding to RU values between 40% and 70%. We then ran simulations using the proposed scheme at those offered loads and compared the resulting performance and QoE for both BE users and video users. These offered loads are listed in Table 5.2. For the PF scheme, the offered load ranges between 5.88 Mbps per sector and 6.94 Mbps per sector. The 95% confidence interval for the reported RU values is within ±3.2% of the reported values.

For video users, we report the Active Download Time (ADT), the satisfied video user percentage and the packet loss ratio of Clean Random Access NAL units. A user is considered to be satisfied if its MOS is greater than 4. Conversely, a user is considered to be unsatisfied if its MOS is lower than 3. Nightingale et al. [9] showed that even a slight degradation in radio conditions, i.e. a packet loss ratio of 3%, is enough to make the Quality of Experience mediocre. Clean Random Access NAL units carry the encoded video data of I-Frames and represent the largest percentage of the bitstream in terms of bit rate. Since the decoding of the whole video sequence essentially relies on the correct decoding of these LDUs, the packet loss ratio of these LDUs provides a good indication of how much video content becomes non-viewable.

For BE users, we report the absolute values of the average user throughput and the 10th percentile of the user throughput CDF. We also report the average user throughput in the outer region of every cell.
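The two BE metrics just introduced can be illustrated with a short Python sketch. It takes a user's throughput to be the transferred data volume divided by the download time, which is the definition used in this section. The helper names, the nearest-rank percentile rule and the ten hypothetical users below are assumptions made for illustration only; they are not the simulator's code or data.

```python
def user_throughput_mbps(bits_received, t_first_tx_s, t_last_rx_s):
    """Throughput = transferred volume / download time, in Mbps."""
    download_time = t_last_rx_s - t_first_tx_s
    if download_time <= 0:
        raise ValueError("download time must be positive")
    return bits_received / download_time / 1e6

def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile (an assumed convention), pct an int in 1..100."""
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceiling
    return ordered[rank - 1]

# Hypothetical users: each downloads 16e6 bits, with download times 1..10 s.
throughputs = [user_throughput_mbps(16e6, 0.0, float(t)) for t in range(1, 11)]

average_tput = sum(throughputs) / len(throughputs)
tenth_percentile_tput = percentile_nearest_rank(throughputs, 10)
print(f"average: {average_tput:.2f} Mbps")
print(f"10th percentile: {tenth_percentile_tput:.2f} Mbps")
```

With these made-up inputs the 10th percentile is set by the single slowest user, which hints at why low-percentile statistics need many more simulated users than the average does before they stabilize.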
We report the 10th percentile instead of the 5th percentile because much longer simulations would be required to generate 5th-percentile results within a 95% confidence interval. As an example, simulations generating on average 16000 users (8000 video users and 8000 BE users) take between 48 and 72 hours of run time. In order to generate results where the 95% confidence intervals of the 5th percentile of the user throughput are within ±10%, we would need to generate possibly over 30000 users. This could potentially lead to simulation run times of over a week, which is highly impractical. In this thesis, we refer to the 10th percentile of the user throughput CDF as the coverage user throughput. A given BE user's throughput is calculated as the ratio of the total volume of the transferred data to the download time. For BE users, the download time is defined as the difference between the time instant of the last packet correctly received by the user and the time instant of the first packet transmitted to the user.

5.2 Simulation Results and Discussion

In this section, we present our simulation results and discuss the main findings. We present our results for video users first, followed by those for BE users.

Table 5.2: Offered Load and corresponding Resource Utilization

Scheme    Offered Load (Mbps / Sector)    Resource Utilization (%)
PF        5.88                            40.0
PF        6.27                            50.0
PF        6.58                            60.0
PF        6.94                            70.0
FRA-PF    5.88                            35.4
FRA-PF    6.27                            41.9
FRA-PF    6.58                            47.8
FRA-PF    6.94

5.2.1 Results for video users

For the performance evaluation of video users, we consider two metrics. The first metric is the ADT, which is the time a video user spends actively downloading video content. The second metric is the MOS that users would give about their viewing experience.

The 95% confidence intervals for the active download time are within ±6% of the reported values. Fig. 5.1 shows the active download times video users spend downloading video content while they are in the network.
Using the Proportional Fair scheme, video users spend between 3.5 seconds and 8 seconds downloading video content (for offered loads of 5.9 Mbps per sector and 6.9 Mbps per sector respectively). These numbers can be explained by the fact that the Proportional Fair algorithm tries to be fair to all users, video and BE alike, so resources end up being shared by all users. Using our proposed scheme, video users are given higher importance if their transmission queues carry referenced frames. This is due to the barrier functions we introduced in our scheduling framework. Therefore, if a base-station is serving both video and BE users, video users will be prioritized over BE users as long as they have referenced frames to receive. Resource allocation is focused on video users first, which results in them being served more quickly, as Fig. 5.1 shows. For offered loads between 5.9 Mbps per sector and 6.9 Mbps per sector, video users spend between 2.2 seconds and 4.2 seconds downloading video content. This is very significant, as any time video users do not spend downloading video content means that the resources available at that time can be allocated to BE users.

[Figure 5.1: Video users' active download time. Horizontal axis: Offered Load (Mbps / Sector); vertical axis: Video Active Download Time (s); curves: PF, FRA-PF.]

Possibly the most important aspect in the performance evaluation of video services is the MOS, which reflects the quality of the viewing experience from the users' perspective. We look at the percentage of video users whose MOS, inferred from the packet loss they experience, is greater than 4; we refer to this as the satisfied video user percentage.

[Figure 5.2: Satisfied Video User Percentage. Horizontal axis: Offered Load (Mbps / Sector); vertical axis: Satisfied Video User Percentage (%); curves: PF (MOS > 4), FRA-PF (MOS > 4).]
The 95% confidence intervals of the satisfied video user percentage results are within ±8% of the reported values. It was shown in [9] that the MOS is very sensitive to the Packet Loss Ratio (PLR). The findings in [9] were that PLRs below 1.5% correspond to a MOS above 4 (perceptible degradation but not annoying). Assuming that a video user's MOS is only affected by the PLR it experiences, we can state that the QoE of a video user will be high if the PLR is below 1.5% (i.e., its MOS will be greater than 4, and the video user will be satisfied). The QoE will be low if the PLR is higher than 1.5% (i.e., its MOS will be lower than 4, and the video user will experience significant degradation). Fig. 5.2 shows the percentage of video users for which the MOS is greater than 4.

Our proposed FRA-PF scheme leads to a higher percentage of satisfied video users, which is expected as video users are prioritized over BE users whenever they have referenced frames to receive. As can be seen from Fig. 5.2, for offered loads around 5.9 Mbps per sector, both the PF and FRA-PF schemes are able to satisfy over 90% of video users. However, the performance of the PF scheme degrades more quickly as the load increases: for offered loads around 6.8 Mbps per sector, the FRA-PF scheme can satisfy over 80% of video users whereas the PF scheme satisfies less than 60% of video users.

Another aspect that we look into is the percentage of Clean Random Access (CRA) LDUs lost. I-Frames are typically carried inside CRA LDUs and they represent the most significant portion of the bitstream in terms of bits. Because of the way the video compression process is defined in the H.265/HEVC standard, I-Frames are the frames that are referenced the most throughout a video sequence, and the loss of an I-Frame causes error propagation within the decoding process at the receiver end.
We aligned our settings for the Intra-Period so that two I-Frames are one second apart from each other [10].

Intuitively, the loss of an I-Frame causes the loss of about one second of video content to the end user because all subsequent B-Frames reference an I-Frame, directly or indirectly. Those B-Frames could, strictly speaking, still be usable by the decoder to produce a picture. The problem is that those B-Frames could potentially be incomplete, i.e. some sections could be missing Luminance or Chrominance sample information. The whole idea behind H.265/HEVC is to use motion compensated prediction in as many frames as possible. Fig. 5.3 shows the results obtained for the CRA LDU loss ratio. Since the proposed FRA-PF scheme is able to locate referenced frames and transmit them with higher priority, FRA-PF has a lower CRA LDU loss ratio.

[Figure 5.3: CRA LDU Loss Ratio. Horizontal axis: Offered Load (Mbps / Sector); vertical axis: CRA LDU Loss Ratio (%); curves: PF, FRA-PF.]

Let us consider the bitstream of the video sequence FourPeople as an example. The original video bitstream contains 9 CRA LDUs and 600 LDUs in total. Since we wrap the bitstream around 6 times, this results in a total of 54 CRA LDUs for a given user. With the PF scheme, the CRA LDU loss ratio goes from 1.6% to 9.0% of the total 54 CRA LDUs as the offered load changes from 5.9 to 6.9 Mbps per sector. This corresponds to between 1 and 5 lost LDUs. For offered loads near 7 Mbps per sector, this means that as much as 5 seconds of video content becomes non-viewable because of the loss of CRA LDUs. With the proposed FRA-PF scheme, the CRA LDU loss ratio goes from 0.1% to 1.18% of the total 54 CRA LDUs as the offered load changes from 5.9 to 6.9 Mbps per sector. This means that in either case up to 1 LDU is lost.
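The LDU counts quoted above follow from simple arithmetic on the reported loss ratios, which the short Python sketch below reproduces. The 9 CRA LDUs, the 6 passes through the bitstream and the four loss ratios are taken from the text; the function name `cra_ldus_lost` and the choice to round any fractional loss up to a whole LDU are assumptions made here so that the output matches the counts stated in the text.

```python
import math

CRA_PER_PASS = 9   # CRA LDUs in the FourPeople bitstream (from the text)
PASSES = 6         # number of times the bitstream is wrapped around
total_cra = CRA_PER_PASS * PASSES

def cra_ldus_lost(loss_ratio_percent, total=total_cra):
    """Whole CRA LDUs lost for a given loss ratio.

    Assumption: a fractional loss is rounded up to the next whole LDU.
    """
    return math.ceil(loss_ratio_percent / 100 * total)

# Loss ratios (%) reported at offered loads of 5.9 and 6.9 Mbps per sector.
reported = {"PF": (1.6, 9.0), "FRA-PF": (0.1, 1.18)}

for scheme, (low_load, high_load) in reported.items():
    print(scheme, cra_ldus_lost(low_load), cra_ldus_lost(high_load))
```

This prints 1 and 5 lost CRA LDUs for PF, and 1 and 1 for FRA-PF. With I-Frames spaced one second apart, each lost CRA LDU corresponds to roughly one second of affected video.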
For offered loads near 7 Mbps per sector, this means that as much as 1 second of video content becomes non-viewable because of the loss of CRA LDUs. This highlights how the proposed FRA-PF scheme locates the packets of greater importance to the H.265/HEVC decoder and provides the decoder with the reference frames it needs, which facilitates the task of decoding. Providing referenced frames with greater priority helps maintain continuous playback at the end user and contributes to enhancing the viewing experience of video users. From the user's perspective, non-continuous video playback will always constitute a source of dissatisfaction. Our proposed FRA-PF scheme reduces the loss of packets carrying referenced frames, which helps maintain continuous playback.

5.2.2 Results for Best Effort users

For BE users, we report the absolute values of the average throughput and the coverage throughput (which we defined in Section 5.1). The 95% confidence intervals of the average throughput and coverage throughput are within ±3% and ±9% respectively of the reported values.

The average throughput is plotted as a function of the offered load in Fig. 5.4. The offered load values of interest to us are in the range of 5.9 to 6.9 Mbps per sector. From Fig. 5.4, it can be seen that with the PF scheduling scheme, BE users can expect to get average throughputs between 15 Mbps and 10 Mbps. With our proposed FRA-PF scheme, users can expect to get average throughputs between 16 Mbps and 12 Mbps. This is explained by the fact that our proposed FRA-PF scheme serves video traffic more quickly, as shown in Fig. 5.1. As video users are served more quickly, radio resources then become available to BE users.

[Figure 5.4: Average throughput for Best Effort users. Horizontal axis: Offered Load (Mbps / Sector); vertical axis: Throughput (Mbps); curves: PF, FRA-PF.]
The availability of more radio resources helps BE users leave the network more quickly and therefore experience higher throughputs. Put simply: allocating the resources to the right users at the right time benefits all users. This is shown by the results we have obtained in terms of the Resource Utilization of the network and the average throughputs users can get.

Fig. 5.5 shows the coverage throughput results for offered load values between 5.9 and 6.9 Mbps per sector. As expected, the coverage throughput is much lower than the average throughput. In an LTE-A system using the PF scheduling scheme, users can expect to get coverage throughputs between 4.9 Mbps and 1.5 Mbps. In an LTE-A system using our proposed FRA-PF scheme, for the same offered load values, users can expect to get coverage throughputs between 6.1 Mbps and 3.6 Mbps. This improvement is considerably larger than the one in average user throughput shown earlier. It shows that 90% of users can expect a throughput of at least 3.6 Mbps, which is more than double the throughput with the PF scheduling scheme. Because we model a BE type of service, this improvement in throughput translates into latency reduction since the volume of data to download is fixed. For other services, e.g. Web Browsing, higher throughput can translate into noticeably faster loading times and enhanced Quality of Experience.

[Figure 5.5: Coverage throughput for Best Effort users. Horizontal axis: Offered Load (Mbps / Sector); vertical axis: Throughput (Mbps); curves: PF, FRA-PF.]

We stated that the 95% confidence intervals of the coverage throughput are within ±9% of the reported values. This comparatively wide interval is due to the fact that the statistics of users that experience relatively low Signal to Interference and Noise Ratio (SINR) are very sensitive. We model random user arrivals in our simulations, which leads to inter-cell interference that varies with time. For users experiencing low SINR, even slight improvements or degradations can have a very significant impact on the eventual throughput they experience.

Finally, we examine the statistics for users that are geographically located within the area covering the outer 10% of the coverage area, as depicted in Fig. 5.6. We will call this region the cell-edge region. The area A of a hexagon is calculated as $A = 2\sqrt{3}a^2$, where a is the apothem of the hexagon. Using a hexagonal network deployment as shown in Fig. 4.1 and knowing that the inter-site distance is equal to 500 meters, we find that the apothem is 250 meters. The users in the cell-edge region are those who lie outside the inner hexagon that contains 90% of the area. Since the area scales with the square of the apothem, this inner hexagon has apothem $a' = a\sqrt{0.9} \approx 237$ meters.

[Figure 5.6: Illustration of the outer 10% of the coverage area.]

The results are shown in Fig. 5.7. For offered loads ranging from 5.9 to 6.9 Mbps per sector, users in the cell-edge region experience throughputs ranging from 11.4 to 8 Mbps with the baseline PF scheme. With our proposed FRA-PF scheme, users in the cell-edge region experience throughputs ranging from 12.2 to 9.9 Mbps for the same offered load values. The trend is consistent with those for the average throughput and coverage throughput. However, it is interesting to note that the average throughput of users in the cell-edge region is higher than the coverage throughput values reported in Fig. 5.5. This is because we generate users randomly over time, which leads to inter-cell interference varying over time.

[Figure 5.7: Average BE user throughput in Cell-Edge region. Horizontal axis: Offered Load (Mbps / Sector); vertical axis: Throughput (Mbps); curves: PF, FRA-PF.]
As a result, a user located in the cell-edge region but not subject to interference from neighbouring cells will still experience reasonably high throughputs, as Fig. 5.7 shows. Usual Full Queue simulation methodologies operate in a range where every cell in the network is transmitting at all times, so every cell is always interfering with users in neighbouring cells. As a result, only an individual user's radio conditions determine whether high throughputs are achievable or not. Users located closer to their serving cell suffer from lower path loss, which translates to higher average SNR and higher throughput. Using Finite Buffer simulation methodologies, this is no longer true because users arrive randomly in the network and are subject to the inter-cell interference that is present during the time they are in the network. Of course, path loss always plays a significant role in dictating overall performance, but this is now tempered by the fact that users arrive randomly in the network, which affects the inter-cell interference.

In order to provide better QoE to all users, resource allocation schemes should target users that require the lowest amount of resources in order to be satisfied. This will help the system deliver better user experience to all users in the network. The QoE of all users improves when other users depart sooner, and our proposed scheme achieves this by serving video users faster. This benefits all users in the network and helps provide a more consistent user experience across the whole network, which is in line with the objectives of future 5G networks.

Chapter 6
Conclusions and Future Work

This chapter summarizes the main contributions of the thesis and provides some suggestions for future work.

6.1 Contributions

In this thesis, we addressed the topic of transmitting video content in 4G and beyond-4G networks by exploiting information about the way H.265/HEVC operates.
Using knowledge of the coding structures, reference picture lists and the process through which the H.265/HEVC encoder transmits this information to the decoder, we proposed a cross-layer scheduling framework which allocates resources to video users that need to receive referenced frames.

Our performance evaluation of H.265/HEVC video-content delivery was carried out in a mixed-traffic environment using random user arrivals and finite-buffer traffic models. To the best of our knowledge, there is no similar work reported in the literature. Results showed that both video and BE users benefit from the proposed scheduling framework. Video users benefit from reduced losses on packets carrying referenced frames while BE users benefit from improved throughput. The improvement for video users is achieved by tracking referenced frames and focusing resource allocation towards video users whenever their transmission queues have packets carrying referenced frames in the video sequence. As long as there are such frames in the transmission queue of a video user, our proposed framework prioritizes that user and allocates resources to it. This allows video users to download video content more quickly and allows BE users to access resources more quickly, leave the network more quickly and enjoy higher throughputs on average as a result.

As we go towards 5G networks, the expectation from cellular networks is that they provide a consistent user experience across the coverage area. Results showed that 90% of BE users can expect to get between 1 Mbps and 2 Mbps higher throughput using FRA-PF, which can potentially be the difference between an excellent and a mediocre Quality of Experience for the user. In addition, it was found that BE users in the cell-edge region of each cell actually experience much higher throughputs than the 10th percentile of the user throughput CDF.
This shows that users that experience lower throughputs are not necessarily located in the cell-edge region but can in fact be much closer to the base-station.

6.2 Future Work

Several future directions can be pursued, depending on which side of the problem one wishes to focus on.

If one were to focus on the communications side, one direction for future work could be to use an air-interface that is actually going to be used in 5G systems. In this work we considered the use of an LTE-A air interface with some 3GPP Release-12 features such as the Release 12 4-Tx Linear Precoding. This is because, at the time the work was undertaken, 3GPP was still working on Release 13 and no air-interface had yet been proposed for 5G systems, so we did not have the opportunity to evaluate performance for such systems. Instead we focused on performance evaluation using realistic traffic models over an up-to-date LTE-A air-interface and looked at the performance metrics to be used in 5G networks.

In our performance evaluation, we did not compare our proposed FRA-PF scheme with a scheduling scheme that would strictly prioritize users requesting video services over best effort users. It would be interesting to see whether such a scheduling scheme achieves improvements for both video users and best effort users. We also did not consider any admission control policies in our traffic models, which would regulate traffic arrival in high load situations and can have a significant impact on user experience. Another direction for future work could be to look into traffic offloading schemes. Since 3GPP Release 8, the 3GPP community has been introducing support for heterogeneous networks. Smaller base-stations can be deployed in the cell-edge region in order to provide coverage to users with stringent QoS or QoE requirements.
For example, macro base-stations can offload specific users in the coverage area of small base-stations in order to provide better QoE to their own users, and therefore provide a more consistent user experience across the whole network, something that 5G networks will be required to provide. The more general problem to address is to design scheduling frameworks which will provide the best user experience and at the same time maximize revenue for carriers.

If one were to focus on the video encoding or video compression side, one direction for future work could be the actual evaluation of subjective quality. No subjective quality testing was performed in our work. The major stumbling block that needs to be overcome is to get the reference implementation of the H.265/HEVC decoder to produce a viewable video sequence of a bitstream with missing LDUs. The reference decoder implementation is not designed to be robust against any form of packet loss and aborts the decoding process at the slightest error or absence of an LDU. If we can reconstitute samples of bitstreams with missing LDUs and output the corresponding video sequence, it would be possible to do subjective quality testing and gain insights into how the loss of specific packets impacts the viewing experience. This will give much clearer insights into how packet loss and Quality of Experience are related for video services, and more specifically how much the loss of packets carrying I-Frames hurts the Quality of Experience.

Bibliography

[1] Cisco, “Cisco Visual Networking Index: Global Mobile Data Traffic Forecast Update, 2014-2019,” February 2015.
[2] ITU-T, Advanced Video Coding for generic audio visual services - Recommendation ITU-T H.264. February 2014.
[3] ITU-T, High Efficiency Video Coding - Recommendation ITU-T H.265. April 2013.
[4] M. Wien, High Efficiency Video Coding - Coding Tools and Specifications. Springer, May 2014.
[5] NGMN Alliance, “NGMN 5G White Paper,” February 2015.
[6] M. Rugelj, U. Sedlar, M. Volk, J. Sterle, M. Hajdinjak, and A. Kos, “Novel Cross-Layer QoE-Aware Radio Resource Allocation Algorithms in Multiuser OFDMA Systems,” IEEE Transactions on Communications, September 2014.
[7] S. Singh, O. Oyman, A. Papathanassiou, D. Chatterjee, and J. G. Andrews, “Video Capacity and QoE Enhancements over LTE,” IEEE International Conference on Communications, June 2012.
[8] M. Salem, P. Djukic, J. Ma, and M. Hawryluck, “QoE-Aware Joint Scheduling of Buffered Video on Demand and Best Effort Flows,” IEEE International Symposium on Personal, Indoor and Mobile Radio Communications, September 2013.
[9] J. Nightingale, Q. Wang, C. Grecos, and S. Goma, “The Impact of Network Impairment on Quality of Experience (QoE) in H.265/HEVC Video Streaming,” IEEE Transactions on Consumer Electronics, May 2014.
[10] F. Bossen, “Common HM test conditions and software reference configuration,” April 2012.
[11] G. Sullivan and T. Wiegand, “Rate-distortion optimization for video compression,” IEEE Signal Processing Magazine, pp. 74–90, November 1998.
[12] T. Schierl, M. M. Hannuksela, Y.-K. Wang, and S. Wenger, “System Layer Integration of High Efficiency Video Coding,” IEEE Transactions on Circuits and Systems for Video Technology, pp. 1871–1884, December 2012.
[13] Y.-K. Wang, R. Even, T. Kristensen, and R. Jesup, RTP Payload Format for H.264 Video. IETF, May 2011.
[14] Y.-K. Wang, Y. Sanchez, T. Schierl, S. Wenger, and M. Hannuksela, RTP Payload Format for H.265/HEVC Video. IETF, August 2015.
[15] F. Kelly, “Charging and rate control for elastic traffic,” European Transactions on Communications, pp. 33–37, 1997.
[16] D. G. Luenberger and Y. Ye, Linear and Nonlinear Programming. Springer, 3rd ed., 2008.
[17] P. A. Hosein, “QoS Control for WCDMA High Speed Packet Data,” IEEE International Workshop on Mobile and Wireless Communications Network, 2002.
[18] R. Srinivasan, J. Zhuang, L. Jalloul, R. Novak, and J. Park, “IEEE 802.16m Evaluation Methodology Document (EMD),” July 2008.
[19] “3GPP TR 36.814 v9.0.0 - Technical Specification Group Radio Access Network - Evolved Universal Terrestrial Radio Access (E-UTRA) - Further advancements for E-UTRA physical layer aspects,” March 2010.
[20] F. Khan, LTE for 4G Mobile Broadband. Cambridge University Press, 2009.
[21] “HM 14.0, HEVC Test Model Reference Implementation.” https://hevc.hhi.fraunhofer.de/svn/svn_HEVCSoftware/. Accessed: 2014-09-30.
[22] “IMTAphy, LTE/LTE-Advanced system level simulator.” http://www.lkn.ei.tum.de/personen/jan/imtaphy/index.php. Accessed: 2014-05-24.
[23] “openWNS, open Wireless Network Simulator, open source system level simulation platform for performance evaluation and comparison of wireless and multi-cellular mobile communication systems.” https://launchpad.net/openwns. Accessed: 2014-05-24.
[24] “3GPP TR 25.814 v7.1.0 - Technical Specification Group Radio Access Network; Physical layer aspects for evolved Universal Terrestrial Radio Access (UTRA),” December 2006.
[25] ITU-R, “Guidelines for evaluation of radio interface technologies for IMT-Advanced,” December 2009.
[26] “3GPP TR 25.996 v9.0.0 - Spatial channel model for Multiple Input Multiple Output (MIMO) simulations,” December 2009.
[27] D. Tse and P. Viswanath, Fundamentals of Wireless Communications. Cambridge University Press, March 2010.
[28] “3GPP TS 36.213 v12.2.0 - Technical Specification Group Radio Access Network - Evolved Universal Terrestrial Radio Access (E-UTRA) - Physical Layer Procedures,” June 2014.
[29] “3GPP TR 36.829 v11.1.0 - Technical Specification Group Radio Access Network - Enhanced performance requirement for LTE User Equipment (UE),” December 2012.
[30] A. Roessler, J. Schlienz, S. Merkel, and M. Kottkamp, “LTE-Advanced (3GPP Rel.12) Technology Introduction - White Paper,” June 2014.
[31] “3GPP TS 36.211 v12.2.0 - Technical Specification Group Radio Access Network - Evolved Universal Terrestrial Radio Access (E-UTRA) - Physical channels and modulation,” June 2014.
[32] J. Olmos, A. Serra, S. Ruiz, M. Garcia-Lozano, and D. Gonzalez, “Exponential Effective SIR Metric for LTE Downlink,” IEEE International Symposium on Personal, Indoor and Mobile Radio Communications, September 2009.
[33] W. Lei, T. Shiauhe, and M. Almgren, “A fading-insensitive performance metric for a unified link quality model,” IEEE Wireless Communications and Networking Conference, April 2006.

