UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

Mobile edge cloud : computation scheduling and caching Tanzil, S M Shahrear 2018

Your browser doesn't seem to have a PDF viewer, please download the PDF to view this item.

Notice for Google Chrome users:
If you are having trouble viewing or searching the PDF with Google Chrome, please download it here instead.

Item Metadata


24-ubc_2018_september_tanzil_smshahrear.pdf [ 4.61MB ]
JSON: 24-1.0367783.json
JSON-LD: 24-1.0367783-ld.json
RDF/XML (Pretty): 24-1.0367783-rdf.xml
RDF/JSON: 24-1.0367783-rdf.json
Turtle: 24-1.0367783-turtle.txt
N-Triples: 24-1.0367783-rdf-ntriples.txt
Original Record: 24-1.0367783-source.json
Full Text

Full Text

Mobile Edge Cloud: Computation Scheduling andCachingbyS M Shahrear TanzilMASc. in Electrical Engineering, The University of British Columbia -Okanagan, 2013A THESIS SUBMITTED IN PARTIAL FULFILLMENTOF THE REQUIREMENTS FOR THE DEGREE OFDoctor of PhilosophyinTHE FACULTY OF GRADUATE AND POSTDOCTORALSTUDIES(Electrical & Computer Engineering)The University of British Columbia(Vancouver)May 2018c© S M Shahrear Tanzil, 2018The following individuals certify that they have read, and recommend to theFaculty of Graduate and Postdoctoral Studies for acceptance, the dissertation en-titled:Mobile Edge Cloud: Computation Scheduling and Cachingsubmitted by S M Shahrear Tanzil in partial fulfillment of the requirementsfor the degree of Doctor of Philosophy in Electrical & Computer EngineeringExamining Committee:Dr. Vikram Krishnamurthy, Electrical & Computer EngineeringSupervisorDr. Lutz Lampe, Electrical & Computer EngineeringSupervisory Committee MemberDr. Z. Jane Wang, Electrical & Computer EngineeringSupervisory Committee MemberDr. Rabab Ward, Biomedical EngineeringUniversity ExaminerDr. Steven Shechter, Business AdministrationUniversity ExamineriiAbstractMobile edge cloud has been proposed to accommodate computational intensiveapplication on mobile devices without augmenting their computational capacitiesand to combat with the growing traffic volume in the network. The main conceptof mobile edge cloud is to bring computation and storage near to the end users,serving users’ requests locally, and reducing wide area network latency. Althoughmobile edge cloud improves network performance and users’ quality of experi-ence, it brings several challenges due to possessing limited resources. The unifiedfocus of the thesis is to design mechanisms for mobile edge cloud to maximallyexploiting the computation and storage resources at the edge of the network tominimize network traffic and latency.In the first part of the thesis, we design a distributed computational resourcesharing mechanism where femtocell access points (FAPs) share their resourceswith neighbouring FAPs and form local clouds. The aim of forming such localfemto-clouds is to serve computational requests locally, reducing the data transferdelay and improving users’ quality of experience. The resource sharing problemis formulated as an optimization problem and a myopic procedure is presentedthat enables FAPs to collaboratively find its solution in a distributed fashion.In the second and third part of the thesis, we focus on designing caching mech-anisms for mobile edge network. It has been illustrated that almost 60% of thedata traffic results from the asynchronous requests for some popular content. Bycaching those few popular content at the edge of the network, demand for the samecontent can be served locally, resulting in a reduction in the data traffic volume andiiidownloading delay in the network. In the second part of the thesis, we constructa caching scheme that accounts for content popularity prediction and propertiesof the network (e.g. cache size, bandwidth, and network topology). The cachingscheme is further extended in the final part of the thesis that includes content pop-ularity prediction errors and routing mechanism in the caching decision. For thecaching schemes mixed-integer linear programming is used to compute where tocache the content in the network to minimize content downloading delay.ivLay SummaryMobile edge cloud has been proposed that brings computation and storage re-sources near to the end users to serve users’ requests locally. The unified focusof the thesis is to design mechanisms for mobile edge cloud, maximally exploit-ing the computation and storage resources at the edge of the network to minimizenetwork traffic and latency.At first, we design a distributed computational resource sharing mechanismwhere edge nodes share their resources with neighbouring nodes and form localclouds to reduce latency.Then we design caching mechanisms for mobile edge network to minimizecontent downloading delay. The goal of these mechanisms is to cache popularcontent at the edge of the network, serving maximum requests locally to reducedownloading delay and traffic volume in the network. The presented cachingschemes account for content popularity prediction, properties of the network, pre-diction errors, and routing mechanism in the caching decision.vPrefaceThe work presented in the thesis is based on the research works performed inStatistical Signal Processing Laboratory at the University of British Columbia-Vancouver. All the major contributions of the thesis including concept, litera-ture review, problem formulation, analysis, and simulation were conducted by theauthor with assistance from Prof. Vikram Krishnamurthy. Dr. Omid NamvarGharehshiran is the co-author of chapter 2: section 2.3 who helped the author toformulate the problem using game theory. The results presented in this thesis re-lated to machine learning (Chapter 3: section 3.3 and section 3.4.2; and Chapter 4:section 4.4 and section 4.5.2) were performed by Dr. William Hoiles. Dr. WilliamHoiles also helped the author to write reviewers response letter. A detailed list ofpublications associated with different chapters of this thesis is provided below.• The work of Chapter 2 has appeared in the following publications:– Tanzil, S., Gharehshiran, O.N., and Krishnamurthy, V. (2016) A Dis-tributed Coalition Game Approach to Femto-Cloud Formation. IEEETransactions on Cloud Computing– Tanzil, S., Gharehshiran, O.N., and Krishnamurthy, V. (2015) Femto-cloud formation: A coalitional game-theoretic approach. In Proceed-ings of the IEEE GLOBECOM• The work of Chapter 3 has been appeared in the following publication andfiled patent:vi– Tanzil, S., Hoiles, W., Krishnamurthy, V. (2017) Adaptive Scheme forCaching YouTube Content in a Cellular Network: A Machine Learn-ing Approach. IEEE Access– Hoiles, W., Tanzil, S., Duan Y., Krishnamurthy, V., Ngoc Dung, andZhang, H. (2016) Systems and methods for caching (US patent filed)• The work of Chapter 4 has been submitted to a peer-reviewed journal– Hoiles, W., Tanzil, S., Krishnamurthy, V. (2017) Risk-Averse CachingPolicies for YouTube Content in Femtocell Networks using DensityForecasting.viiTable of ContentsAbstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiiLay Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vPreface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viTable of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viiiList of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiiList of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiiiList of Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xviAcknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvii1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.1.1 Mobile Edge Computing . . . . . . . . . . . . . . . . . . 21.1.2 Mobile Edge Caching . . . . . . . . . . . . . . . . . . . . 81.2 Main Contributions of the Thesis . . . . . . . . . . . . . . . . . . 111.2.1 A Distributed Coalition Game Approach to Femto-CloudFormation . . . . . . . . . . . . . . . . . . . . . . . . . . 12viii1.2.2 Adaptive Scheme for Caching Content in a Mobile EdgeNetwork . . . . . . . . . . . . . . . . . . . . . . . . . . . 161.2.3 Risk-Averse Caching Scheme for Heterogeneous Networks 191.3 Thesis Organization . . . . . . . . . . . . . . . . . . . . . . . . . 222 A Distributed Coalition Game Approach to Femto-Cloud Formation 242.1 System Architecture . . . . . . . . . . . . . . . . . . . . . . . . . 252.2 Formulation of the Femto-Cloud Formation Problem . . . . . . . 262.2.1 Local Femto-Clouds and Their Utility . . . . . . . . . . . 272.2.2 Optimization of the Femto-clouds with FAP Incentives . . 322.3 Distributed Femto-Cloud Formation and Convergence to the Core 342.3.1 Distributed Femto-Cloud Formation Algorithm . . . . . . 342.3.2 Implementation Considerations . . . . . . . . . . . . . . . 372.4 Numerical Results . . . . . . . . . . . . . . . . . . . . . . . . . . 392.4.1 Object Recognition Tasks . . . . . . . . . . . . . . . . . . 392.4.2 Simulation Setup . . . . . . . . . . . . . . . . . . . . . . 412.4.3 Numerical Examples . . . . . . . . . . . . . . . . . . . . 422.5 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . 503 Adaptive Scheme for Caching Content in a Mobile Edge Network . 533.1 System Model and Problem Formulation . . . . . . . . . . . . . . 543.2 Content and Network Aware Adaptive Caching Scheme for Cellu-lar Base Stations . . . . . . . . . . . . . . . . . . . . . . . . . . . 563.2.1 Mixed-Integer Linear Program Formulation . . . . . . . . 573.2.2 Implementation Considerations . . . . . . . . . . . . . . . 603.3 Extreme Learning Machine (ELM) for Popularity Prediction . . . 633.3.1 Predicting Content Popularity with Extreme Learning Ma-chines . . . . . . . . . . . . . . . . . . . . . . . . . . . . 643.3.2 Feature Construction for Popularity Prediction . . . . . . . 653.3.3 Optimizing the Number of Neurons in the Extreme Learn-ing Machine . . . . . . . . . . . . . . . . . . . . . . . . . 67ix3.3.4 Stochastic Feature Selection . . . . . . . . . . . . . . . . 683.4 Numerical Example of Content and Network Aware Adaptive Cachingusing Real-World YouTube Data . . . . . . . . . . . . . . . . . . 703.4.1 Simulation Setup . . . . . . . . . . . . . . . . . . . . . . 713.4.2 Performance of Extreme Learning Machine for Caching . 733.4.3 Performance of the Content and Network Aware CachingScheme . . . . . . . . . . . . . . . . . . . . . . . . . . . 773.5 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . 804 Risk-Averse Caching Scheme for Heterogeneous Networks . . . . . 824.1 System Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 834.2 Dynamic Caching Schemes . . . . . . . . . . . . . . . . . . . . . 884.3 Risk-Neutral and Risk-Averse Static Caching Schemes . . . . . . 914.3.1 Risk-Neutral (RN) Static Caching Scheme . . . . . . . . . 924.3.2 Risk-Neutral and Network-Aware (RNNA) Static CachingScheme . . . . . . . . . . . . . . . . . . . . . . . . . . . 934.3.3 Conditional Value-at-Risk (CVaR) and Content RetrievalDelay Minimization . . . . . . . . . . . . . . . . . . . . . 964.3.4 Risk-Averse (RA) Static Caching Scheme . . . . . . . . . 994.3.5 Risk-Averse and Network-Aware (RANA) Caching Scheme 1014.4 Content Request Cumulative Distribution Function Forecasting . . 1024.4.1 Content Group Association Classifier . . . . . . . . . . . 1034.4.2 Risk-Averse Feed-foward Neural Network for PredictingContent Requests . . . . . . . . . . . . . . . . . . . . . . 1044.4.3 Conformal Prediction Algorithm for Content Requests . . 1094.5 Numerical Evaluation of the Conformal Prediction Algorithm andCoherent Risk Minimization Caching Schemes for YouTube Content1114.5.1 Network Parameters and YouTube Dataset . . . . . . . . . 1114.5.2 Conformal Prediction Algorithm for YouTube Content . . 1144.5.3 Selection of the Confidence Level α for Maximum Con-tent Retrieval Delay Guarantees . . . . . . . . . . . . . . 118x4.5.4 Performance of the Risk-Neutral and Risk-Aware CachingSchemes . . . . . . . . . . . . . . . . . . . . . . . . . . . 1214.6 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . 1235 Conclusions and Future Research Directions . . . . . . . . . . . . . 1255.1 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1255.2 Future Research Problems . . . . . . . . . . . . . . . . . . . . . 1275.2.1 Frame Allocation Mechanism for Edge Cloud AssistedMobile Gaming . . . . . . . . . . . . . . . . . . . . . . . 1285.2.2 Collaborative Learning for Edge Caching . . . . . . . . . 1305.2.3 Joint Computation and Caching for Adaptive Bit Rate VideoStreaming . . . . . . . . . . . . . . . . . . . . . . . . . . 131Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133xiList of TablesTable 1.1 Comparison of fog computing, cloudlet, femto-clouds /MEC [1] 5Table 2.1 Notations and Terminology . . . . . . . . . . . . . . . . . . . 28Table 2.2 Simulation setup: LTE system parameters in NS-3 . . . . . . . 44Table 2.3 Femto-cloud coalition structures in heuristic schemes . . . . . 45Table 2.4 Simulation setup: FAP parameters in the numerical example . . 45Table 2.5 Femto-clouds coalition structures in Example 1 . . . . . . . . . 49Table 2.6 Femto-clouds coalition structures in Example 2 . . . . . . . . . 49Table 3.1 Glossary of Parameters . . . . . . . . . . . . . . . . . . . . . 54Table 3.2 Number of files in each YouTube Category in the CollectedDataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74Table 3.3 ELM Performance Comparison: TP (true positive), TN (truenegative), and training times . . . . . . . . . . . . . . . . . . . 77Table 4.1 Notation for Risk-Averse Caching . . . . . . . . . . . . . . . . 84Table 4.2 Group Classification and Neuron Number Selection . . . . . . 116xiiList of FiguresFigure 1.1 A typical mobile cloud computing architecture . . . . . . . . 2Figure 1.2 A typical mobile edge computing architecture . . . . . . . . . 6Figure 1.3 A typical mobile edge caching architecture . . . . . . . . . . 10Figure 1.4 A schematic view of the contributions of the thesis. . . . . . . 12Figure 1.5 Schematic of the caching scheme . . . . . . . . . . . . . . . . 19Figure 1.6 A schematic view of the remainder of the thesis. . . . . . . . . 23Figure 2.1 A typical femto-cloud architecture . . . . . . . . . . . . . . . 25Figure 2.2 FAPs and FCMs locations inside the building . . . . . . . . . 43Figure 2.3 Computational capacity of FAP-1 vs. average data transferdelay in the femto-clouds . . . . . . . . . . . . . . . . . . . . 47Figure 2.4 Computational capacity of FAP-1 vs. femto-cloud incentive . . 48Figure 2.5 User arrival rate at FAP-1 vs. femto-cloud incentive . . . . . . 50Figure 2.6 Computational capacity of FAP-1 vs. average data transferdelay in the femto-clouds . . . . . . . . . . . . . . . . . . . . 51Figure 2.7 Computational capacity of FAP-1 vs. femto-cloud incentive . . 52Figure 3.1 A typical network architecture . . . . . . . . . . . . . . . . . 55Figure 3.2 A schematic of the adaptive caching scheme . . . . . . . . . . 61Figure 3.3 A schematic of the Segmented Least Recently Used (S3LRU)cache replacement scheme . . . . . . . . . . . . . . . . . . . 62Figure 3.4 Schematic of the network . . . . . . . . . . . . . . . . . . . . 73Figure 3.5 Performance of the ELM . . . . . . . . . . . . . . . . . . . . 74xiiiFigure 3.6 Performance of the feature selection Algorithm . . . . . . . . 75Figure 3.7 Schematic of the Extreme Learning Machine for estimatingthe viewcount . . . . . . . . . . . . . . . . . . . . . . . . . . 76Figure 3.8 Viewcount on day 1 . . . . . . . . . . . . . . . . . . . . . . . 76Figure 3.9 Viewcount on day 4 . . . . . . . . . . . . . . . . . . . . . . . 77Figure 3.10 Cumulative average content downloading delay vs. simulationtime . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79Figure 3.11 Cumulative average cache hit ratio in the network vs. simula-tion time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80Figure 4.1 Schematic of an LTE wireless network . . . . . . . . . . . . . 86Figure 4.2 Schematic of the interaction between static and dynamic cachingschemes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90Figure 4.3 A schematic illustration of the risk-neutral (RN), risk-neutraland network-aware (RNNA), risk-averse (RA) and the risk-averse and network-aware (RANA) caching schemes . . . . . 92Figure 4.4 Schematic of the conformal prediction algorithm . . . . . . . 110Figure 4.5 Schematic of the network . . . . . . . . . . . . . . . . . . . . 115Figure 4.6 Empirical cumulative distribution function of the error ε(g) in(4.22) for the groups g ∈ G . The gray dots indicate the empir-ical cumulative distribution function F̂E|g(ε|g), and the blackline indicates the fitted generalized extreme value distribution. 117Figure 4.7 Group association probability P(g|x f ) . . . . . . . . . . . . . 119Figure 4.8 Conformal prediction of the number of content requests . . . . 120Figure 4.9 Quantile-quantile plot for the empirical cumulative distribu-tion function F̂Y (y f |x f ) and the YouTube content requests . . . 121Figure 4.10 Cumulative distribution function of the content retrieval de-lay FD(d) using the RNNA and RANA caching schemes for aconfidence level α = 0.9,0.99 . . . . . . . . . . . . . . . . . 122xivFigure 4.11 The cumulative distribution function FD(d) of the content re-trieval delay for the RN, RNNA, RA, RANA, RNWR cachingschemes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124Figure 5.1 A schematic view of the future research challenges. . . . . . . 128xvList of GlossaryBS Base StationCDN Content Delivery NetworkCM Cache ManagerCVaR Conditional Value at RiskELM Extreme Learning MachineFAP Femtocell Access PointFCM Femto Cloud ManagerLRU Least Recently UsedLTE Long-Term EvolutionMCC Mobile Cloud ComputingMEC Mobile Edge CloudMILP Mixed Integer Linear ProgrammingNS-3 Network Simulation-3QoE Quality of ExperienceSAP Smallcell Access PointSGW Smallcell GatewaySLRU Segmented Least Recently UsedUE User EquipmentWAN Wide Area NetworkxviAcknowledgmentsI would like to express my sincerest gratitude to my supervisor, Prof. VikramKrishnamurthy for his guidance and persistent help throughout the research. Thisthesis would not have been possible without his support. I am greatly indebtedto Prof. Vikram Krishnamurthy for his continuous encouragement and valuablesuggestions.I would like to extend my sincerest appreciation to Dr. William Hoiles forhis valuable suggestions and always encouraging me to solve new challenges. Iam also thankful to Dr. Omid Namvar Gharehshiran for his valuable insights informulating research problem.I would like to thank all the members of my research group for their encour-agement. Last, but not least, I would like to express my gratefulness to my family.xviiChapter 1Introduction1.1 OverviewThe gaining popularity of mobile devices for ubiquitous use to enjoy a varietyof applications pose new challenges to the wireless network operator. On theone hand, some of these applications are time critical and traditional centralizedcloud-based solution that introduces high wide area network latency is not suit-able. On the other hand, applications related to video streaming increase trafficvolume in the network. Mobile edge cloud has recently been proposed to supportcomputational and traffic intensive applications on mobile devices while satisfy-ing latency constraint associated with the applications and reducing traffic volumein the network. The idea of mobile edge cloud is to bring cloud functionalities i.e.,computation and storage near to the end users and working as an intermediate plat-form between users and centralized cloud. In mobile edge cloud architecture, aportion of the users’ requests are served by the local cloud and forwards remain-ing portion of the requests to a cloud server. As a result, the amount of data trafficsent to the cloud server is reduced and users’ quality of experience is improvedby reducing wide area network latency. Deploying computational resources at theedge of the network is known as mobile edge computing while bringing storageresources at the edge of the network is referred to as mobile edge caching. In this1Base StationMobile userAccess PointCloudCore NetworkInternetFigure 1.1: A typical mobile cloud computing architecture. Access pointssuch as small cell, femtocell and base stations are connected to thecore network via backhaul link. Core network communicates with acloud server over the Internet.thesis, we study both mobile edge computing and mobile edge caching. In thenext section, we describe the state-of-the-art and available architectures on mo-bile edge computing and mobile edge caching; and their advantages over otherestablished network architectures. Then we explain three research challenges inmobile edge cloud, their motivations and contributions of the thesis. The chapterconcludes providing a brief description of the thesis organization.1.1.1 Mobile Edge ComputingIn this section, we describe the emergence of mobile edge computing and thebenefits of mobile edge computing over other architectures.2Emergence of Mobile Edge ComputingCloud computing has gained a lot of attentions from both industry and academiawhich generally incorporates infrastructure, platform, and software as services [2,3]. Cloud service providers rent their resources as a utility–like electricity andusers access to those resources “pay-as-you-go” basis through the Internet. Incloud computing, users enjoy powerful computing platform with software de-ployed by the cloud providers and store their data in the cloud storage insteadof personal devices. On demand service, resource pooling, rapid elasticity, andlower capital investments are the main advantages of cloud computing [4].On the other hand, mobile devices are becoming an integral part of human lifeand are used for a variety of applications e.g., entertainment, games, on-line bank-ing, social networking, travel, and news. The resource constraints of the mobiledevice e.g., battery life, storage, and computations, however, hinders mobile usersto enjoy various useful but computational intensive applications like health careand personal assistances [5]. One such solution to enjoy complicated applicationsvia the mobile device, yet operating within available resources and without in-creasing mobile device cost; is mobile cloud computing (MCC) [6, 7]. The ideaof MCC is to augment capacities of mobile devices by offloading computations toa remote cloud server over Internet. In MCC, mobile devices act as a sensory in-put, continuously send gathered data to the connected access point which forwardsdata to a cloud server via core network over the Internet. The cloud server per-forms all the computations and sends back the required information to the mobiledevice. A schematic of MCC is illustrated in Fig.1.1.A major limitation in MCC is latency associated with data transferring to thecloud server for computations. Another bottleneck in MCC is that it requireshigh data rate backhaul connection between access point to the core network forfaster data transfer. To alleviate such limitations in MCC, mobile edge computingarchitecture has recently been proposed [8].3What is mobile edge computing?Mobile edge computing (MEC) is a new architecture that enables cloud com-puting functionalities at the edge of the network. The main idea of the mobileedge computing is to bring the cloud computing resources near to the end usersand serving user requests locally. As a result, users’ receive desired output withlow latency and fewer requests are forwarded to the cloud server, resulting in areduction in the network traffic. European Telecommunications Standards Insti-tution (ETSI) proposed mobile edge computing architecture where they assumedthat cloud functionality such as computation and storage will be integrated intothe edge network devices such as macro base stations, eNodeB, small cell accesspoints, and radio network controller [9].Similar to the mobile edge computing, several other paradigms are available inthe literature such as fog computing, cloudlet, and femto-clouds. These paradigmshave some common ground but they differ in their usages. The term fog is used byCisco as a metaphor instead of cloud wherein fog is closer to the end devices thanthe cloud [10]. Fog computing is employed between end devices especially inter-net of thing (IoT) devices and cloud computing platform. Fog nodes are deployedin different network tiers and perform computations associated with the most time-sensitive applications in a distributed fashion and migrate time-insensitive tasks toa cloud server. The main difference between fog computing and mobile edge com-puting is that fog computing aims to bring cloud functionality at the lower layercloser to the proximity of the IoT devices where mobile edge computing specif-ically designs to enhance cloud functionality at the edge of the wireless networksuch as small cell access points and base stations.Satyanarayanan and his colleagues proposed the concept of cloudlet [11]: Atrusted local cloud comprised of multi-core computers that are connected to overthe Internet and is available for use within the proximity of mobile users. Mobiledevices use Wi-Fi network to offload computational tasks to the cloudlet, whichsaves them a considerable amount of energy as compared to offloading over the3G/Long Term Evolution (LTE) cellular network to a remote cloud [7, 12] in4mobile cloud computing. This prolongs the battery lifetime of mobile devicesand, by reducing network latency, it also improves user’s quality of experience(QoE) [13]. It is envisaged that cloudlet will be deployed like Wi-Fi hotspot andwill act as an intermediate layer between mobile device and cloud server.In cloudlet, Wi-Fi network is used for communication between mobile deviceand cloudlet. As a result, mobile devices require switching from the cellular net-work to Wi-Fi network whenever they are accessing to the cloudlet. Handoverfrom the cellular network to the Wi-Fi network is one of the main limitationsof the cloudlet. In addition, Wi-Fi does not provide any guarantee on the qual-ity of service (QoS) and the range of the Wi-Fi coverage area is smaller thantypical cellular base stations and access points. To avoid the limitations in thecloudlet, a European project named TROPIC, introduced the idea of small cellcloud namely femto-clouds. The main idea of femto-clouds is to enhance thesmall cell access points such as femtocell, picocell access points with computa-tional and storage resources. The advantage of the femto-clouds over the cloudletis that both small cell access points and mobile devices work under the same com-munication standard. Throughout the thesis, we use mobile edge computing andfemto-clouds interchangeably as they both consider cloud resources at the wire-less access points. The main difference between femto-clouds and mobile edgecomputing is that femto-clouds considers resource sharing among neighbour ac-cess points while current mobile edge computing architecture considers that thereis no cooperation among access points for computation. A schematic of mobileedge computing architecture is illustrated in Fig 1.2.A comparison among fog computing, cloudlet, and femto-clouds are providedin Table 1.1 [1].Table 1.1: Comparison of fog computing, cloudlet, femto-clouds /MEC [1]fog computing femto-clouds /MEC cloudletComputing devices IoT devices, routers, sensors access points, base stations dedicated computersAccess medium Wi-Fi, bluetooth, cellular networks cellular networks Wi-FiSignalling execution dedicated nodes are used direct direct5Base StationMobile userAccess PointCloudCore NetworkInternetComputing resourceFigure 1.2: A typical mobile edge computing architecture. Access pointssuch as small cell, femtocell and base stations are equipped with com-putation resources and connected to the core network via backhaullink. Core network connects to a cloud server over the Internet. As canbe seen, computation requests can be served locally, resulting lowernumber of requests send to the core network, and reduces networktraffic.Advantages of the mobile edge computingThere are several benefits of mobile edge computing over mobile cloud computingsuch as low latency and reduced network traffic. In this section, we explain someof the advantages of the mobile edge computing which are as follows.Latency: One of the major limitations of the mobile cloud computing is that itcosts a significant amount of latency to transfer data associated with the tasks mi-gration to a cloud server. The data transferring latency in MCC mainly originatesfrom three sources that are: latency between mobile device to the connected accesspoint, access point to the core network, and core network to cloud server. Datatransferring latency between the mobile device and the connected access point6depends on several factors such as wireless channels’ quality, path loss, numberof users’ sharing the bandwidth, and interference. Low data rate backhaul linkcapacity is the main reason for the latency to transfer data from access point to thecore network. Finally, the wide area network latency between the core networkand the cloud server depends on the distance and the number of hops betweenthem. Once the data associated with the tasks are transferred to the cloud server,the cloud server performs all the required computations and sends back the finalresults to the mobile devices via core network and access points.On the other hand, in mobile edge computing, a portion (or entire) of thetasks are processed in the edge cloud. As a result, a significant amount of latencyarises from transferring data from an access point to a cloud server via core net-work can be reduced. A recent field trial operated by China Telecom showed thatlatency can be reduced to 60− 90% by deploying mobile edge computing [14].[15] demonstrates that mobile edge computing reduces up to 88% latency for aug-mented reality application compared to mobile cloud computing.Network traffic: Mobile edge computing enables network devices to performcomputations on behalf of the mobile devices and serves computational requestslocally. As a result, a fewer number of computational requests are migrated to thecloud server which leads to reduction in the network traffic.High data rate backhaul link: High data rate backhaul link is one of the key re-quirements for mobile cloud computing to meet the stringent latency requirementsfor real-time applications such as augmented reality and video processing. How-ever, in mobile edge computing, the network devices perform computation of thetime-sensitive applications, thereby the requirement for high data rate backhaullink is lessened.Energy: In mobile cloud computing architecture, mobile devices offload compu-tational tasks to the could server via an access point, and a core network, incurring7a significant amount of latency. To satisfy stringent latency requirements in thereal-time processing applications, mobile devices perform a portion of the taskswhile offloading the remaining portion. As a result, real-time applications con-sume battery power of the mobile devices. On the other hand, in mobile edgecomputing, the latency is lower which enables the whole tasks or a higher portionof the tasks to be offloaded to the edge cloud. Thereby, energy consumption ofthe mobile devices is reduced in mobile edge computing, resulting in prolongingbattery life-time. Gao at el. reveal that mobile edge computing saves 42% energyconsumption compared to the mobile cloud computing [16].1.1.2 Mobile Edge CachingThe advancement of the mobile devices has facilitated to access multimedia ap-plications such as video on demand streaming and live chatting anytime and any-where [17]. As a result, the wireless network is experiencing substantial growthof data traffic which will be approximate 30.6 exabytes (1018) per month in 2020,an eightfold increase over 2015 [18]. The recent addition of high-density videoformats such as 4K video amplifies the growth of data traffic even further [19].It is challenging for the wireless operators such as mobile operators and Internetservice providers to keep up with the massive growth of traffic while ensuringusers’ QoE requirements and keeping the data transmission cost low. Video con-tent caching at the edge of the network such as caching at the base stations (BSs),at the small cell access points, at the edge routers, and at the wireless infostationsis considered to be an attractive solution to address the demand for the data traffic.In this section, we explain what is mobile edge caching and the main advantagesof the mobile edge caching.What is mobile edge caching?The usage of the mobile device to enjoy video on demand service increases thetraffic growth over the wireless networks. In the traditional wireless network,video content request from the mobile users need to be fetched from a content8delivery network which is outside of the network and far away from the users.In particular, mobile users first send the video content request to the connectedaccess points, which is then forwards it to the core network. The core networkredirects the request to a content delivery network which has the requested videocontent. The requested content is then sent back to the users via core networkand access point. As a result, a popular video that receives multiple requestsasynchronously will be fetched from the content delivery network multiple times,thereby increases the network traffic. In addition, high data rate backhual linksare required to meet the stringent latency constraints to avoid jerky video.To mitigate the limitations in the current network architecture, mobile edgecaching has recently been proposed [20–22]. The concept of mobile edge cachingis to bring video content closer to the end users and serving video content requestslocally. The cost of hardware such as solid-state devices has become cheaper in re-cent years which makes edge caching a cost effective solution instead of purchas-ing additional bandwidth for backhaul links [21, 22] for handling the immensepressure of data traffic in the network. In addition, research works on data traffichave revealed that the main source of data traffic is video traffic. In fact, 60% ofthe mobile data traffic are from video traffic [23] where few popular video con-tent are requested several times asynchronously accounting for the majority of thetraffic [24]. By caching those few popular video content at the edge of the net-work, repetitive requests for the same content can be served locally. This enablescontent delivery faster by shortening the communication distance between contentand users. Caching at the edge of the network also alleviates the burden on thebackhaul links and reduces inter-network traffic since the requested content canbe served locally without the intervention of the core network. A schematic viewof the mobile edge caching is illustrated in Fig.1.3.Several paradigms are available for mobile edge caching such as femtocaching,cache enabled base stations and content-centric network. The main difference iswhere to put the caching resources in the network. Some of the research worksadvocate that caching in the macro base stations is considered to be a promising9Base StationMobile userAccess PointContent DeliveryNetworkCore NetworkInternetCacheCacheCacheCacheFigure 1.3: A typical mobile edge caching architecture. Access points suchas small cell, femtocell and base stations are equipped with storageresources and connected to the core network via backhaul link. Corenetwork is connected to a content delivery network over the Internet.As can be seen, content requests can be served locally, resulting lowernumber of requests forwarded to the core network, and reduces net-work traffic.solution since coverage area of the macro base station is large and it can serve alot of requests [25]. However, to increase the coverage and connectivity of thenext generation heterogeneous network, small cell access points has become anintegral part of the network. The main bottleneck in small cell network is the re-quirement for the high data rate backhaul links. Therefore, several research workspropose that caching in the small cell network namely, femtocaching is beneficialfor the network which reduces network traffic and the requirements for the highdata rate backhual links [26, 27].On the other hand, millions of routers and switches are deployed in the net-work. Equipping routers and switches with caching resources enable content re-quests to be served locally. The content-centric network is a network layer proto-10col specially designed to handle in-network caching such as caching in routers andswitches [28, 29]. In this thesis, we consider edge caching in a cellular networksuch in base stations and small cell access points.Benefits of mobile edge cachingThe main benefits of mobile edge caching are as follows.Latency: In mobile edge caching architecture, content are served locally result-ing a shorter travelling distance for the content compared to content served fromthe content delivery network. As a result a significant amount of latency is re-duced.Network traffic: In mobile edge caching, cache enabled network devices servea significant amount of content request locally and thereby a fewer requests areforwarded to the content delivery network. This leads to a reduction in the networktraffic. Authors of [30] shows that a 22% of backhual link traffic can be reducedusing edge caching which can be extended to a higher gain by deploying higherstorage size.High data rate backhaul link: As already mentioned, video on demand applica-tions require high data rate backhaul link to avoid jerky video. One way to meetthis requirement is to deploy high data rate backhaul link. Another solution ismobile edge caching wherein video content are served locally.1.2 Main Contributions of the ThesisIn this section, a brief description of the major contributions of the thesis are pro-vided. These contributions fall into two category: mobile edge computation andmobile edge caching. A schematic view of the contributions of the thesis is illus-trated in Fig.1.4. A more detailed description of the contributions are provided in11separate chapters.Mobile Edge CloudComputation Scheduling CachingA Distributed CoalitionGame Approach toFemto-Cloud FormationAdaptive Scheme forCaching Content ina Mobile Edge NetworkRisk-Averse Caching Schemefor Heterogeneous NetworksFigure 1.4: A schematic view of the contributions of the thesis.1.2.1 A Distributed Coalition Game Approach toFemto-Cloud FormationThe aim of the work is to efficiently utilize computational resources in the edgecloud. A detailed description of the work is provided in Chapter 2. Motivationand contributions of this work are as follows.MotivationTo increase the semantic richness of sensed data in personal assistant applicationssuch as Apple Siri, Google Now, and Microsoft’s Cortana, high data rate sensorssuch as vision-based sensors are required [31]. Analyzing real-time video andimages captured by such sensors, however, requires intensive computational ca-pacity, which makes it costly (in terms of energy consumption) to be processed inmobile devices. Therefore, offloading-based mechanisms have been developed tosupport vision-based functionalities [11, 12, 31].One such solution is MCC [5] that augments the computational capacity ofmobile devices by offloading computation and storage to a remote cloud. Theinteractive response essential for real-time video/image processing is, however,12limited by two major bottlenecks in MCC, namely, energy consumption and la-tency [11, 32–34]. Therefore, the concept of mobile edge computing such as fogcomputing, cloudlet, and femto-clouds have been proposed as explained in Sec-tion 1.1.1.The main idea in this work is to allow edge network devices such as femtocellaccess points (FAPs) augmented with computational resources to cooperate witheach other and form local computational pools, namely, femto-clouds. FAPs sharethe computational resources exceeding their demands in femto-clouds. Therefore,by maximally exploiting FAPs’ local resources, such femto-clouds reduce latency1and, hence, improve end-user QoE. We assume that FAPs are deployed by differ-ent residential users. To motivate FAP owners to share their excess resources, itis natural to assume an incentive mechanism. The maximal use of FAP resourcesthen translates into both lower handling latency and higher incentives to FAP own-ers. The question that this work focuses on is then: How should FAPs decide onformation of such femto-clouds in a distributed fashion?The data transfer delay and limited computational capacity of FAPs imposestringent constraints that naturally prohibit formation of the grand coalition towhich all FAPs join, namely, grand femto-cloud. Since offloading tasks to otherFAPs within a femto-cloud incurs delay, it is not beneficial to collaborate withFAPs that are far away. On the other hand, the computation tasks exceeding thecomputational capacity of the femto-clouds have to be transported to the remotecloud. This incurs both data transfer delay and remote cloud costs. If such a costexceeds the associated incentives, all FAPs within the femto-cloud will be respon-sible for the loss. Formation of the grand femto-cloud produces a huge pool oftasks, and increases the probability of such losses. Therefore, FAPs form femto-clouds in a way to minimize tasks that are needed to be transported to the remotecloud. The proposed femto-cloud formation scheme identifies such optimal lo-calized femto-clouds, to which only a subset of FAPs subscribe, in a distributedfashion.1Latency can be formulated as the sum of computational delay and data transfer delay.13ContributionsThe main results in this work are:• The resource sharing problem is formulated as an optimization problemwith the objective to maximize the overall utility of all femto-clouds withconstraints on the fair division of incentives among individual FAPs withina femto-cloud. The utility function of each femto-cloud takes into accountthe profile of request arrivals in individual femtocells, previous cooperativebehaviour of FAPs, data transfer delay, and computational capacity of FAPsto determine the overall incentive available to each femto-clouds. There-fore, solving the formulated problem translates into finding the femto-cloudstructure that maximizes utilization of FAPs’ local resources (taking into ac-count users’ experience), yet provides incentives to FAPs for sharing theirresources such that no FAP is willing to give up collaboration within itscurrent femto-cloud to join another femto-cloud.• The similarities between the formulated femto-cloud formation problem andcoalition formation games enable us to employ the dynamic coalition for-mation algorithm in [35] to devise a procedure that prescribes individualFAPs how to revise their decisions as to which femto-cloud to join so asto reach the solution of formulated problem (i.e., core of the underlyingcoalition formation game) in a distributed fashion.• Finally, numerical simulations using network simulator-3 (NS-3) illustratesuperior performance of the proposed scheme in terms of both handlinglatency and incentives provided to FAP owners over alternative heuristicfemto-cloud formation schemes. They further confirm that forming a grandfemto-cloud, comprising of all FAPs in the network, is not always the opti-mal choice.Related WorkHere, we provide a brief description of relevant works in the literature.14Collaboration among cloud providers: There is a large body of research worksdevoted to studying cooperation in cloud computing framework; see, e.g., [36],[37],[38].Cooperation among mobile cloud service providers is studied in [36] for poolingcomputational resources with the goal to maximize revenue. The authors thenuse Shapley value to distribute the revenue among the collaborating cloud ser-vice providers. In [39], a cooperative outsourcing strategy is proposed whichprescribes the providers whether to satisfy users’ requests locally or to outsourceto a certain provider. Dynamic cloud federation formation is also studied in [40].Collaboration among femtocells: Coalition formation in femtocell network hasbeen extensively studied in the literature; see, e.g., [41],[42],[43]. For instance,[44] studies coalition formation among femtocells in order to mitigate interfer-ence in the network. In [43], an interference management model is developed ina femtocell network wherein the cooperation problem is formulated as a coalitionformation game with overlapping conditions. Rami et al. [45] also consider re-source and power allocation in cooperative femtocell networks. All these worksconsider cooperation among femtocells with the aim to improve physical-layerthroughput.Incentives for cooperation in femtocell network: Femtocells are typically de-ployed by mobile network operators in an open/hybrid access mode, in whichFAPs are willing to accommodate guest users; see, e.g., [46],[47],[48],[49]. Tomotivate FAP owners to adopt such an access mode, several incentive schemeshave been studied in the literature, e.g., [46],[47],[48],[49],[50],[51]. Incentivescan be categorized as reputation or remuneration [52]. Reputation reflects thewillingness of wireless nodes’ to cooperate with other nodes. Nodes receive ser-vices from other nodes based on their past behaviour—misbehaving nodes are de-prived from receiving services. In contrast, remuneration-based mechanisms pro-vide monetary incentives for cooperation, e.g., micropayment, virtual currency,E-cash, and credit transfer [53],[54],[55],[56].15Femto-clouds: Femto-clouds are relatively recent and only few studies can befound in the literature. For instance, [13] proposes a mechanism for joint op-timization of communication and computational resources. In [33],[57], an of-floading strategy is proposed for femto-clouds. All these works consider thecloud offloading mechanism while assuming that FAPs are already grouped intocoalitions. Femto-clouds differ substantially from cloud radio access networks(CRAN) [58] in that FAPs are endowed with computational resources and theoffloaded computations are preferred to be performed locally rather than in a cen-tralized cloud (e.g. remote radio head in CRAN) to reduce handling latency.Jessica et al. propose cluster formation strategies in [59] to handle a singleuser’s requests in femto-clouds. These strategies are devised with different objec-tives, e.g., to minimize the experienced latency or to reduce power consumptionin the cluster. This work is extended to a multi-user scenario in [60] where clus-ters are formed for each unserved request according to the strategies proposedin [59]. Their model, however, is suitable only for enterprise femtocell environ-ments where all FAPs share their computational resources with each other. More-over, cluster formation for each unserved request significantly increases the sig-nalling overhead. To the best of our knowledge, the formulation and distributedscheme proposed in this work for formation of femto-clouds considering a re-muneration incentive mechanism and taking into account the delay involved inmigrating tasks between FAPs have not been studied before.1.2.2 Adaptive Scheme for Caching Content in a Mobile EdgeNetworkThe target of this work is to devise caching scheme in a edge network such as incellular network to maximally utilize edge network storage resources. A detailedexplanation of this work are provided in Chapter 3. Motivation and contributionsof this work are as follows.16MotivationThe literature on content caching in edge networks has grown in recent years [61–63]. Caching methods roughly fall into two categories: i) designing new contentcaching methods with different objectives e.g., minimizing downloading delay,energy consumption, network congestions or maximizing users’ QoE while as-suming content popularity is known; and ii) developing new methods for predict-ing content popularity and caching the most popular content. For instance, [26]presents content caching methods for base stations (BSs) to minimize contentdownloading delay. The proposed coded content caching optimization problemin [26] is convex and involves the solution to a linear program. In [62], a multicast-aware caching method is developed that minimizes energy consumption in thenetwork while the method described in [25] improves users’ QoE. A cooperativecontent caching policy is proposed in [64] to improve network performance andusers’ QoE. The presented content caching problem is formulated as an integerlinear programming and suboptimal solutions are devised using the hierarchicalprimal-dual decomposition method.A limitation with the caching methods [26],[64],[62] is that they assume knowl-edge of the content popularity in advance, and assume that the content popu-larity follows a Zipf distribution. In reality, content popularity must be esti-mated [65, 66]. In [67],[68],[69], content popularity is estimated using the re-quest statistics of the content2. The content is then cached based on the estimatedcontent popularity. In [70],[71], collaborative filtering methods are used to clus-ter users with similar content preferences, and then cache the content based onthe number of users in each cluster. For new users, their social network char-acteristics can be used to estimate their content preferences and cluster associa-tion. The combination of collaborative filtering with social network informationis used in [72, 73] to mitigate the cold start and data sparseness issues associ-ated with collaborative filtering. There are two main issues with employing thesemethods for caching content. The first is that these methods are highly intrusive2In this work content refers to YouTube video files.17as the methods require knowing the user’s content requests and social networkinformation–for example, these methods can not be employed in countries suchas Canada without user consent as it would violate privacy laws3. The secondissue is that content popularity is estimated using the content request statistics.Therefore, these methods can not be used to cache new content which have nouser request statistics.ContributionsIn this work, we construct an adaptive caching scheme that accounts for users’behaviour and properties of the cellular network (e.g. cache size, bandwidth, andnetwork topology). The scheme does not require specific user’s content requestsand/or social network information. Instead, the popularity of the content is pre-dicted using features of the content. This allows the caching of popular contentwithout the need for directly observing user requests for the content. Addition-ally, since the popularity of the content is predicted, the content caching can beperformed when the network load is minimal reducing in the overall energy usageof the network. A schematic of the adaptive caching scheme proposed in this workis displayed in Fig. 1.5. The main contributions of this work include:• A machine learning algorithm, based on the extreme learning machine (ELM) [74],to estimate the popularity of content based on the features of the content.• A mixed-integer linear program (MILP) is constructed to perform cache ini-tialization that accounts for the predicted content popularity and propertiesof the cellular network.• An adaptive caching scheme which uses the MILP for cache initializa-tion, and the S3LRU (Segmented Least Recently Used with three segments)cache replacement scheme for dynamically adjusting the cache based onthe requests from users. The combination of these schemes increases theoverall performance of the cellular network.3https://www.loc.gov/law/help/online-privacy-law/canada.php, 7 March, 201718• The performance of the adaptive caching scheme is illustrated using real-world data from YouTube and NS-3 simulator. The results illustrated thatthe adaptive caching scheme improves network performance and users’ QoEcompared with industry standard caching schemes [26, 71, 75].Content FeaturesContent RequestsExtreme LearningMachineContent PopularityContent AnalysisCache DeploymentContent and NetworkAware Adaptive CachingCaching Scheme forQoE and Network TrafficImprovementFigure 1.5: Schematic of the caching scheme. Improving the quality of ex-perience (QoE) involves ensuring a higher cache hit ratio for requestedcontent and reduced downloading delay.1.2.3 Risk-Averse Caching Scheme for HeterogeneousNetworksThis work is an extension of the previous work that accounts for error in the con-tent popularity prediction. More details of this work are provided in Chapter 4.Motivation and contributions of the work are as explained below.MotivationAs already mentioned, the design of efficient caching policies requires i) estima-tion of the future content requests, ii) a method to cache popular content basedon the request estimates and routing protocol to deliver content to users in thenetwork. In this work we design risk-averse caching schemes for heterogeneousnetworks that account for the uncertainty of predicting the future popularity ofcontent, the content routing protocol, and balance the network traffic load.19Given the main source of traffic in the networks is video content, several meth-ods have been proposed for estimating video content requests. These methods fallinto two categories, namely, time-series or content feature based. Time-seriesmethods use the historical content requests to predict the future content requests.Multivariate linear models [76], pure birth stochastic process [77], and ordinarydifferential equations [78] have all been used for constructing time-series meth-ods for estimating content requests. A limitation with time-series methods is thatthey can only be used to predict the content requests of posted content. Contentfeature based methods use the uploader, textual, and image features of the contentto predict the content requests. These methods include multivariate linear regres-sion [79], Markov clustering [80], and extreme learning machines [81]. All thesemethods provide point forecasts (expected value) for the content requests. No es-timate of the confidence interval, prediction interval, or measure of the uncertaintyassociated with the predicted content requests is provided. In this work we utilizeconformal prediction algorithm for estimating the cumulative distribution func-tion of the content requests. The key idea of the conformal prediction algorithm isto first group content, then for each group assume that the resulting error betweenthe point forecasts and the actual content requests are generated from the samecumulative distribution function. The estimated cumulative distribution functionprovides a complete description of the uncertainty associated with the estimatedcontent requests. Both the confidence interval and prediction interval of the con-tent requests can be constructed from the estimated probability distribution. Notethat density forecasting [82],[83] in the time-domain is typically referred to asprequential forecasting [84] in the economics literature.Having estimated the future content requests, the aim is to optimally cache thecontent throughout the network to minimize the delay to transfer the requestedcontent to the users. This requires that content is cached where it is likely to berequested, and to optimally transfer content throughout the network to serve userrequests. The methods in [85],[86],[87] account for user specific downloadingdelay and wireless channel fading gain to determine where to cache content and20which user to connect to which access point. These caching methods can be con-sidered as dynamic cache replacement methods as they adapt the cache based onthe users ability to connect to different access points. However, a limitation withthe caching methods [85],[86],[87] is that they do not account for the routing pro-tocol used in the backhaul network to retrieve content that is not cached in theaccess points. In [88] cooperative caching is used to account for the backhaul linkbandwidth in the network to cache content throughout the network. Once the con-tent is cached in the hetergeneous network, then content and load aware routingmethods are used to transfer the requested content to users [89],[90],[91]. The aimof these routing methods to ensure load balancing occurs throughout the networkto minimize the delay of transferring content to users.A common theme with the caching methods [85],[86],[87],[88] is that theycontain three distinct steps. First, a point estimate of the content popularity isperformed. Second, the content is cached in the network to minimize the delay oftransferring content to users. Third, given the cached content, a routing method isused to deliver the content to the users. Notice that the routing method does notaffect how the content is cached in the network. Additionally, the above methodsare not risk-averse–that is, the point estimate of the content requests is equivalentto computing the expectation of the requests using the forecaster content requestdensity. An issue with using the point estimate is that it does not provide a measureof the uncertainty associated with estimating the future content requests. As such,no probabilistic guarantees can be made on the network operating characteristics(e.g. downloading delay, bit-error-rate, energy consumption, cache miss ratio) fora selected caching decision.ContributionsIn this work, we include routing mechansism in the caching decision and insteadof minimizing the expected downloading delay as explained in the previous work,we replace the additive expectation operator with a more general subadditive riskoperator, to make caching decisions to optimize network performance. Specifi-21cally, we use the Conditional Value-at-Risk (CVaR) risk operator to construct thecaching schemes in the heterogeneous network4. There are two important proper-ties of CVaR that make it useful for performing caching decisions to reduce delay.First, CVaR accounts for the minimal probability of a substantial network delayfor a caching decision where the total delay probability distribution is asymmet-ric. Second, CVaR is a coherent risk measure–that is, CVaR is monotonic, subad-ditive, positive homogeneous, and translation invariant. The monotonic propertystates that if the delay of a caching decision D1 is always less than another cachingdecision D2 almost surely, then the risk of selecting D1 is always less than D2. Ad-ditionally, the subadditive property guarantees that the risk of using two cachingdecisions D1 and D2 is always less than or equal to the risk associated with usingD1 and D2 separately. Since CVaR is a coherent risk measure, the optimizationof CVaR results in a convex optimization problem. As we show, the optimizationof CVaR to confidently reduce the network delay while accounting for the routingprotocol results in a mixed-integer linear program.1.3 Thesis OrganizationThe remainder of the thesis is organized as follows. In chapter 2, we describea mechanism to form femto-clouds using distributed coalition formation game.Chapter 3, explains a caching scheme in edge network. This work utilizes machinelearning algorithms to predict content popularity from their features. Then anoptimal caching scheme is devised using mixed integer linear programming. InChapter 4, we describe caching schemes for heterogeneous networks. This workis an extension of the work described in Chapter 3. Finally, main research findingsand future research directions are provided in Chapter 5. A schematic of theremainder of the thesis is illustrated in Fig.1.6.4CVaR is one of the “big” developments for risk-averse decision making in mathematical fi-nance; see [92–94].22Thesis organizationChapter 2 Chapter 3 Chapter 4 Chapter 5A Distributed CoalitionGame Approach toFemto-Cloud FormationAdaptive Scheme forCaching Content ina Mobile Edge NetworkRisk-Averse Caching Schemefor Heterogeneous NetworksConclusionsFuture WorksFigure 1.6: A schematic view of the remainder of the thesis.23Chapter 2A Distributed Coalition GameApproach to Femto-CloudFormationIn this chapter, we describe a distributed approach for femto-clouds formation.Recall from Chapter 1, the main target of femto-clouds formation is to maximallyexploit computational resources at the edge of the network. In femto-clouds, edgenodes such as FAPs share their computational resources with other FAPs with theobjective to maximize utilities that are lower overall network latency and higherincentives. FAP decides individually which femto-clouds to join in for performingcomputational tasks and receives a fair share of incentives following a distributedcoalition formation game approach.At the beginning of the chapter in Sec. 2.1, we introduce system architectureconsidered for the problem. The utility function for the femto-clouds is defined inSec. 2.2. The distributed femto-cloud formation algorithm is presented in Sec. 2.3.Numerical studies are provided in Sec. 2.4. Finally, Sec. 2.5 concludes the chapter.24eNode-BMacro UERemote CloudFAPHome UEOptical Fiber/EthernetWireless LinkFemto-cloud managerFigure 2.1: A typical femto-cloud architecture. The macrocell and femtocellbase stations are referred to as eNode-B and femtocell access point(FAP), respectively, and the end users are referred to as user equip-ment (UE). FAPs are connected to their closest femtocell cloud man-ager (FCM) via the Z interface while FCM is linked with the remotecloud via optical fiber/Ethernet. The FAPs are also connected to theneighbouring FAPs via the Z interface.2.1 System ArchitectureWe consider a UMTS LTE architecture with V FAPs/Home eNode-Bs (HeNBs)endowed with heterogeneous computational capacity. Each FAP is located in aseparate room and possibly different floor of a multi-story building. The FAPsshare bandwidth with a macro base station (BS) as shown in Fig. 2.1, and aredeployed by different residential users.We assume that there exist NF femtocell cloud managers (FCMs) in the build-25ing, where NF <V . The FAPs are connected to their closest FCMs via Z interfaceaccording to the proposed standalone FCM architecture in [34]. FCMs are respon-sible for:(i) gathering task request information of the connected FAPs, and exchangingthis information with neighbouring FCMs;(ii) implementing the incentive mechanism by monitoring the tasks completedby each FAP;(iii) performing computations for the femto-cloud formation mechanism pro-posed in this work.FCMs are connected to the remote cloud via optical fiber links, hence, can offloadthe computational tasks of the connected FAPs to the remote cloud with no inter-vention of the core network. The FCMs substantially reduce the traffic generatedby the MCC in the core network. It is therefore natural to assume that FCMs areinstalled and maintained by the mobile network operators.It is assumed that FAPs are connected to the core network via wireless back-haul, and can be deployed by the residential users in a plug-and-play fashion. TheFAPs use the 2.6 GHz licensed bandwidth to connect with FCMs, and commu-nicate with other FAPs via the Z interface in a multicast fashion. Since FAPsand FCMs are located in different rooms/floors of the building, the FAP-FAP andFAP-FCM signal propagation undergo several losses. Here, we only considerexternal wall loss, shadowing loss, and the 2.6 GHz path loss models. As a re-sult, the FAP-FAP communication delay depends on the location of the FAPs andchannels’ quality.2.2 Formulation of the Femto-Cloud FormationProblemThis section formulates the femto-cloud formation problem. We first formulatethe utility function that quantifies the performance of individual femto-clouds in26Sec. 2.2.1. The global femto-cloud formation problem with fair allocation of in-centives to FAPs is then formalized in Sec. 2.2.2. We finally discuss the similar-ities between the formulated problem and coalition formation games. Table 2.1summarizes the notations used in this section.2.2.1 Local Femto-Clouds and Their UtilityMobile devices make decisions on offloading their tasks to FAPs based on thehandling latency1 and energy [12]. If offloaded to FAPs, they will then decidewhether to perform computations locally or send them to the remote cloud tak-ing into account the users’ QoE requirements, their computational capacity andworkload. The main goal in this work is to motivate a cooperation protocol tomaximally exploit FAPs’ local resources. Neighbouring FAPs form collaborativecoalitions to increase local computational capacity. Since FAPs are densely de-ployed, sending the data for the requested tasks to such local femto-clouds incursless latency as compared to the remote cloud. This improves users’ QoE whileenabling FAP owners to earn incentive by sharing their excess resources.Resource sharing problems can generally be formulated as constrained opti-mization problems with a utility function that trades-off the benefits and costs as-sociated with collaboration by sharing resources. Consider a set of FAPs, indexedby the set V = {1,2, . . . ,V}, and let C ⊆ V denote a coalition of FAPs formed fora fixed time interval over which the parameters described below remain constant.The case |C | > 1 is referred to as a femto-cloud, whereas |C | = 1 is referred toas an isolated FAP. Here, | · | denotes the cardinality operator. The performanceof femto-clouds are then quantified by the function U : 2V − /0→ R, where 2Vdenotes the power set of the set of FAPs V . This function quantifies the totalincentive earned by a femto-cloud as the result of FAPs sharing their resources,which is then divided among the FAPs in the femto-cloud, and is formulated asU(C ) =U r(C )−Uc(C )+U p(C ), (2.1)1Latency can be formulated as the sum of computational delay and data transfer delay.27Table 2.1: Notations and TerminologySystemParame-tersDescriptionV Number of FAPsNF Number of FCMsRk Trust/reputation value of FAP kdmaxk Computational capacity of FAP kDmaxCOverall computational capacity offemto-cloud Cbk,lUplink data transmission rate fromFAP k to FAP lbkUplink data transmission rate fromFAP k to FCMLWAN latency for sending tasks to re-mote cloudTaskRequestDescriptionNB Data sizedkSample mean of task requests receivedby FAP kDCSample mean of task requests in femto-cloud CHCEntropy of total task requests in femto-cloud CUtilityFunctionDescriptionmr Revenue per unit taskmp Proportionality constant for trustcr Remote cloud charges per unit taskco Offloading delay costcu Penalty for demand uncertainty28where each term on the right hand side is described below:The first term U r(C ) models the revenue earned by the femto-cloud and isformulated asU r(C ) = mr ·DC , (2.2)where mr is the revenue per unit task ($/task). Further, DC denotes the samplemean of the task requests received by femto-cloud C over the past time slots sincethe femto-cloud has been modified/formed. If the requested tasks for a particu-lar FAP exceed its computational capacity, the FAP offloads tasks to femto-cloudmembers and shares the incentive with them. Since femto-clouds are formed forseveral time slots, rather than dealing with instantaneous offloaded tasks, the in-centive function relies on the previously observed statistics of requests.The second term Uc(C ) in (2.1) represents the costs incurred by forming afemto-cloud, and is comprised of four termsUc(C ) =Ucr (C )+Uco,r(C )+Uco,m(C )+Ucu (C ) (2.3)where each term is described below:1) Remote cloud cost: When the accumulated task requests within a femto-cloudexceeds its computational capacity, the excess tasks have to be offloaded to theremote cloud to avoid processing delays. This incurs two types of costs:a) Remote cloud processing cost: The term Ucr (C ) in (2.3) models the remotecloud processing costUcr (C ) = cr ·∣∣DC −DmaxC ∣∣+ , (2.4)where cr is the remote cloud charges in $/task. Further, |x|+ = max{0,x}, andDmaxC =∑k∈C dmaxk is the overall computational capacity of femto-cloud C , wheredmaxk represents the computational capacity2 of the k-th FAP. This term motivates2One unit of computational capacity is equal to one unit of workload.29FAPs to form coalitions with FAPs with low workload to computational capacityratio.b) Remote cloud offloading delay cost: The second term Uco,r(C ) in (2.3) isthe penalty associated with the data transfer delay in offloading excess femto-cloudworkload to the remote cloud, and is formulated asUco,r(C ) = co ·(∣∣DC −DmaxC ∣∣+ ·NB ·( 1mink∈C bk +L)). (2.5)Here, NB denotes the data size, in bytes, of a task, bk is the uplink data transmis-sion rate, in bytes/sec, from k-th FAP to FCM, L represents the wide area network(WAN) latency introduced by transporting the task to the remote cloud via theFCM (byte/second), and co ($/sec) is the dimension for proportionality constant.2) Multicast offloading delay to FAPs: The term Uco,m(C ) in (2.3) represents thepenalty for the delay in transmitting data, associated with the tasks exceedingFAPs’ computational, to the femto-cloud into a monetary penalty. It providesincentive for FAPs to collaborate with neighbouring FAPs to decrease the handlingdelay and improve the QoE of users, and is formally given byUco,m(C ) = co ·(∑k∈C∣∣dk−dmaxk ∣∣+ · NBminl∈C−{k} bk,l). (2.6)Here, bk,l denotes the uplink data transmission rate from the k-th FAP to the l-thFAP, dk is the sample mean of the task request in the k-th FAP over the past timeslots since the femto-cloud has been modified/formed. Finally, dk− dmaxk is thenumber of tasks that exceeds the computational capacity of the k-th FAP, and haveto be sent to the cloud.3) Demand uncertainty cost: Since femto-clouds are formed for multiple timeslots and we use sample statistics rather than instantaneous task requests, it isimportant to account for deviation from the mean demand so as to avoid remote30cloud costs. The last component of the cost function captures such uncertainty inthe overall femto-cloud demand, and is formulated asUcu (C ) = cu ·HC , (2.7)where HC denotes the sample entropy of the accumulated task request time se-ries. This term simply motivates FAPs to form femto-clouds with FAPs with lessvariability around their mean computational demand.Finally, the last term U p(C ) in (2.1) models the priority value of the coalitionC . With each FAP, there corresponds a trust value, denoted by Rk, that captures thequality of its previous cooperative behaviour [95]. By joining femto-clouds andsuccessfully performing computations offloaded by other cloud members, FAPsearn trust. Femto-cloud comprising of FAPs with higher trust values are expectedto perform tasks in a timely manner; therefore, the service provider is willing toprovide them with higher monetary incentives as they improve the users QoE. Thisfurther eliminates free-rider FAPs that join coalitions to obtain incentives withoutperforming tasks.We formulate U p(C ) as follows:U p(C ) = mp ·(∑k∈CRk ·min{dmaxk , f}dk), (2.8)where mp ($) is the proportionality constant that determines the relative weight oftrust in formation of femto-clouds, and f is a system parameter3 that depends onthe overall task requests in the system. Note in the above formulation that higherpriority is placed on FAPs with lower mean demand to computational capacityratios and higher trust values. It is assumed that FCMs are responsible for up-dating the trust values for their neighbouring FAPs. The mechanism for updatingthese trust values is however out of the scope of this work, and merits a separate3Taking the minimum in (2.8) is a technicality to avoid obtaining excess priority for computa-tional capacity that exceeds the femto-cloud demands.31research work. We further assume that Rk remains constant for several time slotswhile the FCM monitors the k-th FAP cooperative behaviour, and is only updatedwhen the femto-clouds structure is being modified.2.2.2 Optimization of the Femto-clouds with FAP IncentivesAs mentioned in Sec. 2.2.1, FAPs expect incentives for sharing their excess re-sources. Let r = (r1, . . . ,rV ) denote the incentive allocation vector. Each elementrk represents the share of each FAP k from the total incentive obtained by thefemto-cloud C that FAP k have joined. To make the problem mathematicallytractable, the set of incentive values is confined to a finite set. Suppose ∆ ($) is thesmallest incentive unit. Each FAP’s demand is then restricted to the setP ={n̂∆; n̂ ∈ N,0≤ n̂∆≤ maxC∈2V − /0U(C )}, (2.9)where N represents the set of all natural numbers, and the function U(·) is definedin Sec. 2.2.1. Let further B denote the set of all possible femto-cloud structures.Each femto-cloud structureS is a partition on the set V , i.e., ∪C∈SC = V . Thefemto-cloud formation problem is then formulated asmaxS∈B ∑C∈SbU(C )c∆,s.t. rk ∈P,∑k∈Crk = bU(C )c∆, ∀C ∈B,∑k∈C ′rk ≥ bU(C ′)c∆, ∀C ′ ⊆ V ,C ′ 6= /0.(2.10)where bxc∆ = b x∆c·∆ denotes the greatest integer multiple of the smallest divisibleincentive unit ∆, andP is defined in (2.9).Before proceeding to provide an intuitive interpretation of (2.10), a few defi-nitions are in order. Let r and r′ denote two V ×1 incentive vectors. The productordering r ≤ r′ holds if and only if rk ≤ r′k for all k ∈ V . An incentive allocation32r is then called efficient if the sum of incentives of all FAPs is equal to the max-imum total incentive, achievable under the most desirable femto-cloud structure.In addition, if a group of FAPs can form a femto-cloud C ′ where the division ofcoalition’s incentive guarantees r′ ≥ r, then C ′ will block the currently formedfemto-cloud C and the associated incentive vector r. An incentive vector r iscalled non-blocking if for all possible femto-clouds C ′, the associated incentiver′ satisfies r ≥ r′. The second constraint in (2.10) ensures that the incentives al-located to FAPs are efficient. The third constraint in (2.10) is the non-blockingcondition, and can be interpreted as a fairness criterion on the division of incen-tives among FAPs in each femto-cloud. An incentive allocation vector is calledfair if no FAP can gain higher incentive by sharing its resources with a differ-ent group of FAPs. The solution to (2.10) can thus be considered as the optimalfemto-cloud structure in that: i) the computational capacity of all FAPs is maxi-mally exploited, and ii) the FAP incentives are distributed in a fair fashion withineach femto-cloud.Coalition Formation Game Interpretation: The femto-cloud formation problemwith FAP incentives outlined above fits well within the context of coalition forma-tion games. The coalition formation games encompass cooperative games wherethe coalition structure plays a major role, and are defined by the pair (G ,Q), whereG denotes the set of players and Q : 2G − /0→ R is the characteristic function4.This function associates with any non-empty coalition a number that quantifiesthe total payoff that can be gained by the coalition. A cooperative game is calledsuperadditive if for any two disjoint coalitions C1,C2 ⊂ G :Q(C1∪C2)≥ Q(C1)+Q(C2).In superadditive games, the grand coalition—the coalition consisting all players—forms the stable coalition structure. The coalition formation games encompass4The term characteristic function is as used in cooperative games and is unrelated to character-istic functions in probability theory.33cooperative games where the coalition structure plays a major role. These gamesare generally non-superadditive; therefore, the optimal coalition structure may becomprised of several disjoint coalitions. Due to the data transfer delay and limitedcomputational capacity of FAPs, it is intuitive that the optimal structure of femto-clouds has to incorporate several disjoint coalitions of FAPs. It is thus natural toformulate the femto-cloud formation problem as a coalition formation game withG = V and Q(·) = U(·). In particular, the solution of the femto-cloud forma-tion problem (2.10) is identical to a solution notion in coalition formation games,namely, modified core [35]. Therefore, solving (2.10) is equivalent to finding themodified core of the underlying coalition formation game. The interested readeris referred to [96],[97],[98] for further details.2.3 Distributed Femto-Cloud Formation andConvergence to the CoreThis section presents a distributed femto-cloud formation algorithm that guaran-tees convergence to the solution of (2.10) almost surely, and elaborates on itsimplementation considerations.2.3.1 Distributed Femto-Cloud Formation AlgorithmDefine network state pair by ω = (S ,r), which contains the femto-clouds struc-tureS and the incentive vector of FAPs r. The distributed femto-cloud formationprocedure relies on the dynamic coalition formation algorithm proposed in [35]and is summarized below in Algorithm 1. The advantage of using the decentral-ized procedure in Algorithm 1 over centralized solutions is that it retains auton-omy of FAP owners as whether to collaborate and better captures the dynamics ofthe negotiation process among them [35]. In a centralized solution, FAP ownershave to be forced to follow the calculated optimal femto-cloud structure. In fact, ifan FAP owner decides to not follow the prescription, the implemented femto-cloudstructure is no longer the optimal solution. In contrast, the decentralized solution34implemented in Algorithm 1 mimics the natural procedure that FAP owners willfollow to form collaborative groups—they explore their options and settle in thefemto-cloud that provides the highest feasible incentive. The implementation con-siderations will be addressed in the next subsection.Algorithm 1 Distributed Femto-Cloud FormationInitialization. Set 0 < ε,ρ < 1, where ρ is the probability of revising strategyand ε is the experimentation probability. Initialize ω0 = (S 0,r0), whereS 0 ={{1}, . . . ,{V}},r0 = (r̂1, . . . , r̂V),and r̂k =U({k}).Step 1. Find blocking coalitions by FCM:Let A n = /0. For all C ∈ 2V − /0,if ∑k∈C rnk < bU(C )c∆, then A n←A n∪C .Step 2. Each FAP k ∈ {1, . . . ,V} independently performs:Step 2.1. With probability ρ , continue with Step 2.2. With the remainingprobability 1−ρ , stay in the same coalition, set rn+1k = rnk , and go to Step 2.5.Step 2.2. ComputeC˜ n+1k = argmaxC∈S n∪ /0(bU(C ∪{k})c∆− ∑l∈C ,l 6=krnl)(2.11)r˜n+1k =⌊U(C˜ n+1k ∪{k})⌋∆− ∑l∈C˜ n+1k ,l 6=krnl (2.12)Step 2.3. If k ∈ A n, with probability ε , go to Step 2.4. With the remainingprobability 1−ε , sample uniformly from the setS n∪ /0, denote it by C˜ n+1k , andset rn+1k = r˜n+1k , where r˜n+1k is computed according to (2.12). Go to Step 2.5.Step 2.4. Set rn+1k = r˜n+1k and, if non-singleton, randomize among C˜n+1k uni-formly.Step 2.5. If k 6=V , continue with the next FAP.Step 3. Form ωn+1 = (S n+1,rn+1).Set n← n+1 and go to Step 1.35The myopic best-reply strategy implemented in Step 2.1-2.3 of Algorithm 1defines a finite-state Markov chain, namely, best-reply process [35]. Standard re-sults on finite state Markov chains show that, no matter where the process starts,the probability that the best-reply process reaches a recurrent set of states aftern iterations tends to one as n tends to infinity. The outcome that which of theseergodic states will eventually be reached is determined by the initial state. Underthe best-reply process, absorbing states do not necessarily guarantee reaching thesolution of (2.10). To address this issue, perturbation has to be introduced. Thatis, to allow FAPs deviate from optimal strategies and choose sub-optimal strate-gies with a small probability with the hope of achieving higher incentives. Theinterested reader is referred to [35] for details and further discussion.Deviation from the best-reply process, namely, experimentation, is formallydefined as follows: In any state, when there exists a potential femto-cloud C ′ ∈ 2Vsuch that∑k∈C ′ rk < bU(C ′)c∆, (2.13)each FAP k ∈ C ′ follows the best-reply process of Step 2.1-2.3 with probability1−ε . With the remaining probability ε , it randomly joins an existing femto-cloud,and demands the surplus incentive that the femto-cloud expects to achieve as theresult of FAP k joining it. The blocking condition (2.13) is checked in Step 1 ofAlgorithm 1. This modified best-reply process defines a finite-state Markov chain,namely, best-reply process with experimentation [35], with the same state space asthe best-reply process (without experimentation) and slightly modified transitionprobabilities.The limiting distribution of the best-reply process with experimentation sum-marized in Algorithm 1 assigns probability one to the states (S n,rn) that solve thefemto-cloud formation problem (2.10). This result is summarized in the followingtheorem.Theorem 2.3.1. Let ωc = (S c,rc) denote the states that solve the femto-cloudformation problem (2.10). Then, the sample path of ωn = (S n,rn) generated by36Algorithm 1 converges almost surely to the core, i.e.,P(limn→∞ωn = ωc)= 1, (2.14)for all initializations ω0 if the solution set is non-empty.• The proof relies on the results of [35] and the analogy between the femto-cloud formation problem (2.10) and the modified core of the underlyingcoalition formation game; see Sec. 2.2.2 for details. It is shown in [35] thatthe best-reply process with experimentation implemented by Algorithm 1converges almost surely to the modified core of the coalition formationgame; see [35] for the detailed proof. Comparing the definition of modi-fied core in [35] with (2.10) then completes the proof.2.3.2 Implementation Considerationsa) Decentralized Implementation: The proposed algorithm, independently fol-lowed by each FAP, provides a decentralized solution to (2.10). This decentral-ized implementation relies on collaboration among the FCMs. It is assumed thatFAPs monitor their users task request statistics over an interval comprising severaltime slots, and periodically transmit this information to their neighbouring FCM.The FCMs then exchange this information with each other so as to be able toevaluate the femto-cloud characteristic function (2.1) and detect for blocked FAPs(Step 1 in Algorithm 1) in their neighbourhood. Note that data size of user requestinformation is negligible compared to the task data size. The FCMs are furtherresponsible for providing FAPs that decided to revise their cooperation strategieswith the feasible incentive (the term inside parentheses in (2.11)) in the associ-ated femto-cloud, and to inform the blocked FAPs of their potential for obtaininghigher incentives in other femto-clouds. Finally, having been enabled to commu-nicate with each other, it is the task of FCMs to collaboratively update the networkstate parameter ωn in Algorithm 1.37b) Complexity: Searching for the blocking coalition (Step 1 in Algorithm 1) isthe main computational complex part of the algorithm. An exhaustive search forblocking coalition requires checking (2V −1) different combination of coalitions.As the number of FAPs increase, the randomized searching algorithm presentedin [99] can be utilized. According to the randomized searching algorithm, onlya subset of the combination of coalitions is constructed for checking blockingcoalition. The randomized searching algorithm still converges to the core of thecoalition formation game with probability one, however, resulting in slower con-vergence rate.c) Time-scales: We assume that the femto-cloud structure remains constant forseveral time slots, and FAPs update their user request statistics with the same fre-quency. During this period, FAPs run Algorithm 1 based on the most recent sam-ple statistics of the user requests. Once convergence to the solution takes place,the femto-cloud structure and associated incentives will be followed in the nextdecision epoch that the femto-cloud structure is being revised. Note that, sinceFAPs and FCMs are both static, the FAP-FAP and FAP-FCM channel responsesvary slowly. In the utility function, average data transmission rate is consideredover which femto-cloud structures are assumed to remain constant.d) Characteristic Function Parameters: The parameters mr, cr, co, cu, and mpin (2.1) could be mathematically be interpreted as weight factors that determinethe relative importance of the different factors considered in formulation of thecharacteristic function such as the delay cost, the demand uncertainty cost, andremote cloud processing cost. The choice of these parameters is an interestingtopic in utility theory. Clearly, the values of these parameters affect the optimalfemto-cloud structure and incentive allocations. The particular choice of thesevalues will depend on the specific application. For instance, in some applicationsusers may be willing to incur longer delays to pay less for using the femto-cloud,in which case co should be smaller relative to mr. In others, users may not tolerate38delay, where co should be set very large. We further emphasize that the utilityfunction formulated in Sec. 2.2.1 is only an example that exhibits how to incorpo-rate different factors into the implementation of femto-clouds. Depending on theapplication specifics, certain terms could be added or omitted.e) Empty Core: Finally, imposing conditions on the utility function to ensureexistence of a solution (modified core of the underlying game) could be inherentlycomplex in some applications. To address this issue, the experimentation factor εin Algorithm 1 can be made to diminish to zero with time, e.g., one can replaceε with εn = 1/nα for 0 < α < 1. This ensures that Algorithm 1 converges to theabsorbing states of the best-reply process (Steps 2.1-2.3 in Algorithm 1) if the coreis empty. Extensive simulations in Sec. 2.4 numerically verify that the results stilloutperform alternative schemes.2.4 Numerical ResultsThis section provides numerical examples to evaluate the performance of the pro-posed incentive-based femto-cloud formation scheme.2.4.1 Object Recognition TasksWe focus on the processing associated with the object recognition task from im-ages and videos captured via vision-based sensors in mobile devices, which isrequired to support mobile augmented reality applications. The formulation, how-ever, is general enough to be adapted to various computationally intensive applica-tions such as face recognition, pattern recognition, and optical character recogni-tion from images/videos5. In particular, applications with different computationalrequirements can be split into several equal-sized computational sub-tasks. Theutility function only requires how many sub-tasks can be executed in the femto-5Different object recognition applications may require different feature descriptors. The choiceof the descriptor, however, is not crucial to the problem formulation and the proposed femto-cloudformation mechanism; it only affects the parameters of the utility function defined in Sec. and the predicted demand of sub-tasks in the coalition.Feature extraction is typically the most computationally intensive task in ob-ject recognition at the deployment stage [100]. We assume that FAPs are equippedwith graphics processing units (GPUs), and are capable of performing parallelcomputations in their GPUs. Therefore, the feature extraction procedure can beperformed either on the UE’s local processor, or on the FAPs. When both UE andFAPs are busy or lack sufficient computation capacity, the task is outsourced tothe remote cloud. Once extracted, the feature vectors are sent to the applicationserver, which compares them with the training models, and sends the best matchedresult(s) to the UE. In the examples to follow, we consider feature extraction taskson both images and videos. At each time, each UE can either offload an image ora video to the FAP for the feature extraction task. In the numerical examples, gPb6is used for feature extraction. We assume that the duration of a video is uniformlydistributed between 1 to 10 seconds.Here, one unit of workload/demand associated with feature extraction is con-sidered to be 144 Giga floating point operations per second (GFLOPs), whichis also used to define one unit of computational capacity7. We assume that a3264×2448 pixels image is divided into 9 sub-images [102, 103] with each sub-image containing 1088× 816 pixels and occupying 2.1 mega bit (Mb) memory.Similarly, videos are divided up into 1 second segments. Each 1 second video of640× 480 pixels and 30 frame rate occupies 2.2 Mb memory. In both cases, thefeature extraction task requires approximately 144 GFLOPs which is equivalentto one unit of workload or computational capacity.6The global probability algorithm (gPb) is a contour detection algorithm that achieves the bestperformance among all such schemes [101]. The computational requirement of gPb is 158,600FLOPS per pixel [100].772 cores, each with 1000 MHz clock speed, are grouped together and considered as one unitof computational capacity.402.4.2 Simulation SetupThroughout this section, the NS-3 simulator is used for simulating LTE system ar-chitecture. We consider a city environment and use the LTE module developed bythe LENA project [104, 105] as follows: We use LENA’s RandomRoomPosition-Allocator function to randomly locate 15 FAPs inside a 10-story building madeof concrete and comprising 20 apartments, as depicted in Fig. 2.2. There exist2 FCMs in the building located close to FAP-2 and FAP-15, respectively. TheFCMs are connected to the remote cloud via 1Gbps optical fiber link. LENA’sHybridBuildingsPropagationLossModel and 3kmphTraceFadingLossModel func-tions (i.e., slowly varying Nakagami-m fading model) are used for propagationloss and channel fading between UEs and FAPs, respectively. We further use theKun2600MhzPropagationLossModel and the NakagamiPropagationLossModel func-tions as the propagation loss model and channel fading for FAP-FAP and FAP-FCM communication. The handover is handled via the LENA’s AddX2Interfacefunction. UEs are further randomly located inside the building and connected toFAP using the AttachToClosestEnb function. At each time slot, sub-channels areallocated to users in each FAP according to the proportional fair (PF) schedul-ing policy with hybrid automatic repeat request (HARQ) re-transmission mecha-nism. Further, the UEs and FAPs are equipped with multiple input multiple output(MIMO) antennas, and support adaptive modulation and coding. UEs transmitUDP packets to the FAP. FAPs also transmit UDP packets for multi-cast commu-nication. The data transfer rates are calculated from the RLCTrace files generatedby the NS-3 simulator. Other NS-3 simulation parameters are listed in Table 2.2.Please note that in this section and throughout the thesis we assume that the net-work carries video traffic and NS-3 simulator takes care of this traffic using LTEEPS Video bearer. The service quality, strict transmission deadline, and priorityof video traffic have already taken care in the NS-3 simulation setup.Finally, the UE is considered to be an iPhone 5S and can perform 76.6 Gigafloating point operations per second.412.4.3 Numerical ExamplesWith the above simulation set-up, in the following examples, the effect of a singleparameter is studied on the formation of femto-clouds while other parameters arekept constant. We set ε = 0.3, ρ = 0.2, ∆ = 1, and α = 0.5 in Algorithm 1.Table 2.4 summarizes the parameters of all FAPs. These parameters are chosen soas to enable illustrating different scenarios. Each point on the graphs of Figs. 2.3-2.6 are averaged over 1000 i.i.d. realizations. The results are compared with twoalternative heuristic schemes for femto-cloud formation. Scheme-1 is based on therelative distance of the FAPs. That is, κ FAPs with the least relative distances forma local femto-cloud. Scheme-2 relies on the computational capacity, the samplemean and sample entropy of demand at the FAPs. That is, FAPs are ranked basedon the value of dmaxk −dk−Hk. Then, κ FAPs with the highest ranks are collectedto form a local femto-cloud with κ lowest ranked FAPs. The procedure continuesuntil all FAPs form/join a coalition. Coalition structures in heuristic schemes arelisted in Table 2.3.Example 1The first example studies the effect of data transfer delay in the formation of femto-clouds. This scenario represents an enterprise environment where all FAPs areowned by the same authority. Therefore, we set mr = cr = cu = mp = 0, andco = 1 $/sec. The utility function defined in equation(2.1) can be rewritten asfollows.Udelay(C ) =−U(C ) =Uco,r(C )+Uco,m(C ), (2.15)where the first term represents remote cloud offloading delay cost which is definedin equation(2.5). The second term defined in equation(2.6) represents multicastoffloading delay to FAPs. The goal will thus be to reduce the overall handlingdelay by forming local femto-clouds. In this case, W = (w1, . . . ,wV ) denote thehandling latency cost allocation vector where wk =−rk, k ∈ V . The femto-cloud42600550X-axis (m)11500 71545010 5 8400 9 1140 216014 318012Y-axis (m)13 6200 440500102030220Z-axis(m)FAPsFAPs with FCMsFigure 2.2: FAPs and FCMs locations inside the building. UE arrival at eachFAP follows a Poisson distribution. The number of UEs in the simula-tion depends on the user arrival rate at each FAP (see Table 2.4).formation problem is then reformulated asminS∈B ∑C∈SbUdelay(C )c∆,∑k∈Cwk = bUdelay(C )c∆, ∀C ∈B,∑k∈C ′wk ≤ bUdelay(C ′)c∆, ∀C ′ ⊆ V ,C ′ 6= /0.(2.16)As can be seen from equation (2.16), the goal of the femto-clouds formation prob-lem is to minimize overall handling latency. Inequality constraint ensures that noother coalition can provide lower handling latency. Algorithm 1 is redefined to43Table 2.2: Simulation setup: LTE system parameters in NS-3Parameters Value/TypeAdaptive Modulation & Coding PiroEW2010Bit Error Rate 0.0005MIMO 2×2FAP Antenna IsotropicAntennaModelExternal Wall Loss 10 dBShadowing Loss 5 dBEPS Bearer GBR CONV VIDEOFAP Transmission Power 20 dbmFAP Noise Figure 5 dbmUE Transmission Power 10 dbmUE Noise Figure 5 dbmMacrocell Bandwidth 20 MHzMobility Model ConstantPositionScheduler PfFfMacScheduleridentify optimal coalition structures that minimize overall handing latency.Fig. 2.3 shows the average data transfer delay in the femto-clouds versus thecomputational capacity of FAP-1. The ‘Isolated FAPs’ case refers to the scenariowhere no FAP is willing to cooperate and operates individually—that is, thereexist no femto-cloud. In contrast, the ‘Grand femto-cloud’ refers to the case whereall FAPs form one large collaborative femto-cloud. In the ‘Isolated FAPs’ case,as the computational capacity increases, FAP-1 can perform more tasks locallyand offloads fewer tasks to the remote cloud. This leads to the reduction of WANlatency. Therefore, the data transfer delay of FAP-1 decreases and, hence, theoverall data transfer delay in the femto-clouds decreases.As can be seen in Fig. 2.3, the data transfer delay in the femto-cloud structuresprescribed by Algorithm 1 is the lowest. This is in contrast to the grand femto-cloud which provides the highest delay. This is mainly because some FAPs arelocated far away in the building; hence, the multicast delay in the grand femto-44Table 2.3: Femto-cloud coalition structures in heuristic schemesFemto-Clouds Coalition StructureScheme-1 {1,2,5,8,9},{3,4,12,13,14},{6,7,10,11,15}Scheme-2(FAP-1 comp.capacity 0–4){1,2,6,10,15},{3,7,9,11,12},{4,5,8,13,14}Scheme-2(FAP-1 comp.capacity 6–8){2,6,7,10,15},{1,3,9,11,12},{4,5,8,13,14}Scheme-2(FAP-1 comp.capacity 10–14){1,2,7,10,15},{3,9,11,12,14},{4,5,6,8,13}Scheme-2(FAP-1 comp.capacity 16–20){2,7,10,12,15},{3,4,9,11,14},{1,5,6,8,13}Scheme-2(FAP-1 arrivalrate 1){1,7,10,12,15},{2,3,4,9,14},{5,6,8,11,13}Scheme-2(FAP-1 arrivalrate 2){1,2,7,10,15},{3,9,11,12,14},{4,5,6,8,13}Scheme-2(FAP-1 arrivalrate 3–5){1,2,6,10,15},{3,7,9,11,12},{4,5,8,13,14}Table 2.4: Simulation setup: FAP parameters in the numerical exampleFAP 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15Trust Value 0.1 0.5 0.5 0.4 0.1 0 0.1 0.2 0.4 0.1 0.2 0.4 0.1 0 0.5Comp. Capacity 10 10 30 10 10 5 5 20 20 15 15 5 10 10 30User Arrival Rate 2 1 2 2 1 2 3 2 3 1 2 2 1 3 1Mean Process. Requests 20.47 13.13 17.63 15.7 13.51 16.46 20 11.4 17.77 8.13 17.53 11.67 10.64 21.63 13.16Entropy 3.55 2.96 3.38 3.23 3.02 3.23 3.29 2.97 3.28 2.75 3.29 2.97 2.8 3.38 2.9945cloud is high. The data transfer delay in alternative scheme-1 is higher than alter-native scheme-2. This is due to the fact that some FAPs have more requests thantheir computational capacity, in which case tasks are transported to the remotecloud and, hence, the WAN latency increases. The ‘Isolated FAPs’ case ignorescooperation among FAPs, which naturally results in higher delay.The femto-cloud structures are listed in Table 2.5 for various values of compu-tational capacity for FAP-1. FAP-1 forms a femto-cloud with FAP-8 and FAP-15when its individual computational capacity is low. In this case, FAP-1 offloadsa portion of the requested tasks to the femto-cloud and reduces WAN latency ascompared to transporting tasks to the remote cloud. However, as the computa-tional capacity of FAP-1 goes beyond its demand, it joins in a different femto-cloud so as to be able to process tasks exceeding the capacity of the femto-cloudmembers. This reduces the overall handling delay in the femto-cloud and im-proves users’ QoE.Example 2This example considers a scenario where FAPs are deployed by residential users.To motivate owners for sharing excess resources, monetary incentives are con-sidered as described in Sec. 2.2.2. Therefore, FAPs are motivated to cooperateby forming femto-clouds not only to reduce the handling delay, but also to earnincentive. We assume mr = 4 $/task, cr = 5 $/task, co = 3 $/sec, cu = 2 $/task,mp = 1, and f = 200 in the characteristic function (2.1).Figure 2.4 plots the total incentive earned by all FAPs versus computationalcapacity of FAP-1. As the capacity of FAP-1 increases, it can serve more tasksexceeding other FAPs’ capacities within the femto-cloud; hence, it receives higherincentives, which in turn increases the total incentive. Note that, for lower compu-tational capacity, the incentive obtained by FAP-1 is still higher than the ‘isolatedFAPs’ case. This is because incentives depend not only on the revenue but also oncosts associated with delay costs. By forming a femto-cloud, FAP-1 can save onits delay costs as explained in Example 1 and, thus, obtains higher incentives.46Computational capacity of FAP-10 2 4 6 8 10 12 14 16 18 20Averagefemto-clouddatatransferdelay30405060708090Isolated FAPsAlgorithm 1Grand femto-cloudHeuristic scheme-1Heuristic scheme-2Figure 2.3: Computational capacity of FAP-1 vs. average data transfer delayper 2.2 Mb data in the femto-clouds (c0 = 1). Here, lower data transferdelay improves users’ QoE.Figure 2.5 also displays the total incentive obtained by all FAPs versus the userarrival rate at FAP-1. As expected, as the user arrival rate at FAP-1 increases, thetasks requested at FAP-1 will increase and the incentives it receives will decrease.This is mainly because FAP-1 (in the isolated case) as well as other FAPs in thefemto-cloud need to transport more tasks to the remote cloud, which increases thedelay costs and remote cloud charges and, hence, reduces the incentives offered toFAPs. Note that this example considers the case where the charge per computationin the remote cloud is higher than the revenue obtained per computation in femto-cloud, i.e., mr ≤ cr in (2.1). Therefore, for fixed computational capacity, FAP-1’sincentives decreases as the user arrival rate increases. The femto-cloud structures47Computational capacity of FAP-10 2 4 6 8 10 12 14 16 18 20Totalincentives200250300350400450500550600650Isolated FAPsAlgorithm 1Grand femto-cloudHeuristic scheme-1Heuristic scheme-2Figure 2.4: Computational capacity of FAP-1 vs. femto-cloud incentive.Higher incentives mean that FAP owners receive more incentives byjoining in the femto-clouds.are listed in Table 2.6.Fig. 2.6 shows the delay-incentive trade-off for a range of computational ca-pacity of FAP-1. As expected, the femto-cloud data transfer delay for the femto-cloud structures in Example 2 is higher than those obtained in Example 1. Thisis due to the fact that the main goal of femto-cloud formation in Example 2 isto maximize the incentives where delay cost c0 is lower than the computationalrevenue mr and remote cloud processing cost cr, whereas the aim of femto-cloudformation in Example 1 was to reduce the data transfer delay.48Table 2.5: Femto-clouds coalition structures in Example 1FAP-1ComputationalCapacityFemto-Clouds CoalitionStructure0{1,8,15}, {2}, {3,7}, {4},{5,10}, {6}, {9}, {11},{12}, {13}, {14}2–14{1,6,8,15}, {2}, {3,4},{5,10}, {7}, {9}, {11},{12}, {13}, {14}16–20{1,3,4,6,8,9}, {5,10},{11,12,15}, {2}, {7}, {13},{14}Table 2.6: Femto-clouds coalition structures in Example 2FAP-1ComputationalCapacityFemto-Clouds CoalitionStructure0-10{1,2,3,4,6,7,8,9},{11,12,13,14,15}, {5,10}12-20{1,6,8,11,12,13,14,15},{2,3,4,5,7,9,10}FAP-1 UserArrival RateFemto-Clouds CoalitionStructure1-5{1,2,3,4,6,7,8,9},{11,12,13,14,15}, {5,10}Example 3In this example, we consider a hotspot scenario where all FAPs are located closelysuch that the multicast offloading delay among FAPs is negligible. More precisely,in such a case, the uplink data transmission rate from the k-th FAP to the l-th FAP,denoted by bk,l in (2.6), is much greater than NB. This results in the Uco,m(C ) termin (2.3) being negligible compared to other terms.49User arrival rate at FAP-11 1.5 2 2.5 3 3.5 4 4.5 5Totalincentives250300350400450500550600650Isolated FAPsAlgorithm 1Grand femto-cloudHeuristic scheme-1Heuristic scheme-2Figure 2.5: User arrival rate at FAP-1 vs. femto-cloud incentive. Higherincentives provide higher benefit to the FAP owners.Figure 2.7 shows the total incentives earned by all FAPs versus computationalcapacity of FAP-1. Here, the grand femto-cloud is the optimal coalition structureand provides the highest incentives to the FAP owners compared to other heuristicschemes.2.5 Chapter SummaryTo reduce the handling latency and costs associated with offloading computation-ally intensive tasks to remote clouds, the local computational capacity of FAPsshould be maximally exploited. To this end, this work proposed formation offemto-clouds comprising of several FAPs wherein their excess computational re-sources are shared. In exchange for sharing their excess resources, FAP own-50Computational capacity of FAP-10 2 4 6 8 10 12 14 16 18 20Averagefemto-clouddatatransferdelay303540455055Isolated FAPsAlgorithm 1-Example 1Algorithm 1-Example 2Figure 2.6: Computational capacity of FAP-1 vs. average data transfer delay2.2 Mb in the femto-clouds. Lower data transfer delay improves users’QoE.ers receive monetary incentives. We formulated the resource sharing problem asan optimization problem with the objective to maximize the overall utilities ofall femto-clouds subject to the fair division of incentives among individual FAPswithin a femto-cloud. We then presented a distributed femto-cloud formation al-gorithm that enabled FAPs to reach the optimal solution in a distributed fashion.We further commented on the similarities between the solution of the formulatedproblem and the modified core of a coalition formation game. Finally, simulationexperiments using the LTE protocol stack in NS-3 showed superior performanceof the proposed scheme in terms of both handling latency and incentives providedto FAP owners. They confirmed the interesting observation that a femto-cloud51Computational capacity of FAP-10 2 4 6 8 10 12 14 16 18 20Totalincentives200300400500600700800Isolated FAPsAlgorithm 1/Grand femto-cloudHeuristic scheme-1Heuristic scheme-2Figure 2.7: Computational capacity of FAP-1 vs. femto-cloud incentive.Higher incentives lead to higher benefit to the FAP owners.comprised of all FAPs is not always optimal—in many cases, multiple disjointfemto-clouds resulted in reduced latency and higher incentives to the FAP own-ers. The numerical examples further verified the applicability of Algorithm 1 ina wide range of scenarios, e.g., hotspot area, residential, and enterprise femtocellenvironments.52Chapter 3Adaptive Scheme for CachingContent in a Mobile Edge NetworkIn this chapter, we explain an adaptive caching scheme for a mobile edge network.The main goal of the caching scheme is to maximally utilize the caching storageat the edge network especially at the cellular network to reduce overall networktraffic and content downloading delay. Recall from chapter 1, the caching schemetakes into account content popularity and network parameters for optimally de-cides cached content in the network. Precisely, the caching scheme estimatescontent popularity from their features using extreme learning machine algorithm.The caching scheme involves solving a mixed integer linear programming prob-lem that considers predicted content popularity and network parameters. At thebeginning of the chapter, we describe the system model and formulate the cachingproblem as a mixed integer linear programming problem. Then we describe thecontent popularity estimation methods using extreme learning machine. Finally,we demonstrate the efficacy of the presented scheme using YouTube dataset.The organization of the chapter is as follows. The system model and problemformulation are presented in Sec.3.1. The content and network-aware adaptivecaching scheme, which accounts for the parameters of the content popularity andtechnological network is presented in Sec.3.2. In Sec.3.3 we describe how ELMs53Table 3.1: Glossary of ParametersParameters DescriptionG = (V ,E ) Network graphV Set of cache-enabled base stationsV = |V | Number of cache-enabled base stationsE Set of communication linksF Set of contentF = |F | Number of contentC Set of categoriesC = |C | Number of categoriesf j Size of content j ∈FPopularity EstimationT Total number of observationst Time indexD = {x j,v j(t)} Observation datasethk(x;θk)Transfer function for hidden-layer neu-ron kθk Parameters of hidden-layer neuron kβk Output weightsL Total neurons of ELMlth Video popularity thresholdcan be used to efficiently estimate content popularity using both content featuresand the request statistics of users as they become available. The performance ofELM for caching, and content and network-aware adaptive caching scheme areillustrated in Sec.3.4 using real-world data from YouTube.3.1 System Model and Problem FormulationWe consider a heterogeneous cellular network in a geographical region where basestations (BSs), such as eNodeB and home eNodeB, are deployed and equippedwith a physical storage/cache capacity. The network shown in Fig. 3.1 can berepresented by a graph, G = (V ,E ). The set of vertices V is used to denote theset of cache enabled BSs which comprises of V BSs and indexed by i ∈ V ={1,2, . . . ,V}. The set of edges E denotes communication links among BSs. BSscan communicate with each other and with a cache manager (CM) via Xn in-terface [106, 107]. CM is connected to a content server such as a telco content54delivery network (CDN) via a high-speed dedicated link and is responsible for:i) retrieving unavailable content from the content server;ii) maintaining a lookup table that stores cached content location in the network;iii) forwarding content request to the neighbouring BS which has the content;iv) gathering information from BSs about the content are being requested;v) making decision when to refresh entire cache of the BSs which can be doneeither specific intervals or when content popularity changes significantly;vi) performing computations for adaptive caching.Content ServerBase StationMobile userHigh Speed Dedicated LinkCommunication LinkCache ManagerFigure 3.1: A typical network architecture. Base stations (BSs) are con-nected with each other and with the cache manager (CM) via het-erogeneous communication links. Cache manager is connected to thecontent server via high speed dedicated link.Mobile users are connected to the BSs according to a cellular network pro-55tocol. Connected BS is responsible for serving users’ content requests. If a re-quested content is in the cache of the connected BS, the request is served instantly.In this case, the content downloading delay is lower, and hence, improves user’sQoE. In addition, no additional load is put on the back-haul connection whichreduces network traffic. On the other hand, when a requested content is not avail-able at the connected BS, the request is forwarded to the CM. The CM checks thelookup table whether the requested content is available in the network. If the con-tent is available in the network, CM performs all the required signaling to fetchthe content from the neighbour BS. Content served by the neighbour BSs incurlower downloading delay and reduce network traffic. Finally, CM fetches contentfrom the content server when requested content is unavailable in the network orwhen retrieving content from neighbour BSs incurs higher delay than the contentserver.The content that can be cached is indexed byF = {1,2, . . . ,F}. Let f j denotethe size of the j-th content. The initial file transferring cost via the CM to a BSi is denoted by dgi (second per byte), and the latency between BS i and BS l isdenoted by dil where l ∈ V . dgi and dil both depends on the network topology,communication link, and routing strategy. The network topology may vary overtime and routing strategy can be adjusted according to the network traffic. dgiand dil also depends on channel quality when BSs are communicating with oneanother via a wireless link.3.2 Content and Network Aware Adaptive CachingScheme for Cellular Base StationsOur proposed content and network aware adaptive caching scheme proceeds asfollows. Given the estimated content popularity, network topology, link capacity,routing strategy and cache deployment budget/energy usage budget in the net-work, the adaptive caching scheme prescribes individual BSs which content tocache and adapts its prescriptions as the preferences of users (content popularity)evolves over time. The caching scheme utilizes popularity estimation to account56for the users content request characteristics. The benefit of using popularity esti-mators in the caching decision is that it allows caching decisions to be made–thatis when the network is not being heavily utilized, popular content can be trans-ferred between BSs without hindering the quality of service of the network.The rest of this section is organized as follows: Sec.3.2.1 formulates thecaching scheme as a mixed-integer linear programming (MILP) while Sec.3.2.2provides implementation considerations of the proposed caching scheme.3.2.1 Mixed-Integer Linear Program FormulationThe adaptive caching scheme takes into account content popularity, link capacity,network topology, cache size, and network operating costs. The network oper-ating costs include storage read/write costs and the cost of data transmission inthe network. In this work, energy usage to read/write files from hardware unitsare considered as cache deployment cost. Hardware units draw energy when theyare active due to read/write of the cached content. On the other hand, hardwareunits do not cache content when they are in sleep/idle mode and draw a neg-ligible amount of energy. Therefore, higher active hardware units mean higherstorage/cache size for caching at a higher energy cost to operate. Network op-erators allow BSs to activate a certain number of hardware unit(s) for caching.The flexible cache size facilitates network operators to provide physical storageat different BSs according to their content popularity distribution and network pa-rameters such as link capacity and network topology while maintaining a targetcache deployment cost in the network at a given time.In the network, each BS can select a maximum of R possible hardware units(e.g. physical cache storage sizes) where each hardware unit has a storage size ofs0. Each BS can only use ri ∈ {1, . . . ,R} active hardware unit(s) due to the cachedeployment cost constraint. Each hardware unit that is activated has an associatedcost defined by z0. The maximum physical storage size that can be used in thenetwork at any given time to maintain target cache deployment cost is denoted byS. The parameter µ̂ ij(t) ∈ [0,1] represents the estimated popularity of content j at57BS i ∈ V for time index t ∈ {1, · · · ,T}. The parameter µ̂ ij(t) is computed by:µ̂ ij(t) =v̂ij(t)∑j∈Fv̂ij(t), (3.1)where v̂ij(t) is total views of content j at BS i for time index t. In Sec.3.3 weprovide a method to estimate the popularity of content based on the features ofthe content.The MILP is formulated in (3.2) which minimizes the content downloadingdelay, taking into account initial file transferring cost, and cache deployment costin the network while maintaining total cache deployment cost. There are threedecision variables in the MILP:i) ri ∈ {1, . . . ,R} denotes the number of memory units used at BS i. The total sizeof the physical cache used at BS i is equal to ris0 where s0 is the physical sizeof the memory units (e.g. one hardware unit may represent 200 GB of physicalmemory);ii) aij ∈ {0,1} which is equal to 1 if content j is cached by BS i;iii) bilj which represents the fraction of content j served by BS i to BS l. Note thatbllj = 1 means that BS l caches the content j and serves the request itself. Note, thetime index t is omitted for brevity in the content popularity and decision variables.58minbilj ,aij,ri(w1 ∑j∈F∑i∈Vl∈Vf jµ̂ ljdilbilj +w2 ∑j∈F∑i∈Vf jdgiaij+w3∑i∈Vriz0)(3.2)subject to constraints∑i∈Vbilj = 1 ∀ j ∈F , l ∈ V (3.3)bilj ≤ aij ∀ j ∈F , i ∈ V , l ∈ V (3.4)∑j∈Ff jaij ≤ ris0 ∀i ∈ V (3.5)∑i∈Vris0 ≤ S (3.6)bilj ≥ 0 ∀ j ∈F , i ∈ V , l ∈ V (3.7)aij ∈ {0,1} ∀ j ∈F , i ∈ V (3.8)ri ∈ {1,2, · · · ,R} ∀i ∈ V . (3.9)The first term of the objective function in the MILP (3.2) accounts for the con-tent downloading delay in the network. The second term of equation (3.2) repre-sents the initial content transferring cost in the network. The third term reflectscache deployment cost in the network. w1 and w2 are the weight of real-timelatency/downloading delay cost and initial file transferring cost in the objectivefunction, respectively. w3 reflects the weight of cache deployment cost in theobjective function. Constraint (3.3) ensures that total fraction of j-th content isequal to 1. Constraint (3.4) represents the fact that BS i can serve other BSs’request only when it caches the requested content. Constraint (3.5) ensures thateach BS i fully uses the available cache where f j is the size of the j-th content,s0 is the size per unit of physical storage, and ri is the number of units of storage.Constraint (3.6) maintains the cache deployment budget in the network.593.2.2 Implementation ConsiderationsMILP SolutionThe MILP (3.2) is NP-hard [108, 109]. Due to the size of the problem such as thenumber of available content and the number of network nodes, it is intractable tofind optimal solutions in real-time. However, several numerical methods exist forestimating the solution to (3.2) which include: Branch-and-bound, cutting planes,branch-and-cut, branch-and-price are popular heuristic approaches to solve MILPvia linear relaxation [67, 110],[111],[112],[113]. In this work, individual videosare grouped into clusters c ∈ C where C = {1, · · · ,C} represents the set of clus-ter/category of videos. Machine learning methods can be used to estimate theoptimal clusters, however, in the YouTube network a suitable clustering method isto cluster the YouTube videos based on their associated category. Examples of cat-egories of YouTube videos include “Entertainment”, “Music”, “News”, “Sports”,“Howto”. The set of content in category c ∈ C is denoted by Fc ⊆F . The popu-larity associated with each category c ∈ C at base station i ∈ V is given by:µ̂ ic(t) = ∑j∈Fcµ̂ ij(t) (3.10)where µ̂ ij(t) is computed using (3.1). Given the categorical content popularity(3.10), the MILP can be used to optimally select where to cache these content.Cache Update FrequencyIn the adaptive caching scheme there are two content caching schemes used. Thefirst is the MILP which performs the cache initialization, and the second cachingscheme which uses users’ request statistics to dynamically cache content. Animportant design parameter to consider when using the MILP (3.2) for cache ini-tialization is when to replace the currently cached content. Given that the MILPmay replace a significant portion of the cached content, typically the solution ofthe MILP will only be used when the network traffic flow is minimal. Fig. 3.260provides a schematic of the adaptive caching scheme.MILPBase Station CacheS3LRUContent popularity µ̂ ic(t0) Network parametersri(t0),aij(t0)aij(t)Content requestsFigure 3.2: A schematic of the adaptive caching scheme. Initially the MILPuses the estimated content popularity µ̂ ic(t0) (3.10) and network pa-rameters to compute the physical cache size ri(t0)s0 to be used at basestation i ∈ V , and aij(t0) ∈ {0,1} indicating if content j ∈F is to becached at base station i ∈ V initially. t0 indicates when the solution ofthe MILP is used to update the base station cache. Then, the S3LRUuses the content requests from users to compute aij(t)∈ {0,1} at timest > t0.The BSs initialize their physical cache size and content to cache based onthe results of the MILP. Then, the S3LRU [114] is used to select the content tocache based on the users’ requests. In the S3LRU caching scheme, the physicalcache of each BS is composed of three segments as illustrated in Fig. 3.3. Thesegments are defined by Level-3, Level-2, and Level-1 in which Level-3 has thecontent with the largest popularity, and Level-1 has the content with the lowestpopularity. Each of the Levels is composed of a head segment and a tail segmentwhere the head segment contains the most popular content, and the tail segmentcontains the least popular content in the associated level. The dynamics how thecontent is cached in the S3LRU is illustrated in Fig. 3.3. If the requested contentis currently in the cache, then the content is moved to the head of the next levelwith all other content shifted down by one. Note that in Level-1, the content in thetail segment is evicted from the cache (that is, it no longer resides in the cachedcontent). If the requested content is not currently in the cache, then the requestedcontent from the cache manager is placed in the head segment of Level-1 of the61H T H T H TLevel-3 Level-2 Level-1 Cache ManagerStorage of a Base StationContent requestContent requestContent requestFigure 3.3: A schematic of the Segmented Least Recently Used (S3LRU)cache replacement scheme with three levels denoted by Level-3,Level-2, and Level-1. The S3LRU is used to control the cache con-tent at each base station after the cache is initialized. Level-3 has thecontent with the highest popularity, and Level-1 has the content withthe lowest popularity. In each Level, the most popular content is placedin the head H of the level, and the least popular content is placed inthe tail T of the level. If the requested content is not currently in thecache, then the requested content is transferred from the cache man-ager (CM) to the head segment of the Level-1 cache. If the requestedcontent is currently in the cache, then the content is moved to the headof the next level with all other content shifted down by one with thecontent in the tail of Level-1 removed from the cache.cache. This replacement policy ensures that the popular content resides in Level-3of the cache, and the least requested content resides in Level-1 of the cache.Effect of Different Parameters on Caching DecisionThe parameters w1, w2 and w3 in equation (3.2) could be mathematically inter-preted as relative importance of the different factors considered in formulation.Clearly, the choice of caching content and caching performance depend on thevalues of these parameters. The choice of these values depends on the networkspecifications. For instance, w1 < w2 when real-time latency has higher effect inthe network than initial file transferring cost.62Cache Deployment CostClearly, the choice of the parameter w3 is crucial since the objective function ac-counts for two different types of costs i.e., energy cost and latency cost. Oneoption is to move the cache deployment cost from the objective function and con-siders it as a constraint. In particular, the objective function defined in equation(3.2 is a Lagrangian relaxation of dual problem of the new formulation that con-siders cache deployment cost as a constraint. The optimal solutions of equation(3.2) provide a lower bound of the new formulation [115]. We used the objec-tive function in equation (3.2) since the formulation can search optimal solutionsfaster than the new formulation [116].3.3 Extreme Learning Machine (ELM) forPopularity PredictionThe adaptive caching scheme in Sec.3.2 requires the future popularity of the con-tent to be known. In this section we use ELMs [117, 118] to estimate the popu-larity of content given the content features and previous request statistics of thecontent. Additionally, we provide methods to optimize the number of neurons ofthe extreme learning machine and select the optimal features to perform the popu-larity prediction. As an illustrative example, we focus on predicting the popularityof videos in YouTube. The prediction of popular content in YouTube is challeng-ing as the features of YouTube video contains significant noise. Therefore themachine learning algorithms used must be able to address this challenging prob-lem of mapping from these type of noisy features to the associated popularity of avideo. Of the machine learning methods tested we found that the ELM [117, 118]provides sufficient performance to estimate the popularity of YouTube videos.Though the results presented in this section are focused on the use of ELM, theconstructed features, neuron selection algorithm, and feature selection algorithmare general and can be used with other machine learning techniques.633.3.1 Predicting Content Popularity with Extreme LearningMachinesConsider a dataset D ={{x j,v j(t)} : j ∈F = {1, . . . ,F}, t ∈ {1, . . . ,T}} of fea-tures x ∈RM, and total views v j(t) on day t for content j ∈F . The aim is to con-struct a model that relates the features x to the total views v based on the datasetD .For example, single hidden-layer feed-forward neural networks can be used for es-timating the functional relationship v j(t) and the features x j. However, in practicethe selection of the model and training method is complex requiring considerationof the universal approximation ability of the model, sequential learning ability, ef-ficiency, parallel implementation, and hardware implementation. Recently, basedon the Rosenblatt perceptron [119], ELMs [74] have been introduced for learningthe functional relationship between inputs x j and output v j(t). The ELM whichsatisfies the universal approximation condition [120, 121], can be implementedin parallel [117], can be trained sequentially for large datasets or as new train-ing data becomes available [122, 123], and can be efficiently implemented onfield-programmable gate array devices as well as complex programmable logicdevices [118]. The ELM is a single hidden-layer feed-forward neural network inwhich the parameters of the hidden-layer are randomly generated by a distribu-tion, and the subsequent output weights are computed by minimizing the errorbetween the computed output v j(t) and the measured output from the dataset D .Each hidden-layer neuron can have a unique transfer function. Popular trans-fer functions include the sigmoid, hyperbolic tangent, and Gaussian however anynon-linear piecewise continuous function can be utilized.The classic extreme learning machine, presented in [124], is given by:v̂ j(t) =L∑k=1βkhk(x j;θk) (3.11)with β1,β2, . . . ,βL the weights of each neuron, h1(x j),h2(x j), . . . ,hL(x j) the asso-ciated transfer function of each neuron, and v̂ j(t) the estimated total views of thevideo content j at time t. Given D , how can the ELM model parameters βk,θk,64and L in (4.16) be selected? For fixed number of hidden neurons L, the ELMtrains βk and θk in two steps. First, the hidden layer parameters θk are randomlyinitialized. Any continuous probability distribution can be used to initialize theparameters θk. Second, the parameters βk are selected to minimize the squareerror between the model output and the measured output from D . Formally,β ∗ ∈ argminβ∈RL{||Hβ −Y ||22}(3.12)with H the hidden-layer output matrix with entries Hk j = hk(x j;θk) for k∈{1,2, . . . ,L}and j ∈F , and Y the target output with entries Y = [y1,y2, . . . ,yF ]. The solutionof (4.18) is given by β ∗ = H+Y where H+ denotes the Moore-Penrose general-ized inverse of H. Several efficient methods can be used to compute β ∗ (refer toGolub and Van Loan, 2012). The benefit of using the ELM, (4.16) and (4.18), isthat the training only requires randomly generating parameters θk; the parametersβk are computed as the solution of a linear algebraic system of equations.3.3.2 Feature Construction for Popularity PredictionHere we describe how the features of YouTube videos are constructed using theYouTube Application Programming Interface1.The meta-data of each YouTube video contains four primary components:Thumbnail, Title, Keywords (also known as tags), and Description. Addition-ally, each YouTube video is associated with a Channel that contains features suchas the number of subscribers. The viewcount of a video is sensitive to the featuresof the Thumbnail, Title, Keywords, and Channel. However, features associatedwith the description appear not to significantly impact the viewcount of a video.This may result because when performing video searched on YouTube, only asubset of the description is provided to the users. In this work we focus on howfeatures of the Thumbnail, Title, Keywords, and Channel can be used to estimate1Specific details on how to interact with the YouTube API are provided at https://developers.google.com/youtube/v3/docs/, 7 March, 2017.65the the viewcount of a YouTube video. Note that our analysis does not include thevideo or audio quality, and the description of the YouTube video. These featureswill impact the dynamics of users subscribing to a channel, and rating the video,however, they do not directly impact the viewcount of a specific video.For the Thumbnail, 19 features are computed which include: the blurriness(CannyEdge, Laplace Frequency), brightness, contrast, overexposure, and en-tropy of the thumbnail. All image analysis is performed using the OpenCV (OpenSource Computer Vision) library2. To compute the brightness ηw of each videothumbnail, we first import the thumbnail in RGB color format. Let us denoteXue as a pixel in the thumbnail image, and R(Xue) ∈ [0,255] as the red light,G(Xue) ∈ [0,255] as the green light, and B(Xue) ∈ [0,255] as the blue light as-sociated with pixel Xue. The total size of the thumbnail is given by NX NY withu ∈ {1, . . . ,NX} and e ∈ {1, . . . ,NY}. The brightness of the image is then com-puted using:ηw(Xue) = 0.299R(Xue)+0.587G(Xue)+0.114B(Xue)ηw =1765NX NYNX∑u=1NY∑e=1ηw(Xue). (3.13)Typically humans’ perceived brightness for color are most sensitive to variationsin green light, less to red, and least to blue. The coefficients in (3.13) are associ-ated with the perceived brightness for color, and the specific values are obtainedfrom the OpenCV software. The contrast of each thumbnail ζw is computed usingthe RMS Contrast given by:ζw =√√√√ 1765NX NYNX∑u=1NY∑e=1(ηw(Xue)−ηw)2. (3.14)As we illustrate, the brightness ηw and contrast ζw of a videos Thumbnail provideimportant information that can be used to estimate the viewcount of a YouTube2http://opencv.org/, 7 March, 201766video.For the Title, 23 features are computed which include: word count, punctua-tion count, character count, Google hits (e.g. if the title is entered into the Googlesearch engine, how many results are found), and the Sentiment/Subjectivity of thetitle computed using Vader [125], and TextBlob 3. For the Keywords, 7 featuresare computed which include: the number of keywords, and keyword length. Inaddition, to the above 49 features, we also include auxiliary video and channelfeatures including: the number of subscribers, resolution of the thumbnail used,category of the video, the length of the video, and the first-day viewcount.In total 54 features are computed for each video. The complete dataset used forthe sensitivity analysis is given byD = {(x j,v j)} j∈F , with x j ∈R54 the computedfeatures for video j ∈F , v j the viewcount t = 14 days after the video is published,and F the total number of videos used for the sensitivity analysis.3.3.3 Optimizing the Number of Neurons in the ExtremeLearning MachineFor online applications of caching where millions of videos may be cached, itis critical to consider the computational cost of evaluating the popularity of thecontent. In this section we consider how to select the number of neurons L in theELM while still ensuring a sufficient predictive performance is maintained.Several methods exist for selecting the number of neurons L (4.16) in the ex-treme learning machine [120, 126, 127]. In [127] a multiresponse sparse regres-sion is utilized to order the neurons from best to worst. Then using the leave-one-out generalization metric the optimal number of neurons can be selected. Anothermethod is to incrementally increase L until the desired accuracy or maximumnumber of neurons is reached [120]. In [126] neurons are added incrementally un-til the output of the ELM negligibly effected as measured using a non-parametricnoise estimator known as the delta test. The main idea in [126] is that if the in-crease in accuracy of the ELM is above the estimated variance of the ELM then a3http://textblob.readthedocs.io/en/dev/, 7 March, 201767neuron is added.The predictive performance (e.g. probability of Type-I and Type-II errors) ofthe ELM, as a function of the number of neurons L, is a random variable as a resultof how each ELM is initialized. Given the desired predictive performance, insteadof having to estimate the mean of the ELM for each L and then using a gradientdecent method, one could instead employ the stochastic perturbation simultaneousapproximation (SPSA) [128] method to compute the optimal number of neurons.The ELM parameter L is adapted to estimate:argminL∈{1,2,...}A(L) = E[P(Type-I error)+P(Type-II error)+gτ](3.15)where τ is the training time of the ELM, and g is a design parameter. Here Edenotes the expectation with respect to the random variable θ defined in (4.16),and P denotes the probability. Since the probability of Type-I errors, Type-IIerrors, and the training time τ is not known explicitly, (3.15) is a simulation basedstochastic optimization problem. To determine a local minimum value of A(L),several types of stochastic optimization algorithms can be used [128]. In this workwe use the following SPSA algorithm (Algorithm 2):The SPSA is a gradient based stochastic optimization algorithm where the gra-dient is estimated numerically by random perturbation(3.17). The nice propertyof the SPSA algorithm is that estimating the gradient ∇LAn(Ln) in (3.17) requiresonly two measurements of the cost function (3.16) corrupted by noise per itera-tion. See [128] for a tutorial exposition of the SPSA algorithm. For decreasingstep size ψ = 1/n, the SPSA algorithm converges with probability one to a localstationary point. For constant step size ψ , it converges weakly (in probability) toa local stationary point.3.3.4 Stochastic Feature SelectionFeature selection algorithms are geared towards selecting the minimum numberof features such that a sufficiently accurate prediction is possible. If the feature68Algorithm 2 SPSA Neuron SelectionStep 1: Choose initial ELM parameters L0 by generating each from the distri-butionN (0,1), and define the video popularity threshold as lth.Step 2: For iterations n = 1,2,3, . . .· Estimate the Type-I and Type-II error probabilities of the ELM with Ln neuronsusingFP =|F |∑j=11{(v̂ j ≥ lth)∩ (v j < lth)},TP =|F |∑j=11{(v̂ j ≥ lth)∩ (v j ≥ lth)},FN =|F |∑j=11{(v̂ j < lth)∩ (v j ≥ lth)},TN =|F |∑j=11{(v̂ j < lth)∩ (v j < lth)},P(Type-I error)≈ FP/(T P+FN),P(Type-II error)≈ FN/(T N+FP), (3.16)where 1{·} is the indicator function and ∩ denotes the logical and operator.Given (3.16), compute the cost Ân(Ln) by substituting (3.16) into (3.15).· Compute the gradient estimate ∇̂LÂn(Ln):∇̂LÂn(Ln) =Ân(Ln+∆nω)− Ân(Ln−∆nω)2ω∆n(3.17)∆n( j) ={−1 with probability 0.5+1 with probability 0.5with gradient step size ω > 0.· Update the number of neurons Ln of the ELM at step n with step size ψ > 0:Ln+1 = Ln−ψ∇̂LÂn(Ln).69set is too large then the generalization error of the predictor will be large. Thoughseveral feature selection algorithms exist [129, 130], only the ELM feature selec-tion method presented in [131] has utilized feature selection to improve the per-formance of the ELM. In this section we construct a feature selection algorithm,Algorithm 3, which relies on computing the optimal features based on the modelsensitivity to variations in the features and an estimate of the generalization errorof the model. Features are removed sequentially while ensuring the generalizationerror is sufficiently low.The main idea of the sequential feature selection algorithm (Algorithm 3) is tosequentially remove the least useful features while ensuring that the performanceof the ELM is sufficiently high. This is performed by computing the output of theELM with all features, then computing the output with one of the features heldconstant at its mean (i.e. the null ELM model). If the output from the ELM andnull ELM are similar under some metric then the feature held constant does notcontribute significantly to the predictive performance of the ELM and should beremoved. This process is repeated sequentially in Algorithm 3 until a performancethreshold is reached.3.4 Numerical Example of Content and NetworkAware Adaptive Caching using Real-WorldYouTube DataThis section provides a numerical example to illustrate the performance of theadaptive caching using real-world YouTube dataset. Sec.3.4.1 describes simula-tion setup. Performance of extreme learning machine for caching is presented inSec.3.4.2. We use results obtained from Sec.3.4.2, to evaluate the performance ofthe adaptive caching presented in Sec. 3 Sequential Wrapper Feature SelectionStep 0: Collect the dataset D ={{x j,v j} : j ∈ F = {1, . . . ,F}} of featuresx j ∈ RM and video view count v j for videos. Select the desired similaritymetric J(·) (e.g. R2 coefficient of determination).Step 1: Train the ELM (4.16) using the dataset D and (4.18). Denote the pre-dicted viewcount from the ELM by v̂D .Step 2: For m ∈ {1,2, . . . ,M}, train the ELM using the dataset Dm where Dm isthe dataset D with the feature x j(m) held at its mean for all j ∈F . Denotethe predicted output from each of the m ∈ {1,2, . . . ,M} ELMs by v̂mD .Step 3: Compute the feature index m with maximum similarity between v̂D fromStep 1 and v̂mD from Step 2:m∗ ∈ argmaxm∈{1,...,M}{J(v̂D , v̂mD)} (3.18)where J(·) denotes the selected similarity metric from Step 0.Step 4: Compute the metrics of performance (Type-I and Type-II error proba-bilities) using the ELM trained using the dataset D∗ where the feature m∗from Step 3 has been removed. If the metrics of performance are too highthen stop. Otherwise return to Step 1 using the dataset D ←D∗.3.4.1 Simulation SetupThe real-world YouTube data was collected using the YouTube API between theyears 2013 to 2015 and consists of F = 12,500 YouTube videos. In the collecteddataset, the viewcounts range from 102 to above 107. Therefore, to prevent themachine learning algorithms from biasing their prediction to only the videos withthe highest viewcount, we scale the viewcount v j to be on the log scale (i.e. ifa video has 106 views then v j = 6). All the content features are scaled to satisfyx(m) ∈ [0,1] for m ∈ {1, . . . ,M}. Note that we also collect the category c ∈ C ofeach YouTube video (e.g. “Entertainment”, “Music”, etc.), however, this infor-mation is not included into the feature set used to train the ELMs. In total there71are 17 YouTube categories in the collected dataset. Each ELM is trained using anidentical 10-fold cross validation method using the datasetD , or in the case of thefeature selection method D∗. The trained ELMs are then used to compute boththe video popularity µ̂ ij(t) (3.1), and categorical popularity µ̂ic(t) (3.10).For evaluating the performance of the adaptive caching scheme we require thenetwork parameters, a method to generate user requests, and the cache initial-ization time of the MILP. Initially, the content is cached via the solution of theMILP (3.2) at simulation time slot p= 1 with the parameters w1 = 1, w2 = 0.005,w3 = 1, and z0 = 0.1. The parameters w3 and z0 are chosen as presented in [132]in such a way so that all the cost terms in equation (3.2) receive priority accordingto the network specification. The topology of the network is provided in Fig. 3.4,and the parameters of the network are given by: S= 9 TB, f j = 500 MB, s0 = 200GB, ri ∈ {1,2}, and Table 3.2 provides the size of all content in each video cate-gory. To generate user requests based on the real-world YouTube data, we use thefollowing stochastic simulation.λ i ∼U [1,10] i ∈ VNip ∼ Poisson(λ i) p ∈ {1, . . . ,50,000}Jiq ∼ Cat(µ(t)) q ∈ {1, . . . ,Nip} (3.19)where Nip is the total number of requests at BS i at simulation time slot p, Jiq isthe video content that is requested at BS i at simulation time slot p by the q-threquest. The categorical distribution Cat(µ(t)) is defined by the video popularityvector µ(t) = [µ1(t),µ2(t), . . . ,µF(t)] where µ ij(t) is defined in (3.1). The pa-rameter µ(t) is computed using the viewcount on day t = 4 (v̂ij(4)). The contentpopularity is assumed to be equal at each BS. With the parameters in (3.19), eachBS i ∈ V will receive on average between 50,000 to 500,000 content requestsper day. To compute the latency parameters, dil and dgi of equation (3.2) we usethe ndnSIM2.0 (an NS-3 based simulator) software [133]. ndnSIM’s Best Routestrategy is used to transfer content between BSs.72CM200 GB 400 GB1 Mbps 10 Mbps100 Mbps 1 Gbps 10 GbpsFigure 3.4: Schematic of the network. The circles with a solid black outlineand light gray fill represent base stations having 400 GB storage size.Other base stations have 200 GB storage size. The associated commu-nication links between the base stations are denoted by the connectedarrows. CM is the content manager. CM represents the cache man-ager which has the access to all the video files, and the other nodesrepresent the base stations used to serve user requests.3.4.2 Performance of Extreme Learning Machine for CachingIn this section, using the real-world data from YouTube, we illustrate how thenumber of neurons of the ELM (4.16) and features can be selected using Algo-rithm 2 and Algorithm 3. Additionally we will illustrate how the ELM can beused to both predict the popularity of new videos, and estimate the popularity of73Table 3.2: Number of files in each YouTube Category in the CollectedDatasetCategory 1 2 3 4 5 6Number of files 230 2 1475 76 151 47Category 7 8 9 10 11 12Number of files 3774 300 406 1839 220 913Category 13 14 15 16 17Number of files 69 126 10 2855 7published videos.Fig. 3.5 illustrates the mean and variance of the specificity, false negative rate,false positive rate, sensitivity, and Gmean computed using 600 independentlytrained ELMs for each number of neurons. Using the SPSA (Algorithm 2) wefound that an ELM with L = 300 provides sufficient accuracy for performing thecontent popularity estimation.100 200 300 400 500 60000. Neurons LPerformanceType-ISensitivityGmeanType-IISpecificity100 200 300 400 500 60000.250.50.7511.251.51.7522.252.52.7533.25Hidden Neurons LTrainingTime[s]Figure 3.5: Performance of the ELM (4.16) for estimating YouTube videocontent popularity as a function of the number of neurons L in theELM. The dataset used for this analysis is presented in Sec.3.3.2.Having computed the optimal number of neurons, the next task is to selectthe video features which are most important for estimating the video viewcount.The performance of Algorithm 3 for selecting the YouTube features of the ELMis illustrated in Fig. 3.6. When using R2, only 3 features are required to maintaina high level of performance for the ELM. These 3 features are the number ofsubscribers, contrast of the video thumbnail, and the overexposure of the video74thumbnail. This illustrates that the title and keywords contribute negligibly to thepopularity of YouTube videos in the dataset analysed when using the ELM.Feature Index10 20 30 40FeatureSelectionIteration1020304010 20 30 4000. Selection IterationPerformanceType-ISensitivityGmeanType-IISpecificityFigure 3.6: Performance of the feature selection Algorithm 3 when the R2coefficient of determination are used as the similarity metric.Having optimized both the number of neurons of the ELM and the importantfeatures required for performing the popularity estimation, we now illustrate theperformance of the ELM compared to several other machine learning methods.Fig. 3.7 provides a schematic of the ELM that can perform both prediction of newand published videos. The meta-data for a video is presented to the feature se-lection algorithm which constructs the video features x j ∈ R4 which is composedof subscribers, contrast, overexposure, and previous day viewcount. Notice thatx j(t) evolves per day t after the video is posted as new request statistics becomeavailable. The predicted viewcount on day t from the ELM is given by v̂ j(t).As expected, with no request statistics available, the predicted viewcount fromthe ELM has a large variance as illustrated in Fig. 3.8 for v j(1). However in typicalcaching applications we are only interested in the top 10% of content in which casewe can construct a binary popularity estimator using the output from the ELM bythresholding. For a video popularity threshold of lth = 104.5 views there are 1379popular videos and 11,121 unpopular videos. Table 3.3 provides the performanceof the Binary ELM classifier and several other machine learning classifiers. Notethat all classifiers were trained using the same dataset and on a standard desktopcomputer. As seen, the ELM has comparable performance to several popularclassifiers and can be evaluated efficiently. As the request statistics arrive theELM can be used to make an accurate prediction of the viewcount dynamics as75illustrated in Fig. 3.9 for the viewcount on day 4 (i.e. v j(4)). Therefore a courseestimate of the popularity of videos can be made using the ELM initially, then asrequest statistics arrive the ELM can be used to provide a high accuracy estimateof the next day popularity of videos.FeatureSelectionExtreme LearningMachinefeatures x j(0) v̂ j(t)x j(t) ∈R4DelayFigure 3.7: Schematic of the Extreme Learning Machine for estimating theviewcount v j(t) of video j ∈F . x j(0) ∈ RM is the initial set of fea-tures of video j ∈F , x j(t) ∈ R4 are the features used by the ELM toestimate the video viewcount v̂ j(t) on day t.Figure 3.8: Viewcount on day 1. Real-World viewcount v j(t) (black dots)and numerically predicted viewcounts v̂ j(t) (grey dots) computed us-ing the ELM (4.16). The ELM is trained using the YouTube dataset Das described in Sec.3.4.76Figure 3.9: Viewcount on day 4. Real-World viewcount v j(t) (black dots)and numerically predicted viewcounts v̂ j(t) (grey dots) computed us-ing the ELM (4.16). The ELM is trained using the YouTube dataset Das described in Sec.3.4.Table 3.3: ELM Performance Comparison: TP (true positive), TN (true neg-ative), and training timesMethod TP TN Times (s)Stochastic Gradient Boosting [134] 432 11509 5.41Independent ComponentRegression [135] 769 11424 0.75Generalized Linear Model [136] 769 11423 0.59k-Nearest Neighbours [136] 1000 11524 24.55Stacked AutoEncoder DeepNeural Network [137, 138] 959 11374 24.27Boosted Tree [139] 1029 11419 16.32Extreme Learning Machine 1067 11399 0.543.4.3 Performance of the Content and Network AwareCaching SchemeThis section illustrates the performance of the adaptive caching scheme presentedin Sec.3.2. Specifically, the content downloading delay and cache hit ratio fromthe adaptive caching scheme are compared with the most popular and randomcache deployment schemes.77In the most popular caching scheme, each BS caches the most popular esti-mated categories (computed using request statistics) of the content until each basestation’s cache is full [26, 71]. The random cache deployment scheme accountsfor content popularity (computed using content request statistics) and network pa-rameters using an MILP which does not account for changes in the physical cachesizes at the BSs [75]. Additionally, the method in [75] incorporates a cache re-placement scheme using the LRU (Least-Recently-Used) scheme. In this section,we compare the performance of the adaptive caching scheme with the schemesin [26, 71, 75] with the predicted popularity from the ELM used in place of therequest statistics, and the S3LRU used in place of the LRU cache replacementscheme.To compute the optimal size of caches for the BSs, we solve the MILP problem(3.2). The results of the MILP are provided in Fig. 3.4 where the circles with asolid black outline and light gray fill represent BSs allocated with a 400 GB cachestorage size. Other BSs are allocated with a 200 GB cache storage size. Asseen in Fig. 3.4, the physical cache sizes in the network are heterogeneous. TheMILP, based on the network topology, link capacity, routing strategy, and contentpopularity, optimally selects the physical cache sizes to use to reduce networkenergy consumption.Fig. 3.10 shows the cumulative content downloading delay in the network vs.the simulation time. From Fig. 3.10, the adaptive caching scheme has the small-est cumulative content downloading delay compared with the most popular andrandom cache deployment schemes. This allows the adaptive caching scheme toincrease the users’ QoE in the network compared to these other caching schemes.The main reason the adaptive caching scheme outperforms the most popular andrandom cache deployment schemes is that the adaptive caching scheme considersadjacent BSs physical cache sizes and cached content to improve network per-formance. Comparing the performance of the most popular caching scheme andrandom cache deployment scheme, it is clear that methods which account for net-work parameters while making caching decisions will improve the users’ QoE.78Simulation time slots ×1040 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5Cumulativeaveragedownloadingdelay(second)×10500.511.522.533.54Random cache deploymentAdaptive cachingMost popularFigure 3.10: Cumulative average content downloading delay vs. simulationtime. The figure illustrates that lower content downloading delay im-proves users’ QoE.Fig. 3.11 plots the cumulative average cache hit ratio in the network vs. thesimulation time. As seen in Fig. 3.11, the adaptive caching scheme performsbetter than the other two caching schemes. This performance improvement is dueto the fact that the adaptive caching scheme takes into account network topology,content popularity and cache deployment in the formulation. Here, higher cachehit ratio in the network means, a higher number of requests are being served bythe connected BSs or by the neighbour BSs. As can be seen from Fig. 3.11, cachehit ratio for adaptive caching scheme is 0.9. That means only 10% of the contentrequests are served by the content server while random cache deployment andmost popular caching schemes account for 15% and 25%, respectively for thegiven simulation setup. Therefore, the adaptive caching scheme reduces networktraffic since fewer requests are served from the content server.79Simulation time slots ×1040 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5Cumulativeaveragecachehitratio00. cache deploymentAdaptive cachingMost popularFigure 3.11: Cumulative average cache hit ratio in the network vs. simula-tion time. The figure illustrates that a higher cache hit ratio reducesthe overall network traffic since fewer requests are served by transfer-ring the content from the cache manager to the BS where the requestoriginated.3.5 Chapter SummaryIn this work an adaptive caching scheme is presented that takes into account users’behaviour and operating characteristics of the cellular network. The cachingscheme uses an optimized extreme learning machine to estimate the popularityof content based on users’ behaviour, features of the content, and request statisticsfrom users as they become available. The features of the content are computedusing a combination of human perception models and network parameters. Theestimates are used in a mixed-integer linear program which takes into accountthe cellular network parameters (e.g., network topology, communication link, androuting strategy) to select where to cache content and also to provide storage rec-80ommendations to the network operator. The scheme is validated using real-worlddata from YouTube and the NS-3 simulator.81Chapter 4Risk-Averse Caching Scheme forHeterogeneous NetworksIn this chapter, we describe risk-averse caching schemes for the heterogeneouswireless network. Recall from chapter 1, the caching schemes presented in thischapter not only consider content popularity and network parameters but also ac-count for the routing protocol in the caching decision to ensure balancing the loadthroughout the network. In addition, the caching schemes take into account un-certainty associated with the predicted content requests. As a result, the cachingschemes are formulated as risk-averse schemes using the CVaR measure insteadof minimizing expected downloading delay as presented in chapter 3.In this chapter, we propose four caching schemes for heterogeneous wirelessnetwork. The chapter is organized as follows. The system model of the hetero-geneous wireless network is provided in Sec.4.1 where Table 4.1 provides a sum-mary of the parameters used throughout the chapter. In Sec.4.2 dynamic cachereplacement schemes are discussed. In Sec.4.3 we discuss four static cachingschemes, two which are risk-neutral and two which are risk-averse. Specifically,in Sec.4.3.1 and 4.3.2 we construct risk-neutral static caching schemes for the het-erogeneous network. The term risk-neutral is used to indicate that these methodsuse point estimates of the content requests–that is, they do not account for the82uncertainty associated with estimating the future content requests. An useful out-come of Sec.4.3.2 is that the static caching scheme, which accounts for contentrequests, cache size, bandwidth, load, and content routing, only requires the solu-tion to a unimodular linear program. In Sec.4.3.4 and 4.3.5, risk-averse cachingschemes are constructed based on the risk-neutral caching schemes in Sec.4.3.1and 4.3.2. The main idea is that the uncertainty associated with estimating contentrequests is accounted for using the CVaR measure. In Sec.4.4 a novel content re-quest conformal prediction algorithm is constructed based on the extreme learningmachine and CVaR optimization. The performance of the risk-neutral and risk-averse caching schemes are evaluated using real-world data from the YouTubesocial network in Sec.4.5. The results show that a 6% reduction in the averagedelay can be achieved if the uncertainty of the content requests is accounted for,and a 25% reduction in average delay is achieved if both the uncertainty and rout-ing protocol are accounted for compared to the risk-neutral caching scheme thatneglects the routing protocol. Therefore, it is essential to account for both the un-certainty of predicting the content requests and routing when performing cachingdecisions.4.1 System ModelIn this section we introduce the system model of the heterogeneous wireless net-work and introduce the mathematical notation that will be used throughout thechapter to formulate the risk-neutral and risk-averse caching schemes.We consider a heterogenous LTE wireless network that contains wireless nodes(e.g. base stations, smallcell access points, smallcell gateway) and a core net-work as illustrated in Fig. 4.1. The nodes in the network are defined by the setV = {1, . . . ,V}. Each wireless node v ∈ V contains a physical cache of size Svwhich stores the cached content. Additionally, the bandwidth between nodes isgiven by the parameter bi j ∈ R+ for i, j ∈ V . If the nodes i and j have no di-rect communication link then their bandwidth bi j = 0. The bandwidth betweennodes in the LTE network are typically heterogeneous as they are composed of83Table 4.1: Notation for Risk-Averse CachingParameters DefinitionF(·) cumulative distribution functionF̂(·) empirical cumulative distribution functionp(·) probability mass functionα confidence levelt timeNetwork ParametersV set of nodes {1, . . . ,V}Vd destination nodes with Vd ⊆ VSv cache size of node v ∈ VC(t) cached content indicator matrixcv(t) cached content indicator vector for node v ∈ Vnv(t) load at node v ∈ Vqv the request-queue-time at node v ∈ VAi j f weight between nodes i, j ∈ V for f ∈Fbi j bandwidth between nodes i, j ∈ Vli j latency between nodes i, j ∈ Vδi jd f shortest-path indicator for i, j,d ∈ V and f ∈Fdv f (t) content retrieval delay at v ∈ Vd for f ∈FContent ParametersF set of content {1, . . . ,F}D(t) dataset of content features and requests at time tG content groups {1, . . . ,G}f content index f ∈Fs f size of content f ∈Fgv f (t) group association of content f ∈F at v ∈ Vdyv f (t) request count for content f ∈F at v ∈ Vdx f feature vector of content f ∈Fĝv f (t) estimated group association of content f ∈F at v ∈ Vdŷv f (t) estimated request count for content f ∈F at v ∈ Vdµg mean vector of group g ∈ GΣg covariance matrix of group g ∈ Gβ neuron weightsθ neuron transfer function parametersL number of neurons84wired, fiber-optic, and wireless links [140]. For example, the bandwidth betweenbase stations and smallcell gateway nodes to the core network are on the order ofseveral GB/s (fiber-optic). The link capacity of base stations and smallcell accesspoints to mobile users are typically on the order of 1-100 MB/s (wireless). Andthe link capacity of smallcell access points to the smallcell gateways are on theorder of 100 MB/s (wired connection). The content server stores all the contentthat can be requested by users. The core network communicates with the con-tent server over the wide area network. Note that the content server is typicallycomprised of a commercial content distribution network (CDN) such as Akamai,Amazon CloudFront, Azure CDN or dedicated telco CDN that is maintained by awireless network operator.When a mobile user connects to the network, the network protocol establishedthe communication link between the mobile user and either a smallcell accesspoint or base station based on the minimal signal-to-interference-plus-noise ratioof the wireless channel. The set of content that can be requested by mobile usersis denoted byF = {1,2, · · · ,F}. The size of each content is denoted by s f ∈R+for f ∈F . When a smallcell access point or base station receives a user requestfor content f ∈ F , the node that receives the request will retrieves the contentfrom the network. The set of nodes (smallcell access points and base stations)that receive users’ requests is denoted by Vd where Vd ⊂ V . The delay betweenwhen the user request was received and when the content is delivered to the useris known as the content retrieval delay. The content retrieval delay for contentf ∈ F requested at node v ∈ Vd at time t is denoted by dv f (t). The contentretrieval delay dv f (t) depends on where the content is cached in the network, theload of each node, bandwidth between nodes, network-layer protocol, and link-layer protocol of the LTE network. The load nv(t) of each wireless node v ∈ V isthe total number of content requests the wireless node is currently processing. Ifa user requests content f ∈F from the wireless node v ∈ Vd and node v has thecontent cached, then the content retrieval delay dv f (t) = s f qvnv(t) where qv is therequest-queue-time of node v and s f is size of the content f . The request-queue-85Base StationMobile userCommunication LinkCore NetworkSmallcell Access PointCache SmallcellGatewayContent ServerFigure 4.1: Schematic of an LTE wireless network. The wireless network iscomposed of smallcell access points, base stations, smallcell gateways,and a core network. The smallcell access point, smallcell gateway,and base stations all contain physical caches that can store content.Mobile users are connected to either the base station or the smallcellaccess point through a low-bandwidth connection. Smallcell accesspoints are connected to the smallcell gateway which is then connectedto the core network. Note that the base stations and smallcell gatewaynodes do not communicate directly with each other. The core networkis connected to the content server over the wide area network.time qv of node v ∈ V provides the average time to process a single packet/byterequest. Note that if node v does not contain the requested content, then it mustbe retrieved by another node in the network or from the content server.The goal of the network is to minimize the total content retrieval delay to serve86all user requests in the network. To achieve this objective, each node in Fig.4.1contains a physical cache and a cache manager. The cache manager of each nodecontrols the content that is cached at the node [66]. The currently cached contentat node v∈ V is given by the cached content indicator vector cv(t)∈ [0,1]F whereF is the total number of contents that users can request in the network. The cacheof each node is composed of a static segment and a dynamic segment. The contentin the static cache does not change for a time interval ∆T and is associated withthe slow-time scale caching decisions. The content in the dynamic cache changesas a function of the user requests and is associated with the fast-time scale. Thecache manager controls the cache replacement scheme for the static and dynamiccaches to minimize content retrieval delay dv f (t). Specifically, the cache managerat each node:• runs the cache replacement scheme to minimize request delays.• forwards content requests to neighbouring nodes or the content server if thecontent is not cached locally.• records the number of requests yv f (t) and feature vector x f for content f ∈F at v ∈ V .It is assumed that each node in the network has computational resources equivalentto a standard desktop computer to allow the operation of the cache manager. Giventhat the cache size Sv, only a small fraction of all the content F can be cachedlocally (except for the content server which contains all contents). If the contentis not cached locally, the content is retrieved from another node that minimizesthe content retrieval delay dv f (t).To perform cache replacement of the static cache, the cache manager recordsthe request statistics yv f (t) and feature vector x f for content f ∈F at node v ∈V . For video content, the feature vector comprises information related to thethumbnail and title of the video, as well as information regarding the user thatuploaded the video such as number of subscribers. The complete set of content87features and requests at each node is contained in the datasetD(T ) ={{x f ,yv f (t),gv f (t)} : v ∈ V , f ∈F , t ∈ [0,T ]}(4.1)where T is the total time the contentF has been available to users. The parametergv f (t) in (4.1) is the group association of content f ∈F at node v ∈ V . The pos-sible groups that content can be associated with is denoted by G = {1,2, . . . ,G}.The cache manager uses the information inD(T ) to estimate the future number ofrequests of content for both new content and previously cached content that usershave requested.Given the network parameters (cache size, load, and bandwidth between nodes)and the content parameters (feature vector, request count, group association), theaim is to design caching schemes to minimize the cumulative content retrievaldelayd(T ) =Kt∑k=1V∑v=1F∑f=1dv f (tk) (4.2)where Kt is the total number of content requests in the time interval [0,T ], andtk ∈ [0,T ] denotes the time of each of the k ∈ {0,1, . . . ,Kt} content requests. Tominimize the delay requires a method to estimate the future content requests, anda method to cache popular content based on the request estimates and routingprotocol to deliver content to users in the network. In this work we construct acontent request density forecasting method, and both risk-neutral and risk-aversecaching schemes to minimize the network delay.4.2 Dynamic Caching SchemesThe cache of each node in the network, illustrated in Fig. 4.1, is composed of adynamic segment and a static segment. This section discusses dynamic cachingschemes, while Sec.4.3 discusses static caching schemes. To give more perspec-tive, recall from Sec.4.1 that the content in the dynamic segment changes as afunction of the user requests, while the content in the static segment changes on88a time interval ∆T that is significantly larger than the time-scale of individualcontent requests. The content in the dynamic and static segments of the cache arecontrolled by the dynamic caching scheme and static caching scheme respectively.Here we briefly discuss dynamic caching schemes that can be used in the network.In Sec.4.3 we present static caching schemes that are used in combination with thedynamic caching schemes presented in this section.A schematic of the interaction of the dynamic and static cache is illustrated inFig. 4.2. There are three possible scenarios that can occur depending on wherethe requested content is cached in the network. In the first scenario Fig. 4.2,the content is transferred from the static cache to the user and no change in thedynamic cache occurs. In the second scenario in Fig. 4.2, the content is transferredfrom the dynamic cache to the user. Additionally, the content in the dynamic cachewill be adjusted according to the dynamic cache replacement scheme. In the thirdscenario in Fig. 4.2, the content must first be transferred to the dynamic cachefrom another node in the network, and then transferred to the user. Additionally,the content in the dynamic cache will be adjusted and evicted according to thedynamic cache replacement scheme. In all of the three scenarios, the content inthe static cache remains unchanged, and identical content is not simultaneouslyavailable in both the static and dynamic caching segments. The content in thestatic cache is only updated at a time interval ∆T after it was first initialized. Theaim of the static caching segment is to store content that is expected to have alarge number of user requests in the duration ∆T , while the dynamic cache storesother content requested by users.Several dynamic caching schemes exist which are based on users’ real-timecontent requests including: Least-Recently-Used (LRU), Segmented Least-Recently-Used (SLRU) [114], Least-Frequently-Used, and Least-Frequently-Used with Dy-namic Ageing and Adaptive Replacement Cache. The LRU cache replacementscheme operates by maintaining an ordered cache where recently requested con-tent will reside at the beginning of the cache (known as most recently used po-sition). As user requests are processed by the node, the content in the dynamic89Dynamic CacheStatic CacheRequested contentNetwork NodesA)B)C)Figure 4.2: Schematic of the interaction between static and dynamic cachingschemes. Three possible scenarios can be resulted from a content re-quest. Scenario (A) refers to the situation when the requested contentis available in the static cache. In this case, content in the static anddynamic cache remain unchanged. B) refers to the case when the re-quested content is available in the dynamic cache. In this case, contentin the dynamic cache are adjusted according to the dynamic cache re-placement scheme and content in the static cache remain unchanged.C) represents the situation when the requested content is unavailable inthe cache. In this case, the content will be retrieved from the networknodes and stored in the dynamic cache. Other content in the dynamiccache will be adjusted and one content in the dynamic cache will beevicted. Content in the static cache remain unchanged.cache that are not requested by users are shifted towards the end of the cache (leastrecently used position). If the requested content is not currently available in thecache, it will be retrieved from other nodes in the network and stored in the mostrecently used position. All other content in the cache will be shifted towards theend of the cache with the content in the least recently used position being evictedfrom the cache. Variants of the LRU scheme commonly used include the SLRU.In the SLRU cache replacement scheme, the entire cache is divided into differentsegments with each segment associated with a priority or popularity level. Whenrequested content is unavailable in the cache, the content will be retrieved and90stored in the most recently used position of the lowest priority segment of the dy-namic cache. Simultaneously, the content that is in the least recently used positionof the lowest priority segment will be evicted from the cache. The main advan-tage of the LRU and SLRU dynamic cache replacement policies is that they donot require knowledge of the network architecture or where the content is storedthroughout the network. As such, these caching schemes are straightforward toimplement and are widely used in commercial distribution networks such as Face-book [114].4.3 Risk-Neutral and Risk-Averse Static CachingSchemesThis section presents four static caching schemes and constitutes the main con-tribution of the chapter. Both risk-neutral and risk-averse caching schemes arediscussed.The static cache replacement scheme controls the content stored in the staticcache of each node in the network illustrated in Fig.4.1. The content in the staticcache remains the same for a time interval ∆T that is significantly longer than thecharacteristic time-scale of individual content requests from users. The goal of thestatic caching scheme is to cache content that is predicted to have a large numberof requests in order to minimize the total content retrieval delay in the time interval∆T . Given the parameters of the network and the content dataset D , this sectionpresents two risk-neutral and two risk-averse static caching schemes as illustratedin Fig. 4.3. The term risk-neutral is used for any scheme that uses point estimatesof the content requests and do not account for the uncertainty associated with pre-dicting the content requests. These include the risk-neutral (RN) and risk-neutraland network-aware (RNNA) caching schemes. Risk-averse caching schemes incontrast to the risk-neutral schemes, account for the uncertainty associated withestimating the content requests yv f for content f at node v. These include the risk-averse (RA) and risk-averse and network-aware (RANA) caching schemes. TheRA and RANA schemes can be viewed as generalizations of the RN and RNNA91schemes. Here, the uncertainty associated with the content requests is accountedfor using the coherent CVaR risk measure with a confidence level α ∈ [0,1]. Notethat if we do not consider risk (e.g. risk-neutral) then the confidence level α = 0.Static caching schemesRisk Neutral Risk AverseRN RNNA RA RANAFigure 4.3: A schematic illustration of the risk-neutral (RN), risk-neutraland network-aware (RNNA), risk-averse (RA) and the risk-averse andnetwork-aware (RANA) caching schemes discussed in Sec.4.3. RNand RNNA use point estimates of the content requests and do notaccount for the uncertainty associated with predicting the content re-quests. RA and RANA account for the uncertainty associated withestimating the content requests yv f for content f at node v. The twonetwork-aware caching schemes RNNA and RANA account for theLTE network parameters such as bandwidth, load at the nodes, request-queue-time, network-layer protocol, link-layer protocol and routingprotocol in contrast to the network oblivious RN and RA cachingschemes.4.3.1 Risk-Neutral (RN) Static Caching SchemeLet us assume that we have the predicted number of requests for each contentf ∈F at node v ∈ V , which is denoted by ŷv f . Then, the content to be cached ateach node can be selected by solving the following binary integer programC∗ ∈ argmincv f{F∑f=1V∑v=1ŷv f (1− cv f )}s.t.F∑f=1s f cv f ≤ Sv for v ∈ V , (4.3)92where C∗ ∈ [0,1]V×F and cv f ∈ [0,1] indicates if the content f ∈F is cached atnode v ∈ V . The inequality constraint in (4.3) ensures that each node does notcache more files than can be stored in each nodes associated cache. Although(4.3) is a binary integer program which has complexity NP-complete, (4.3) can besolved with complexity O(F log(F)) as each node merely caches the maximumnumber of content that are predicted to have the highest number of requests.The RN caching scheme (4.3) is used extensively in the literature [26, 30, 66].The key feature of the risk-neutral caching scheme is that it require an accu-rate estimation of yv f , namely, the number of requests for the content. The RNscheme does not account for any aspects of the network other than the cache sizeof each node. Additionally, (4.3) does not account for the uncertainty associatedwith estimating the number of content requests yv f . Therefore, although (4.3)can be evaluated with low complexity O(F log(F)), the total content retrieval de-lay is expected to be higher compared with static caching schemes that accountfor the network parameters, routing protocol, and the uncertainty associated withpredicted content requests.4.3.2 Risk-Neutral and Network-Aware (RNNA) StaticCaching SchemeHere we construct RNNA static caching scheme to optimally cache content giventhe predicted content requests ŷv f , and the network parameters (bandwidth, load atthe nodes, request-queue-time, and cache size of each node). The RNNA cachingscheme accounts for both the network parameters and routing protocol, howeverneglects the uncertainty associated with predicted content requests ŷv f .The (RNNA) static caching scheme is given by the following binary integer93programC∗ ∈ argminc,k,δ ,r{F∑f=1∑d∈Vd∑i, j∈Vŷd f Ai j f δi jd f}s.t. cs f ∈ [0,1], ksd f ∈ [0,1], δi jd f ∈ [0,1],rsd f ∈ [0,1], Ti j ∈ Z+∑i∈Vδsid f −δisd f = ksd f , ∑i∈Vδdid f −δidd f =−1,1{bi j = 0}+δi jd f ≤ 1 (4.4a)F∑f=1s f cs f ≤ Ss,V∑s=1cs f ≥ 1 (4.4b)V∑s=1ksd f = 1,V∑s=1rsd f = 1,F∑f=1∑d∈Vdδi jd f ≤ Ti j (4.4c)rsd f ≤ ksd f , rsd f ≤ cs f , rsd f ≥ ksd f + cs f −1, (4.4d)∀s ∈ V , ∀d ∈ Vd, ∀ f ∈F .In (4.4), C∗ ∈ [0,1]V×F indicates the content cached in the static cache of all nodes,Vd ⊂ V are the destination nodes in the network, and Ti j is a positive integer thatindicates the maximum number of content transfer paths allowed between nodes iand j (e.g. congestion threshold). The destination nodes Vd communicate directlywith the users and are comprised of the base stations and smallcell access pointsillustrated in Fig. 4.1. Given the predicted content requests ŷd f , the objectivefunction in (4.4) represents the total content retrieval delay over the time-interval[t, t +∆T ]. Note that we have dropped the time-dependence from the parametersin (4.4) to improve readability. The parameter Ai j f in (4.4) is the edge weight of94the network for content f ∈F and is equal toAi j f ={s f (li j +q jn j) if i 6= js f q jn j otherwise(4.5)where s f is the size of content f , li j is the latency between nodes i ∈ V andj ∈ V , q j is the request-queue-time of node j ∈ V . The latency li j (second perbyte) between nodes is a function of the bandwidth bi j, the network topology,the network-layer protocol, and the link-layer protocol used in the network. Theparameter δi jd f indicates if nodes i, j ∈V are used to transfer the content f ∈F tothe destination node d ∈ Vd . ksd f indicates if source node s∈ V is used to retrievecontent f ∈ F for the destination node d ∈ Vd . The parameter rsd f = ksd f cs findicates if the source node s ∈ V , used by destination node d ∈ Vd to retrievecontent f , currently has the requested content cached.The RNNA caching scheme (4.4) optimally selects the cache C∗ to minimizethe total content retrieval delay while accounting for the network parameters andpredicted content requests. Additionally, RNNA accounts for the shortest-pathrouting used to transfer content throughout the network to serve user requests.The path constraints (4.4a) ensures that the δi jd f defines the shortest-path fromsource node s∈ V to destination node d ∈ Vd for transferring content f ∈F . Thecaching constraints (4.4b) ensure that the files cached at each node do not exceedthe nodes cache size, and that atleast one instance of each content f ∈F is cachedin the network. The link congestion constraint (4.4c) ensures that the number ofcontent transferred over the link between node i ∈ V and node j ∈ V satisfiesthe congestion threshold Ti j. The source constraints (4.4d) ensures that only onesource node s ∈ V is used to transfer content f optimally to the destination noded ∈ Vd . Additionally, the source constraints ensure that the source node s has thecontent f to be transferred to the destination node d.The binary integer program (4.4) to be solved for the RNNA caching scheme con-tains a total of (V +2V 2+V 3)F binary variables, (3V +V 2)F equality constraints,and((1+V +2V 2+V 3)F +V 2+V)inequality constraints. If each content has95approximately equal size, then the constraint matrix in (4.4) is a network matrix.As such, (4.4) is a unimodular linear program and can be solved with polyno-mial time complexity using interior-point numerical methods [141]. AlthoughYouTube content is not of equal size, it can be broken into equal sized blocks.The reason is that typically YouTube content is of duration between 30 seconds to5 minutes, with a user attention span of approximately 90 seconds [142]. If eachYouTube video is broken into 30 second intervals, then the decision of where tocache these video blocks throughout the network can be solved in polynomialtime using the optimization problem (4.4). The main challenge, in this case, is toestimate the popularity of individual 30 second intervals of a content. One simpleway to handle this challenge is to estimate the popularity of a content and thenassume that all the individual 30 second intervals of the content have same pop-ularity. However, in practice, the first 30 second of a content may receive moreattention from users that the last 30 second and vice versa. In such case, we needto design a sophisticated prediction method and out of the scope of this thesis.Please note that the optimization problem (4.4) of solving four content with 30second intervals is identical to the problem of solving two content with 60 secondintervals where 60 second is divided into two 30 second intervals.Though the RNNA caching scheme (4.4) uses the network parameters, pop-ularity of content, and content transfer protocols, it does not account for the un-certainty associated with estimating the request count yv f for content f at nodev. Therefore, we can not control how to account for the risk associated with anygiven caching decision. Here the risk can be viewed as a measure of the worstcase content retrieval delay that results from a given caching decision.4.3.3 Conditional Value-at-Risk (CVaR) and ContentRetrieval Delay MinimizationThe two risk-averse caching schemes RA and RANA both use the coherent CVaRrisk measure to account for the uncertainty associated with predicting the contentrequests. Here, we precisely define the CVaR risk measure and how it accounts96for the uncertainty associated with predicting the content requests.Let D be a random variable that denotes the total content retrieval delay ina time-interval [t, t + ∆T ]. Realizations of the total content retrieval delay aredefined by d (4.2). The ability to compare random outcomes of D based on theconfidence α ∈ [0,1] is crucial for accounting for the risk associated with theuncertainty of estimating yv f when performing caching decisions.How can we estimate the total content retrieval delay D for a given confidencelevel α ∈ [0,1] (where α = 1 is completely risk-averse, and α = 0 is risk-neutral).One possibility is to use the Value-at-Risk (VaR) risk measureVaRα(D) = min{d ∈R : FD(d)≥ α} (4.6)where FD(d) is the cumulative distribution function of D. Classically, VaRα(D)was a popular method to estimate risk, however, it has several limitations [92,93]. First, VaRα(D) is difficult to optimize as it is non-convex and is a non-coherent measure of risk as it fails the sub-additive condition. Second, VaRα(D)does not account for the properties of the distribution FD(d) beyond the thresholdVaRα(D).To account for the uncertainty of estimating yv f for computing D, we use theCVaR measure [92, 93] which is a coherent risk measure. That is, CVaR satisfiesthe following properties: it is positive homogeneous, sub-additive, monotonic,translation invariant with respect to first order stochastic dominance, and mono-tonic with respect to second order stochastic dominance. CVaR is the expectedtotal delay given that we are in the α ∈ [0,1] confidence interval of the cumulativedistribution FD(d). Formally, CVaR is given byCVaRα(D) = ED[D|D≥ VaRα(D)]=11−α1∫αVaRβ (D)dβ . (4.7)where ED denotes the expectation with respect to the random variable D that repre-97sents the content retrieval delay. As seen from (4.7), CVaRα(D) can be interpretedas the conditional expectation of D where the expectation is evaluated on the α-confidence portion of the tail distribution FD(d). If α = 0, then CVaRα(D) =E[D]is the expected value of D, and if α → 1, then CVaRα(D) = max{D} gives themaximum value of D.Given the CVaR risk measure (4.7), to minimize the total content retrieval de-lay in a time-interval [t, t+∆T ] for a confidence level α the following optimizationproblem needs to be solvedC∗ ∈ argminz∈Z {CVaRα(D)}, (4.8)where z are the decision variables that affect the total content retrieval delay D,and Z are the associated constraints on the decision variables. C∗ in (4.8) is theassociated caching decision that minimizes the total content retrieval delay witha confidence of α . If α = 0 in (4.8), then (4.8) is equivalent to the risk-neutralcaching schemes discussed in Sec.4.3.1 and Sec.4.3.2. Here, we are interested inrisk-averse caching schemes for evaluating (4.8) where α ∈ (0,1].The evaluation of (4.8) for general α ∈ (0,1] is non-trivial as it requires ananalytical expression for the cumulative distribution function FD(d) which is un-known. Recall that the random variable D representing the total content retrievaldelay in the time-interval [t, t +∆T ] which depends on the network parameters,popularity of content, content transfer protocols, and the uncertainty associatedwith estimating the request count yv f for content f at node v. Using Theorem 2in [92], the optimization problem (4.8) can be represented byC∗ ∈ argminz∈Z ,c∈R{c+ 11−αED[max{0,D− c}]} (4.9)where the expectation is taken with respect to the random variable D. Though thedistribution FD(d) is unknown, if the distribution of the content requests FYv f (yv f )at node v ∈ V is known, then the objective function (4.9) can be approximatedusing Monte-Carlo integration techniques. That is, the expectation ED[·] in (4.9)98is estimated using K independent and identically distributed samples of the con-tent requests yv f generated from the distribution FYv f (yv f ). Additionally, the non-smooth operator max{·} in the objective function (4.9) can be removed by intro-ducing auxiliary parameters ξk. The approximate solution to (4.9) can be com-puted by solving:C∗ ∈ argminz∈Z ,c∈R{c+ 1K(1−α)K∑ξ=1ξk}s.t. ξk ≥ D(z, ŷv f k)− cŷv f k ∼ FYv f (yv f ) for v ∈ V , f ∈Fξk ≥ 0, for k ∈ {1, . . . ,K}. (4.10)where ŷv f k represents the generated sample of the content requests for content fat node v for sample k.The optimization problem (4.10) provides the basis for constructing the risk-averse static caching schemes in this chapter.4.3.4 Risk-Averse (RA) Static Caching SchemeHere we construct a risk-averse (RA) static caching scheme that accounts for theuncertainty associated with predicting the content requests yv f for content f atnode v. The caching method RA is a generalization of the caching method RN inSec.4.3.1 that does not account for the uncertainty associated with estimating thecontent requests.Given the conditional density function FYv f (yv f ) of content requests, the CVaRoptimization problem (4.10), and using the constraints in the popularity based99caching method (4.3), the RA caching scheme is defined byC∗ ∈ argmincv f ,c{c+1K(1−α)K∑ξ=1ξk}s.t. cv f ∈ [0,1],c ∈R,ξk ≥F∑f=1V∑v=1ŷv f k(1− cv f )− c, (4.11a)ŷv f k ∼ FYv f (yv f ),F∑f=1s f cv f ≤ Sv, (4.11b)ξk ≥ 0,v ∈ V , f ∈F ,k ∈ {1, . . . ,K},where C∗ ∈ [0,1]V×F . The parameter K in (4.11) is the total number of the contentrequests generated from the distributions FYv f (yv f ) for v ∈ V and f ∈F . Addi-tionally, the parameter c in (4.11) represents the estimated Value-at-Risk (VaR) ofthe total content delay for a confidence α , and α ∈ (0,1] is the confidence level.The RA caching scheme (4.11) is a mixed integer linear program that contains(2K +V ) inequality constraints, (FV ) binary variables, (K + 1) real variables,and requires the generation of (V FK) samples from the distributions FYv f (yv f ).The complexity of (4.11) is NP-hard, however several numerical methods ex-ist which can be used to evaluate (4.11) including: Branch-and-bound, cuttingplanes, branch-and-cut, and branch-and-price [143]. The selection of the numberof samples K to use is still an open research problem. However, we found that areasonable performance is achieved using K = 10,000 samples.Though RA caching scheme (4.11) accounts for the uncertainty associatedwith predicting the number of requests, it neglects the network parameters (net-work bandwidth, load at the nodes, request-queue-time, content cached at othernodes), and the protocol of the network.1004.3.5 Risk-Averse and Network-Aware (RANA) CachingSchemeHere we construct RANA caching scheme that accounts for the uncertainty as-sociated with predicting the number of requests, network parameters (networkbandwidth, load at the nodes, request-queue-time, content cached at other nodes),and protocol of the LTE network.Given the conditional density function FYv f (yv f ) of content requests, the CVaRoptimization problem (4.10), and using the constraints in (4.4), the RANA cachingscheme is given byC∗ ∈ argminC,k,δ ,r,c{c+1K(1−α)K∑ξ=1ξk}s.t. cs f ∈ [0,1], ksd f ∈ [0,1], δi jd f ∈ [0,1],rsd f ∈ [0,1], Ti j ∈ Z+, c ∈R,ξk ≥F∑f=1∑d∈Vd∑i, j∈Vŷd f Ai j f δi jd f − c,∑i∈Vδsid f −δisd f = ksd f , ∑i∈Vδdid f −δidd f =−1,1{bi j = 0}+δi jd f ≤ 1F∑f=1s f cs f ≤ Ss,V∑s=1cs f ≥ 1V∑s=1ksd f = 1,V∑s=1rsd f = 1,F∑f=1∑d∈Vdδi jd f ≤ Ti j (4.12a)rsd f ≤ ksd f , rsd f ≤ cs f , rsd f ≥ ksd f + cs f −1,∀s ∈ V , ∀d ∈ Vd, ∀ f ∈F , ξk ≥ 0, k ∈ {1, . . . ,K}.A description of each of the constraints in (4.12) is provided below (4.4). Note that101the RANA caching scheme (4.12) is equivalent to the RNNA caching scheme (4.4)if we do not account for the uncertainty associated with predicting the number ofrequests for content f ∈F at node v ∈ V – that is, we set α = 0 in (4.12).The mixed integer linear program (4.12) contains a total of (V +2V 2+V 3)Fbinary variables, (2K + 1) real variables, (3V +V 2)F equality constraints, and((1+V + 2V 2 +V 3)F +V 2 +V + 2K + 1)inequality constraints. As discussedin Sec.4.3.4, several numerical methods can be used to evaluate C∗ in (4.12) fortypical small networks that contain up to V = 10 wireless nodes and F = 1000content.Summary: In this section we constructed four static caching schemes, namely,RN in (4.3), RNNA in (4.4), RA in (4.11), and RANA in (4.12). The RNNAcaching scheme accounts for content requests, cache size, bandwidth, load, andcontent routing, only requires the solution to a unimodular linear program. Theunique feature of the risk-averse caching schemes RA and RANA compared to RNand RNNA is that they use a coherent risk measure to account for the uncertaintyassociated with predicting the content requests. The confidence level α ∈ [0,1] inthe RA and RANA methods can be used to network operator to select the levelof risk when performing a caching decision. For example, setting α = 1 resultin the RA and RANA minimizing the maximum possible content retrieval delayin the network. To evaluate the caching decision using RA or RANA requires aconformal prediction of the content requests–that is, an estimate of the cumulativedistribution function of the content requests. In Sec.4.4 we provide a conformalprediction algorithm for constructing the cumulative distribution function of therequests.4.4 Content Request Cumulative DistributionFunction ForecastingThe risk-neutral and risk-averse static caching schemes constructed in Sec.4.3 re-quire a point estimate ŷv f or cumulative distribution function estimate FYv f (yv f ) ofthe request count yv f for content f ∈F at node v∈V . In this section we construct102a conformal prediction algorithm to estimate FYv f (yv f ) given the dataset D(T ) in(4.1). The key idea of the conformal prediction algorithm is to use discriminantanalysis to perform coarse-grained prediction of the content requests. Then use afeed-forward neural network to perform fine-grained request estimation. For boththe coarse-grained and fine-grained request estimates, the prediction interval ofeach is denoted by P(g|x f ) and FY f (y f |g,x f ) respectively where g ∈ G representsthe group association, and Yf is the random variable representing the number ofrequests for content f ∈ F . Using the total-law of probability the cumulativedistribution function of the content requests isFY f (y f |x f ) =G∑g=1FY f (y f |g,x f )P(g|x f ), (4.13)for content f ∈F . Below we present the coarse-grained and fine-grained requestprediction methods, and the density forecasting algorithm to evaluate (4.13).4.4.1 Content Group Association ClassifierGiven the datasetD(T ) in (4.1), the goal is to construct a classifier that can assigncontent f ∈F to a particular group g ∈ G and provide a confidence estimate ofthe group association. That is, we desire a classifier which learns the conditionalprobability mass function P(g|x) of group association.The elements of the content features x f are commonly constrained to intervalson the real line. For example, for YouTube videos, the number of subscribersto the user that uploaded the video must be a positive number and the minimumlength of a video is 1 second (15 seconds if ads are present). Let us assume that thefeature vector x is a random variable which has a conditional probability densityfunction given by a doubly truncated multivariate normal distributionp(x|g) =N (µg,µ−g ,µ+g ,Σg) (4.14)∝ exp(−12(x−µg)′Σ−1g (x−µg))1{µ−g ≤ x≤ µ+g }.103In (4.14), µg is the mean vector of features in group g ∈ G , µ−g is the minimumvalue of the content features in group g, µ+g is the maximum value of the contentfeatures in group g, and Σg is the covariance matrix of the features in group g.If the mean µg, lower and upper limits (µ−g and µ+g ) on the feature vector, andcovariance matrix Σg are known for each group g∈ G , and the prior probability ofgroup association is P(g), then the probability of content f ∈F being associatedwith group g ∈ G isP(g|x f ) = P(x f |g)P(g)∑Gr=1 P(x f |r)P(r). (4.15)The prior probability P(g) of group association can either be set to an uninforma-tive prior (i.e. P(g) = 1/G), or to the population average with P(g) = ng/F whereng is the number of content inF that are associated with group g ∈ G .Given the dataset D(T ) in (4.1), how can the parameters µg,µ+g ,µ−g ,Σg in(4.14) be estimated? If µ−g = −∞ and µ+g = +∞ (the feature vector x is uncon-strained), then classical discriminant analysis methods can be used to estimateµg and Σg. These include Linear Discriminant Analysis, Factor-Based LinearDiscriminant Analysis, Maximum Uncertainty Linear Discriminant Analysis, andRegularized Discriminant Analysis [144]. Typically, the content features x arecontained on an interval of the real line such that −∞< µ−g and µ+g <+∞. Givenµ−g and µ+g , the estimate of the mean µg and covariance Σg can be computed usingGibbs sampling, maximum likelihood estimation, and generalized method of mo-ments [145]. Here use the maximum likelihood estimation method to estimate µgand Σg using the Limited-memory Broyden-Fletcher-Goldfarb-Shanno algorithmwith box constraints to account for the limits of the feature vector x.4.4.2 Risk-Averse Feed-foward Neural Network for PredictingContent RequestsIn this section we construct the risk sensitive extreme learning machine that com-bines the benefits of the extreme learning machine with a risk-aware trainingmethod to predict content requests. The presented method is comprised of a104single-layer feed-foward neural network trained using a stochastic optimizationalgorithm that dynamically adjusts the number of neurons and weights to mini-mize the risk of over fitting with a confidence level α ∈ [0,1].Classic Extreme Learning MachineGiven the dataset D(T ) in (4.1), the goal is to construct a method to estimate thefunctional relationship between the content features x f and the associated requestcount y f (t). A single-layer feed-foward neural network can be used to relate thecontent features x f to the requests y f (t). Denoting ŷ f (t) as the estimated requestcount given x f , the feed-forward neural network is given byy f (t) =L∑i=1βihi(x f ;θi) = β ′h(x f ) (4.16)with β = [β1,β2, . . . ,βL]′ are the weights of each neuron, andh(x f ) = [h1(x f ;θ1), . . . ,hL(x f ;θL)]′ the associated transfer function of each neu-ron. Popular transfer functions include the sigmoid, hyperbolic tangent, and Gaus-sian, however any non-linear piecewise continuous function can be utilized. Theneuron weights βi and θi in (4.16) are computed from the solution toθ ∗,β ∗ ∈ argmin{F∑f=1(y f (t)− ŷ f (t))2}. (4.17)For general transfer functions h(x f ), (4.17) is a non-convex and non-linear opti-mization problem that is commonly solved using stochastic gradient decent, ant-colony optimization, and simulated anealling methods [146, 147]. A draw-backwith these numerical methods for solving (4.17) is that they require a large numberof computations to converge to the global optimal solution of (4.17). However, us-ing random matrix theory, it has been shown that the parameter values at the localminima of the objective function (4.17) yield similar mean-square error comparedto the global optimal solution of (4.17) [148]. Therefore, the mean-square er-105ror of the feed-forward neural network is not sensitive to the set of local minima{θ ∗,β ∗} is used.Could selecting the neuron weights θ randomly, and then fitting the weights βvia least-squares minimization provide a reasonable approximation to the solutionof (4.17)? Since the mean-squared error of the feed-forward neural network is notsensitive to which local minima of (4.17) are used, it is reasonable to postulate thatrandomly selecting θ and then fitting β will produce a reasonable solution. This isthe main idea behind the extreme learning machine (ELM) proposed in [124]. Forfixed number of neurons L, the ELM selects the parameters θ and β in two steps.First, the hidden layer parameters θ are randomly initialized. Any continuousprobability distribution can be used to initialize the parameters θ . Second, β isselected to minimize the mean-square error between the model output and themeasured output from the dataset D(T ) in (4.1). Formally,θ ∗ ∼N (0,1)β ∗ = H(X ;θ ∗)+Y (4.18)whereN (0,1) is the multivariate normal distribution with unit variance, H(X ;θ ∗)is the hidden-layer output matrix with entries Hi f (X ;θ)= hi(x f ;θi) for i= 1, . . . ,Land f ∈F , H+ denotes the Moore-Penrose generalized inverse of H, and Y =[y1, . . . ,yF ]′. Several efficient methods can be used to compute β ∗, for exampleGaussian elimination. The major benefit of using the ELM, (4.16) and (4.18), isthat the training only requires the random generation of the parameters θ ∗, and β ∗is computed as the solution of a set of linear equations.The ELM satisfies the universal approximation condition [120], can be imple-mented in parallel [117], can be trained sequentially for large datasets or as newtraining data becomes available [122, 123], and can be efficiently implementedon field-programmable gate array devices as well as complex programmable logicdevices [118]. A limitation with the ELM is that it can not be used to select thenumber of neurons L while minimizing the risk of overfitting the dataset D(T ) in106(4.1).Regularized Extreme Learning MachineHere we construct the regularized extreme learning machine (RELM) which isa generalization of the ELM that minimizes the risk of overfitting with a confi-dence level α ∈ (0,1]. For example, given the dataset D(T ) in (4.1) and settingα = 1, the RELM selects the parameters θ ,β ,L in (4.16) to minimize the mean-square generalization error between the actual and predicted content requests. TheRELM is equivalent to the ELM if we do not account for risk–that is, we set α = 0.The RELM is trained by solving the following optimization problem (we dis-cuss the motivation below):L∗ ∈ argminL∈Z+,c∈R{c+1K(1−α)K∑k=1zk}s.t. zk ≥ 1(1− γ)F ||η(L)H(Xk;θ k)β k−Y k||22− c (4.19a)Xk,Xk,Y k,Yk ∼ FY(D(T ),γ) (4.19b)θ k ∼N (0,1), (4.19c)β k = [η(L)H(Xk;θ k)]+Yk (4.19d)zk ≥ 0, L≤ Lmax, for k ∈ {1, . . . ,K}. (4.19e)In (4.19),N (0,1) is the multivariate normal distribution, FY(D ,γ) is the Fisher-Yates random permutation which is used to partition the data D(T ) into com-plementary subsets of training data (Xk,Yk) and validation data (Xk,Y k) whereγ ∈ (0,1) denotes the percentage of data used for training, and η(L) is a diagonalmatrix with elementsηii(L) ={1 if i≤ L0 otherwise.(4.20)The parameter Lmax is the maximum number of neurons in the RELM, and K isthe number of samples used to estimate the cumulative distribution function of the107mean-square error of the feed-forward neural network.The objective function in (4.19) represents the CVaR of the mean-square gen-eralization error of the ELMs with L neurons for a confidence level α . The mean-square generalization error of each ELM is evaluated using1(1− γ)F ||η(L)H(Xk;θ k)β k−Y k||22defined in constraint (4.19a). The constraint (4.19b) is used to generate the train-ing (Xk,Yk) and testing (Xk,Y k) data for each of the k ∈ {1, . . . ,K} ELMs. Theconstraint (4.19c) is used to generate the neuron transfer function weights θ k.Given the transfer function weights θ k and the training data (Xk,Yk), constraint(4.19d) is used to construct the neuron weights β k of each ELM. The final con-straint (4.19e) defines the maximum number of possible neurons Lmax and thenumber of ELMs K generated to evaluate the CVaR risk measure. The selectionof Lmax and K are important to restrict the computational resources used to trainand evaluate the RELM. The final result of the RELM (4.19) is the optimal num-ber of neurons L∗ of use for predicting the content requests while minimizing therisk of overfitting the training data with a confidence level α . Given L∗, the ELMweights θ ,β can be constructed using equation (4.18).Discussion of (4.19): Solving the mixed integer nonlinear program (4.19) isequivalent to optimally selecting the number of neurons L to minimize the risk ofoverfitting the dataset D(T ). The training method in (4.19) dynamically adjuststhe number of neurons L based on the available training data in D(T ). Typically,as the number of observations in D(T ) increases, the number of neurons in theRELM will also increase. The solution to (4.19) can be computed in polynomialtime by solving (4.19) for L = 1, . . . ,Lmax independently, and then selecting thesolution that minimizes the objective function in (4.19).1084.4.3 Conformal Prediction Algorithm for Content RequestsIn this section we construct the conformal prediction algorithm to estimate thecumulative distribution function FY f (y f |x f ) of the number of requests for contentf ∈F given the content features x f .A schematic of the conformal prediction algorithm is provided in Fig.4.4. Thealgorithm is comprised of an offline training stage, and an online stage to evaluatethe requests for new content. In the offline stage the dataset D(T ) in (4.1) isused to train the group association classifier defined in Sec.4.4.1, and the RELMdefined in Sec.4.4.2. Then, using the datasetD̂(T ) = {{x f ,g f (t),y f (t), ŷ f (t)} : v ∈ V , f ∈F , t ∈ [0,T ]}, (4.21)the set of prediction errors {ε f (g)}, where ε f (g) = ŷ f (g,x f )−y f for each contentf ∈F associated with group g ∈ G is constructed. The trained parameters fromthe offline stage are then used in the online stage to estimate the request count ofthe content. In the online stage, when a new content with features xo is received,the group association classifier is used to compute the group association probabil-ities P(g|xo). Then the RELM is used to estimate the number of requests ŷ f (xo,g)for the content f . The group association probabilities and predicted number ofrequests from the RELM are then sent to the conformal prediction block whichoutputs the conformal prediction FY (yo|xo).Insight into the design of the conformal prediction algorithm in Fig. 4.4 isgained by considering the content requests y f for content f as a random variable.We assume that the random variable y f satisfiesy f = ŷ f (g,x f )+ ε f (g) (4.22)where ŷ f (g,x f ) is the estimated number of requests from the RELM trained usingdata from group g∈G , and ε f (g) is the random variable that accounts for the errorin the predicted and actual number of requests. The error ε f (g) accounts for theerrors associated with the parametric structure of the RELM, the parameters of the109OfflineOnlineGroupAssociationRELMPredictionErrorGroupPredictionRequestPredictionConformalPrediction F̂Y (yo|xo)D(T )xoP(g|x f )µg,ΣgD̂(T )β g,θ g,Lg {εi(g)}P(g|xo) ŷoFigure 4.4: Schematic of the conformal prediction algorithm. D(T ) in (4.1)is the training dataset, P(g|x f ) is the probability of group association,D̂(T ) is the training dataset with the predicted content requests fromthe RELM, {ε f (g)} are the errors between the predicted and actualrequests for content in group g ∈ G , xo is a new content feature, andF̂Y (yo|xo) is the empirical cdf function of the content given the contentfeatures xo.RELM, and the errors that may be contained in the feature vector x f . The randomvariables ε f (g) are assumed independent and identically distributed. Then, giventhe dataset D̂ , the empirical cdf of the random variable y f isF̂Y (y f |g,x f ) = 1ngng∑i=11{εi(g)≤ y f − ŷ f (g,x f )}(4.23)where ng is the total number of contents in group g ∈ G and 1{·} is the indicatorfunction.Substituting (4.23) into (4.24) gives the conformal prediction of the contentf ∈F . Formally, the empirical conditional distribution function of the number ofrequests for content f ∈F isF̂Y (y f |x f ) =G∑g=1F̂Y (y f |g,x f )P(g|x f ) (4.24)where P(g|x f ) is the conditional probability of content f belonging to group g∈Gfrom the discriminant analysis classifier.Summary: In this section we constructed a conformal prediction algorithm,110illustrated in Fig. 4.4, for content requests. The conformal prediction algorithmis comprised of an offline learning stage in which content request and featuresare used to select the doubly-truncated multivariate normal distribution parame-ters for group association estimation, and train the RELM for constructing pointestimates of the content requests. For new content, the output of the trained groupclassifier and RELM are used to construct the conformal prediction of the contentrequests. This conformal prediction is used with the risk-averse caching methodsin Sec.4.3.4 and 4.3.5 to optimally cache content in the network.4.5 Numerical Evaluation of the ConformalPrediction Algorithm and Coherent RiskMinimization Caching Schemes for YouTubeContentIn this section we evaluate the performance of the conformal prediction algorithmand four static caching schemes (RN, RNNA, RA, RANA) presented in Sec.4.3using real-world YouTube datasets. The results show that a 6% reduction in the av-erage delay can be achieved if the uncertainty of the content requests is accountedfor, and a 25% reduction in average delay is achieved if both the uncertainty andsmallcell routing protocol are accounted for compared to the risk-neutral cachingscheme that neglects the routing protocol. These results illustrate the importanceof both accounting for the risk associated with estimating content requests and ac-counting for the routing protocol used to transfer content throughout the network.4.5.1 Network Parameters and YouTube DatasetTo evaluate the performance of the static caching schemes (RN, RNNA, RA,RANA), and the conformal prediction method illustrated in Fig.4.4, we constructan LTE heterogeneous network and generate user content requests based on real-world YouTube datasets.111Network Parameters: The LTE network topology used for evaluation is il-lustrated in Fig.4.5. The LTE network is composed of a core network that is con-nected to base station nodes, smallcell gateway nodes, and a server which containsall the contentF that can be requested by users. The content server and the corenetwork communicate via the wide area network. The latency in the wide areanetwork is typically in the range of 10 ms to 100 ms depending on the distanceand the number of hops between the server and the core network [149, 150]. Thecore network, base station, and gateway nodes communicate with one another viaan intra-network communication link with link capacity values in the range of 500Mbps to 1 Gbps. Finally, the smallcell access points are connected to the gatewaynodes via heterogeneous backhaul links with link capacity values in the range of100 Mbps to 500 Mbps.In the network illustrated in Fig.4.5, user requests are only received by thesmallcell access points and the base station nodes. Each requested video con-tent has a size of s f = 200 MB. The smallcell access points, base station, andgateway have a cache size of 500 GB (approximately 10% of the entire contentlibrary), and the core network node has a cache size of 1 TB (approximately 20%of the entire content library). For each node, 90% of the cache storage is usedfor static caching, and the remaining is used for dynamic caching. The dynamiccache segment of each node is controlled using the Least-Recently-Used (LRU)replacement method discussed in [75]. Given the cache size, link capacity be-tween the nodes, processing delay, propagation delay, number of hops betweennodes, packet size, and the topology in the network in Fig.4.5, the edge weightparameter Ai j f (4.5) between node i and node j is evaluated using the ndnSIM 2.0NS-3 base simulator [151]. ndnSIM allows a video content f to be addressableand routable inside the network. For the network-layer protocol, ndnSIM’s NDNstack is used while point-to-point communication is considered as the link layerprotocol. In ndnSIM, the smallcell access points and base station are defined asthe Consumer node. We implement a new consumer application method for theConsumer node to generate content requests according to the YouTube datasets.112All the nodes having cache storage are considered as the Producer node whichcan satisfy content requests generated by the Consumer nodes. The ndnSIM BestRoute is used to forward content requests to neighbouring nodes until the contentis retrieved–this is equivalent to the shortest-path routing algorithm. Finally, thendnSIM Application-level trace helper is used to compute the edge weight Ai j f(4.5) between node i and node j for content f . We set Ti j = 16 in equation (4.4c)and in equation (4.12a).YouTube Dataset: The real-world YouTube dataset was collected using theYouTube API between the years 2013 to 2015 and consists of 25,000 YouTubevideos. The dataset is comprised of videos from 17 YouTube categories including“Gaming” (44% of videos), “Entertainment” (40% of videos), “Music” (5% ofvideos), and “Education” (2% of videos). Note that gaming and entertainment areamong the most popular video categories on YouTube. The video requests rangefrom 102 to above 107. Therefore, to prevent the ELM algorithm from biasing it’sprediction to only the videos with the highest requests, we apply a log transformto the requests such that the range is in 2 to 7. All the content features are scaled tosatisfy x(m)∈ [0,1] for m∈ {1, . . . ,M}. We use 11,747 video content to constructthe training dataset D = {{x f ,g f ,y f } : f ∈F}. The remaining videos are usedto test the performance of the conformal prediction algorithm. In total there areG = 3 groups in the dataset, where group g = 1 is associated with videos withless than 100 requests, g = 2 with videos with requests in the range of 100 to30,000 requests, and g = 3 videos that have more then 30,000 requests. Fortesting the performance of the discriminant analysis classifier, extreme learningmachines, and coherent risk-minimization algorithms, we use the dataset D̂ whichis comprised of 13,253 videos. The number of requests for each video content isidentical at each of the smallcell access points and base station.As a preliminary step, the YouTube videos are clustered based on their asso-ciated category. The content requests of cluster c ∈ C at node v is:yvc = ∑f∈Fcyv f , (4.25)113where Fc ⊆F . Given the content requests per cluster, the caching methods pre-sented in this chapter optimally cache each category of content in the network.We use the top 10 most popular YouTube categories to compute the performancemetrics of the caching schemes. To evaluate the delay of the risk-averse cachingschemes a method is required to generate user request that is consistent with theYouTube dataset D .Content Request vector: We discretize time into a total of Kt time slots,where for each time-slot a content request is received. We construct content pop-ularity distribution using dataset D where Pvf denotes the probability of content fbeing requested at node v and Pvf ∼ yv f . Let rv denote the content request vectorwith elements rv(k) ∈F that define the content f being requested time k at nodev. The content request vector satisfies the conditionKt∑k=11{rv(k) = f} ∼ Pvf ∀ f ∈F . (4.26)To construct the content request vector we randomly generate a total of Kt =20,000 content requests in rv such that (4.26) is satisfied. The estimated delayfor a given caching decision is computed by evaluating the total delay associatedwith the content requests from the request vector rv at each of the v nodes.4.5.2 Conformal Prediction Algorithm for YouTube ContentIn this section the performance of the conformal prediction algorithm presentedin Sec.4.4, is evaluated using real-world YouTube datasets. This includes theperformance of the discriminant analysis classifier, extreme learning machines,and the conformal prediction for video content requests.From fig. 4.4, the first step in the conformal prediction algorithm is the offlinestage in which the parameters of the group association classifier (µg and Σg) areselected, the extreme learning machines are trained for each group (β g,θ g,Lg),and the parameters ε f (g) in (4.23) are estimated using the training dataset D .The performance of the group classifier is evaluated using the true positive rate114BSCommunication LinkCore NetworkSAP-1SGWContent ServerSAP-2SAP-3Figure 4.5: Schematic of the network. There are three smallcell accesspoints (SAPs) in the network. These SAPs are connected with thesmallcell gateway (SGW) via heterogeneous communication links.SGW and base station (BS) are connected with the core network. EachSAPs, BS and SGW has a storage of 500 GB. Storage of the core net-work is assumed to be 1 TB and the core network is connected to thecontent server which can cache the entire library via wide area network(WAN).(TPR) and true negative rate (TNR), and the extreme learning machines usingCVaRα(ε2) for an α = 0.95. The results are provided in Table 4.2 where the pa-rameter ng is the total number of content in each of the groups g ∈ G . As seen,the group association classifier has a reasonable TPR and TNR for classifying thegroup association of each video. Using the RELM in Sec.4.4.2, Table 4.2 illus-trates that the selected number of neurons Lg for the ELM associated with group115g ∈ G varies between the three groups, however the error CVaRα(ε2) remainsapproximately equal. This illustrates that the RELM can dynamically adjust thenumber of neurons necessary to effectively estimate the content requests ŷ f of newcontent while minimizing the risk of over-fitting with a confidence level α = 0.95.Table 4.2: Group Classification and Neuron Number SelectionGroup ng TPR TNR CVaRα(ε2) Lg1 2405 0.80 0.96 0.3302 372 10314 0.94 0.79 0.3046 593 534 0.74 0.99 0.3133 24To construct the conformal prediction (4.24) requires an estimate of the pre-diction errors ε f (g) in (4.23). Alternatively, we can construct an estimate ofF̂Y (y f |g,x f ) using the empirical cumulative distribution function F̂E|g(ε|g) of therandom variables ε . Fig.4.6 illustrates the computed group empirical cumula-tive distribution function F̂E|g(ε|g) of the error of the ELM associated with eachgroup g ∈ G . Using maximum likelihood estimation, we find that the empiricalcdf F̂E|g(ε|g) is approximately equal to the generalized extreme value distributiontypically used in risk management, finance, and economics. Given F̂E|g(ε|g), theconditional distribution of content requests can be evaluated usingF̂Y (y f |g,x f ) = F̂E|g(y f − ŷ f (x f ,g)|g,x f ) (4.27)where ŷ f (x f ,g) is the estimated content requests from the ELM associated withgroup g ∈ G for content f .Given the parameters of the group association classifier, ELMs, and F̂Y (y f |g,x f )from the offline stage of the conformal prediction algorithm, we now evaluate theperformance of the online portion of the conformal prediction algorithm for newcontent. The group association probability P(g|x f ) and results of the conformalprediction algorithm for new content are provided in Fig.4.7. The content index116g = 2 g = 3g = 1EmpiricalCDFFˆE|g(ε|g)Figure 4.6: Empirical cumulative distribution function of the error ε(g) in(4.22) for the groups g ∈ G . The gray dots indicate the empirical cu-mulative distribution function F̂E|g(ε|g), and the black line indicatesthe fitted generalized extreme value distribution.is ordered such that the least requested content is f = 1, and the most requestedcontent is f = 13,253 based on the results of the conformal prediction. As Fig.4.7illustrates, the group association probability P(g|x f ) from the discriminant anal-ysis classifier provides a reasonable accuracy for the probability of group associ-ation. From Fig.4.8, there are approximately 1,225 contents (indicated in black)that have predicted number of requests from the ELMs that are outside the 90%confidence interval. Equivalently, approximately 9.2% of the content requestsreside outside the 90% confidence interval computed from the conformal predic-117tion algorithm. To determine if the cumulative distribution function F̂Y (y f |x f ) isconsistent with observed content requests data, we use the quantile-quantile plot.The quantile-quantile plot in Fig. 4.9 is evaluated using the confidence interval ofF̂Y (y f |x f ). As seen from Fig. 4.9, the empirical cumulative distribution F̂Y (y f |x f )is in excellent agreement with the observed data. Therefore, F̂Y (y f |x f ) provides areasonable approximation for the actual content request distribution FY (y f |x f ).The above results indicate that the conformal prediction algorithm presentedin Sec.4.4, and illustrated in Fig. 4.4, can be used to estimate the cumulative dis-tribution function FY (y f |x f ) of YouTube content requests. The estimated cumu-lative distribution function F̂Y (y f |x f ) can then be used in the risk-averse cachingschemes (RA and RANA) to optimally cache content in the smallcell network.Additionally, the trained ELMs can be used for point predictions of the contentrequests for the risk-neutral caching schemes (RN and RNNA).4.5.3 Selection of the Confidence Level α for MaximumContent Retrieval Delay GuaranteesThe risk-averse caching schemes (RA and RANA) account for the uncertainty ofthe predicted content requests using the CVaR risk measure for a given confidencelevel α . From (4.7), for a confidence level α , CVaR is the expected content re-trieval delay given that the delay is greater than or equal to the VaR at α . Theselection of the parameter α ∈ [0,1] is determined by the network operator. Inthis section we evaluate the cumulative distribution function FD(d) to evaluatethe probability that the delay D is less than the threshold d using the RNNA andRANA caching schemes for a given confidence level α . Given FD(d), the net-work operator can select the confidence level α to guarantee that the delay doesnot exceed the given threshold dth.Fig. 4.10 provides the cumulative distribution function FD(d) for the con-tent retrieval delay for confidence levels α = 0.9 and α = 0.99 using the RANAcaching scheme, and the RNNA caching scheme. From Fig. 4.10, the RANAcaching scheme (4.12) provides better performance compared to the RNNA caching118Figure 4.7: Group association probability P(g|x f ). Gray dots indicate theprobability of association with group g= 1, black dots with g= 2, andlight-gray dots with g = 3. Performance of the conformal predictionalgorithm that was schematically illustrated in Fig. 4.4, for YouTubecontent requests. Group g = 1 is associated with all videos that arepredicted to have less than 100 requests, g = 2 with requests in therange of 100 to 30,000, and group g = 3 with more then 30,000 re-quests. The 90% confidence interval is evaluated using the empiricalcumulative distribution function F̂Y f (y f |x f ) of content requests definedin (4.24). As seen, the computed group association, point content re-quests, and conformal predictions are in excellent agreement with theYouTube datasets. The training and evaluation datasets are discussedin Sec.4.5.1.scheme (4.4) for all delay threshold values d. This results as the RANA cachingscheme accounts for the uncertainty associated with estimating the content re-quests. For example, if we are interested in the probability the delay is less thanthe threshold d = 50 seconds, the RNNA scheme is approximately 50%, while the119Figure 4.8: Conformal prediction of the number of content requests. Theexpected number of requests is the solid black line, gray region is the90% interval where the requests are expected to reside, and the blackdots are the real number of requests in D .RANA schemes are approximately 98%. A substantial improvement in perfor-mance is obtained by accounting for the prediction error in the caching decisions.The associated delay of between the RANA caching schemes with α = 0.99 andα = 0.9 are approximately equal except in the delay range of 46 seconds to 49seconds. In this region the selection of α is important. For example, if d = 48seconds, then the probability the delay is 50% for RANA with α = 0.99, and 60%RANA with α = 0.9. Therefore, if the network operator wants to minimize theprobability of the delay exceeding the threshold dth = 48 seconds, an α = 0.9should be selected. The reason that a larger value of α does not guarantee min-imizing FD(d), for a specific d, is that as α → 1, the maximum delay is beingminimized in (4.12).120Figure 4.9: Quantile-quantile plot for the empirical cumulative distributionfunction F̂Y (y f |x f ) and the YouTube content requests. The linear blackline indicated perfect agreement between the data and distribution, andthe grey dots indicate the quantiles computed from the YouTube data.4.5.4 Performance of the Risk-Neutral and Risk-AwareCaching SchemesIn this section, we illustrate the performance of the four caching schemes (RN andRNNA) in Secs.4.3.1, 4.3.2, and (RA and RANA) in Secs.4.3.4, 4.3.5. Addition-ally, we compare the performance of these four caching schemes with the cachingscheme presented in [88]. The caching scheme in [88] accounts for the contentrequests, and network parameters (cache size, bandwidth, latency among nodes),however it does not account for the routing protocol used in the network or the un-certainty associated with the estimated content requests. We refer to the cachingscheme in [88] as the risk-neutral without routing (RNWR) caching scheme.The performance metric we use to compare these five caching schemes is the121Content retrieval delay d per MB file (second)42 44 46 48 50 52 54 56CumulativeDistributionFunctionFD(d) α = 0.9RANA α = 0.99RNNAFigure 4.10: Cumulative distribution function of the content retrieval de-lay FD(d) using the RNNA and RANA caching schemes for aconfidence level α = 0.9,0.99. The circles denote the RNNAcaching scheme (4.4), diamond shape indicate the RANA cachingscheme (4.12) with α = 0.9, and squares for the RANA cachingscheme with α = 0.99. As the value of FD(d) = P(D≤ d) increases,the probability the delay exceeds d decreases.cumulative distribution function FD(d) for the content retrieval delay. To esti-mate FD(d), we set K = 10,000 and α = 0.9, and generate 20,000 samples ofthe total content retrieval delay D. Fig. 4.11 illustrates empirical FD(d) fromthe five caching schemes. The results in Fig. 4.11 illustrate that the RA cachingscheme (4.11) has a lower delay than the RN caching scheme (4.3) for all possiblevalues of d. That is, the delay that results from the RN caching scheme illustratesa first-order stochastic dominance compared with the delay that results from the122RA caching scheme. The delay of the RN and RA schemes are approximatelytwice as large as compared with the RNWR scheme which accounts for the net-work parameters but not the routing protocol used in the network. Therefore,including network parameters can substantially reduce the delay that results whenperforming a caching decision. Comparing the results for the RNWR, RNNA, andRANA, both the RNNA and RANA caching schemes significantly reduce the con-tent retrieval delay in the network compared with the RNWR caching scheme byapproximately 25%. This results as the RNNA and RANA schemes both reducethe congestion of transferring content throughout the network as they account forthe network routing protocol used. Finally, the RANA scheme provides the low-est content retrieval delay compared with the other four schemes as it accounts forthe uncertainty of content requests, network parameters, and the routing protocolused to transfer content throughout the network.4.6 Chapter SummaryIn this work we designed risk-neutral and risk-averse caching schemes for het-erogeneous networks that contain smallcell access points and base stations whichhave limited storage capacity and low bandwidth backhaul links. The risk-aversecaching schemes employed the coherent Conditional Value-at-Risk (CVaR) mea-sure to account for the uncertainty of estimating the content requests to performthe caching decisions. The CVaR risk measure is evaluated using informationfrom the conformal prediction algorithm which constructs the cumulative dis-tribution function of the content requests based on the content features. Usingreal-world datasets from YouTube and the NS-3 simulator, we demonstrate howthe caching schemes reduce the delay of retrieving content in heterogeneous net-works compared with industry standard caching schemes. The results show that a6% reduction in the average delay can be achieved if the uncertainty of the contentrequests is accounted for, and a 25% reduction in average delay is achieved if boththe uncertainty and network routing protocol are accounted for compared to therisk-neutral caching that neglects the routing protocol.123Content retrieval delay d per MB file (second)40 50 60 70 80 90 100 110 120 130 140CumulativeDistributionFunctionFD(d) 4.11: The cumulative distribution function FD(d) of the content re-trieval delay for the RN, RNNA, RA, RANA, RNWR cachingschemes. To evaluate FD(d) for the RA and RANA caching schemes,we set K = 10,000 and α = 0.9 in (4.11) and (4.12). A total of 20,000samples are generated for the content retrieval delay to construct theempirical cdf FD(d). The results illustrate that the delay of the RN,RNNA, RA, and RNWR schemes all first-order stochastically domi-nate the delay associated with the RANA caching scheme.124Chapter 5Conclusions and Future ResearchDirectionsIn this chapter, we summarize the main contributions of the thesis and describefuture research directions.5.1 ConclusionsThe growing popularity of cloud computing offers mobile users to enjoy a widevariety of applications. However, the traditional centralized cloud computing ar-chitecture faces several challenges. Offloading tasks to a distant remote cloudincur data transferring delay which is not suitable for time-critical applicationswhile sending data associated with tasks migration to a remote cloud increasesthe traffic volume in the network. On the other hand, the gaining popularity ofvideo streaming via mobile devices surges traffic volume in the network. To ad-dress these challenges new architecture, namely, mobile edge cloud has gainedattention from both industry and academia. The main idea of mobile edge cloudis to integrate computation and storage at the edge of the network, serving a por-tion of the users’ requests locally, resulting in a reduction in data traffic volumeand data transferring delay. The unified theme of the thesis was to design mecha-125nisms for mobile edge cloud to maximally utilize edge network resources. In thisthesis, we made three major contributions and a summary of these contributionsis as follows.• In the first part of the thesis, we designed a distributed resource sharingmechanism for mobile edge cloud where edge nodes such as FAPs sharetheir computational resources with the neighbours FAPs and form localclouds to reduce the number of offloading tasks to a remote cloud. As such,the resulting local femto-clouds reduce overall latency associated with tasksmigration to a remote cloud, and hence, improve users’ QoE. To motivateFAP owners to share their resources for performing tasks in femto-clouds, amonetary incentive mechanism was introduced. To this end, we proposed adistributed femto-clouds formation mechanism and formulated the problemas an optimization problem with the objective to maximize overall utili-ties such as higher monetary incentives and lower overall network latencywhile ensuring a fair division of incentives among individual FAPs withinthe femto-clouds. In particular, the problem was formulated as a coalitionformation game with modified core and the proposed algorithm reached tothe optimal solution in a distributed fashion. The numerical results verifiedthe applicability of the proposed femto-clouds formation mechanism in awide range of scenarios e.g., hotspot, residential, and enterprise environ-ment of the femtocell network.• In the second part of the thesis, we constructed a caching scheme to max-imally utilize storage resources at the edge of the network. The presentedcaching scheme reduces traffic volume in the network by serving users’ re-quests locally and also reduces content downloading delay. To this end,we constructed a caching scheme that takes into account users’ behaviourand operating characteristics of the network. In particular, the presentedcaching scheme estimated content popularity from the content features andrequests statistics of users as they were available. Then a mixed integer126linear program was formulated that took into account content popularity,and network parameters such as network topology, a communication link toselect where to place content in the network. Numerical evaluations usingYouTube videos demonstrated the efficacy of the proposed caching scheme.• The caching scheme presented in the second part of the thesis did not in-clude content retrieval path in the caching decision. As a result, a separaterouting mechanism was employed to retrieve content from the neighbournodes. In the third part of the thesis, we designed a caching scheme thatnot only considered content popularity and network parameters but also ac-counted for the routing mechanism in the caching decision. As such, thepresented caching scheme reduced content downloading and traffic volumein the network while balancing the traffic load over communication links us-ing the routing mechanism. To this end, we constructed a caching schemewhich is referred to the risk-neutral caching scheme in this thesis. The risk-neutral caching scheme required to solve a unimodular linear program. Inorder to incorporate uncertainty associated with the error in the content pop-ularity prediction, we presented a risk-averse caching scheme where uncer-tainty was modelled using conditional value at risk measure. The numericalevaluation demonstrated that an improved network performance can be ob-tained by including routing mechanism in the caching decision. Numericalresults also demonstrated the benefit of the risk-averse caching scheme overthe risk-neutral caching scheme.5.2 Future Research ProblemsIn this section, we describe three future research problems in mobile edge cloud.Precisely, the first problem is related to the mobile edge cloud computing. Themain target of this problem is to design frame allocation mechanism for edgecloud assisted mobile game to improve mobile users’ gaming experience. Theaim of the second problem is to design caching method where individual wire-127less node learns content popularity locally by observing content request statistics.The third problem combines both computations and caching at the edge of thenetwork. The main target of this problem is to design a caching strategy thatstores content with appropriate bit rate version. In this problem, edge comput-ing resources are employed to perform computations related to transcoding of acontent. A schematic of the future research problems is depicted in Fig.5.1.Mobile Edge CloudComputation Scheduling CachingEdge Cloud assistedMobile GameJoint Computation and Cachingfor Adaptive Bit Rate Video StreamingCollaborative Learningfor Edge CachingFigure 5.1: A schematic view of the future research challenges.5.2.1 Frame Allocation Mechanism for Edge Cloud AssistedMobile GamingWith the proliferation of smart phones, mobile gaming is gaining popularity amongyoung generation. Although smart phones offer a wide variety of games, they aremostly single player games due to the resource constraints of the mobile devicessuch as battery life and computational power. Mobile cloud gaming has beenproposed to support complicated multi-player games on mobile devices withoutaugmenting resources. The idea is to outsource computational burdens to a cloudserver. In mobile cloud gaming, the mobile device behaves like a thin client andworks as a platform to send gaming control inputs to the cloud server. The cloudserver processes the gaming scenarios and then streams back video frames to themobile device. Players are interacting each other in the cloud via a wireless net-128work. Mobile cloud gaming enables users to play any game using their mobiledevices as long as the device has the ability to support video. Although mobilecloud gaming prolongs the battery life of the mobile device, it also introduceslatency which affects players’ QoE. Edge cloud assisted mobile gaming is onepossible solution to overcome latency issue in mobile cloud gaming [152, 153].Though edge cloud assisted mobile gaming takes care of the latency issue, italso introduces several challenges. On the one hand, edge cloud nodes are pos-sessing a finite amount of resources and a number of users are sharing these re-sources simultaneously. Therefore, the availability of the computational resourceshas an impact on the processing delay. On the other hand, to make a game inter-active and to avoid jerky video, a minimum number of video frames needs tobe received by the mobile device in a second. However, the time-varying natureof the wireless channels limits the number of packet delivery through wirelesschannels. In addition, users’ prefer high-quality video frames which consist of ahigher number of packets and hence requires longer transmission and processingtime. Studies reveal that video quality and response delay affects mobile gaminguser experience and users may quit the game [154].Motivated by the aforementioned facts, designing a frame allocation mecha-nism for edge cloud assisted mobile gaming is important where the main objectiveof the frame allocation mechanism is to maximize mobile gaming user experiencewhile satisfying all the constraints: the target number of frames in a second, pro-cessing delay of the video frames, edge cloud nodes computational workload, andwireless channels’ supportable packet rate. One possible way to solve the problemis to formulate the problem as a finite horizon Markov decision problem where theobjective is to maximize mobile gaming user experience over the horizon whilesatisfying all the constraints. To overcome the complexity issue in Markov deci-sion process, designing sub-optimal solutions are also important for this problem.1295.2.2 Collaborative Learning for Edge CachingThe caching methods presented in this thesis first estimate content popularity fromtheir features using learning algorithms. Having computed the content popularity,the caching methods employ optimization techniques to decide the content to becached in the edge nodes. As such, the presented caching methods involve twosteps for caching a content. As already mentioned, there are several advantages ofthe presented caching methods such as:i) the content popularity estimation technique enables to cache popular contentwithout observing requests for the content. Thereby, content downloading delayis reduced even if the content has recently been uploaded;ii) since content popularities are already available from the learning technique,the content caching can be performed when the network load is minimal; andiii) the estimation techniques and optimization techniques are not tethered.As a result, different estimation techniques can be employed for the presentedoptimization techniques and vice versa.Although presented caching methods have several advantages, they incur cacheinitialization cost that is content transferring cost at the beginning of caching. Theentire cache needs to be updated with the recent popular ones when content pop-ularity changes. As a result, when content popularity changes rapidly, learningcontent popularity and cache content simultaneously, provides more benefit thanthe presented caching methods that requires cache initialization.Few research works consider learning and caching content simultaneously forthe edge network [67],[68] where the caching methods involve solving a multi-armed bandit problem. In particular, at the beginning, content are cached ran-domly and edge node records requests for each of the cached content. After aninterval, edge node evicts content that receive a lower number of requests thana target threshold. Then the cache is filled with some new content and previousinterval popular content. This procedure continues and asymptotically the cachewill be filled with the popular content [67]. Nonetheless, the presented cachingmethods require keeping track of the request statistics for each of the content.130To overcome such limitation, contextual multi-armed bandit caching methods areproposed where content are clustered based on context and the eviction decisionsare made based on the cluster the content belongs to[68]. Although the presentedcaching methods improve learning speed, the caching methods still require a cer-tain number of content requests to identify popular clusters.Collaborative learning can provide a significant improvement in learning speedwhen edge nodes are densely populated such as in today heterogeneous hotspotareas. According to the collaborative learning, edge nodes are grouped togetheraccording to their content popularity pattern. As such, edge nodes that receivesignificantly different content requests reside in separate groups. The main ad-vantage of collaborative learning compared to the methods presented in [68] isthat collaborative learning not only exploits hidden information of the context butalso shares the gathered information with the neighbour edge nodes, as such edgenodes with same popularity pattern learn content popularity as a single entity. Theresulting procedure thus takes advantage of content popularity patterns in the datain a way akin to collaborative filtering methods[155],[156].5.2.3 Joint Computation and Caching for Adaptive Bit RateVideo StreamingThe caching methods exist in the literature do not account for different bit rate ver-sion of the content. However, in reality, a content is requested from heterogeneousplatforms such as a laptop, smart phone, tablet and a transcoding mechanism isrequired to adapt the bit rate version of the content according to the platform. Inaddition, the time-varying nature of the wireless channels advocates the necessityfor considering different bit rate version of the content in the caching decision.One way to handle this issue is to deploy some computational resources atthe edge caching node and perform computations associated with the transcoding.The main benefit of such solution is that the edge node can cache diverse contentas such only one copy of a content is cached at the edge node. However, thissolution may not perform well when some popular content contribute to a large131portion of the content requests from the heterogeneous platform. In such case,transcoding for each and every requests not only costs a significant amount ofcomputational resources but also introduces latency, resulting degradation in theusers’ QoE. Therefore, we need a joint computational and caching methods thatnot only prescribe which content to cache but also recommend which version(s)of the content to cache so that computational and caching resources are maximallyutilized. Authors of [157] have recently considered this issue which needs furtherattention from the research community.132Bibliography[1] K. Dolui and S. K. Datta, “Comparison of edge computingimplementations: Fog computing, cloudlet and mobile edge computing,”in Proceedings of the IEEE Global Internet of Things Summit, pp. 1–6,2017. → pages xii, 5[2] N. C. Nguyen, P. Wang, D. Niyato, Y. Wen, and Z. Han, “Resourcemanagement in cloud networking using economic analysis and pricingmodels: a survey,” IEEE Communications Surveys & Tutorials, 2017. →pages 3[3] N. Fernando, S. W. Loke, and W. Rahayu, “Mobile cloud computing: Asurvey,” Future Generation Computer Systems, vol. 29, no. 1, pp. 84–106,2013. → pages 3[4] G. Aceto, A. Botta, W. De Donato, and A. Pescape`, “Cloud monitoring: Asurvey,” Computer Networks, vol. 57, no. 9, pp. 2093–2115, 2013. →pages 3[5] H. T. Dinh, C. Lee, D. Niyato, and P. Wang, “A survey of mobile cloudcomputing: Architecture, applications, and approaches,” WirelessCommunications and Mobile Computing, vol. 13, no. 18, pp. 1587–1611,2013. → pages 3, 12[6] M. R. Rahimi, J. Ren, C. H. Liu, A. V. Vasilakos, andN. Venkatasubramanian, “Mobile cloud computing: A survey, state of artand future directions,” Mobile Networks and Applications, vol. 19, no. 2,pp. 133–143, 2014. → pages 3[7] M. V. Barbera, S. Kosta, A. Mei, and J. Stefa, “To offload or not tooffload? the bandwidth and energy costs of mobile cloud computing,” inProceedings of the IEEE INFOCOM, pp. 1285–1293, 2013. → pages 3, 4133[8] P. Mach and Z. Becvar, “Mobile edge computing: A survey onarchitecture and computation offloading,” IEEE Communications Surveys& Tutorials, 2017. → pages 3[9] Y. C. Hu, M. Patel, D. Sabella, N. Sprecher, and V. Young, “Mobile edgecomputing—a key technology towards 5g,” ETSI White Paper, vol. 11,no. 11, pp. 1–16, 2015. → pages 4[10] S. Yi, C. Li, and Q. Li, “A survey of fog computing: concepts, applicationsand issues,” in Proceedings of the ACM Workshop on Mobile Big Data,pp. 37–42, 2015. → pages 4[11] M. Satyanarayanan, P. Bahl, R. Caceres, and N. Davies, “The case forVM-based cloudlets in mobile computing,” vol. 8, no. 4, pp. 14–23, 2009.→ pages 4, 12, 13[12] E. Cuervo, A. Balasubramanian, D.-K. Cho, A. Wolman, S. Saroiu,R. Chandra, and P. Bahl, “MAUI: making smartphones last longer withcode offload,” in Proceedings of the International Conference on MobileSystems, Applications, and Services, (San Francisco, CA), pp. 49–62,2010. → pages 4, 12, 27[13] S. Barbarossa, S. Sardellitti, and P. Di Lorenzo, “Joint allocation ofcomputation and communication resources in multiuser mobile cloudcomputing,” in Proceedings of the IEEE Workshop on Signal ProcessingAdvances in Wireless Communications, (Darmstadt, Germany),pp. 26–30, 2013. → pages 5, 16[14] J. Zhang, W. Xie, F. Yang, and Q. Bi, “Mobile edge computing and fieldtrial results for 5g low latency scenario,” China Communications, vol. 13,no. Supplement2, pp. 174–182, 2016. → pages 7[15] J. Dolezal, Z. Becvar, and T. Zeman, “Performance evaluation ofcomputation offloading from mobile device to the edge of mobilenetwork,” in Proceedings of the IEEE Standards for Communications andNetworking, pp. 1–7, 2016. → pages 7[16] Y. Gao, W. Hu, K. Ha, B. Amos, P. Pillai, and M. Satyanarayanan, “Arecloudlets necessary?,” School Comput. Sci., Carnegie Mellon Univ.,Pittsburgh, PA, USA, Tech. Rep. CMU-CS-15-139, 2015. → pages 8134[17] S. H. Chae and W. Choi, “Caching placement in stochastic wirelesscaching helper networks: Channel selection diversity via caching,” IEEETransactions on Wireless Communications, vol. 15, no. 10,pp. 6626–6637, 2016. → pages 8[18] C. V. N. Index, “Global mobile data traffic forecast update, 2015–2020,”2016. → pages 8[19] S. Dernbach, N. Taft, J. Kurose, U. Weinsberg, C. Diot, and A. Ashkan,“Cache content-selection policies for streaming video services,” inProceedings of the IEEE INFOCOM, pp. 1–9, 2016. → pages 8[20] X. Wang, M. Chen, T. Taleb, A. Ksentini, and V. C. Leung, “Cache in theair: exploiting content caching and delivery techniques for 5g systems,”IEEE Communications Magazine, vol. 52, no. 2, pp. 131–139, 2014. →pages 9[21] M. Gregori, J. Go´mez-Vilardebo´, J. Matamoros, and D. Gu¨ndu¨z,“Wireless content caching for small cell and d2d networks,” IEEE Journalon Selected Areas in Communications, vol. 34, no. 5, pp. 1222–1234,2016. → pages 9[22] N. Zhao, X. Liu, F. R. Yu, M. Li, and V. C. Leung, “Communications,caching, and computing oriented small cell networks with interferencealignment,” IEEE Communications Magazine, vol. 54, no. 9, pp. 29–35,2016. → pages 9[23] C. V. N. Index, “Global mobile data traffic forecast update, 2015-2020,”Cisco white paper, 2016. → pages 9[24] C. Yang, Y. Yao, Z. Chen, and B. Xia, “Analysis on cache-enabledwireless heterogeneous networks,” IEEE Transactions on WirelessCommunications, vol. 15, no. 1, pp. 131–145, 2016. → pages 9[25] H. Ahlehagh and S. Dey, “Video-aware scheduling and caching in theradio access network,” IEEE/ACM Transactions on Networking (TON),vol. 22, no. 5, pp. 1444–1462, 2014. → pages 10, 17[26] K. Shanmugam, N. Golrezaei, A. G. Dimakis, A. F. Molisch, andG. Caire, “Femtocaching: Wireless content delivery through distributed135caching helpers,” IEEE Transactions on Information Theory, vol. 59,no. 12, pp. 8402–8413, 2013. → pages 10, 17, 19, 78, 93[27] N. Golrezaei, A. F. Molisch, A. G. Dimakis, and G. Caire, “Femtocachingand device-to-device collaboration: A new architecture for wireless videodistribution,” IEEE Communications Magazine, vol. 51, no. 4,pp. 142–149, 2013. → pages 10[28] M. Xie, I. Widjaja, and H. Wang, “Enhancing cache robustness forcontent-centric networking,” in Proceedings of the IEEE INFOCOM,pp. 2426–2434, 2012. → pages 11[29] R. Wang, X. Peng, J. Zhang, and K. B. Letaief, “Mobility-aware cachingfor content-centric wireless networks: Modeling and methodology,” IEEECommunications Magazine, vol. 54, no. 8, pp. 77–83, 2016. → pages 11[30] E. Bastug, M. Bennis, and M. Debbah, “Living on the edge: The role ofproactive caching in 5g wireless networks,” IEEE CommunicationsMagazine, vol. 52, no. 8, pp. 82–89, 2014. → pages 11, 93[31] S. Agarwal, M. Philipose, and P. Bahl, “Vision: The case for cellular smallcells for cloudlets,” in Proceedings of the International Workshop onMobile Cloud Computing & Services, (Bretton Woods, NH), pp. 1–5,2014. → pages 12[32] P. Bahl, R. Y. Han, E. E. Li, and M. Satyanarayanan, “Advancing the stateof mobile cloud computing,” in Proceedings of the ACM Workshop onMobile Cloud Computing & Services, (Ambleside, UK), pp. 21–28, 2012.→ pages 13[33] S. Barbarossa, P. Di Lorenzo, and S. Sardellitti, “Computation offloadingstrategies based on energy minimization under computational rateconstraints,” in Proceedings of the European Conference on Networks andCommunications, (Bologna, Italy), pp. 1–5, 2014. → pages 16[34] F. L. Vilela, A. J. Ferrer, M. A. Puente, Z. Becvar, M. Rohlik, T. Vanek,P. Mach, O. M. Medina, J. V. Manzano, H. Hariyanto, et al.,“TROPIC-D22 design of network architecture for femto-cloudcomputing,” 2013. → pages 13, 26136[35] T. Arnold and U. Schwalbe, “Dynamic coalition formation and the core,”Journal of Economic Behavior & Organization, vol. 49, no. 3,pp. 363–380, 2002. → pages 14, 34, 36, 37[36] R. Kaewpuang, D. Niyato, P. Wang, and E. Hossain, “A framework forcooperative resource management in mobile cloud computing,” IEEEJournal on Selected Areas in Communications, vol. 31, no. 12,pp. 2685–2700, 2013. → pages 15[37] M. Guazzone, C. Anglano, and M. Sereno, “A game-theoretic approach tocoalition formation in green cloud federations,” in Proceedings of theIEEE International Symposium on Cluster, Cloud and Grid Computing,(Chicago, IL), pp. 618–625, 2014. → pages 15[38] C. A. Lee, “Cloud federation management and beyond: Requirements,relevant standards, and gaps,” IEEE Cloud Computing, vol. 3, no. 1,pp. 42–49, 2016. → pages 15[39] T. Truong-Huu and C.-K. Tham, “A novel model for competition andcooperation among cloud providers,” IEEE Transactions on CloudComputing, vol. 2, no. 3, pp. 251–265, 2014. → pages 15[40] L. Mashayekhy, M. M. Nejad, and D. Grosu, “Cloud federations in thesky: Formation game and mechanism,” IEEE Transactions on CloudComputing, vol. 3, no. 1, pp. 14–27, 2015. → pages 15[41] F. Pantisano, M. Bennis, W. Saad, and M. Debbah, “Spectrum leasing asan incentive towards uplink macrocell and femtocell cooperation,” IEEEJournal on Selected Areas in Communications, vol. 30, no. 3,pp. 617–630, 2012. → pages 15[42] O. N. Gharehshiran, A. Attar, and V. Krishnamurthy, “Collaborativesub-channel allocation in cognitive lte femto-cells: a cooperativegame-theoretic approach,” IEEE Transactions on Communications,vol. 61, no. 1, pp. 325–334, 2013. → pages 15[43] Z. Zhang, L. Song, Z. Han, and W. Saad, “Coalitional games withoverlapping coalitions for interference management in small cellnetworks,” IEEE Transactions on Wireless Communications, vol. 13,no. 5, pp. 2659–2669, 2014. → pages 15137[44] F. Pantisano, M. Bennis, W. Saad, M. Debbah, and M. Latva-Aho,“Interference alignment for cooperative femtocell networks: Agame-theoretic approach,” IEEE Transactions on Mobile Computing,vol. 12, no. 11, pp. 2233–2246, 2013. → pages 15[45] R. Langar, S. Secci, R. Boutaba, and G. Pujolle, “An operations researchgame approach for resource and power allocation in cooperative femtocellnetworks,” IEEE Transactions on Mobile Computing, vol. 14, no. 4,pp. 675–687, 2015. → pages 15[46] S.-Y. Yun, Y. Yi, D.-H. Cho, and J. Mo, “The economic effects of sharingfemtocells,” IEEE journal on selected areas in Communications, vol. 30,no. 3, pp. 595–606, 2012. → pages 15[47] L. Gao, G. Iosifidis, J. Huang, and L. Tassiulas, “Economics of mobiledata offloading,” in Proceedings of the IEEE INFOCOM Workshops,(Turin, Italy), pp. 351–356, 2013. → pages 15[48] Y. Chen, J. Zhang, and Q. Zhang, “Utility-aware refunding framework forhybrid access femtocell network,” IEEE Transactions on WirelessCommunications, vol. 11, no. 5, pp. 1688–1697, 2012. → pages 15[49] S. Hua, X. Zhuo, and S. S. Panwar, “A truthful auction based incentiveframework for femtocell access,” in Proceedings of the IEEE WCNC,(Shanghai, China), pp. 2271–2276, 2013. → pages 15[50] L. Duan, J. Huang, and B. Shou, “Economics of femtocell serviceprovision,” IEEE Transactions on Mobile Computing, vol. 12, no. 11,pp. 2261–2273, 2013. → pages 15[51] N. Shetty, S. Parekh, and J. Walrand, “Economics of femtocells,” inProceedings of the IEEE GLOBECOM, (Honolulu, HI), pp. 1–6, 2009. →pages 15[52] Y. Zhang and M. van der Schaar, “Peer-to-peer multimedia sharing basedon social norms,” Signal Processing: Image Communication, vol. 27,no. 5, pp. 383–400, 2012. → pages 15[53] M. Jakobsson, J.-P. Hubaux, and L. Buttya´n, “A micro-payment schemeencouraging collaboration in multi-hop cellular networks,” in Financial138Cryptography (R. N. Wright, ed.), vol. 2742 of Lecture Notes in ComputerScience, pp. 15–33, 2003. → pages 15[54] B.-G. Chun, K. Chaudhuri, H. Wee, M. Barreno, C. H. Papadimitriou, andJ. Kubiatowicz, “Selfish caching in distributed systems: A game-theoreticanalysis,” in Proceedings of the Annual ACM Symposium on Principles ofDistributed Computing, (Elche, Spain), pp. 21–30, 2004. → pages 15[55] J. Camenisch, S. Hohenberger, and A. Lysyanskaya, “Compact e-cash,” inAdvances in Cryptology (R. Cramer, ed.), vol. 3494 of Lecture Notes inComputer Science, pp. 302–321, 2005. → pages 15[56] L. Buttya´n and J.-P. Hubaux, “Nuglets: a virtual currency to stimulatecooperation in self-organized mobile ad hoc networks,” tech. rep., 2001.→ pages 15[57] O. Munoz, A. Pascual-Iserte, and J. Vidal, “Optimization of radio andcomputational resources for energy efficiency in latency-constrainedapplication offloading,” IEEE Transactions on Vehicular Technology,vol. 64, no. 10, pp. 4738–4755, 2015. → pages 16[58] M. Ali, Q. Rabbani, M. Naeem, S. Qaisar, and F. Qamar, “Joint userassociation, power allocation, and throughput maximization in 5g h-crannetworks,” IEEE Transactions on Vehicular Technology, vol. 66, no. 10,pp. 9254–9262, 2017. → pages 16[59] J. Oueis, E. C. Strinati, and S. Barbarossa, “Small cell clustering forefficient distributed cloud computing,” in Proceedings of the IEEEPIMRC, (Washington, DC), pp. 1474–1479, 2014. → pages 16[60] J. Oueis, E. C. Strinati, and S. Barbarossa, “The fog balancing: Loaddistribution for small cell cloud computing,” in Proc. of IEEE VTC,vol. Spring, (Glasgow, Scotland), pp. 1–6, 2015. → pages 16[61] S. Andreev, O. Galinina, A. Pyattaev, J. Hosek, P. Masek,H. Yanikomeroglu, and Y. Koucheryavy, “Exploring synergy betweencommunications, caching, and computing in 5g-grade deployments,” IEEECommunications Magazine, vol. 54, no. 8, pp. 60–69, 2016. → pages 17[62] K. Poularakis, G. Iosifidis, V. Sourlas, and L. Tassiulas, “Exploitingcaching and multicast for 5g wireless networks,” IEEE Transactions on139Wireless Communications, vol. 15, no. 4, pp. 2995–3007, 2016. → pages17[63] K. Poularakis, G. Iosifidis, and L. Tassiulas, “Approximation algorithmsfor mobile data caching in small cell networks,” IEEE Transactions onCommunications, vol. 62, no. 10, pp. 3665–3677, 2014. → pages 17[64] W. Jiang, G. Feng, and S. Qin, “Optimal cooperative content caching anddelivery policy for heterogeneous cellular networks,” IEEE Transactionson Mobile Computing, vol. 16, no. 5, pp. 1382–1393, 2017. → pages 17[65] B. Bharath, K. Nagananda, and H. V. Poor, “A learning-based approach tocaching in heterogenous small cell networks,” IEEE Transactions onCommunications, vol. 64, no. 4, pp. 1674–1686, 2016. → pages 17[66] S. Mu¨ller, O. Atan, M. van der Schaar, and A. Klein, “Context-awareproactive content caching with service differentiation in wirelessnetworks,” IEEE Transactions on Wireless Communications, vol. 16,no. 2, pp. 1024–1036, 2017. → pages 17, 87, 93[67] P. Blasco and D. Gunduz, “Learning-based optimization of cache contentin a small cell base station,” in Proceedings of IEEE InternationalConference on Communications (ICC), pp. 1897–1903, 2014. → pages17, 60, 130[68] A. Sengupta, S. Amuru, R. Tandon, R. M. Buehrer, and T. C. Clancy,“Learning distributed caching strategies in small cell networks,” inProceedings of the International Symposium on Wireless CommunicationsSystems, pp. 917–921, 2014. → pages 17, 130, 131[69] E. Bas¸tug˘, M. Bennis, E. Zeydan, M. A. Kader, I. A. Karatepe, A. S. Er,and M. Debbah, “Big data meets telcos: A proactive caching perspective,”Journal of Communications and Networks, vol. 17, no. 6, pp. 549–557,2015. → pages 17[70] E. Bastug, M. Bennis, and M. Debbah, “Living on the edge: The role ofproactive caching in 5g wireless networks,” IEEE CommunicationsMagazine, vol. 52, no. 8, pp. 82–89, 2014. → pages 17140[71] E. Bas¸tug˘, M. Bennis, and M. Debbah, “Proactive caching in 5g small cellnetworks,” Towards 5G: Applications, Requirements and CandidateTechnologies, pp. 78–98, 2016. → pages 17, 19, 78[72] E. Bastug, M. Bennis, and M. Debbah, “A transfer learning approach forcache-enabled wireless networks,” in Proceedings of the 13th InternationalSymposium on Modeling and Optimization in Mobile, Ad Hoc, andWireless Networks (WiOpt), pp. 161–166, 2015. → pages 17[73] E. Bastug, M. Bennis, and M. Debbah, “Anticipatory caching in small cellnetworks: A transfer learning approach,” in Proceedings of the 1st KuVSWorkshop on Anticipatory Networks, 2014. → pages 17[74] G. Huang, G. Huang, S. Song, and K. You, “Trends in extreme learningmachines: A review,” Neural Networks, vol. 61, pp. 32–48, 2015. →pages 18, 64[75] D. Applegate, A. Archer, V. Gopalakrishnan, S. Lee, andK. Ramakrishnan, “Optimal content placement for a large-scale vodsystem,” IEEE/ACM Transactions on Networking, vol. 24,pp. 2114–2127, 2016. → pages 19, 78, 112[76] H. Pinto, J. Almeida, and M. Gonc¸alves, “Using early view patterns topredict the popularity of YouTube videos,” in Proceedings of the 6th ACMInternational Conference on Web Search and Data Mining, pp. 365–374,2013. → pages 20[77] Z. Tan, Y. Wang, Y. Zhang, and J. Zhou, “A novel time series approach forpredicting the long-term popularity of online videos,” IEEE Transactionson Broadcasting, vol. 62, no. 2, pp. 436–445, 2016. → pages 20[78] J. Wu, Y. Zhou, M. Chiu, and Z. Zhu, “Modeling dynamics of online videopopularity,” IEEE Transactions on Multimedia, vol. 18, no. 9,pp. 1882–1895, 2016. → pages 20[79] C. Li, J. Liu, and S. Ouyang, “Characterizing and predicting the popularityof online videos,” IEEE Access, vol. 4, pp. 1630–1641, 2016. → pages 20[80] R. Zhou, S. Khemmarat, L. Gao, J. Wan, J. Zhang, Y. Yin, and J. Yu,“Boosting video popularity through keyword suggestion and141recommendation systems,” Neurocomputing, vol. 205, pp. 529–541, 2016.→ pages 20[81] W. Hoiles, A. Aprem, and V. Krishnamurthy, “Engagement and popularitydynamics of YouTube videos and sensitivity to meta-data,” IEEETransactions on Knowledge & Data Engineering, no. 7, pp. 1426–1437,2017. → pages 20[82] A. Tay and K. Wallis, “Density forecasting: a survey,” Journal offorecasting, vol. 19, no. 4, p. 235, 2000. → pages 20[83] D. Hamilton, Time series analysis, vol. 2. Princeton university press,1994. → pages 20[84] L. Fang and D. Bessler, “Stock returns and interest rates in china: theprequential approach,” Applied Economics, pp. 1–14, 2017. → pages 20[85] K. Shanmugam, N. Golrezaei, A. Dimakis, A. Molisch, and G. Caire,“Femtocaching: Wireless content delivery through distributed cachinghelpers,” IEEE Transactions on Information Theory, vol. 59, no. 12,pp. 8402–8413, 2013. → pages 20, 21[86] B. Bharath, K. Nagananda, and H. V. Poor, “A learning-based approach tocaching in heterogenous small cell networks,” IEEE Transactions onCommunications, vol. 64, no. 4, pp. 1674–1686, 2016. → pages 20, 21[87] J. Song, H. Song, and W. Choi, “Optimal content placement for wirelessfemto-caching network,” IEEE Transactions on WirelessCommunications, vol. 16, no. 7, pp. 4433–4444, 2017. → pages 20, 21[88] S. Tanzil, W. Hoiles, and V. Krishnamurthy, “Adaptive scheme for cachingyoutube content in a cellular network: Machine learning approach,” IEEEAccess, vol. 5, pp. 5870–5881, 2017. → pages 21, 121[89] J. He and W. Song, “Optimizing video request routing in mobile networkswith built-in content caching,” IEEE Transactions on Mobile Computing,vol. 15, no. 7, pp. 1714–1727, 2016. → pages 21[90] M. Dehghan, B. Jiang, A. Seetharam, T. He, T. Salonidis, J. Kurose,D. Towsley, and R. Sitaraman, “On the complexity of optimal requestrouting and content caching in heterogeneous cache networks,”142IEEE/ACM Transactions on Networking, vol. 25, no. 3, pp. 1635–1648,2016. → pages 21[91] K. Poularakis, G. Iosifidis, A. Argyriou, and L. Tassiulas, “Video deliveryover heterogeneous cellular networks: Optimizing cost and performance,”in Proceedings of the IEEE INFOCOM, pp. 1078–1086, 2014. → pages21[92] T. Rockafellar and S. Uryasev, “Optimization of conditional value-at-risk,”Journal of risk, vol. 2, pp. 21–42, 2000. → pages 22, 97, 98[93] T. Rockafellar and S. Uryasev, “Conditional value-at-risk for general lossdistributions,” Journal of banking & finance, vol. 26, no. 7,pp. 1443–1471, 2002. → pages 97[94] P. Krokhmal, J. Palmquist, and S. Uryasev, “Portfolio optimization withconditional value-at-risk objective and constraints,” Journal of risk, vol. 4,pp. 43–68, 2002. → pages 22[95] P. Resnick, R. Zeckhauser, J. Swanson, and K. Lockwood, “The value ofreputation on ebay: A controlled experiment,” Experimental Economics,vol. 9, no. 2, pp. 79–101, 2006. → pages 31[96] J. P. Kahan and A. Rapoport, Theories of Coalition Formation. New York,NY: Psychology Press, 2014. → pages 34[97] W. Saad, Z. Han, M. Debbah, A. Hjorungnes, and T. Basar, “Coalitionalgame theory for communication networks,” IEEE Signal ProcessingMagazine, vol. 26, no. 5, pp. 77–97, 2009. → pages 34[98] G. Owen, Game Theory. New York, NY: Academic Press, 1995. → pages34[99] O. N. Gharehshiran, Distributed Dynamic Coalition Formation forBearings-only Localization in Wireless Sensor Networks. PhD thesis,University of British Columbia (Vancouver), 2010. → pages 38[100] B.-Y. Su, Parallel Application Library for Object Recognition. PhD thesis,EECS Department, University of California, Berkeley, 2012. → pages 40143[101] M. Maire, P. Arbela´ez, C. Fowlkes, and J. Malik, “Using contours todetect and localize junctions in natural images,” in Proceedings of theIEEE CVPR, (Anchorage, AK), pp. 1–8, 2008. → pages 40[102] J. Li and J. Z. Wang, “Studying digital imagery of ancient paintings bymixtures of stochastic models,” IEEE Transactions on Image Processing,vol. 13, no. 3, pp. 340–353, 2004. → pages 40[103] K. Tan and S. Chen, “Adaptively weighted sub-pattern PCA for facerecognition,” Neurocomputing, vol. 64, pp. 505–511, 2005. → pages 40[104] N. Baldo, M. Miozzo, M. Requena-Esteso, and J. Nin-Guerrero, “An opensource product-oriented LTE network simulator based on ns-3,” inProceedings of the ACM MSWiM, (Miami Beach, FL), pp. 293–298,2011. → pages 41[105] N. Baldo, M. Requena-Esteso, M. Miozzo, and R. Kwan, “An open sourcemodel for the simulation of LTE handover scenarios and algorithms inns-3,” in Proceedings of the ACM MSWiM, (Barcelona, Spain),pp. 289–298, 2013. → pages 41[106] J. Zhang, X. Zhang, and W. Wang, “Cache-enabled software definedheterogeneous networks for green and flexible 5g networks,” IEEEAccess, vol. 4, pp. 3591–3604, 2016. → pages 54[107] J. Liu, Q. Yang, and G. Simon, “Optimal and practical algorithms forimplementing wireless cdn based on base stations,” in Proceedings of theIEEE Vehicular Technology Conference, pp. 1–5. → pages 54[108] D. Applegate, A. Archer, V. Gopalakrishnan, S. Lee, and K. K.Ramakrishnan, “Optimal content placement for a large-scale vod system,”in Proceedings of the International Conference on Emerging NetworkingExperiments and Technologies, p. 4, 2010. → pages 60[109] J. Till, S. Engell, S. Panek, and O. Stursberg, “Empirical complexityanalysis of a milp-approach for optimization of hybrid systems,” inProceedings of the Conference on Analysis and Design of HybridSystems, pp. 129–134, 2003. → pages 60144[110] K. Genova and V. Guliashki, “Linear integer programming methods andapproaches–a survey,” Journal of Cybernetics and InformationTechnologies, vol. 11, no. 1, 2011. → pages 60[111] C. Barnhart, E. L. Johnson, G. L. Nemhauser, M. W. Savelsbergh, andP. H. Vance, “Branch-and-price: Column generation for solving hugeinteger programs,” Operations research, vol. 46, no. 3, pp. 316–329, 1998.→ pages 60[112] R. H. Bartels and G. H. Golub, “The simplex method of linearprogramming using lu decomposition,” Communications of the ACM,vol. 12, no. 5, pp. 266–268, 1969. → pages 60[113] F. A. Potra and S. J. Wright, “Interior-point methods,” Journal ofComputational and Applied Mathematics, vol. 124, no. 1, pp. 281–302,2000. → pages 60[114] Q. Huang, K. Birman, R. van Renesse, W. Lloyd, S. Kumar, and H. C. Li,“An analysis of facebook photo caching,” in Proceedings of the ACMSymposium on Operating Systems Principles, pp. 167–181, 2013. →pages 61, 89, 91[115] M. Guzelsoy and T. K. Ralphs, “Duality for mixed-integer linearprograms,” International Journal of Operations Research, vol. 4, no. 3,pp. 118–137, 2007. → pages 63[116] J. N. Hooker, “Integer programming: lagrangian relaxation integerprogramming: Lagrangian relaxation,” in Encyclopedia of Optimization,pp. 1667–1673, Springer, 2008. → pages 63[117] Q. He, T. Shang, F. Zhuang, and Z. Shi, “Parallel extreme learningmachine for regression based on mapreduce,” Neurocomputing, vol. 102,pp. 52–58, 2013. → pages 63, 64, 106[118] A. Basu, S. Shuo, H. Zhou, M. Lim, and G. Huang, “Silicon spikingneurons for hardware implementation of extreme learning machines,”Neurocomputing, vol. 102, pp. 125–134, 2013. → pages 63, 64, 106[119] F. Rosenblatt, “The perceptron: a probabilistic model for informationstorage and organization in the brain,” Psychological review, vol. 65,no. 6, p. 386, 1958. → pages 64145[120] G. Huang and L. Chen, “Enhanced random search based incrementalextreme learning machine,” Neurocomputing, vol. 71, no. 16,pp. 3460–3468, 2008. → pages 64, 67, 106[121] W. Huang, Z. Tan, Z. Lin, G. Huang, J. Zhou, C. Chui, Y. Su, andS. Chang, “A semi-automatic approach to the segmentation of liverparenchyma from 3d ct images with extreme learning machine,” inProceedings of the Annual International Conference of the IEEEEngineering in Medicine and Biology Society, pp. 3752–3755, 2012. →pages 64[122] J. Zhao, Z. Wang, and D. Park, “Online sequential extreme learningmachine with forgetting mechanism,” Neurocomputing, vol. 87,pp. 79–89, 2012. → pages 64, 106[123] Y. Ye, S. Squartini, and F. Piazza, “Online sequential extreme learningmachine in nonstationary environments,” Neurocomputing, vol. 116,pp. 94–101, 2013. → pages 64, 106[124] G. Huang, Q. Zhu, and C. Siew, “Extreme learning machine: theory andapplications,” Neurocomputing, vol. 70, no. 1, pp. 489–501, 2006. →pages 64, 106[125] C. J. Hutto and E. Gilbert, “Vader: A parsimonious rule-based model forsentiment analysis of social media text,” in Proceedings of theInternational AAAI conference on weblogs and social media, 2014. →pages 67[126] Q. Yu, M. Heeswijk, Y. Miche, R. Nian, B. He, E. Se´verin, andA. Lendasse, “Ensemble delta test-extreme learning machine (dt-elm) forregression,” Neurocomputing, vol. 129, pp. 153–158, 2014. → pages 67[127] Y. Miche, A. Sorjamaa, P. Bas, O. Simula, C. Jutten, and A. Lendasse,“OP-ELM: optimally pruned extreme learning machine,” IEEETransactions on Neural Networks, vol. 21, no. 1, pp. 158–162, 2010. →pages 67[128] J. Spall, Introduction to stochastic search and optimization: Estimation,simulation, and control, vol. 65. John Wiley & Sons, 2005. → pages 68146[129] H. Liu and H. Motoda, Feature selection for knowledge discovery and datamining, vol. 454. Springer Science & Business Media, 2012. → pages 70[130] U. Stan´czyk and L. Jain, Feature Selection for Data and PatternRecognition. Springer, 2015. → pages 70[131] F. Benoit, M. Heeswijk, Y. Miche, M. Verleysen, and A. Lendasse,“Feature selection for nonlinear models with extreme learning machines,”Neurocomputing, vol. 102, pp. 111–124, 2013. → pages 70[132] N. Choi, K. Guan, D. C. Kilper, and G. Atkinson, “In-network cachingeffect on optimal energy consumption in content-centric networking,” inProceedings of the IEEE ICC, pp. 2889–2894, 2012. → pages 72[133] S. Mastorakis, A. Afanasyev, I. Moiseenko, and L. Zhang, “ndnSIM 2.0:A new version of the NDN simulator for NS-3,” Technical ReportNDN-0028, NDN, January 2015. → pages 72[134] J. Friedman, “Stochastic gradient boosting,” Computational Statistics &Data Analysis, vol. 38, no. 4, pp. 367–378, 2002. → pages 77[135] A. Hyva¨rinen, J. Karhunen, and E. Oja, Independent component analysis,vol. 46. John Wiley & Sons, 2004. → pages 77[136] W. Venables and B. Ripley, Modern applied statistics with S-PLUS.Springer Science & Business Media, 2013. → pages 77[137] J. Hu, J. Zhang, C. Zhang, and J. Wang, “A new deep neural networkbased on a stack of single-hidden-layer feedforward neural networks withrandomly fixed hidden neurons,” Neurocomputing, 2015. → pages 77[138] J. Schmidhuber, “Deep learning in neural networks: An overview,” NeuralNetworks, vol. 61, pp. 85–117, 2015. → pages 77[139] R. Quinlan, C4.5: programs for machine learning. Elsevier, 2014. →pages 77[140] G. Zhang, T. Q. Quek, M. Kountouris, A. Huang, and H. Shan,“Fundamentals of heterogeneous backhaul design–analysis andoptimization,” IEEE Transactions on Communications, vol. 64, no. 2,pp. 876–889, 2016. → pages 85147[141] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge, UK:Cambridge Univ. Press, 2004. → pages 96[142] M. Zeni, D. Miorandi, and F. De Pellegrini, “YOUStatAnalyzer: A toolfor analysing the dynamics of youtube content popularity,” in Proceedingsof the International Conference on Performance EvaluationMethodologies and Tools, pp. 286–289, 2013. → pages 96[143] G. L. Nemhauser and L. A. Wolsey, “Integer programming andcombinatorial optimization,” Wiley, 1988. → pages 100[144] D. Silva, “Two-group classification with high-dimensional correlated data:A factor model approach,” Computational Statistics & Data Analysis,vol. 55, no. 11, pp. 2975–2990, 2011. → pages 104[145] W. Griffiths, “A Gibbs’ sampler for the parameters of a truncatedmultivariate normal distribution,” Contemporary issues in economics andeconometrics: Theory and application, pp. 75–91, 2004. → pages 104[146] I. Sutskever, J. Martens, G. Dahl, and G. Hinton, “On the importance ofinitialization and momentum in deep learning,” in Proceedings of theInternational Conference on Machine Learning, pp. 1139–1147, 2013. →pages 105[147] M. Mavrovouniotis and S. Yang, “Training neural networks with antcolony optimization algorithms for pattern classification,” SoftComputing, vol. 19, no. 6, pp. 1511–1522, 2015. → pages 105[148] A. Choromanska, M. Henaff, M. Mathieu, G. B. Arous, and Y. LeCun,“The loss surfaces of multilayer networks,” Artificial Intelligence andStatistics, pp. 192–204, 2015. → pages 105[149] M. Chen, E. Zadok, A. O. Vasudevan, and K. Wang, “Seminas: A securemiddleware for wide-area network-attached storage,” in Proceedings ofthe ACM International on Systems and Storage Conference, pp. 1–13,2016. → pages 112[150] S. Abolfazli, Z. Sanaei, M. Alizadeh, A. Gani, and F. Xia, “Anexperimental analysis on cloud-based mobile augmentation in mobilecloud computing,” IEEE Transactions on Consumer Electronics, vol. 60,no. 1, pp. 146–154, 2014. → pages 112148[151] S. Mastorakis, A. Afanasyev, I. Moiseenko, and L. Zhang, “ndnSIM 2: Anupdated NDN simulator for NS-3,” Technical Report NDN-0028, Revision2, NDN, November 2016. → pages 112[152] W. Cai, V. C. Leung, and L. Hu, “A cloudlet-assisted multiplayer cloudgaming system,” Mobile Networks and Applications, vol. 19, no. 2,pp. 144–152, 2014. → pages 129[153] W. Cai, Z. Hong, X. Wang, H. C. Chan, and V. C. Leung,“Quality-of-experience optimization for a cloud gaming system with adhoc cloudlet assistance,” IEEE Transactions on Circuits and Systems forVideo Technology, vol. 25, no. 12, pp. 2092–2104, 2015. → pages 129[154] S. Wang and S. Dey, “Modeling and characterizing user experience in acloud server based mobile gaming approach,” in Proceedings of the IEEEGLOBECOM, pp. 1–7, 2009. → pages 129[155] S. Li, A. Karatzoglou, and C. Gentile, “Collaborative filtering bandits,” inProceedings of the ACM SIGIR conference on Research and Developmentin Information Retrieval, pp. 539–548, 2016. → pages 131[156] C. Gentile, S. Li, and G. Zappella, “Online clustering of bandits,” inProceedings of the International Conference on Machine Learning,pp. 757–765, 2014. → pages 131[157] T. X. Tran, P. Pandey, A. Hajisami, and D. Pompili, “Collaborativemulti-bitrate video caching and processing in mobile-edge computingnetworks,” in Proceedings of the IEEE Wireless On-demand NetworkSystems and Services, pp. 165–172, 2017. → pages 132149


Citation Scheme:


Citations by CSL (citeproc-js)

Usage Statistics



Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            async >
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:


Related Items