Coded Caching: Convex Optimization and GraphTheoretical PerspectivesbySeyed Ali SaberaliA THESIS SUBMITTED IN PARTIAL FULFILLMENTOF THE REQUIREMENTS FOR THE DEGREE OFDoctor of PhilosophyinTHE FACULTY OF GRADUATE AND POSTDOCTORALSTUDIES(Electrical and Computer Engineering)The University of British Columbia(Vancouver)October 2020c© Seyed Ali Saberali, 2020The following individuals certify that they have read, and recommend to the Fac-ulty of Graduate and Postdoctoral Studies for acceptance, the thesis entitled:Coded Caching: Convex Optimization and Graph Theoretical Perspec-tivessubmitted by Seyed Ali Saberali in partial fulfillment of the requirements for thedegree of Doctor of Philosophy in Electrical and Computer Engineering.Examining Committee:Lutz Lampe, Electrical and Computer Engineering, University of British ColumbiaSupervisorIan Blake, Electrical and Computer Engineering, University of British ColumbiaSupervisory Committee MemberStephanie van Willigenburg, Mathematics, University of British ColumbiaUniversity ExaminerJoel Friedman, Mathematics, University of British ColumbiaUniversity ExaminerStark Draper, Electrical and Computer Engineering, University of TorontoExternal ExaminerAdditional Supervisory Committee Members:Lele Wang, Electrical and Computer EngineeringSupervisory Committee MemberiiAbstractData communications is an essential building block of everyday life. A fundamen-tal challenge in communications networks has always been the limited capacity ofthe links that transfer data from sources to destinations. A core technique to al-leviate the load on the network links is to cache popular content in intermediatenodes between the sources and destinations to avoid redundant transmission of thesame data. Although the concept of caching has been well studied in the contextof computer networks, new settings are emerging in communication networks forwhich the conventional caching techniques are significantly inefficient and deliverfar less than the full benefits that caching can provide. One such setting is that ofnetworks in which caches are connected to the backbone communications systemthrough broadcast links. In the recent years, substantial amount of work has beendevoted to characterizing the fundamental limits of the gain of caching in such net-works, and coming up with techniques to achieve those limits. At the center ofthese attempts has been the introduction of coded caching, a technique rooted ininformation theory that takes advantage of network coding to minimize the amountof information that needs to be communicated in the network.This thesis is devoted to the development of coded caching techniques for threespecific settings that are of significant practical interest. In particular, it adapts aconvex optimization perspective to address the problem of caching in the presenceof duplicate demands, and the problem of designing the optimal way to place thecontent in caches when different files are non-uniformly popular. The latter is acore problem in the caching literature, for which we derive the optimal placementscheme for certain settings of the problem. We further look into the problem ofplacement of files in caches without splitting them into sub-packets. We establishiiia graph theoretical model to study this problem and explore the efficiency of codedcaching under this constraint. We believe that this thesis provides fundamentalinsights into these problems and solutions for them.ivLay SummaryCommunication networks have witnessed a tremendous increase in mobile datatraffic over the last decade. Transmission of such large volumes of informationfrom servers to the end users of the network is a challenging task, mainly due tothe limited capacity of the network communications links. A viable solution foralleviating the traffic on the network links is the storage of popular content in net-work nodes close to the end users. While the conventional caching strategies resultin a linear gain in reducing the network traffic as the cache capacities increase,network coding inspired techniques considerably increase such gains and more ef-fectively reduce the traffic. In this thesis, we propose such coding techniques andoptimize them for use in more practical scenarios. We further design variations ofcoded caching techniques that are easier to implement and yet effective.vPrefaceThis thesis presents the original work and contributions I conducted during myPh.D. research under the supervision of Professor Lutz Lampe in the Departmentof Electrical and Computer Engineering of the University of British Columbia.Throughout my work, I always benefited from the technical discussions and feed-back from Professor Ian Blake in the Department of Electrical and Computer En-gineering of the University of British Columbia.The contributions of this thesis have been published in several peer-reviewedarticles. In particular, a version of Chapter 3 appears inS. A. Saberali, H. Ebrahimzadeh Saffar, L. Lampe and I. Blake, “Adaptive Deliv-ery in Caching Networks” in IEEE Communications Letters, vol. 20, no. 7, pp.1405-1408, July 2016.I am responsible for all contributions in Chapter 3. This includes the reviewof the literature, identifying and formulation of the problem, development of thesolution, doing the computer simulations and comparing the results to the resultsexisting in the literature. For this work, I benefited from technical discussions withmy co-author, Dr. Hamidreza Ebrahimzadeh Saffar, who was at the time a PostDoctoral Fellow in our research group. He brought to my attention the works inthe literature that look at the caching problem from a network coding perspective,a perspective from which I conducted my research in this thesis. Professor LutzLampe and Professor Ian Blake supervised this work through technical discussionsand the review of the manuscript.The work in Chapter 4 was published inS. A. Saberali, L. Lampe and I. F. Blake, “Full Characterization of Optimal Un-coded Placement for the Structured Clique Cover Delivery of Nonuniform De-vimands,” in IEEE Transactions on Information Theory, vol. 66, no. 1, pp. 633-648,Jan. 2020.I led the research for all the contributions in Chapter 4, including literature re-view, identification of the problem, formulation of the problem as an optimizationproblem, solving the problem and in particular linking the problem to the math-ematical concepts related to the submodular set functions, and preparation of thepublication manuscript. Professor Lutz Lampe and Professor Ian Blake supervisedthis work through technical discussions and the review of the manuscript.Finally, the contributions presented in Chapter 5 have been published inS. A. Saberali, L. Lampe and I. F. Blake, “Decentralized Coded Caching WithoutFile Splitting,” in IEEE Transactions on Wireless Communications, vol. 18, no. 2,pp. 1289-1303, Feb. 2019.I am responsible for all the contributions in Chapter 5, including the review ofthe literature, identification of the problem, formulation of the problem, coming upwith the mathematical tools to address the problem and solving it, performing allthe simulations, and preparation of the publication manuscript. This work was alsodone under the supervision of Professor Lutz Lampe and Professor Ian Blake.viiTable of ContentsAbstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiiLay Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vPreface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viTable of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viiiList of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiiList of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiiiList of Symbols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvGlossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xviii1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.1 Caching in wireless networks . . . . . . . . . . . . . . . . . . . . 21.1.1 Practical significance . . . . . . . . . . . . . . . . . . . . 21.1.2 Broadcasting: A key differentiator for wireless caching . . 31.2 Connection to index coding . . . . . . . . . . . . . . . . . . . . . 41.2.1 Overview of index coding . . . . . . . . . . . . . . . . . 41.2.2 Canonical model for coded caching . . . . . . . . . . . . 51.2.3 Similarities to and differences from index coding . . . . . 51.3 Summary of contributions and organization . . . . . . . . . . . . 6viii2 Problem Statement and Review of Literature . . . . . . . . . . . . . 92.1 Problem statement . . . . . . . . . . . . . . . . . . . . . . . . . 92.2 Algorithms core to coded caching . . . . . . . . . . . . . . . . . 112.3 A survey of coded caching literature . . . . . . . . . . . . . . . . 163 Coded Caching for Delivery of Duplicate Demands . . . . . . . . . . 213.1 Inefficiency of the core delivery algorithms for duplicate demands 213.2 Selection between coded/uncoded messages for duplicate demands 233.3 A cutset bound on delivery rate of duplicate demands . . . . . . . 283.4 Numerical explorations . . . . . . . . . . . . . . . . . . . . . . . 293.5 Comparison to optimal coded caching with uncoded prefetchingfor uniform demands . . . . . . . . . . . . . . . . . . . . . . . . 333.6 Concluding remarks . . . . . . . . . . . . . . . . . . . . . . . . . 344 Coded Caching for Nonuniform Demands . . . . . . . . . . . . . . . 364.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 364.1.1 Related work . . . . . . . . . . . . . . . . . . . . . . . . 374.1.2 Our contributions . . . . . . . . . . . . . . . . . . . . . . 384.2 Problem setup and formulation of utilized storage and expecteddelivery rate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 414.2.1 Problem setup . . . . . . . . . . . . . . . . . . . . . . . 414.2.2 Delivery algorithm . . . . . . . . . . . . . . . . . . . . . 414.2.3 Formulation of expected delivery rate and storage . . . . . 424.3 Formulation of RMSC and characterization of its dual problem . . 454.3.1 Formulation of RMSC in terms of the placement parameters 454.3.2 Duality framework and derivation of JRSM . . . . . . . . 464.4 Optimal solution to JRSM . . . . . . . . . . . . . . . . . . . . . . 474.4.1 An equivalent formulation of JRSM . . . . . . . . . . . . 484.4.2 Review of submodular functions and relevant results . . . 494.4.3 Connection to submodular functions . . . . . . . . . . . . 514.5 Optimal solution to RMSC . . . . . . . . . . . . . . . . . . . . . 544.5.1 Optimal solution of RMSC in terms of optimal JRSM solution 544.5.2 Algorithm to derive the set of base-case memory sizes . . 57ix4.5.3 Numerical exploration . . . . . . . . . . . . . . . . . . . 604.5.4 Discussion of the performance gap to the information-theoreticouter-bound . . . . . . . . . . . . . . . . . . . . . . . . . 644.6 Concluding remarks . . . . . . . . . . . . . . . . . . . . . . . . . 655 Coded Caching at File-Level . . . . . . . . . . . . . . . . . . . . . . 675.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 675.1.1 Motivation and related work . . . . . . . . . . . . . . . . 675.1.2 Contributions . . . . . . . . . . . . . . . . . . . . . . . . 685.2 Placement and delivery algorithms . . . . . . . . . . . . . . . . . 705.2.1 Placement . . . . . . . . . . . . . . . . . . . . . . . . . . 705.2.2 Delivery . . . . . . . . . . . . . . . . . . . . . . . . . . . 705.3 Performance analsyis . . . . . . . . . . . . . . . . . . . . . . . . 745.3.1 Overview of Wormald’s differential equations method . . 745.3.2 Statistical properties of the random graph models for sideinformation . . . . . . . . . . . . . . . . . . . . . . . . . 755.3.3 Analysis of the expected rate of CFCM! and CFC-CCD . . 785.4 Performance comparison and simulations . . . . . . . . . . . . . 825.4.1 Numerical examples for expected rates . . . . . . . . . . 835.4.2 Characterization of coding gain . . . . . . . . . . . . . . 845.5 Extension to subfile caching . . . . . . . . . . . . . . . . . . . . 865.6 Concluding remarks . . . . . . . . . . . . . . . . . . . . . . . . . 906 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97A Proofs of Chapter 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . 103A.1 Proof of Proposition 3 . . . . . . . . . . . . . . . . . . . . . . . . 103A.2 Proof of Lemma 2 . . . . . . . . . . . . . . . . . . . . . . . . . . 104A.3 Proof of Theorem 1 . . . . . . . . . . . . . . . . . . . . . . . . . 105A.4 Proof of Lemma 3 . . . . . . . . . . . . . . . . . . . . . . . . . . 108A.5 Proof of Theorem 2 . . . . . . . . . . . . . . . . . . . . . . . . . 109xB Additional Material and Proofs of Chapter 5 . . . . . . . . . . . . . 113B.1 Wormald’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . 113B.2 Proof of Proposition 7 . . . . . . . . . . . . . . . . . . . . . . . . 115B.3 Proof of Theorem 3 . . . . . . . . . . . . . . . . . . . . . . . . . 117B.4 Proof of Theorem 4 . . . . . . . . . . . . . . . . . . . . . . . . . 119B.5 Proof of Lemma 4 . . . . . . . . . . . . . . . . . . . . . . . . . . 121xiList of TablesTable 3.1 Statistics of user demands in simulations of Fig. 3.4 . . . . . . 32Table 4.1 Comparison of sets of popular files resulting from Rate Min-imization with Storage Constraint (RMSC) with the ones ob-tained by other techniques . . . . . . . . . . . . . . . . . . . . 63xiiList of FiguresFigure 1.1 A two-tier cellular network with multiple cache-enabled fem-tocells within a macrocell . . . . . . . . . . . . . . . . . . . 4Figure 2.1 Schematic of a network of caches and a central server. . . . . 10Figure 3.1 Information cutset illustration for the caching network . . . . 28Figure 3.2 Rates of different delivery schemes for K = 12 caches . . . . 30Figure 3.3 Rates of different delivery schemes for K = 8 caches . . . . . 31Figure 3.4 Average rates of the different delivery schemes forK = 8 caches 32Figure 3.5 Comparison of the expected rate of the non-adaptive, adaptiveand optimal delivery for uniform demands for fixed number ofdistinct files . . . . . . . . . . . . . . . . . . . . . . . . . . . 34Figure 4.1 Optimal dual parameter of RMSC as a function of cache size . 55Figure 4.2 The effect of cache size on expected delivery rate and the amountof storage to different file partitions . . . . . . . . . . . . . . 59Figure 4.3 Joint effect of cache size and nonuniformity of the file requestprobabilities on the expected delivery rate . . . . . . . . . . . 61Figure 4.4 Comparison of the delivery rates of RMSC and other methodsfor different Zipf distribution parameters . . . . . . . . . . . . 62Figure 5.1 Left: an example side information digraph D. Right: the cor-responding side information graph G. . . . . . . . . . . . . . 71Figure 5.2 Illustration of the dependencies among the edges of the sideinformation graph . . . . . . . . . . . . . . . . . . . . . . . . 77xiiiFigure 5.3 Comparison of the delivery rates of the different caching schemes 83Figure 5.4 Expected number of vertices that are covered by cliques ofsizes larger than 2 by Algorithm 9, normalized by number ofcaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84Figure 5.5 Expected per-user delivery rates for caching networks with dif-ferent numbers of caches . . . . . . . . . . . . . . . . . . . . 85Figure 5.6 Comparison of the additive and multiplicative coding gains ob-tained by the different caching schemes . . . . . . . . . . . . 86Figure 5.7 Side information graphs for separate and joint delivery of subfiles 89Figure 5.8 Rates obtained by theoretical approximations vs simulation re-sults . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91xivList of Symbols[n] set {1, . . . , n}[n]i set {i+ 1, i+ 2, . . . , i+ n}∆ number of equal-length subfiles per fileγ Lagrange multiplier for capacity constraintpigs probability that for a set of caches S : |S| = s, g ∈ Gs is the set of files indSAD a delivery algorithmC set of distinct demands in dd demand vector (d1, . . . , dK)dS subdemand vector for requests of caches in Sdk index of file demanded by cache kD side information digraphDa asymptotic approximation to D at K/N → 0E set of edges of DF length of files (bits)G side information graphxvGa asymptotic approximation to G at K/N → 0Gs set of all subsets of [N ] with cardinality less than or equal sK number of cachesL number of distinct demands in dM set of cache capacities for base-cases of RMSCM cache capacity (files)Ml max{m ∈M | m < M}Mu min{m ∈M |M < m}N number of filesP a placement of files in cachespn request probability of file nq M/NR set of optimal rates for base-cases of RMSCR delivery rate (files)S a subset of [K]Supp(w) support of vector wXn set of the bits of file nXnS set of bits of file n that are exclusively cached in caches in SxnS length of subfile XnS normalized by F bitsxns xnS for S : |S| = syns(Ks)xnsY∗ set of optimal solutions for base-cases of RMSCxviYi(t) number of cliques of size i formed up to iteration t− 1 by Algorithm 9 onGaxviiGlossaryCDN Content Delivery NetworkCFC Coded File CachingCSC Coded Subfile CachingCFC-CCD Coded File Caching with greedy Clique Cover DeliveryCSC-CCD Coded Subfile Caching with greedy Clique Cover DeliveryJRSM Joint Rate and Storage MinimizationRMSC Rate Minimization with Storage ConstraintSCC Structured Clique Cover (Algorithm 4)xviiiChapter 1IntroductionFrom the early days of the invention of the internet, network congestion deemedto be a fundamental challenge in the communication of data. Enormous amountsof information must be transmitted from servers to clients over network links withlimited communication capacities. During the peak traffic periods, the limited ca-pacity of the links becomes a bottleneck for the transmission of data, adverselyaffecting the quality of service provided to the users. On the other hand, the natureof the web traffic is such that the same content is often demanded by several andpotentially by a large number of users, a property that can be exploited to alleviatenetwork congestion. In particular, the traffic on the network links can be partiallyoffloaded by the storage of highly reused content at intermediate nodes in betweenthe server and the end-users, eliminating the need for redundant transmission ofthe same content over the entire path from the server to the user. This techniqueis known as content caching and is nowadays an indispensable component of theContent Delivery Networks (CDNS).Caching in CDNS is a mature field and the subject has been long studied.However, the newly established connections between content caching and cer-tain elements of information theory, as well as the need for the deployment ofcaching techniques in networks that are fundamentally different from the conven-tional caching networks, has sparked renewed interest in research on caching tech-niques in the span of the last six years. In particular, the index coding problemin information theory corresponds to a fundamental component of the operation1of caching systems and answers the following question: how can the content thatis not stored in the caches be delivered to them most efficiently by exploiting thecontent that they already have available. With the information theoretic perspec-tive to the content caching problem comes a wide range of possible designs forcaching systems, each of which needs to be evaluated and optimized for the bestperformance. This requires the deployment of various tools that are provided byoptimization theory. This thesis therefore approaches the caching problem from anoptimization theory perspective as well as a graph theoretical perspective inspiredby index coding. This is accomplished by providing new designs for caching sys-tems, their evaluation and optimization of their performance.1.1 Caching in wireless networks: A fundamentally dis-tinct caching paradigmA prime motive for the consideration of caching problem in an information theo-retic framework is the emergence of network models that are fundamentally dif-ferent from the conventional caching network models. The network model deter-mines how the caching nodes communicate with the servers in order to receivethe content. Caching in wireless communication networks is an example of a net-work where the cache-server communications pattern significantly deviates fromthe conventional caching systems. In the following, we review the practical impor-tance of caching in wireless networks and elaborate on the factors that differentiatethe caching problem in such networks from the traditional caching paradigm.1.1.1 Practical significanceCommunication networks have witnessed a tremendous increase in mobile datatraffic over the last decade. Projections indicate that this growth will continue from2017 to 2022 with a compound annual growth rate of 46%, leading to a seven-foldincrease in mobile data traffic [1]. The growth in mobile traffic is mainly due tothe proliferation of smart devices and the emergence of a vast array of wireless ser-vices, including multimedia streaming, social networking and web browsing. Theshare of mobile video streaming from the mobile traffic is expected to increase to275% by 2020. Following a similar trend, the average traffic that a mobile-connectedend-user generates in 2020 is estimated to be 3.3 GB per month, up from a 495 MBper month in 2015 [2, 3]. To meet such tremendous traffic demands, the next gener-ation communication networks need to shift to networking paradigms that allocateresources based on awareness of content, user and location [2]. Such paradigmspromote distributed and proactive designs with the ability to adapt to the varyingenvironmental parameters.A powerful approach to meet the traffic demands in wireless communicationnetworks is the deployment of multi-tier architectures, where small cells, aka fem-tocells, underlay the macrocellular network. This architecture brings short-rangeand low-cost base-stations closer to the end-users, providing a considerable capac-ity gain over the conventional high-power macro-cellular networks. Nonetheless,the existing multi-tier networks fall short of meeting the peak traffic demands dur-ing the congestion hours. This is because of the lack of cost-effective backhaulconnectivity of the small cells to the backbone communication network [4, 5].Local content caching is a promising technique to solve the problem of limitedbackhaul connectivity, replacing the expensive backhaul capacity with the cheaperstorage capacity. Local caching exploits storage nodes with large memory capac-ities in the small cells (Fig. 1.1). By storing the popular content closer to theend-users, the caches serve the users locally and offload traffic from the backhaullinks of the network [5]. This is effective in scenarios with large enough portion ofduplicate demands. The cache-enabled networking paradigm is also proactive, inthe sense that the network can take advantage of the users’ contextual informationto predict the future user demands using large-scale machine learning algorithms,and leverage its predictive ability to best utilize the available resources [4].1.1.2 Broadcasting: A key differentiator for wireless cachingAlthough caching in content delivery networks is a mature technique, web cachingtechniques are not sufficient for wireless caching as they ignore certain fundamen-tal characteristics of wireless networks. In particular, a key differentiator betweenweb caching and wireless caching is the broadcasting nature of the wireless chan-nel, i.e., a message sent by a transmitter can be received by multiple clients. This3Figure 1.1: A two-tier cellular network: several caching femtocells are deployedwithin a macro-cellproperty provides the opportunity for smarter designs of caching networks by usingcoding techniques from information theory, and promises notable gains in reducingthe network traffic [6].1.2 Connection to index coding1.2.1 Overview of index codingThe broadcast nature of the wireless channel closely connects the design of cachingsystems to a long-investigated problem in information theory known as index cod-ing or source coding with side information problem [7]. The index coding problemconsists of a server and multiple caching clients. There exists a broadcast channelshared by all the clients, and only the server can transmit on the shared channel.Every clients makes a request for data from the server while it has some side in-formation about the requests of the other clients. More precisely, each user hasspecific parts of the content requested by the other clients available in its cache.Although the clients cannot communicate that information with each other, suchside information can provide opportunities for the server to send less amount ofinformation on the channel in order to deliver the requests of the clients. Such4opportunities are referred to as coding opportunities. In the index coding problem,one aims at a smart construction of the server messages such that it maximallyexploits the side information. This in turn translates to the server broadcastingthe minimal amount of supplementary information on the shared channel such thatevery client is able to decode the transmitted messages for its desired information.Solving the index coding problem is NP-hard in its general form, i.e., for anarbitrary arrangement of the side information on the clients. This results from thefact that the linear index coding problem can equivalently be formulated to the ma-trix min-rank problem [7] for linear codes, which is NP-hard to solve. Connectionsbetween the index coding problem and other NP-hard problems like vertex cliquecover problem and graph coloring problem are also established in [8] and [9] and[10], respectively.1.2.2 Canonical model for coded cachingTo elaborate on the connection between the wireless caching problem and indexcoding, we lay out the canonical model for wireless coded caching problem. Thismodel consists of a network with a central server and multiple caching nodes. Thecentral server is connected to caches through an error-free broadcast channel.A library of popular files is given, only a subset of which can be stored in thecaches due to their limited storage capacity. The goal is to distribute the contentof the files in the caches such that the amount of information transmitted on thebroadcast channel to deliver the missing content is minimized. Since user demandsare random, one can aim at minimizing either the peak or the average amount ofinformation transmitted.1.2.3 Similarities to and differences from index codingIt is evident that the index coding problem is closely related to the coded cachingproblem. In particular, in the caching problem, minimizing the amount of infor-mation that is transmitted by the server directly translates to requiring the server tofetch the least amount of information from the backbone communication channel,which alleviates the traffic load on the backhaul link of the network.5Despite their similarities, the index coding and caching problems have funda-mental differences. In index coding problem, the core task is to construct deliverymessages that best utilize the coding opportunities provided by a given side in-formation arrangement, i.e., the focus is on creating the delivery messages. Incontrast, there are two core tasks to the caching problem: the first task is the place-ment of the side information on the caching nodes, i.e., creating coding opportuni-ties, and the second task is the construction of delivery messages. More precisely,the placement of content in the caches is itself a design aspect of the caching prob-lem. Furthermore, the same placement of content in the caches is used to delivera possibly large number of requests that are successively made by the clients, asone cannot update the content of the caches after delivering every set of client de-mands. The client demands are made randomly based on a probability distribution.Hence, the placement must be done such that it both creates coding opportunitiesand keeps the hit rate of the caches high, which eventually translate into minimiz-ing a measure like the expected amount or maximum amount of supplementaryinformation sent on the broadcast channel over the entire delivery interval. It be-comes evident from our discussion that coded caching is a more general problem,with the index coding constituting one aspect of it. Despite the existence of a richliterature on the index coding problem, this difference has sparked the interest ofthe scientific community in the coded caching field since the year 2014 when theseminal work [11] on coded caching was published.1.3 Summary of contributions and organizationThis thesis investigates three research problems in the field of coded caching. Here,we provide a brief summary of each problem and our contributions. We leave athorough review of the relevant literature to Chapter 2.The first problem concerns the design of coded caching schemes when the filesto be cached have different request probabilities. This is a significant problem asthe core algorithms developed for coded caching are proposed for the case wheredemands for different files1 are uniformly distributed. As will be discussed in1popular files to be cached6Chapter 2, many works in the literature attempted to find the optimal way of al-locating cache memory to files with different levels of popularity when the corecoded message construction algorithms are used. This thesis lays out an optimiza-tion framework to formulate the problem of content placement and finds the op-timal placement strategy analytically. One benefit of the analytical approach wefollow is the insights that our approach provides about the optimal strategy and itsconnection to the placement algorithms that had been derived for uniformly popu-lar files.As the second problem, we explore the development of a coded caching schemethat keeps the files intact during the placement in the caches. In particular, the corecoded caching algorithms require the splitting of each file into a possibly largenumber of chunks which will be distributed over different caching nodes. Thisproperty is inherited by the majority of the works in the coded caching literature.However, splitting files into such large number of chunks is not only difficult inpractice, the theoretically promised caching gains are only valid if files are in-finitely long. In this thesis, we propose a coded caching scheme where files arekept intact for placement in caches. In order to analyze the performance of theproposed method, we use a random graph model and a method of modeling thedynamics of the proposed algorithm by differential equations. We show that al-though the coding gain of file caching is smaller than that of subfile caching, it isstill notably larger than uncoded caching.The third problem that we investigate in this thesis is the construction of codedmessages when multiple caches demand identical files. The core coded cachingalgorithm were not efficient in such a scenario, as they were originally developedunder the assumption that caches request distinct files. We propose a method toimprove the core algorithms to better suit duplicate file requests by different caches.In the chronological order, this was the earliest work completed for the sake of thisthesis. Later works in the literature proposed more efficient techniques to deal withduplicate demands.This thesis is organized as follows. In Chapter 2, we lay out the canonicalmodel for coded caching and the general problem statement. We then briefly dis-cuss the core coded caching algorithms that initiated the field. This is followedby a review of coded caching literature. In Chapter 3, we present our work on7coded caching with duplicate demands. Chapter 4 is devoted to coded cachingwith nonuniform demands. We present our work on coded caching without sub-packetization of files in Chapter 5. We conclude the thesis in Chapter 6.8Chapter 2Problem Statement and Reviewof Literature2.1 Problem statementAt the core of the coded caching literature is the model proposed in [11] for acaching network which consists of a central server and multiple caching nodes. Inthis chapter we first present this model and then review the coded caching literature.Consider a network with a central server that is connected to K caching nodesthrough a shared error-free communication link as in Fig. 2.1. There is a library ofN files, each of which of size F bits. The whole database of files is available tothe server. To the contrary, every cache has a limited storage capacity of MF bits.Hence, each cache can only have the equivalent of M files available in its localstorage. The non-locally available content needed by a cache has to be acquiredfrom the server via the shared broadcast communications link.The described caching network operates in two phases. In the first phase,known as the placement phase, every cache fills its storage with parts of the contentfrom the library, up to their storage capacity. This phase usually takes place duringthe off-peak hours of the network operation. We denote by P the placement offiles in the caches. The knowledge of the file placement is equivalent to knowinga mapping from the bits of every file in the library to the storage of every cache inthe network. The other phase of operation is known as delivery phase. The deliv-9ServerCache 1 Cache 2 Cache K−1 Cache K· · ·Figure 2.1: Schematic of a network of caches and a central server.ery phase takes place during the time when the network is congested and after theplacement phase. During delivery, the placement of content in the caches P doesnot change. Each cache provides its users with the parts of the requested files thatit has locally available. However, the caches need to fetch the missing content fromthe server via the shared link.Notation 1. We use notation [n] to denote the set of the first n positive integers{1, . . . , n}. Similarly, we use [n]i to denote the set of the first n positive integerslarger than i, i.e., {i+ 1, i+ 2, . . . , i+ n}.The server is aware of the local content of all the caches. At each time instant, itis informed of the file requested by each cache. In particular, every cache k ∈ [K]reveals one request for a file dk ∈ [N ]. We refer to d = [d1, . . . , dK ] as the demandvector which represents the demands of all caches at the current time instant.To deliver the content requested by every cache, the server transmits a messageXd of sizeR(d;P, AD)F bits on the shared link. The quantity R(d;P, AD) is thedelivery rate for the demand vector d, given a specific placement of files P and adelivery algorithm used by the server denoted byAD. To avoid notation clutter, we10drop the dependency of the rate on the placement and delivery algorithm from thenotations when it is clear from the context.Every cache needs to reconstruct the file it requested using both the content ithas available locally and the message sent by the server. We say that a memory-ratepair (M,R(d)) is achievable for the demand vector d if every cache k is able toperfectly recover its requested file dk. Further, we say that the memory-peak-ratepair (M,Rpeak) is achievable if it is achievable for all possible demand vectors d ∈[N ]K . For cache sizeM , the smallest rateRpeak for which (M,Rpeak) is achievablecharacterizes the memory-peak-rate tradeoff. This smallest peak rate is denoted byR∗peak(M). Similarly, E (R(d;P, AD)) is the expected delivery rate, where theexpectation is over the randomness in vector d, and possibly the randomness in thedelivery algorithm AD and in the placement algorithm that determines P .The design of a coded caching scheme requires the selection of the appropriateperformance metric. In many practical scenarios, it is of interest to characterize thememory-expected-rate tradeoff (M,E (R(d;P, AD))), i.e., to design the place-ment P and the delivery algorithm AD such that for every demand vector d, R(d)is achievable and the expected delivery rate is minimized. Depending on the designpriorities and requirements, the objective could instead be the characterization ofthe memory-peak-rate tradeoff. Notice that if the expected delivery rate is usedas the measure of merit of a caching scheme, the knowledge of the probabilitydistribution of the files becomes of primary significance as it directly affects theexpected delivery rate.Next in this chapter, we focus on the placement and delivery algorithms at thecore of the coded caching literature, which are originally designed for uniformlydistributed user demands and the peak-rate criterion as the performance measure.We then review the literature for other coded caching schemes that adapt thesealgorithms and their variations for more complex scenarios, as well as the worksthat follow fundamentally different approaches to coded-caching.2.2 Algorithms core to coded cachingAt the center of several coded caching schemes in the literature are the centralizedand decentralized caching algorithms proposed by Maddah-Ali and Niesen in [11,11Algorithm 1 Placement of the centralized caching scheme of [11]Require: K,M,N, {Xn}n=1,...,N1: t← KM/N2: for k ∈ [K] do3: for n ∈ [N ] do4: splitXn into(Kt)non-overlapping subfiles of equal size labeled by {XnS :S ⊂ [K], |S| = t}5: Place subfiles XnS : k ∈ S in the cache of user k6: end for7: end for12], respectively. Because of the conceptual significance of these algorithms andthe fact that many of the techniques developed in the literature and in this thesisare closely related to them, we present these algorithms in this section.First, we lay out the notations that we use throughout this thesis to characterizean uncoded placement of files in the caches.Notation 2. Let Xn be the set of the bits of file n. Also, for each S ⊂ [K], letXnS ⊂ Xn represent the bits of file n that are exclusively cached in the caches inS.1 By definition, subsets XnS are disjoint for different S and⋃S⊂[K]XnS = Xn.Also, define xnS = |XnS |/F as the ratio of bits of file n that are exclusively cachedin the caches in S. Then, it follows that ∑S⊂[K] xnS = 1 for every n ∈ [N ]. Wedenote by x the vector of all placement parameters xnS .Centralized caching scheme of [11] The placement of [11] is presented in Algo-rithm 1. This placement algorithm assumes that the popularity of files is uniform,hence it allocates the same amount of cache storage to each file in the library.About this algorithm, the following features are worthy to recognize:• For integer t = KM/N , each file is split into (Kt ) equal size subfiles. Hence,this algorithm does not keep the files intact during placement.• With memory-rate pairs (M1, R1) and (M2, R2) for Algorithm 2 such thatt1 = KM1/N and t2 = KM2/N are consecutive integers, any point on the1In other words, bits XnS ⊂ Xn are stored in every cache in S and are not stored in any cache in[K] \ S.12Algorithm 2 Delivery of the centralized caching scheme of [11]Require: d; {Xn}n=1,...,N1: t← KM/N2: for S ⊂ [K] : |S| = t+ 1 do3: server sends ⊕k∈SXdkS\k4: end forline segment connecting the pairs is also achievable using memory sharing.In particular, for M = (1 − θ)M1 + θM2, 0 ≤ θ ≤ 1, use Algorithm 2with cache size M1F to delivery the first (1− θ)F bits of every file and withcache size M2F for the rest θF bits of every file.• If the number of caches in the network changes by the arrival of a new cacheor the departure of an existing cache, quantity t could change. This requiresthe content of all the caches to be updated based on the new number ofcaches in the system. Hence, a central node needs to be aware of the globalarchitecture of the network and to determine the content that each cacheneeds to store.• This is a deterministic placement algorithm.Once the content is placed in the caches using Algorithm 1, the missing contentcan be delivered to the caches by Algorithm 2. For a subset S ⊂ [K] of size t+ 1,consider a cache l ∈ S. Then, S\k is of size t and all the subfiles embedded inthe message ⊕k∈SXdkS\k2 are available in cache l, except for XdlS\l. Hence, cache lcan decode the message for the part of its desired file that is not available locally.Repeating this process for all subsets S in the for loop, cache l can recover all thesubfiles of file dl that it has missing, and therefore, is able to fully reconstruct itsrequested file dl.It can be shown that the peak delivery rate of the centralized coded cachingscheme with Algorithms 1 and 2 for placement and delivery is [11]Rcentralizedpeak (M) = K1−M/N1 +KM/NF bits, (2.1)2For compactness of notation, we use S\k instead of S\{k} when only one element is to beexcluded from set S.13Algorithm 3 Placement of the decentralized caching scheme of [12]Require: K,M,N, {Xn}n=1,...,N1: t← KM/N2: for k ∈ [K] do3: for n ∈ [N ] do4: user k independently caches a subset MN F of bits of file n, chosen uni-formly at random5: end for6: end forAlgorithm 4 Delivery of the decentralized caching scheme of [12] (Structured-Clique Cover Algorithm)Require: d; {Xn}n=1,...,N1: for s = 1, . . . ,K do2: for S ⊂ [K] : |S| = s do3: server sends ⊕k∈SXdkS\k4: end for5: end forwhich provides a coding gain of 1 +KM/N .Decentralized caching scheme of [12] The facts that by the centralized place-ment algorithm of 1, the content of the caches are strongly dependent on the totalnumber of caches, and the identity of the subfiles stored by each cache must bearranged in advance, restrict its usage in practice. The placement algorithm inAlgorithm 3 resolves these issues by randomizing the placement process.The corresponding delivery procedure in Algorithm 4 can be seen as a gener-alization of Algorithm 2. Here, in the for loop, the coded messages are constructedfor all nonempty subsets S ⊂ [K] instead of only the subsets of size t+ 1. We callthis algorithm the Structured-Clique Cover Algorithm.The following points characterize some important features of the placementand delivery based on Algorithms 3 and 4:• The content that each cache stores is completely independent from the con-tent that the other caches store. This results in a decentralized placementalgorithm.14• The placement and delivery algorithms effectively split each file into 2Ksubfiles corresponding to the subsets of [K]. Again, the number of subfilesgrows exponentially with K. The lengths of these subfiles are not equal. Inparticular, the expected length of XnS is equal to (MN )s(1 − MN )K−sF bits,where s = |S|.The peak delivery rate of the decentralized coded caching with 3 and 4 forplacement and delivery isRdecentralizedpeak (M) = K(1−M/N)NKM(1− (1−M/N)K)F bits (2.2)as F → +∞.The decentralized caching scheme has a coding gain of NKM(1− (1−M/N)K)and is proved to be order optimal, i.e., its peak delivery rate is within a constantfactor of the optimal delivery rate [12].Modified delivery algorithms of [13] In the case that multiple caches requestthe same file, Algorithms 2 and 4 embed the same subfiles of such a file into multi-ple server messages. This can be redundant and sub-optimal. Modified versions ofthe centralized and decentralized delivery algorithms were proposed in [13, Sec-tion IV.B and Appendix C.A], which resolve this inefficiency and are shown toachieve the optimal memory-rate tradeoff for the case of uniform demands whenuncoded placement is used, for the centralized and decentralized settings [13].More specifically, for any given demand vector d, the modified algorithms firstchoose a set of leader caches U that have the property that they have all requesteddistinct files in d. Hence, the number of leader caches is between 1 and K, de-pending on d. In the decentralized setting, contrary to Algorithm 4, which greedilytransmits the binary sums ⊕k∈SXdkS\k for every S ⊂ [K], the modified algorithmtransmits the binary sum for S ⊂ [K] only if S ∩U 6= ∅. For the leader caches, de-coding of the server messages is straightforward, as they have to only decode oneserver message as in Algorithms 2 and 4. The non-leader caches can still decodethe server messages for the parts they are missing, but that requires each of themto process multiple messages. A similar result holds for delivery in the centralizedsetting.15In case of worst-case demands, U = [K], hence the modified algorithms re-duce to the original algorithms 2 and 4 for the centralized and decentralized set-tings. Hence, the original algorithms are optimal in terms of the peak delivery rate.However, Algorithms 2 and 4 are suboptimal in terms of the expected delivery rateas this case also includes demands where |U| < K.2.3 A survey of coded caching literatureThe seminal work of Maddah-Ali and Niesen [11] in 2014 initiated the field ofcoded caching by providing an information theoretic formulation of caching prob-lem and investigating its fundamental limits. In this work, the authors further pro-posed a certain placement of content in the caches such that symmetric codingopportunities are created among the different caches. By virtue of these symme-try structures, they offered a solution to the resulting index coding problem. Thecombination of these placement and delivery message construction algorithms ledto unprecedented gains in reducing the amount of supplementary information thatthe server needs to broadcast. In the following we provide a more quantitativecharacterization of this gain.A naive adaptation of the conventional uncoded caching schemes to the canon-ical network model in Section 1.2.2 requires every cache in the network to storeM out of the N files. Since each cache is conventionally served separately by theserver, the identity of files stored in each cache is irrelevant in this setting3. Underuniform distribution over the user requests, M/N F bits of the content that eachcache requests is available to it locally on average, and the rest (1 −M/N)F bitsmust be fetched from the server. Hence, the total amount of supplemental infor-mation transmitted by the server on the broadcast channel, the delivery rate, willbe4Rworst-caseuncoded (M ;K,N) = K(1− MN)min{1,NK}F bits. (2.3)3This is assuming files are equally popular and user requests are independent from each other.4This is assuming that at each step, the requests of the caches are distinct. In the case of duplicaterequests, (2.3) is an upper bound on the transmitted information.16Using the coded caching scheme proposed in [11], the worst-case delivery rate,which is an upper bound on the average delivery rate, is derived asRworst-casecentralized coded(M ;K,N) = K(1−M/N) min{11 +KM/N,NK}F bits,which is considerably smaller than the uncoded delivery rate. In particular, thecoding gain is multiplicative and is equal to 1+KM/N . An important observationis that for a fixed storage capacity per cache, as the number of caches increases, thedelivery rate of uncoded caching linearly increases, while the delivery rate of thecoded caching approaches the constant value (N −M)/M .In a later work [12], Maddah-Ali and Niesen proposed the decentralized cachingdesign for coded caching as detailed in Section 2.2, which has delivery rate [12]Rworst-casedecentralized coded = K(1−M/N) min{NKM(1− (1−M/N)K) , NK}F bits.The decentralized nature of this method made it the building block of severalcaching techniques that were designed later for more complicated scenarios [14–18].In [19, 20], it was shown that the worst-case delivery rate of the centralizedscheme in [11] is not only achievable, but it is the optimal worst-case delivery ratefor coded caching schemes with uncoded prefetching and uniformly distributeddemands, when the number of files is larger than the number of caches (N > K).The exact memory rate tradeoff for coded caching with uncoded prefetching anduniform demands was derived in [13], where the authors showed that the placementschemes in [11] and [12] are optimal respectively for centralized and decentralizedsettings both in terms of expected delivery rate and worst-case delivery rate. Thisresult holds regardless of the number of files and caches. Further, expressions forthe optimal worst-case and the optimal expected delivery rates where derived.The coded caching scheme of [12] requires the splitting of each file into a largenumber of subfiles and the storage of subfiles in the caches. The number of sub-files per each file grows exponentially with the number of caches K. This propertyis therefore inherited by all coded caching designs that use these two schemes astheir building blocks. Several works in the literature [21–25] have investigated17coded caching schemes with lower subpacketization requirements. In particular, itwas shown in [21] that the decentralized scheme of [12] has a coding gain of atmost 2 even if the subpacketization levels grows exponentially with the asymptoticcoding gain KM/N . The authors proposed new caching schemes that result inbetter coding gains for smaller subpacketization levels. In [24], the authors pro-posed coded caching schemes based on specific combinatorial structures designsand propose methods with substantially reduced subpacketization levels at the costof an increase in the delivery rate. Further, construction of coded caching schemeswith polynomial subpacketization levels is explored in [22] using specific types ofgraphs, namely Ruzsa-Szemeredi graphs. More results in [23] establish connec-tions between code design in the finite-length file regime and problems in combi-natorics. In Chapter 5 of this thesis, we also investigate the coded caching problemwhen files are kept intact during placement, i.e., we propose a decentralized place-ment of files in the caches such that a file is either cached entirely or is not cached.We propose a greedy clique-cover method for delivery of the content and derive amethod to compute the expected delivery rate of the proposed scheme.For non-uniform popularity distribution of files, the placement phase can beimproved such that it provides more memory resources to the storage of the morepopular files. In such a scenario, the average delivery rate is a better metric forthe performance of the system [14–16, 26, 26–28]. Many of the works on codedcaching with nonuniform demands are based on the decentralized caching tech-nique and a grouping of files such that within each group the popularity of filesis relatively uniform. Other schemes [26] rely on graph theoretic methods for de-livery. One contribution of this thesis is to find the optimal placement of contentin the caches when demands follow a non-uniform distribution and Algorithm 4 isused for delivery.Reference [29] considered the coded caching problem when the capacities ofthe different caches are not identical. Reference [18] investigated the hierarchicalcoded caching problem, where every cache in one layer acts as a server for thecaches in its lower level. Multi-sever coded caching is explored in [30] where thecaching nodes are served by multiple servers. This work focuses on coding delayas the optimization metric instead of the amount of information transmitted on thebroadcast channel. The problem of online coded caching is explored in [17] where18the set of files in the library evolves based on a Markov model, yet the user requestsare uniformly selected from the library. For this setting, an approximate expressionis derived for the long-term average delivery rate for a rule developed to update thecontent of caches and the algorithm for decentralized coded caching. Reference[16] extended the work of [12] to the case that each user of the network is servedby d > 1 caches. Here, the delivery phase is performed based on the decentralizeddelivery of [12].Coded caching with coded prefetching of content is considered in several worksin the literature [31, 32], yet these schemes are proposed for strictly limited regimesof problem parameters. In particular, [31], proposed a coded prefetching for thecase where M = (N − 1)/K, which results in improved delivery rate when thenumber of caches is larger than the number of files, K > N ≥ 3.Unlike most of the literature on coded caching that consider zero-distortionreconstruction of files, [33] investigates the scenario with user-dependent presetaverage distortion requirements, for both the centralized and decentralized cachingsettings. Here, the objective is to minimize the delivery rate such that for everyuser, all possible demand combinations can be met at the specified distortion lev-els. Coded caching in a network with noisy erasure broadcast channel is consideredin [34], where lower and upper bounds on the memory-rate tradeoff are derived.The receivers belong to two disjoint sets, weak receivers with large erasure prob-abilities and cache memories and strong receivers with small erasure probabilitiesand no caching capability. Also in [35], the authors considered coded caching inthe scenario that the different links between the sever and the different caches haveunequal rates instead of assuming such links are error-free.Finally, implementation aspects of coded caching in wireless networks are ex-plored in [36].It is worthwhile to mention that the literature on wireless caching is not limitedto coded caching techniques, and the latter is a subset of the former. In particular,many works in the literature consider models that are different from the canon-ical model in Section 1.2.2 and optimize objectives other than the delivery rate.References [5, 37–40] are examples of such caching schemes that do not use net-work coding techniques for caching. The scope of this thesis only concerns codedcaching methods and we limited our review of the literature accordingly. For a19more general review of the literature, we refer the readers to the survey of cachingtechniques for cellular networks provided in [41], as well as [6, 42] which explorepractical considerations and challenges for wireless caching.20Chapter 3Coded Caching for Delivery ofDuplicate Demands3.1 Inefficiency of the core delivery algorithms for dupli-cate demandsThe core delivery procedures in Algorithms 2 and 4 are efficient when the differentcaches request distinct files in the demand vector. However, when there existsduplicate requests in the demand vector, i.e., multiple caches request the same file,Algorithms 2 and 4 are clearly suboptimal as they ignore the redundancies in theuser demands and deliver the requested content as if all demands were distinct.The presence of redundant demands becomes more likely in several differentscenarios, some of which we enumerate here. First, as the ratio of the number ofcaches to the number of files K/N increases, the chance of all caches requestingdistinct files decreases, leading to duplicate demands. The second scenario is whenthere exists significant differences in the popularity of files. Duplicate demands be-come more likely as many caches request the highly popular files. Third, certaincorrelations among the requests of the different caches could make it more prob-able for them to demand identical content. Such correlated requests are likely inmany practical scenarios. A considerable amount of multimedia requests are madethrough social networks like Facebook and Instagram, where users with common21social connections and interests are likely to request the same content.In this chapter, we investigate the delivery of redundant demands using mod-ified versions of Algorithms 2 and Algorithm 4. For placement, we use the sameschemes as in Algorithms 1 and 3 for the centralized and decentralized cachingscenarios, respectively. The use of such a placement scheme is justified if accu-rate estimates of the popularity of the files to be cached are not known during theplacement.1 In the case of correlated requests also, the marginal probabilities ofrequesting each file could be similar, yet the joint distribution of the demands couldfavor redundancy in the demand vector. Estimation of such a joint distribution canbe challenging and so is designing a corresponding optimal placement of files inthe caches. These are examples where a symmetric placement of files is a viableoption.For the delivery, however, we propose a new scheme based on message selec-tion. Upon receiving a demand vector from the users and based on the redundancypattern of the requests made, the server decides whether to use uncoded messagesor the coded messages as in Algorithm 4 to deliver the different parts of the re-quested files. This distinguishes our work from [11, 12], as our proposed deliverytakes the specifics of the current demand vector into account to decide on the formof the server messages. However, [11, 12] use a fixed structure to compose theserver messages for all demand vectors. We also derive a lower bound on the de-livery rate of redundant requests.Subsequent to our work, [13] proposed different variants of Algorithms 1 and 3which were proved to be optimal in terms of both the worst-case delivery rate andthe expected delivery rate for the centralized and decentralized settings when de-mands are uniformly distributed and placement is uncoded. This means that underthis setting, these algorithms handle the redundant demands optimally. We com-pare the performance of our proposed delivery to the optimal delivery algorithmfor the completeness of the results.1Clearly, in the setting where accurate estimates of the file request probabilities are available, theyshould be taken into account during placement. This is the investigated in Chapter 4 in detail.223.2 Selection between coded/uncoded messages for dupli-cate demandsWe use the caching network model in Section 2.1. To account for the redundanciesin the demand vector, let L represent the number of distinct files in the demandvector, hence, 1 ≤ L ≤ K. The demand vector is called redundant if L < K.Also, denote by ki, the number of requests for the i-th most requested file in thecurrent demand vector. Thus ki ≥ kj for i < j and i, j ∈ [L]. We call (k1, ..., kL)the redundancy pattern of the demand vector.The delivery rate of Algorithm 4 for a non-redundant demand vector isRworst-case =∑S:S⊂[K]S6=∅maxk∈SxdkS\k. (3.1)Since the placement is symmetric w.r.t. all the files xnS\k is the same for all files nfor placement in Algorithm 1 in general and for placement in Algorithm 3 in theinfinite file size regime. Hence, we can drop the superscript in xnS and simply usexS instead. Similarly, both placements in Algorithms 1 and 3 are symmetric w.r.tsubsets S as long as they have the same cardinality. Hence, xS is only a function of|S|. As a result, we further simplify the notation xS to xs, where s = |S|. Eq. (3.1)can then be written asRworst-case =K−1∑s=0(Ks+ 1)xs. (3.2)If file n is requested by multiple users, including user k, Algorithm 4 embedsXnS\k into several messages. If |S| > 1, user k has the side information to directlydecode only one of those messages. As a result, the server needs to send all themessages with |S| > 1.2 This is not the case for the messages with |S| = 1, i.e.,S = {k}. In these cases, ⊕k∈SXdkS\k = Xdk∅ . Such uncoded messages deliver thebits that are not stored at any cache in the system. All the users that request file2Again, this is the case for the simple decoding technique that requires the decoding of exactlyone server message for one subfile at each cache. If each cache can perform the extra processing ofseveral server messages to decode each subfile, as is proposed in [13], a smaller number of messagescan be sent.23n can decode Xn∅ , so it needs to be sent only once. As a result, the load due tothe uncoded messages is Lx0 instead of Kx0 and the delivery rate of a redundantdemand will beLx0 +K−1∑s=1(Ks+ 1)xs. (3.3)Eq. (3.3) suggests that for redundant demand vectors, the rate of Algorithm 4can be smaller than the peak rate if the uncoded messages are sent only once forthe files that are requested by multiple caches. This is the basis of our analysis inSection 3.2 where we show that depending on the redundancy pattern in the de-mand vector, it might be beneficial to favor the transmission of uncoded messagesover coded messages.Delivery based on message selectionTo improve the delivery for demand vectors with duplicate requests, we introducean extra step to the delivery phase, which takes place after receiving each requestvector and before the transmission of the server messages to the users. In thisstep, the server decides whether to send each part of the requested files throughthe corresponding coded message in Algorithm 4 or through an uncoded message.The use of uncoded messages instead of coded messages to deliver file n can bethought of transferring bits from XnS : S > 0 to Xn∅ . In other words, the cachedelivers requests as if parts of its content were not available.Let XˆnS represent the subset of the bits of file n exclusively cached at S afterthe transfer is done, and xˆnS , |XˆnS |/F .In our delivery method, the server first optimizes xˆnS . Then, it arbitrarily picksxˆnSF bits of XnS to form XˆnS , and adds the rest of the bits to Xˆn∅ . Finally, it usesAlgorithm 4 for delivery based on the resulting subsets XˆnS instead of XnS .We now find the optimal lengths of the updated subsets XˆnS to minimize thesum of the lengths of messages ⊕k∈SXˆdkS\k over all S ⊂ [K]. Let C denote the setof the distinct files requested in the current demand vector. Note that |C| = L ≤ K,and both C and L evolve with time. Then, the rate minimization problem is given24byminimizexˆdkS∑S:S⊂Kmaxk∈SxˆdkS\ksubject to∑S:S⊂KxˆdkS = 1, ∀ dk ∈ C0 ≤ xˆdkS ≤ x|S|, ∀ dk ∈ C, ∀S ⊂ K : |S| > 00 ≤ xˆdk∅ ≤ 1, ∀ dk ∈ C.(3.4)In (3.4), x|S| = |XnS |/F are known from the placement phase. For the centralizedplacement algorithm of 1, the placement parameters are given byxs =1/(Kt), s = t0, otherwise(3.5a)if t is an integer, andxs =(dte − t)/(Kbtc), s = btc(t− btc)/(Kdte), s = dte0, otherwise(3.5b)for non-integer t. The parameters of the latter case are obtained by memory-sharing[11]. For the decentralized placement and for large F , with high probability wehave [12]xdecens ≈ qs(1− q)K−s, s = 0, ...,K. (3.6)Quantity maxk∈S xˆdkS\k is the length of the message ⊕k∈SXˆdkS\k. Thus, the ob-jective function is the rate of Algorithm 4 operating based on the adjusted subsetsXˆnS . The equality constraints follow the definition in Notation 2. The parame-ter range constraints permit the server to use uncoded messages instead of codedmessages, but not vice versa.Problem (3.4) can be posed as a linear programming problem by the standardtechnique of defining ancillary variables zS = maxk∈S xˆdkS\k, and adding the extra25Algorithm 5 Delivery algorithm based on message selectionRequire: {XnS}n ,S // From the placement phase1: Procedure AdaptiveDelivery(d1, ..., dK)2: C ← unique(d1, ..., dK) // Set of distinct files requested3: {xˆ∗ dkS }dk∈C,S⊂K ← Solution of Problem (3.4)4: for dk ∈ C do5: Xˆdk∅ ← ∅ // Initialization of Xˆdk∅6: for S ⊂ [K] do7: XˆdkS ← {first xˆ∗ dkS F bits of XdkS }8: Xˆdk∅ ← Xˆdk∅ ∪ { last (1− xˆ∗ dkS )F bits of XdkS }9: end for10: end for11: Use [12, Algorithm 1] with {XˆnS}n,S instead of {XnS}n,SconstraintszS ≥ xˆdkS\k (3.7)for all S ⊂ [K] : |S| > 0 [43, Section 4.3], as xˆdk ≥ 0. The resulting linearprogramming problem can be solved numerically for xˆ∗dkS . Algorithm 5 shows theproposed message-selection based delivery scheme.Simplified messages selectionA simplified version of the message selection step can be formulated by only takingthe number of distinct requests L into account, and ignoring the redundancy patternof the demand vector. Then, because of the symmetry, we set xˆnS = xˆs for all nand all S : |S| = s. This leads tominimizexˆsLy0 +K−1∑s=1(Ks+ 1)xˆssubject toK∑s=0(Ks)xˆs = 10 ≤ xˆs ≤ xs, s = 1, ...,K0 ≤ xˆ0 ≤ 1(3.8)as the simplified message selection problem.26Proposition 1. Let sˆ = bK−LL+1 c. Optimal parameters for the simplified messageselection problem of (3.8) are given byxˆs =∑i=1,...,sˆ(Ki)xi, s = 00, s = 1, ..., sˆxs, s = sˆ+ 1, ...,K. (3.9)Proof. If we transfer bits from the subsets XnS : |S| = s to Xn∅ , the resultingchange in the rate will be L(Ks)xs −(Ks+1)xs. We transfer the bits only if thisdifference is negative. This is the case when s ≤ sˆ. This results in the parametersof (3.9).Algorithm 6 shows the delivery scheme based on the simplified message selectioncriterion.Algorithm 6 Simplified Adaptive Delivery AlgorithmRequire: {XnS}n,S // From the placement phase1: Procedure SimplifiedAdaptiveDelivery(d1, ..., dK)2: C ← unique(d1, ..., dK) // Set of distinct files requested3: L = |C| // Number of distinct requests4: sˆ← bK−LL+1 c5: for dk ∈ C do6: Xˆdk∅ ← ∪S:s≤sˆXdkS // Corresponds to the first rule of (3.9)7: for S ⊂ [K] : |S| > 0 do8: if |S| ≤ sˆ then9: Xˆdk∅ ← ∅ // Corresponds to the second rule of (3.9)10: else11: XˆdkS ← XdkS // Corresponds to the third rule of (3.9)12: end if13: end for14: end for15: Use [12, Algorithm 1] with {XˆnS}n,S instead of {XnS}n,S27Servercache3cache2cache1cache4cache5X1 X2 · · · XbN/scFigure 3.1: An example of a cutset separating the caches and the server from theusers of the caches in S = {1, 2, 4}. Solid (dashed) lines represent the in-formation flow to the users selected (not selected) in the cutset. Here s = 3and there are bNs c server messages. Users with the same color have identicalrequests. Notice that no two users with the same request are picked in S. HereK = 5.3.3 A cutset bound on delivery rate of duplicate demandsLet R∗L(M) denote the minimum rate that is achievable for every possible demandvector with L distinct requests. More specifically, let AL be the set of all demandsvectors with L distinct files. For a placement PM that satisfies the capacity con-straint, i.e., stores an equivalent of at most M files per cache, letRL(PM ) = min{R ≥ 0|∀d ∈ AL : R(d;PM ) ≤ R},where R(d;PM ) is the amount of supplemental information that needs to be sentby the server over the shared link such that every cache can recover its requestedfile without error. Then:R∗L(M) = minPMRL(PM).28Proposition 2 gives a lower bound on R∗L(M).Proposition 2 (Cutset Bound). Assume thatK caches requestL ≤ K distinct files.Then, R∗L(M) must satisfyR∗L(M) ≥ maxs∈{1,...,L}(s− sbN/scM). (3.10)Proof. We modify the cutset bound argument of [11, Sec. VI] to bound the mini-mum delivery rate of the demand vectors with L ≤ K distinct requests.Let S be a subset of caches with |S| = s, such that there are no two cachesin S with identical user requests. Assume that these caches have requested files1, ... , s from the library of N files. Let X1 denote the server’s input to the sharedlink which determines files 1, .., s. Similarly, assume that the same users requestfiles (i− 1)s+ 1, ..., is and the server input Xi determines the files requested. Leti = 1, ..., bN/sc.Consider the cut separating X1, ..., XbN/sc and the caches in S from the corre-sponding users (see Fig. 3.1). Since we assume that all files are perfectly decodedwithout error, the total information available to the users in the cut should be morethan or equal to the total information requested by them. In other words,bN/scR∗L(M) + sM ≥ sbN/sc.Since s can accept any value between 1 and L, (3.10) results.3.4 Numerical explorationsWe now numerically explore the performance of the proposed adaptive methodsand the non-adaptive method of Algorithm 4. Notice that by the rate of non-adaptive method, we refer to the rate of [12] or [11] depending on whether thedecentralized or centralized placement is used, respectively. This rate is calculatedby (3.3).Fig. 3.2 shows the delivery rates of the non-adaptive and adaptive schemes,as well as the lower bound (3.10) for a network of K = 12 caches. The samedecentralized placement is used for all cases with the parameters in (3.6). We290 0.05 0.1 0.15 0.2 0.25 0.301234567q = M/NRate(files)◊◊◊◊◊*****°°°°°sssss◊◊◊◊*****°sssss◊ ◊◊◊*****°°sssss◊◊◊◊* * ***°°sssssL = 7 : (4, 3, 1, 1, 1, 1, 1)L = 5 : (4, 3, 3, 1, 1)L = 4 : (3, 3, 3, 3)L = 3 : (6, 4, 2)Non-AdaptiveSimplified AdaptiveAdaptiveLower Bound◊*s°Figure 3.2: Rates of different delivery schemes for K = 12.consider several redundancy patterns for the demand vector with different valuesof L. In Fig. 3.2, we observe a considerable improvement in the delivery ratefor M/N ≤ 0.25, when the adaptive methods are used. This improvement inthe rate is more considerable when L is smaller. For instance, the performancegap to the lower bound decreases by almost 50% when L = 3. Notice that for asymmetric redundancy patterns like (3, 3, 3, 3), both adaptive methods lead to thesame delivery rate. As the pattern gets more asymmetric, the gap between the ratesof the original and simplified adaptive methods increases. Also, observe that forsome cases, the rate of the non-adaptive method increases with the storage capacityfor small M/N . This shows the inefficiency of Algorithm 4 to deliver redundantrequests.For the second numerical example, we use the centralized placement and plotthe delivery rates resulted from the different methods versus L. Fig. 3.3 shows theresults. Notice that the rate of original adaptive method depends not only on L, butalso on the redundancy pattern. Hence, in this example, for every value of L, thedelivery rate of the original adaptive method is averaged over all the redundancypatterns with L distinct requests, assuming that the requests are independent andfile popularities are uniform. Observe that the superiority of the adaptive methodover the non-adaptive method of [11] is more significant for small L. In particular,we observe a sharp decrease in the adaptive delivery rate when L gets smaller than301 2 3 4 5 6 7 80.511.522.533.544.5LRate(files)****◊◊◊◊◊◊◊◊********ssssssss°°°° ° ° ° °◊ ◊ ◊ ◊ ◊ ◊ ◊ ◊**sss ss s s°° ° ° ° ° °Non-adaptiveSimplified AdaptiveAdaptive (Average)Lower Bound◊*s°M/N = 0.1M/N = 0.2Figure 3.3: Rates of different delivery schemes for K = 8.K/2. This suggests that the adaptive methods are considerably more effective forhighly redundant requests.We now investigate the average rates of the different delivery methods througha stochastic modelling of the dynamics of a caching network with correlated userrequests. Our simulations here provide more insights on the difference between thedelivery rates of the different algorithms as a function of the correlations amonguser demands. Consider a graph representation of the network where vertices rep-resent the caches. An (undirected) edge between two vertices shows that the re-quests of the corresponding caches are correlated. To model the correlations, letN (k) denote the set of the last files requested by the neighbour caches of cachek. We assume that cache k requests a file, either based on its neighbours’ previousrequests with probability r, or independently with probability 1− r. In the formercase, k chooses a file from N (k) uniformly at random. However, when choosingindependently, k picks a file n from the library based on the popularity distributionof files pn. Hence, the chance of requesting file n by cache k ispˆn,k =r/|N (k)|+ (1− r)pn, n ∈ N (k)(1− r)pn, otherwise . (3.11)For our simulations, we use Gibbs sampling [44, Sec. 24.2] to generate 104 sam-ple vectors from the induced joint distribution of user demands. We set K=8 and310 0.05 0.1 0.15 0.2 0.25 0.30.511.522.533.544.55q =M/NAverageRate(files)◊◊◊◊◊◊◊ ◊********ssssssss°°°°°°°°◊ ◊ ◊* *****ssssss°°°°◊ ◊◊* * * **sssssss°°°°Non-AdaptiveSimplified AdaptiveAdaptiveLower Bound◊*s°r = 0.7, θ = 0r = 0.9, θ = 0r = 0.9, θ = 0.75Figure 3.4: Average rates of the different delivery schemes for K = 8.(r, θ) (0.7, 0) (0.9, 0) (0.9, 0.75)Maximum ρˆij 0.19 0.34 0.34Average ρˆij 0.16 0.32 0.31Average L 4.80 3.41 3.18Table 3.1: User requests’ statistics in simulations of Fig. 3.4.N=103, and assume a complete graph for the network. Further, we mainly use uni-form distribution for the popularity of files. We also consider a scenario where theplacement phase is performed based on a uniform popularity distribution, while theactual file popularities in the delivery phase follow a non-uniform Zipf distribution[45] with parameter θ. Note that a Zipf distribution with θ = 0 is identical to theuniform distribution, and increasing θ makes the distribution more non-uniform.We use θ and r to control the popularity distribution and the dependency level ofthe users’ requests, respectively. To characterize the resulting correlation levelsamong the caches’ requests in our simulations, we empirically calculate the cor-relation coefficients −1 ≤ ρˆij ≤ 1 [46, Section 4.1] between the requests of thedifferent caches i and j. A larger ρˆij implies a higher chance that caches i and jrequest the same content, which leads to more redundancy in the demand vector.Table 3.1 presents the average and the maximum ρˆij over all the different i and jpairs (i 6= j), in our simulations.Fig. 3.4 shows the resulting average delivery rates. It also shows a lower bound32on the average rate calculated by averaging the lower bounds of (3.10) for the sam-ple demand vectors. We observe that as requests become more correlated (largerr) and the file popularities get more non-uniform (larger θ), the adaptive methodmakes a larger improvement in the rate. Also, observe that the adaptive schemesare effective in decreasing the average delivery rate for M/N < 0.25.3.5 Comparison to optimal coded caching with uncodedprefetching for uniform demandsIn 2018, reference [13], derived the optimal decentralized and centralized codedcaching schemes for the case of uniform demands and uncoded prefetching as ex-plained in Section 2.2.For the completeness of our discussions in this chapter, we also compare therate of the proposed adaptive method to the rate of the optimal caching schemewith uncoded prefetching for uniform demands. In Fig. 3.5, we compare the ex-pected rate of the non-adaptive delivery, adaptive delivery and the optimal deliveryof [13]. In the hybrid coded/uncoded delivery approach, we replace the sets ofcoded messages whose transmission is inefficient in the presence of redundant de-mands with uncoded messages. Instead, in [13], a subset of coded messages aretransmitted when demands are redundant, as they suffice to extract the required in-formation at each cache, at the expense of further processing of the server messagesby the caches. The latter approach benefits more from the coding opportunities andtherefore results in better delivery rates. For small M/N , the rates of our proposedhybrid method and the optimal method of [13] are close. This is because the cod-ing opportunities are limited and each message is mostly targeted towards a smallnumber of caches. Hence, transmission of uncoded messages instead of codedmessages provides more benefit when demands are redundant compared to extentof the loss of coding gain. For large M/N , however, the loss due to unused codingopportunities dominates and the gap between the hybrid coded/uncoded approachto the optimal delivery increases.33(a) M/N = 0.1 (b) M/N = 0.25(c) M/N = 0.5 (d) M/N = 0.75Figure 3.5: Comparison of the expected rate of the non-adaptive, adaptiveand optimal delivery for uniform demands for fixed number of distinctfiles. Here, K = 8.3.6 Concluding remarksIn this chapter, we identified an inefficiency in the core delivery algorithms ofcoded caching in the presence of duplicate requests in the demand vector. We pro-posed a method to improve the efficiency of the delivery messages by introducings step to decide which parts of the files need to be sent by coded messages andwhich parts should be delivered uncoded, based on the redundancy pattern in thedemand vector. We discussed the improvement that the proposed method offers34over the benchmark algorithms and also compared its performance to the optimaldelivery scheme. An important contribution of this chapter, was the formulationof delivery rate in terms of placement parameters. This allows optimization of theplacement parameters such that the delivery rate is minimized. A generalized ver-sion of this formulation is the foundation of our work in Chapter 4 for placementoptimization.35Chapter 4Coded Caching for NonuniformDemands4.1 IntroductionThe seminal works [11] and [12] designed coded caching methods that minimizethe peak delivery rate, i.e., the delivery rate when all demands are distinct. In thecase of uniform demands, for a fixed number of caches K, the average deliveryrate approaches the peak delivery rate as the the number of files increases, i.e., asN → ∞. Hence, the caching schemes of [11] and [12] are also suitable for uni-form demands. However, a more practical scenario concerns caching of files withdifferent popularity levels, i.e., the probabilities of different files being requestedare different.In its most general form, not only the file request distribution can be nonuni-form, it can even be different from one cache to another. This is particularly rele-vant if each cache is serving a small number of users and therefore the identity ofpopular files is closely tied to the preferences of the individual users covered byeach cache. In such a scenario, the design of coded caching is challenging. On onehand, more global caching gain is achievable if different caches store files sym-metrically from the same library of files, on the other hand, such a scheme cannotbe sensitive to the asymmetries introduced by the different local demand distri-butions. A simpler scenario is the case where the user demands are nonuniform,36but the request distribution is the same for the different caches. This is reasonableassumption when each cache serves a large number of users randomly selectedfrom the population regardless of their demographics and preferences. Even in thiscase, the optimal coded caching strategy is difficult to characterize because of thecomplex relationship between the placement strategy and the delivery rate. Intu-itively, it is expected to allocate more storage to the caching of the more popularfiles during placement. This idea is followed in several works in the literature [14–16, 26, 47, 48]. In this chapter, we focus our attention to derive the optimal fileplacement for coded caching with nonuniform demands where the demand distri-bution is identical for different caches.4.1.1 Related workTwo major approaches are followed for coded caching with nonuniform demands.The first approach is based on the grouping of files into different popularity groupsbased on their request probabilities [14–16]. With the files in each group hav-ing relatively similar request probabilities, the Structured Clique Cover (Algo-rithm 4) (SCC) algorithm is applied separately to the requests belonging to eachgroup for delivery. The advantage of this method is the simplicity of the analy-sis of the expected rate. Its main disadvantage is that it limits the utilization ofcoding opportunities to the files that are within each popularity group. The designobjective in this approach is to find the grouping of files that achieves the lowestexpected rate. A higher number of groups provides higher degrees of freedom toassign different amounts of storage to files with different request probabilities. Onthe other hand, the larger the number of groups is, the more underutilized are thecoding opportunities among the different groups. In [15], the library is groupedinto two categories of popular and unpopular files. The requests for popular filesare delivered by the SCC algorithm while the requests of unpopular files are deliv-ered through uncoded messages. This is an extreme case of the grouping approachand its expected rate is shown to be at most a constant factor away from the infor-mation theoretic lower bound, independent of the file popularity distribution.The second approach is followed in [47] and [48] which applies the SCC al-gorithm to all the user demands regardless of their request probabilities and the37amount of storage allocated to each file. For any fixed placement of content, thisdelivery scheme outperforms the previously discussed group-based delivery. How-ever, optimization of the amount of memory allocated to each file is challengingbecause of the complicated interplay between the memory allocation parametersand the expected delivery rate. References [47] and [48] use a convex optimiza-tion formulation of the memory allocation problem which aims to minimize theexpected delivery rate for a given storage capacity per cache. We refer to this prob-lem as Rate Minimization with Storage Constraint (RMSC). Reference [48] hasfollowed a numerical approach to solve the RMSC problem and is mainly focusedon reducing the computational complexity of the numerical analysis involved. Incontrast, [47] follows a theoretical approach to find structural properties in the op-timal solution of the problem.4.1.2 Our contributionsThe results provided in [47] do not capture specific properties of the optimal solu-tion which can considerably facilitate solving the memory allocation problem. Inthis work, we find such structures in the optimal solution and solve the RMSC prob-lem analytically when user demands are nonuniform and the SCC procedure is usedfor delivery. In particular, we will show that such properties enable the derivationof the optimal solution based on a search over a finite set of points. The cardi-nality of this set scales linearly with the product of the number of caches and thenumber of files. The properties that we derive also provide a unifying interpreta-tion of the optimal placement strategy for both uniform and nonuniform popularitydistribution of files, as we will discuss in the remainder of this section.As the first structural property, we show that for instances of the problem withcache capacities that belong to a finite set M, the optimal placement for RMSCfollows the simple pattern of splitting the library of files into two groups. Onegroup consists of less popular files and the files in this group are not cached at all.The files in the second group are cached but are treated as if they had the samerequest probabilities. We call such instances of RMSC the base-cases. We proposean algorithm to deriveM for any given file request distribution, number of cachesand number of files.For instances of the problem that are not among the base-cases, we prove that38the optimal solution is achieved by a convex combination of the solutions to certainbase-cases of the problem. This solution is identical to the placement parametersobtained by memory sharing between the two base-cases of the RMSC problem.Memory sharing is already shown to be optimal when demands are uniform [47,Lemma 5] and [48, Theorem 1]. Hence, this result shows that memory sharingis also optimal for nonuniform demands when applied to the appropriately choseninstances of the problem. To elaborate, recall that K, N and M are the numberof caches, files in the library and files that each cache can store, respectively. Foroptimal placement of identically popular files when SCC delivery is used, we havethe following [47–49]:• All files are treated identically during placement, in particular, the sameamount of storage is allocated to the caching of each file.• For a cache size M that corresponds to an integer value of t = KMN , the op-timal placement breaks each file into(Kt)non-overlapping segments. Then,it exclusively stores each one of the segments in exactly one of the(Kt)sub-sets of caches that have cardinality t. We refer to these cases of the problemas the base cases and denote by M the set of corresponding cache sizes{0, 1KN, 2KN, . . . , N}.• For other cache capacities, the optimal placement can be obtained by mem-ory sharing between the optimal placements for two instances of the problemwith cache capacities Ml = max{m ∈M | m < M} and Mu = min{m ∈M |M < m}.1We prove that a similar pattern exists in the optimal placement for nonuniformdemands. In particular, we propose an algorithm with worst-case complexity ofO(K2N2) to derive the setM given a nonuniform popularity distribution for thefiles. If M 6∈ M, the optimal placement is obtained by memory sharing betweenMl,Mu ∈M as it was done for uniform demands using the derived setM. In thiscase, optimal placement either follows a two or a three group strategy dependingon the specifics of the two corresponding base-cases used.1The idea of memory sharing for uniform demands was presented in [11] as an achievable schemewhen t is not an integer. References [47–49] proved that memory sharing is optimal for SCC deliverywhen demands are uniform.39For the optimal placement that we derive, the memory allocated to differentfiles does not show a gradual and smooth increase as the request probability in-creases. Instead, for base-cases where the two-group strategy is optimal, the mem-ory allocation exhibits a binary behavior, i.e., as the request probability increasesthe amount of memory allocated to the files shows an abrupt increase from zeroto a new level at a certain request probability and remains at that level thereafter.A similar trend exists for non base-cases, but there might be two thresholds onrequest probabilities where the jumps in the memory allocated to files occur.Finally, we find the results in [15] closely connected to the results that we de-rive in this paper. Reference [15] considers the setting of randomized placementalgorithms and within that setting, it shows that a two-group (or occasionally athree-group) strategy guarantees a delivery rate within a constant factor of the in-formation theoretic lower bound. In this work, we derive a deterministic placementof files in the caches which solves the RMSC problem, i.e., we analytically proveits optimality when the SCC delivery algorithm is applied to all user demands.We prove that the expected delivery rate of our optimal solution is a lower-boundon the expected delivery rates of the group-based methods that apply the SCC al-gorithm within each group of requested files for delivery. Given that the codedcaching in [15] follows such a scheme and its rate is within a constant factor fromthe information-theoretic lower bound, it concludes that the delivery rate of RMSCis also within a constant factor of the optimum delivery rate. Through numeri-cal examples we show that the grouping of files given by these two schemes andthe delivery rates resulting from them can be significantly different for specificregimes of problem parameters. Further, we compare the expected rate of RMSCto the information-theoretic outer-bound on the expected rate of caching schemeswith uncoded prefetching derived in [50]. This comparison suggests that the ex-pected rate of RMSC approaches the outer-bound as the cache size increases. Weprovide a detailed discussion to show that the existence of this performance gap is,at least partially, due to an inefficiency in the SCC algorithm for delivery. We sug-gest directions for future research in order to reduce or fully close this performancegap.The remainder of this chapter is organized as follows. The setup of the prob-lem and formulation of the expected rate and the utilized storage in terms of place-40ment parameters are presented in Section 4.2. The RMSC problem is formulated inSection 4.3 and a duality framework is proposed for it. Structures in the optimalsolution of RMSC for the base-cases are derived in Section 4.4. In Section 4.5, wepropose an algorithm to identify the base-cases of the RMSC problem for any givenpopularity distribution of files and we derive the optimal solution of RMSC.4.2 Problem setup and formulation of utilized storage andexpected delivery rate4.2.1 Problem SetupConsider the canonical network of multiple caches and a central server as describedin Section 2.1. We use Notation 2 to describe an uncoded placement of content inthe caches. Further, similar to the demand vector d = [d1, . . . , dK ] which repre-sents the demands of all caches at the current time instant, we define the subdemandvector dS that determines the files requested by the caches in S ⊂ [K], in the sameorder as in d.We assume that requests for different files are independent and the requestprobabilities do not change for different caches. Let {pn}n∈[N ] represent the re-quest probabilities of the files. Here, files are sorted in the decreasing order ofrequest probabilities, i.e., n > m implies pn ≤ pm. We refer to the file requestprobabilities as popularity distribution.We are interested in minimizing the expected delivery rate Ed (R(d;P, AD)),where the expectation is over the randomness in the demand vector d.4.2.2 Delivery algorithmIn this work, we apply the SCC procedure in Algorithm 4 to all user demands fordelivery regardless of their popularity levels and the memory allocated to them.By following Algorithm 4, the server transmits messages of the form⊕k∈SXdkS\k (4.1)for every nonempty S ⊂ [K]. All the components XS\k embedded in the message41are zero-padded to the length of the largest component. Hence, the length of themessage is maxk∈S |XdkS\k|.2As mentioned in Section 4.1.1, Algorithm 4 contrasts the delivery schemes in[14, 15, 51] which are also based on the SCC procedure but separately apply it to thefiles with close request probabilities. Algorithm 4 has the advantage that it allowscoding among all files regardless of their request probabilities and can result in asmaller delivery rate. To elaborate, message (4.1) delivers every subset of bits in{XdkS\k}k∈S to the corresponding requesting cache in S. Given a grouping of filesinto groups l = 1, . . . , L, if instead of applying the SCC to the whole demand vectorwe applied it to subdemand vectors consisting of files in the same popularity group,the exact same subfiles delivered by (4.1) would have been delivered through the setof messages{⊕k∈SˆlXdkS\k}Ll=1where Sˆl = {k ∈ S|dk ∈ l-th group}. This mes-sage has length∑Ll=1 maxk∈Sˆl |XdkS\k|3 which is lower bounded by maxk∈S |XdkS\k|which is the length of (4.1) with SCC applied to the whole demand vector. A directconsequence of this argument is that with an optimal placement for Algorithm 4,its delivery rate would be a lower-bound on the delivery rates of caching schemeslike the ones in [14, 15, 51] which apply Algorithm 4 to subdemand vectors thatconsist of files that are in identical popularity groups. In particular, the fact thatthe delivery rate of [15] is within a constant factor of the information-theoreticlower-bound [15, Section III] implies that the delivery rate of Algorithm 4 withthe optimal placement that we derive here is also within a constant factor of theinformation-theoretic minimum rate.4.2.3 Formulation of expected delivery rate and storageTo derive optimal placement for SCC delivery, we need to formulate the expecteddelivery rate and the storage used by the placement algorithm in terms of the place-ment parameters xnS .2From a graph theoretic perspective, this message corresponds to XORing the requested subfilesthat form a clique in the side information graph [21, Section II.A] and [7, Section I.A]. Since theset of messages ⊕k∈SXdkS\k delivers all the missing subfiles, it covers all the vertices in the sideinformation graph. Hence, one can see the delivery procedure of [12] as a structured way of coveringthe side information graph vertices with cliques.3This is because the files within each group have the same placement parameters and thereforemaxk∈Sˆl |XdkS\k| = |XdkS\k|, k ∈ S.42Expected delivery rateFor Algorithm 4 as the delivery algorithm, the delivery load isR(d;x) =∑S:S⊂[K]S6=∅maxk∈SxdkS\k (4.2)for a given demand vector d and placement parameters xnS . To formulate the ex-pected delivery rate in terms of the placement parameters, letRS(d;x) = maxk∈S xdkS\kdenote the rate required to deliver the subfiles that are exclusively stored in thesubset of caches S. Then, the expected rate with respect to randomness in userdemands isr(x) =∆ Ed (R(d;x)) =∑S:S⊂KS6=∅Ed(RS(d;x)).We assumed that the popularity distribution of files is the same for differentcaches. We use this symmetry in the demand probabilities of the different cachesto simplify the placement formulation by setting xnS = xns for all S : |S| = s. Inother words, for a given file, the portion of bits that is exclusively cached in anysubset of caches S with cardinality s is the same.Proposition 3. The assumption xnS = xns for all S : |S| = s is without loss ofoptimality for the RMSC problem.Proof. See Appendix A.1.Because of the symmetric structure of the placement, Ed(RS(d;x)) is thesame for all S : |S| = s, and it can be denoted by R¯s(x). Hence, the averagerate can be written asr(x) =K∑s=1(Ks)R¯s(x).Let Gs be the set of all subsets of [N ] with cardinality at most s. Let pigs denote the43probability that files g ∈ Gs be requested by a set of caches S with |S| = s. Then,R¯s(x) =∑g∈Gspigs maxn∈g xns−1and therefore, the expected delivery rate isK∑s=1(Ks)∑g∈Gspigs maxn∈g xns−1,which can equivalently be written asK−1∑s=0(Ks+ 1) ∑g∈Gs+1pigs+1 maxn∈g xns .Storage used by placementThe total storage used by cache k ∈ [K] isN∑n=1∑S⊂[K]:k∈SxnS , (4.3)where under the symmetry condition xnS = xns for all S : |S| = s, this quantitysimplifies toN∑n=1K∑s=1(K − 1s− 1)xnsfor every cache. The inner sum is the storage that is assigned to file n in eachcache, as for each file n, each cache k stores the subfiles XnS : k ∈ S. There are(K−1s−1)subsets of [K] of cardinality s with this property for each file. The outersum adds up the storage used for all the files in the library.44Change of variable for placement parametersFor simpler exposition of the optimization problems and better interpretability ofthe results, we find it useful to use the change of variableyns =(Ks)xns . (4.4)Variable yns is the total portion of bits of file n that is cached exclusively in all the(Ks)different subsets of [K] with cardinality s. As argued in Section 4.2.1, we have∑S⊂[K] xnS = 1. Given that xnS = xn|S| and using the change of variable (4.4), itfollows thatK∑s=0yns = 1, ∀n ∈ [N ]. (4.5)As a result, the expected rate and storage can be formulated as functions of the newplacement parameters yns asr(y) =K−1∑s=0K − ss+ 1∑g∈Gs+1pigs+1 maxn∈g yns , (4.6)m(y) =N∑n=1K∑s=1sKyns . (4.7)Notice that the expected rate and the amount of storage used are a convex and alinear function of the placement parameters, respectively.4.3 Formulation of RMSC and characterization of its dualproblem4.3.1 Formulation of RMSC in terms of the placement parametersUsing (4.4)-(4.7), the problem of finding the storage parameters that minimize theexpected delivery rate under the cache capacity constraint can be formulated as45minyK−1∑s=0K − ss+ 1∑g∈Gs+1pigs+1 maxn∈g yns (4.8a)s.t.N∑n=1K∑s=1sKyns ≤M, (4.8b)K∑s=0yns = 1, n ∈ [N ], (4.8c)yns ≥ 0, n ∈ [N ], s = 0, . . . ,K. (4.8d)4.3.2 Duality framework and derivation of Joint Rate and StorageMinimization problemOptimization problem (4.8) is convex and Slater’s condition holds for it as all in-equality constraints are affine [43, Section 5.2.3]. Hence, with (4.8) as primal, theduality gap between the primal and the corresponding dual problem is zero [43,Section 5.2]. To derive the dual problem, we form the Lagrangian that accounts forthe capacity constraint (4.8b) asL(y, γ) =K−1∑s=0K − ss+ 1∑g∈Gs+1pigs+1 maxn∈g yns + γ(N∑n=1K∑s=1sKyns −M)which results in the Lagrange dual functiong(γ) = minyL(y, γ)s.t.K∑s=0yns = 1, n ∈ [N ],yns ≥ 0, n ∈ [N ], s = 0, . . . ,K.(4.9)46Then, the corresponding dual problem will bemaxγ ≥ 0 g(γ). (4.10)By dropping the terms that are independent of the placement parameters, (4.9) hasthe same minimizers asminyK−1∑s=0K − ss+ 1∑g∈Gs+1pigs+1 maxn∈g yns + γN∑n=1K∑s=1sKyns (4.11a)s.t.K∑s=0yns = 1, n ∈ [N ], (4.11b)yns ≥ 0, n ∈ [N ], s = 0, . . . ,K. (4.11c)We call (4.11) the Joint Rate and Storage Minimization (JRSM) problem, as the ob-jective is to minimize the total bandwidth (expected delivery rate) and storage costof coded caching. Following the standard interpretation of the Lagrange multipli-ers, parameter γ can be viewed as the relative cost of storage per file. Moreover,since strong duality holds, for each storage capacity M , the optimal dual variableγ∗(M) determines a pricing of the storage for which there exists the same mini-mizer to both the RMSC problem (4.8) and the Lagrangian minimization problemin (4.9) (or equivalently the JRSM problem in (4.11)). As a result, we derive theoptimal solution of JRSM in Section 4.4 as an intermediate step in solving RMSC.4.4 Optimal solution to JRSMFinding an analytical solution to (4.11) is challenging because of the presence ofthe max functions that operate over overlapping sets of parameters in the objec-tive. These parameters are tied together by constraints (4.11b) for different valuesof s. The interplay between the outputs of the max function applied to the over-lapping groups under constraints (4.11b) makes the analysis difficult. To facilitatethe analysis, we establish a connection between the nonlinear part of (4.11a) andsubmodular set functions. This allows us to benefit from the results in submodularfunction analysis to find structures in the optimal solution to JRSM.474.4.1 An equivalent formulation of JRSMThe placement parameters corresponding to s = 0 are {yn0 }n∈[N ], which determinethe portion of bits that are not stored in any cache for each file n. Also, each setg ∈ G1 includes exactly one file, say g = {i}. Hence, maxn∈g yn0 = yi0 andpig0 = pi. Thus, the objective function (4.11a) can be written asN∑n=1Kpnyn0 +K−1∑s=1K − ss+ 1∑g∈Gs+1pigs+1 maxn∈g yns +sKγN∑n=1yns+ γ N∑n=1ynK .Notice that the first and last sums are in terms of parameters yn0 and ynK , re-spectively, while the summation in the middle accounts for parameters yns fors ∈ [K − 1].Lemma 1. At optimality,∑Nn=1Kpnyn0 +γ∑Nn=1 ynK can be written as∑Nn=1(Kpnαn+γ(1 − αn))zn where zn = yn0 + ynK , and αn = 1 if Kpn < γ and αn = 0 ifKpn ≥ γ.Proof. For a fixed value of zn, we have∑Nn=1Kpnyn0 +γ∑Nn=1 ynK = γ∑Nn=1 zn+∑Nn=1(Kpn− γ)yn0 . Hence, if Kpn < γ, setting yn0 = zn and ynK = 0 leads to thesmallest objective and if Kpn ≥ γ, the smallest objective results for yn0 = 0 andynK = zn.Corollary 1. For some m ∈ {0, . . . , N}, we have αn = 1, n > m and αn =0, n ≤ m.Using Lemma 1, and the fact that zn = 1−∑K−1s=1 yns , we getminy˜,αK−1∑s=1K − ss+ 1∑g∈Gs+1pigs maxn∈g yns +N∑n=1K−1∑s=1[( sK− 1 + αn)γ −Kpnαn]yns + l(α)(4.12a)s.t.K−1∑s=1yns ≤ 1, n ∈ [N ], (4.12b)yns ≥ 0, n ∈ [N ], s ∈ [K − 1], , (4.12c)αn ∈ {0, 1}, n ∈ [N ], (4.12d)48as a problem equivalent to (4.11), where y˜ is the same as y, except for parametersyn0 and ynK that are removed, and l(α) = K∑Nn=1 αnpn + γ∑Nn=1(1− αn).To find structures in the optimal vector y˜, assume that the optimal parametersα∗n are known. Then, the optimization problem for y˜ becomesminy˜, tt+N∑n=1K−1∑s=1[( sK− 1 + α∗n)γ −Kpnα∗n]yns (4.13a)s.t.K−1∑s=1K−ss+1∑g∈Gs+1pigs maxn∈g |yns | ≤ t, (4.13b)K−1∑s=1yns ≤ 1, n ∈ [N ], (4.13c)yns ≥ 0, n ∈ [N ], s ∈ [K − 1]. (4.13d)In constraint (4.13b), we used maxn∈g |yns |, which is the l∞-norm instead of maxn∈g yns .This does not affect the optimal solution as the two functions are equivalent in thenonnegative orthant specified by (4.13d) but makes the LHS in form of the l∞-norm in Proposition 4 of Section 4.4.2.Notice that objective function (4.13a) is linear, and both the objective functionand the constraints are in terms of parameters yns for s ∈ [K − 1], n ∈ [N ]. Fora linear objective function, if the feasible set is convex and bounded with a finitenumber of extreme points, then there exists an extreme point that is optimal [52,Section 2.5]. In the following, we will show that the feasible set defined by (4.13b)-(4.13d) satisfies these properties for any given value of t, and in particular for t∗ atoptimality, and derive structures in its extreme points. Any such structure readilyimplies a structure in at least one optimal solution to (4.13).4.4.2 Review of submodular functions and relevant resultsWe review the definition of a submodular set function and present the results thatare related to our analysis in Section 4.4. An extended discussion can be found in[53].Definition 1. Let V = {1, . . . , p} be a set of p objects. Forw ∈ Rp, Supp(w) ⊂ V49denotes the support of w, defined as Supp(w) = {j ∈ V,wj 6= 0}.Definition 2. (Submodular function) A set-function F : 2V → R is submodularif and only if, for all subsets A,B ⊂ V , we have F (A) + F (B) ≥ F (A ∪ B) +F (A ∩B).Definition 3. (Lova´sz extension) Given a set-function F such that F (∅) = 0, theLova´sz extension f : Rp+ → R of F is defined asf(w) =p∑k=1wjk [F ({j1, . . . , jk})− F ({j1, . . . , jk−1})] ,where w ∈ Rp+, (j1, . . . , jp) is a permutation such that wj1 ≥ . . . ≥ wjp .Consider vector δ ∈ {0, 1}p as the indicator vector for subset A ⊂ V , i.e, fori ∈ V , δi = 1 if and only if i ∈ A. Consequently, A is the support of δ. Noticethat for the Lova´sz extension, f(δ) = F (Supp(δ)). Hence, f can be seen as anextension of F from vectors in {0, 1}p to all vectors in Rp+. The Lova´sz extensionf has the following properties: 1) it is piecewise-linear, 2) when F is submodular,f is convex, and 3) minimizing F over subsets, i.e., minimizing f over {0, 1}p, isequivalent to minimizing f over [0, 1]p.Definition 4. (Stable Sets) A set A is stable if it cannot be augmented withoutincreasing F , i.e., if for all sets B ⊃ A, B 6= A, then F (B) > F (A).Definition 5. (Separable Sets) A set A is separable if we can find a partition ofA into A = B1 ∪ . . . ∪ Bk such that F (A) = F (B1) + . . . + F (Bk). A set A isinseparable if it is not separable.Proposition 4. [53, Section 4.1] For a set of objects V and a nonnegative set-function d(·), let Ω(w) = ∑G⊂V d(G) ‖wG‖∞ for w ∈ Rp. Function Ω(w) isa norm if ∪G,d(G)>0G = V and it corresponds to the nondecreasing submodularfunction F (A) =∑G:A∩G 6=∅ d(G).Proposition 5. [53, Proposition 2]) The extreme points of the unit ball of Ω arethe vectors 1F (A)v, with v ∈ {−1, 0, 1}p, Supp(v) = A andA a stable inseparableset.504.4.3 Connection to submodular functionsTo find the extreme points of the region characterized by (4.13b), we establish alink to submodular functions. Let us define functionfc(y˜) =∆K−1∑s=1K−ss+1∑g∈Gs+1pigs maxn∈g |yns |.The subscript c is used to highlight that this function returns the average rate dueto the delivery of the bits that are cached in at least one of the caches in the system.We show that fc(y˜) is the Lova´sz extension of a submodular set function. For that,consider the set [(K−1)N ]. For each s ∈ [K−1], objects (s− 1)N + 1, . . . , sNcorrespond to files 1, . . . , N , respectively. Notice that these objects belong to[N ](s−1)N .To define the corresponding set function, let us introduce the following for anys ∈ [K − 1] and g ∈ Gs+1:• Operator u(s, g) that gives the set g˜ = {(s− 1)N + n | n ∈ g} as output.This is basically a mapping from the files in g and set sizes s to the objectsin [(K − 1)N ]. Notice that any resulting set g˜ is a subset of [N ](s−1)N forexactly one s.• Sets G˜s+1 = {u(s, g) | g ∈ Gs+1} and G˜ =⋃s∈[K−1] G˜s+1.• The inverse operators s−1(g˜) and g−1(g˜) that for g˜ ∈ G˜ return the uniques that satisfies g˜ ⊂ [N ](s−1)N , and the set g = {n | s−1(g˜)N + n ∈ g˜},respectively.• Weightsηg˜ =K − s−1(g˜)s−1(g˜) + 1pig−1(g˜)s−1(g˜) (4.14)for all g˜ ∈ G˜. Notice that when |g˜| = 1, g−1(g˜) = {i} which is a singleton.In that case, pig−1(g˜)s−1(g˜) = pi.51Using the operators and parameters defined above, fc(y˜) can be written asfc(y˜) =∑g˜∈G˜ηg˜ maxn∈g−1(g˜)|yns−1(g˜)|. (4.15)Notice that (4.15) has the form of the norm function in Proposition 4 and as a directconsequence we have the following proposition:Proposition 6. Function fc(y˜) is a norm and is the Lova´sz extension of the sub-modular functionFc(A) =∑g˜∈G˜:A∩g˜ 6=∅ηg˜, (4.16)where A ⊂ [(K − 1)N ].From Proposition 6, one concludes that constraint (4.13b) characterizes a norm-ball of radius t.For A ⊂ [N ](s−1)N , let us define P (A) =∑n∈g−1(A) pn. Then, for the ex-treme points of the norm-ball, we have the following lemma.Lemma 2. The extreme points of the norm-ball fc ≤ t are of the formtK−ss+1 [1− (1− P (A))s+1]v, (4.17)where vector v ∈ {−1, 0, 1}KN , Supp(v) = A, and set A is a subset of [N ](s−1)Nfor an s ∈ [K − 1].Proof. See Appendix A.2.Corollary 2. For an extreme point y˜ of the norm-ball defined by (4.13b), for allyns > 0, we have s = sˆ, for exactly one sˆ ∈ [K − 1].Theorem 1. There is an optimal solution to the JRSM problem in (4.11) which isof form(yns )∗ =1, s = 0, n ∈ [N − n∗]n∗ ,1, s = s∗, n ∈ [n∗],0, otherwise,(4.18)52for some s∗ ∈ [K] and some n∗ ∈ {0, . . . , N}.4Proof. See Appendix A.3.Theorem 1 implies that for every γ ≥ 0 there exists an optimal solution to theJRSM problem that is integral. For better illustration, we write such an optimalvector y in the matrix form Y as follows. Matrix Y has N rows corresponding tothe files and K + 1 columns corresponding to the cardinality of subsets of caches.In particular, Yn,s = yns , n ∈ [N ], s = 0, . . . ,K. Based on the structures found foryns in Theorem 1, Y is of formY =0 1 ... s∗−1 s∗ s∗+1 ... K1 0 0 . . . 0 1 0 . . . 02 0 0 . . . 0 1 0 . . . 0.......... . ........... . ....n∗−1 0 0 . . . 0 1 0 . . . 0n∗ 0 0 . . . 0 1 0 . . . 0n∗+1 1 0 . . . 0 0 0 . . . 0.......... . .......... . . ....N 1 0 . . . 0 0 0 . . . 0, (4.19)i.e., i) all entries are either 0 or 1, ii) each row has exactly one entry 1, iii) at mostone column with index s ≥ 1 has nonzero entries, iv) for that column all entriesare 1 for rows 1, . . . , n∗ and 0 for rows n∗ + 1, . . . , N , for some n∗ ∈ [N ]. Asa result, for all values of γ ≥ 0 there exists an optimal solution with matrix form(4.19). Hence, we have the following corollary:Corollary 3. There exists a finite set Y∗ of vectors that correspond to matricesof form (4.19) where |Y∗| ≤ KN + 1 and that set includes at least one optimalsolution to the JRSM problem (4.11) for every γ ≥ 0.5The structure of the optimal solution to JRSM in Theorem 1 has direct implica-tions about the solution of the RMSC problem. In particular, based on the duality4Notice that n∗ = 0 corresponds to the case where for all n ∈ [N ]: yn0 = 1 and yns = 0, s > 0.5We show in Appendix A.5 that for specific values of γ, there are infinite number of solutions tothe JRSM problem.53framework detailed in Section 4.3.2, for the optimal Lagrange multiplier γ∗, if theoptimal solution to JRSM with the structure in Theorem 1 uses a storage equal toM in RMSC, this solution is also optimal to the RMSC problem. This implies thatfor certain storage capacities in the RMSC problem, its optimal solution has thesame structure as in Theorem 1. In Section 4.5, we fully investigate the solutionto the RMSC problem for the general storage capacity M and its connection to thesolution of the JRSM problem.4.5 Optimal solution to RMSC4.5.1 Optimal solution of RMSC in terms of optimal JRSM solutionAssuming that the optimal dual parameter γ∗ is known, we derived structures inthe minimizers of the Lagrangian or equivalently in the optimal solution of JRSM.The derived structures limited the search space for the optimal JRSM solution toKN + 1 vectors specified by Theorem 1. In this section, we derive the optimalsolution to RMSC by building on the results we derived for the solution of JRSM inTheorem 1. For that, let us define two sets as follows:Definition 6. SetsM andR are finite sets defined by storage values {m(y) | y ∈Y∗} and expected rates {r(y) | y ∈ Y∗}, respectively.To characterize the solution of RMSC, we take the following two steps. First, weassume that set Y∗ and consequently setM are known. Based on this assumption,we derive the optimal dual parameter γ∗ as a function of storage capacity M in theprimal problem. Second, we use the derived γ∗-M relationship to find set Y∗ andderive the optimal solution to RMSC.Lemma 3. The optimal dual parameter γ∗ and the storage capacity M in theprimal RMSC problem satisfy the following:1. Parameter γ∗ is non-increasing in M ;2. For certain storage capacities M , a continuum of dual parameters γ∗ areoptimal;540 1 2 3 4 5 600.511.522.533.54Mγ∗Figure 4.1: Here, N = 5, K = 4, and the popularity distribution is Zipf withparameter α = 1.3. For every two consecutive values M1,M2 ∈ M,M1 < M2, any M ∈[M1,M2] leads to the same dual optimal parameter γ∗.Proof. See Appendix A.4.Lemma 3 implies a stairwise relationship between the optimal dual parameterγ∗ and the storage capacity M in the primal problem. An illustration of this rela-tionship is shown in Fig. 4.1. The fact that γ∗ is non-increasing in M is in agree-ment with the interpretation of the Lagrange multiplier γ∗ as the (relative) price perunit of storage [43]: as more storage becomes available, the storage price remainsthe same or decreases. The second point in the lemma corresponds to the verticalline segments in Fig. 4.1. Based on Definition 6, setM which is derived from Y∗is finite and has at mostKN+1 members. However, γ is a continuous variable andfor every γ ≥ 0 there is an optimal solution to JRSM in Y∗. Hence, an interval of γvalues must map to the sameM . Third, a range of values ofM are mapped into thesame γ∗. Notice that parameter M in the primal problem can take any nonnegativevalue, whileM is a set of discrete values and is of finite size. Since every M ≥ 0corresponds to at least one optimal dual parameter γ, then a continuum of valuesfor M must map to the same γ∗. We show that parameter γ∗ and the two solutions55y1,y2 ∈ Y∗ that lead to the two endpoints (m(y1), γ∗) and (m(y2), γ∗) of theline segment are related by γ∗ = r(y1)−r(y2)m(y2)−m(y1) . Notice that m(y1),m(y2) ∈ Mand r(y1), r(y2) ∈ R. In particular, if we sort members ofM in increasing orderas 0 = M0 < M1 < M2 < . . . < Ml = N , then rates Ri that correspond tostorage values Mi follow ordering K = R0 > R1 > . . . > Rl = 0. Henceγ∗(M) ={[ Ri−Ri+1Mi+1−Mi ,Ri−1−RiMi−Mi−1 ], M = MiRi−1−RiMi−Mi−1 , Mi−1 < M < Miwith γ∗(M0 = 0) = [K−R1M1 ,+∞] and γ∗(Ml = N) = [0,Rl−1N−Ml−1 ].The next theorem determines the relationship between the optimal solution ofRMSC and the optimal solution of JRSM which was derived in Theorem 1.Theorem 2. The RMSC problem (4.8) has an optimal solutiony∗RMSC(M) ={y∗JRSM(M), M ∈MMu−MMu−Ml y∗JRSM(Ml) +M−MlMu−Ml y∗JRSM(Mu), M 6∈ M(4.20)where y∗JRSM(m) is the optimal solution of JRSM of the form in Theorem 1 that usesstorage M , and Ml and Mu are the largest element smaller than M and smallestelement larger than M inM, respectively.Proof. See Appendix A.5.Remark 1. (Two or three-group placement is optimal) Given the structure of theJRSM solutions in Theorem 1 (and eq. (4.19)), and the connection between theRMSC and JRSM solutions in Theorem 2, it can be concluded that always a two orat most a three group placement strategy is optimal for RMSC. In case M ∈ M,the placement follows a two-group strategy. In case M 6∈ M also two or three-group placement is optimal. This is because a linear combination of two solutionsof the form in eq. (4.19) can have non-zero entries at only the first column andat most two other columns. Following the notation used in eq. (4.19), let the twoJRSM solutions have non-zero entries as characterized by (s∗1, n∗1) and (s∗2, n∗2).Then, the linear combination of the two solutions results in a matrix where files aregrouped into at most three groups (files within each group have the same placement56parameters or equivalently the corresponding rows are the same): the ones in rows1 to min(n∗1, n∗2), min(n∗1, n∗2) + 1 to max(n∗1, n∗2) and max(n∗1, n∗2) + 1 to N . Incase n∗1 = n∗2, this reduces into two groups.Remark 2. (Optimality of memory sharing) Theorem 2 essentially extends a resultknown for the optimal solution of RMSC for uniform demands to the general casewhere demands can be nonuniform. To elaborate, it has been shown that for uni-form demands, if M ∈ {1, NK , 2NK , . . . ,K NK }, then the optimal solution of RMSCis in the form in (4.19) for some s∗ ∈ [K] and n∗ = N [47–49]. In particular, foruniform demandsMuniform = {0, NK , 2NK , . . . ,K NK }6. For other values of M , theoptimal solution could be obtained by memory sharing between the two values ofstorage inMuniform closest to M . Theorem 2 shows that the same result is validfor nonuniform demands except for the fact that n∗ might be any value between 0and N , depending on the popularity distribution of files.As a result, we propose the following terminology:Definition 7. For a given number of caches and popularity distribution of files, wecall setM the set of base-cases of the RMSC problem.Based on Theorem 1, for base-cases of RMSC, there exists an optimal solutionwhich is integral. Also, from Theorem 2, for other storage capacities, the optimalsolution to RMSC can be obtained by memory sharing between the solutions of twocertain base-cases.4.5.2 Algorithm to derive the set of base-case memory sizesMWe derive an algorithm with a worst-case complexity of O(K2N2) to find set Y∗and consequently M for any given number of caches and popularity distributionof files. WithM determined, Theorem 2 analytically solves the RMSC problem forany cache capacity.To find Y∗, we need to search over the KN + 1 possibilities for y∗ of form(4.18). For each such vector y, the storage it uses can be written as a convex com-bination of the storage used by two other vectors y1 and y2 that satisfy m(y1) ≤6In the case of M = 0, we have n∗ = 0 for the optimal RMSC solution.57Algorithm 7 Procedure to Determine the Set of Base-CasesMRequire: K,N, {pn}1: # Calculate Storage and Rate for the KN + 1 matrices of form (4.19)2: Y0 ← 0N×(K+1), Y0(1 : N, 0)← 1, M0 ← 0, R0 ← K3: for s = 1, . . . ,K do4: for n = 1 : N do5: i← (s− 1)N + n6: Yi ← 0N×(K+1), Y0(n+ 1 : N, 0)← 1, Yi(1 : n, s)← 17: Mi ← m(Y )8: Ri ← r(Y )9: end for10: end for11: (M,R, Y ) ← sortM (M,R, Y ) # relabel (Mi, Ri, Yi) tuples in increasing or-der of Mi12: # Build M, R and Y by keeping solutions that outperform memory sharingbetween other cases13: (Y0,M0,R0)← (0N×(K+1), 0,K)14: c← 015: for i = 1, . . . , NK + 1 do16: base← True17: for j = i+ 1 : NK + 1 do18: Rmsh ← Mj−MiMj−McRc +Mi−MjMj−McRj19: if Rmsh <= Ri then20: base← False # If memory-sharing results in a smaller or the same rate,21: (Mi, Ri, Yi) tuple does not represent a base case22: break23: end if24: end for25: if base then26: c← c+ 127: (Yc,Mc,Rc)← (Yi,Mi, Ri)28: end if29: end form(y) and m(y2) ≥ m(y). In other words, m(y) = m(θy1 + (1 − θ)y2), 0 ≤θ ≤ 1.7 If for such y1,y2, we further have r(y) ≥ r(θy1 + (1 − θ)y2), theny does not belong to Y∗. Hence, by removing such vectors form the KN + 17except for the two vectors with m(y) ∈ {0, N}.58Figure 4.2: The effect of cache size on expected delivery rate and the amountof storage used to cache subsets of files that are exclusively stored insubsets of caches with cardinalities 1, . . . ,K for K = 5, N = 10.Here, the popularity of files follows a Zipf distribution with parameter1.4.possibilities, the remaining set of vectors constitutes Y∗. The BASE procedure inAlgorithm 7 implements this process by starting from Y with all entries in the firstcolumn equal to 1. This correspond to storage 0 and rate K and belongs to Y∗. Itthen proceeds to the next y with the smallest storage value. It checks whether itoutperforms memory sharing between the y that is already in Y∗ with the largeststorage and every remaining vector that uses more storage compared to y. If thatis the case, it adds the new vector to Y∗, otherwise drops the vector and proceedsto the next vector.Fig. 4.2 shows the expected delivery rate of the proposed method versus thecache capacity for a nonuniform distribution of files that follows a Zipf density[45] with parameter 1.4. The expected rate is once calculated based on the solutionobtained by Algorithm 7 and once by solving RMSC numerically. We observethat the resulting optimal rates are in complete agreement. Fig. 4.2 also showsthe amount of storage used to cache subsets of files that are exclusively stored insubsets of caches with different cardinalities s ∈ [K]. In other words, for each s,59it shows Qs =∆ ∑Nn=1sK yns as a function of the cache capacity. As we expect fromour analysis, either one or two values of Qs can be positive for each choice of M .4.5.3 Numerical explorationFig. 4.3 shows the joint effect of the nonuniformity of the file request probabili-ties and the cache size M . The nonuniformity of the probability mass function iscontrolled by parameter α of Zipf distribution [45]. Fig.4.3a shows the expectedrate of RMSC for 0 ≤ α ≤ 2 normalized by the expected rate of SCC for the cor-responding value of α when the the placement for uniform demands is used. Thisnormalization puts the curves for different cache sizes in the same scale for bettervisual clarity and more significantly makes the curves more interpretable. In partic-ular, we observe that for a fixed cache size, the normalized expected rate remainsalmost equal to 1 as α is increased from 0 up to some threshold value. In otherwords, the optimal the delivery rate for a slightly nonuniform popularity distribu-tion is almost identical to the delivery rate given by the placement that treats thefiles as if they were uniformly popular. The threshold value of α depends on theavailable cache capacity. In general, as the Zipf distribution gets more heavy-tailed(smaller α) the normalized rate gets closer to 1. Fig. 4.3b shows the expected de-livery rate of RMSC versus cache size for different values of α. Consistent with ourprevious observation, for heavy-tailed popularity distributions, like the cases withα = 0, 0.5, 0.75, the optimal delivery rates are close to each other and are onlydifferent when cache storage is scarce (M/N ≤ 0.25).Reference [15] also proposed the use of a two-group or three-group strategy forthe caching of files with nonuniform demands. Fig. 4.4 compares the delivery ratesobtained by RMSC to those obtained by [15]. Accordingly, Table 4.1 provides thegroupings of the files made by the two techniques, where n∗ and N1 represent theindex of the last popular files given by RMSC and [15], respectively.8 Accordingly,R∗ andR1 represent the expected delivery rates of the two techniques. The superior8For the caching technique in [15], it is possible forN1 to be smaller thanM . In Table 4.1, this isthe case for M = 4 (M/N = 0.1). In such cases, part of the memory would remain unused if onlythe popular files were to be cached. In such a scenario, the authors suggest that each cache stores theentirety of the first M files, and if M is not an integer, use the remaining M − bMc capacity for thepartial caching of file bMc+ 1. In this scenario, uncoded messages are used to deliver the files thatare not fully cached. For more details refer to [15, Section V].600 0.5 1 1.5 200.10.20.30.40.50.60.70.80.91(a) Normalized expected rate versus the Zipf parameter for differentcache sizes0 0.2 0.4 0.6 0.8 100.511.522.533.54(b) Expected rate versus the normalized cache size for different ZipfparametersFigure 4.3: Joint effect of cache size and nonuniformity of the file requestprobabilities on the expected delivery rate. In (a), the expected rates arenormalized by the expected rate of the delivery algorithm SCC when theplacement for uniform demands is used instead of the optimal place-ment. Here K = 4 and N = 40.61performance of RMSC is evident in both Fig. 4.4 and Table 4.1. In Table 4.1,one observes that the set of popular files given by RMSC and [15] are identicalfor specific set of problem parameters but the delivery rates are different. This isbecause despite the identical grouping of files, the placement of the popular filesand the delivery algorithms are still different for the two techniques. In particular,[15] follows a randomized placement algorithm while RMSC uses a deterministicplacement technique. The randomized algorithm simplifies the placement at theexpense of a higher delivery rate.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 100.511.522.533.54M/NNormalizedExpectedRateRMSCScheme of Reference [15]Lower-bound of [50, Theorem 2]Lower-bound of [15, Theorem 1](a) α = 0.250 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 100.511.522.533.54M/NNormalizedExpectedRateRMSCScheme of Reference [15]Lower-bound of [50, Theorem 2]Lower-bound of [15, Theorem 1]](b) α = 0.50 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 100.511.522.533.54M/NNormalizedExpectedRateRMSCScheme of Reference [15]Lower-bound of [50, Theorem 2]Lower-bound of [15, Theorem 1]](c) α = 0.750 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 100.511.522.533.54M/NNormalizedExpectedRateRMSCScheme of Reference [15]Lower-bound of [50, Theorem 2]Lower-bound of [15, Theorem 1](d) α = 1Figure 4.4: Comparison of the delivery rates of RMSC and method of [15]with the information theoretic lower-bounds for different parameters αof Zipf distribution. In all cases, N = 40.62Table 4.1: Comparison of sets of popular files resulting from RMSC and [15]and the resulting rates. In all cases N = 40.α M M/N n∗ N1 R1/R∗0.254 0.1 40 0 1.119216 0.4 40 40 1.305632 0.7 40 40 1.27530.54 0.1 16 2 1.042416 0.4 40 32 1.373232 0.7 40 40 1.27530.754 0.1 8 3 1.005916 0.4 40 20 1.206332 0.7 40 40 1.275314 0.1 4 3 1.00016 0.4 22 14 1.01432 0.7 38 26 1.047Comparison to the information theoretic outer boundAlso plotted in Fig. 4.4 are two lower-bounds on the expected delivery rate ofnonuniform demands which are reported in [15, Theorem 1] and [50, Theorem 2].The lower-bound in [50, Theorem 2] is over the caching schemes with uncodedplacement of content. Since the SCC algorithm relies on uncoded placement ofcontent, the bound in [50, Theorem 2] must hold for the expected delivery rate ofRMSC. The gap between the lower-bound in [50, Theorem 2] and the expectedrate of RMSC is small in general, suggesting that the performance of the RMSC isclose to the optimal performance for coded caching with uncoded prefetching ofcontent. More specifically, one can make the following two observations. First,for a fixed cache size, this gap increases as the file popularity distribution becomesmore nonuniform (larger α). Second, for a fixed value of α, this gap shrinks andapproaches zero asM/N increases. The existence of this performance gap is due tothe sub-optimality of the SCC procedure as the delivery algorithm that we explorein detail in the discussion section that will follow.For completeness of our numerical exploration, we also include the lower-bound derived in [15, Theorem 1] in Fig. 4.4. Unlike [50, Theorem 2], the boundin [15, Theorem 1] applies to all caching schemes regardless of whether coded or63uncoded prefetching is used. However, we found this bound to be loose in general.This can be seen at the extreme case of M/N = 0, where the minimum amount ofinformation that can be sent over the channel is equal to the number of distinct filesrequested. Yet, we see that the lower-bound in [15, Theorem 1] is considerablysmaller than this value, suggesting that the bound is loose. Notice that the boundin [15, Theorem 1] is derived to prove order-optimality of the caching scheme pro-posed in [15] and is not guaranteed to be a tight bound on the optimal rate.4.5.4 Discussion of the performance gap to the information-theoreticouter-boundThe existence of the gap between the expected rate of RMSC and the lower-boundin [50, Theorem 2], if not fully, is at least partially caused by the insensitivity of theSCC algorithm in Algorithm 4 to the presence of duplicate requests in the demandvector. In other words, if multiple caches request identical files, it still deliversthem as if the files were distinct. For instance, consider the case that M/N = 0and all caches request the same file. In that case, no side information is availableat the caches and all requests must be delivered by uncoded messages. The SCCalgorithm delivers the requests by transmitting the requested file K times. This isclearly suboptimal, as in this case it suffices to transmit the single file requested byall caches only once. This inefficiency of the original SCC algorithm for redundantrequests becomes more complex to characterize for M > 0, but it has been thor-oughly investigated in the literature [13, 54]. In particular, a modified version ofthe SCC algorithm was proposed in [13, Section IV.B and Appendix C.A], whichresolves this inefficiency and is shown to achieve the optimal memory-rate trade-off for the case of uniform demands when uncoded placement is used [13]. Themodified delivery algorithm was discussed in Section 2.2. This difference betweenthe original and modified SCC algorithms can explain the behaviour of the perfor-mance gap in Fig. 4.4. In particular, as M/N increases, the portion of bits cachedat subsets S ⊂ [K] with large |S| increases. At the same time, larger |S| impliesa higher probability that S includes at least one leader cache. Hence, the amountof information transmitted similarly by the original and modified SCC algorithmsincreases as M/N increases. As a result, the sub-optimality of the SCC algorithm64will have a minimal effect on the delivery rate and the gap of the rate of RMSC tothe lower-bound shrinks as M/N increases.9 Similarly, for a fixed M/N ratio, asα increases, more duplicate requests for the more popular files occur in the demandvector due to their considerably higher probability of request. Effectively, this re-duces the number of leader caches for the different demand vectors on average. Byreducing the number of subsets S : S ∩ U 6= ∅, this directly translates into thenecessity of transmission of a smaller number of coded messages, and therefore,a larger gap between the performance of the original and the modified SCC algo-rithms. This analysis explains why the gap between RMSC and the lower-boundincreases as α increases.4.6 Concluding remarksIn this chapter, we applied the structured clique cover delivery algorithm that wasproposed for decentralized coded caching to deliver files with nonuniform requestprobabilities. We fully characterized the structure of the optimal placement pa-rameters. We showed that for a finite set of cache capacities, called base-cases,the optimal placement follows the two group strategy that does not cache a sub-set of less popular files, but treats the other files selected for caching identicallyregardless of their popularities. A polynomial time procedure was also proposedto derive this set of cache capacities. We further showed that the optimal place-ment parameters for other storage capacities can be obtained by memory sharingbetween certain base cases. In this scenario, a grouping of files into two or threegroups is optimal.Motivated by our numerical results as well as the fact that for uniform demands,the modified SCC algorithm proposed in [13] results in the exact memory-ratetradeoff for caching with uncoded prefetching, it is worthwhile to explore whetheran analysis similar to what we presented in this paper can characterize the optimalplacement for coded caching with the modified SCC algorithm of [13] for delivery.Furthermore, it is of interest to see how the rate of such a caching scheme compares9A similar trend can be seen in [13, Fig. 5a] where the gap between the expected rates of theSCC algorithm and the (optimal) modified SCC algorithm vanishes as M/N increases, for the caseof uniform demands.65to the information-theoretic lower-bound on the expected delivery rate of cachingwith uncoded prefetching.66Chapter 5Coded Caching at File-Level5.1 IntroductionMost of the existing coded caching schemes require the splitting of files into a pos-sibly large number of subfiles, i.e., they perform coded subfile caching. Keepingthe files intact during the caching process would be appealing, broadly speakingbecause of its simpler implementation. However, little is known about the effec-tiveness of coded file caching in reducing the data delivery rate. In this chapter, wepropose such a file caching scheme and explore its performance.5.1.1 Motivation and related workWe consider the problem of decentralized Coded File Caching (CFC), where filesare kept intact and are not broken into subfiles. Our first motivation to analyze CFCis the large size requirement of most of the existing decentralized subfile cachings.In particular, [21] has developed a finite-length analysis of the decentralized codedcaching of [12] and showed that it has a multiplicative coding gain of at most 2even when F is exponential in the asymptotic gain KM/N . In [21], the authorshave also proposed a new Coded Subfile Caching (CSC) scheme that requires a filesize of Θ(dN/Meg+1(log(N/M))g+2(2e)g) to achieve a rate of 4K/(3(g + 1)).Another important limitation of the existing coded caching schemes is the largenumber of subfiles required to obtain the theoretically promised delivery rates. Forinstance, [12] requires 2K subfiles per file, and the different missing subfiles of67a file requested by each cache are embedded into 2K−1 different server messages[12]. This exponential growth in K has adverse consequences on the practical im-plementation of the system and its performance. In particular, a larger number ofsubfiles imposes large storage and communication overheads on the network. Asis pointed out in [36], during placement, the caches must also store the identity ofthe subfiles and the packets that are included in each subfile, as a result of the ran-domized placement algorithms used. This leads to a storage overhead. Similarly,during delivery, the server needs to transmit the identities of the subfiles that areembedded in each message which leads to communications overhead. Neither ofthese overheads are accounted for in the existing coded caching rate expressions,while they are non-negligible when the number of subfiles is relatively large. Forcentralized coded caching, [22] has proposed a scheme that requires a linear num-ber of subfiles in K and yields a near-optimal delivery rate. As is pointed outin [22], the construction of the proposed scheme is not directly applicable to thereal caching systems as it requires K to be quite large. However, it is interestingto see that a small number of subfiles can in theory result in a near-optimal per-formance. For decentralized caching, reference [55] has investigated the subfilecaching problem in the finite packet per file regime and has proposed a greedyrandomized algorithm, GRASP, to preserve the coding gain in this regime. Basedon computer simulations, the authors show that with a relatively small number ofsubfiles, a considerable portion of the asymptotic coding gain of infinite file sizeregime can be recovered. This observation further motivates a closer look at thefile caching problem and its performance analysis. Finally, since the conventionaluncoded caching systems do not break files into large numbers of subfiles, it is eas-ier for the existing caching systems to deploy coded caching schemes that requireone or a small number of subfiles per file.5.1.2 ContributionsAlthough CSC has been well investigated, the analysis of CFC is missing fromthe literature. Based on the motivations we discussed earlier, it is worthwhile toexplore CFC for its potential in reducing the delivery rate. In this chapter, we in-vestigate the decentralized CFC problem by proposing new placement and delivery68algorithms. The placement algorithm is decentralized and does not require any co-ordination among the caches. The first proposed delivery algorithm is in essence agreedy clique cover procedure which is applied to certain side information graphs.This algorithm is designed for the general settings of CFC. The second deliveryalgorithm adapts the online matching procedure of [56] and is suitable for CFC inthe small cache size regime.The online matching procedure can be seen as a special case of the messageconstruction algorithm in [57] with the merging procedure limited to 1-level merg-ing.The major contribution of this work is the analysis of the expected deliveryrates of the proposed algorithms for file caching. In particular, we derive an ap-proximate random graph modeling of the side information graphs for the proposedplacement and the distribution of user demands and based on this model, we ex-press the dynamics of the delivery algorithms through systems of differential equa-tions. The resulting systems can be solved to provide a tight approximation to theexpected delivery rate of CFC with the proposed algorithms. We provide a con-centration analysis for the derived approximations and further demonstrate theirtightness by computer simulations.Our results show that CFC is significantly more effective than uncoded cachingin reducing the delivery rate, despite its simple implementation and structurallyconstrained design. It is shown that the proposed CFC schemes preserve a consid-erable portion of the additive coding gain of the state-of-the-art CSC schemes. Wepresent a discussion on the extension of the proposed placement and delivery al-gorithms to subfile caching with an arbitrary number of subfiles per file. We showthat a considerable portion of the additive gain of the existing CSC methods canbe revived by using a small number of subfiles per file, which is consistent withthe observations made in [55]. Hence, our proposed method provides a means totradeoff the delivery rate with the complexity caused by the use of a larger numberof subfiles.The remainder of this chapter is organized as follows. We present our proposedCFC method in Section 5.2. Statistical analysis of the delivery rates and concen-tration results are provided in Section 5.3. We present numerical examples andsimulation results in Section 5.4. In Section 5.5, we discuss the generalization of69Algorithm 8 Decentralized File PlacementRequire: Library of N files1: for k = 1, . . . ,K do2: Cache k stores M out of N files from the library uniformly at random3: end forthe proposed placement and delivery to CSC.5.2 Placement and delivery algorithmsWe consider a system model as in Section 2.1. For the CFC problem which weconsider here, each cache can only store the entirety of a file and partial storageof a file is not allowed during content placement. This is the main element thatdifferentiates the problem setups of CFC and CSC.In the following, we propose algorithms for the placement and delivery of CFC.5.2.1 PlacementAlgorithm 8 shows the randomized procedure that we propose for content place-ment. Here, each cache stores M files from the library uniformly at random. Theplacement is decentralized as it does not require any central coordination amongthe caches. In Algorithm 8, all packets of M files are entirely cached. Notice thateach cache stores any particular file with probability q = M/N , independent fromthe other caches. However, the storage of the different files in a given cache are sta-tistically dependent, as the total number of cached files must not exceedM . Noticethat Algorithm 8 is different from the decentralized subfile placement of [12], asin the latter, an M/N -th portion of the packets of every file in the library is cachedand the packets of each file are cached independently from the packets of the otherfiles.5.2.2 DeliveryFor the proposed delivery, the server forms a certain side information graph forevery demand vector that arrives, and delivers the requests by applying a message70v1v2 v3v4 v5v6 v1v2 v3v4 v5v6Figure 5.1: Left: an example side information digraph D. Right: the corre-sponding side information graph G.construction procedure to that graph. The graphs that we use are defined in thefollowing.Definition 8. For a given placement and demand vector, the side information di-graph D is a directed graph on K vertices. Each vertex corresponds to a cache.A vertex has a loop if and only if the file requested by the cache is available in itsstorage. There is a directed edge from v to w if and only if the file requested bycache w is in cache v.Digraph D was first introduced in the index coding literature and was used tocharacterize the minimum length of linear index codes [7]. This was done throughcomputing the minrank of D [7, Theorem 1], which is an NP-hard problem.Definition 9. For a given placement and demand vector, the side information graphG is defined on the same set of vertices as D. A vertex in G has a loop if and onlyif it has a loop in D. There is an (undirected) edge between v and w if and only ifboth edges from v to w and from w to v are present in D.For an example side information digraphD, the corresponding side informationgraph G is shown in Fig. 5.1.We propose two algorithms that make use of graph G for delivery.Greedy clique cover delivery algorithm For delivery, the server constructs mul-ticast messages with the property that each cache can derive its desired file fromsuch messages using the side information that it has available.71Consider a set of unlooped vertices in graph G that form a clique.1 Each vertexin that set has the file requested by the other caches available, but its own requestedfile is missing from its local storage. As a result, if the server transmits a messagethat is the XOR of the files requested by the caches in that clique, the message isdecodeable by all such caches for their desired files [21, Section II.A]. Hence, todeliver the requests of every cache and minimize the delivery rate, it is of interestto cover the set of unlooped vertices of G with the minimum possible number ofcliques and send one server message per clique to complete the delivery. Noticethat the looped vertices of G do not need to be covered as the files requested bysuch caches are available to them locally.Finding the minimum clique cover is NP-hard [58]. In this section, we adapta polynomial time greedy clique cover procedure for the construction of the deliv-ery messages of CFC. The proposed clique cover procedure is presented in Algo-rithm 9.Notation 3. In Algorithm 9, the content of the file requested by cache k is denotedby Xk and ⊕ represents the bitwise XOR operation.In Algorithm 9, a vertex vt arrives at iteration (time) t ∈ {1, · · · ,K}. If vtis looped, we proceed to the next iteration. Otherwise, we check if there is anypreviously formed clique of size s ∈ {1, . . . , t − 1} that together with vt formsa clique of size s + 1. We call such a clique a suitable clique for vt. If suitablecliques for vt exist, we add vt to the largest one of them to form a new clique. Therationale for choosing the largest suitable clique is explained in Remark 6. If thereare multiple suitable cliques with the largest size, we pick one of them uniformlyat random. If no suitable clique exists, vt forms a clique of size 1. Notice that thecliques formed by Algorithm 9 are disjoint and cover the set of unlooped verticesof G. After the clique cover procedure completes, the server sends a coded messagecorresponding to each clique.The use of Algorithms 8 and 9 for placement and delivery leads to a cachingscheme that we call Coded File Caching with greedy Clique Cover Delivery (CFC-CCD).Remark 3. In Algorithm 9, each cache only needs to listen to one server message1For an undirected simple graph (no loops and no multiple edges), a clique is a subset of verticeswhere every pair of vertices are adjacent.72Algorithm 9 Greedy Clique Cover DeliveryRequire: G, vertex labels v1, · · · , vK// Form Cliques1: C ← ∅ // initialize set of cliques2: for t = 1, . . . ,K do3: if vt has a loop then4: Do nothing5: else if there is a suitable clique for vt in C then6: Join vt to (one of) the largest suitable clique(s) (randomly) and update C7: else8: Add {vt} to C as a clique of size 19: end if10: end for// Transmission of Messages11: for c ∈ C do12: Transmit ⊕k∈cXk13: end forto decode the file it requested, as the entirety of the file is embedded in only onemessage. This is in contrast to the CSC of [12] which requires each cache to listento 2K−1 out of the 2K server messages to derive its requested content.Algorithm 9 has a worst-case complexity of O(K2). This is because thereare at most K unlooped vertices in G, and in each iteration of the algorithm, theadjacency of vt with at most t− 1 vertices must be checked for finding the largestsuitable clique.Online Matching Delivery Algorithm We now propose our second delivery al-gorithm, which is applicable to the small cache size regime, i.e., when q is small.In this regime, the probability of having large cliques is small. Hence, one canrestrict the size of the cliques in the clique cover procedure to reduce the com-plexity without considerably affecting the delivery rate. As an extreme case, Al-gorithm 10 shows an online greedy matching algorithm adapted from [56, Algo-rithm 10], which restricts cliques to those of sizes 1 and 2. Notice that here graphG can have loops which is different from the graph model in [56, Algorithm 10]. InAlgorithm 10, if vt is looped, we remove it from the graph and proceed to the next73Algorithm 10 Online Matching DeliveryRequire: G, vertex labels v1, · · · , vK// Transmission of coded messages1: Q← ∅ // set of matched or looped vertices2: for t = 1, . . . ,K do3: Gt ← subgraph induced by G on vertices{v1, · · · , vt}\Q4: if vt has a loop then5: Q← Q ∪ {vt}6: else if vt has a neighbor in Gt then7: Match vt to a random neighbor vs ∈ Gt8: Transmit Xvt ⊕Xvs9: Q← Q ∪ {vt, vs}10: end if11: end for// Transmission of Uncoded Messages12: for v ∈ {v1, · · · , vK}\Q do13: Transmit Xv14: end foriteration. Otherwise, we try to match it to a previously arrived unmatched vertex.If a match found, we remove both vertices from the graph and proceed to the nextiteration. Otherwise, we leave vt for possible matching in the next iterations.The use of Algorithms 8 and 10 for placement and delivery leads to a methodthat we call cfcm! (CFCM!).5.3 Performance analsyisIn this section, we analyze the expected delivery rates of the proposed file cachingschemes through a modeling of the dynamics of Algorithms 9 and 10 by differentialequations.5.3.1 Overview of Wormald’s differential equations methodTo analyze the performance of CFCM! and CFC-CCD, we use Wormald’s differen-tial equation method and in particular Wormald’s theorem [59, Theorem 5.1]. A74formal statement of the theorem is provided in Appendix B.1. Here, we present arather qualitative description of the theorem and how it can be used to analyze theperformance of the greedy procedures on random graphs.Consider a dynamic random process whose state evolves step by step in dis-crete time. Suppose that we are interested in the time evolution of some property Pof the state of the process. Since the process is random, the state of P is a randomvariable. To determine how this random variable evolves with time, a general ap-proach is to compute its expected changes per unit time at time t, and then regardtime t as continuous. This way, one can write the differential equation counterpartsof the expected changes per unit of time in order to model the evolution of the vari-able. Under certain conditions, the random steps follow the expected trends with ahigh probability, and as a result, the value of the random variable is concentratedaround the solution of the differential equations. In this context, Wormald’s the-orem states that if (i) the change of property P in each step is small, (ii) the rateof changes can be stated in terms of some differentiable function, and (iii) the rateof changes does not vary too quickly with time, then the value of the random vari-able is sharply concentrated around the (properly scaled) solution of the differentialequations at every moment [59, Section 1.1],[60, Section 3].Notice that both Algorithms 9 and 10 can be viewed as dynamic processes onthe random side information graph. In Algorithm 9, a vertex arrives at each timeand potentially joins a previously formed clique. In this case, we are interested inthe number of cliques at each time instant, as at the end of the process, this equalsthe number of coded messages that must be sent. In Algorithm 10, the arrived ver-tex might be matched to a previously arrived vertex. Here, the property of interestis the total number of isolated vertices plus matchings made. We apply Wormald’stheorem to analyze the behavior of these random quantities and to approximatetheir expected values at the end of the process.5.3.2 Statistical properties of the random graph models for side infor-mationFor the analysis of the side information graphs D and G, it is required to character-ize the joint distribution of the edges and the loops for each of these graphs.75In digraph D, every directed edge or loop is present with the marginal proba-bility of q. However, there exist dependencies in the presence of the different edgesor loops in D.Notation 4. We denote by euv the Bernoulli random variable representing the di-rected edge from vertex u to vertex v. In case of a loop, we have u = v. Thisrandom variable takes values 1 and 0 when the edge is present and absent, respec-tively. Also, we represent by E the set of the K2 random variables representing allthe edges and the loops of D. Further, by fu and F−u we denote the random filerequested by vertex u and the random set of distinct files requested by all verticesexcept vertex u, respectively.2Remark 4. To characterize the dependencies among the edges and the loops ofD,consider the random variable euv. The state of the other variables E \ euv affectsthe distribution of euv in the following way:(i) The presence (absence) of a directed edge from vertex u to another vertexv implies that the file requested by v is (not) available in the cache of u.Hence, if fv ∈ F−v, the values of the variables in E \ euv that correspond tothe edges that originate from u fully determine euv. Otherwise, the randomvariables E \ euv provide information about the storage of the files F−v inthe cache of u. Since the cache size is finite, this information affects theprobability of the storage of fv in the cache of u and hence the distributionof euv|E \ euv can be different from the marginal distribution of euv.(ii) Furthermore, the values of E \ euv can affect the probability that fv ∈ F−v(see Fig. 5.2 for an example). Based on point (i), any change in the proba-bility of fv ∈ F−v can directly change the probability of the presence of fvin the cache of u.An example of point (ii) in Remark 4 is shown in Fig. 5.2 for vertices v and w. Thestate of the edges between these two vertices and their loops implies that fv 6= fw.Consequently, euw = 1 reduces the probability of the presence of fv in the cache ofu, or equivalently the probability of euv = 1, from the marginal probability q = MNto M−1N−1 < q.2We have fu ∈ F−u if the file requested by vertex u is also requested by other vertices.76v wuevv = 0evw = 1eww = 0ewv = 1euw = 1Figure 5.2: Solid (dashed) lines represents the presence (absence) of an edge orloop and no information is available about the presence of the edges and loopsthat are not shown. This is an illustration of the dependencies among the edgesof D. The values of eij , i, j ∈ {v, w} in this example imply that files fvand fw are distinct, as otherwise we must have had evv = 1 and eww = 1.Further, euw = 1 implies that fw is cached in u. As a result, the probability ofeuv = 1, i.e., fv being in the cache of u, becomes M−1N−1 . This is different fromthe marginal probability MN for euv = 1, which shows the dependency amongthe edges.Because of the dependencies among the edges of D and therefore the depen-dencies in the edges of G, an analysis of the performance of CFCM! and CFC-CCDis difficult. However, we show that in the regime of KN → 0 such dependenciesbecome negligible.Definition 10. Consider the side information digraph (graph) X as defined in Def-inition 8 (Definition 9) for a network with K caches and a library of N files fromwhich each cache independently and uniformly at random requests a file. We saya random digraph (graph) Xa asymptotically approximates X as K/N → 0 ifit is defined over the same set of vertices as X , and the joint probability distri-butions of the presence of edges in the two graphs satisfy limKN→0 P(X )(E) =limKN→0 P(Xa)(E), where the superscripts determine the graph to which the condi-tional edge probability corresponds.3 For simplicity, we refer to Xa as an asymp-totic model for X .Proposition 7. For KN → 0, the random graph D is asymptotically approximated3Here again, euv represents the random variable corresponding to the presence of a directed edgefrom u to v if X is a digraph and an edge between u and v if X is an undirected graph. Also, Erepresents the set of random variables euv defined over all combinations of u and v.77by the graphDa for which every edge and loop is present independently with prob-ability q. Also, graph G is asymptotically approximated by Ga for which every edgeand loop is present independently with probabilities q2 and q, respectively.Proof. See Appendix B.2.Based on the chain rule for probabilities and since the limits of the conditionalprobabilities in Definition 10 are finite,4 a random graph X and an asymptoticapproximation to itXa, have the same joint distribution over their edges asK/N →0. As a result, if a property defined over random graph X is solely a function of itsedges, for instance the number of cliques or isolated vertices in X , that propertybehaves statistically the same when defined over Xa as K/N → 0.5.3.3 Analysis of the expected rate of CFCM! and CFC-CCDAlgorithms 9 and 10 take a generic side information graph G as input and out-put the communication scheme comprising the delivery messages. The number ofmessages constructed by these algorithms is solely a function of the configurationof the edges of G. Moreover, based on Proposition 7, graph Ga asymptoticallyapproximates G in the K/N → 0 regime. Hence, we perform the analysis of theexpected delivery rate of the proposed caching schemes in the KN → 0 regime usingthe asymptotic model Ga for the side information graph. This approach is math-ematically tractable, as unlike in G, all the edges and loops are independent fromeach other in Ga.Assuming that the input graph to Algorithms 9 and 10 is statistically modeledby random graph Ga, and by applying Wormald’s theorem, we get the followingtwo results for the outputs of the two algorithms.Theorem 3. For Algorithm 10 with input side information graphs that follow therandom graph model Ga, the number of isolated vertices plus the number of match-4In particular, limKN→0 P(X )(E) = limKN→0∏i P(X )(ei | e1, . . . , ei−1) =∏i limKN→0P(X )(ei | e1, . . . , ei−1), where the edge indices represent an arbitrary orderingof the set of all possible edges and loops of the random graph X .78ings made by Algorithm 10 is12[K(1− q)− log(2− (1− q2)K(1−q))log(1− q2)]+O(λK)with probability 1−O( 1λe−Kλ3), for any λ > 0.Proof. See Appendix B.3.Theorem 4. For Algorithm 9 with input side information graphs that follow therandom graph model Ga, the number of cliques that cover their vertices usingAlgorithm 9 isKK∑i=1zi(1; q) +O(λK), (5.1a)with probability 1−O(Kλ e−Kλ3), where functions zi(x; q), i = 1, . . . ,K are givenby the unique solution to the system of differential equations{dzidx =(1−q)[2gi(z)−gi+1(z)+gi−1(z)] ;zi(0; q) = 0,(5.1b)wheregi(z) =K∏j=i(1− q2j)Kzj(x;q) (5.1c)and z = (z1, . . . , zK).Proof. See Appendix B.4.One can combine Proposition 7, and Theorems 3 and 4, to get approximationsto the expected delivery rates of CFCM! and CFC-CCD as is outlined in the follow-ing result.Corollary 4. For 0 K N , the expected delivery rates of CFCM! and79CFC-CCD can be approximated byE (Rm) ≈ 12[K(1− q)− log(2− (1− q2)K(1−q))log(1− q2)](5.2)andE(Rcc) ≈ KK∑i=1zi(1), (5.3)respectively.The approximations in (5.2) and (5.3) become tight as K → ∞ and KN → 0.The former is required for the validity of the concentration results in Theorems 1and 2.5 The latter condition ensures that the asymptotic modelsDa and Ga are validstatistical representations of the side information graphs D and G as per Proposi-tion 7. Notice that this condition is satisfied in many practical scenarios where thenumber of files to be cached is considerably larger than the number of caches inthe network.Remark 5. Applying Algorithm 9 to random graph Ga has the property that asvertex vt arrives, the behavior of the algorithm up to time t does not provide anyinformation about the connectivity of vertex vt to the previously arrived vertices.In other words, vt is connected to any of the previously arrived vertices with prob-ability q2. This is because Algorithm 9 operates based on the structure of the edgesin the side information graph. If the different edges are present independently, thehistory of the process up to time t does not change the probability of the presence ofthe edges between vt and the previously arrived vertices v1, . . . vt−1. This propertyis essential for the proof of Theorem 4.Remark 6 (Rationale for Choosing the Largest Clique in Algorithm 9). Algo-rithm 9 joins vt to the largest suitable clique at time t. The rationale for thischoice is as follows. Let Yi(t) represent the number of cliques of size i right beforetime t, where i ∈ {1, · · · ,K}. Let Nt be the event that there exists no suitable5This is because the asymptotics denoted by O in Theorems 3 and 4 are for K → +∞, and theprobabilities in the statement of the theorems approach 1 as K increases.80clique for vt at time t. Then,P(Nt) =K∏i=1(1− q2i)Yi(t).Given Yi(t), and with the knowledge that vt has joined a clique of size s at time t,we get Yi(t+ 1) = Yi(t), i 6∈ {s, s+ 1}, Ys(t+ 1) = Ys(t)− 1 and Ys+1(t+ 1) =Ys+1(t) + 1. Then, it is easy to show thatP(Nt+1|vt joined a clique of size s at time t,Y (t))=1− q2(s+1)1− q2sK∏i=1(1− q2i)Yi(t), (5.4)where Y (t) = (Y1(t), · · · , YK(t)). In order to minimize the expected rate, one hasto make (5.4) small. Now, assume that at time t, multiple suitable cliques existedfor vt. It is straightforward to check that (5.4) would take its smallest value if thesuitable clique with the largest size s was chosen at time t.Remark 7. In the alternative setting that the popularities of files are nonuniform,the more popular files would be cached with higher probabilities during placement.In that case, the property discussed in Remark 5 does not hold anymore for Algo-rithm 9. In particular, at step t, the posterior probability of vt to connect to thepreviously arrived vertices was strongly dependent on the history of the process upto time t. For instance, the probability of vt to connect to the vertices that haveformed a clique with a large size is higher than the probability of vt to connectto vertices that belong to small cliques. This is because a vertex that belongs toa large clique is more likely to correspond to a request for a more popular file.Such a file is more likely to be availabe in each cache, including the one that cor-responds to vertex vt. The modeling of the posterior probabilities of the presenceof edges makes the analysis of CFC-CCD difficult for the side information graphsarisen from nonuniform demands.815.4 Performance comparison and simulationsIn this section, we present numerical examples and simulation results for the ap-plication of the CFC schemes that we proposed, to demonstrate the effectivenessof CFC in reducing the delivery rate. We further show that the expressions in (5.2)and (5.3) tightly approximate the expected delivery rates of CFCM! and CFC-CCD,respectively.We use the expected delivery rates of two reference schemes to comment onthe performance of our proposed CFC. The reference cases are the uncoded cachingand the optimal decentralized CSC scheme derived in [13]. The latter is the state-of-the-art result on CSC which provides the optimal memory-rate tradeoff over allcoded cachings with uncoded placement. Although the asymptotic average deliv-ery rate derived in [13, Theorem 2] is valid in the infinite file size regime and theunderlying caching scheme requires an exponential number of subfiles inK, it pro-vides a suitable theoretical reference for our comparisons. To compute the expecteddelivery rate of the optimal decentralized CSC, one needs to find the expectation inthe RHS of [13, Theorem 2]. This is done in the following lemma.Lemma 4.Ed(R∗CSC) =N −MM[1−K∑n=1(Nn){Kn}n!NK(1−M/N)n], (5.5)where{Kn}represents the Stirling number of the second kind [61, Section 5.3].Also, the expected rate of uncoded caching is derived asEd(Runcoded) = N(1− (1− 1N)K)(1−M/N) (5.6)= K(1−M/N) +O(1/N2)Proof. See Appendix B.5.We use the approximation Ed(Runcoded) ≈ K(1−M/N) throughout this sec-tion, as we use N ≥ 100 in all the following examples.820 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 105101520253035404550M/NExpectedRate(files)***********°°°°°°°°°°°Uncoded CachingCFC-MD: TheoreticalCFC-MD: SimulationCFC-CCD: TheoreticalCFC-CCD: SimulationOptimal CSC°*(a) K = 50, N = 10000 0.2 0.4 0.6 0.8 1050100150M/NExpectedRate(files)°°°°°°°°°°°°*********** *Uncoded CachingCFC-MD: TheoreticalCFC-MD: SimulationCFC-CCD: TheoreticalCFC-CCD: SimulationOptimal CSC°*(b) K = 150, N = 100Figure 5.3: Comparison of the delivery rates of the different cachingschemes.5.4.1 Numerical Examples for Expected RatesFig. 5.3 shows the expected delivery rates of CFCM! and CFC-CCD, as well as theaverage rates of the optimal CSC and the uncoded caching for two networks, onewith parameters K = 50 and N = 1000 and the other with parameters K = 150and N = 100. First, we observe that CFC-CCD is notably effective in reducingthe delivery rate. In particular, it reduces the delivery rate by 60 to 70 percentcompared to uncoded caching. Notice that as argued in Section 5.2.2, for smallcache sizes, the expected delivery rate of CFCM! is close to that of CFC-CCD, asonly a small fraction of vertices are covered by cliques of sizes larger than 2. Thisis shown in Fig. 5.4. As the cache sizes get larger, the probability of formation oflarger cliques increases and therefore CFC-CCD considerably outperforms CFCM!.Hence, in the small-cache regime, the use of CFCM! is practically preferred toCFC-CCD because of its lower computational complexity.Further, we observe that the theoretical expressions in (5.2) and (5.3) are inagreement with the empirical average rates obtained by simulations. Particularly,we have KN = 0.05 and 1.5 in Figs. 5.3a and 5.3b, respectively. The results for thelatter case imply that the condition KN → 0 required in our theoretical analysis canbe too conservative in practice.In Fig. 5.5, the per user expected delivery rates, i.e., the expected delivery rates8310−2 10−1 10010−810−610−410−2100M/NNormalizedNumberofVertices▲▲▲▲▲▲▲▲▲▲▲▲▲▲▲▲▲▲▲▲▲▲▲▲▲▲▲▲▲▲▲▲▲▲▲▲▲°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°◊◊◊◊◊◊◊◊ ◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■K = 10K = 25K = 50K = 100▲°◊■Figure 5.4: Expected number of vertices that are covered by cliques of sizeslarger than 2 by Algorithm 9, normalized by K. Here, N = 1000.normalized by K, are shown for different numbers of caches. In all cases, N ischosen such that KN = 0.1. Notice that although the asymptotics in the rate expres-sions in Theorems 3 and 4 are for K → ∞, the proposed approximations closelymatch the simulation results for K as small as 10. In general, our simulation ex-plorations show that in the regime ofK ≥ 20 andK/N ≤ 0.5, the approximationsin Corollary 4 are tight.5.4.2 Characterization of coding gainWe now look at the additive and multiplicative coding gains defined as ga = 1 −E(Rcoded)E(Runcoded) and gm =E(Runcoded)E(Rcoded) , respectively, for CFC-CCD and CFCM!. Notice thatga = 1 − 1/gm. Fig. 5.6 shows ga and gm for different coded caching schemes.First, notice that in all cases, CFC-CCD significantly outperforms uncoded caching.Second, for CFC-CCD, ga and gm reach their maximum for large values of M/N ,where the gap between the additive gains of CFC-CCD and optimal subfile cachingshrinks significantly. The initial increase of the coding gains with M/N is due toformation of larger cliques in the side information graph. However, as M → N ,840 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 100.10.20.30.40.50.60.70.80.91M/NExpectedPer-UserRate(files)***************°°°°°°°°°°° °°°*************°°°°°°°°°°********** * * *°°°°°°°°°°Uncoded CachingCFC-MD: TheoreticalCFC-MD: SimulationCFC-CCD: TheoreticalCFC-CCD: SimulationOptimal CSC°*K = 10K = 25K = 100Figure 5.5: The expected per-user delivery rates for caching networks withdifferent numbers of caches. Here, KN = 0.1.the gains decrease again. This behavior is due to the fact that although for largeM/N most of the vertices are connected, at the same time most of the vertices arealso looped and are removed by Algorithm 9. Hence, for M ≈ N , the chance offormation of large cliques diminishes, causing gm to approach 1 and consequentlyga to drop to 0. Third, the coding gains increase with K because of the highernumber of larger cliques formed in the side information graph. For CFCM!, gmis upper bounded by 2 which is the multiplicative coding gain of CFCM! whenperfect matching occurs. Fourth, notice the gap of the curves for the optimal CSCto 100% at M/N = 1. This gap is equal to 1N(1−(1−1/N)K) × 100%6, which canbe approximated by 1K × 100% for large N .Based on our observation in Fig. 5.6, the multiplicative gain of CFC-CCD islimited compared to the optimal CSC, as in the latter, it grows almost linearly with6This is a direct consequence of eqs. (B.9) and (B.11) in which imply that E(Runcoded)E(R∗CSC)=N(1−(1−1/N)K)q∑Km=1 P(Ne(d)=m)(1−(1−q)m). As a result, limq→1 E(Runcoded)E(R∗CSC)= N(1 − (1 − 1/N)K) andlimq→1E(Runcoded)−E(R∗CSC)E(Runcoded) = 1−1N(1−(1−1/N)K) .850 0.2 0.4 0.6 0.8 1020406080100M/NRelativeAdditiveGain(Percentage)***********************************************************************************************************°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°◊◊◊◊◊◊◊◊ ◊◊ ◊◊ ◊◊ ◊◊ ◊◊ ◊◊ ◊ ◊◊ ◊ ◊◊ ◊ ◊◊ ◊ ◊ ◊◊ ◊ ◊ ◊ ◊ ◊◊◊◊ ◊◊ ◊ ◊ ◊ ◊◊ ◊ ◊ ◊ ◊ ◊ ◊ ◊ ◊ ◊ ◊ ◊ ◊ ◊ ◊ ◊ ◊ ◊ ◊ ◊ ◊ ◊ ◊ ◊ ◊◊◊◊◊ ◊◊ ◊ ◊ ◊ ◊ ◊◊ ◊ ◊ ◊ ◊ ◊ ◊ ◊ ◊ ◊ ◊ ◊ ◊ ◊ ◊ ◊ ◊ ◊ ◊ ◊ ◊ ◊ ◊ ◊ ◊ ◊ ◊ ◊ ◊K = 25, CFC-CCDK = 25, CFC-MDK = 25, Optimal CSCK = 75, CFC-CCDK = 75, CFC-MDK = 75, Optimal CSCK = 150, CFC-CCDK = 150, CFC-MDK = 150, Optimal CSC***°°°◊◊◊(a) Relative Additive Gain0 0.2 0.4 0.6 0.8 111.522.533.544.55M/NMultiplicativeGain*****************************************************************°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°◊ ◊◊◊◊◊ ◊◊ ◊◊ ◊◊ ◊◊ ◊◊ ◊◊ ◊◊ ◊◊◊◊◊◊◊◊◊◊◊◊◊◊ ◊◊ ◊◊◊◊◊ ◊ ◊◊ ◊ ◊ ◊ ◊ ◊ ◊ ◊ ◊ ◊◊ ◊ ◊ ◊ ◊ ◊ ◊ ◊ ◊ ◊ ◊ ◊ ◊ ◊ ◊ ◊ ◊ ◊ ◊K = 25, CFC-CCDK = 25, CFC-MDK = 75, CFC-CCDK = 75, CFC-MDK = 150, CFC-CCDK = 150, CFC-MD**°°◊◊(b) Multiplicative Gain0 50 100 150 200 250020406080100KRelativeAdditiveGain(Percentage)***************************************************************************************************************************************************°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊M/N = 0.4, CFC-CCDM/N = 0.4, CFC-MDM/N = 0.4, Optimal CSCM/N = 0.8, CFC-CCDM/N = 0.8, CFC-MDM/N = 0.8, Optimal CSCM/N = 0.95, CFC-CCDM/N = 0.95, CFC-MDM/N = 0.95, Optimal CSC***°°°◊◊◊(c) Relative Additive Gain0 50 100 150 200 250123456KMultiplicativeGain*************************************************************************************************°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊◊M/N = 0.4, CFC-CCDM/N = 0.4, CFC-MDM/N = 0.8, CFC-CCDM/N = 0.8, CFC-MDM/N = 0.95, CFC-CCDM/N = 0.95, CFC-MD**°°◊◊(d) Multiplicative GainFigure 5.6: Comparison of the additive and multiplicative coding gains ob-tained by the different caching schemes. Here, N = 100.Kq. 7 However, we observe that a considerable portion of the additive gain ofoptimal CSC can be revived by CFC-CCD. Hence, it is better justified to look atthe coding gains of the proposed CFC schemes as additive and not multiplicativegains.5.5 Extension to subfile cachingWe have shown that the proposed CFC is an easy to implement yet effective tech-nique to reduce the average delivery rate of caching networks. However, there is7To keep the other curves readable, the multiplicative coding gain of the optimal CSC is not shownin Figs. 5.6b and 5.6d.86a gap between the delivery rates of the proposed CFC and the optimal CSC, as thelatter allows for an arbitrarily high complexity in terms of the number of subfilesused per file and also assumes infinite number of packets per file. It is of practi-cal interest to explore the improvement in the expected delivery rate in a scenariowhere the system can afford a given level of complexity in terms of the numberof subfiles used per file. In this section, we consider this problem and show thatthe placement and delivery proposed in Algorithms 8 and 9 also provide a frame-work for CSC when a small number of subfiles per file is used for caching. Theresulting CSC scheme provides a means to tradeoff between the delivery rate andthe implementation complexity, i.e., the number of subfiles used per file.Throughout this section, performance evaluations are based on computer sim-ulations as the theoretical results of Section 5.3 cannot be extended to subfilecaching.PlacementFor subfile caching, we break each file in the library into ∆ ≥ 1 subfiles. Thisleads to a library of N∆ subfiles, where each subfile is of length F/∆. Then, weapply the placement in Algorithm 8 to the library of subfiles. Notice that the pro-posed subfile placement is different from the decentralized placement of [12]. Inparticular, here all the subfiles are of the same length. Moreover, in the placementof [12], the number of packets of each file stored in each cache was M/NF , whilethis number is random here. Also notice that, for each cache, the prefetching of thesubfiles that belong to different files are dependent, as the total size of the cachedsubfiles must be at most M .Remark 8. The choice of ∆ depends on the level of complexity that is tolerable tothe system. The only restriction that the proposed placement imposes on ∆ is forthe ratio F/∆ to be an integer value. This condition can be easily satisfied in thefinite file size regime.DeliverySimilar to the delivery of file caching, for subfile caching the server forms the sideinformation graphsD and G upon receiving the user demands. Then, it delivers the87requests using Algorithm 9, by treating each subfile like a file. Notice that we donot consider CSC with online matching delivery in this section, as its coding gain islimited to 2 and this bound does not improve by increasing the number of subfiles.Side information graphs For subfile caching, both the side information graphsD and G have K∆ vertices. Similar to the approach we followed in Proposition 7,in the following analysis, we consider Da and Ga as the asymptotic counterparts ofD and G for K∆N∆ → 0. Let wk,δ represent the vertex corresponding to cache k andsubfile δ. InDa, vertexwk,δ has a loop if subfile δ of the file requested by cache k isavailable in cache k. The probability of a loop is q. The main structural differencebetween graphs Da for file and subfile caching is that the presence of the edgesare highly dependent in subfile caching. In particular, for two given caches k andl 6= k, and a given subfile δ of the file requested by cache k, there exist ∆ directededges from wl,θ to wk,δ for all θ = 1, . . . ,∆, if subfile δ of the file requested bycache k is available in cache l. This event has probability q. Otherwise, all ∆edges from wl,θ to wk,δ with θ = 1, . . . ,∆ are absent. Also, for subfile caching,we draw no edges from wk,δ to wk,θ for δ 6= θ. This does not affect the outputof the delivery algorithm as if subfile θ of file Xk is present in cache k, vertexwk,θ is looped and hence is ignored (not joined to any clique) by Algorithm 9. Thelatter two properties make the model of the random graph Da for subfile cachingsignificantly different from the model used for file caching.Graph Ga is built from Da in exactly the same way as for file caching. As aresult, each vertex is looped with probability q. Also, any two vertices belongingto two different caches are present with marginal probability of q2. However, thepresence of the edges between the vertices of two caches are highly dependentbecause of the discussed structures in Da.Notice that forming the joint side information graphs for the delivery of all ∆subfiles increases the coding opportunities compared to the case with ∆ separateside information graphs for the disjoint delivery of the δ-th subfiles of every file.In other words, the number of edges in the joint side information graph Ga growssuperlinearly in ∆ as each of the K∆ vertices can connect to (K − 1)∆ othervertices. Fig. 5.7 shows an example of separately formed and the correspondingjointly formed side information graphs.88321 45 6 7 8D(1)D(2)cache 1 cache 2 cache 3 cache 4(a) D(δ) for δ-th subfiles121 45 6 7 8G(1)G(2)cache 1 cache 2 cache 3 cache 4(b) G(δ) for δ-th subfiles321 45 6 7 8cache 1 cache 2 cache 3 cache 4(c) Joint D321 45 6 7 8cache 1 cache 2 cache 3 cache 4(d) Joint GFigure 5.7: K = 4 caches and ∆ = 2 subfiles per file. (a), (b): Side infor-mation graphs for separate delivery of δ-th subfiles with δ = 1, . . . ,∆.(c), (d): Side information graphs for joint delivery of all subfiles.Remark 9. Coding opportunities increase by increasing ∆. Hence, in practice, ∆is determined by the highest level of complexity that is tolerable to the system interms of the number of subfiles used.Delivery algorithms The delivery process is complete with the side informationgraph G inputted to Algorithm 9. We call the resulting scheme Coded SubfileCaching with greedy Clique Cover Delivery (CSC-CCD).Simulation resultsFor a system with K = 50 caches, Fig. 5.8a shows the expected delivery ratesof CSC-CCD with ∆ = 5 and ∆ = 25 as well as the expected delivery rates ofCFC-CCD, the optimal decentralized CSC of [13] and uncoded caching. Notice that89CFC-CCD is identical to CSC-CCD with ∆ = 1 subfile.An important observation is that relative to the 2K subfiles required by theoptimal decentralized CSC, the use of a small number of subfiles in CSC-CCD cansignificantly shrink the rate gap between the delivery rates of CFC-CCD and theoptimal decentralized CSC. In Fig. 5.8a, we have also shown approximate expecteddelivery rates based on the theoretical results in Theorem 4. More specifically, weuse the approximation Rcc,subfile ≈ 1∆Rcc(K∆). As we discussed earlier, becauseof the structural differences between the side information graphs for file and subfilecaching, such an approximation is generally prone to large errors. However, for∆ K and N∆ K∆, it still results in an approximation for the expecteddelivery rate.Figs. 5.8b and 5.8c show the additive and multiplicative coding gains for theproposed CSC scheme. One observes that the rate of change in the additive gaindecays as the number of subfiles increases. For instance, increasing the numberof subfiles from ∆ = 1 to ∆ = 5 results in a larger reduction in the delivery ratecompared to the case where the number of subfiles is increased from ∆ = 5 to∆ = 25. However, the multiplicative gain increases significantly when a largernumber of subfiles is used.5.6 Concluding remarksWe explored the decentralized coded caching problem for the scenario where filesare not broken into smaller parts (subfiles) during placement and delivery. Thisscenario is of practical importance because of its simpler implementation. Weshowed that although the requirement of caching the entirety of a file puts restric-tions on creation of coding opportunities, coded file caching is still an effective wayto reduce the delivery traffic of the network compared to the conventional uncodedcaching. In particular, we proposed the CFC-CCD file caching scheme, which per-forms delivery based on a greedy clique cover algorithm operating over a certainside information graph. We further proposed a file caching scheme called CFCM!,which uses a simpler delivery algorithm, and is designed for the small cache sizeregime. We derived approximate expressions for the expected delivery rate of bothschemes.900 0.2 0.4 0.6 0.8 101020304050M/NExpectedRate(files)*********** * * * * * * * * * ************** * * * * * ********** * * * * * * * *Uncoded CachingOptimal CSCCSC-CCD: ApproximationCSC-CCD: Simulation*CFC-CCDCSC-CCD with ∆ = 5CSC-CCD with ∆ = 25■■■(a) Average Rates0 0.2 0.4 0.6 0.8 1020406080100M/NRelativeAdditiveGain(Percentage)°°°°°°°°°°° °° °° °° ° ° °°°◊◊◊◊◊◊◊ ◊◊ ◊◊ ◊◊ ◊◊ ◊ ◊◊ ◊ ◊ ◊◊◊********* ** ** ** * * * ****CFC-CCDCSC-CCD with ∆ = 5CSC-CCD with ∆ = 25Optimal CSC◊°*(b) Additive Gains0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1123456789M/NMultiplicativeGain°°°° °° °° °°° °°°°°°° °°°* ** ** ** ** ** ** * ** * * ***◊◊◊ ◊◊ ◊◊ ◊◊◊◊◊◊◊◊◊◊◊◊ ◊◊CFC-CCCSC-CCD with ∆ = 5CSC-CCD with ∆ = 25*°◊(c) Multiplicative GainsFigure 5.8: (a) shows both theoretical approximations and simulation results.(b) and (c) are based on computer simulations. Here K = 50 cachesand N = 1000 files.Because of the more restricted setup of the file caching problem, the schemeswe proposed here are sub-optimal compared to the coded subfile caching schemes,but they still promise considerable gains over uncoded caching. Further, the deliv-ery rate gap between CFC-CCD and the state-of-the-art decentralized coded subfilecaching schemes shrinks considerably in the regime of large total storage capac-ity. These findings suggest that in scenarios where the implementation of subfilecaching is difficult, it is still possible to effectively gain from network coding toimprove the system performance.Finally, we discussed the generalization of the proposed file caching schemesto subfile caching with an arbitrary number of subfiles per file. We observed that a91considerable portion of the performance loss due to the restrictions of file cachingcan be recovered by using a relatively small number of subfiles per file. While theconstruction of most of the coded subfile caching methods requires large numbersof subfiles per file, this observation gives grounds for the development of new de-centralized subfile caching schemes that require a small number of subfiles withoutcausing a considerable loss in the delivery rate.92Chapter 6ConclusionThe field of coded caching has attracted significant attention over the last five yearsbecause of its promise in alleviation of network congestion due to the use of codingtechniques. The pioneering works in the field [11, 12] mostly focused on the fun-damental information-theoretic limits of coded caching, seeking the ultimate gainsthat it can possibly provide in reducing the server-to-cache delivery rate, undersimplifying assumptions on the problem setting. Considerable amount of researchhas been devoted to adapt coded caching schemes to complex and realistic scenar-ios that are of practical importance, including extensions to nonuniform demands,finite file size regime, unequal file sizes, etc.In this thesis, we investigated the problem of coded caching from the perspec-tives of optimization theory and graph theory, in an effort to address fundamentalchallenges for the use of coding techniques for caching. In particular, we firstlooked at the coded caching problem as an optimization problem. The work inChapter 3 was among the earliest works that adapted an optimization perspectiveto the coded caching problem. We used a formulation in Chapter 3 for the case ofuniform demands to address the problem of delivering demands that include du-plicate requests. In addition to the importance of the proposed delivery algorithmin Chapter 3 for duplicate demands, that work paved our way to develop a moregeneral formulation of the cache placement optimization problem which includednonuniform demands as well. This generalized formulation was the foundation ofour work in Chapter 4.93The problem of optimal cache placement for coded caching with nonuniformdemands is a central problem in the field, and several works in the literature at-tempted to address it. In particular, parallel to our work, some adapted an opti-mization formulation to solve the cache placement problem. A fundamental factorthat distinguishes our work in Chapter 4 from the other works in the literature is thatwe followed an analytical approach to exactly solve the optimization problem, i.e.,we neither left the optimization to numerical algorithms nor we used approxima-tions to simplify the problem. What made it possible for us to address the problemanalytically was the link we established between the cache placement optimiza-tion problem and the optimization of submodular set functions. We derived strongstructures in the optimal placement strategy by carefully developing the connectionbetween the two problems and aggregating the results developed for submodularfunction optimization into our original problem through the Lagrange duality the-ory. We believe our contributions in Chapter 4 present the state-of-the-art on cacheallocation for coded caching with nonuniform demands and uncoded-prefetchingas our comparison to the other works in the literature shows. Our proposed place-ment is also nearly optimal for that setting as it is suggested by our comparison tothe information-theoretic lower bound.Another major contribution of this thesis was the decentralized algorithm weproposed in Chapter 5 for coded caching under the requirement of keeping the filesintact during placement. Looking at the spectrum of coded caching techniques interms of the number of chunks they need to split every single file into during place-ment, this work represents one extreme of the spectrum where files remain intact.This is opposed to the other extreme in which the core coded caching algorithmsrepresent, as they require every file to be split into a number of chunks that growsexponentially with the number of caches. In our proposed scheme, we used a ran-domized placement and a delivery based on the greedy clique cover algorithm, awell-known graph theoretical technique which is deployed in the index coding lit-erature to construct coded messages. The randomized placement as well as therandom nature of user demands makes the underlying side information graph ran-dom, which is the graph that the clique cover algorithm operates on. The complexrandom nature of the graph as well as the combinatorial nature of the clique coveralgorithm makes the exact analysis of the delivery rate mathematically intractable.94A major contribution of Chapter 5 was the development of an approximate randomgraph model and the deployment of methods in random graph processes to derivea system of differential equations whose solution gives an approximation to thedelivery rate of the proposed scheme. We argued that the approximation becomesmore accurate for a larger ratio of the number of files to the number of caches. Thisratio is usually large in most practical scenarios, making the analysis applicable inpractice.The coded caching problem is complex a problem, which is evident just fromthe fact that solving the delivery component alone is NP-hard in general. Thiscomplexity calls for the deployment of different techniques in various mathemati-cal fields to deal with the problem. This thesis presented our effort to address codedcaching under certain setups by looking at it from the optimization and graph the-oretical perspectives.Directions for future researchOur work in Chapter 4 on coded caching with nonuniform demands resulted in anearly optimal performance for caching schemes with uncoded prefetching. There,we discussed a hypothesis on why there still exists a performance gap between theinformation theoretic lower bound and the rate of our proposed scheme. In particu-lar, although the placement algorithm we proposed is optimal for the delivery algo-rithm in Algorithm 4, the delivery algorithm itself is suboptimal in the presence ofduplicate demands as was dicussed in Chapter 3. We speculate that pairing up themodified delivery algorithm in [13] and a placement structured similar to the place-ment we derived in Chapter 4 might lead to a scheme that achieves the informationtheoretic lower bound on the rate of coded caching with uncoded prefetching fornonuniform demands. This is a promising future research direction.Another important research direction is the estimation of the request probabil-ities of the files. Most of the works in the literature assume that these probabilitiesare known. This is a strong assumption as the prediction of future request prob-abilities is not a straightforward task in the context of wireless caching. This isbecause the cache hit ratio can be small in practice, i.e., the available data can besparse. In that case, what would be a good learning algorithm for popularity pre-95diction? Also, if popularity prediction cannot be done reliably, how robust are thealgorithms in the literature to such deviations from the assumed popularities? Howcan robustness to estimation errors be incorporated in designing coded cachingschemes?Another consequence of small cache hit ratio in wireless caching is that notall caches in the network would necessarily demand a file in each time step. Tworelevant questions are how such a scenario can be modeled statistically, and whatthe expected delivery rate would be in such a scenario. Further, how would thedesign of caching schemes change if the set of caches active in the systems evolvesover delivery period? We find these questions interesting from a theoretical pointof view and of significant practical importance.96Bibliography[1] Cisco, “Cisco visual networking index: Global mobile data traffic forecastupdate, 2017-2022,” tech. rep., Cisco, 2018. → page 2[2] Cisco, “Cisco visual networking index: Global mobile data traffic forecastupdate, 2015-2020,” tech. rep., Cisco, 2016. → page 3[3] Ericson, “Ericsson mobility report,” tech. rep., Ericson, 2015. → page 3[4] E. Bastug, M. Bennis, and M. Debbah, “Living on the edge: The role ofproactive caching in 5G wireless networks,” IEEE Commun. Mag., vol. 52,pp. 82–89, Aug. 2014. → page 3[5] K. Shanmugam, N. Golrezaei, A. G. Dimakis, A. F. Molisch, and G. Caire,“Femtocaching: Wireless content delivery through distributed cachinghelpers,” IEEE Trans. Inf. Theory, vol. 59, pp. 8402–8413, Dec. 2013. →pages 3, 19[6] G. Paschos, E. Bastug, I. Land, G. Caire, and M. Debbah, “Wirelesscaching: technical misconceptions and business barriers,” IEEE Commun.Mag., vol. 54, pp. 16–22, Aug. 2016. → pages 4, 20[7] Z. Bar-Yossef, Y. Birk, T. S. Jayram, and T. Kol, “Index coding with sideinformation,” IEEE Trans. Inf. Theory, vol. 57, pp. 1479–1494, Mar. 2011.→ pages 4, 5, 42, 71[8] Y. Birk and T. Kol, “Coding on demand by an informed source (iscod) forefficient broadcast of different supplemental data to caching clients,” IEEETrans. Inf. Theory, vol. 52, pp. 2825–2830, June 2006. → page 5[9] N. Alon, E. Lubetzky, U. Stav, A. Weinstein, and A. Hassidim,“Broadcasting with side information,” in Foundations of Computer Science,2008. FOCS ’08. IEEE 49th Annual IEEE Symposium on, pp. 823–832, Oct2008. → page 597[10] K. Shanmugam, A. G. Dimakis, and M. Langberg, “Local graph coloringand index coding,” in 2013 IEEE International Symposium on InformationTheory, pp. 1152–1156, July 2013. → page 5[11] M. A. Maddah-Ali and U. Niesen, “Fundamental limits of caching,” IEEETrans. Inf. Theory, vol. 60, pp. 2856–2867, May 2014. → pages6, 9, 11, 12, 13, 16, 17, 22, 25, 29, 30, 36, 39, 93[12] M. A. Maddah-Ali and U. Niesen, “Decentralized coded caching attainsorder-optimal memory-rate tradeoff,” IEEE/ACM Trans. Networking,vol. 23, pp. 1029–1040, Aug. 2015. → pages12, 14, 15, 17, 18, 19, 22, 25, 26, 27, 29, 36, 42, 67, 68, 70, 73, 87, 93[13] Q. Yu, M. A. Maddah-Ali, and A. S. Avestimehr, “The exact rate-memorytradeoff for caching with uncoded prefetching,” IEEE Transactions onInformation Theory, vol. 64, pp. 1281–1296, Feb 2018. → pages15, 17, 22, 23, 33, 64, 65, 82, 89, 95, 121, 123[14] U. Niesen and M. A. Maddah-Ali, “Coded caching with nonuniformdemands,” IEEE Trans. Inf. Theory, vol. 63, pp. 1146–1158, Feb 2017. →pages 17, 18, 37, 42[15] J. Zhang, X. Lin, and X. Wang, “Coded caching under arbitrary popularitydistributions,” IEEE Trans. Inf. Theory, vol. 64, pp. 349–366, Jan 2018. →pages 37, 40, 42, 60, 62, 63, 64[16] J. Hachem, N. Karamchandani, and S. Diggavi, “Content caching anddelivery over heterogeneous wireless networks,” in IEEE Conference onComputer Communications (INFOCOM), pp. 756–764, Apr. 2015. → pages18, 19, 37[17] R. Pedarsani, M. A. Maddah-Ali, and U. Niesen, “Online coded caching,”IEEE/ACM Transactions on Networking, vol. 24, pp. 836–845, April 2016.→ page 18[18] N. Karamchandani, U. Niesen, M. A. Maddah-Ali, and S. N. Diggavi,“Hierarchical coded caching,” vol. 62, pp. 3212–3229, June 2016. → pages17, 18[19] Kai Wan, D. Tuninetti, and P. Piantanida, “On the optimality of uncodedcache placement,” in 2016 IEEE Information Theory Workshop (ITW),pp. 161–165, Sep. 2016. → page 1798[20] K. Wan, D. Tuninetti, and P. Piantanida, “On caching with more users thanfiles,” in 2016 IEEE International Symposium on Information Theory (ISIT),pp. 135–139, July 2016. → page 17[21] K. Shanmugam, M. Ji, A. M. Tulino, J. Llorca, and A. G. Dimakis.,“Finite-length analysis of caching-aided coded multicasting,” IEEE Trans.Inf. Theory, vol. 62, pp. 5524–5537, Oct. 2016. → pages 17, 18, 42, 67, 72[22] K. Shanmugam, A. M. Tulino, and A. G. Dimakis, “Coded caching withlinear subpacketization is possible using ruzsa-szeme´redi graphs,” in 2017IEEE International Symposium on Information Theory (ISIT),pp. 1237–1241, June 2017. → pages 18, 68[23] C. Shangguan, Y. Zhang, and G. Ge, “Centralized coded caching schemes:A hypergraph theoretical approach,” IEEE Transactions on InformationTheory, vol. 64, pp. 5755–5766, Aug 2018. → page 18[24] L. Tang and A. Ramamoorthy, “Coded caching schemes with reducedsubpacketization from linear block codes,” IEEE Transactions onInformation Theory, vol. 64, pp. 3099–3120, April 2018. → page 18[25] M. Cheng, J. Jiang, Q. Wang, and Y. Yao, “A generalized grouping schemein coded caching,” IEEE Transactions on Communications, vol. 67,pp. 3422–3430, May 2019. → page 17[26] M. Ji, A. M. Tulino, J. Llorca, and G. Caire, “On the average performance ofcaching and coded multicasting with random demands,” in Proc. 11thInternational Symposium on Wireless Communications Systems (ISWCS),pp. 922–926, Aug. 2014. → pages 18, 37[27] M. Ji, A. M. Tulino, J. Llorca, and G. Caire, “Order-optimal rate of cachingand coded multicasting with random demands,” IEEE Transactions onInformation Theory, vol. 63, pp. 3923–3949, June 2017.[28] J. Hachem, N. Karamchandani, and S. N. Diggavi, “Coded caching formulti-level popularity and access,” IEEE Transactions on InformationTheory, vol. 63, pp. 3108–3141, May 2017. → page 18[29] M. Mohammadi Amiri, Q. Yang, and D. Gu¨ndu¨z, “Decentralized cachingand coded delivery with distinct cache capacities,” IEEE Transactions onCommunications, vol. 65, pp. 4657–4669, Nov 2017. → page 1899[30] S. P. Shariatpanahi, S. A. Motahari, and B. H. Khalaj, “Multi-server codedcaching,” IEEE Transactions on Information Theory, vol. 62,pp. 7253–7271, Dec 2016. → page 18[31] M. Mohammadi Amiri and D. Gu¨ndu¨z, “Fundamental limits of codedcaching: Improved delivery rate-cache capacity tradeoff,” IEEE Transactionson Communications, vol. 65, pp. 806–815, Feb 2017. → page 19[32] J. Go´mez-Vilardebo´, “A novel centralized coded caching scheme with codedprefetching,” IEEE Journal on Selected Areas in Communications, vol. 36,pp. 1165–1175, June 2018. → page 19[33] Q. Yang and D. Gu¨ndu¨z, “Coded caching and content delivery withheterogeneous distortion requirements,” IEEE Transactions on InformationTheory, vol. 64, pp. 4347–4364, June 2018. → page 19[34] S. Saeedi Bidokhti, M. Wigger, and R. Timo, “Noisy broadcast networkswith receiver caching,” IEEE Transactions on Information Theory, vol. 64,pp. 6996–7016, Nov 2018. → page 19[35] A. Tang, S. Roy, and X. Wang, “Coded caching for wireless backhaulnetworks with unequal link rates,” IEEE Transactions on Communications,vol. 66, pp. 1–13, Jan 2018. → page 19[36] Y. Fadlallah, A. M. Tulino, D. Barone, G. Vettigli, J. Llorca, and J. M.Gorce, “Coding for caching in 5g networks,” IEEE Commun. Mag., vol. 55,pp. 106–113, Feb. 2017. → pages 19, 68[37] J. Liao, K. Wong, M. R. A. Khandaker, and Z. Zheng, “Optimizing cacheplacement for heterogeneous small cell networks,” IEEE CommunicationsLetters, vol. 21, pp. 120–123, Jan 2017. → page 19[38] X. Xu and M. Tao, “Modeling, analysis, and optimization of coded cachingin small-cell networks,” IEEE Transactions on Communications, vol. 65,pp. 3415–3428, Aug 2017.[39] E. Ozfatura and D. Gu¨ndu¨z, “Mobility and popularity-aware codedsmall-cell caching,” IEEE Communications Letters, vol. 22, pp. 288–291,Feb 2018.[40] W. Han, A. Liu, W. Yu, and V. K. N. Lau, “Joint frequency reuse and cacheoptimization in backhaul-limited small-cell wireless networks,” IEEETransactions on Wireless Communications, vol. 17, pp. 6917–6930, Oct2018. → page 19100[41] L. Li, G. Zhao, and R. S. Blum, “A survey of caching techniques in cellularnetworks: Research issues and challenges in content placement and deliverystrategies,” IEEE Communications Surveys Tutorials, vol. 20,pp. 1710–1732, thirdquarter 2018. → page 20[42] M. A. Maddah-Ali and U. Niesen, “Coding for caching: fundamental limitsand practical challenges,” IEEE Commun. Mag., vol. 54, pp. 23–29, Aug.2016. → page 20[43] S. Boyd and L. Vandenberghe, Convex Optimization. New York, NY, USA:Cambridge University Press, 2004. → pages 26, 46, 55, 108[44] K. P. Murphy, Machine Learning: A Probabilistic Perspective. The MITPress, 2012. → page 31[45] L. Breslau, P. Cao, L. Fan, G. Phillips, and S. Shenker, “Web caching andZipf-like distributions: evidence and implications,” in IEEE Conference onComputer Communications (INFOCOM), pp. 126–134, Mar. 1999. → pages32, 59, 60[46] R. Peck, Statistics: Learning from Data. Cengage Learning, 2014. → page32[47] S. Jin, Y. Cui, H. Liu, and G. Caire, “Structural properties of uncodedplacement optimization for coded delivery,” CoRR, vol. abs/1707.07146,2017. → pages 37, 38, 39, 57[48] A. M. Daniel and W. Yu, “Optimization of heterogeneous coded caching,”CoRR, vol. abs/1708.04322, 2017. → pages 37, 38, 39[49] S. A. Saberali, H. E. Saffar, L. Lampe, and I. F. Blake, “Adaptive delivery incaching networks,” CoRR, vol. abs/1707.09662, 2017. → pages 39, 57[50] S. Sahraei, P. Quinton, and M. Gastpar, “The optimal memory-rate trade-offfor the non-uniform centralized caching problem with two files underuncoded placement,” CoRR, vol. abs/1808.07964, 2018. → pages40, 62, 63, 64[51] J. Hachem, N. Karamchandani, and S. N. Diggavi, “Effect of number ofusers in multi-level coded caching,” in Proc. IEEE Int. Symp. InformationTheory, pp. 1701–1705, June 2015. → page 42[52] D. G. Luenberger and Y. Ye, Linear and Nonlinear Programming. SpringerPublishing Company, Incorporated, 2015. → page 49101[53] F. R. Bach, “Structured sparsity-inducing norms through submodularfunctions,” in Advances in Neural Information Processing Systems,pp. 118–126, 2010. → pages 49, 50[54] S. A. Saberali, H. E. Saffar, L. Lampe, and I. Blake, “Adaptive delivery incaching networks,” IEEE Communications Letters, 2016. → page 64[55] G. Vettigli, M. Ji, A. M. Tulino, J. Llorca, and P. Festa, “An efficient codedmulticasting scheme preserving the multiplicative caching gain,” in 2015IEEE Conference on Computer Communications Workshops (INFOCOMWKSHPS), pp. 251–256, Apr. 2015. → pages 68, 69[56] D. A. Mastin, Analysis of approximation and uncertainty in optimization.PhD thesis, Massachusetts Institute of Technology, 2015. Available athttp://hdl.handle.net/1721.1/97761. → pages 69, 73, 117[57] U. Niesen and M. A. Maddah-Ali, “Coded caching for delay-sensitivecontent,” in 2015 IEEE International Conference on Communications (ICC),pp. 5559–5564, June 2015. → page 69[58] R. M. Karp, Reducibility among Combinatorial Problems, pp. 85–103.Boston, MA: Springer US, 1972. → page 72[59] N. C. Wormald, “The differential equation method for random graphprocesses and greedy algorithms,” Lectures on approximation andrandomized algorithms, pp. 73–155, 1999. → pages 74, 75, 113, 114[60] J. Dı´az and D. Mitsche, “Survey: The cook-book approach to the differentialequation method,” Comput. Sci. Rev., vol. 4, pp. 129–151, Aug. 2010. →pages 75, 113[61] P. J. Cameron, Combinatorics: topics, techniques, algorithms. CambridgeUniversity Press, 1994. → pages 82, 122[62] N. C. Wormald, “Differential equations for random processes and randomgraphs,” Ann. Appl. Probab., vol. 5, pp. 1217–1235, Nov. 1995. → page 113[63] S. Yang and R. W. Yeung, “Batched sparse codes,” IEEE Trans. Inf. Theory,vol. 60, pp. 5322–5346, Sep. 2014. → page 113[64] T. Richardson and R. Urbanke, Modern Coding Theory. New York, NY,USA: Cambridge University Press, 2008. → page 113102Appendix AProofs of Chapter 4A.1 Proof of Proposition 3Assume that there exists an optimal placement P∗ of files for which there existdistinct subsets S1 and S2 : |S1| = |S2| such that xnS1 6= xnS2 . Since the pop-ularity of files is identical for the different caches, the delivery rate remains thesame if we use any permutation perm(k) of the cache labels in the placementparameters {xnS}S⊂[K]. We denote the placement over the permuted cache la-bels by P∗perm. More specifically, we relabel cache k to perm(k) for k ∈ [K],and use the placement parameters over the relabeled caches. In particular, if un-der P∗ we have xnS∣∣P∗ = c, then under P∗perm, we set xnperm(S)∣∣P∗perm = c, whereperm(S) = {perm(k) | k ∈ S}. There exists K! permutations of K cache labels,leading to placements P∗perm,i, i = 1, . . . ,K!, all with the optimal delivery rate R∗.103Hence:R∗=1K!K!∑i=1r(x)∣∣P∗perm,i=1K!K!∑i=1∑S:S⊂KS6=∅Ed(maxk∈SxdkS\k∣∣P∗perm,i)=Ed∑S:S⊂KS6=∅1K!K!∑i=1maxk∈SxdkS\k∣∣P∗perm,i≥ Ed ∑S:S⊂KS6=∅maxk∈S1K!K!∑i=1xdkS\k∣∣P∗perm,i = Ed ∑S:S⊂KS6=∅maxk∈Sx¯dkS\k ,where we defined x¯nS =1K!∑K!i=1 xnS∣∣P∗perm,iand used the convexity of the maxfunction. Notice that the RHS is the expected rate when for each file n and subsetS, we use the average of the corresponding placement parameters over all permu-tations of the optimal placement. Because of symmetry, x¯nS has the property thatx¯nS1 = x¯nS2 if |S1| = |S2|. From the facts that i) the LHS is the optimal rate, ii) theplacement parameters x¯nS use the same amount of cache storage as P∗ based oneq. (4.3) and iii) the sum of x¯nS over all subsets S ⊂ [K] is 1, we conclude that therate in the RHS must also be equal to R∗. This implies that there exists an optimalplacement with the property that xnS = xns for all S : |S| = s, which completes theproof.A.2 Proof of Lemma 2Based on Proposition 5, the extreme points of the unit ball fc ≤ 1 are closelyconnected to the set of stable inseparable subsets of [(K − 1)N ] with regard toFc. We first argue that all subsets of [(K − 1)N ] are stable. Consider a set A ⊂[(K − 1)N ]. Augment A with a new object i to get A ∪ {i}. Without loss ofgenerality, let s−1({i}) = sˆ. Since g˜ = {i} belongs to Gsˆ+1 with ηg˜ > 0 andit does not intersect with A, we have Fc(A ∪ {i}) > Fc(A). Hence, any setA ⊂ [(K − 1)N ] is stable with respect to Fc. Consequently, every subset of[N ](s−1)N for s ∈ [K − 1] is also stable.104To find the inseparable sets, consider A ⊂ [(K − 1)N ]. Let Bs = {i ∈ A |s−1({i}) = s}. A necessary condition for A to be inseparable is to have only onenonemptyBs. To show this, partitionA to subsetsBs, s ∈ [K−1]. Notice that eachgroup g˜ ⊂ G˜ is a subset of exactly one [N ](s−1)N , s ∈ [K − 1]. Hence, if two ormore subsets Bs are nonempty, then Fc(A) =∑Bs 6=∅ Fc(Bs) and A is separable.Now, consider the case where only one Bs, say Bsˆ, is nonempty. In this case,A ⊂ [N ](sˆ−1)N and A can only have nonempty intersections with sets g˜ ∈ G˜sˆ+1.Since sˆ ≥ 1, for any partitioning of A to P1, . . . , PJ for some J , there is at leastone group g˜ ∈ G˜sˆ+1 with |g˜| ≥ 2 that intersects with both Pi and Pj for every pairi 6= j. Hence, Fc(A) <∑Ji=1 Fc(Pi). As a result, the set of all stable inseparablesubsets of [(K−1)N ] with regard to Fc isA = {A | A ⊂ [N ](s−1)N , s ∈ [K−1]}.According to Proposition 5, the support of every extreme point of the norm-ballof fc belongs to A. Further, the nonzero entries of the extreme point vector thatcorresponds to A ∈ A is either of ±1/Fc(A). Using Proposition 6:Fc(A) =∑g˜⊂G˜:A∩g˜ 6=∅ηg˜ =∑g˜⊂G˜s−1(A):A∩g˜ 6=∅ηg˜ =K − s−1(A)s−1(A) + 1(1− (1− P (A))s−1(A)+1)(A.1)where we used the facts that 1) for A ∈ A, all entries of A belong to only one[N ](s−1)N , so s−1(A) and g−1(A) are well defined, 2) ηg˜ =K−s−1(g˜)s−1(g˜)+1 pig−1(g˜)s−1(g˜) and3)∑g˜⊂G˜s−1(A):A∩g˜ 6=∅ηg˜ equals the probability of having a demand vector with at mosts−1(A) + 1 distinct files from [N ] that has at least one file in g−1(A). The use ofA and (A.1) in Proposition 5 and scaling the radius of the ball from 1 to t resultsin (4.17).A.3 Proof of Theorem 1To prove Theorem 1, we first prove the following lemma.Lemma 5. The extreme points of the region [fc ≤ t]+ defined by (4.13b) and105(4.13d) are the origin and points of the formtK−ss+1 [1− (1− P (A))s+1]v, (A.2)where vector v ∈ {0, 1}KN , Supp(v) = A, and set A is a subset of [N ](s−1)N foran s ∈ [K − 1].Proof. To obtain the extreme points of [fc ≤ t]+ we begin with the extreme pointsof the norm-ball fc ≤ t and remove the ones that have negative coordinates as theydo not belong to the non-negative orthant. Further, [fc ≤ t]+ has extra extremepoints that result from the intersection of fc ≤ t and planes yns = 0. Norm-ball fcis symmetric w.r.t. every plane yns = 0 and hence the extreme point resulting fromthe intersection of the norm-ball and such a plane will either be an extreme pointof the norm ball or the midpoint of two extreme points of the norm-ball with ynscoordinates of +1 and−1. In the latter case, the yns coordinate of the extreme pointof [fc ≤ t]+ will be 0. Either case, Supp(v) will still be a subset of [N ](s−1)N foran s ∈ [K − 1]. If there is no nonzero entry left in the coordinates of the extremepoint of the intersection, which is the case when all planes yns = 0 intersect, theresulting point is the origin.We now prove Thereom 1.At optimality, we have fc(y˜∗) = t∗, as otherwise the objective can be de-creased by replacing t∗ with fc(y˜∗), which contradicts the optimality of t∗.The objective function (4.13a) calculated at an extreme point of form (A.2)with nonzero parameters for s = so andA is [1+∑n∈g−1(A)(so/K−1+α∗n)γ−Kpnα∗n(K−so)/(so+1)(1−(1−P (A))so+1 ]t,which is a factor of t. Denote the denominator of (A.2) by tu(s,A), i.e., tu(s,A) =K−ss+1 [1 − (1 − P (A))s+1]. Notice that for t = tu(so, A), the extreme point of[fc ≤ t]+ that corresponds to so and A, is of form yns = 1 for s = so, n ∈ g−1(A),and yns = 0 otherwise. These parameters satisfy (4.13b)-(4.13d) and are feasible.106Hence, for any so ∈ [K − 1] and A ⊂ [N ](so−1)N , we have[1+∑n∈g−1(A)(soK − 1 + α∗n)γ −Kpnα∗nK−soso+1(1− (1− P (A))so+1]tu(so, A)≥ t∗+N∑n=1K−1∑s=1[(sK−1+α∗n)γ −Kpnα∗n](yns )∗.Also, the objective function (4.13a) finds its minimum over the set [fc ≤ t∗]+ atone of the extreme points of [fc ≤ t∗]+. The extreme points of [fc ≤ t∗]+ are inthe form of (A.2). Denote the extreme point with the smallest objective by y¯ andits corresponding s and A by s∗ and A∗. Since the feasible set of problem (4.13) isa subset of [fc ≤ t∗]+, we havet∗+N∑n=1K−1∑s=1[(sK−1+α∗n)γ −Kpnα∗n](yns )∗≥[1+∑n∈g−1(A∗)(s∗K −1+α∗n)γ−Kpnα∗nK−s∗s∗+1 (1− (1− P (A∗))s∗+1]t∗.Combining the last two equations concludes tu(s∗, A∗) ≥ t∗ and equivalentlyt∗tu(s∗,A∗) ≤ 1. As a result, the extreme point y¯ also satisfies (4.13c) and is fea-sible to (4.13). Given that it has the smallest objective among the extreme pointsof [fc ≤ t∗]+, it is optimal, i.e., y¯ = y∗.Now, the objective[1 +∑n∈g−1(A∗)(s∗K−1+α∗n)γ−Kpnα∗nK−s∗s∗+1 (1−(1−P (A∗))s∗+1]t∗ is linear in t∗. Sincet∗ ≤ tu(s∗, A∗) and t∗ = tu(s∗, A∗) is achievable, at optimality we either havet∗ = 0 or t∗ = tu(s∗, A∗), depending on the sign of the coefficient of t∗. In the for-mer case, we either have cached all files, i.e., ∀n : (ynK)∗ = 1, (yns )∗ = 0, s < K,or no file is cached at all, i.e., ∀n : (yn0 )∗ = 1, (yns )∗ = 0, s > 1, as in both casesthe rate fc due to delivery of the content cached in at least one cache is 0.In the case of t∗ = tu(s∗, A∗), for s ∈ [K−1] we have (yns )∗ = 1, s = s∗, n ∈g−1(A∗) and (yns )∗ = 0 otherwise. Together with Lemma 1, this concludes that atoptimality (zn)∗ = 1 −∑K−1s=1 (yns )∗ ∈ {0, 1}. Hence, when (zn)∗ = 1 we have(yn0 )∗ = 1 and (ynK)∗ = 0 if Kpn < γ and (yn0 )∗ = 0 and (ynK)∗ = 1 if Kpn ≥ γ.107A.4 Proof of Lemma 3To prove Lemma 3, we first show the following result:Lemma 6. The capacity constraint (4.8b) in RMSC is satisfied with equality atoptimality, i.e., no storage remains unused.Proof. Assume that for storage capacity M , there is an optimal solution y∗ withm(y∗) + N = M , where > 0. Then, construct solution y′ with y′ns = (1 −)yns , s < K and y′nK = (1 − )ynK + . Essentially, y′ splits every file into twoparts of lengths (1 − )F and F . It uses y∗ for the placement of the parts oflength (1 − )F and caches the other F parts on every cache. This uses storage(1 − )m(y∗) + N = M − (M − N) < M , which implies that the storageconstraint is satisfied for y′. However, r(y′) = (1− )r(y∗)+ ×0 < r(y∗). Thiscontradicts the optimality of y∗. Hence, the optimal solution of RMSC must satisfythe capacity constraint by equality.The first property follows from the shadow price interpretation of the Lagrangemultipliers for inequality constraints [43, Section 5.6]. In particular, let denote theoptimal solutions to (4.8) with storage budgets M1 and M2 < M1 by y∗1 and y∗2.Also, let γ∗1 and γ∗2 be the optimal dual parameters. If γ∗1 > γ∗2 , the Lagrangianminimization miny r(y) + γm(y) results in m(y1) ≤ m(y2). Given that nostorage remains unused at optimality, this contradicts the assumption M2 < M1.Hence, we should have γ∗1 > γ∗2 .The second property follows from the fact that the set γ ≥ 0 in the Lagrangianminimization problem (4.9), or equivalently in JRSM, is continuous, while set Y∗(or M) is finite. Hence, a range of values of γ must map to the same storageM ∈M and they are all dual optimal.To prove the third property consider y1,y2 ∈ Y∗ that correspond to two con-secutive storage values M1 = m(y1) and M2 = m(y2). Without loss of gener-ality assume that M1 < M2. Clearly, M 6∈ M for any M ∈ (M1,M2). Now,notice that i) each capacity M must correspond to some γ∗ in the dual problem,ii) for each γ ≥ 0 there is an optimal solution to JRSM in Y∗ and a correspond-ing storage value inM, hence none of those solutions uses an amount of storageM 6∈ M and iii) based on Lemma 6, at optimality all the available storage must108be used, and iv) based on property 1, γ∗ is nondecreasing in M . These pointconclude that the optimal dual parameter for any M ∈ [M1,M2] must belong to{γ∗M1 , γ∗M2}, where γ∗m represents the optimal dual parameter for capacity m andbased on property 1, γ∗M2 ≤ γ∗M1 . More specifically, point (iv) requires γ∗M = γ∗M1for M ∈ [M1,M ′] and γ∗M = γ∗M2 for M ∈ (M ′,M2], for some M1 ≤M ′ ≤M2.However, we must have γ∗M2 = γ∗M1as otherwise any γ∗M2 < γ′′ < γ∗M1 cor-responds to a value of M 6∈ M, which contradicts point (ii). This concludesproperty 3, i.e., for two consecutive values M1,M2 ∈ M,M1 < M2, all capac-ities M1 ≤ M ≤ M2 correspond to the same dual parameter γ∗. Further, wecan derive the optimal dual parameter for m(y1) ≤ M ≤ m˜(y2) as it satisfiesr(y1) + γ∗m(y1) = r(y2) + γ∗m(y2) = L∗. Hence,γ∗ =r(y1)− r(y2)m(y2)−m(y1). (A.3)A.5 Proof of Theorem 2To prove Theorem 2, we consider two cases:Case I (M ∈ M) This case is straightforward because of the zero duality gapin the primal-dual framework established in Section 4.3.2. In particular, vectory∗JRSM ∈ Y∗ with m(y∗JRSM) = M is also optimal to RMSC. 1Case II (M 6∈ M) To derive the optimal solution of RMSC for M 6∈ M, weuse Lemma 3. Let y1,y2 ∈ Y∗ be the solutions corresponding to the two con-secutive storage values m(y1),m(y2) ∈ M such that m(y1) < M < m(y2).Let γ∗ and L∗ be the corresponding optimal dual parameter and Lagrangian value,1A direct proof for optimality of y∗JRSM with m(y∗JRSM) = M for RMSC is as follows. Basedon Lemma 6, the optimal solution of RMSC satisfies the capacity constraint with equality. Now,assume that y∗JRSM is not optimal for the RMSC problem. This means that r(y∗RMSC) < r(y∗JRSM).However, since m(y∗RMSC) = m(y∗JRSM) = M , this implies that r(y∗RMSC) + γ∗m(y∗RMSC) 0 and s2 > 0 and pos-sibly for s = 0. Let n1 and n2 be the largest indexes n with nonzero yns1 andyns2 in the respective two solutions. Notice that since yi ∈ Y∗, having ynsi > 0implies ynsi = 1. We consider two cases of s1 6= s2 and s1 = s2. In the for-mer case, r(y1) = K∑Nn=1 pny1n0 +K−s1s1+1(1 − (1 −∑n1n=1 pn)s1+1), r(y2) =K∑Nn=1 pny2n0 +K−s2s2+1(1 − (1 −∑n2n=1 pn)s2+1) and r(θy1 + (1 − θ)y2)) =K∑Nn=1 pn(θy1n0 + (1 − θ)y2n0 ) + K−s1s1+1 (1 − (∑n1n=1 pn)s1+1)θ + K−s2s2+1 (1 −(∑n2n=1 pn)s2+1)(1− θ) = θr˜(y1) + (1− θ)r(y2). For the case of s1 = s2 = so,2The equivalence results from the fact that if r(yθ;y1,y2) + γ∗m(yθ;y1,y2) < L∗, then y1 andy2 could not be optimal for γ∗, which is a contradiction.110since m(y1) < m(y2), we must have n2 > n1. Hence, for the rates we haver(y1) = KN∑n=1pnyn0 1 +K − soso + 1∑g∈Gso+1pigso+1 maxn∈g ynso1= KN∑n=1pnyn0 1 +K − soso + 1∑g∈Gso+1,g∩[n1]6=∅pigso+1r(y2) = KN∑n=1pnyn0 2 +K − soso + 1∑g∈Gso+1pigso+1 maxn∈g ynso2= KN∑n=1pnyn0 2 +K − soso + 1∑g∈Gso+1,g∩[n2]6=∅pigso+1andr(θy1 + (1− θ)y2) = KN∑n=1pn(θyn0 1 + (1− θ)yn0 2)+K − soso + 1∑g∈Gso+1pigso+1 maxn∈g θynso1+ (1− θ)ynso2= KN∑n=1pn(θyn0 1 + (1− θ)yn0 2)+K − soso + 1∑g∈Gso+1,g∩[n1]6=∅pigso+1(θ + (1− θ))+K − soso + 1∑g∈Gso+1,g∩[n1]=∅,g∩[n2−n1]n1 6=∅pigso+1(1− θ)= θK N∑n=1pny1n0 +K − soso + 1∑g∈Gso+1,g∩[n1]6=∅pigso+1+ (1− θ)K N∑n=1pny2n0 +K − soso + 1∑g∈Gso+1,g∩[n2]6=∅pigso+1= θr˜(y1) + (1− θ)r(y2)where we used the fact that if g ∩ [n1] 6= ∅, then n2 > n1 implies g ∩ [n2] 6= ∅.111This completes the proof of the third feature as we now haver(yθ;y1,y2) + γ∗m(yθ;y1,y2) = θr˜(y1) + (1− θ)r(y2) + γ∗[θm˜(y1) + (1− θ)m(y2)]= θ[r(y1) + γ∗m(y1)] + (1− θ)[r(y2) + γ∗m(y2)]= θL∗ + (1− θ)L∗ = L∗.112Appendix BAdditional Material and Proofs ofChapter 5B.1 Wormald’s TheoremIn this appendix, we present Wormald’s theorem [59, Theorem 5.1] and introducethe notations that we use in the proofs of Theorems 3 and 4.1Consider a sequence of random processes indexed by n, n = 1, 2, . . . .2 Foreach n, the corresponding process is a probability space denoted by (Q(n)0 , Q(n)1 , . . .),where each Q(n)i takes values from a set S(n). The elements of the space are(q(n)0 , q(n)1 , . . .), where q(n)i ∈ S(n). Let H(n)t represent the random history ofprocess n up to time t and h(n)t represent a realization of H(n)t . Also, we denote byS(n)+ the set of all h(n)t = (q(n)0 , . . . , q(n)t ), where q(n)i ∈ S(n), t = 0, 1, . . .. Forsimplicity of notation, the dependence on n is usually dropped in the following.Now, consider a set of variables W1(t), . . . ,Wa(t) defined on the componentsof the processes. Let wi(ht) denote the value of Wi(t) for the history h(n)t . Weare interested in analyzing the behavior of these random variables throughout the1Wormald’s theorem was introduced in [62, Theorem 1]. The most general setting of the theo-rem was established in [59, Theorem 5.1]. A simplified version of the theorem is provided in [60,Section 3], and examples of its application can be found in [59], [60, Section 4], [63] and [64, Sec-tion C.4].2For instance, the different processes can be the outcomes of a procedure implemented on graphswith different numbers of vertices n. Then, each process associates with one of the graphs.113process.Theorem 5 (Wormald’s Theorem [59, THEOREM 5.1]). For 1 ≤ l ≤ a, wherea is fixed, let wl : S(n)+ → R and fl : Ra+1 → R, such that for some constantc0 and all l, |wl(ht)| < c0n for all ht ∈ S(n)+ for all n. Assume the followingthree conditions hold, where in (ii) and (iii), D is some bounded connected openset containing the closure of{(0, z1, . . . , za) : P(Wl(0) = zln, l = 1, . . . , a) 6= 0}for some n, and TD(W1, . . . ,Wa) is the minimum t such that(tn ,W1(t)n , . . . ,Wa(t)n)6∈D.(i) (Boundedness hypothesis) For some functions β = β(n) ≥ 1 and γ = γ(n),the probability thatmaxl=1,...,a|Wl(t+ 1)−Wl(t)| ≤ βconditional upon Ht, is at least 1− γ for t < TD.(ii) (Trend hypothesis) For some function λ1 = λ1(n) = o(1), for all l ≤ a∣∣∣E (Wl(t+1)−Wl(t)|Ht)−fl( tn,W1(t)n, . . . ,Wa(t)n)∣∣∣≤λ1for t < TD.(iii) (Lipschitz hypothesis) Each function fl is continuous and satisfies a Lipschitzcondition on D ∩ {(t, z1, . . . , za) : t ≥ 0}, with the same Lipschitz constantfor each l.Then, the following are true.(a) For (0, zˆ1, . . . , zˆa) ∈ D, the system of differential equations defined bydzldx= fl(x, z1, . . . , za), l = 1, . . . , ahas a unique solution that passes through zl(0) = zˆl, l = 1, . . . , a, whichextends to points arbitrarily close to the boundary of D.114(b) Let λ > λ1 + c0nγ with λ = o(1). For a sufficiently large constant C, withprobability 1−O(nγ + βλe−nλ3/β3)Wl(t) = nzl(t/n) +O(λn)uniformly for all 0 ≤ t ≤ σn and for each l, where zl(x) is the solution in(a) with zˆl = 1nWl(0), and σ = σ(n) is the supremum of those x to whichthe solution can be extended before reaching within l∞-distance Cλ of theboundary of D.Hence, functions zl(x) model the behavior ofWl(nx)n for each n, and the solu-tion to the system of differential equations provides a deterministic approximationto the dynamics of the process.Remark 10. In the statement of the theorem, variables Wl and time t are normal-ized by n. This is because in many applications, this normalization leads to onlyone set of differential equations for all n, instead of different systems for each n.Also, the asymptotics denoted by O are for n→ +∞, and the term “uniformly” in(b) refers to the convergence implicit in the O terms.Remark 11. A version of Theorem 5 also holds when a is a function of n, with theprobability in (b) replaced by 1−O(anγ + aβλ e−nλ3/β3), under the condition thatall functions fl are uniformly bounded by some Lipschitz constant and fl dependsonly on the variables x, z1, . . . , zl. The latter condition is because as n→∞, oneneeds to solve a system of infinite number of differential equations which involvescomplicated technical issues. However, when fl depends only on x, z1, . . . , zl, onecan solve the finite systems obtained for each fl by restricting the equations to theones that involve x, z1, . . . , zl.B.2 Proof of Proposition 7We need to prove that limKN→0 P(X )(E) = limKN→0 P(Xa)(E). We have limKN→0 P(Da)(euv |E \ euv) = P(euv) = q by definition of Da. We prove that limKN→0 P(D)(euv |E \ euv) = P(D)(euv) = q for graph D, which implies that euv is independent ofE \ euv in the KN → 0 regime. This concludes that the joint probability distribution115of the presence of the edges in the two graphs are also identical. In the sequel, allprobabilities correspond to random graph D. Hence, we write P in place of P(D)to minimize notational clutter.As per point (i) of Remark 4, the value of euv is fully determined by the stateof the other edges that originate from u, if fv ∈ F−v. Using the Baye’s rule andthe law of total probability for conditioning on fv ∈ F−v, we have:P(euv|E \ euv) = P(euv, E \ euv)P(E \ euv) (B.1)where:P(euv, E \ euv) = P(euv, E \ euv|fv ∈ F−v)P(fv ∈ F−v)+ P(euv, E \ euv|fv 6∈ F−v)P(fv 6∈ F−v)andP(E \ euv) = P(E \ euv|fv ∈ F−v)P(fv ∈ F−v)+ P(E \ euv|fv 6∈ F−v)P(fv 6∈ F−v).However, |F−v| ≤ K − 1. As a result, P(fv ∈ F−v) ≤ K−1N and limKN→0 P(fv ∈F−v) = 1− limKN→0 P(fv 6∈ F−v) = 0. Hence:limKN→0P(euv|E \ euv) =limKN→0 P(euv, E \ euv|fv 6∈ F−v)limKN→0 P(E \ euv|fv 6∈ F−v)= limKN→0P(euv|E \ euv, fv 6∈ F−v). (B.2)By construction, we have P(euv | E\euv, fv 6∈ F−v) = P(fv ∈ Cu|E\euv, fv 6∈F−v), where Cu is the set of files stored in the cache of vertex u. Since fv 6∈ F−v,based on point (i) in Remark 4, E \ euv affects the probability of fv ∈ Cu byproviding information about the amount of storage used for the files requested bythe other caches. This effect is negligible when KN → 0. To prove this, we derivea lower and an upper bound on P(fv ∈ Cu | E \ euv, fv ∈ F−v) by consideringthe two extreme cases where all the other edges and loops that originate from u are116present and absent, respectively. Assume that the number of distinct files requestedby the vertices other than v is α. This implies α ≤ K − 1. Then, the two extremecases result in the inequalities M−αN−α ≤ P(fv ∈ Cu | E \ euv, fv ∈ F−v) ≤ MN−α .Maximizing and minimizing the upper and the lower bounds respectively over α,we bound the range of P(fv ∈ Cu | E \ euv, fv ∈ F−v) over all possible demandsas M−(K−1)N−(K−1) ≤ P(fv ∈ Cu | E \ euv, fv ∈ F−v) ≤ MN−(K−1) . Taking the limits ofthe lower and the upper bounds as KN → 03 and using M = qN for a fixed q, weget:limKN→0P(euv = 1 | E \ euv, fv 6∈ F−v) = MN= q. (B.3)Eqs. (B.2) and (B.3) imply that in the KN → 0 regime, the presence of an edge orloop is independent of the presence of the other edges and loops in digraph D andeach is present with probability q. This asymptotic model is denoted by Da. GivenDa as the asymptotic model of D for KN → 0, by construction, the undirectedgraph G can be asymptotically modeled by Ga for which every edge is present withprobability q2 and every loop is present with probability q.B.3 Proof of Theorem 3To prove Theorem 3, we use Wormald’s theorem following an approach inspiredby the approach in [56, Section 3.4.1]. Let L be the set of looped vertices of Ga.Also, let Q and U represent the sets of unlooped vertices that are matched andremain unmatched by Algorithm 10, respectively. Since a looped vertex will notbe matched by Algorithm 10, these three sets are disjoint and partition the verticesof Ga.We are interested in the number of looped vertices plus the matchings made byAlgorithm 10:|Q|2+ |U| = K − |U| − |L|2+ |U| = 12(K − |L|+ |U|). (B.4)This is because one coded message is transmitted per each pair of matched vertices3Since K ≥ 1, KN→ 0 implies N →∞.117and one uncoded message is transmitted for each unlooped and unmatched vertex.Based on (B.4), to analyze the statistical behavior of the delivery rate one needsto analyze |U| and |L|. We do this using Theorem 5. For that, we index the onlinematching process of Algorithm 10 by K, which is the number of vertices of Ga.Let us define the two variables L(t) and U(t) on the process. Variable L(t) is thenumber of looped vertices in {v1, · · · , vt−1}. Variable U(t) denotes the number ofunlooped vertices in {v1, · · · , vt−1} that are not matched by Algorithm 10 in thefirst t− 1 iterations. Notice that since U(t) < K and L(t) < K, we set c0 = 1 inTheorem 5.In the following, we verify that the three conditions of Theorem 5 are satisfiedfor the defined variables. Both L(t) and U(t) satisfy the boundedness conditionwith β = 1 and γ = 0, as |L(t+ 1)−L(t)| ≤ 1 and |U(t+ 1)−U(t)| ≤ 1 alwayshold. For the trend hypothesis, we haveE(U(t+ 1)− U(t)|Ht) =0× P(vt looped)− 1× P(vt unlooped and matches at timet)+ 1× P(vt unlooped and does not match at timet)= −(1−q)(1−(1−q2)U(t)) + (1−q)(1−q2)U(t)= (1− q)[2(1− q2)U(t) − 1].andE(L(t+ 1)− L(t)|Ht) = P(vt looped ) = q,where Ht is the history of the process up to time t. Since the derived expec-tations are deterministically true, the trend hypothesis holds with λ1 = 0 andf1(x, z1, z2) = (1 − q)[2(1 − q2)Kz1 − 1] and f2(x, z1, z2) = q, with domainD defined as − < x < 1 + , − < z1 < 1 + and − < z2 < 1 + , > 0.Finally, the Lipschitz hypothesis is satisfied as f1 and f2 are Lipschitz continuouson R2, because they are differentiable everywhere and have bounded derivatives.Since the conditions of the theorem are satisfied, the dynamics of z1 and z2 can118be formulated by the differential equationsdz1(x)dx=(1− q)[2(1− q2)Kz1(x) − 1], z1(0) = 0;dz2(x)dx= q, z2(0) = 0,where the initial conditions result from U(0) = L(0) = 0. Notice that the equa-tions derived are decoupled in z1 and z2, and can be solved independently asz1(x) = − log(2− (1− q2)K(1−q)x)K log(1− q2) , z2(x) = qx.Then, for λ > λ1 + c0Kγ = 0, with probability 1−O( 1λe−Kλ3), we haveU(t) = Kz1(t/K) +O(λK) =− log(2− (1− q2)(1−q)t)log(1− q2)+O(λK),L(t) = Kz2(t/K) +O(λK) =qt+O(λK),uniformly for 0 ≤ t ≤ K.By evaluating the derived expressions for U(t) and L(t) at t = K, and theirrespective substitution in (B.4) for |U| and |L|, Theorem 3 results.B.4 Proof of Theorem 4Let Yi(t) be the number of cliques of size i formed up to iteration t − 1 whenAlgorithm 9 is applied to the side information graph modeled as Ga, where i =1, . . . ,K. To analyze the behavior of these variables, we use the version of Wormald’stheorem that is provided in Remark 11. This is because the number of variables Yidepends on K here.It is straightforward to show that the boundedness hypothesis of Theorem 5 issatisfied with β = 1 and γ = 0, as in each iteration of the algorithm, the numberof cliques of each size either remains unchanged, or decrements or increments by1. Furthermore, for the trend hypothesis, we model the expected change of each119variable throughout the process asE(Yi(t+ 1)− Yi(t)|Y (t))= 0× P(vt looped)− P(vt unlooped and joins a clique of size i at timet)+ P(vt unlooped and joins a clique of size i− 1 at time t). (B.5)Algorithm 9 operates solely based on the edges and loops of the side informationgraph. Hence, the output of the algorithm, i.e., the cliques formed by the algorithmup to time t, is a function of the edges and loops of the subgraph over verticesv1, . . . , vt−1. Since in the asymptotic side information graph Ga every edge andloop is present independently, the output of the algorithm up to time t provides noinformation about the connectivity of vertex vt to the previously arrived vertices.As a result, vt is connected to any previously arrived vertex with probability q2.Hence, with probability (q2)j , vt is adjacent to all the vertices in a clique of size j.Similarly, (1−q2j)Yj(t) is the probability that none of the Yj(t) cliques of size j aresuitable for vt at time t. Therefore, gˆi(Y ) =∆ ∏Kj=i(1− q2j)Yj(t) is the probabilitythat at iteration t, there exists no suitable clique of sizes equal or greater than i forvt. Also,(1−(1−q2(i−1))Yi−1(t)) gˆi(Y ) is the probability that the largest suitableclique for vt has size i− 1, which implies that Yi(t+ 1)−Yi(t) = 1. Finally, sincevt is not looped with probability (1− q), we haveP(vt unlooped and joins a clique of size i− 1 at time t)= (1− q)(1−(1− q2(i−1))Yi−1(t))gˆi(Y ).Using this result and by following the same line of argument for the case where Yi120decrements at time t, (B.5) simplifies toE(Yi(t+ 1)− Yi(t)|Y (t)) = (1− q)×−(1− (1− q2i)Yi(t)) K∏j=i+1(1− q2j)Yj(t)+(1−(1− q2(i−1))Yi−1(t)) K∏j=i(1− q2j)Yj(t) , (B.6)where Y (t) = (Y1(t), . . . , YK(t)).Let fi(x, z1, . . . , zK) be the right hand side of (B.6) with Yi(t) replaced byKzi(x; q).4 Here, we used notation zi(·; q) to show that every zi is parametrizedby q. Then, the trend hypothesis of Wormald’s theorem is satisfied for all variablesYi and functions fi with λ1 = 0. Functions fi are differentiable with boundedderivatives, hence they are Lipschitz continuous. Thus, based on Wormald’s the-orem, we get the system of differential equations in (5.1b) for zi, where the ini-tial conditions result from Yi(0) = 0. Also, for any λ > 0, we have Yi(t) =Kzi(t/K; q) +O(λK) with probability 1−O(Kλ e−Kλ3). Hence, (5.1) results.B.5 Proof of Lemma 4In this appendix, we derive the expressions in (5.5) and (5.6) that we use to eval-uate the expected delivery rate of uncoded caching and the expected rate in [13,Theorem 2].As in [13], let Ne(d) be the number of distinct requests in demand vector d.Since d is a random vector, Ne(d) is a random number in {1, . . . ,K}. Givenplacement P , if the rate of a delivery algorithm AD only depends on Ne(d) and4Notice that fn−l only depends on zn, zn−1, zn−(l+1). Hence, by an argument similar to theone in Remark 11, one can solve for fn−l by restricting the equations to the ones that involve thesevariables.121not the individual files requested in d, then, the expected delivery rate will beEd(RAD) =K∑m=1P(Ne(d) = m)R(d;P, AD | Ne(d) = m). (B.7)Assuming that the popularity distribution of files is uniform, we haveP(Ne(d) = m) =(Nm){Km}m!NK. (B.8)This is because in(Nm)ways one can select m out of N files. There are{Km}waysto partitionK objects intom non-empty subsets, where{Km}is the Stirling numberof the second kind [61, Section 5.3]. Also, there are m! ways to assign one of them selected files to each subset. Hence, there are(Nm){Km}m! demand vectors oflength K with m distinct files from a library of N files. Since the total number ofdemand vectors is NK and they are equiprobable, (B.8) results.Uncoded Caching Consider the cases of uncoded caching where delivery mes-sages are uncoded. For placement every cache either stores the same set of M filesout of the N files in the library, or every cache stores the first M/NF packets ofevery file. These correspond to uncoded file and subfile cachings. The rate re-quired to delivery demand vector d with uncoded messages is Ne(d)(1 −M/N)files. Hence, based on (B.7) and (B.8), we getEd(Runcoded) =K∑m=1(Nm){Km}m!NK×m(1− MN)= N(1− (1− 1N)K)(1− MN), (B.9)where we usedK∑m=1m(Nm){Km}m! = NK+1 −N(N − 1)K . (B.10)The reason (B.10) holds is as follows. Consider a set of K + 1 objects that wewant to partition into m subsets such that the subset containing the first object122has cardinality greater than 1, i.e., the first object is not the only object in thecorresponding subset. This can be done in m{Km}ways as we can partition therest of the (K + 1)− 1 objects into m subsets and then add the first object to oneof the resulting m subsets. Assume that the subsets can be labeled distinctly froma set of N ≥ K labels. Then, there are ∑Km=1m{Km}(Nm)m! ways to partitionK + 1 objects and label them with distinct labels such that the subset containingthe first object has cardinality greater than 1. This sum is equal to the total numberof ways to partition K + 1 objects and distinctly label them with the N availablelabels, minus the number of ways to do the same but have the first object as theonly element in its corresponding subset. The former counts to NK+1, and thelatter can be done in N(N − 1)K different ways. This proves (B.10).Optimal CSC From (B.7), the expected rate in the RHS of [13, eq. (27)] can bewritten asEd(R∗CSC) =K∑m=1P(Ne(d) = m)×[N −MM(1− (1−M/N)m)], (B.11)which by substitution of (B.8), results in (5.5).123