Machine Learning-Assisted CRAN Design with Hybrid RF/FSO and Full-Duplex Self-Backhauling

by

Seyedrazieh Bayati

M.A.Sc., Amirkabir University of Technology, 2016
B.A.Sc., Sahand University of Technology, 2013

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

Master of Applied Science

in

THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES
(Electrical and Computer Engineering)

The University of British Columbia
(Vancouver)

December 2020

© Seyedrazieh Bayati, 2020

The following individuals certify that they have read, and recommend to the Faculty of Graduate and Postdoctoral Studies for acceptance, the thesis entitled:

Machine Learning-Assisted CRAN Design with Hybrid RF/FSO and Full-Duplex Self-Backhauling

submitted by Seyedrazieh Bayati in partial fulfillment of the requirements for the degree of Master of Applied Science in Electrical and Computer Engineering.

Examining Committee:

Lutz Lampe, Electrical Engineering, UBC
Supervisor

Cyril Leung, Electrical Engineering, UBC
Supervisory Committee Member

Abstract

The ever-increasing demand for higher data rates, lower-latency communication, and a more reliable mobile network has led us toward the 5th generation (5G) of mobile networks. In 5G, resource allocation is one of the most challenging problems. Conventionally, model-driven methods and analytical approaches have been used to allocate resources optimally. Despite their accuracy, these methods often result in a non-convex optimization problem that is inherently challenging to handle and requires a proper convex approximation. To overcome such drawbacks, we need more efficient resource allocation techniques in the 5G mobile network.

This research will study the downlink of a cloud radio access network. The cloud radio access network enables coordinated beamforming and better interference management in ultra-dense networks. This architecture's bottleneck is the backhaul capacity restriction, which limits the benefits that the cloud radio access network offers.
We will use hybrid radio frequency and free-space optical links to address the backhaul capacity limitation. Also, to improve the throughput and increase the spectral efficiency of the radio-frequency links, we propose in-band full-duplex self-backhauling radio units. After formulating the mathematical model and solving it with analytical approaches, we will introduce a novel solution for the proposed scenario and show that, provided self-interference cancellation is sufficient, it outperforms the state-of-the-art half-duplex backhaul technology under various weather conditions.

We will derive a joint optimization problem to design the backhaul and access link precoders and quantizers subject to the fronthaul capacity, zero-forcing, and power constraints. We will show that this problem is non-convex and computationally intractable, and approximate it with a semi-definite program that can be effectively solved by alternating convex optimization. We also employed Compute Canada computational resources to solve this semi-definite program. The computational complexity of the proposed optimization approach motivates us to employ machine-learning-based optimization methods, which have recently received much recognition in academia and industry. We use supervised and unsupervised deep neural networks to learn the optimal resource allocation strategy and achieve 80% of the performance of the proposed analytical approach at only a fraction of the computational cost. To meet all feasibility constraints of the problem, we also propose customized activation functions and post-processing steps.

Lay Summary

Resources in mobile communication networks, such as power and frequency, must be optimally managed and allocated to devices in order to achieve a good quality of service.
On the one hand, the increase in the number of connected devices and the growing demand for fast Internet connections have made the resource allocation problem one of the focus points in wireless communication systems. On the other hand, the advent of the 5th generation (5G) of mobile network technology has imposed strict constraints on network resources and calls attention to the importance of efficient resource allocation in 5G. Achieving effective mobile communication in ultra-dense metropolitan areas and providing a good quality of service for mobile users with different demands requires a very efficient resource allocation technique. This thesis investigates the state-of-the-art technologies that enable fast and reliable wireless communication and proposes a novel network design for 5G. We introduce a machine learning-based resource allocation approach that reduces the computational cost of solving resource allocation problems compared to the conventional analytical methods.

Preface

This dissertation is original, unpublished, independent work by the author, Seyedrazieh Bayati.

Table of Contents

Abstract
Lay Summary
Preface
Table of Contents
List of Tables
List of Figures
Glossary
Acknowledgments
1 Introduction
  1.1 Background and Motivations
    1.1.1 Cloud Radio Access Network
    1.1.2 5G Backhauling
    1.1.3 Free Space Optical communication
    1.1.4 RF Self-Backhauling
    1.1.5 Complexity Crunch in CRAN
  1.2 Objectives and Contributions
  1.3 Organization
2 Problem Formulation
  2.1 System Model
    2.1.1 Hybrid RF/FSO fronthaul channel model
    2.1.2 Full Duplex Radios
    2.1.3 FSO Channel Model
    2.1.4 RF Backhaul Channel Model
    2.1.5 RF Access Channel Model
  2.2 Problem Formulation
    2.2.1 Backhaul Capacity Constraint
    2.2.2 Power Constraints
    2.2.3 Joint Optimization Problem
3 CRAN Optimization
  3.1 Introduction
  3.2 Transformation of the CRAN Optimization Problem
    3.2.1 Transformation of the Objective Function
    3.2.2 Convex Approximation of Capacity Constraint
  3.3 Optimization Algorithm
    3.3.1 Computational Complexity
  3.4 Numerical Results
  3.5 Conclusion
4 Machine Learning-Based CRAN Optimization, a Supervised Approach
  4.1 Learning to Optimize
    4.1.1 Why and When Should We Use The Neural Network?
    4.1.2 How An Artificial Neural Network Works?
  4.2 Data Collection and Pre-processing
    4.2.1 Data collection
    4.2.2 Dealing with Complex-Valued Tensors
    4.2.3 Dealing with High-Dimensional Tensors
  4.3 Neural Network Model
  4.4 Performance Metrics
  4.5 Accounting for Constraints
    4.5.1 Backhaul ZF and Capacity Constraints
    4.5.2 Enforcing PSD Constraints
    4.5.3 Enforcing Power Constraint
  4.6 Implementation and Numerical Results
    4.6.1 PyTorch
    4.6.2 Results
5 Machine Learning-Based CRAN Optimization, an Unsupervised Approach
  5.1 Learning to Optimize with Unsupervised Learning
    5.1.1 Piece-wise Regularization for Enforcing the Constraints
  5.2 Segmented CRAN Optimization Via Unsupervised Learning
    5.2.1 Segmented Optimization
    5.2.2 Accounting for Constraints
    5.2.3 Optimization of V in a Semi-Supervised Manner
    5.2.4 Optimization of Q and W in an Unsupervised Manner
    5.2.5 Post-Processing Step
    5.2.6 Complexity Analysis of DNN
  5.3 Numerical Results and Discussion
    5.3.1 Training Challenges
    5.3.2 PyTorch Limitations
6 Conclusion and Future Work
  6.1 Conclusion
  6.2 Future Work
Bibliography

List of Tables

Table 3.1 FSO channel weather parameters [1, Table I].
Table 3.2 Simulation Parameters.
Table 4.1 Compute Canada resources [2].
Table 4.2 Data-set dimensionality.
Table 4.3 DNN architecture in supervised learning.
Table 4.4 Inputs and outputs of DNNs in supervised learning approach.
Table 4.5 Performance of semi-DNN and DNN architectures.
Table 5.1 Dimensionality of inputs and outputs of unsupervised DNNs.
Table 5.2 Architecture of unsupervised DNNs.
Table 5.3 Performance of semi-supervised and unsupervised DNN-based DL CRAN optimization.

List of Figures

Figure 1.1 Illustration of the downlink CRAN system model. The CP is connected to two RUs with backhaul links and RUs serve four users cooperatively via access links.
Figure 1.2 5G CRAN.
Figure 1.3 Time division and frequency division duplex technologies.
Figure 1.4 Illustration of self-interference concept in separate-antenna full-duplex and shared-antenna full-duplex systems.
Figure 2.1 Illustration of the downlink CRAN. The CP is connected to two self-backhauling RUs with a hybrid RF/FSO backhaul. The RUs serve four users cooperatively via IBFD RF access links.
Figure 2.2 Illustration of noise floor and digital/analog cancellations [3].
Figure 3.1 Average sum-rate versus SIC level for different weather conditions and RF bandwidths.
Figure 3.2 Average sum-rate versus SIC level for HD and FD RUs. The RF bandwidth is 20 MHz, 40 MHz and 80 MHz, and the three weather conditions L1, L2 and L3 are considered.
Figure 3.3 Average sum-rate versus the distance of the backhaul link for different SIC levels. L2 weather condition.
Figure 4.1 Learning to optimize. Supervised learning train and test phases.
Figure 4.2 Hidden layers of deep neural network.
Figure 4.3 A DNN with three hidden layers.
Figure 4.4 Flattening the data.
Figure 4.5 The Venn diagram of the constraints. Note that failing to meet the capacity constraint and ZF constraint does not make a DNN-generated [V,W,Q] solution infeasible, though it affects the sum rate. However, failing to meet the Positive Semi-Definite (PSD) and power constraints leads to an infeasible solution.
Figure 4.6 The proposed DNN structure to ensure Hermitian output.
Figure 4.7 DNN-based optimization approach.
Figure 4.8 Post-processing of DNN-based solutions and achieved hybrid sum-rate.
Figure 5.1 Learning to Optimize. Unsupervised learning train and test phases.
Figure 5.2 Penalizing solutions outside of the feasible set using piece-wise regularization method.
Figure 5.3 DNN structure of the proposed semi-supervised training method for V.
Figure 5.4 DNN structure of the proposed unsupervised training method for Q and W.
Figure 5.5 Proposed unsupervised and semi-supervised DNNs for CRAN DL optimization.
Figure 5.6 Learning rate tuning guide [4].
Figure 5.7 Exploding Gradient Illustration. The graph on the right-hand side shows the exploding gradient effect and instabilities in train and test errors; we were able to limit the gradient value by clipping, as shown in the left graph.

Glossary

4G  4th Generation
5G  5th Generation
ANN  Artificial Neural Network
BS  Base Stations
CNN  Convolutional Neural Network
COMP  Coordinated Multi-Point
CP  Central Processor
CRAN  Cloud Radio Access Network
CSI  Channel State Information
DNN  Deep Neural Network
EMBB  Enhanced Mobile Broadband
FD  Full Duplex
FSO  Free-Space Optical
HD  Half Duplex
IBFD  In-Band Full-Duplex
LED  Light-Emitting Diode
MAE  Mean Average Error
ML  Machine Learning
MMTC  Massive Machine-Type Communications
MSE  Mean Square Error
OOK  On-Off Keying
PSD  Positive Semi-Definite
RAN  Radio Access Network
RF  Radio Frequency
RU  Radio Unit
SE  Spectral Efficiency
SI  Self-Interference
SIC  Self-Interference Cancellation
SVD  Singular Value Decomposition
URLLC  Ultra-Reliable Low-Latency Communications
ZF  Zero-Forcing

Acknowledgments

I would first like to thank my supervisor, Professor Lutz Lampe, whose expertise was invaluable in developing the study.
His insightful feedback encouraged me to push my boundaries and sharpen my thinking.

I want to thank my parents, Seyedjafar Bayati and Mahnaz Hajian, and my beloved siblings, Seyedmohsen and Seyedmaryam, for always being there for me. They always encouraged me to be the best me, never defined boundaries for my dreams, and supported me, no matter what.

Finally, I could not have completed this dissertation without the loving support of my best friend and companion, Faramarz Jabbarvaziri, who provided emotional comfort as well as pleasant distractions to rest my mind outside of my research.

Chapter 1

Introduction

1.1 Background and Motivations

Nowadays, the importance of a reliable, high-speed mobile network is evident to everyone. Especially during the pandemic, the world witnessed the importance of a reliable mobile network and used 4th Generation (4G) capabilities to cope with the new situation. Every aspect of human life, such as education, work, and healthcare, changed dramatically, and social distancing became possible via online connection and mobile connectivity. Based on [5], in 2025, the 5th Generation (5G) of mobile networks will support more than 2.8 billion subscriptions. At the same time, because advances in augmented reality, online gaming, and high-quality video raise the data rate requested per user, we expect a considerable increase in data rate demands in 5G.

5G cellular networks are supposed to support three main objectives: Massive Machine-Type Communications (MMTC), Ultra-Reliable Low-Latency Communications (URLLC), and Enhanced Mobile Broadband (EMBB). Given these requirements and the enormous increase in the number of devices, data rate demand, and diversity of connected users (from low-throughput, low-power sensors and wearable devices to high-throughput services such as online gaming), 5G cellular networks should be designed to be more efficient.
It means 5G networks need a more structured and optimized way of managing resources such as spectrum, capacity, and energy. The Cloud Radio Access Network (CRAN) concept helps to manage resources more efficiently by providing an intelligent structure in which state-of-the-art techniques can be applied [6, 7].

To be more specific, considering that the limited frequency bands are already over-utilized, more efficient use of the spectrum is crucial to accommodate this rapidly growing demand. One way to improve the spectral efficiency of mobile networks is to coordinate base stations so as to minimize inter-cell interference. Coordinated beamforming is an effective way to prevent base stations or Radio Units (RUs) from causing interference in their neighbors' coverage areas, and CRAN is a new technology proposed to realize this goal. To do so, however, high-throughput backhaul connections are required to link the RUs to a Central Processor (CP) unit. The data of all users must be shared with each individual RU or base station, which requires the backhaul capacity to be several times larger than that of uncoordinated RUs. To provide sufficient backhaul capacity, optical connections have been considered for the backhaul. Notwithstanding the outstanding performance of optical fiber in providing a high-throughput backhaul link, its massive deployment and maintenance costs have hindered the industry from exploiting the benefits of coordinated transceiver nodes.

In recent years, Free-Space Optical (FSO) communication has been suggested as a cost-efficient solution for meeting the backhaul requirements of CRAN. In FSO, the optical signal generated by a laser is transmitted through the air, and an optical detector at the receiver converts it to an electrical signal. However, FSO is highly sensitive to weather conditions and air quality, as an optical signal can easily be obscured by fog, rain, and dust.
Therefore, an RF link is needed as an auxiliary means of communication in poor weather conditions. Such a system is called a hybrid RF/FSO backhaul, and it is one of the most promising methods for commercialization of CRAN in 5G [8].

1.1.1 Cloud Radio Access Network

Centralizing and virtualizing the conventional Radio Access Network (RAN) offers dynamic node cooperation, which results in coordinated connectivity across Base Stations (BSs) and better resource allocation. The CRAN shown in Figure 1.1 is a promising network architecture that realizes centralization and virtualization in the mobile network and can play an essential role in meeting 5G demands. Also, coordinated connectivity facilitates inter-cell interference management in the network. This can make the deployment of dense networks feasible and consequently increases spectral efficiency [6, 9, 10].

Figure 1.1: Illustration of the downlink CRAN system model. The CP is connected to two RUs with backhaul links and RUs serve four users cooperatively via access links.

One of the main bottlenecks of CRAN is the limited capacity of the backhaul that connects the CP to the RUs. This limitation can mask the CRAN's main advantages: coordinated beamforming, efficient resource allocation, and interference mitigation capability [11].

Multiple data transmission strategies can be applied in CRANs to deal with the limited capacity, including the data-sharing method, the compression-based method, and a hybrid of the two [6, 12–14]. In the data-sharing method, each user is assigned to a cluster of RUs, and its data is shared with all RUs in that cluster. Then, each user is served by all RUs of the cluster via a joint beamforming scheme, which is known as the Coordinated Multi-Point (COMP) technique [12]. The larger the cluster, the more the RUs cooperate and the higher the achievable spectral efficiency.
In a compression-based scheme, the capacity restriction is handled through data quantization, which leads to quantization noise [13]. The backhaul capacity specifies the level to which the data needs to be quantized; therefore, a larger backhaul capacity results in lower quantization noise. According to [15], for medium- to high-capacity backhaul links, which is the case for 5G, the compression-based approach outperforms the data-sharing method. Hence, providing a reliable and high-speed backhaul connection with an acceptable quantization noise level is a plausible way to deploy CRAN in a 5G network and meet its high spectral efficiency requirement.

1.1.2 5G Backhauling

Backhaul refers to the connection between the CP and RUs in the mobile network (see Figure 1.2). The prevalence of devices operating in the RF band, together with the fact that RF bandwidth is very limited in both capacity and licensing, makes the RF band a bottleneck for the CRAN, not to mention interference as another limiting factor for reusing radio frequency bands. With the rapid growth of high-speed communication demands, Radio Frequency (RF) links no longer suffice, and new alternatives are required for 5G.

Recently, the application of FSO communication in CRAN has been suggested [11, 16, 17]. In particular, it has been shown that a hybrid RF/FSO communication link can considerably boost the capacity thanks to the large bandwidth of FSO, while offering acceptable reliability owing to the robustness of the RF connection [8, 18]. In addition, self-backhauling, defined as the use of the same RF band for both the access link and the backhaul, is another way to cope with the limited RF bandwidth [19, 20].

Finally, the combination of FSO and an RF link with self-backhauling can provide a hybrid backhaul link with sufficient capacity and reliability while complying with the limited RF bandwidth.
In the next subsections, we elaborate on FSO and self-backhauling RF bands.

Figure 1.2: 5G CRAN.

1.1.3 Free Space Optical communication

Free-space optical communication is an alternative that can overcome the drawbacks of RF, although it introduces new challenges. Unlike optical fiber, which is a high-speed communication link made from glass or plastic, FSO has a much lower deployment cost. The easy implementation and low maintenance cost of FSO come at the price of lower reliability and a higher outage probability compared to optical fiber.

Regardless of the current use cases of FSO, visible light was one of the first communication tools used by humanity. Starting with the heliograph (a device that signals with sunlight) and telegraph communication, visible light communication has been a non-commercial way of signaling from the early days. Nowadays, FSO is a transmission method in which lasers and Light-Emitting Diodes (LEDs) are used to transmit data. Unlike fiber-optic communication, which is restricted to costly glass fibers, FSO signals propagate through the air and thus have far lower implementation and maintenance costs. Compared to RF communication, FSO operates with a larger bandwidth and therefore provides a higher data rate: 10 Gbps per wavelength [21]. More importantly, FSO is immune to interference. In clear weather conditions, FSO has less than 1 dB/km attenuation, making it a perfect candidate for communication over long distances [21]. Finally, unlike RF, FSO benefits from the availability of a license-free spectrum, which is an excellent means of cutting deployment costs.

Figure 1.3: Time division and frequency division duplex technologies.

Despite these advantages, FSO has some concerning drawbacks. The performance of FSO is susceptible to weather conditions, and it can experience from 5 up to 350 dB/km attenuation in rainy and foggy weather conditions, respectively [22].
Also, as the maximum possible power is limited by safety concerns, more effective signal processing is required to overcome the signal attenuation of FSO [23, 24].

1.1.4 RF Self-Backhauling

One way to deal with the limitation of RF bandwidth is frequency reuse. Traditionally, wireless nodes work in a half-duplex manner, meaning that the transmit and receive signals use the same band at different times, or different frequency bands at the same time (see Figure 1.3). Using the same frequency band for transmission and reception simultaneously is referred to as In-Band Full-Duplex (IBFD). Also, as mentioned, using the same frequency for the backhaul and access links at the same time is called in-band full-duplex self-backhauling. The main limiting factor in IBFD technology is the interference of the transmit signal, which masks the received signal. The lack of practical interference cancellation techniques has held the industry back from deploying IBFD in practical scenarios. As shown in Figure 1.4, there is always a Self-Interference (SI) signal from the transmitter to the receiver in both separate-antenna and shared-antenna systems [25]. To benefit from IBFD, there must be very strong isolation between the transmit and receive signals. The challenge lies in the fact that the transmission power is usually much larger than the received power; thus even a small residual interference can mask the received signal.

Recent advancements in full-duplex RF communication have nearly doubled the Spectral Efficiency (SE) of RF communication, creating the opportunity to enhance the fronthaul capacity. A combination of Self-Interference Cancellation (SIC) methods, e.g., propagation, analog, and digital SIC, provides sufficient isolation that nearly double the spectral efficiency can be achieved [20, 26–31].
Also, it has been shown that the Full Duplex (FD) communication scheme operates on a lower power budget than the Half Duplex (HD) scheme [31].

The advancements above motivated us to incorporate hybrid RF/FSO schemes and IBFD RF self-backhauling techniques to realize a high-throughput and reliable backhaul for CRAN.

Figure 1.4: Illustration of self-interference concept in separate-antenna full-duplex and shared-antenna full-duplex systems.

1.1.5 Complexity Crunch in CRAN

The concept of CRAN has a large degree of abstraction, meaning that it refers to multiple modules such as the data-compression module, the CP-to-RU beamformer, and the RU-to-user beamformer. Designing each of these modules, even individually, is a time-consuming optimization task. Hence, optimal CRAN design with several nodes in a dense 5G network poses a significant challenge in terms of computational complexity.

However, the recent advancements of data-driven Machine Learning (ML) approaches in addressing complicated design problems motivate us to view CRAN optimization in light of these powerful tools. Machine learning approaches, specifically Deep Neural Networks (DNNs), have been shown to generalize the patterns and relations hidden in the training data to new test cases, enabling researchers to apply them successfully in many different design problems.

In this study, we are interested in the great potential of DNNs to learn to solve optimization problems. For instance, [32] and [33] propose DNNs that learn how to solve resource allocation problems. It has been shown that these methods take less time because, once trained, a DNN can produce near-optimal results with low complexity. This motivates us to study the application of DNNs in 5G CRAN design.

1.2 Objectives and Contributions

It is shown in [15] that CRAN performance can be profoundly affected by the lack of high backhaul capacity.
This obstacle can be explained by the added quantization noise, which affects the received signal's quality at the user's side.

Mostafa et al. in [8] employed an HD time-sharing scheme to realize RF self-backhauling in a hybrid RF/FSO backhaul. We likewise explore a hybrid RF/FSO backhaul in a CRAN architecture to address RF spectrum limitations, but with FD self-backhauling: unlike [8], we transmit the RF signals at the same time without employing time-sharing. In HD, the backhaul RF capacity is shown to be much more restricted than in FD self-backhauling [34]. Therefore, the compression-based approach performs much better when coupled with FD communication.

In this project, we aim to maximize the downlink user sum-rate under transmit power constraints at both the CP and RUs, a Zero-Forcing (ZF) requirement for eliminating the multi-RU interference, and the backhaul capacity limitation.

We will adopt a hybrid RF/FSO backhaul with an IBFD RF link for the CRAN and consider the joint optimization of backhaul RF beamforming, data compression, and access link beamforming to maximize the user sum-rate. We will analytically derive a closed-form formula for the sum-rate and form a non-convex optimization problem. We propose a transformation that simplifies this problem to a weighted sum mean-square error (MSE) minimization, which is a semi-definite program, and adopt an alternating optimization approach to solve it and show its superiority over the existing hybrid RF/FSO backhaul with HD self-backhauling [8].

Moreover, we will illustrate that the computational complexity of CRAN optimization is quite large and adopt data-driven approaches to overcome this challenge. In particular, we apply deep neural networks in both supervised and unsupervised manners to transfer much of the computational complexity to the offline training phase.
This way, we achieve a CRAN design methodology that outperforms both the state-of-the-art method proposed by [8] and our own proposed conventional optimization in terms of user sum-rate and computational complexity.

1.3 Organization

In Chapter 2, the definition of the scenario, the system model, and the formulation of the problem are presented. In Chapter 3, we propose a semi-definite programming solution for CRAN downlink optimization and achieve superior results compared to the state-of-the-art methods. In Chapters 4 and 5, we present a supervised and an unsupervised machine learning solution, respectively, to reduce the computational complexity of the optimization task. Finally, Chapter 6 is dedicated to the summary and conclusions.

Chapter 2

Problem Formulation

2.1 System Model

We consider a CRAN composed of one CP, $M$ RUs, and $K$ users. The CP is assumed to apply the compression-based method [13] to transmit $K$ independent data streams to users via $M$ RUs, as illustrated in Figure 2.1. The backhaul connects the CP to each RU with a multi-input multi-output (MIMO) RF link with bandwidth $B_{\mathrm{RF}}$ and one FSO link with bandwidth $B_{\mathrm{FSO}}$. The number of antennas for downlink transmission at the CP is denoted $N_{\mathrm{CP}}$, and the number of transmit and receive antennas at the RU is $N_{\mathrm{RU}}$. It is assumed that the CP has access to the complete channel state information of both the backhaul and access links. Each RU acts as an FD relay between the CP and users, and thus both the access and backhaul links transmit on the same frequency band simultaneously. The RUs linearly precode and jointly transmit the data symbols intended for each user.

2.1.1 Hybrid RF/FSO fronthaul channel model

In Chapter 1, we discussed the importance of fronthaul link reliability and capacity in the CRAN architecture and 5G. The fronthaul link capacity is the most important limiting factor on cooperative beamforming performance and resource allocation performance in CRAN.
Providing a reliable, high-data-rate backhaul link can significantly affect 5G characteristics and is vital for exploiting the advantages of CRAN over the conventional RAN structure. In our model, we consider hybrid RF/FSO links in the backhaul to address this vital need. As mentioned before, RF and FSO can act in a complementary manner because they use different carrier frequencies. The high carrier frequency of FSO provides a high data rate that is, however, strongly dependent on weather conditions, whereas the lower frequency bands of the RF link provide a lower but reliable data rate for the backhaul link.

Figure 2.1: Illustration of the downlink CRAN. The CP is connected to two self-backhauling RUs with a hybrid RF/FSO backhaul. The RUs serve four users cooperatively via IBFD RF access links.

2.1.2 Full Duplex Radios

For many years, in-band full-duplex communication seemed impossible. As Andrea Goldsmith once contended, "It is generally not possible for radios to receive and transmit on the same frequency band because of the interference that results." [35].

However, recent advancements in designing and manufacturing transceivers that provide superb analog isolation, together with state-of-the-art digital cancellation methods that almost wholly cancel the remaining interference, have made it possible to deploy IBFD communication links in the RF band. Bharadia et al. proposed the first design and implementation of an IBFD system for WiFi 802.11ac [3]. They propose two main modules to render the self-interference almost negligible: 1) an analog transceiver circuit that provides 60 dB isolation between the transmit and receive signals, and 2) a digital cancellation method that attenuates the remaining interference by an additional 50 dB. Achieving 110 dB isolation makes it possible to bring the self-interference below, or very close to, the noise floor.
Reaching the noise floor practically means that the self-interference signal is fully canceled, as it is no longer distinguishable from the receiver's thermal noise.

In practice, the transmit signal has multiple components: the main signal, harmonics caused by hardware nonlinearities, and the transmitter noise. Each of these components requires a different cancellation technique and a different attenuation level to reach the noise floor. For example, Figure 2.2 shows the isolation requirements for a practical WiFi system. The main signal is at 20 dB, which requires 110 dB of cancellation to reach the noise floor. The signal harmonics are usually around -10 dB, so 80 dB of cancellation is enough for them, and the transmitter noise is around -40 dB, which requires only 50 dB of cancellation. In all three cases the component is brought down to the same level of -90 dB. The authors of [3] propose cancellation methods for each of these components and, in their experimental results, achieve nearly double the spectral efficiency of a conventional WiFi system.

In this work, we adopt the IBFD system proposed in [3] for the RF backhaul link; however, we consider and model some residual self-interference to obtain a more robust design. In Section 2.2.1, we provide more details on the IBFD self-interference model.

2.1.3 FSO Channel Model

The most practical modulation considered for modeling the FSO channel is On-Off Keying (OOK), which works by intensity modulation: the transmit diode emits the intensity $x^{\mathrm{fso}}_m$ and the receive diode detects the intensity $y^{\mathrm{fso}}_m$, where $x^{\mathrm{fso}}_m \in \{0, 2P_{\mathrm{fso}}\}$ with an average transmit power of $P_{\mathrm{fso}}$. We can model the FSO received intensity at the $m$-th RU as

$$ y^{\mathrm{fso}}_m = h^{\mathrm{fso}}_m x^{\mathrm{fso}}_m + n_m, \tag{2.1} $$

where the FSO channel of the $m$-th RU, $h^{\mathrm{fso}}_m$, is modeled by considering three types of attenuation: atmospheric, scintillation, and geometric losses [36].
According to [8], with $\phi$ as the divergence angle of the FSO beam and $r$ as the radius of the optical aperture, the FSO channel gain is $h^{\mathrm{fso}} = h_l h_s h_g R$, where $h_l = e^{-\sigma_d d_{\mathrm{fh}}}$ is the atmospheric loss, $h_s \sim \mathrm{GG}(a,b)$ is the Gamma-Gamma distributed scintillation, and $h_g = \left[\mathrm{erf}\!\left(\frac{\sqrt{\pi}\, r}{\sqrt{2}\, d_{\mathrm{fh}}\, \phi}\right)\right]^2$ is the geometric loss, in which $\mathrm{erf}(\cdot)$ is the error function.

Figure 2.2: Illustration of noise floor and digital/analog cancellations [3].

2.1.4 RF Backhaul Channel Model

To model the backhaul link, we use the path loss and antenna gain pattern ($h_l$) together with a Rician fading channel ($h_s$), which captures line-of-sight (LoS) propagation, within a link budget model. The channel gain is modeled as $h = h_l h_s$ with a 5 dB Rician fading factor, indicating the ratio between the direct and scattered path powers, and a line-of-sight path loss exponent of $n_{\mathrm{LoS}} = 2.5$. Also, for all antennas, the antenna gain is assumed to be 3 dB. The link budget calculation is adopted from [37], and the channel information and simulation setup are provided in Table 3.2.

2.1.5 RF Access Channel Model

Similarly, for the downlink we model the channel gain as $h = h_l h_f$, where $h_l$ is the path loss and $h_f$ is the Rayleigh distributed fading gain. The distance between the RUs and the UEs is assumed to be fixed at 10 meters, and each RU's power budget is set to 23 dBm. We calculate the path loss with an exponent of $n_{\mathrm{nLoS}} = 3.5$ in the downlink. The system parameters are summarized in Table 3.2, with the channel specifications and link models as in [8, 37].

2.2 Problem Formulation

Let $\mathbf{s} = [s_1, \ldots, s_K]^T$ be the vector of normalized Gaussian data symbols intended for users $\{1, \ldots, K\}$, let $\mathbf{W}_m \in \mathbb{C}^{N_{\mathrm{RU}} \times K}$ denote the precoding matrix of the $m$-th RU toward the users, and let $\mathbf{q}_m \sim \mathcal{CN}(\mathbf{0}, \mathbf{Q}_m)$ denote the quantization noise with covariance matrix $\mathbf{Q}_m$ at the $m$-th RU due to data compression. The transmitted signal from the $m$-th RU to the users is denoted by $\mathbf{x}_m \in \mathbb{C}^{N_{\mathrm{RU}}}$ and can be formulated as

$$ \mathbf{x}_m = \mathbf{W}_m \mathbf{s} + \mathbf{q}_m. \tag{2.2} $$

In the following sections, we introduce the constraints of our problem in detail.

2.2.1 Backhaul Capacity Constraint

In this subsection, we compute the minimum rate required for the backhaul links. Assume that each RU is equipped with an IBFD transceiver, which receives the signals from the CP and simultaneously transmits to the users in the same RF band. We can formulate the RF received signal at the $m$-th RU as

$$ \mathbf{y}^{\mathrm{rf}}_m = \mathbf{G}_m^H \mathbf{x}_{\mathrm{cp}} + \mathbf{I}^{\mathrm{si}}_m + \mathbf{n}_m, \tag{2.3} $$

where $\mathbf{n}_m \sim \mathcal{CN}(\mathbf{0}, \sigma_n^2 \mathbf{I})$ is the additive white Gaussian noise, $\mathbf{I}^{\mathrm{si}}_m$ is the residual self-interference, and $\mathbf{x}_{\mathrm{cp}} \in \mathbb{C}^{N_{\mathrm{CP}}}$ is the signal transmitted by the CP. Also, $\mathbf{G}_m \in \mathbb{C}^{N_{\mathrm{CP}} \times N_{\mathrm{RU}}}$ is the flat-fading MIMO backhaul RF channel. The self-interference signal is proportional to the transmitted signal, and therefore the residual self-interference power at the $m$-th RU is proportional to the RU's transmit power. That is,

$$ \mathbb{E}\{\mathbf{I}^{\mathrm{si}}_m (\mathbf{I}^{\mathrm{si}}_m)^H\} = \alpha \mathbf{P}^{\mathrm{si}}_m, \tag{2.4} $$

where $\mathbf{P}^{\mathrm{si}}_m = \mathbf{W}_m \mathbf{W}_m^H + \mathbf{Q}_m$. We define $\alpha$ as a unit-less constant that depends on the RU's capability to suppress the self-interference [34, 38]. A hypothetical ideal SIC makes the transmissions of each node mutually orthogonal and results in $\alpha \mathbf{P}^{\mathrm{si}}_m = \mathbf{0}$, while practical IBFD results in residual self-interference [39, 40].

The signal components intended for different RUs are separated by precoding at the CP. For this, we assume that there is enough distance between the RUs to have a full-rank MIMO channel and that the number of antennas at the CP is no smaller than the total number of RU antennas, i.e., $N_{\mathrm{CP}} \geq M N_{\mathrm{RU}}$. Then, we can neglect interference between the signals for different RUs by imposing the zero-forcing constraint [41]

$$ \mathbf{G}_j^H \mathbf{V}_m \mathbf{G}_j = \mathbf{0}, \quad j \neq m, \tag{2.5} $$

where $\mathbf{V}_m$ is the covariance matrix of the transmitted signal intended for the $m$-th RU.
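One way to see that the zero-forcing constraint (2.5) can be met when $N_{\mathrm{CP}} \geq M N_{\mathrm{RU}}$ is to restrict $\mathbf{V}_m$ to the null space of the other RUs' channels. The following is a minimal sketch under stated assumptions, not the optimization used in this thesis: the channels are random placeholders, the helper names `random_channel` and `zf_covariance` are hypothetical, and the projector-based choice of $\mathbf{V}_m$ is only one feasible point, not an optimized one.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N_RU, N_CP = 2, 4, 10  # values from Table 3.2; the construction needs N_CP >= M * N_RU


def random_channel(rows, cols):
    """Draw an i.i.d. complex Gaussian matrix (placeholder for a backhaul channel G_m)."""
    real = rng.standard_normal((rows, cols))
    imag = rng.standard_normal((rows, cols))
    return (real + 1j * imag) / np.sqrt(2)


def zf_covariance(G, m, power=1.0):
    """Return a PSD matrix V_m with G_j^H V_m G_j = 0 for every j != m (assumed construction)."""
    others = np.hstack([G[j] for j in range(len(G)) if j != m])  # N_CP x (M-1)*N_RU
    # Left singular vectors beyond rank(others) span the orthogonal complement
    # of the column space of the other RUs' channels.
    U, s, _ = np.linalg.svd(others, full_matrices=True)
    rank = int(np.sum(s > 1e-10 * s.max()))
    null_basis = U[:, rank:]
    V = null_basis @ null_basis.conj().T  # orthogonal projector: Hermitian and PSD
    return power * V / np.trace(V).real   # scale so that Tr(V_m) = power


G = [random_channel(N_CP, N_RU) for _ in range(M)]
V = [zf_covariance(G, m) for m in range(M)]

for m in range(M):
    for j in range(M):
        leak = np.linalg.norm(G[j].conj().T @ V[m] @ G[j])
        print(f"||G_{j}^H V_{m} G_{j}||_F = {leak:.2e}")
```

For `j != m` the printed norm is at the level of floating-point round-off, while for `j == m` it stays clearly non-zero, so the intended RU still receives its signal.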
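The corresponding maximum communication rate in the RF link between the CP and the $m$-th RU can be approximated in the high signal-to-noise ratio (SNR) regime as

$$
\begin{aligned}
C^{\mathrm{RF}}_m(\mathbf{V}_m, \mathbf{Q}_m, \mathbf{W}_m)
&= \ln\det\!\left(\mathbf{I} + (\alpha \mathbf{P}^{\mathrm{si}}_m + \sigma_n^2 \mathbf{I})^{-1} \mathbf{G}_m^H \mathbf{V}_m \mathbf{G}_m\right) \\
&\approx \ln\!\left(\det\!\left((\alpha \mathbf{P}^{\mathrm{si}}_m + \sigma_n^2 \mathbf{I})^{-1}\right) \det\!\left(\mathbf{G}_m^H \mathbf{V}_m \mathbf{G}_m\right)\right) \\
&= \ln\det\!\left(\mathbf{G}_m^H \mathbf{V}_m \mathbf{G}_m\right) - \ln\det\!\left(\alpha \mathbf{P}^{\mathrm{si}}_m + \sigma_n^2 \mathbf{I}\right),
\end{aligned}
\tag{2.6}
$$

and the data rate is $B_{\mathrm{RF}} C^{\mathrm{RF}}_m$.

For the FSO links, it is assumed that the commonly used on-off keying (OOK) modulation is applied [36]. The CP is assumed to be equipped with $M$ OOK transmitters, and each RU has one optical receiver. For a given FSO signaling rate $B_{\mathrm{FSO}}$, channel gain $h^{\mathrm{FSO}}_m$, optical power budget $P_{\mathrm{FSO}}$, and variance $\sigma^2_{\mathrm{FSO}}$ of the thermal noise and background illumination, the maximum communication rate under OOK modulation can be calculated as [8]

$$ C^{\mathrm{FSO}}_m = -\int_{-\infty}^{\infty} p(y) \ln p(y)\, dy - \ln\!\left(2\pi e \sigma^2_{\mathrm{FSO}}\right), \tag{2.7} $$

where

$$ p(y) = \frac{1}{2\sqrt{2\pi\sigma^2_{\mathrm{FSO}}}} \left( e^{-\frac{y^2}{2\sigma^2_{\mathrm{FSO}}}} + e^{-\frac{(y - 2 h^{\mathrm{FSO}}_m P_{\mathrm{FSO}})^2}{2\sigma^2_{\mathrm{FSO}}}} \right). \tag{2.8} $$

The corresponding data rate in the $m$-th FSO link is equal to $B_{\mathrm{FSO}} C^{\mathrm{FSO}}_m$. It is worth noting that, due to the directional nature of FSO, the FSO links do not interfere with each other.

The backhaul link between the CP and the $m$-th RU needs to be able to support the transmission of the signal $\mathbf{x}_m$ in (2.2), which is the quantized version of the signal $\hat{\mathbf{x}}_m$ that would ideally be transmitted from the $m$-th RU in the absence of backhaul capacity limitations. The communication rate required to encode $\hat{\mathbf{x}}_m$ into $\mathbf{x}_m$ with quantization noise covariance matrix $\mathbf{Q}_m$ is given by [13]

$$ R^{\mathrm{data}}_m = \ln \frac{\det\!\left(\mathbb{E}\{\hat{\mathbf{x}}_m \hat{\mathbf{x}}_m^H\} + \mathbf{Q}_m\right)}{\det(\mathbf{Q}_m)}. \tag{2.9} $$

Assuming the same transmission bandwidth for the RF backhaul and user links, this leads to the capacity constraints

$$ B_{\mathrm{RF}} R^{\mathrm{data}}_m \leq B_{\mathrm{FSO}} C^{\mathrm{FSO}}_m + B_{\mathrm{RF}} C^{\mathrm{RF}}_m, \quad \forall m \in \mathcal{M}. \tag{2.10} $$

Since $\hat{\mathbf{x}}_m = \mathbf{W}_m \mathbf{s}$ with normalized data symbols, we have $\mathbb{E}\{\hat{\mathbf{x}}_m \hat{\mathbf{x}}_m^H\} = \mathbf{W}_m \mathbf{W}_m^H$. Thus, substituting (2.6) and (2.9) into (2.10), we have

$$
\begin{aligned}
& B_{\mathrm{RF}} \ln\det\!\left(\mathbf{W}_m \mathbf{W}_m^H + \mathbf{Q}_m\right) - B_{\mathrm{RF}} \ln\det(\mathbf{Q}_m) \\
& \quad \leq B_{\mathrm{FSO}} C^{\mathrm{FSO}}_m + B_{\mathrm{RF}} \ln\det\!\left(\mathbf{G}_m^H \mathbf{V}_m \mathbf{G}_m\right) - B_{\mathrm{RF}} \ln\det\!\left(\alpha \mathbf{P}^{\mathrm{si}}_m + \sigma_n^2 \mathbf{I}\right),
\end{aligned}
\tag{2.11}
$$

which is non-convex.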
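In Chapter 3, we transform this problem into a convex semi-definite problem and employ alternating convex optimization to solve it.

2.2.2 Power Constraints

The RF beamforming transmitters at both the CP and the RUs operate under power constraints that can be written as

$$ \mathrm{Tr}\!\left(\mathbb{E}\{\mathbf{x}_{\mathrm{cp}} \mathbf{x}_{\mathrm{cp}}^H\}\right) = \mathrm{Tr}\!\left(\sum_{m=1}^{M} \mathbf{V}_m\right) \leq P_{\mathrm{cp}}, \tag{2.12} $$

$$ \mathrm{Tr}\!\left(\mathbb{E}\{\mathbf{x}_m \mathbf{x}_m^H\}\right) = \mathrm{Tr}\!\left(\mathbf{W}_m \mathbf{W}_m^H + \mathbf{Q}_m\right) \leq P_m, \tag{2.13} $$

in which $P_{\mathrm{cp}}$ and $P_m$ are the power budgets of the CP and the RUs, respectively.

2.2.3 Joint Optimization Problem

Given the setup described in the previous section, our objective is to maximize the users' sum-rate. Although the type of objective and the use of capacity and power constraints with linear precoders and quantizers are similar to those considered in previous work [8], the FD self-backhauling leads to a different objective function and different capacity constraints. In particular, our optimization problem is formulated as

$$
\begin{aligned}
\underset{\mathbf{V}_m, \mathbf{W}_m, \mathbf{Q}_m}{\text{maximize}} \quad & R_{\mathrm{sum}} && \text{(2.14a)} \\
\text{s.t.} \quad & (2.5), (2.11), (2.12), (2.13), \quad \mathbf{V}_m \succeq 0, \ \mathbf{Q}_m \succeq 0, && \text{(2.14b)}
\end{aligned}
$$

where $(\cdot) \succeq 0$ stands for positive semi-definiteness. The objective function is the weighted sum of the maximum communication rates of the users,

$$ R_{\mathrm{sum}} = B_{\mathrm{RF}} \sum_{k=1}^{K} \gamma_k \log_2\!\left(1 + \frac{|\mathbf{h}_k^H \mathbf{w}_k|^2}{\sum_{l \neq k} |\mathbf{h}_k^H \mathbf{w}_l|^2 + \mathbf{h}_k^H \mathbf{Q} \mathbf{h}_k + \sigma_n^2}\right), \tag{2.15} $$

where

$$ \mathbf{h}_k = [\mathbf{h}_{1,k}^T, \ldots, \mathbf{h}_{M,k}^T]^T, \quad \mathbf{w}_k = [\mathbf{w}_{1,k}^T, \ldots, \mathbf{w}_{M,k}^T]^T, \tag{2.16} $$

and $\mathbf{h}_{m,k}$ and $\mathbf{w}_{m,k}$ are the channel gain and beamforming vectors from the $m$-th RU to the $k$-th user, respectively, and $\mathbf{Q} = \mathrm{diag}(\mathbf{Q}_1, \ldots, \mathbf{Q}_M)$. Also, $\gamma_k$ is the weight of the $k$-th user. This problem is non-convex and difficult to solve. In general, there is no guarantee of finding an optimal solution for this problem. However, to deal with the non-convexity, in the next chapter we first transform this problem into a weighted sum mean-square error (MSE) minimization problem, as suggested by [8, 42], and then approximate the capacity constraint by a convex subset.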
We will show in the next chapter that an alternating convex optimization can be applied to find a sub-optimal solution for this problem.

Chapter 3
CRAN Optimization

3.1 Introduction

In the previous chapter, we formulated the joint optimization problem of the downlink CRAN with hybrid RF/FSO backhaul and IBFD RF. We focused on the joint design of the access and backhaul precoders $(\mathbf{W}, \mathbf{V})$ and the access link quantizer $(\mathbf{Q})$ in order to maximize the users' sum-rate, considering power and ZF constraints and the backhaul capacity limitation. This optimization problem is, however, non-convex. This chapter shows that this non-convex problem, which is hard to optimize, can be transformed into a semi-definite program and that an alternating convex optimization algorithm can be used to solve it.

3.2 Transformation of the CRAN Optimization Problem

3.2.1 Transformation of the Objective Function

It has been shown in [42] that maximizing the non-convex sum-rate objective function is equivalent to a weighted sum-MSE minimization problem. With this transformation, (2.15) can be written as the following semi-definite objective:

$$ R' = \sum_{k=1}^{K} \gamma_k \beta_k \left( |g_k|^2 \left( \mathbf{h}_k^H (\mathbf{W}\mathbf{W}^H + \mathbf{Q}) \mathbf{h}_k \right) - 2\,\mathrm{Re}\!\left( g_k^* \mathbf{h}_k^H \mathbf{w}_k \right) \right), \tag{3.1} $$

where

$$ \mathbf{W} = [\mathbf{w}_1, \ldots, \mathbf{w}_K]. \tag{3.2} $$

For a fixed set of optimization variables, we have

$$ g_k = \frac{\mathbf{h}_k^H \mathbf{w}_k}{\mathbf{h}_k^H (\mathbf{W}\mathbf{W}^H + \mathbf{Q}) \mathbf{h}_k + \sigma^2}, \tag{3.3} $$

in which $g_k$ is the scalar linear receive filter applied by the $k$-th user, and the corresponding optimal MSE weight is $\beta_k = 1/E_k$, where $E_k = \mathbb{E}\{|g_k^* y_k - s_k|^2\}$. This objective function is convex with respect to each individual optimization variable.

3.2.2 Convex Approximation of Capacity Constraint

Another source of non-convexity in the problem formulation is (2.11). Following [8], we deal with the non-concave function on the left-hand side of the inequality by employing the definition of the conjugate function and Fenchel's inequality:

$$ \ln\det\!\left(\mathbf{W}_m \mathbf{W}_m^H + \mathbf{Q}_m\right)^{-1} \geq -\mathrm{Tr}\!\left(\mathbf{Z}_m (\mathbf{W}_m \mathbf{W}_m^H + \mathbf{Q}_m)\right) + \ln\det(\mathbf{Z}_m) + N_{\mathrm{RU}}, \tag{3.4} $$

for some positive definite $N_{\mathrm{RU}} \times N_{\mathrm{RU}}$ matrix $\mathbf{Z}_m$.
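The bound (3.4) can be checked numerically. The sketch below is only an illustration under assumptions: the matrices are random placeholders rather than outputs of the optimization, and the helper names `random_psd`, `logdet`, and `fenchel_lower_bound` are hypothetical. It evaluates both sides of (3.4) for an arbitrary positive definite $\mathbf{Z}_m$ and for the choice $\mathbf{Z}_m = (\mathbf{W}_m \mathbf{W}_m^H + \mathbf{Q}_m)^{-1}$, at which the bound holds with equality.

```python
import numpy as np

rng = np.random.default_rng(1)
N_RU, K = 4, 4  # dimensions taken from Table 3.2


def random_psd(n):
    """Random Hermitian positive definite matrix (placeholder data)."""
    A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return A @ A.conj().T + 0.1 * np.eye(n)


def logdet(X):
    """Natural log of |det(X)|, computed stably via slogdet."""
    _, value = np.linalg.slogdet(X)
    return value


def fenchel_lower_bound(Z, X):
    """Right-hand side of (3.4): -Tr(Z X) + ln det(Z) + N."""
    return -np.trace(Z @ X).real + logdet(Z) + X.shape[0]


W_m = rng.standard_normal((N_RU, K)) + 1j * rng.standard_normal((N_RU, K))
Q_m = random_psd(N_RU)
X = W_m @ W_m.conj().T + Q_m

lhs = -logdet(X)  # ln det(X^{-1})
gap_random = lhs - fenchel_lower_bound(random_psd(N_RU), X)
gap_tight = lhs - fenchel_lower_bound(np.linalg.inv(X), X)

print(f"gap for an arbitrary Z_m: {gap_random:.4f} (non-negative)")
print(f"gap for Z_m = X^-1:       {gap_tight:.2e} (zero up to round-off)")
```

The zero gap at $\mathbf{Z}_m = (\mathbf{W}_m \mathbf{W}_m^H + \mathbf{Q}_m)^{-1}$ is what motivates updating $\mathbf{Z}_m$ with this expression in each iteration of an alternating scheme.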
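It has been shown in [8] that with $\mathbf{Z}_m = (\mathbf{W}_m \mathbf{W}_m^H + \mathbf{Q}_m)^{-1}$, we can replace the non-convex inequality (2.11) with

$$
\begin{aligned}
& -\mathrm{Tr}\!\left(\mathbf{Z}_m (\mathbf{W}_m \mathbf{W}_m^H + \mathbf{Q}_m)\right) + \ln\det(\mathbf{Z}_m) + N_{\mathrm{RU}} \\
& \quad \geq -\frac{B_{\mathrm{FSO}}}{B_{\mathrm{RF}}} C^{\mathrm{FSO}}_m - \ln\det\!\left(\mathbf{G}_m^H \mathbf{V}_m \mathbf{G}_m\right) + \ln\det\!\left(\alpha \mathbf{P}^{\mathrm{si}}_m + \sigma_n^2 \mathbf{I}\right) - \ln\det(\mathbf{Q}_m).
\end{aligned}
\tag{3.5}
$$

The term $\ln\det(\alpha \mathbf{P}^{\mathrm{si}}_m + \sigma_n^2 \mathbf{I})$ on the right-hand side of (3.5) is non-convex. To deal with this term, consider that the power of the loop-back interference is less than or equal to the RU's transmit power (i.e., $\mathrm{Tr}(\mathbf{W}_m \mathbf{W}_m^H + \mathbf{Q}_m) \leq P_m$). For simplicity, we resort to the worst-case scenario, with the residual self-interference power being equal to $\alpha P_m$. Thereby, we achieve a lower bound on the maximum achievable user sum-rate. According to [34] and [43], we assume that $\mathbf{I}^{\mathrm{si}} \sim \mathcal{CN}(\mathbf{0}, \sigma^2_{\mathrm{si}} \mathbf{I})$, and thus

$$ \mathrm{Tr}(\sigma^2_{\mathrm{si}} \mathbf{I}) = \alpha\, \mathrm{Tr}\!\left(\mathbf{W}_m \mathbf{W}_m^H + \mathbf{Q}_m\right) \leq \alpha P_m; \tag{3.6} $$

therefore, $\sigma^2_{\mathrm{si}} \leq \frac{\alpha P_m}{N_{\mathrm{RU}}}$. So, in the worst-case scenario, we have

$$ \alpha \mathbf{P}^{\mathrm{si}}_m = \frac{\alpha P_m}{N_{\mathrm{RU}}} \mathbf{I}. \tag{3.7} $$

Then we can write the inequality (3.5) as

$$
\begin{aligned}
& -\mathrm{Tr}\!\left(\mathbf{Z}_m (\mathbf{W}_m \mathbf{W}_m^H + \mathbf{Q}_m)\right) + \ln\det(\mathbf{Z}_m) + N_{\mathrm{RU}} \\
& \quad \geq -\frac{B_{\mathrm{FSO}}}{B_{\mathrm{RF}}} C^{\mathrm{FSO}}_m - \ln\det\!\left(\mathbf{G}_m^H \mathbf{V}_m \mathbf{G}_m\right) + \ln\det\!\left(\frac{\alpha P_m}{N_{\mathrm{RU}}} \mathbf{I} + \sigma_n^2 \mathbf{I}\right) - \ln\det(\mathbf{Q}_m).
\end{aligned}
\tag{3.8}
$$

This inequality defines a convex set with respect to the design parameters $\mathbf{W}_m$, $\mathbf{V}_m$, and $\mathbf{Q}_m$ when $\mathbf{Z}_m$ is fixed, and vice versa. We can easily show that the set defined by (3.8) is a subset of the set defined by the non-convex inequality (3.5); therefore, it will not cause infeasible solutions.

3.3 Optimization Algorithm

For a given $\mathbf{Z}_m$, the following problem is a convex semi-definite program and can be solved using alternating convex optimization [8]:

$$
\begin{aligned}
\underset{\mathbf{V}_m, \mathbf{W}_m, \mathbf{Q}_m}{\text{minimize}} \quad & R' && \text{(3.9a)} \\
\text{s.t.} \quad & (2.5), (2.12), (2.13), (3.8), \quad \mathbf{V}_m \succeq 0, \ \mathbf{Q}_m \succeq 0. && \text{(3.9b)}
\end{aligned}
$$

In this method, for fixed $\mathbf{Z}_m$, $g_k$, and $\beta_k$, we optimize $\mathbf{W}_m$, $\mathbf{V}_m$, and $\mathbf{Q}_m$ via an alternating approach until convergence. An efficient algorithm for solving the inner loop is introduced in [8, Algorithm 1] for a different scenario.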
We employ a similar approach, which for clarity is summarized in Algorithm 1.

Algorithm 1
Input: SIC level. Output: $R_{\mathrm{sum}}$.
1: Initialize $\mathbf{W}_m$, $\mathbf{Q}_m$ to satisfy the RUs' power constraints and to calculate $g_k$ for the first time [8, (31), (32), (33)]
2: Set $i = 0$
   do
3:   Calculate $g_k$ from (3.3) and $\beta_k = 1/E_k$
4:   Calculate $\mathbf{Z}_m = (\mathbf{W}_m \mathbf{W}_m^H + \mathbf{Q}_m)^{-1}$
5:   Update $\mathbf{W}_m$, $\mathbf{Q}_m$, $\mathbf{V}_m$ via (3.9)
6:   Calculate $R^{i}_{\mathrm{sum}}$ via (2.15)
7:   Update $i = i + 1$
   while ($R^{i}_{\mathrm{sum}} - R^{(i-1)}_{\mathrm{sum}} > \varepsilon$)

3.3.1 Computational Complexity

It is worth noting that the optimization problem in (3.9) is a convex semi-definite program and that, using the interior-point method, it has a worst-case computational complexity of $O\!\left(\max\{n, m\}^4 \sqrt{n} \log(\tfrac{1}{\varepsilon})\right)$, where $n$ is the number of variables, $m$ is the number of constraints, and $\varepsilon$ is the solution accuracy [44]. Hence, with the assumption of $\varepsilon = 0.01$ and complex-valued multiplication, the computational complexity of Algorithm 1 is at least $O(45 n^{4.5})$.

Notwithstanding the advantages of the proposed optimization for CRAN, computational complexity is still a significant challenge in realizing practical CRAN networks. Hence, we incorporate machine learning techniques to overcome this challenge. Chapter 4 is dedicated to the application of deep neural networks in CRAN optimization.

3.4 Numerical Results

In this section, we provide numerical results from system simulations to examine our algorithm and compare its performance against the state-of-the-art time-division hybrid RF/FSO method proposed in [8]. In the simulations, the same setup as in [8] is adopted. The CRAN scenario uses $M = 2$ RUs, each equipped with $N_{\mathrm{RU}}$ RF transmit and receive antennas and one optical receiver. The CP node is equipped with $N_{\mathrm{CP}} = 10$ RF transmit antennas and two FSO transmitters, each dedicated to one RU.
We consider $K = 4$ users, and all have the same contribution in our weighted sum-rate objective function (2.15). According to [26], we consider SIC levels between 45 dB and 113 dB for IBFD at the RUs. Three weather conditions are investigated for the FSO channel, referred to as L1, L2, and L3, and shown in Table 3.1. In the following, we show results as a function of the SIC level to account for different IBFD solutions.

Table 3.1: FSO channel weather parameters [1, Table I].

ID   Weather   Atmospheric loss σd (dB/km)   Turbulence   GG parameters (a, b)
L1   Clear     0.43                          Strong       (8.05, 1.03)
L2   Haze      4.2                           Moderate     (2.23, 1.54)
L3   Fog       20                            Weak         (17.13, 16.04)

First, Figure 3.1 shows the sum-rate of the proposed approach as a function of the SIC level under different weather conditions and for various RF bandwidths. The most stable performance occurs in the L1 weather condition, where FSO is fully functional. We further observe that for this weather condition and RF bandwidths of 20 and 40 MHz, the sum-rate does not change much with the SIC level. This is because the FSO link provides sufficient backhaul capacity, so highly accurate SIC is not required. In the case of the 80 MHz bandwidth, we note that the extra backhaul capacity provided by the RF link improves the user sum-rate even in the L1 weather condition. This emphasizes the importance of hybrid RF/FSO for very high data-rate communication.

Next, in Figure 3.2 we highlight the benefits of FD transmission compared to the HD transmission considered in [8] in terms of sum-rate. For the HD case, we show the results achieved under the best time allocation between the RF backhaul and user links. We observe that the proposed FD-based method outperforms the HD transmission given a sufficient SIC level. The intersection point between the respective curves is at fairly benign SIC levels of 60-70 dB and is independent of the RF signal bandwidth.
Better weather conditions (i.e., L1 vs L2 and L2 vs L3) render the FD solution beneficial at lower SIC levels, as the FSO link can provide more of the backhaul capacity in the hybrid RF/FSO system.

Table 3.2: Simulation Parameters.

Parameter                                    Symbol          Value
Number of antennas at the CP                 N_CP            10
Number of RUs                                M               2
Number of antennas at RU                     N_RU            4
Number of users                              K               4
Distance from the CP to the RUs              d_fh            1 km
Distance from the RUs to the users           d_dl            100 m

Parameters of the FSO links
Transmit power of FSO transmitter            P_FSO           10 dBm
FSO signaling rate                           B_FSO           1 Gbaud
Divergence angle of the laser beam           φ               2 mrad
Radius of the receiver aperture              r               10 cm
Responsivity of the photodetector            R               0.5 A/W
Noise variance at the receiver               σ²_FSO          10^-13 A²

Parameters specific to the RF backhaul link
Transmit power of the CP                     P_cp            33 dBm
Breakpoint distance for the FH link          d_fh^break      100 m
Line-of-sight pathloss exponent              n_LoS           2.5
Rice factor (Rician fading factor)           K_r             5 dB
Antenna gains for the FH link                (G_CP, G_RU)    (3 dBi, 3 dBi)

Parameters specific to the RF downlink
Transmit power of the m-th RU                P_m             23 dBm
Breakpoint distance for the downlink         d_dl^break      10 m
Non-line-of-sight pathloss exponent          n_nLoS          3.5
Antenna gains for the downlink               (G_RU, G_MU)    (3 dBi, 3 dBi)

RF parameters
Carrier frequency                            f_c             3.6 GHz
Bandwidth of the RF signal                   B_RF            20, 40, 80 MHz
Noise power spectral density                 N_0             -170 dBm/Hz
Noise figure of the receivers                N_F             7 dB

Figure 3.1: Average sum-rate versus SIC level for different weather conditions and RF bandwidths. [Plot: R_sum (Mbps) versus SIC (dB); curves for L1, L2, and L3 at 20 MHz, 40 MHz, and 80 MHz.]

Finally, Figure 3.3 demonstrates the interplay between signal attenuation in the backhaul link due to the CP-to-RU distance and SIC levels. Clearly, highly effective interference cancellation in IBFD at levels of 90 dB or more provides fairly robust sum-rate performance as the backhaul link distance increases.
On the other hand, systems with low SIC levels experience significant degradation due to the effect of the RU transmit signal on its received signal in the more attenuated RF backhaul link.

3.5 Conclusion

In Chapters 2 and 3, we studied the problem of IBFD hybrid RF/FSO in a CRAN architecture containing multiple RUs and users. We formulated the problem of designing the CP and RU beamformers and the quantization at the RUs as a sum-rate maximization problem. We approximated the derived non-convex optimization problem by a semi-definite convex optimization problem by manipulating the objective function and the capacity constraint. We solved this problem by alternating convex optimization and provided a lower bound for the user sum-rate. The proposed method was simulated under different weather conditions and bandwidths, and it was shown to outperform the state-of-the-art time-division hybrid RF/FSO approach provided sufficient SIC.

Figure 3.2: Average sum-rate versus SIC level for HD and FD RUs. The RF bandwidth is 20 MHz, 40 MHz and 80 MHz, and the three weather conditions L1, L2 and L3 are considered. [Plot: three panels for B_RF = 80 MHz, 40 MHz, and 20 MHz; R_sum (Mbps) versus SIC (dB); curves L1-FD, L2-FD, L3-FD, L1-HD, L2-HD, L3-HD.]

Figure 3.3: Average sum-rate versus the distance of the backhaul link for different SIC levels. L2 weather condition. [Plot: R_sum (Mbps) versus d_fh (km); SIC levels from 50 dB to 100 dB at 20 MHz, 40 MHz, and 80 MHz.]

Chapter 4
Machine Learning-Based CRAN Optimization, a Supervised Approach

4.1 Learning to Optimize

For years, a combination of analytical modeling and heuristic numerical approaches played a vital role in designing and optimizing wireless communication systems. However, this method is not as effective in solving more complicated problems that are also time-critical.
With the rise of 5G, the unprecedented increase in the number of nodes and users in the network, and the strict low-latency requirements of many 5G applications, the benefits of heuristic optimization are curbed. In recent years, many scientists across different fields have started integrating machine learning techniques into complicated optimization problems [45–47]. This trend is also seen in telecommunications, e.g., in spectrum sensing [48], relay selection in high-mobility networks [49], anomaly detection for security in RAN [50], resource allocation in 5G [51], MIMO beamforming [52, 53], and channel estimation [54, 55]. In this study, we employ DNNs to solve the optimization problem introduced in the previous chapter.

4.1.1 Why and When Should We Use The Neural Network?

If the nature of an optimization problem requires constantly re-solving the optimization for different sets of input parameters, knowing a mapping from the set of parameters to the optimal solution would be extremely helpful. This way, instead of running an optimization algorithm online to find the optimal solution for each input parameter, we merely need to feed the parameters to the mapping function and obtain the solution with low computational complexity. Consider the following unconstrained optimization problem:

$$ \min_{x} f(x; p), \tag{4.1} $$

where $p$ is the set of parameters of the problem. If $p$ varies quite often, instead of solving this problem numerically each time, we would rather find the mapping between the optimal point ($x^*$) and $p$. In such a case, a neural network can be trained in an unsupervised or supervised manner to find this mapping.

In the CRAN optimization problem (3.9) in Chapter 3, we jointly optimized three variables, namely the access-link linear precoders ($\mathbf{W} = [\mathbf{W}_1, \ldots, \mathbf{W}_M]$), the backhaul linear precoders ($\mathbf{V} = [\mathbf{V}_1, \ldots, \mathbf{V}_M]$), and the CP quantizers ($\mathbf{Q} = [\mathbf{Q}_1, \ldots, \mathbf{Q}_M]$), in order to maximize the weighted sum-rate of the users.
The optimal solution of our problem is a function of the Channel State Information (CSI) of the backhaul and access links. Since CSI is a fast-changing parameter in wireless communication, the aforementioned optimization task must be re-solved upon any change in CSI. Hence, in this problem, finding the mapping from the CSI to the optimal $\mathbf{W}$, $\mathbf{V}$, and $\mathbf{Q}$ via a DNN is very helpful. Figure 4.1 shows the train and test phases of this method.

If we can successfully train this neural network, we can overcome the computational complexity of the semi-definite programming problem (3.9) and drastically reduce the time required to find the optimal solution. One major issue in our case, however, is the presence of multiple constraints in the optimization problem. We must force the DNN to produce solutions that lie in the intersection of the RU power constraints, the CP zero-forcing constraint, the backhaul capacity limitation, and the positive semi-definiteness of the $\mathbf{V}$ and $\mathbf{Q}$ matrices, and that at the same time achieve an acceptable performance in terms of user sum-rate.

Figure 4.1: Learning to optimize. Supervised learning train and test phases.

4.1.2 How Does an Artificial Neural Network Work?

An Artificial Neural Network (ANN) is a computing system that consists of connected nodes forming multiple linear layers, each of which is equipped with a non-linear activation function. The structure of an ANN is illustrated in Figure 4.3. It can approximate complicated functions given a large enough data set, a sufficient number of layers, and a proper selection of parameters and hyper-parameters. In the training phase, the ANN learns the features of the input data and finds the linear transformation required to generate the next set of features to feed the next layer. The activation function provides the non-linearity required to increase the model's flexibility.
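The layered structure described above can be sketched as a plain forward pass. This is a minimal illustration under assumptions and not the network trained in this thesis: the layer sizes, the ReLU activation, the random weights, and the function names `relu` and `forward` are all chosen only for the example.

```python
import numpy as np


def relu(z):
    """Element-wise non-linear activation (assumed choice for this sketch)."""
    return np.maximum(z, 0.0)


def forward(x, weights, biases):
    """Propagate x through linear layers; hidden layers use ReLU, the output layer is linear."""
    a = np.asarray(x, dtype=float)
    for W_l, b_l in zip(weights[:-1], biases[:-1]):
        a = relu(W_l @ a + b_l)  # linear transformation followed by the activation
    return weights[-1] @ a + biases[-1]


rng = np.random.default_rng(0)
sizes = [6, 16, 16, 16, 3]  # input, three hidden layers, output (illustrative sizes)
weights = [
    rng.standard_normal((n_out, n_in)) * np.sqrt(2.0 / n_in)
    for n_in, n_out in zip(sizes[:-1], sizes[1:])
]
biases = [np.zeros(n_out) for n_out in sizes[1:]]

y = forward(rng.standard_normal(sizes[0]), weights, biases)
print(y.shape)
```

Training would adjust `weights` and `biases` to minimize a loss between `y` and the labels; that step is omitted here.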
If $X$ is the input and $Y$ is our label vector, using the supervised learning notation from [56], we have:

$$
X = \begin{bmatrix} x_1^T \\ x_2^T \\ \vdots \\ x_n^T \end{bmatrix}_{n \times d}, \quad
Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}_{n \times 1}, \quad
\Delta = \begin{bmatrix} \delta_1 \\ \delta_2 \\ \vdots \\ \delta_k \end{bmatrix}_{k \times d}. \tag{4.2}
$$

To handle the bias (y-intercept) and to account for the non-linearity of the model, the linear latent-factor features of the input, $z_i = \Delta x_i$, pass through a non-linear function $h(\cdot)$ called the activation function, where

$$
Z = \begin{bmatrix} z_1^T \\ z_2^T \\ \vdots \\ z_n^T \end{bmatrix}_{n \times k}, \quad
L = \begin{bmatrix} l_1 \\ l_2 \\ \vdots \\ l_k \end{bmatrix}_{k \times 1}. \tag{4.3}
$$

Using the linear model $L$, the ANN makes a prediction

$$ y_i = L^T h(z_i) \tag{4.4} $$

that minimizes the loss function. This process happens at all hidden layers. Figure 4.2 illustrates one hidden layer of a neural network. During the training phase, the ANN repeatedly updates $L$ and $\Delta$ jointly, layer by layer, until it reaches the output layer. Each neuron can detect a portion of a feature, and as the data flows through the layers, deeper layers can detect combinations of features. By definition, an ANN with more than two hidden layers is a deep neural network.

4.2 Data Collection and Pre-processing

4.2.1 Data Collection

In supervised learning, we need labeled training data to train the network. Generally, such labeled data can be obtained from mathematically solving the complicated problem many times, from a real mobile network, or from a combination of the two.

In our case, the labeled data is a large set of backhaul and access link CSIs as input features and the corresponding optimal solutions obtained from CRAN optimization as labels, gathered by running the optimization approach introduced in the previous chapter.
Considering the large computational complexity of solving the semi-definite program (3.9) introduced in Chapter 3, the time required for gathering a large enough data set is very long.

Figure 4.2: Hidden layers of deep neural network.

Table 4.1: Compute Canada resources [2].

Nodes   CPU (GHz)           RAM              Storage          Number of runs   Run-time
56      2 x Intel @ 2.1     125GB variable   2 x 480G SSD     20000            3 weeks

To recap, the computational complexity of Algorithm 1, which we used for solving the CRAN optimization (3.9), is $O(2n^{4.5})$. To provide enough computational resources for running this algorithm tens of thousands of times, we used the Compute Canada CPU resources. In addition, to make the algorithm faster, we used the parallel processing capability of MATLAB® via parfor, which "executes for-loop iterations in parallel on workers in a parallel pool" [57] and reduced the processing time considerably. For details on the computational resources and run-time, refer to Table 4.1.

Figure 4.3: A DNN with three hidden layers.

4.2.2 Dealing with Complex-Valued Tensors

One of the primary challenges of using deep learning in our problem is dealing with complex numbers. Our problem involves five complex-valued tensors: the backhaul CSI, the access-link CSI, the backhaul precoding matrix, the access-link precoding matrix, and the CP quantization noise covariance matrix. The sizes of these tensors are provided in Table 4.2.

In neural networks, one way to deal with complex-valued tensors is to separate the real and imaginary parts and stack or concatenate them to end up with real-valued tensors that DNNs can use for training. Although such pre-processing can potentially obscure the relation between the real and imaginary parts of the data by treating them as unrelated information in general, this relation can be learned in the case of fully connected DNNs.
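The real-imaginary concatenation described above can be sketched as a pair of functions that undo each other. This is a minimal illustration under assumptions: the function names `complex_to_real` and `real_to_complex` are hypothetical, the example tensor is random placeholder data, and the Fortran-like order "F" is used in both directions purely so that the two steps are consistent with each other.

```python
import numpy as np


def complex_to_real(x):
    """Flatten a complex tensor and concatenate its real and imaginary parts."""
    x = np.asarray(x)
    return np.concatenate([x.real.ravel(order="F"), x.imag.ravel(order="F")])


def real_to_complex(v, shape):
    """Inverse of complex_to_real: rebuild the complex tensor of the given shape."""
    half = v.size // 2
    return (v[:half] + 1j * v[half:]).reshape(shape, order="F")


rng = np.random.default_rng(0)
N_RU, K, M = 4, 4, 2  # dimensions from Table 3.2
csi = rng.standard_normal((N_RU, K, M)) + 1j * rng.standard_normal((N_RU, K, M))

features = complex_to_real(csi)               # real vector of length 2 * N_RU * K * M
recovered = real_to_complex(features, csi.shape)

print(features.shape, np.allclose(recovered, csi))
```

The same pair of functions can be applied to the network output, so that a predicted real vector is mapped back to a complex matrix before the constraints are evaluated.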
This pre-processing has been broadly used in data-driven telecommunication articles [58–60].

In the case of a Convolutional Neural Network (CNN), stacking or concatenating the real and imaginary parts of the tensors can cause information loss, as it breaks some existing geometrical relations and creates new ones. To prevent this, a new type of DNN, called the deep complex neural network, is proposed in [61]. However, in this study we only use fully connected DNNs; thus, we use the conventional real-imaginary concatenation method.

Table 4.2: Data-set dimensionality.

Parameter   Raw Complex Data                      Flatten Real Data
W           20000 × N_tx^RU × N_tx^RU × M         2 × 20000 × (N_tx^RU N_tx^RU M)
Q           20000 × N_tx^RU × N_tx^RU × M         2 × 20000 × (N_tx^RU + N_tx^RU (N_tx^RU − 1)/2 · M)
V           20000 × N_tx^CP × N_tx^CP × M         2 × 20000 × (N_tx^CP + N_tx^CP (N_tx^CP − 1)/2 · M)
CSI_DL      20000 × N_tx^RU × K × M               2 × 20000 × (N_tx^RU N_tx^RU M)
CSI_FL      20000 × N_tx^CP × N_tx^RU × M         2 × 20000 × (N_tx^CP N_tx^RU M)

4.2.3 Dealing with High-Dimensional Tensors

To feed the neural network with the data gathered from our problem, we need to flatten both the input features and the labels, as shown in Figure 4.4. Flattening the data for training fully connected DNNs is easy, as the reshaping order does not matter for training. However, to be able to use the output of the neural network, we need a function that recovers $\mathbf{V}$, $\mathbf{W}$, and $\mathbf{Q}$ from their flat real vectors. Therefore, it is important to use a transformation that can undo itself in the reverse direction.

There are three standard reshaping orders for flattening tensors: C, A, and F (see footnote 1). Among these orders, we use order F. For example, if we reshape a 3D tensor $\mathbf{W}$ of size $a \times b \times c$ to a vector of size $1 \times abc$ with order F and then reshape it back to the size $a \times b \times c$ with the same order, we get the same $\mathbf{W}$. This round trip is only guaranteed when the same fixed order is used in both directions, which order A, being dependent on the memory layout of the array, does not ensure.

4.3 Neural Network Model

DNNs and CNNs are among the most frequently used models in multi-antenna system designs.
Although a DNN is a more general approach that targets a broader range of applications, a CNN is specialized in learning geometrical patterns and relations.

There are no geometrical relations or local inter-dependencies in the backhaul and access link CSI matrices in our problem. That is because there is no actual semantic relation between adjacent rows or columns; just by re-ordering the indices of the antenna ports, we can obtain a new equivalent representation of the CSI matrix. In other words, changing the order of the rows or columns of a CSI matrix does not add or remove any information.

Hence, a fully connected DNN is a more appropriate option for our problem than a CNN. More details about the structure of the DNN are discussed in Section 4.4, and the post-processing steps required for enforcing the constraints are provided in Sections 4.5 and 4.5.2.

Footnote 1: According to [62]: "C means to read/write the elements using C-like index order, with the last axis index changing fastest, back to the first axis index changing slowest. 'F' means to read/write the elements using Fortran-like index order, with the first index changing fastest, and the last index changing slowest. 'A' means to read / write the elements in Fortran-like index order if the array is Fortran contiguous in memory, C-like order otherwise."

Figure 4.4: Flattening the data.

Figure 4.5: The Venn diagram of the constraints. Note that failing to meet the capacity constraint and the ZF constraint does not make a DNN-generated $[\mathbf{V}, \mathbf{W}, \mathbf{Q}]$ solution infeasible, though it affects the sum-rate. However, failing to meet the Positive Semi-Definite (PSD) and power constraints leads to an infeasible solution.

4.4 Performance Metrics

Achieving an appropriate DNN architecture for our problem requires several parameter and hyper-parameter tuning steps.
Some of the parameters and hyper-parameters to be designed in a DNN are the number of hidden layers, the activation functions, the batch size, the initialization type, the number of epochs, the number of nodes in each layer, and the optimization configuration, such as the optimizer, learning rate, dropout rate, and regularization type and value. The choice of parameter and hyper-parameter types and values is based on the DNN's performance. In most DNNs, the training and validation errors and their slopes are useful in determining both the parameters and hyper-parameters. In our case, however, the loss value is not the only factor to be considered. The loss function measures how close we are to the input labels in the data-set. However, it does not guarantee the feasibility of the solution and cannot give us a good sense of the sum-rate performance in our case. As we get closer to the input labels, more predictions satisfy the constraints and the performance improves, but we should observe all performance metrics to decide on a proper DNN architecture.

During the training, our implicit goal is to maximize the user sum-rate given in (2.15), so one performance metric is the sum-rate achieved by the neural network. Also, since we cannot guarantee that the DNN-generated solutions satisfy the constraints, further performance metrics are the ratio of feasible solutions to all DNN-generated solutions and the amount of violation from the feasible set. Only all of these metrics together can measure our network's performance. With proper observation and evaluation of all performance metrics together with the loss values, we can arrive at a good DNN architecture that leads to a low validation error, good performance, and feasible predictions.

As discussed before, we cannot make sure all the predictions are feasible.
In the next section, we will discuss the post-processing steps needed to enforce the constraints in our predictions.

4.5 Accounting for Constraints

Our goal is to train a neural network that takes the backhaul and access link CSI matrices and produces V, W, and Q that, first, satisfy the power, ZF, capacity, and PSD constraints and, second, maximize the user sum-rate formulated in (3.9).

In this chapter, we approach this problem with a supervised learning method. In other words, we train a neural network that behaves just like the optimization method we introduced in Chapter 3. By learning to produce a solution set [V,W,Q] per CSI that is similar to the solution set achieved by the optimization algorithm for the same CSI, the network indirectly learns to maximize the sum rate.

However, the presence of multiple constraints in this problem causes many challenges. By mimicking the characteristics of optimal solutions in the data-set, we cannot ensure that all constraints will be met. Figure 4.5 illustrates this by a Venn diagram. Failing to meet the ZF constraint causes interference on the RUs, which affects the backhaul capacity and, consequently, the sum-rate, but it does not make a DNN-generated solution infeasible. The same is true for the backhaul capacity constraint: if the sum-rate surpasses the backhaul capacity, it causes congestion, and the final sum-rate becomes capped by the backhaul capacity, but the solution does not become infeasible. However, failing to meet the power constraints on V and W or the PSD constraint on V and Q makes a [V,W,Q] solution infeasible.
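The distinction between constraints that only reduce the sum rate and constraints that decide feasibility can be made concrete with a small check. The following is a minimal sketch, not the thesis implementation: the function names are hypothetical, the per-RU power constraint is assumed to have the form Tr(W W^H) + Tr(Q) ≤ P, and only the access-link pair (W, Q) of a single RU is checked.

```python
import numpy as np

def is_psd(mat, tol=1e-9):
    """Return True if mat is Hermitian and has no eigenvalue below -tol."""
    if not np.allclose(mat, mat.conj().T, atol=tol):
        return False
    return bool(np.linalg.eigvalsh(mat).min() >= -tol)

def is_feasible(W, Q, p_max, tol=1e-9):
    """Feasibility of one RU's (W, Q) pair.

    Assumed constraint set (hypothetical form): Q is PSD and the total
    transmit power Tr(W W^H) + Tr(Q) does not exceed p_max.
    """
    power = np.trace(W @ W.conj().T).real + np.trace(Q).real
    return is_psd(Q, tol) and bool(power <= p_max + tol)

def feasible_ratio(samples, p_max):
    """Fraction of (W, Q) predictions that pass the feasibility check."""
    flags = [is_feasible(W, Q, p_max) for W, Q in samples]
    return sum(flags) / len(flags)

# Two toy predictions: the second one has a negative eigenvalue in Q.
W_demo = 0.5 * np.eye(2, dtype=complex)
Q_ok = 0.1 * np.eye(2, dtype=complex)
Q_not_psd = np.array([[0.1, 0.0], [0.0, -0.2]], dtype=complex)
ratio = feasible_ratio([(W_demo, Q_ok), (W_demo, Q_not_psd)], p_max=1.0)
```

The ratio returned by `feasible_ratio` corresponds to the feasibility metric described in Section 4.4. A ZF or capacity violation would not change it, since those constraints only affect the achieved sum rate.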
To overcome these challenges, we adopt a different technique for each constraint and discuss them in the next sections.

4.5.1 Backhaul ZF and Capacity Constraints

In supervised learning, the DNN implicitly learns how to avoid large interference, as all of the input labels provided for training satisfy the ZF constraint. However, since we cannot guarantee that the DNN meets the ZF constraint, we consider the adverse effects of the residual interference in terms of the user sum rate. Failing to satisfy the ZF constraint causes interference $G_m^H V_k G_m$ that affects the RF backhaul capacity as follows:

$$C^{\mathrm{RF}}_m(V_m, Q_m, W_m) = \ln\det\left(I + \left(G_m^H V_k G_m + \alpha P^{\mathrm{si}}_m + \sigma_n^2 I\right)^{-1} G_m^H V_m G_m\right), \quad k \neq m \tag{4.5}$$

The larger $G_m^H V_k G_m$ becomes for $k \neq m$, the more interference it causes, which further limits the RF backhaul capacity and consequently reduces the user sum rate by causing congestion.

However, as indicated in Figure 4.5, violating the PSD constraint on V and Q or the power constraint on V and W makes a solution infeasible and useless. Hence, we enforce the PSD and power constraints through the structure of the DNN.

4.5.2 Enforcing PSD Constraints

A positive semi-definite matrix should be Hermitian. To enforce the Hermitian property, we employ two separate DNNs to generate the diagonal and upper triangular elements of the covariance matrices $Q_m$ and $V_m$, as seen in Figure 4.6. For example, for the matrix

$$Q_m = \begin{bmatrix} q_{11} & q_{12} & q_{13} & q_{14} \\ q_{21} & q_{22} & q_{23} & q_{24} \\ q_{31} & q_{32} & q_{33} & q_{34} \\ q_{41} & q_{42} & q_{43} & q_{44} \end{bmatrix}, \tag{4.6}$$

we first decompose it into two arrays, the diagonal $Q^{\mathrm{diag}}_m = [q_{11}, q_{22}, q_{33}, q_{44}]$ and the upper-triangular $Q^{\mathrm{upper}}_m = [q_{12}, q_{13}, q_{14}, q_{23}, q_{24}, q_{34}]$, and then train a separate neural network for each vector. This way, the Hermitian property of the generated $V_m$ and $Q_m$ is enforced.
The structure of this neural network is illustrated in Figure 4.6.

To ensure the positive semi-definiteness of the DNN-generated $V_m$ and $Q_m$ matrices, after training we pass the generated $V_m$ or $Q_m$ matrices to a customized function that approximates them by their closest positive semi-definite matrices. To do so, we first combine $Q^{\mathrm{diag}}_m$ and $Q^{\mathrm{upper}}_m$ to get $Q_m$. In this function, we perform the eigenvalue decomposition of $Q_m$ as

$$Q_m = \Delta \Sigma \Delta^H, \tag{4.7}$$

where $\Sigma$ is a diagonal matrix containing the eigenvalues of $Q_m$. Then we approximate $\Sigma$ by $\Sigma'$, in which all negative elements are replaced by 0, and reconstruct the PSD approximation of $Q_m$ as follows:

$$Q'_m = \Delta \Sigma' \Delta^H. \tag{4.8}$$

This way we enforce the PSD constraint. The same procedure can be applied to the $V_m$ matrix.

4.5.3 Enforcing Power Constraint

To make $W_m$ and $Q_m$ meet the power constraints after training the neural network, we employ a customized function in the output layer of the neural network that first normalizes the vector and then scales its power to the maximum allowed power, as suggested by [60]. The customized activation functions of the output layers are

$$W^{\mathrm{scaled}}_m = W_m \sqrt{\frac{P_m - |\mathrm{Tr}(Q_m)|}{|\mathrm{Tr}(W_m W_m^H)|}}, \tag{4.9}$$

and

$$Q^{\mathrm{scaled}}_m = Q_m \frac{P_m - |\mathrm{Tr}(W_m W_m^H)|}{|\mathrm{Tr}(Q_m)|}. \tag{4.10}$$

Figure 4.6: The proposed DNN structure to ensure Hermitian output.

For the backhaul power constraint, we use a similar technique, but since the power constraint is on $\sum_m \mathrm{Tr}(V_m)$, we need to keep the ratio of transmit power between different $m$ the same before and after the scaling. For $M = 2$, the pre-scaling power ratio is

$$\alpha_v = \frac{\mathrm{Tr}(V_1)}{\mathrm{Tr}(V_2)} = \frac{\mathrm{Tr}(V^{\mathrm{scaled}}_1)}{\mathrm{Tr}(V^{\mathrm{scaled}}_2)}. \tag{4.11}$$

Requiring $\sum_{m=1}^{M} \mathrm{Tr}(V'_m) = P_c$ after scaling, a proper power scaling that enforces the backhaul RF power constraint and keeps the power ratio is

$$V'_1 = V_1 \frac{\alpha_v P_c}{(1+\alpha_v)\,\mathrm{Tr}(V_1)}, \tag{4.12}$$

$$V'_2 = V_2 \frac{P_c}{(1+\alpha_v)\,\mathrm{Tr}(V_2)}. \tag{4.13}$$

With these post-processing actions, we can make sure all DNN predictions are in the feasible set.
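The three post-processing steps of Sections 4.5.2 and 4.5.3 can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions rather than the thesis code: the function names are hypothetical, the upper-triangular entries are assumed to be stored row by row as in the example for (4.6), and `scale_w` assumes that Tr(Q) is smaller than the power budget so that the square root in (4.9) is real.

```python
import numpy as np

def hermitian_from_parts(diag, upper):
    """Rebuild an n x n Hermitian matrix from its diagonal and the
    row-by-row list of strictly upper-triangular entries."""
    n = len(diag)
    mat = np.zeros((n, n), dtype=complex)
    mat[np.triu_indices(n, k=1)] = upper      # fill q12, q13, ..., q(n-1)n
    mat = mat + mat.conj().T                  # mirror into the lower triangle
    mat[np.diag_indices(n)] = np.real(diag)   # a Hermitian diagonal is real
    return mat

def project_psd(mat):
    """Replace negative eigenvalues by zero, following (4.7) and (4.8)."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    clipped = np.clip(eigvals, 0.0, None)
    return (eigvecs * clipped) @ eigvecs.conj().T

def scale_w(W, Q, p_max):
    """Scale W so that Tr(W W^H) + Tr(Q) equals p_max, following (4.9)."""
    budget = p_max - abs(np.trace(Q))
    return W * np.sqrt(budget / abs(np.trace(W @ W.conj().T)))

# Toy example: an indefinite Hermitian matrix is projected, then W is scaled.
Q_raw = hermitian_from_parts([1.0, 1.0], [2 + 1j])
Q_psd = project_psd(Q_raw)
W_raw = np.array([[1 + 1j, 0.5], [0.2, 1j]])
W_scaled = scale_w(W_raw, Q_psd, p_max=5.0)
```

Because `project_psd` changes the trace of the matrix, the power scaling is applied after the projection in this sketch, so that the final pair meets the power budget.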
In the next section, we will summarize the DNN performance and architecture.

4.6 Implementation and Numerical Results

4.6.1 PyTorch

PyTorch is one of the most powerful machine learning libraries used to create artificial neural network models. It is recognized for its descriptive documentation and active developer community. We used four PyTorch packages to implement our neural network: the PyTorch DNN model, automatic differentiation to compute gradients, the PyTorch loss to compute the loss function, and the PyTorch optimizer to optimize the model parameters. All of these packages can be customized and combined with user-defined modules if the problem has a specific demand that the packages cannot meet (as we do in our unsupervised learning model in Chapter 5).

During the training phase, the neural network tries to minimize the loss through parameter optimization, and gradient calculation is an essential part of the process. In PyTorch, the data type of the inputs and outputs of the neural network, as well as of the weights and biases, is the tensor. To calculate the gradient of the loss function with respect to the weights and biases, we can enable automatic differentiation (autograd) on the weight and bias tensors. This way, when we perform an operation in the forward path, PyTorch forms a computational graph that automatically calculates the partial gradients and, using the chain rule, obtains the gradient vector to be used in the backward path.

4.6.2 Results

In this subsection, we present the performance results of the proposed supervised learning approach for solving the CRAN optimization problem. Each DNN predicts one optimization variable in a specific weather condition and SIC level. For each weather condition and SIC level, we gathered a data-set containing 20000 labeled samples, including 16000 training samples and 4000 validation samples. The main difference between the scenarios is the availability of the FSO link, which depends on the weather conditions, and of the backhaul RF link, which depends on the SIC level.
The hyper-parameters of the DNN are listed in Table 4.3 and the input and output sizes in Table 4.4.

We divided the training data set into 80 batches of size 200 and used mini-batch gradient descent with RMSprop (see footnote 2) for training. We used the Mean Absolute Error (MAE) and Mean Square Error (MSE) loss functions for the different networks, as listed in Table 4.3. A combination of dropout and L2 regularization is used to address over-fitting in the training phase. Furthermore, we observed a gradient explosion at the final stages of training and solved it via gradient clipping.

To evaluate the final sum-rate performance, we trained separate DNNs for V, Q, and W. Each of these DNNs can be used to predict one of the optimization variables. The model evaluation can be done in two ways. One approach is to combine each DNN-generated variable with the optimal results obtained in Chapter 3 and calculate the sum rate (we call it the Semi-DNN sum-rate), as illustrated in Figure 4.8. We can compute this metric for each DNN separately. To evaluate the overall performance of all DNNs together, we can compute the sum-rate by passing the outputs of all three DNNs to the sum-rate expression (we call it the DNN sum-rate), as shown in Figure 4.7. The results of both approaches are summarized in Table 4.5, and the associated DNN design parameters and input/output sizes are detailed in Tables 4.3 and 4.4. As seen from Table 4.5, the overall performance is limited by the performance of the W prediction. We believe that the mandatory post-processing that we apply to the DNN-generated W, Q, and V after training to satisfy the PSD and power constraints affects the performance. In the case of V, any small deviation from the optimal covariance matrix leaves us with inter-RU interference in the backhaul, which decreases the backhaul capacity and caps the final sum-rate.
In the case of Q and W, training the DNN was quite challenging, and our best guess is that, compared to V, the variables Q and W have a more complicated behavior and are more difficult to predict.

Footnote 2: According to [63], in RMSprop "the effective learning rate is divided by square root of gradient average ($lr_{\mathrm{rmsprop}} = \frac{lr}{\sqrt{v}+\varepsilon}$), where $\varepsilon$ is an added term to improve the numerical stability of algorithm and $v = 0.9\times \mathrm{MeanSquare}(w, t-1) + 0.1\,(\partial E/\partial w(t))^2$, which keeps a moving average of the squared gradient for each weight."

Table 4.3: DNN architecture in supervised learning.

| Variable | Loss | Layers & nodes |
|---|---|---|
| W | MSE | 64, 100, 100, 200, 200, 200, 100, 100, 64 |
| V_diag | MAE | 160, 200, 200, 100, 50, 20 |
| V_tri | MSE | 160, 200, 300, 400, 400, 400, 400, 300, 200, 180 |
| Q_diag | MAE | 64, 100, 50, 25, 15, 8 |
| Q_tri | MSE | 64, 100, 80, 40, 30, 24 |

Table 4.4: Inputs and outputs of DNNs in supervised learning approach.

| Parameter | Input & Dimension | Output & Dimension |
|---|---|---|
| W | CSI_DL: 20000×64 | W: 20000×64 |
| V_diag | CSI_FL: 20000×160 | V_diag: 20000×20 |
| V_tri | CSI_FL: 20000×160 | V_tri: 20000×180 |
| Q_diag | CSI_DL: 20000×64 | Q_diag: 20000×8 |
| Q_tri | CSI_DL: 20000×64 | Q_tri: 20000×24 |

Computational Complexity Comparison

Considering the variables' sizes in our system model, the computational complexity of the model-based approach introduced in Chapter 3 is $O\left(45\,(2\times(10+4+4))^{4.5}\right) \approx O(10^{10})$. However, the computational complexity of the forward path in a DNN is $O\left(n m_1 + \sum_{i=1}^{l-1} m_i m_{i+1} + m_l k\right)$, where $n$ is the number of inputs, $l$ is the number of hidden layers with $m_1, m_2, \ldots, m_l$ neurons, and $k$ is the number of outputs.
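The forward-path expression can be evaluated directly from the layer widths in Table 4.3. The short sketch below does this in plain Python; the helper name `forward_cost` is hypothetical, each list is read as input size, hidden layer sizes, and output size, and the halving step simply applies the average 50% dropout assumption that the text uses for its estimate.

```python
def forward_cost(layer_sizes):
    """Multiplication count of one forward pass through a fully connected
    network: n*m1 + sum(m_i * m_{i+1}) + m_l*k, i.e. the sum of products
    of consecutive layer widths."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))

# Layer widths (input, hidden layers, output) copied from Table 4.3.
networks = {
    "W":      [64, 100, 100, 200, 200, 200, 100, 100, 64],
    "V_diag": [160, 200, 200, 100, 50, 20],
    "V_tri":  [160, 200, 300, 400, 400, 400, 400, 300, 200, 180],
    "Q_diag": [64, 100, 50, 25, 15, 8],
    "Q_tri":  [64, 100, 80, 40, 30, 24],
}

costs = {name: forward_cost(sizes) for name, sizes in networks.items()}
total = sum(costs.values())

# Assumption taken from the text: an average 50% dropout halves the count.
total_with_dropout = total / 2

for name, cost in costs.items():
    print(f"{name:7s} {cost:>8d}")
print(f"total   {total:>8d}  (halved: {total_with_dropout:.0f})")
```

The per-network counts are 152800, 98000, 908000, 13145, and 19520, which sum to 1191465, or roughly 6×10^5 after halving. This is several orders of magnitude below the model-based estimate.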
Considering the five DNNs introduced in Tables 4.3 and 4.4, and the average 50% dropout, the overall computational complexity of our supervised data-driven method is $O(152800/2) + O(98000/2) + O(908000/2) + O(13145/2) + O(19520/2) \approx O(5\times 10^5)$.

This large complexity gap between model-based and data-driven optimization becomes even more significant in the next generation of mobile networks, where networks will be denser and will connect more devices.

Table 4.5: Performance of semi-DNN and DNN architectures.

| Variable | Semi-DNN sum-rate | DNN sum-rate |
|---|---|---|
| W | 75% | 73% (all three DNNs combined) |
| V | 82% | |
| Q | 99% | |

Figure 4.7: DNN-based optimization approach.

Figure 4.8: Post-processing of DNN-based solutions and achieved hybrid sum-rate.

Chapter 5

Machine Learning-Based CRAN Optimization, an Unsupervised Approach

Notwithstanding the low computational complexity of the supervised DNN-based CRAN optimization introduced in Chapter 4, its performance is limited to the performance of the numerical optimization obtained in Chapter 2. In fact, even reaching the performance achieved via numerical optimization in Chapter 2 is very difficult and requires an arbitrarily large number of epochs and tedious hyper-parameter tuning in the supervised manner.

In addition, the presence of constraints and the structural regularization proposed to enforce them (see Figure 3.8) contributed to the inferior sum rate performance achieved in Chapter 4. That is because the ways we enforced these constraints are not necessarily optimal. For example, enforcing the PSD constraint by changing the eigenvalues of the DNN-generated V and Q can potentially throw the solution arbitrarily far away from the optimal solution.

All in all, the sum rate performance achieved in Chapter 4 through supervised learning is acceptable, especially considering the large reduction in computational complexity.
However, it seems that employing an unsupervised learning method that is not bounded by the existing solutions in Chapter 2 can potentially achieve a better performance while maintaining a low computational complexity. Hence, in this chapter we study the application of unsupervised learning for CRAN optimization.

5.1 Learning to Optimize with Unsupervised Learning

To enhance the sum-rate performance of the DNN-based solution, we can use unsupervised learning to let the DNN explore beyond the existing solutions (input labels) that are considered the ground truth and discover superior solutions in terms of sum-rate while maintaining a low computational complexity. In summary, the benefits of unsupervised learning in CRAN optimization are

• No need for solving the optimization task (3.9) thousands of times to generate the input labels
• Can potentially surpass the performance achieved in Chapter 4

In this approach, instead of minimizing the conventional MSE/MAE loss function, we use the objective function itself (negated when the objective is to be maximized) as the loss function and train a DNN to minimize it [64–66]. Assume that $x^* = \arg\min_x f_0(x, p)$ is the optimization problem we want to solve. Similar to the supervised method, we design a DNN as follows:

$$x^* = \mathrm{DNN}(p), \tag{5.1}$$

and the loss is

$$\mathcal{L} = f_0(x^*, p). \tag{5.2}$$

As an alternative, some works employed reinforcement learning (RL) to address similar problems. A 2016 study by Li et al. uses RL to solve unconstrained continuous optimization problems [45]. In addition, [46, 47] use RL to solve unconstrained discrete combinatorial optimization problems.

However, what has been missing in all of the mentioned works is the constraints. To the best of our knowledge, no method exists for learning to optimize an objective function under generic constraints that ensures the constraints are met. The exception is the power limitation constraint in beamforming, which is usually addressed by a customized activation function that caps the norm of the DNN's output [60]. In this chapter, we introduce a generic method for enforcing any inequality or equality constraints in a DNN-based optimization method.

Figure 5.1: Learning to Optimize. Unsupervised learning train and test phases.

Assume that $\mathcal{P}$ is the set of all input features, i.e., $p \in \mathcal{P}$, and $\mathcal{X}^*$ is the set of all corresponding outputs of the neural network, i.e., $x^* \in \mathcal{X}^*$. We want this DNN to take samples of $p$ as the input and generate the corresponding $x$ such that it minimizes the objective function $f_0(x, p)$ under the set of constraints $f_i(x, p) \leq c_i$ and $h_j(x, p) = b_j$.

5.1.1 Piece-wise Regularization for Enforcing the Constraints

In this novel method, we propose to set the loss function of the DNN equal to the objective function and add penalty terms to it as follows:

$$\mathcal{L}(x^*_k, p_k) = f_0(x^*_k, p_k) + \sum_i I\left(f_i(x^*_k, p_k) - c_i\right) + \sum_j I\left(h_j(x^*_k, p_k) - b_j\right) \tag{5.3}$$

where $I$ is defined as

$$I(x) = \begin{cases} 0, & x \leq 0 \\ \infty, & x > 0. \end{cases} \tag{5.4}$$

In each epoch, our target would be to minimize the mean of the loss, i.e., $\mathcal{L}(\mathcal{X}^*, \mathcal{P}) = \frac{1}{|\mathcal{P}|}\sum_{k \in \mathcal{P}} \mathcal{L}(x^*_k, p_k)$. This way, we ensure that the constraints are met, as the indicator function disposes of infeasible solutions by sending the value of the loss to infinity. Nonetheless, since the gradient of $\sum_i I(f_i(x^*_k, p_k) - c_i) + \sum_j I(h_j(x^*_k, p_k) - b_j)$ is zero wherever it is defined, the neural network cannot learn anything with this loss.

A remedy for this issue is to use penalty terms with a non-zero gradient (see footnote 1) to penalize the infeasible solutions instead of using the function $I$. Our target is to make sure that the loss value outside of the feasible set becomes large enough that no infeasible solution can minimize $\mathcal{L}$.

Let us define the "deviation from feasibility" for inequality constraints as

$$F_i(x^*, p) = \begin{cases} 0, & f_i(x^*, p) \leq c_i \\ \eta_i \left(f_i(x^*, p) - c_i\right)^{\gamma}, & \text{else} \end{cases} \tag{5.5}$$

and for equality constraints as

$$H_j(x^*, p) = \begin{cases} 0, & h_j(x^*, p) = b_j \\ \eta_j \left|h_j(x^*, p) - b_j\right|^{\gamma}, & \text{else} \end{cases} \tag{5.6}$$

where $\eta_i \geq 0$ and $\eta_j \geq 0$ are hyper-parameters that tune the effect of the regularizing terms, and $\gamma \geq 1$.
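The two deviation terms in (5.5) and (5.6) can be written as a few lines of plain Python. This is a minimal sketch on scalar values with hypothetical function names; it uses one shared η for all constraints for brevity, whereas the text allows a separate η per constraint. In an actual training loop the same expressions would be built from differentiable tensor operations so that the gradient can flow through the penalty.

```python
def inequality_penalty(value, bound, eta, gamma=2.0):
    """F_i in (5.5): zero if value <= bound, else eta * violation**gamma."""
    violation = value - bound
    return 0.0 if violation <= 0 else eta * violation ** gamma

def equality_penalty(value, target, eta, gamma=2.0):
    """H_j in (5.6): zero if value == target, else eta * |gap|**gamma."""
    gap = abs(value - target)
    return 0.0 if gap == 0 else eta * gap ** gamma

def penalized_loss(objective, inequalities, equalities, eta, gamma=2.0):
    """Objective value plus the deviation terms of all constraints.

    inequalities: iterable of (f_i value, c_i) pairs
    equalities:   iterable of (h_j value, b_j) pairs
    """
    penalty = sum(inequality_penalty(v, c, eta, gamma) for v, c in inequalities)
    penalty += sum(equality_penalty(v, b, eta, gamma) for v, b in equalities)
    return objective + penalty

# A feasible point is not penalized; an infeasible one pays eta * violation**2.
loss_feasible = penalized_loss(-2.0, [(3.0, 5.0)], [(1.0, 1.0)], eta=10.0)
loss_infeasible = penalized_loss(-2.0, [(7.0, 5.0)], [(1.5, 1.0)], eta=10.0)
```

Unlike the indicator function in (5.4), these terms grow smoothly with the size of the violation, so their gradient points back toward the feasible set.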
Note that $\eta_j$ must be large enough for the penalty to dominate $\mathcal{L}$ outside of the feasible set. Now we define the penalty term as follows:

$$\Omega(x^*, p) = \sum_i F_i(x^*, p) + \sum_j H_j(x^*, p). \tag{5.7}$$

In this method, for a solution $x^*$, if none of the constraints is violated, $\Omega$ becomes 0 and there is no penalty. Otherwise, there is a penalty for each violated constraint that grows with the amount of violation (see Figure 5.2). We define the per-sample loss function as

$$\mathcal{L}(x^*_k, p_k) = f_0(x^*_k, p_k) + \Omega(x^*_k, p_k) \tag{5.8}$$

and the mean loss as

$$\mathcal{L}(\mathcal{X}^*, \mathcal{P}) = \frac{1}{|\mathcal{P}|}\sum_{k \in \mathcal{P}} \left(f_0(x^*_k, p_k) + \Omega(x^*_k, p_k)\right). \tag{5.9}$$

Footnote 1: The gradient of the multiplicative penalty term with respect to the weights and biases of the neural network should be non-zero so that the neural network can learn how to avoid infeasible solutions.

Figure 5.2: Penalizing solutions outside of the feasible set using the piece-wise regularization method.

5.2 Segmented CRAN Optimization Via Unsupervised Learning

Using the method introduced in Section 5.1, we can formulate our constrained optimization problem and solve it with unsupervised learning DNNs. However, training a large DNN that outputs all optimization variables (V, Q, and W) can be quite challenging in practice. Larger networks are more prone to over-fitting, and hyper-parameter tuning can be very difficult and time-consuming.

5.2.1 Segmented Optimization

To overcome the tuning challenge mentioned in Section 5.2, we split the joint optimization of [V,Q,W] into separate optimization tasks. As we have a two-hop communication network consisting of backhaul and access links that connect the CP, RUs, and UEs, it makes sense to separate the backhaul and access optimization problems and maximize different objectives, i.e., the backhaul capacity and the user sum-rate, in separate optimization problems. We use one DNN for optimizing V and another DNN for optimizing Q and W. For training V, the objective is to maximize the backhaul capacity while satisfying the ZF, PSD, and power constraints.
For training W and Q, the objective is to maximize the user sum rate while satisfying the power and PSD constraints.

Besides, in the unsupervised manner, the customized loss function adds a large complexity cost to the training of our neural network, as the network calculates the gradient of the customized loss function in each epoch instead of that of the MSE or MAE functions, whose gradients are less complicated than that of a customized function like equation (5.9). Thus, separating the networks and having separate loss functions for the different optimization variables makes the optimization process more manageable and the tuning steps easier.

5.2.2 Accounting for Constraints

For the power constraints, we use the piece-wise regularization proposed in Section 5.1.1 so that, unlike the power scaling method used in Chapter 4, the constraint is involved in the training process and influences the gradient. The main benefit of the proposed piece-wise regularization is that it allows the neural network to learn how to stay within the allowed boundaries of the problem and not violate the constraints.

In addition to the piece-wise regularization method for imposing the power constraint, we hard-wire the DNN's structure to satisfy the ZF and PSD constraints without applying any regularizing terms. We show in Sections 5.2.3 and 5.2.4, for the first time, that these constraints can be imposed by applying a proper customized activation function.

We do not enforce the backhaul limitation constraint while training Q and W, as we do not want to limit our results to the supervised labels and the capacity derived in the supervised manner. We discuss the enforcement of the capacity limitation in the post-processing step.

5.2.3 Optimization of V in a Semi-Supervised Manner

For training V, because of self-interference, the backhaul capacity also depends on Q and W. So we use the input labels of [Q,W] while training V (see equation (2.6)). That is why we call this training method semi-supervised.
We have three constraints in optimizing V: the PSD, ZF, and power constraints. Imposing the ZF constraint eliminates multi-RU interference as follows:

$$G_m^H V_k G_m = 0 \quad \text{for } k \neq m. \tag{5.10}$$

The ZF constraint could not be guaranteed in the supervised CRAN optimization in Chapter 4. Also, to impose the PSD constraint, we resorted to manipulating $V_m$ and $Q_m$ and approximating them by PSD matrices, which caused considerable performance loss. However, in unsupervised learning we can guarantee the satisfaction of both the ZF and PSD constraints by designing a customized activation function at the output layer of the DNN.

Assume that the fronthaul beamforming matrix to the $m$-th RU is $B_m$; then we have $V_m = B_m B_m^H$. To ensure that $V_m$ is always PSD, the DNN can output $B_m$ and a customized activation function in the output layer can take $B_m$ and return $V_m = B_m B_m^H$. However, the matrix $V_m$ generated this way does not satisfy the ZF constraint. To satisfy the ZF constraint, the matrix $B_m$ must satisfy

$$\tilde{G}_m^H B_m = 0 \quad \forall m \in \mathcal{M}, \tag{5.11}$$

where

$$\tilde{G}_m = [G_1, G_2, \ldots, G_{m-1}, G_{m+1}, \ldots, G_M]. \tag{5.12}$$

This means that $B_m$ must be chosen from the null space of $\tilde{G}_m^H$. According to [41], if $\tilde{G}_m^H = U_m \Sigma_m Y_m^H$ is the Singular Value Decomposition (SVD) of $\tilde{G}_m^H$, where $Y_m = [Y^{(1)}_m, Y^{(2)}_m]$ and $Y^{(2)}_m$ collects the right singular vectors associated with the zero singular values, the columns of $Y^{(2)}_m$ form an orthogonal basis for the null space of $\tilde{G}_m^H$. Hence, we can use $Y^{(2)}_m$ as a basis for the backhaul beamforming matrix $B_m$ to satisfy the ZF constraint between the backhaul RF links.

Figure 5.3: DNN structure of the proposed semi-supervised training method for V.

Therefore, using the matrix $Y^{(2)}_m$ as the basis of the beamforming, we define $B_m = Y^{(2)}_m A_m$, where $A_m$ is a variable matrix that can be optimized for maximizing the backhaul capacity. Now consider a DNN that receives the channel matrices $G \triangleq [G_1, G_2, \ldots, G_M]$ and returns the matrices $A \triangleq [A_1, A_2, \ldots, A_M]$:

$$A = \mathrm{DNN}(G). \tag{5.13}$$

For simpler representation, we ignored the tensor flattening and tensor recovering modules here.
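The null-space construction can be illustrated with a short NumPy sketch. It is written under stated assumptions and is not the thesis code: the function names are hypothetical, each channel $G_m$ is assumed to have one row per CP transmit antenna, and a random matrix stands in for the DNN output $A_m$. The SVD is taken of $\tilde{G}_m^H$, so the trailing right singular vectors span its null space.

```python
import numpy as np

def null_space_basis(G_list, m, tol=1e-10):
    """Orthonormal basis of the null space of G_tilde_m^H, where G_tilde_m
    stacks the channels of all RUs except RU m side by side, as in (5.12)."""
    G_tilde = np.hstack([G for k, G in enumerate(G_list) if k != m])
    _, s, Yh = np.linalg.svd(G_tilde.conj().T, full_matrices=True)
    rank = int(np.sum(s > tol * s.max()))
    # Right singular vectors beyond the rank belong to zero singular values.
    return Yh.conj().T[:, rank:]

def zf_covariance(Y2, A):
    """Covariance V = (Y2 A)(Y2 A)^H: PSD by construction and confined to
    the null space spanned by the columns of Y2."""
    B = Y2 @ A
    return B @ B.conj().T

# Toy setup (assumed sizes): 8 CP antennas, 3 RUs with 2 antennas each.
rng = np.random.default_rng(0)
G_list = [rng.standard_normal((8, 2)) + 1j * rng.standard_normal((8, 2))
          for _ in range(3)]

Y2 = null_space_basis(G_list, m=0)
A0 = rng.standard_normal((Y2.shape[1], 2)) + 1j * rng.standard_normal((Y2.shape[1], 2))
V0 = zf_covariance(Y2, A0)

# Leakage of RU 0's backhaul signal onto the other RUs (numerically zero).
leakage = [np.linalg.norm(G_list[k].conj().T @ V0 @ G_list[k]) for k in (1, 2)]
```

Whatever values the network produces for `A0`, the resulting covariance satisfies both the ZF and the PSD constraint, which is why only the power constraint needs a penalty term in this design.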
For this DNN, we define the activation function of the output layer as follows:

$$V_m = \left(Y^{(2)}_m A_m\right)\left(Y^{(2)}_m A_m\right)^H. \tag{5.14}$$

This activation function receives $A_m$ and returns $V_m$ according to (5.14). Finally, the customized loss function of the proposed DNN for optimizing V can be designed based on the piece-wise regularization method proposed in Section 5.1.1. Since in the backhaul our target is to maximize the capacity and we have a power constraint, according to (5.8) the per-sample loss is

$$\mathcal{L}_V = -\sum_{m=1}^{N_{\mathrm{RU}}} \left(B^{\mathrm{FSO}} C^{\mathrm{FSO}}_m + B^{\mathrm{RF}} C^{\mathrm{RF}}_m\right) + \Omega_V, \tag{5.15}$$

where $C^{\mathrm{FSO}}_m$ is given in (2.7), $C^{\mathrm{RF}}_m$ is given in (2.6), and

$$\Omega_V = \begin{cases} 0, & \sum_m \mathrm{Tr}(V_m) \leq P_{cp} \\ \eta\left(\sum_m \mathrm{Tr}(V_m) - P_{cp}\right)^{\gamma}, & \text{else.} \end{cases} \tag{5.16}$$

The structure of the proposed DNN is illustrated in Figure 5.3. As can be seen, during the training the optimal values of Q and W are fed to the customized loss function, but V is provided by the DNN. Since some of the input labels are used in this type of training and some are not, we name it semi-supervised learning. Also, in this problem we use $\gamma = 2$ and tune $\eta$ during the training.

5.2.4 Optimization of Q and W in an Unsupervised Manner

For maximizing the user sum-rate, we propose an unsupervised DNN that receives the downlink CSI as input and outputs Q and W. To meet all the constraints, we propose a special DNN architecture as discussed below. The matrix $Q_m$ must satisfy the PSD constraint. Hence, similar to $V_m$, our DNN returns an auxiliary variable $S_m$ and then uses a customized activation function to calculate $Q_m$ as follows:

$$Q_m = S_m S_m^H. \tag{5.17}$$

This way, the output of the DNN is always PSD. Both $Q_m$ and $W_m$ must satisfy the backhaul capacity and power constraints as indicated in (2.10) and (2.13). We enforce the capacity constraint in the post-processing step by accounting for congestion for values that violate the capacity constraint.
Using the piece-wise regularization method introduced in Section 5.1.1, the customized per-sample loss function of this DNN can be defined as

$$\mathcal{L} = -\sum_{m=1}^{N_{\mathrm{RU}}} R^{\mathrm{data}}_m + \Omega, \tag{5.18}$$

where $\Omega$ is the piece-wise penalty term corresponding to the power constraint, given by

$$\Omega = \begin{cases} 0, & \sum_m \mathrm{Tr}(Q_m) \leq \sum_m \left(P_m - \mathrm{Tr}(W_m W_m^H)\right) \\ \eta\left(\sum_m \left(\mathrm{Tr}(Q_m) - P_m + \mathrm{Tr}(W_m W_m^H)\right)\right)^{\gamma}, & \text{else.} \end{cases} \tag{5.19}$$

Figure 5.4: DNN structure of the proposed unsupervised training method for Q and W.

The structure of the proposed DNN is illustrated in Figure 5.4. In this problem we use $\gamma = 2$ and tune $\eta$ during the training.

5.2.5 Post-Processing Step

Penalizing the loss function in proportion to the amount of constraint violation, as discussed in Section 5.1.1, can push the DNN to produce values that satisfy the constraints, but it cannot guarantee that. So we still need the post-processing step to force all predictions to lie in the feasible set. Besides, for constraints that involve multiple variables, it is better to enforce them in the post-processing step. For example, to enforce the capacity constraint, if the sum-rate value derived from the unsupervised optimization of Q and W violates the backhaul capacity derived from the semi-supervised optimization of V, we account for congestion in the post-processing step. This way, the minimum of the backhaul capacity and the sum-rate is taken as the final sum-rate. Note that we do not need to modify the achieved W and Q to meet the capacity constraint, as in the downlink, if the capacity of the backhaul is less than the capacity of the access link, users will experience a throughput lower than their capacity but no data loss will happen.

Figure 5.5: Proposed unsupervised and semi-supervised DNNs for CRAN DL optimization.

Enforcing the power constraint in the unsupervised manner is slightly different from the supervised method applied in Chapter 4, because now both W and Q should be adjusted together to meet the constraint.
There are multiple ways to enforce the power constraint: 1) splitting the power budget between the outputs of the DNN (W and Q) using a certain ratio that can be pre-calculated based on the existing labels obtained from the conventional optimization, or 2) using either W or Q as provided by the DNN and adjusting the other one to satisfy the constraint. We tried both and concluded that the latter provides superior performance. In particular, we adopt Q as provided by the DNN and use formula (4.9) to scale W. The post-processing step and the overall semi-supervised approach are illustrated in Figure 5.5.

5.2.6 Complexity Analysis of DNN

The complexity of training a neural network that has $n$ inputs, $l$ hidden layers with $m_1, m_2, \ldots, m_l$ neurons, and $k$ outputs with the back-propagation algorithm over $N_e$ epochs and $N_s$ samples is $O\left(N_e N_s \left(n m_1 + \sum_{i=1}^{l-1} m_i m_{i+1} + m_l k\right)\right)$. However, in the forward path it is only $O\left(n m_1 + \sum_{i=1}^{l-1} m_i m_{i+1} + m_l k\right)$.

The beauty of the proposed scheme is that it moves an enormous chunk of the computational complexity offline. This enables the forward path to deliver a solution with low computational complexity compared to online optimization algorithms such as the interior point method, which is often used for non-convex optimization problems. The interior point method has a worst-case computational complexity of $O\left(\max\{n, m\}^4 \sqrt{n} \log\left(\frac{1}{\varepsilon}\right)\right)$, where $n$ is the number of variables, $m$ is the number of constraints, and $\varepsilon$ is the solution accuracy [44].

5.3 Numerical Results and Discussion

In this section, we present the performance results of the proposed semi-supervised learning approach for solving the CRAN optimization problem. We consider the same scenario as in Section 4.6, i.e., a specific weather condition and SIC level is used to gather the data and train the DNN.
Also, we used the same data-set gathered in Chapter 4, i.e., for each weather condition, 20000 labeled samples including 16000 training samples and 4000 validation samples.

The DNN structures of both methods are listed in Table 5.2, and the input and output sizes are shown in Table 5.1. We divided the training data set into 80 batches of size 200 and used mini-batch gradient descent with RMSprop for training. To get the final sum rate performance results, we used a semi-supervised DNN to obtain V and an unsupervised DNN to obtain Q and W, as explained in Sections 5.2.3 and 5.2.4. For testing, we take the two trained DNNs, generate V, Q, and W in the forward path, and pass them to (2.15) to evaluate the sum rate performance. Table 5.3 lists the semi-DNN performance and the DNN sum-rate of the proposed scheme. The results show that the semi-supervised learning approach for optimizing the backhaul capacity outperforms the conventional analytical approach used in Chapter 3 by 30%.

Overall, the sum-rate achieved by the unsupervised method introduced in this chapter outperforms the supervised method (Chapter 4) by 6 percentage points and achieves 79% of the sum-rate of the analytical method.

Table 5.1: Dimensionality of inputs and outputs of unsupervised DNNs.

| Parameter | Input & Dimension | Output & Dimension |
|---|---|---|
| W, Q | CSI_DL: 20000×64 | W, S: 20000×128 |
| V | CSI_FL: 20000×160 | A: 20000×200 |

Table 5.2: Architecture of unsupervised DNNs.

| Variable | Layers & nodes |
|---|---|
| W, Q | 64, 100, 150, 150, 130, 128 |
| V | 160, 200, 200, 100, 50, 96 |

Computational Complexity Comparison

Considering the two DNNs introduced in Tables 5.1 and 5.2, the overall computational complexity of our unsupervised data-driven method is $O(80040) + O(101800) \approx O(181\times 10^3)$, which is about 2.7 times lower than that of the supervised method in the forward path and about 50000 times lower than that of the model-based method. Notice that the gradient computational graph that PyTorch uses for calculating the gradient is more complicated in the unsupervised network than in the supervised method.
Hence, eventhough the unsupervised method is faster in forward path, it requires more trainingtime.Table 5.3: Performance of semi-supervised and unsupervised DNN-basedDL CRAN optimization.Variable semi-DNN Performance DNN Sum-rateW,Q 80%79%V 130%5.3.1 Training ChallengesDuring training and turning, we faced several challenges that will be explained inthis section.Tuning learning rateThe learning rate is considered the most important tuning parameter in the net-work. A high learning rate causes fast convergence to local minima and oscillatinggradient, and a low learning rate lowers the convergence speed considerably. Wetried many different learning rates and used Figure 5.6 to find a proper learning rate58Figure 5.6: Learning rate tuning guide [4][4]. Other than the initial learning rate, which is a crucial factor, we can schedulethe learning rate to decay in specific intervals (e.g., every 10 epochs) or base on aspecific threshold (e.g., train error threshold).Vanishing GradientThe activation function is an essential parameter of the network that should beappropriately chosen. For some activation functions such as Sigmoid, after addingseveral layers to the DNN, the gradients of the loss function approaches zero whichstops the model from learning. Easiest and maybe the most practical way to solvethe vanishing gradient problem is using Rectified Linear Units (ReLU) or LeakyReLu activation functions. Also, batch normalization can help with this issue.Another approach is changing the weight initialization from the usual Glorot ini-tialization to He initialization. It adds to network learning ability and has beenshown that works well with the Relu activation function [67]. We used all threeworkarounds in our model and effectively solved the vanishing gradient problem.59Figure 5.7: Exploding Gradient Illustration. 
The graph on the right-hand side shows the exploding gradient effect and the resulting instabilities in the train and test errors; we were able to limit the gradient value by clipping, as shown in the left graph.

Exploding Gradient
The exploding gradient is another problem, which arises when large gradients result in large steps during the weight and bias updates and destabilize the optimization, as illustrated in Figure 5.7. One common way to prevent gradient explosion is gradient clipping, which sets a limit on the norm of the gradient during training for more stability. We used gradient clipping, as illustrated in Figure 5.7.
Over-fitting
Another common issue in the training phase is over-fitting, which means that the DNN memorizes the training samples. The clearest sign of an over-fitted network is an increase in the validation error while the training error continues to go down. We avoided over-fitting in our training by employing dropout and L2-norm regularization.
Non-homogeneous Loss and Constraints
As mentioned before, in semi-supervised learning, our goal is to minimize the customized loss function formulated in Section 5.2.2. The idea is to penalize the performance rate for predictions outside of the feasible set. The challenge here is that the value of Ω_V and the fronthaul capacity, which is the objective for optimizing V, are not homogeneous, and neither are the value of Ω and the user sum-rate, which is the objective for optimizing Q and W. Tuning η to balance these values is therefore very important during training. For best results, we monitored the sum-rate and penalty values separately during the training phase (as we did for the training and validation loss values) and re-tuned η every couple of epochs according to the loss, performance, and penalty values.
5.3.2 PyTorch Limitations
Despite all the benefits, available libraries, and descriptive documentation of PyTorch, we faced some challenges when working with complex numbers in the PyTorch environment. 
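One generic way to handle complex-valued algebra when only real-valued operations are available is to carry the real and imaginary parts separately and use the identity (A_r + jA_i)(B_r + jB_i) = (A_r B_r - A_i B_i) + j(A_r B_i + A_i B_r). The sketch below illustrates this for a matrix product. It is an assumption about what such a customized function can look like, not the thesis implementation: it uses plain Python lists rather than PyTorch tensors, and the names `real_matmul`, `complex_matmul_split`, and `parts` are hypothetical.

```python
def real_matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def complex_matmul_split(a_re, a_im, b_re, b_im):
    """Real and imaginary parts of (a_re + j*a_im) @ (b_re + j*b_im).

    Only real-valued products are used: four real matrix products
    replace one complex matrix product.
    """
    rr = real_matmul(a_re, b_re)
    ii = real_matmul(a_im, b_im)
    ri = real_matmul(a_re, b_im)
    ir = real_matmul(a_im, b_re)
    c_re = [[p - q for p, q in zip(r1, r2)] for r1, r2 in zip(rr, ii)]
    c_im = [[p + q for p, q in zip(r1, r2)] for r1, r2 in zip(ri, ir)]
    return c_re, c_im


def parts(m):
    """Split a complex matrix into separate real and imaginary matrices."""
    return ([[z.real for z in row] for row in m],
            [[z.imag for z in row] for row in m])


# Arbitrary test matrices (not data from the thesis).
a = [[1 + 2j, 3 - 1j], [0 + 1j, 2 + 0j]]
b = [[2 - 1j, 1 + 1j], [1 + 0j, 0 - 2j]]

c_re, c_im = complex_matmul_split(*parts(a), *parts(b))

# Reference result using Python's built-in complex arithmetic.
reference = real_matmul(a, b)
max_error = max(abs(complex(c_re[i][j], c_im[i][j]) - reference[i][j])
                for i in range(2) for j in range(2))
print(max_error)
```

Every complex product in this form becomes four real products plus two additions, which gives a sense of how quickly such hand-written operations can enlarge a computational graph.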
As we discussed in Section 4.6.1, PyTorch automatically creates a computational graph and calculates the gradient for every operation. However, this holds only if the function we are using has a defined gradient in the PyTorch library. Unfortunately, PyTorch currently offers only limited algebra support for complex numbers [68, 69], and some of the supported operations are limited to CPU usage. This forced us to write many customized functions for basic algebraic operations and restricted us to a limited set of operators. It also affected the computation speed, as customized functions expand the computational graph. Fortunately, PyTorch is working on expanding its complex-number support, which will hopefully provide a more straightforward development environment for working with complex numbers.

Chapter 6
Conclusion and Future Work
To conclude this thesis, we provide a summary of our main motivations, achievements, and contributions in solving the downlink CRAN optimization problem. We also propose possible future research directions that reflect the potential openings in this area.
6.1 Conclusion
In this research, we considered the downlink of a cloud radio access network consisting of a central processor and a network of connected radio units. CRAN offers a promising solution for 5G cellular networks as it enables coordinated beamforming and better interference management in ultra-dense networks, which will be one of the most common scenarios in 5G. To address CRAN's backhaul capacity limitation, we proposed a novel resource allocation solution for the scenario with full-duplex self-backhauling RUs connected through hybrid RF/FSO links to the CP for improved network throughput. We studied the feasibility of IBFD communication in terms of the self-interference cancellation required to outperform the benchmark half-duplex hybrid RF/FSO transmission. 
Since the derived optimization problem for the design of the linear precoders and quantizers subject to the fronthaul capacity, zero-forcing, and power constraints is non-convex and intractable, we developed an algorithm to solve it via an alternating optimization approach. In the simulation results, the proposed hybrid RF/FSO system was assessed in terms of achievable rate, and we highlighted the parameter range for which FD transmission is more rewarding than the time-division approach under different weather conditions and for the selected RF bandwidth.
Notwithstanding the benefits of the proposed solution, which is based on a conventional optimization approach, this method's computational complexity is still a challenge in practical scenarios. That led us to use a data-driven strategy to reduce complexity. We employed two machine learning-based optimization approaches, supervised and unsupervised, to optimize the design variables without solving conventional analytical optimization tasks.
The main achievements of the work in this thesis can be summarized as follows:
• We integrated in-band full-duplex self-backhauling and hybrid RF/FSO backhaul in a CRAN architecture to improve backhaul reliability and facilitate coordinated beamforming for better interference management in 5G. We showed that full-duplex self-backhauling is more efficient than the state-of-the-art half-duplex approach, provided that enough SIC is available.
• To address the complexity crunch in the model-based CRAN design, we proposed two data-driven approaches for reducing the computational complexity. We proposed a supervised and an unsupervised learning-based optimization approach and showed that they have much lower computational complexity than the model-based method at the cost of a modest performance loss.
• The key challenge in data-driven optimization approaches was generating solutions that meet all constraints. 
It required particular DNN architectures, customized activation and loss functions, and post-processing actions. In the case of supervised learning, we imposed constraints via several post-processing steps and proposed a novel DNN architecture to meet the positive semi-definiteness constraint. In unsupervised learning, we proposed two activation functions for the zero-forcing and positive semi-definiteness constraints and presented a piece-wise regularization approach that forces the DNN to generate solutions in the feasible set.
6.2 Future Work
In this section, we discuss possible future research directions.
• Inaccurate channel state information can affect the system's performance and create a gap between theoretical analysis and real implementation [70], [71]. In this research, we assumed that perfect CSI is available at the CP. In future work, we can extend the work to the more realistic case of imperfect CSI. In addition, we can investigate CSI measurement methods and employ advanced ML-based solutions for channel estimation [72]. Also, the channel model of the FSO link can be further enhanced using deep learning frameworks, as proposed in [73].
• Another active area of research in 5G is millimeter wave communication. CRAN at such high frequencies needs to employ massive MIMO to combat propagation loss, which further complicates the optimization problem in terms of computational cost. The DNN approach proposed in this study can potentially also be applied to millimeter wave CRAN.

Bibliography
[1] B. He and R. Schober. Bit-interleaved coded modulation for hybrid RF/FSO systems. IEEE Trans. Commun., 57(12):3753–3763, 2009.
[2] 2008-2020 Compute Canada. Cedar - cc doc.
[3] Dinesh Bharadia, Emily McMilin, and Sachin Katti. Full duplex radios. In Proceedings of the ACM SIGCOMM 2013 conference on SIGCOMM, pages 375–386, 2013.
[4] https://cs231n.github.io/neural-networks-3/.
[5] P. Jonsson et al. 
Ericsson mobility report. Ericsson: Stockholm, Sweden, 2020.
[6] Tony QS Quek, Mugen Peng, Osvaldo Simeone, and Wei Yu. Cloud radio access networks: Principles, technologies, and applications. Cambridge University Press, 2017.
[7] D. Gesbert, S. Hanly, H. Huang, S. Shamai Shitz, O. Simeone, and W. Yu. Multi-cell MIMO cooperative networks: A new look at interference. IEEE Journal on Selected Areas in Communications, 28(9):1380–1408, 2010.
[8] A. Mostafa and L. Lampe. Downlink optimization in cloud radio access networks with hybrid RF/FSO fronthaul. In IEEE Globecom Workshops (GC Wkshps), pages 1–7, 2018.
[9] J. Lee, Y. Kim, H. Lee, B. L. Ng, D. Mazzarese, J. Liu, W. Xiao, and Y. Zhou. Coordinated multipoint transmission and reception in LTE-Advanced systems. IEEE Communications Magazine, 50(11):44–50, 2012.
[10] W. Lee, I. Lee, J. Sam Kwak, B. Ihm, and S. Han. Multi-BS MIMO cooperation: challenges and practical solutions in 4G systems. IEEE Wireless Communications, 19(1):89–96, 2012.
[11] S. Chia, M. Gasparroni, and P. Brick. The next challenge for cellular networks: backhaul. IEEE Microwave Magazine, 10(5):54–66, 2009.
[12] B. Dai and W. Yu. Sparse beamforming and user-centric clustering for downlink cloud radio access network. IEEE Access, 2:1326–1339, 2014.
[13] S. Park, O. Simeone, O. Sahin, and S. Shamai. Joint precoding and multivariate backhaul compression for the downlink of cloud radio access networks. IEEE Transactions on Signal Processing, 61(22):5646–5658, 2013.
[14] P. Patil and W. Yu. Hybrid compression and message-sharing strategy for the downlink cloud radio-access network. In 2014 Information Theory and Applications Workshop (ITA), pages 1–6, 2014.
[15] P. Patil, B. Dai, and W. Yu. Performance comparison of data-sharing and compression strategies for cloud radio access networks. 
European Signal Processing Conference, pages 2456–2460, 2015.
[16] F. Demers, H. Yanikomeroglu, and M. St-Hilaire. A survey of opportunities for free space optics in next generation cellular networks. In 2011 Ninth Annual Communication Networks and Services Research Conference, pages 210–216, 2011.
[17] S. Kazemlou, S. Hranilovic, and S. Kumar. All-optical multihop free-space optical communication systems. Journal of Lightwave Technology, 29(18):2663–2669, 2011.
[18] M. Z. Chowdhury, M. K. Hasan, M. Shahjalal, M. T. Hossan, and Y. M. Jang. Optical wireless hybrid networks: Trends, opportunities, challenges, and research directions. IEEE Communications Surveys & Tutorials, 22(2):930–966, 2020.
[19] D. Kim, H. Lee, and D. Hong. A survey of in-band full-duplex transmission: From the perspective of PHY and MAC layers. IEEE Communications Surveys & Tutorials, 17(4):2017–2046, 2015.
[20] S. Hong, J. Brand, J. I. Choi, M. Jain, J. Mehlman, S. Katti, and P. Levis. Applications of self-interference cancellation in 5G and beyond. IEEE Communications Magazine, 52(2):114–121, 2014.
[21] M. A. Khalighi and M. Uysal. Survey on free space optical communication: A communication theory perspective. IEEE Communications Surveys & Tutorials, 16(4):2231–2258, 2014.
[22] Isaac I Kim and Eric J Korevaar. Availability of free-space optics (FSO) and hybrid FSO/RF systems. In Optical Wireless Communications IV, volume 4530, pages 84–95. International Society for Optics and Photonics, 2001.
[23] A. S. Acampora and S. V. Krishnamurthy. A broadband wireless access network based on mesh-connected free-space optical links. IEEE Personal Communications, 6(5):62–65, 1999.
[24] H. Lei, Z. Dai, K. Park, W. Lei, G. Pan, and M. Alouini. Secrecy outage analysis of mixed RF-FSO downlink SWIPT systems. IEEE Transactions on Communications, 66(12):6384–6395, 2018.
[25] A. Sabharwal, P. Schniter, D. Guo, D. W. Bliss, S. Rangarajan, and R. 
Wichman. In-band full-duplex wireless: Challenges and opportunities. IEEE Journal on Selected Areas in Communications, 32(9):1637–1652, 2014.
[26] Z. Zhang, K. Long, A. V. Vasilakos, and L. Hanzo. Full-duplex wireless communications: Challenges, solutions, and future research directions. Proceedings of the IEEE, 104(7):1369–1409, 2016.
[27] L. Zhang, M. Xiao, G. Wu, M. Alam, Y. Liang, and S. Li. A survey of advanced techniques for spectrum sharing in 5G networks. IEEE Wireless Commun., 24(5):44–51, 2017.
[28] Dinesh Bharadia and Sachin Katti. Full duplex MIMO radios. In Symposium on Networked Systems Design and Implementation, pages 359–372, 2014.
[29] R. Pitaval, O. Tirkkonen, R. Wichman, K. Pajukoski, E. Lahetkangas, and E. Tiirola. Full-duplex self-backhauling for small-cell 5G networks. IEEE Wireless Commun., 22(5):83–89, 2015.
[30] T. Chen, M. Baharaani, J. Zhou, H. Krishnaswamy, and G. Zussman. Wideband full-duplex wireless via frequency-domain equalization: Design and experimentation. In Annual International Conference on Mobile Computing and Networking, pages 1–16, 2019.
[31] D. Korpi, T. Riihonen, A. Sabharwal, and M. Valkama. Transmit power optimization and feasibility analysis of self-backhauling full-duplex radio access systems. IEEE Transactions on Wireless Communications, 17(6):4219–4236, 2018.
[32] A. Zappone, M. Di Renzo, M. Debbah, T. T. Lam, and X. Qian. Model-aided wireless artificial intelligence: Embedding expert knowledge in deep neural networks for wireless system optimization. IEEE Vehicular Technology Magazine, 14(3):60–69, 2019.
[33] H. Ye, G. Y. Li, and B. Juang. Power of deep learning for channel estimation and signal detection in OFDM systems. IEEE Wireless Communications Letters, 7(1):114–117, 2018.
[34] S. P. Herath and T. Le-Ngoc. Sum-rate performance and impact of self-interference cancellation on full-duplex wireless systems. 
In IEEE Annual International Symposium on Personal, Indoor, and Mobile Radio Communications (PIMRC), pages 881–885, 2013.
[35] A. Goldsmith. Wireless communications. Cambridge University Press, 2005.
[36] W. Zhang, S. Hranilovic, and C. Shi. Soft-switching hybrid FSO/RF links using short-length Raptor codes: Design and implementation. IEEE Journal on Selected Areas in Communications, 27(9):1698–1708, 2009.
[37] A. Molisch. Wireless communications, volume 34. John Wiley & Sons, 2012.
[38] H. Tabassum, A. H. Sakr, and E. Hossain. Analysis of massive MIMO-enabled downlink wireless backhauling for full-duplex small cells. IEEE Transactions on Communications, 64(6):2354–2369, 2016.
[39] M. Duarte, C. Dick, and A. Sabharwal. Experiment-driven characterization of full-duplex wireless systems. IEEE Transactions on Wireless Communications, 11(12):4296–4307, 2012.
[40] M. Jain, J. Choi, T. Kim, D. Bharadia, S. Seth, K. Srinivasan, PH. Levis, S. Katti, and P. Sinha. Practical, real-time, full duplex wireless. In Annual International Conference on Mobile Computing and Networking, pages 301–312, 2011.
[41] Q. H. Spencer, A. L. Swindlehurst, and M. Haardt. Zero-forcing methods for downlink spatial multiplexing in multiuser MIMO channels. IEEE Transactions on Signal Processing, 52(2):461–471, 2004.
[42] Q. Shi, M. Razaviyayn, Z. Luo, and C. He. An iteratively weighted MMSE approach to distributed sum-utility maximization for a MIMO interfering broadcast channel. IEEE Transactions on Signal Processing, 59(9):4331–4340, 2011.
[43] K. Akcapinar and O. Gurbuz. Full-duplex bidirectional communication under self-interference. In International Conference on Telecommunications (ConTEL), pages 1–7, 2015.
[44] Z. Luo, W. Ma, A. M. So, Y. Ye, and S. Zhang. Semidefinite relaxation of quadratic optimization problems. IEEE Signal Processing Magazine, 27(3):20–34, 2010. 
[45] Ke Li and Jitendra Malik. Learning to optimize. arXiv preprint arXiv:1606.01885, 2016.
[46] Q. Cappart, T. Moisan, L. Rousseau, I. Prémont-Schwarz, and A. Cire. Combining reinforcement learning and constraint programming for combinatorial optimization. arXiv preprint arXiv:2006.01610, 2020.
[47] V. Miagkikh and W. Punch. An approach to solving combinatorial optimization problems using a population of reinforcement learning agents. In Proceedings of the 1st Annual Conference on Genetic and Evolutionary Computation-Volume 2, pages 1358–1365, 1999.
[48] G. C. Sobabe, Y. Song, X. Bai, and B. Guo. A cooperative spectrum sensing algorithm based on unsupervised learning. In 2017 10th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), pages 1–6, 2017.
[49] W. Song, F. Zeng, J. Hu, Z. Wang, and X. Mao. An unsupervised-learning-based method for multi-hop wireless broadcast relay selection in urban vehicular networks. In 2017 IEEE 85th Vehicular Technology Conference (VTC Spring), pages 1–5, 2017.
[50] M. S. Parwez, D. B. Rawat, and M. Garuba. Big data analytics for user-activity analysis and user-anomaly detection in mobile wireless network. IEEE Transactions on Industrial Informatics, 13(4):2058–2065, 2017.
[51] L. Wang and S. Cheng. Data-driven resource management for ultra-dense small cells: An affinity propagation clustering approach. IEEE Transactions on Network Science and Engineering, 6(3):267–279, 2019.
[52] W. Cui, K. Shen, and W. Yu. Spatial deep learning for wireless scheduling. IEEE Journal on Selected Areas in Communications, 37(6):1248–1261, June 2019.
[53] H. Huang, W. Xia, J. Xiong, J. Yang, G. Zheng, and X. Zhu. Unsupervised learning-based fast beamforming design for downlink MIMO. IEEE Access, 7:7599–7605, 2019.
[54] Y. Yang, F. Gao, X. Ma, and S. Zhang. 
Deep learning-based channel estimation for doubly selective fading channels. IEEE Access, 7:36579–36589, 2019.
[55] C. Chun, J. Kang, and I. Kim. Deep learning-based channel estimation for massive MIMO systems. IEEE Wireless Communications Letters, 8(4):1228–1231, 2019.
[56] M. Schmidt. CPSC 340 and 532M - machine learning and data mining (fall 2019).
[57] MATLAB. Execute for-loop iterations in parallel on workers - MATLAB parfor.
[58] Junbeom Kim, Hoon Lee, Seung-Eun Hong, and Seok-Hwan Park. Deep learning methods for universal MISO beamforming. IEEE Wireless Communications Letters, 2020.
[59] Mehran Soltani, Vahid Pourahmadi, Ali Mirzaei, and Hamid Sheikhzadeh. Deep learning-based channel estimation. IEEE Communications Letters, 23(4):652–655, 2019.
[60] Foad Sohrabi, Kareem M Attiah, and Wei Yu. Deep learning for distributed channel feedback and multiuser precoding in FDD massive MIMO. arXiv preprint arXiv:2007.06512, 2020.
[61] C. Trabelsi, O. Bilaniuk, Y. Zhang, D. Serdyuk, S. Subramanian, J. Felipe Santos, S. Mehri, N. Rostamzadeh, Y. Bengio, and C. J Pal. Deep complex networks. arXiv preprint arXiv:1705.09792.
[62] NumPy®. https://numpy.org/doc/stable/reference/generated/numpy.reshape.html.
[63] T. Tieleman and G. Hinton. Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural networks for machine learning, 4(2):26–31, 2012.
[64] J. Johnson, A. Alahi, and L. Fei-Fei. Perceptual losses for real-time style transfer and super-resolution. In European Conference on Computer Vision, pages 694–711. Springer, 2016.
[65] H. Sun, X. Chen, Q. Shi, M. Hong, X. Fu, and N. D. Sidiropoulos. Learning to optimize: Training deep neural networks for interference management. IEEE Transactions on Signal Processing, 66(20):5438–5453, 2018.
[66] H. Sun, X. Chen, Q. Shi, M. Hong, X. Fu, and N. Sidiropoulos. 
Learning to optimize: Training deep neural networks for wireless resource management. In 2017 IEEE 18th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), pages 1–6. IEEE, 2017.
[67] S. Kumar. On weight initialization in deep neural networks. arXiv preprint arXiv:1704.08863, 2017.
[68] Torch Contributors. Copyright 2019. torch.utils.data - pytorch 1.7.0 documentation.
[69] PyTorch. Complex numbers - pytorch 1.7.0 documentation.
[70] A. Tajer, N. Prasad, and X. Wang. Robust linear precoder design for multi-cell downlink transmission. IEEE Transactions on Signal Processing, 59(1):235–251, 2011.
[71] R. Apelfröjd and M. Sternad. Robust linear precoder for coordinated multipoint joint transmission under limited backhaul with imperfect CSI. In 2014 11th International Symposium on Wireless Communications Systems (ISWCS), pages 138–143, 2014.
[72] P. Dong, H. Zhang, G. Ye Li, I. Simões Gaspar, and N. NaderiAlizadeh. Deep CNN-based channel estimation for mmWave massive MIMO systems. IEEE Journal of Selected Topics in Signal Processing, 13(5):989–1000, 2019.
[73] H. Lee, S. Hyun Lee, T. QS Quek, and I Lee. Deep learning framework for wireless systems: Applications to optical wireless communications. IEEE Communications Magazine, 57(3):35–41, 2019.
Item Metadata
Title | Machine learning-assisted CRAN design with hybrid RF/FSO and full-duplex self-backhauling |
Creator |
Bayati, Seyedrazieh |
Publisher | University of British Columbia |
Date Issued | 2020 |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2020-12-02 |
Provider | Vancouver : University of British Columbia Library |
Rights | Attribution-NonCommercial-NoDerivatives 4.0 International |
DOI | 10.14288/1.0395114 |
URI | http://hdl.handle.net/2429/76630 |
Degree |
Master of Applied Science - MASc |
Program |
Electrical and Computer Engineering |
Affiliation |
Applied Science, Faculty of Electrical and Computer Engineering, Department of |
Degree Grantor | University of British Columbia |
GraduationDate | 2021-05 |
Campus |
UBCV |
Scholarly Level | Graduate |
Rights URI | http://creativecommons.org/licenses/by-nc-nd/4.0/ |
AggregatedSourceRepository | DSpace |