UBC Theses and Dissertations

Deep reinforcement learning for resource allocation in beyond 5G systems
Huang, Rui


With the rapid development of wireless network-enabled applications, beyond fifth generation (B5G) wireless systems are required to support a large number of mobile and Internet of Things (IoT) devices. Moreover, the growing demand for applications with high data rate requirements, including virtual reality (VR), brings new challenges to B5G wireless systems. While several emerging physical layer and medium access control techniques, including grant-free multiple access (GFMA), intelligent reflecting surface (IRS), and rate-splitting (RS), introduce additional degrees of freedom (DoF) to B5G wireless systems, novel resource allocation algorithms are required to fully exploit their potential. In this thesis, we propose deep reinforcement learning (DRL)-based algorithms to efficiently optimize the DoF and improve the performance of B5G wireless systems. First, we propose a distributed pilot sequence selection scheme for GFMA systems. The proposed scheme maximizes the aggregate throughput by mitigating pilot sequence selection collisions. In the proposed scheme, a distributed pilot sequence selection policy is obtained by using a multiagent DRL technique. Second, we propose a joint user scheduling, phase shift control, and beamforming optimization algorithm for IRS-aided systems. We formulate a joint optimization problem for maximizing the aggregate throughput and achieving proportional fairness in IRS-aided systems. The proposed algorithm exploits neural combinatorial optimization (NCO) to determine user scheduling, and uses curriculum learning (CL) and deep deterministic policy gradient (DDPG) to optimize the beamforming vectors and IRS phase shifts. Third, we propose a novel IRS-aided RS VR streaming system. We formulate an optimization problem for maximizing the achievable bitrate of the 360-degree video subject to the quality of experience (QoE) constraints of the users. We propose a deep deterministic policy gradient with imitation learning (Deep-GRAIL) algorithm, in which we leverage DRL and human expert knowledge to optimize the IRS phase shifts, RS parameters, beamforming vectors, and bitrate selection of 360-degree videos. Simulation results show that the proposed DRL-based algorithms improve the performance of B5G wireless systems by efficiently optimizing the DoF. Our results also demonstrate the effectiveness of augmenting DRL techniques with human expert knowledge of wireless systems.
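To give a feel for the pilot sequence collisions mentioned in the abstract, the following is a minimal sketch of the uncoordinated baseline in which every user picks a pilot uniformly at random. It is not the multiagent DRL scheme proposed in the thesis; the function names, the user and pilot counts, and the simplifying assumption that a user succeeds only if no other user chose the same pilot are all illustrative assumptions.

```python
import random


def collision_free_fraction(num_users, num_pilots, trials=10000, seed=0):
    """Monte Carlo estimate of the fraction of users whose pilot is unique
    when each user independently picks a pilot uniformly at random.
    (Hypothetical baseline, not the scheme from the thesis.)"""
    rng = random.Random(seed)
    successes = 0
    for _ in range(trials):
        picks = [rng.randrange(num_pilots) for _ in range(num_users)]
        # Count how many users selected each pilot in this trial.
        counts = {}
        for p in picks:
            counts[p] = counts.get(p, 0) + 1
        # A user is collision-free if it is the only one on its pilot.
        successes += sum(1 for p in picks if counts[p] == 1)
    return successes / (trials * num_users)


def analytic_collision_free(num_users, num_pilots):
    """Closed form for the same quantity: each of the other users must
    avoid the pilot chosen by the tagged user."""
    return (1 - 1 / num_pilots) ** (num_users - 1)


if __name__ == "__main__":
    for users in (4, 8, 16):
        sim = collision_free_fraction(users, 16)
        exact = analytic_collision_free(users, 16)
        print(f"users={users:2d} simulated={sim:.3f} analytic={exact:.3f}")
```

Under these assumptions the collision-free fraction falls as more users contend for the same pool of pilots, which is the kind of loss a learned, coordinated selection policy would aim to reduce.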

Attribution-NonCommercial-NoDerivatives 4.0 International