Two-stage reinforcement learning for MIMO-NOMA with hard-latency constraints

Luyuan Zhang, An Liu, Xiaoxia Xu, Xidong Mu, Yuanwei Liu

Research output: Contribution to journal › Article › peer-review


Abstract

A novel hard-latency guaranteed cluster-free multiple-input multiple-output non-orthogonal multiple access (MIMO-NOMA) framework is proposed to handle the bursty traffic that commonly occurs in real-world scenarios. The hard-latency constrained effective throughput (HLC-ET) maximization problem is formulated, which jointly optimizes the beamforming and cluster-free successive interference cancellation (SIC) operations. To address the resultant problem, a two-stage reinforcement learning (RL)-based algorithm is developed to capture system uncertainty, where the large-dimension optimization is decoupled into two stages to reduce the action space and accelerate the convergence of RL. In the long-term stage, we aim to maximize the HLC-ET, and a hybrid RL algorithm with policy reuse is adopted to control the priority weights that construct the weighted sum rate (WSR) function of the users. In the short-term stage, a branch-and-bound (BB)-based algorithm is developed to obtain the optimal solution of the WSR maximization problem. The BB-based algorithm is proved to converge to an ϵ-optimal solution of the WSR maximization problem within a finite number of steps. To accelerate computation in the short-term stage, a channel correlation-based two-loop greedy (CC-TLG) algorithm is proposed, which significantly reduces the complexity with almost no performance loss compared to the BB-based algorithm. Finally, simulations demonstrate the advantages of the proposed two-stage RL-based joint beamforming and SIC optimization (TSRL-JBSO) algorithm over conventional RL-based and non-RL-based algorithms.
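The link between the two stages is the weighted sum rate: the long-term stage supplies per-user priority weights, and the short-term stage maximizes the resulting objective. The following is a minimal Python sketch of that objective only, not the authors' implementation. It assumes the standard form WSR = sum_k w_k log2(1 + SINR_k) and takes the per-user SINRs as given, so beamforming and SIC ordering are not modelled. The function `deadline_weights` is a hypothetical hand-written rule standing in for the learned hybrid RL policy, and all names are illustrative.

```python
import math


def weighted_sum_rate(weights, sinrs):
    """Return sum_k w_k * log2(1 + SINR_k) in bit/s/Hz.

    SINRs are linear (not dB) and taken as given; in the paper they
    would follow from the beamformers and the SIC operations.
    """
    if len(weights) != len(sinrs):
        raise ValueError("weights and sinrs must have the same length")
    return sum(w * math.log2(1.0 + s) for w, s in zip(weights, sinrs))


def deadline_weights(queue_bits, slots_left):
    """Hypothetical priority rule (assumption, not the learned policy).

    Each user's raw priority is its backlog divided by the number of
    slots left before its deadline; priorities are normalised to sum
    to one. Falls back to uniform weights when all queues are empty.
    """
    raw = [q / max(t, 1) for q, t in zip(queue_bits, slots_left)]
    total = sum(raw)
    if total == 0:
        return [1.0 / len(raw)] * len(raw)
    return [r / total for r in raw]


if __name__ == "__main__":
    # Two users: user 0 has a large backlog and a near deadline.
    w = deadline_weights(queue_bits=[4000, 1000], slots_left=[2, 5])
    print("weights:", w)
    print("WSR:", weighted_sum_rate(w, sinrs=[1.0, 3.0]))
```

Raising a user's weight makes its rate count for more in the short-term objective, which is how a long-term policy can steer resources toward users whose packets are close to their deadlines.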
Original language: English
Journal: IEEE Transactions on Communications
Early online date: 05 Jun 2025
DOIs
Publication status: Early online date - 05 Jun 2025

Bibliographical note

Publisher Copyright:
© 1972-2012 IEEE.

Publications and Copyright Policy

This work is licensed under Queen’s Research Publications and Copyright Policy.

Keywords

  • Beamforming
  • hard latency
  • non-orthogonal multiple access (NOMA)
  • reinforcement learning

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
