Abstract: Beamforming is important for millimeter-wave (mmWave) multi-user massive multiple-input multiple-output systems, but the overhead of acquiring channel state information and performing beam training is considerable, especially in dynamic environments. To reduce this overhead, we propose a multi-user beam tracking algorithm based on a distributed deep Q-learning method. By learning users’ moving trajectories online, the proposed algorithm learns to scan a beam subspace to maximize the average effective sum-rate. For practical implementation, we model the continuous beam tracking problem as a non-Markov decision process and accordingly develop a simplified deep Q-learning training scheme that reduces the training complexity. Furthermore, we propose a scalable state-action-reward design for scenarios with different numbers of users and antennas. Simulation results verify the effectiveness of the proposed method.
Keywords: multi-agent deep Q-learning; centralized training and distributed execution; mmWave communication; beam tracking; scalability