Over the past few years, deep reinforcement learning (RL) has made remarkable progress in a range of applications, including the game of Go, vision-based control, and generative dialogue systems. Through trial and error, deep RL enables data-driven optimization and sequential decision-making in uncertain environments. Compared to traditional programming or heuristic optimization methods, deep RL can elegantly balance exploration and exploitation and handle environmental uncertainties. As a result, this learning paradigm has attracted increasing attention from both academia and industry and is paving a new path toward large-scale, complex decision-making applications.
However, when scaling deep RL to more practical scenarios, several challenges inevitably arise that demand urgent attention from the field. First, the dominant model-free deep RL methods are sample-inefficient, as the learning process requires massive interaction with the real environment. This makes it unrealistic to deploy deep RL algorithms in applications where sampling is expensive, such as robotics. Second, the learned policy is sensitive to changes in the environment and can suffer catastrophic failures when deployed in a new or unseen environment. In time-sensitive applications such as autonomous driving, the ability to adapt quickly to new scenes is crucial. Third, in complicated scenarios, the true state of the Markov decision process may be inaccessible, and multiple objectives may have to be optimized simultaneously. In such cases, earlier deep RL algorithms designed for a single agent with fully observable states cannot achieve the desired goal. These concerns weaken the scalability of deep RL, and systematic scientific investigation is necessary to meet realistic requirements.
This special issue aims to overcome the aforementioned challenges and collects innovative works that apply deep RL to more realistic scenarios.
The call for papers of this special issue inspired wide interest and attracted numerous high-quality submissions. After two rounds of peer review, we selected four papers that tackle the deep RL problems of interest for publication in this special issue. The topics of these articles include offline reinforcement learning, meta reinforcement learning, and multi-agent reinforcement learning. In terms of applications, they range from electroencephalogram (EEG) brain-machine interfaces and grid power management to automatic radar detection, all empowered by deep RL.
The first paper, titled “Deep Double-Q Network Decoder Based on EEG Brain-Machine Interface,” focuses on EEG signal processing. Specifically, this work adopts a deep double-Q network to decode EEG signals. Compared with previous pattern recognition or signal processing methods, deep RL makes the signal decoding module of a brain-machine interface more adaptable to changing environments. The experimental results demonstrate that deep RL enables more precise decoding of EEG signals.
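To make the underlying mechanism concrete, recall the standard double DQN target, which decouples action selection from action evaluation to mitigate the overestimation bias of vanilla Q-learning; we sketch only the generic form here, and the paper's decoder may employ a task-specific variant:

y_t = r_t + \gamma \, Q_{\theta^-}\big(s_{t+1}, \arg\max_{a} Q_{\theta}(s_{t+1}, a)\big),

where the online network Q_{\theta} selects the greedy action and the slowly updated target network Q_{\theta^-} evaluates it.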
The second paper, titled “Multi-Agent Hierarchical Graph Attention Reinforcement Learning for Grid-Aware Energy Management,” considers the grid power management problem. Technically, environmental uncertainty is involved in decision-making, and multiple objectives guide the optimization process. Multi-agent reinforcement learning is therefore introduced to improve search efficiency in the large state space, exploit the topological structure of the task, and enhance cooperation among agents at multiple levels. The proposed multi-agent hierarchical graph attention reinforcement learning method manages grid energy well and significantly reduces the number of voltage violations.
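For readers unfamiliar with graph attention, a standard graph attention layer (we show the generic formulation; the paper's hierarchical variant is more elaborate) weights each neighbor j of node i by a learned attention coefficient before aggregation:

\alpha_{ij} = \mathrm{softmax}_{j}\big(\mathrm{LeakyReLU}\big(\mathbf{a}^{\top}[\mathbf{W}h_i \,\Vert\, \mathbf{W}h_j]\big)\big), \qquad h_i' = \sigma\Big(\sum_{j \in \mathcal{N}(i)} \alpha_{ij} \mathbf{W} h_j\Big),

which allows agents to attend to the most relevant parts of the grid topology when exchanging information.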
The third paper, titled “A Practical Reinforcement Learning Framework for Automatic Radar Detection,” studies radar detection in a data-driven way. Noting that manual adjustment in radar detection is expensive in both time and money, this work proposes to automate radar detection by combining offline RL and meta RL. The developed method reduces real-world interaction complexity and enables fast adaptation to new environments. Empirical results indicate the high efficiency of RL-based radar detection.
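As a rough illustration of how such fast adaptation can be formalized (purely a generic sketch; the paper may adopt a different meta RL formulation), a MAML-style meta-objective optimizes initial parameters \theta so that a single gradient step on a new task T_i already performs well:

\theta_i' = \theta - \alpha \nabla_{\theta} \mathcal{L}_{T_i}(\theta), \qquad \min_{\theta} \sum_{T_i} \mathcal{L}_{T_i}(\theta_i'),

so that, after meta-training, only a handful of interactions with a new environment are needed to specialize the policy.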
The fourth paper, titled “Boundary Data Augmentation for Offline Reinforcement Learning,” investigates a fundamental issue in offline RL. In principle, offline RL can boost data efficiency. However, distribution shift leads to unreliable value estimation, which makes it difficult to deploy offline RL algorithms in risk-sensitive scenarios. To address these concerns, this work proposes using generative adversarial networks (GANs) to augment the dataset and calibrate the confidence of value estimation. The experimental results show the great potential of generative modeling for improving offline RL performance.
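For context, the generic GAN objective pits a generator G against a discriminator D (the paper adapts this machinery to augment the offline dataset with boundary samples):

\min_{G} \max_{D} \; \mathbb{E}_{x \sim p_{\mathrm{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))],

where samples from the trained generator enlarge the support of the dataset, which is the intuition behind using generative models to calibrate value estimates on out-of-distribution actions.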
In summary, we hope this special issue will accelerate the scientific investigation of applicable RL in more general decision-making and optimization scenarios. The articles collected here are not only innovative but also provide valuable experimental evidence and practical experience. These contributions bring new insights into algorithm design, bottleneck circumvention, and real-world deployment of deep RL, and they will facilitate the further development of the field. Last but not least, we sincerely thank all authors, reviewers, and the editorial board, whose efforts made this special issue a success.