| Cite this article: | QIU Peihuan, NI Weilin, WU Zhigang, LIANG Haizhao. Intelligent cooperative maneuver decision-making approach for vehicles under strong information constraints[J]. Journal of Harbin Institute of Technology, 2026, 58(4): 11. DOI:10.11918/202503020 |
|
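| Abstract: |
| For a hypersonic vehicle to escape an interceptor in a “target-interceptor-defender” multi-role game scenario, it must execute a cooperative maneuver strategy together with a defender vehicle. Because of the limits of its detection equipment, however, the hypersonic vehicle has to make cooperative maneuver decisions under strong information constraints, where the available information is imperfect, incomplete, and intermittent. To this end, an end-to-end cooperative maneuver decision-making method built on a multi-agent deep reinforcement learning algorithm is proposed, which allows the hypersonic vehicle to maneuver cooperatively under strong information constraints and thereby escape successfully. First, the scenario is modeled as a decentralized partially observable Markov decision process, and an observation-sharing and stacking mechanism is proposed for designing the local observation state space subject to the strong information constraints. Second, to address the sparse reward problem in multi-agent reinforcement learning, a multi-agent cooperative decision-making reward function is constructed that combines the game relationships with the zero-effort miss distance, improving the training efficiency of the multi-agent system in complex game scenarios. Finally, a multi-agent cooperative decision-making network architecture is designed, consisting of base agent networks and a top-level value decomposition network; it extracts the spatio-temporal trajectory features of the vehicles from imperfect, incomplete, and intermittent information, achieving policy coordination within the agent system and cooperative maneuver decision-making for the vehicles. The results show that a hypersonic vehicle equipped with the proposed method can escape successfully in multi-role game scenarios under strong information constraints, and that it exhibits excellent effectiveness and robustness in numerical simulations, including typical game scenarios and Monte Carlo tests. |
| Keywords: cooperative maneuver decision-making; hypersonic vehicle; reinforcement learning; partially observable problem; multi-agent |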
| DOI:10.11918/202503020 |
| CLC number: V11 |
| Document code: A |
| Funding: National Natural Science Foundation of China (62388101) |
|
| Intelligent cooperative maneuver decision-making approach for vehicles under strong information constraints |
|
QIU Peihuan, NI Weilin, WU Zhigang, LIANG Haizhao
|
|
(School of Aeronautics and Astronautics, Sun Yat-sen University, Shenzhen 518107, Guangdong, China)
|
| Abstract: |
| To evade an interceptor in a multi-role “target-interceptor-defender” game scenario, a hypersonic vehicle must execute a cooperative maneuver strategy with a defender. However, owing to the limitations of its detection devices, the hypersonic vehicle must make cooperative maneuver decisions under strong information constraints, in which the available information is imperfect, incomplete, and intermittent. To address this problem, this paper proposes an end-to-end cooperative maneuver decision-making approach based on a multi-agent deep reinforcement learning algorithm, enabling hypersonic vehicles to make cooperative maneuver decisions under strong information constraints and evade successfully. First, the scenario is modeled as a decentralized partially observable Markov decision process, and an observation-sharing and stacking mechanism is proposed for designing the local observation state spaces under the strong information constraints. Second, to address the sparse reward problem in multi-agent deep reinforcement learning, a cooperative decision-making reward function is constructed by combining the game relationships with the zero-effort miss distance, which improves training efficiency in complex game scenarios. Finally, a multi-agent cooperative decision-making network architecture is designed, comprising base agent networks and a top-level value decomposition network. This architecture extracts spatio-temporal trajectory features from imperfect, incomplete, and intermittent information, enabling policy coordination among agents and cooperative maneuver decision-making for the vehicles. The results demonstrate that hypersonic vehicles equipped with the proposed approach can evade successfully in multi-role game scenarios under strong information constraints. The approach shows strong performance and robustness in numerical simulations, including typical game scenarios and Monte Carlo tests. |
| Key words: cooperative maneuver decision-making; hypersonic vehicle; reinforcement learning; partially observable problem; multi-agent |
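
The reward function described in the abstract builds on the zero-effort miss distance. As a minimal sketch of that quantity only, the Python below uses the standard constant-velocity definition: if neither vehicle accelerates from now on, the miss distance is the separation at the point of closest approach. The function name `zero_effort_miss` and its arguments are illustrative assumptions, not taken from the paper, and the paper's actual reward terms and weights are not reproduced here.

```python
import math


def zero_effort_miss(rel_pos, rel_vel):
    """Return (miss_distance, time_to_go) for constant relative velocity.

    rel_pos: relative position of one vehicle with respect to the other.
    rel_vel: relative velocity, same frame and same dimension as rel_pos.
    Assumption: neither vehicle applies any further control ("zero effort").
    """
    speed_sq = sum(v * v for v in rel_vel)
    if speed_sq == 0.0:
        # No relative motion: the separation never changes.
        return math.sqrt(sum(p * p for p in rel_pos)), 0.0

    # Time of closest approach; clamp to zero if the vehicles are separating.
    closing = -sum(p * v for p, v in zip(rel_pos, rel_vel))
    t_go = max(0.0, closing / speed_sq)

    # Relative position propagated to the time of closest approach.
    miss_vec = [p + v * t_go for p, v in zip(rel_pos, rel_vel)]
    return math.sqrt(sum(m * m for m in miss_vec)), t_go


if __name__ == "__main__":
    # Head-on closing at 200 m/s with a 100 m lateral offset.
    miss, t_go = zero_effort_miss((1000.0, 100.0), (-200.0, 0.0))
    print(miss, t_go)  # 100.0 5.0
```

Because this quantity is defined at every time step, not just at the end of an engagement, it is a natural ingredient for a dense shaping term, which is one plausible reading of how it helps with the sparse reward problem mentioned above.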