| Cite this article: | LI Shiyu, YUAN Jie, XIE Linwei, GUO Xu, ZHANG Ningning. Local wind information-inspired AVW-PPO indoor odor source localization algorithm[J]. Journal of Harbin Institute of Technology, 2025, 57(8): 57. DOI:10.11918/202410030 |
|
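| Abstract: |
| To address the low efficiency and insufficient success rate of odor source localization (OSL) in complex, dynamic indoor plume environments, and in particular the difficulty robots have in perceiving the environment accurately and navigating effectively under turbulent conditions, an auxiliary value and wind-guided proximal policy optimization (AVW-PPO) algorithm based on deep reinforcement learning is proposed. First, an auxiliary value network is added to the original PPO algorithm to reduce the estimation bias of a single value network, which improves the stability of policy updates and the prediction accuracy. Second, a wind-guided strategy is designed that incorporates local wind field information into the state space and the reward function of the reinforcement learning framework, so that the robot perceives dynamic changes in the plume environment more keenly and optimizes its decision path, effectively improving the efficiency of odor source localization. Finally, a gas diffusion model in a two-dimensional environment is constructed, and the proposed algorithm is tested under three different turbulence conditions. The results show that, under the same environmental conditions, the AVW-PPO algorithm outperforms other comparable algorithms in both average number of search steps and success rate, with a localization success rate above 99%. The wind-guided strategy is especially effective at improving search efficiency and helps reduce the time the robot needs to complete the task. This study provides a new approach to odor source localization in complex turbulent indoor environments. |
| Key words: odor source localization; deep reinforcement learning; proximal policy optimization (PPO); auxiliary value network; wind-guided strategy |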
| DOI:10.11918/202410030 |
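| Classification number: TP242.6 |
| Document code: A |
| Funding: National Natural Science Foundation of China (62263031); Natural Science Foundation of Xinjiang Uygur Autonomous Region (2022D01C53) |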
|
| Local wind information-inspired AVW-PPO indoor odor source localization algorithm |
|
LI Shiyu¹, YUAN Jie², XIE Linwei¹, GUO Xu¹, ZHANG Ningning¹
|
|
(1. School of Electrical Engineering, Xinjiang University, Urumqi 830017, China; 2. School of Intelligence Science and Technology, Xinjiang University, Urumqi 830017, China)
|
| Abstract: |
| To address the challenges of low efficiency and insufficient success rates in odor source localization (OSL) within complex and dynamic indoor plume environments, particularly where robots struggle to accurately perceive the environment and navigate effectively under turbulent conditions, this paper proposes an auxiliary value and wind-guided proximal policy optimization (AVW-PPO) algorithm based on deep reinforcement learning. First, an auxiliary value network is introduced into the original PPO framework to reduce the estimation bias of a single value network, thereby improving prediction accuracy and stabilizing policy updates. Next, a wind-guided strategy is designed to integrate local wind field information into the state space and reward function of the reinforcement learning framework, enabling the robot to better perceive dynamic changes in the plume environment and optimize its decision-making path, thus significantly improving the efficiency of odor source localization. Finally, a gas diffusion model in a two-dimensional environment is constructed to test the proposed algorithm under three different turbulence conditions. Experimental results demonstrate that, under identical environmental conditions, the AVW-PPO algorithm outperforms other comparable algorithms in terms of average search steps and success rates, achieving a localization success rate of over 99%. Notably, the wind-guided strategy significantly boosts search efficiency, helping to reduce the time required for the robot to complete tasks. This study provides new insights and methodologies for addressing odor source localization problems in complex turbulent indoor environments. |
| Key words: odor source localization; deep reinforcement learning; proximal policy optimization (PPO); auxiliary value network; wind-guided strategy |
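
The abstract says that local wind information enters both the state space and the reward function, but it does not give the formulas. The Python sketch below is only an illustration of one way such a wind-guided term could look, not the authors' implementation. It assumes a 2D robot, a wind vector that points in the direction the air is moving, and a small bonus for moving upwind while odor is detected. All names (`upwind_alignment`, `build_state`, `shaped_reward`) and the parameters `threshold` and `weight` are hypothetical.

```python
import math


def upwind_alignment(move, wind):
    """Cosine of the angle between the robot's move and the upwind direction.

    `wind` points where the air is flowing, so upwind is `-wind`.
    Returns 0.0 when either vector has zero length.
    """
    mx, my = move
    wx, wy = wind
    norm = math.hypot(mx, my) * math.hypot(wx, wy)
    if norm == 0.0:
        return 0.0
    return -(mx * wx + my * wy) / norm


def build_state(position, concentration, wind):
    """Assumed observation layout: position, gas reading, local wind vector."""
    return (position[0], position[1], concentration, wind[0], wind[1])


def shaped_reward(base_reward, move, wind, concentration,
                  threshold=0.0, weight=0.1):
    """Add an upwind bonus to the base reward only while odor is detected.

    `threshold` and `weight` are illustrative values, not taken from the paper.
    """
    if concentration <= threshold:
        return base_reward
    return base_reward + weight * upwind_alignment(move, wind)
```

With these assumptions, a step taken directly against the wind while inside the plume earns the full bonus `weight`, a step taken downwind is penalized by the same amount, and steps outside the plume keep the base reward unchanged.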