Local wind information-inspired AVW-PPO indoor odor source localization algorithm

LI Shiyu; YUAN Jie; XIE Linwei; GUO Xu; ZHANG Ningning


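Supervised by: Ministry of Industry and Information Technology of the People's Republic of China. Sponsored by: Harbin Institute of Technology. Editor-in-Chief: LI Longqiu. ISSN 0367-6234. CN 23-1235/T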


Cite this article: LI Shiyu, YUAN Jie, XIE Linwei, GUO Xu, ZHANG Ningning. Local wind information-inspired AVW-PPO indoor odor source localization algorithm[J]. Journal of Harbin Institute of Technology, 2025, 57(8): 57. DOI: 10.11918/202410030



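DOI: 10.11918/202410030

CLC number: TP242.6

Document code: A

Funding: National Natural Science Foundation of China (62263031); Natural Science Foundation of Xinjiang Uygur Autonomous Region (2022D01C53)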

Local wind information-inspired AVW-PPO indoor odor source localization algorithm

LI Shiyu¹,YUAN Jie²,XIE Linwei¹,GUO Xu¹,ZHANG Ningning¹

(1.School of Electrical Engineering, Xinjiang University, Urumqi 830017, China; 2.School of Intelligence Science and Technology, Xinjiang University, Urumqi 830017, China)

Abstract:

Odor source localization (OSL) in complex, dynamic indoor plume environments suffers from low efficiency and insufficient success rates, particularly under turbulent conditions, where robots struggle to perceive the environment accurately and navigate effectively. To address this, the paper proposes an auxiliary value and wind-guided proximal policy optimization (AVW-PPO) algorithm based on deep reinforcement learning. First, an auxiliary value network is added to the original PPO algorithm to reduce the estimation bias of a single value network, which improves prediction accuracy and stabilizes policy updates. Second, a wind-guided strategy integrates local wind field information into the state space and reward function of the reinforcement learning framework, so that the robot perceives dynamic changes in the plume more readily and optimizes its decision-making path, thereby improving the efficiency of odor source localization. Finally, a gas diffusion model is constructed in a two-dimensional environment, and the proposed algorithm is tested under three different turbulence conditions. The results show that, under identical environmental conditions, AVW-PPO outperforms the other comparable algorithms in both average search steps and success rate, with a localization success rate above 99%. The wind-guided strategy is particularly effective at improving search efficiency, which helps reduce the time the robot needs to complete the task. This study offers a new approach to odor source localization in complex turbulent indoor environments.

Key words: odor source localization; deep reinforcement learning; proximal policy optimization (PPO); auxiliary value network; wind-guided strategy
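The abstract names two ingredients, an auxiliary value network and a wind-guided state and reward, without giving their formulas. The Python sketch below is one possible reading, not the authors' implementation. Every name and constant in it (`build_state`, `wind_guided_reward`, `blended_value`, `step_cost`, `upwind_gain`, `goal_bonus`, `weight`) is hypothetical, and both the upwind-cosine bonus and the weighted average of two critics are assumptions chosen for illustration.

```python
import math


def build_state(concentration, wind_vec, last_move):
    """Hypothetical observation: odor reading, local wind direction and speed, last move."""
    wx, wy = wind_vec
    speed = math.hypot(wx, wy)
    # Normalise the wind direction; fall back to zeros in still air.
    ux, uy = (wx / speed, wy / speed) if speed > 0.0 else (0.0, 0.0)
    return [concentration, ux, uy, speed, last_move[0], last_move[1]]


def wind_guided_reward(move_vec, wind_vec, concentration, found_source,
                       step_cost=0.01, upwind_gain=0.1,
                       threshold=0.0, goal_bonus=10.0):
    """Assumed reward shaping: per-step cost plus a bonus for moving upwind while in the plume."""
    if found_source:
        return goal_bonus
    reward = -step_cost
    move_len = math.hypot(move_vec[0], move_vec[1])
    wind_len = math.hypot(wind_vec[0], wind_vec[1])
    if concentration > threshold and move_len > 0.0 and wind_len > 0.0:
        # Cosine of the angle between the move and the upwind direction (-wind).
        dot = move_vec[0] * wind_vec[0] + move_vec[1] * wind_vec[1]
        reward += upwind_gain * (-dot / (move_len * wind_len))
    return reward


def blended_value(v_main, v_aux, weight=0.5):
    """Assumed combination of the main and auxiliary critic estimates (weighted average)."""
    return (1.0 - weight) * v_main + weight * v_aux
```

With wind blowing along +x and odor detected, a step along -x earns the upwind bonus and a step along +x is penalised; without an odor reading only the step cost applies. The blended value would stand in for the single critic output when computing PPO advantages, under the assumption that averaging two estimators dampens the bias of either one.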
