Abstract: Odor source localization (OSL) in complex, dynamic indoor plume environments suffers from low efficiency and low success rates, largely because robots struggle to perceive the environment accurately and navigate effectively under turbulent conditions. To address this, this paper proposes an auxiliary value and wind-guided proximal policy optimization (AVW-PPO) algorithm based on deep reinforcement learning. First, an auxiliary value network is added to the original PPO framework to reduce the estimation bias of a single value network, which improves prediction accuracy and stabilizes policy updates. Second, a wind-guided strategy integrates local wind field information into both the state space and the reward function, enabling the robot to perceive dynamic changes in the plume and plan more effective search paths. Finally, a two-dimensional gas diffusion model is constructed to test the proposed algorithm under three different turbulence conditions. Experimental results show that, under identical environmental conditions, AVW-PPO outperforms comparable algorithms in both average search steps and success rate, achieving a localization success rate of over 99%. Notably, the wind-guided strategy significantly improves search efficiency, reducing the time the robot needs to complete the task. This study provides new insights and methods for odor source localization in complex, turbulent indoor environments.
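The two ideas summarized above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not specify the state layout, the form of the wind-guided reward term, or how the auxiliary value estimate is combined with the main one. The function names, the cosine-based upwind bonus, the `weight` parameter, and the simple averaging of the two value estimates are all assumptions made for illustration.

```python
import math


def augment_state(concentration, position, wind):
    """Append the local wind vector to the observation.

    The layout [c, x, y, wind_x, wind_y] is a hypothetical choice.
    """
    return [concentration, position[0], position[1], wind[0], wind[1]]


def upwind_alignment(move, wind):
    """Cosine of the angle between the robot's move and the upwind direction.

    Returns a value in [-1, 1]; 0.0 if the robot is stationary or there is no wind.
    """
    move_norm = math.hypot(*move)
    wind_norm = math.hypot(*wind)
    if move_norm == 0.0 or wind_norm == 0.0:
        return 0.0
    # Upwind is the direction opposite to the wind vector.
    dot = -(move[0] * wind[0] + move[1] * wind[1])
    return dot / (move_norm * wind_norm)


def shaped_reward(base_reward, move, wind, weight=0.1):
    """Add an assumed wind-guided bonus for moving upwind."""
    return base_reward + weight * upwind_alignment(move, wind)


def combined_value(v_main, v_aux):
    """Combine two value estimates by averaging (an assumed rule)."""
    return 0.5 * (v_main + v_aux)
```

With wind blowing along +x, a move along -x is fully upwind and receives the full bonus, while a move along +x is penalized by the same amount. Averaging is only one possible way to use a second value estimate; taking the minimum of the two is another common choice.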