| Cite this article: | LIU Yuesheng, XU Zhongxian, HE Ning, HE Lile. Reinforcement learning-driven parameter tuning for mobile robots predictive control[J]. Journal of Harbin Institute of Technology, 2026, 58(4): 212. DOI:10.11918/202503079 |
|
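| Abstract: |
| To improve the performance and adaptability of trajectory-tracking predictive control for omnidirectional mobile robots in dynamic environments, and to overcome the limitations of existing machine-learning-based parameter tuning methods, namely strong data dependence and difficulty in balancing short-term control accuracy against long-term system performance, an online self-tuning method for model predictive control parameters is proposed that combines reinforcement learning theory with an event-triggered mechanism. First, a kinematic model of the omnidirectional mobile robot is established, and the corresponding trajectory-tracking model predictive control framework is constructed. Second, a dynamic parameter optimization framework based on Actor-Critic reinforcement learning is proposed; a reward function that combines state errors with dynamic performance metrics drives the controller to optimize its control parameters in real time. Further, the event-triggered mechanism is integrated with the parameter optimization framework to form an adaptive controller that lowers the computational load by reducing the frequency of parameter updates, achieving efficient control. Finally, a physical experimental platform for the omnidirectional mobile robot is built, and comparative experiments are carried out in several scenarios: step trajectory tracking, lemniscate curve tracking, and dynamic obstacle avoidance. The results show that, compared with conventional model predictive control using static parameters, the proposed method reduces overshoot and settling time by about 70% in step trajectory tracking, reduces state deviation by about 65% in lemniscate trajectory tracking, and reduces state deviation by about 30% in dynamic obstacle avoidance, which verifies its effectiveness and environmental adaptability in improving trajectory-tracking performance in complex dynamic environments. This study offers a new approach to high-performance trajectory-tracking control of mobile robots in dynamic, uncertain environments. |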
| Keywords: omnidirectional mobile robot; model predictive control; parameter tuning; reinforcement learning; Actor-Critic framework; event triggering |
| DOI:10.11918/202503079 |
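| CLC number: TP242 |
| Document code: A |
| Funding: National Natural Science Foundation of China (62473301); Natural Science Basic Research Program of Shaanxi Province (2024JC-YBQN-0703) |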
|
| Reinforcement learning-driven parameter tuning for mobile robots predictive control |
|
LIU Yuesheng¹, XU Zhongxian², HE Ning¹, HE Lile¹
|
|
(1. School of Mechanical and Electrical Engineering, Xi'an University of Architecture and Technology, Xi'an 710055, China; 2. School of Automation, Xi'an University of Posts and Telecommunications, Xi'an 710121, China)
|
| Abstract: |
| To enhance the performance and adaptability of trajectory-tracking predictive control for omnidirectional mobile robots in dynamic environments, and to address the limitations of existing machine-learning-based parameter tuning methods, such as strong data dependency and the difficulty of balancing short-term control accuracy with long-term system performance, this paper proposes an online self-tuning method for model predictive control (MPC) parameters that integrates reinforcement learning theory with an event-triggered mechanism. First, the kinematic model of the omnidirectional mobile robot is established, and a corresponding trajectory-tracking MPC framework is constructed. Second, a dynamic parameter optimization framework based on Actor-Critic reinforcement learning is introduced. A reward function that combines state errors and dynamic performance metrics drives the controller to optimize its control parameters in real time. Furthermore, the event-triggered mechanism is integrated into the parameter optimization framework to form an adaptive controller. This integration reduces the frequency of parameter updates, thereby lowering the computational load and enabling efficient control. Finally, a physical experimental platform for omnidirectional mobile robots is built, and comparative experiments are conducted in multiple scenarios, including step trajectory tracking, lemniscate curve tracking, and dynamic obstacle avoidance. The experimental results show that, compared with traditional MPC using static parameters, the proposed approach reduces overshoot and settling time by approximately 70% in step trajectory tracking, decreases state deviation by approximately 65% in lemniscate trajectory tracking, and reduces state deviation by approximately 30% in dynamic obstacle avoidance. These results validate the effectiveness and environmental adaptability of the proposed method in improving trajectory-tracking performance in complex dynamic environments. This research provides a new approach to high-performance trajectory-tracking control of mobile robots under dynamic and uncertain conditions. |
| Key words: omnidirectional mobile robot; model predictive control; parameter tuning; reinforcement learning; Actor-Critic network; event triggering |
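The abstract names two mechanisms without giving their form: a reward that combines state error with a dynamic performance metric, and an event trigger that limits how often the MPC parameters are updated. The following is a minimal Python sketch of how such pieces could fit together, not the authors' implementation. The function names, the weights `w_err` and `w_dyn`, the use of overshoot as the dynamic metric, and the trigger rule (update when the error norm has drifted from its value at the last update by more than `threshold`) are all assumptions made for illustration.

```python
import math


def reward(state_error, overshoot, w_err=1.0, w_dyn=0.5):
    """Hypothetical reward: negative weighted sum of the squared state
    error and a dynamic-performance penalty (here, overshoot)."""
    err_sq = sum(e * e for e in state_error)
    return -(w_err * err_sq + w_dyn * overshoot)


def error_norm(state_error):
    """Euclidean norm of the tracking error vector."""
    return math.sqrt(sum(e * e for e in state_error))


def should_update(state_error, norm_at_last_update, threshold=0.05):
    """Hypothetical event trigger: fire only when the error norm has
    changed by more than `threshold` since the last parameter update."""
    return abs(error_norm(state_error) - norm_at_last_update) > threshold


def count_updates(errors, threshold=0.05):
    """Walk a sequence of tracking errors and count how many steps
    would trigger a parameter update (the actual tuning step is omitted)."""
    norm_at_last_update = 0.0
    updates = 0
    for err in errors:
        if should_update(err, norm_at_last_update, threshold):
            # A learned policy would propose new MPC weights here.
            norm_at_last_update = error_norm(err)
            updates += 1
    return updates


if __name__ == "__main__":
    # Illustrative error sequence (x, y, heading), not experimental data.
    errors = [(0.2, 0.0, 0.0), (0.19, 0.0, 0.0), (0.18, 0.0, 0.0),
              (0.1, 0.0, 0.0), (0.02, 0.0, 0.0)]
    print(count_updates(errors), "updates over", len(errors), "steps")
```

In this sketch the trigger skips steps where the error has barely changed, which is the sense in which fewer parameter updates lower the computational load; the paper's actual trigger condition and reward terms may differ.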