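Hardware mapping analysis of DDPG algorithm based on FPGA and robot motion skill learning

ZHU Xiaoqing; BI Lanyue; GONG Wanru; WU Tong; LI Zhongjun; WU Duxing; ZHANG Chuan; YANG Xiaopeng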

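Supervised by: Ministry of Industry and Information Technology of the People's Republic of China. Sponsored by: Harbin Institute of Technology. Editor-in-Chief: LI Longqiu. ISSN: 0367-6234. CN: 23-1235/T.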

Cite this article: ZHU Xiaoqing, BI Lanyue, GONG Wanru, WU Tong, LI Zhongjun, WU Duxing, ZHANG Chuan, YANG Xiaopeng. Hardware mapping analysis of DDPG algorithm based on FPGA and robot motion skill learning[J]. Journal of Harbin Institute of Technology, 2026, 58(1): 24. DOI: 10.11918/202508035

ZHU Xiaoqing^1,3, BI Lanyue^1,3, GONG Wanru^1,3, WU Tong^2,3, LI Zhongjun^1,3, WU Duxing^1,3, ZHANG Chuan^2,3, YANG Xiaopeng^1,3

(1. School of Information Science and Technology, Beijing University of Technology, Beijing 100039, China; 2. CNNC Hexin Information Technology (Beijing) Co., Ltd., Beijing 100091, China; 3. Nuclear Industry X Intelligence Laboratory (Beijing University of Technology), Beijing 100124, China)

DOI: 10.11918/202508035

CLC number: TP242

Document code: A

Funding: National Natural Science Foundation of China (62103009); Beijing Natural Science Foundation (4202005)

Abstract:

This paper investigates the connection between neural networks, reinforcement learning (RL) algorithms, and the evolutionary principles of higher animals by building an observable and interpretable autonomous motion control system for a wheel-legged robot around the deep deterministic policy gradient (DDPG) algorithm. The Actor-Critic neural network is deployed directly on a field-programmable gate array (FPGA), and an FPGA-ARM robot control system exports the network's weight activation signals in real time and renders them as weight heatmaps, visualizing how the policy evolves. Experiments show that the system reduces single-step computation latency to 28 μs and converges within 5 000 steps. The weight heatmaps also reveal the dynamic evolution of the policy across three phases: early, middle, and late. Qualitative analysis indicates that non-salient regions have little effect on the overall policy and that resource utilization becomes more efficient. The proposed hardware-algorithm co-design framework offers a new paradigm for studying the observability of the RL “black box” and demonstrates the combined advantages of FPGAs in embedded robot control: low latency, high parallelism, and low power consumption. It also shows potential for real-time skill learning and hardware acceleration in multi-agent cooperation and on heterogeneous computing platforms.

Key words: robotics; analysis of learning mechanism; skill learning; FPGA; reinforcement learning
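
As a rough illustration of the two ideas the abstract names (a DDPG Actor-Critic pair and weight heatmaps), the pure-Python sketch below shows the standard DDPG critic target, the soft target-network update, and one possible way to turn a layer's weights into heatmap values. The single-layer actor and critic, the dictionary layout, the example numbers, and the names `td_target`, `soft_update`, and `weight_heatmap` are assumptions made for illustration only. The abstract does not describe the paper's network sizes, its FPGA implementation, or how its heatmaps are normalized.

```python
import math

def linear(weights, bias, x):
    """y = W x + b, with W stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def actor(params, state):
    """Deterministic policy mu(s): one tanh layer, so actions lie in [-1, 1]."""
    return [math.tanh(v) for v in linear(params["W"], params["b"], state)]

def critic(params, state, action):
    """Q(s, a): a single linear unit over the concatenated state and action."""
    return linear(params["W"], params["b"], state + action)[0]

def td_target(reward, next_state, done, actor_target, critic_target, gamma=0.99):
    """Critic regression target: y = r + gamma * (1 - done) * Q'(s', mu'(s'))."""
    next_action = actor(actor_target, next_state)
    next_q = critic(critic_target, next_state, next_action)
    return reward + gamma * (1.0 - done) * next_q

def soft_update(target, online, tau=0.01):
    """Polyak averaging: theta' <- tau * theta + (1 - tau) * theta'."""
    def mix(t, o):
        return (1.0 - tau) * t + tau * o
    return {
        "W": [[mix(t, o) for t, o in zip(t_row, o_row)]
              for t_row, o_row in zip(target["W"], online["W"])],
        "b": [mix(t, o) for t, o in zip(target["b"], online["b"])],
    }

def weight_heatmap(weights):
    """Scale |w| of one layer to [0, 1]; each cell is one heatmap pixel.
    (Assumed normalization, not taken from the paper.)"""
    peak = max(abs(w) for row in weights for w in row)
    if peak == 0.0:
        return [[0.0 for _ in row] for row in weights]
    return [[abs(w) / peak for w in row] for row in weights]

# Hypothetical toy networks: 2 state inputs, 2 action outputs.
actor_online = {"W": [[0.5, -0.25], [0.0, 1.0]], "b": [0.0, 0.0]}
actor_target = {"W": [[0.0, 0.0], [0.0, 0.0]], "b": [0.0, 0.0]}
critic_target = {"W": [[1.0, 0.0, 0.5, 0.5]], "b": [0.0]}

# One transition: compute the critic target, then move the target actor
# halfway toward the online actor and snapshot the online weights.
y = td_target(reward=1.0, next_state=[0.2, -0.4], done=0.0,
              actor_target=actor_target, critic_target=critic_target)
actor_target = soft_update(actor_target, actor_online, tau=0.5)
heat = weight_heatmap(actor_online["W"])
```

Taking such snapshots at intervals during training and comparing them is one simple way to see which weights dominate at each stage. A real deployment would read the weights back from the hardware rather than from Python objects.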
