A multi-step delay parameter update parallel optimization method for deep neural network

Authors:

JU Tao, KANG Heting, LIU Shuai, DING Xiaojian, WANG Longxiang

Affiliations:

(1. School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China; 2. School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an 710049, China)

Author biography:

JU Tao (born 1980), male, professor, master's supervisor

Corresponding author:

JU Tao, jutao@mail.lzjtu.cn

CLC number:

TP391

Funding:

National Natural Science Foundation of China (61862037); Science and Technology Program of Gansu Province (23CXGA0028)

Abstract:

To address the high communication overhead caused by aggregating node gradients for global parameter updates in distributed data-parallel training of deep neural networks (DNNs), a parallel optimization method based on multi-step delayed parameter updates is proposed. First, an adaptive multi-step update interval selection strategy is designed: each node performs several local iterations before node gradients are aggregated to update the global model parameters, which reduces the extra overhead of frequent communication. In addition, a parameter correction strategy is proposed to prevent the local model from deviating from the global model after multiple local updates, thereby preserving training accuracy. Second, during gradient aggregation, the gradient tensor is split into sub-tensors; combined with sub-tensor priority scheduling, this maximizes the overlap of communication and computation and further accelerates training. Finally, the proposed method is compared with the SSGD and Local SGD training methods on the CIFAR-100 and ImageNet-mini datasets. The experimental results show that the proposed method significantly reduces the communication overhead introduced by parameter updates while maintaining training accuracy, maximizes the overlap of communication and computation, and makes full use of computing resources to speed up parallel training. These results offer a new approach to reducing communication overhead in distributed DNN training.
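The two ideas in the abstract, communicating only every few local steps and aggregating in sub-tensor chunks, can be illustrated with a toy simulation. The sketch below is not the paper's algorithm: it uses plain periodic model averaging in the style of Local SGD (one of the baselines named above) with a fixed interval, and it omits the adaptive interval selection, the parameter correction strategy, and priority scheduling, none of which are specified in the abstract. All names (`local_sgd`, `chunked_average`, `sync_every`, `chunk_size`) and the quadratic toy objective are assumptions made for illustration.

```python
import random


def split_chunks(vec, chunk_size):
    """Split a flat parameter vector into consecutive sub-vectors."""
    return [vec[i:i + chunk_size] for i in range(0, len(vec), chunk_size)]


def chunked_average(vectors, chunk_size):
    """Average the workers' vectors one chunk at a time.

    Each chunk stands in for one sub-tensor all-reduce. Here the chunks are
    processed sequentially; a real system could overlap each chunk's
    communication with ongoing computation.
    """
    per_worker = [split_chunks(v, chunk_size) for v in vectors]
    averaged = []
    for pieces in zip(*per_worker):  # the same chunk index from every worker
        n = len(pieces)
        averaged.extend(sum(vals) / n for vals in zip(*pieces))
    return averaged


def local_sgd(num_workers=4, steps=40, sync_every=4, lr=0.1,
              chunk_size=2, dim=6, seed=0):
    """Toy run: worker k minimises 0.5 * ||w - target_k||^2 locally.

    Models are averaged across workers only every `sync_every` steps,
    so the number of communication rounds is steps // sync_every.
    """
    rng = random.Random(seed)
    targets = [[rng.uniform(-1, 1) for _ in range(dim)]
               for _ in range(num_workers)]
    models = [[0.0] * dim for _ in range(num_workers)]
    comm_rounds = 0
    for step in range(1, steps + 1):
        for k in range(num_workers):
            # Gradient of the local quadratic objective is (w - target_k).
            grad = [w - t for w, t in zip(models[k], targets[k])]
            models[k] = [w - lr * g for w, g in zip(models[k], grad)]
        if step % sync_every == 0:
            avg = chunked_average(models, chunk_size)
            models = [list(avg) for _ in range(num_workers)]
            comm_rounds += 1
    return models[0], comm_rounds, targets
```

In this toy setting, 40 steps with `sync_every=4` need 10 communication rounds, whereas synchronizing after every step (`sync_every=1`) needs 40. Because the toy objective is quadratic with identical curvature on every worker, the averaged model is unaffected by the interval; on real DNN training the local models drift apart between synchronizations, which is the problem the paper's parameter correction strategy is meant to address.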

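Cite this article

JU Tao, KANG Heting, LIU Shuai, DING Xiaojian, WANG Longxiang. A multi-step delay parameter update parallel optimization method for deep neural network[J]. Journal of Harbin Institute of Technology, 2025, 57(9): 95. DOI:10.11918/202407052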

History
  • Received: 2024-07-17
  • Published online: 2025-09-15
  • Publication date: 2025-09-10