| Cite this article: | JU Tao, KANG Heting, LIU Shuai, DING Xiaojian, WANG Longxiang. A multi-step delay parameter update parallel optimization method for deep neural network[J]. Journal of Harbin Institute of Technology, 2025, 57(9): 95. DOI:10.11918/202407052 |
| A multi-step delay parameter update parallel optimization method for deep neural network |
|
JU Tao (巨涛)1, KANG Heting (康贺廷)1, LIU Shuai (刘帅)2, DING Xiaojian (丁肖健)1, WANG Longxiang (王龙翔)2
|
|
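(1. School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China; 2. School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an 710049, China)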
|
|
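| Abstract: |
| To address the high communication overhead incurred in distributed data-parallel training of deep neural networks (DNNs) when node gradients are aggregated for global parameter updates, a multi-step delayed parameter update parallel optimization method for DNNs is proposed. First, an adaptive multi-step update interval selection strategy is designed: nodes perform several local iterations before their gradients are aggregated, which reduces the extra overhead of frequent communication. In addition, a parameter correction strategy is proposed to keep the local model from drifting away from the global model after multiple local updates, thereby preserving training accuracy. Second, during gradient aggregation, the gradient tensor is split into sub-tensors so that communication and computation overlap as much as possible, further accelerating training. Finally, the proposed method is compared with the SSGD and Local SGD training methods on the CIFAR-100 and ImageNet-mini datasets. The experimental results show that the proposed method significantly reduces the communication overhead introduced by parameter updates while maintaining training accuracy, maximizes the overlap of communication and computation, and makes full use of computing resources to speed up parallel training. The results offer a new approach to reducing communication overhead in distributed DNN training. |
| Keywords: deep neural network; data parallelism; communication scheduling; parameter update; computation and communication overlap |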
| DOI:10.11918/202407052 |
| CLC number: TP391 |
| Document code: A |
| Funding: National Natural Science Foundation of China (61862037); Gansu Provincial Science and Technology Program (23CXGA0028) |
|
| A multi-step delay parameter update parallel optimization method for deep neural network |
|
JU Tao1, KANG Heting1, LIU Shuai2, DING Xiaojian1, WANG Longxiang2
|
|
(1. School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China; 2. School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an 710049, China)
|
| Abstract: |
| To address the high communication overhead caused by aggregating node gradients for global parameter updates in distributed data-parallel training of deep neural networks (DNNs), a multi-step delayed parameter update parallel optimization method for DNNs is proposed. First, an adaptive multi-step update interval selection strategy is designed: node gradients are aggregated to update the global model parameters only after several local parameter updates have completed, which reduces the communication overhead caused by frequent gradient aggregation. At the same time, a parameter correction strategy is proposed to prevent the local model from deviating from the global model after several local updates, thereby preserving training accuracy. Second, during gradient aggregation, the gradient tensor is split into several sub-tensors. Combined with sub-tensor priority scheduling, this maximizes the overlap of communication and computation during gradient aggregation and further accelerates training. Finally, the proposed method is compared with the SSGD and Local SGD training methods on the CIFAR-100 and ImageNet-mini datasets. The results show that the proposed method significantly reduces the communication overhead of parameter updates while maintaining training accuracy, maximizes the overlap of communication and computation, and makes full use of computing resources to speed up parallel training. These results offer a new solution for reducing communication costs in distributed DNN training. |
| Key words: deep neural network; data parallelism; communication scheduling; parameter updating; computation and communication overlapping |
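The trade-off between local iterations and gradient aggregation described in the abstract can be illustrated with a toy simulation of plain Local SGD with periodic parameter averaging. This is a minimal sketch and not the method proposed in the paper: the function name `local_sgd`, the quadratic per-worker loss, and the fixed `sync_interval` are all assumptions made for illustration, and the paper's adaptive interval selection, parameter correction, and sub-tensor scheduling are not modeled.

```python
import random


def local_sgd(num_workers=4, total_steps=40, sync_interval=4, lr=0.1, seed=0):
    """Toy Local SGD on per-worker losses f_k(w) = 0.5 * (w - target_k)**2.

    Hypothetical illustration only: each worker takes `sync_interval` local
    steps, then all workers average their parameters, which counts as one
    communication round. sync_interval=1 behaves like synchronous SGD here.
    """
    rng = random.Random(seed)
    # Each worker's data pulls the model toward a different target value.
    targets = [rng.uniform(-1.0, 1.0) for _ in range(num_workers)]
    weights = [0.0] * num_workers  # all workers start from the same model
    comm_rounds = 0

    for step in range(1, total_steps + 1):
        # Local update: the gradient of 0.5 * (w - t)**2 is (w - t).
        weights = [w - lr * (w - t) for w, t in zip(weights, targets)]
        if step % sync_interval == 0:
            # Aggregation step: replace every local model by the average.
            avg = sum(weights) / num_workers
            weights = [avg] * num_workers
            comm_rounds += 1

    global_model = sum(weights) / num_workers
    optimum = sum(targets) / num_workers  # minimizer of the average loss
    return global_model, optimum, comm_rounds


if __name__ == "__main__":
    for interval in (1, 4):
        model, optimum, rounds = local_sgd(sync_interval=interval)
        print(f"interval={interval} rounds={rounds} "
              f"model={model:.4f} optimum={optimum:.4f}")
```

With 40 steps, an interval of 4 needs 10 communication rounds instead of 40. In this toy problem every worker has the same curvature, so the averaged model is the same for both intervals up to floating-point rounding. Real DNN training does not have this property, and local models can drift between synchronizations, which is the situation the parameter correction strategy in the abstract is meant to address.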