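A multi-step delay parameter update parallel optimization method for deep neural network

JU Tao; KANG Heting; LIU Shuai; DING Xiaojian; WANG Longxiang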


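Supervised by: Ministry of Industry and Information Technology of the People's Republic of China; Sponsored by: Harbin Institute of Technology; Editor-in-Chief: LI Longqiu; ISSN 0367-6234; CN 23-1235/T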


Cite this article: JU Tao, KANG Heting, LIU Shuai, DING Xiaojian, WANG Longxiang. A multi-step delay parameter update parallel optimization method for deep neural network[J]. Journal of Harbin Institute of Technology, 2025, 57(9): 95. DOI: 10.11918/202407052

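JU Tao¹, KANG Heting¹, LIU Shuai², DING Xiaojian¹, WANG Longxiang²
(1. School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China; 2. School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an 710049, China)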

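Abstract:

To address the high communication overhead incurred in distributed data-parallel training of deep neural networks (DNNs), where node gradients are aggregated to update the global parameters, a multi-step delayed parameter update parallel optimization method for DNNs is proposed. First, an adaptive strategy for selecting the multi-step update interval is designed: each node performs several local iterations before node gradients are aggregated, which reduces the extra overhead of frequent communication. In addition, a parameter correction strategy is proposed to keep the local model from drifting away from the global model after multiple local updates, thereby preserving training accuracy. Second, during gradient aggregation the gradient tensor is split into sub-tensors so that communication and computation overlap as much as possible, further accelerating training. Finally, the proposed method is compared with the SSGD and Local SGD training methods on the CIFAR-100 and ImageNet-mini datasets. Experimental results show that the method significantly reduces the communication overhead introduced by parameter updates while maintaining training accuracy, maximizes the overlap of communication and computation, and makes full use of computing resources to speed up parallel training. The results offer a new approach to reducing communication overhead in distributed DNN training.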
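
The trade-off behind multi-step updates can be illustrated with a toy simulation. The sketch below is not the authors' algorithm: it assumes a fixed (not adaptive) update interval, plain parameter averaging in place of the paper's gradient aggregation and parameter correction strategy, and scalar "models" that each minimize a simple quadratic. The names `local_sgd`, `targets`, and `interval` are hypothetical.

```python
def local_sgd(targets, steps, interval, lr=0.1):
    """Simulate workers that each minimize 0.5 * (w - target)^2.

    Every `interval` local steps the workers average their parameters,
    which stands in for one round of communication. The fixed interval
    and plain averaging are assumptions made for illustration only.
    """
    params = [0.0 for _ in targets]  # one scalar model replica per worker
    comm_rounds = 0
    for step in range(1, steps + 1):
        # Local update: the gradient of 0.5 * (w - t)^2 is (w - t).
        params = [w - lr * (w - t) for w, t in zip(params, targets)]
        if step % interval == 0:
            mean = sum(params) / len(params)
            params = [mean] * len(params)  # all replicas resync to the average
            comm_rounds += 1
    return params, comm_rounds


# interval=1 behaves like synchronous averaging; interval=4 communicates 4x less.
sync_params, sync_rounds = local_sgd([1.0, 3.0], steps=8, interval=1)
delayed_params, delayed_rounds = local_sgd([1.0, 3.0], steps=8, interval=4)
print(sync_rounds, delayed_rounds)
```

In this linear toy problem the averaged model is the same for both intervals, so only the number of communication rounds changes. On real, non-convex DNN training the local replicas drift between synchronizations, which is the situation the abstract's parameter correction strategy is meant to handle.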

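Keywords: deep neural network; data parallelism; communication scheduling; parameter update; computation and communication overlap

DOI: 10.11918/202407052

CLC number: TP391

Document code: A

Funding: National Natural Science Foundation of China (61862037); Gansu Provincial Science and Technology Program (23CXGA0028)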

A multi-step delay parameter update parallel optimization method for deep neural network

JU Tao¹, KANG Heting¹, LIU Shuai², DING Xiaojian¹, WANG Longxiang²

(1. School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China; 2. School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an 710049, China)

Abstract:

To address the high communication overhead caused by aggregating node gradients for global parameter updates in distributed data-parallel training of deep neural networks (DNNs), a parallel optimization method based on multi-step delayed parameter updates is proposed. First, an adaptive multi-step update interval selection strategy is designed: node gradients are aggregated to update the global model parameters only after several local iterative parameter updates, which reduces the communication overhead caused by frequent gradient aggregation. At the same time, a parameter correction strategy is proposed to prevent the local model from deviating from the global model after several local updates, ensuring the accuracy of model training. Second, during gradient aggregation, the gradient tensor is split into several sub-tensors. Combined with sub-tensor priority scheduling, this maximizes the overlap of communication and computation during gradient aggregation and further accelerates training. Finally, the proposed method is compared with the SSGD and Local SGD training methods on the CIFAR-100 and ImageNet-mini datasets. The results show that the proposed method significantly reduces the communication overhead of parameter updates while maintaining training accuracy, maximizes the overlap of communication and computation, and makes full use of computing resources to speed up parallel training. The results provide a new approach to reducing communication costs in distributed DNN training.
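
The sub-tensor splitting step can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: gradients are flat Python lists, the aggregation is a simple average computed in one process, and no priority scheduling or real overlap with computation is modeled. The helper names `split_chunks` and `chunked_average` are hypothetical.

```python
def split_chunks(flat, chunk_size):
    """Split a flat gradient (list of floats) into consecutive sub-tensors."""
    return [flat[i:i + chunk_size] for i in range(0, len(flat), chunk_size)]


def chunked_average(worker_grads, chunk_size):
    """Average gradients across workers one sub-tensor at a time."""
    per_worker = [split_chunks(g, chunk_size) for g in worker_grads]
    n_workers = len(worker_grads)
    result = []
    # zip(*per_worker) yields, for each chunk index, that chunk from every worker.
    for chunks in zip(*per_worker):
        # Each iteration stands in for one small, independently schedulable transfer.
        result.extend(sum(vals) / n_workers for vals in zip(*chunks))
    return result


grads = [[1.0, 2.0, 3.0, 4.0, 5.0], [3.0, 4.0, 5.0, 6.0, 7.0]]
print(chunked_average(grads, chunk_size=2))
```

Aggregating chunk by chunk gives the same result as aggregating the whole tensor at once. What it adds is granularity: in a real system each chunk could be sent as soon as it is ready and ordered by priority, so that transfers proceed while other computation continues.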

Key words: deep neural network; data parallelism; communication scheduling; parameter updating; computation and communication overlapping
