Dynamic gating diffusion denoising and cross-layer attention-based multimodal image fusion network

DI Jing; HUO Jingjing; WANG Heran; LIU Jizhao; LIAN Jing


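Competent authority: Ministry of Industry and Information Technology of the People's Republic of China; Sponsor: Harbin Institute of Technology; Editor-in-chief: LI Longqiu; ISSN 0367-6234; CN 23-1235/T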


Cite this article:
DI Jing, HUO Jingjing, WANG Heran, LIU Jizhao, LIAN Jing. Dynamic gating diffusion denoising and cross-layer attention-based multimodal image fusion network[J]. Journal of Harbin Institute of Technology, 2026, 58(5): 33. DOI: 10.11918/202507016

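DI Jing¹, HUO Jingjing¹, WANG Heran¹, LIU Jizhao², LIAN Jing¹
(1. School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China; 2. School of Information Science and Engineering, Lanzhou University, Lanzhou 730000, China)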

Abstract:

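In image fusion tasks, denoising diffusion models have difficulty adapting to different noise levels, and ordinary residual blocks have a limited ability to select features. To address these problems, this paper constructs a multimodal image fusion network based on dynamic gating diffusion denoising and cross-layer attention. First, four groups of expert convolution kernels are designed and introduced into the dynamic feature extractor module, which combines them into the optimal convolution kernel according to the input content so that input features are processed adaptively. Second, an improved gated feature selection module is proposed; it generates gating signals that suppress irrelevant information, improving the model's diffusion denoising ability under different noise levels and giving precise control over features. Finally, R-Transformer blocks are used for feature adjustment, and a purpose-built global-local spatial attention module performs cross-layer feature fusion, producing fused images with rich texture information and high color fidelity. Experiments on the MSRS, RoadScene, and Harvard datasets show that, compared with nine representative methods from recent image fusion research, the proposed method improves seven objective evaluation metrics by 5.11% to 15.93% on average. The proposed method outperforms the other methods in preserving texture details and anatomical structural integrity, is consistent with the characteristics of human vision, and handles multimodal image fusion well both in scenes under various lighting conditions and in medical imaging diagnosis.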
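The abstract says the dynamic feature extractor "dynamically assembles" a convolution kernel from four expert kernels according to the input, but gives no equations. Below is a minimal single-channel sketch of one common way to do this, softmax-weighted kernel aggregation in the style of dynamic convolution: pool the input to a content descriptor, map it to one logit per expert with a small router, normalize with softmax, and convolve once with the weighted sum of the experts. The four fixed 3×3 kernels, the router parameters `router_w`/`router_b`, and every function name are hypothetical illustrations, not the authors' learned kernels or code.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]

# Four hypothetical 3x3 "expert" kernels (fixed here; a real network learns them).
IDENTITY: Matrix = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
BOX_BLUR: Matrix = [[1.0 / 9.0] * 3 for _ in range(3)]
LAPLACIAN: Matrix = [[0.0, 1.0, 0.0], [1.0, -4.0, 1.0], [0.0, 1.0, 0.0]]
SOBEL_X: Matrix = [[-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0], [-1.0, 0.0, 1.0]]
EXPERTS: List[Matrix] = [IDENTITY, BOX_BLUR, LAPLACIAN, SOBEL_X]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def global_avg_pool(x: Matrix) -> float:
    """Collapse a feature map to one scalar descriptor of its content."""
    return sum(sum(row) for row in x) / (len(x) * len(x[0]))


def mix_kernels(experts: List[Matrix], weights: List[float]) -> Matrix:
    """Weighted sum of same-sized expert kernels."""
    k = len(experts[0])
    return [[sum(w * e[i][j] for w, e in zip(weights, experts)) for j in range(k)]
            for i in range(k)]


def conv2d_same(x: Matrix, kernel: Matrix) -> Matrix:
    """2-D cross-correlation with zero padding; output has the input's size."""
    h, w, k = len(x), len(x[0]), len(kernel)
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    yi, xj = i + di - r, j + dj - r
                    if 0 <= yi < h and 0 <= xj < w:
                        acc += kernel[di][dj] * x[yi][xj]
            out[i][j] = acc
    return out


def dynamic_conv(x: Matrix, experts: List[Matrix], router_w: List[float],
                 router_b: List[float]) -> Tuple[Matrix, List[float]]:
    """Route on pooled content, assemble one kernel, then convolve once."""
    pooled = global_avg_pool(x)
    logits = [a * pooled + b for a, b in zip(router_w, router_b)]
    weights = softmax(logits)  # one weight per expert, summing to 1
    return conv2d_same(x, mix_kernels(experts, weights)), weights


if __name__ == "__main__":
    img = [[0.0, 0.0, 0.0, 0.0],
           [0.0, 1.0, 1.0, 0.0],
           [0.0, 1.0, 1.0, 0.0],
           [0.0, 0.0, 0.0, 0.0]]
    out, w = dynamic_conv(img, EXPERTS, [0.5, -0.2, 1.0, 0.3], [0.0, 0.1, -0.1, 0.0])
    print("expert weights:", [round(v, 3) for v in w])
    for row in out:
        print([round(v, 3) for v in row])
```

Because convolution is linear in the kernel, convolving with the mixed kernel gives the same result as mixing the four expert outputs, while costing only one convolution; that is the usual reason for aggregating kernels rather than feature maps.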

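Keywords: multimodal image fusion; diffusion models; gated feature selection module; cross-layer attention fusion module; expert convolution kernels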

DOI: 10.11918/202507016

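CLC number: TP391

Document code: A

Funding: Natural Science Foundation of Gansu Province (24JRRA231); National Natural Science Foundation of China (62061023); Key Research and Development Program of the Gansu Provincial Science and Technology Plan (24YFFA024)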

Dynamic gating diffusion denoising and cross-layer attention-based multimodal image fusion network

DI Jing¹,HUO Jingjing¹,WANG Heran¹,LIU Jizhao²,LIAN Jing¹

(1.School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China; 2.School of Information Science and Engineering, Lanzhou University, Lanzhou 730000, China)

Abstract:

In image fusion tasks, denoising diffusion models struggle to adapt to varying noise levels, and conventional residual blocks have limited feature selection capability. To address these problems, this paper constructs a multimodal image fusion network that integrates dynamic gating diffusion denoising and cross-layer attention. First, four groups of expert convolution kernels are designed and incorporated into the dynamic feature extractor module, which assembles the optimal convolution kernel from them based on the input content, enabling adaptive processing of input features. Second, an improved gated feature selection module is proposed that generates gating signals to suppress irrelevant information, enhancing the model's diffusion denoising capability under different noise levels and achieving precise feature control. Finally, R-Transformer blocks are adopted for feature adjustment, and a global-local spatial attention module is constructed to realize cross-layer feature fusion, thereby generating fused images with rich texture information and high color fidelity. Experimental results on the MSRS, RoadScene, and Harvard datasets demonstrate that, compared with 9 representative state-of-the-art image fusion methods from recent years, the proposed method achieves an average improvement of 5.11% to 15.93% across 7 objective evaluation metrics. The proposed method outperforms its counterparts in preserving texture details and anatomical structure integrity, conforms to the characteristics of human visual perception, and effectively handles multimodal image fusion tasks in scenarios such as varied lighting environments and medical image diagnosis.
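The abstract states that the gated feature selection module produces gating signals that suppress irrelevant information and help the denoiser across noise levels, contrasting it with ordinary residual blocks, but gives no formula. As a hedged illustration, the sketch below compares a plain residual update x + f with a gated update y = g·f + (1 - g)·x, where the per-channel gate g = sigmoid(w_feat·f + <w_time, e(t)> + bias) is conditioned on both the branch feature and a sinusoidal embedding e(t) of the diffusion step t (a standard way to expose the noise level to a denoiser). This conditioning scheme, the weights `w_feat`, `w_time`, `bias`, and all function names are assumptions for illustration, not the module described in the paper.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    """Overflow-safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def timestep_embedding(t: int, dim: int) -> List[float]:
    """Sinusoidal embedding of the diffusion step t (dim should be even)."""
    half = dim // 2
    freqs = [math.exp(-math.log(10000.0) * i / half) for i in range(half)]
    return [math.sin(t * f) for f in freqs] + [math.cos(t * f) for f in freqs]


def plain_residual(x: List[float], f: List[float]) -> List[float]:
    """Ordinary residual output: every branch feature is added unfiltered."""
    return [xi + fi for xi, fi in zip(x, f)]


def gated_select(x: List[float], f: List[float], t_emb: List[float],
                 w_feat: List[float], w_time: List[List[float]],
                 bias: List[float]) -> Tuple[List[float], List[float]]:
    """Per-channel gate g_c decides how much of branch f_c replaces skip x_c.

    g_c = sigmoid(w_feat[c] * f_c + <w_time[c], t_emb> + bias[c])
    y_c = g_c * f_c + (1 - g_c) * x_c
    """
    y, gates = [], []
    for c, (xc, fc) in enumerate(zip(x, f)):
        z = w_feat[c] * fc + sum(a * e for a, e in zip(w_time[c], t_emb)) + bias[c]
        g = sigmoid(z)
        gates.append(g)
        y.append(g * fc + (1.0 - g) * xc)
    return y, gates


if __name__ == "__main__":
    x = [0.2, -0.5, 1.0]   # skip-path features (3 channels)
    f = [0.8, 0.1, -0.3]   # branch features from some convolutional block
    emb = timestep_embedding(t=250, dim=4)
    w_time = [[0.1, 0.0, -0.2, 0.3], [0.0, 0.5, 0.0, 0.0], [-0.4, 0.0, 0.2, 0.1]]
    y, g = gated_select(x, f, emb, [1.0, -2.0, 0.5], w_time, [0.0, 0.0, -1.0])
    print("residual:", plain_residual(x, f))
    print("gates:   ", [round(v, 3) for v in g])
    print("gated:   ", [round(v, 3) for v in y])
```

A gate near 1 passes the branch feature, a gate near 0 falls back to the skip path, and because the timestep embedding enters the gate, the same weights can open or close channels differently at high and low noise levels; a plain residual block has no such switch.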

Key words: multimodal image fusion; diffusion models; gated feature selection module; cross-layer attention fusion module; expert convolutional kernels
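The cross-layer fusion step in the abstract pairs a shallow, detail-rich feature map with a deeper one through a global-local spatial attention module whose structure is not specified. The sketch below shows one hypothetical formulation: a local descriptor L (3×3 neighbourhood mean) and a global descriptor G (map-wide mean) of the two layers' sum are combined into a per-pixel weight a = sigmoid(w_local·L + w_global·G + bias), which blends the layers as a·S + (1 - a)·D. It assumes the deep map has already been upsampled to the shallow map's size; the weights and function names are illustrative, not taken from the paper.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    """Overflow-safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def local_mean(x: Matrix, radius: int = 1) -> Matrix:
    """Mean over the (2r+1)x(2r+1) neighbourhood, counting only in-bounds pixels."""
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            vals = [x[a][b]
                    for a in range(max(0, i - radius), min(h, i + radius + 1))
                    for b in range(max(0, j - radius), min(w, j + radius + 1))]
            out[i][j] = sum(vals) / len(vals)
    return out


def global_local_fuse(shallow: Matrix, deep: Matrix, w_local: float,
                      w_global: float, bias: float) -> Tuple[Matrix, Matrix]:
    """Blend two same-sized layers with a per-pixel global-local attention map."""
    h, w = len(shallow), len(shallow[0])
    desc = [[shallow[i][j] + deep[i][j] for j in range(w)] for i in range(h)]
    loc = local_mean(desc)                          # local context per pixel
    glob = sum(sum(row) for row in desc) / (h * w)  # one global context value
    attn = [[sigmoid(w_local * loc[i][j] + w_global * glob + bias) for j in range(w)]
            for i in range(h)]
    fused = [[attn[i][j] * shallow[i][j] + (1.0 - attn[i][j]) * deep[i][j]
              for j in range(w)] for i in range(h)]
    return fused, attn


if __name__ == "__main__":
    shallow = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 1.0]]  # fine texture
    deep = [[0.5] * 3 for _ in range(3)]                           # smooth semantics
    fused, attn = global_local_fuse(shallow, deep, 2.0, -2.0, 0.0)
    for row in attn:
        print([round(v, 3) for v in row])
```

Setting `w_global = -w_local`, as in the usage above, turns the gate into a local-versus-global contrast test: pixels whose neighbourhood is more active than the map average lean toward the shallow layer, and the rest lean toward the deep layer.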
