Dynamic gating diffusion denoising and cross-layer attention-based multimodal image fusion network
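Authors:

DI Jing, HUO Jingjing, WANG Heran, LIU Jizhao, LIAN Jing

Affiliations:

(1. School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China; 2. School of Information Science and Engineering, Lanzhou University, Lanzhou 730000, China)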

About the author:

DI Jing (born 1979), female, associate professor, master's supervisor

Corresponding author:

HUO Jingjing, Hbingcheng@126.com

CLC number:

TP391

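Fund projects:

Natural Science Foundation of Gansu Province (24JRRA231); National Natural Science Foundation of China (62061023); Key Research and Development Program of the Gansu Provincial Science and Technology Plan (24YFFA024)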



    Abstract:

    Denoising diffusion models used for image fusion struggle to adapt to varying noise levels, and conventional residual blocks have limited ability to select features. To address these problems, this paper builds a multimodal image fusion network that combines dynamic gating diffusion denoising with cross-layer attention. First, four groups of expert convolution kernels are incorporated into a dynamic feature extractor module, which combines them into an optimal kernel according to the input content so that input features are processed adaptively. Second, an improved gated feature selection module generates gating signals that suppress irrelevant information; this strengthens the model's diffusion denoising capability under different noise levels and gives precise control over the features. Finally, R-Transformer blocks adjust the features, and a global-local spatial attention module performs cross-layer feature fusion, producing fused images with rich texture and high color fidelity. Experiments on the MSRS, RoadScene, and Harvard datasets show that, compared with nine representative image fusion methods from recent years, the proposed method improves seven objective evaluation metrics by 5.11% to 15.93% on average. The proposed method also preserves texture detail and anatomical structure better than the compared methods, is consistent with human visual perception, and handles multimodal image fusion well both in scenes with varied lighting conditions and in medical image diagnosis.
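The first two steps of the abstract can be illustrated with a small single-channel sketch. It is not the authors' dynamic feature extractor or gated feature selection module, whose exact design is not given here; it only shows the generic pattern those descriptions suggest, under these assumptions: four 3×3 expert kernels, a hypothetical router that maps the global mean of the input to one logit per expert, a softmax that turns the logits into mixing weights, and a sigmoid gate in the style of gated convolution computed with a separate, hypothetical gate kernel. All function names and parameter values are illustrative.

```python
# Illustrative sketch only: NOT the paper's modules. Router and gate parameters are hypothetical.
import math
from typing import List

Matrix = List[List[float]]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: weights are positive and sum to 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def routing_weights(img: Matrix, router_w: List[float], router_b: List[float]) -> List[float]:
    """Hypothetical router: global average pooling, one affine logit per expert, softmax."""
    mean = sum(sum(row) for row in img) / (len(img) * len(img[0]))
    return softmax([w * mean + b for w, b in zip(router_w, router_b)])


def aggregate_kernel(experts: List[Matrix], alphas: List[float]) -> Matrix:
    """Input-dependent kernel: element-wise sum of alpha_k * expert_k."""
    size = len(experts[0])
    return [[sum(a * e[i][j] for a, e in zip(alphas, experts)) for j in range(size)]
            for i in range(size)]


def conv2d_same(img: Matrix, kernel: Matrix) -> Matrix:
    """Stride-1 cross-correlation with zero padding; output size equals input size."""
    h, w, k = len(img), len(img[0]), len(kernel)
    pad = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    yy, xx = y + i - pad, x + j - pad
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[i][j] * img[yy][xx]
            out[y][x] = acc
    return out


def dynamic_gated_block(img, experts, router_w, router_b, gate_kernel):
    """Dynamic convolution, then a sigmoid gate scaling each response by a factor in [0, 1]."""
    alphas = routing_weights(img, router_w, router_b)
    features = conv2d_same(img, aggregate_kernel(experts, alphas))
    gate_logits = conv2d_same(img, gate_kernel)
    gated = [[f * sigmoid(g) for f, g in zip(f_row, g_row)]
             for f_row, g_row in zip(features, gate_logits)]
    return gated, alphas


if __name__ == "__main__":
    identity = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
    box = [[1.0 / 9] * 3 for _ in range(3)]
    sobel_x = [[-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0], [-1.0, 0.0, 1.0]]
    sobel_y = [[-1.0, -2.0, -1.0], [0.0, 0.0, 0.0], [1.0, 2.0, 1.0]]
    experts = [identity, box, sobel_x, sobel_y]

    img = [[0.1 * (r + c) for c in range(4)] for r in range(4)]
    out, alphas = dynamic_gated_block(img, experts, router_w=[1.0, 0.5, -0.5, 0.0],
                                      router_b=[0.0] * 4, gate_kernel=identity)
    print("expert weights:", [round(a, 3) for a in alphas])
    print("gated output row 0:", [round(v, 3) for v in out[0]])
```

Because the softmax weights are positive and sum to 1, the aggregated kernel is a convex combination of the experts, so a different input only changes which experts dominate. The gate then scales every response by a factor between 0 and 1, which is one simple way to suppress irrelevant responses. A real network would operate on multi-channel tensors with learned router and gate parameters.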

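Cite this article

DI Jing, HUO Jingjing, WANG Heran, LIU Jizhao, LIAN Jing. Dynamic gating diffusion denoising and cross-layer attention-based multimodal image fusion network[J]. Journal of Harbin Institute of Technology, 2026, 58(5): 33. DOI: 10.11918/202507016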

History
  • Received: 2025-07-08
  • Published online: 2026-05-28