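Semantic-driven interactive fusion of infrared and visible images
Affiliation:

(1. School of Electrical Engineering, Xinjiang University, Urumqi 830017, China; 2. School of Intelligence Science and Technology, Xinjiang University, Urumqi 830017, China)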

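Author biographies:

Wang Jinchun (王瑾春, b. 2000), female, master's student; Ma Ping (马萍, b. 1994), female, associate professor, doctoral supervisor

Corresponding author:

Ma Ping, maping@xju.edu.cn

CLC number:

TN911.73

Funding:

Natural Science Foundation of Xinjiang Uygur Autonomous Region (2022D01C7,3D01C187); "Tianshan Talents" Training Program (2023TSYCQNTJ0,3TSYCCX0037)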


    Abstract:

    To address the insufficient pixel-information retention and semantic-feature extraction of existing infrared and visible image fusion algorithms, a semantic-driven interactive fusion algorithm for infrared and visible images is proposed. First, an image fusion network and an image segmentation network are operated jointly to form a semantic-driven effect, so that image information is better preserved in both the pixel domain and the semantic domain. Then, a cross-domain interactive integration module is constructed to capture infrared and visible image features; it allows features to be exchanged across different spatial locations and independent channels, maps features from local to global, and strengthens the complementarity of the two image types. Finally, a semantic loss function is introduced to constrain network training and preserve the intrinsic semantic features of the source images. Image fusion and segmentation experiments are conducted on a multi-band image dataset and a multi-spectral road scene dataset, and the results are compared with six other advanced fusion algorithms. The fusion experiments show that the proposed algorithm improves six objective evaluation metrics (gradient-based similarity measure, information entropy, peak signal-to-noise ratio, spatial frequency, standard deviation, and visual fidelity) by an average of 47.92%, 6.15%, 0.87%, 44.31%, 35.99%, and 36.88%, respectively. The segmentation experiments show that the proposed algorithm achieves the best result on every evaluation metric. Overall, the proposed algorithm outperforms existing fusion algorithms in both qualitative analysis of subjective visual quality and quantitative objective metrics. The fused images balance visual quality with high-level semantic tasks and better serve both human visual observation and machine vision perception.
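Four of the six objective metrics named in the abstract (information entropy, spatial frequency, standard deviation, and peak signal-to-noise ratio) have simple closed forms. The sketch below shows one common way to compute them for 8-bit grayscale images with NumPy. It is an illustration only: the function names are hypothetical, the exact definitions used in the paper are not given on this page and may differ (for example, fusion studies often average PSNR over both source images, and some spatial-frequency variants normalize by the full image size), and the gradient-based similarity measure and visual fidelity are omitted because they need more elaborate definitions.

```python
import numpy as np

def entropy(img):
    """Shannon entropy (in bits) of the gray-level histogram of a uint8 image."""
    hist = np.bincount(img.ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()
    p = p[p > 0]  # skip empty bins so log2 is defined
    return float(-(p * np.log2(p)).sum())

def spatial_frequency(img):
    """Spatial frequency: combined RMS of horizontal and vertical differences."""
    x = img.astype(np.float64)
    rf = np.sqrt(np.mean((x[:, 1:] - x[:, :-1]) ** 2))  # row (horizontal) term
    cf = np.sqrt(np.mean((x[1:, :] - x[:-1, :]) ** 2))  # column (vertical) term
    return float(np.sqrt(rf ** 2 + cf ** 2))

def standard_deviation(img):
    """Standard deviation of pixel intensities, a proxy for contrast."""
    return float(img.astype(np.float64).std())

def psnr(fused, ref, peak=255.0):
    """PSNR (dB) of a fused image against a single reference image.

    Assumption: one reference only; averaging over the infrared and
    visible sources would be done by the caller.
    """
    diff = fused.astype(np.float64) - ref.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(peak ** 2 / mse))
```

Under these definitions, higher entropy, spatial frequency, and standard deviation indicate a fused image with more information, sharper detail, and stronger contrast, while higher PSNR indicates less distortion relative to the reference image.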

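Cite this article

Wang Jinchun, Ma Ping, Zhang Hongli, Wang Cong, Yuan Ru. Semantic-driven interactive fusion of infrared and visible images[J]. Journal of Harbin Institute of Technology, 2025, 57(9): 56. DOI:10.11918/202406056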

History
  • Received: 2024-06-24
  • Published online: 2025-09-15
  • Publication date: 2025-09-10