Abstract:In order to solve the limitations of existing infrared and visible images fusion algorithms in preserving pixel-level information and extracting semantic features, an infrared and visible image interactive fusion method based on semantic driven was proposed. First, the image fusion network and the image segmentation network were jointly operated to form a semantic-driven effect, enhancing the retention of information features of the image in both pixel domain and semantic domain. Then, a cross-domain interactive integration module was constructed to capture features of infrared and visible images, allowing for the interactive transfer of features across different spatial locations and independent channels, thereby mapping features from local to global, and enhancing the complementary characteristics of the two types of images. Finally, a semantic loss function was introduced to constrain the network training, preserving the intrinsic semantic features of the source images. Pixel-level fusion experiments and semantic-level segmentation experiments were conducted on multi-band data sets and multi-spectral road scene data sets. These experiment results were then compared with six other advanced fusion algorithms. The results of fusion experiments show that the proposed algorithm achieves improvements of 47.92%, 6.15%, 0.87%, 44.31%, 35.99% and 36.88% across six objective evaluation metrics, including gradient-based similarity measures, information entropy, peak signal-to-noise ratio, spatial frequency, standard deviation and visual fidelity. The results of segmentation experiments indicate that the proposed algorithm outperforms all other evaluation metrics. Therefore, the proposed method exhibits superior performance in both qualitative analysis of subjective visual effects and quantitative indicators of quality evaluation compared to existing algorithms. The fusion images effectively balance both visual quality and high-level semantic tasks, thereby enhancing utility for human visual observation and machine vision perception.