Abstract: To address sparse keypoint features in weakly textured indoor environments, insufficient use of structural features in structured scenes, and keyframe tracking failures during rapid camera motion, a stereo visual-inertial SLAM method that fuses point and line features is proposed. First, the EDLines line segment detector is combined with a Gaussian image pyramid to extract line segments at multiple scales, which improves the scale invariance of line segment matching. The uncertainty of line segment endpoints at each scale is also modeled, and the binary line segment descriptors are partitioned with a tiling scheme to accelerate matching, improving both the robustness and the efficiency of line feature matching. Second, the pre-integration model of the inertial sensor is optimized, and a sliding-window nonlinear optimization jointly minimizes the stereo point feature reprojection error, the line feature reprojection error, and the inertial pre-integration constraints, thereby improving the system’s pose estimation accuracy. Finally, extensive experiments are conducted on the EuRoC dataset, which includes challenging conditions such as low texture, structured scenes, and rapid camera motion. The proposed method achieves a root mean square error of 0.031 m and a mean error of 0.027 m on the EuRoC dataset, demonstrating stronger robustness and higher localization accuracy; the accuracy advantage is most pronounced in low-texture and rapid-motion scenarios.
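
As a rough illustration of the sliding-window optimization described in the abstract, a joint cost of the kind commonly used in tightly coupled visual-inertial systems can be written as below. This is a sketch under stated assumptions, not the method's exact formulation: the symbols $\mathcal{X}$ (states in the window), $\mathcal{B}$ (consecutive frame pairs with pre-integrated inertial measurements), $\mathcal{P}$ and $\mathcal{L}$ (point and line observations), the residuals $\mathbf{r}$, the covariances $\boldsymbol{\Sigma}$, and the robust kernel $\rho$ are all assumed notation, and additional terms such as a marginalization prior are omitted.

```latex
% Assumed notation: \|\mathbf{r}\|^{2}_{\boldsymbol{\Sigma}} = \mathbf{r}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{r}
\min_{\mathcal{X}} \;
\sum_{k \in \mathcal{B}}
  \left\| \mathbf{r}_{\mathcal{I}}\!\left(\hat{\mathbf{z}}_{k,k+1}, \mathcal{X}\right) \right\|^{2}_{\boldsymbol{\Sigma}_{k,k+1}}
+ \sum_{(i,j) \in \mathcal{P}}
  \rho\!\left( \left\| \mathbf{r}_{p}\!\left(\hat{\mathbf{z}}^{p}_{ij}, \mathcal{X}\right) \right\|^{2}_{\boldsymbol{\Sigma}^{p}_{ij}} \right)
+ \sum_{(i,l) \in \mathcal{L}}
  \rho\!\left( \left\| \mathbf{r}_{l}\!\left(\hat{\mathbf{z}}^{l}_{il}, \mathcal{X}\right) \right\|^{2}_{\boldsymbol{\Sigma}^{l}_{il}} \right)
```

Here the first sum collects the inertial pre-integration residuals, the second the stereo point reprojection residuals, and the third the line reprojection residuals. In a formulation of this shape, the per-scale endpoint uncertainty mentioned in the abstract could plausibly enter through the line covariance $\boldsymbol{\Sigma}^{l}_{il}$, although the abstract does not state how it is used.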