Abstract: Visual attention modeling is a key technique for predicting how human attention is distributed while observing a scene, and it is widely used in the field of computer vision. Traditional visual attention models focus on human eye fixation points and compute saliency maps that reflect eye movement information, but they cannot reflect the semantic information perceived by the brain. To solve this problem, a visual attention model based on extracted semantic features was proposed. First, the eye tracking database VOC2012-E was established to record and study the eye movement data of human observers viewing natural scenes. Then, inspired by image semantic segmentation, a Fully Convolutional Network (FCN) was used to extract semantic object features. To extract these features more effectively, the FCN-8s network was improved with the PReLU activation function and the Adam optimizer, so as to mimic the brain's perception of semantic object features. Next, 28 low-level features, such as orientation, color, and intensity, were extracted; these features attract attention at the subconscious level. Finally, a Support Vector Machine (SVM) was used to map the extracted semantic object features and low-level features into the human visual space. Real eye movement data was used for supervised training, yielding a visual attention model that can predict the human visual saliency map. Experimental results showed that the proposed visual attention model had better performance and biological advantages over eight classical models and four advanced models on the VOC2012-E and MIT300 databases.
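
As a rough illustration of the final stage described above, the following minimal sketch trains a linear SVM on per-pixel feature vectors labeled by whether the pixel was fixated, and then uses the signed decision value as a saliency score. It is a toy under stated assumptions, not the authors' implementation: the abstract does not specify the SVM kernel or solver, so a linear model trained by subgradient descent on the hinge loss is assumed here; the two-dimensional feature vectors (one semantic score and one low-level contrast value standing in for the full feature set), the toy data, the hyperparameters, and the names `train_linear_svm` and `saliency_score` are all hypothetical.

```python
# Illustrative sketch only: maps per-pixel features to a saliency score
# with a linear SVM. All names, data, and hyperparameters are assumptions.


def train_linear_svm(samples, labels, epochs=200, lr=0.1, reg=0.01):
    """Fit weights w and bias b by subgradient descent on the
    L2-regularized hinge loss. Labels are +1 (fixated) or -1 (not fixated)."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Shrink the weights (gradient of the L2 penalty).
            w = [wi * (1.0 - lr * reg) for wi in w]
            # Hinge loss is active only when the margin is below 1.
            if margin < 1.0:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def saliency_score(w, b, x):
    """Signed distance-like decision value; larger means more salient."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


# Toy training set: [semantic_object_score, low_level_contrast] per pixel.
features = [
    [0.9, 0.8], [0.8, 0.6], [0.7, 0.9],   # pixels that observers fixated
    [0.1, 0.2], [0.2, 0.1], [0.3, 0.3],   # pixels that were not fixated
]
fixated = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(features, fixated)

# Score two unseen pixels: one object-like, one background-like.
fixated_score = saliency_score(w, b, [0.85, 0.7])
background_score = saliency_score(w, b, [0.15, 0.2])
print(fixated_score > background_score)
```

Evaluating `saliency_score` at every pixel and rescaling the values to a fixed range would give a dense map in the spirit of the predicted saliency map; how the paper normalizes or post-processes its output is not stated in the abstract.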