Research Objective
To compare the influence of adding 'top-down' and 'bottom-up' saliency maps in a CNN framework for classifying images of Mexican cultural architecture into 67 classes, and to propose a bootstrap strategy for building visual saliency maps from gaze fixations.
Research Findings
The COSAL model, based on gaze fixations and co-saliency propagation, achieved the highest test accuracy, 88.80±0.40%, outperforming the GBVS and SMUIC models. This suggests that top-down saliency maps built from human visual attention are more effective for CNN-based classification of architectural images than purely automatic (bottom-up) methods. The bootstrap strategy allows the approach to scale to large datasets with minimal manual annotation.
Research Limitations
The homography estimation used for saliency propagation may fail under strong perspective changes or keypoint mismatches, potentially producing incorrect saliency maps. The psycho-visual experiment covers only a small subset of images and a specific group of subjects, so its results may not generalize. The saliency pooling layer introduces non-deterministic behavior, requiring repeated test runs to obtain reliable results.
1: Experimental Design and Method Selection:
The study compares three types of saliency maps — the bottom-up GBVS and SMUIC models and the top-down, gaze-based COSAL model — integrated into a Deep CNN (AlexNet architecture) for image classification. The methodology comprises saliency map generation, propagation across the dataset, and insertion into the network's pooling layers to improve training efficiency and accuracy.
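The paper does not give the exact pooling formulation, but one common way to insert a saliency map into a pooling layer — and one consistent with the non-determinism noted in the limitations — is stochastic pooling weighted by saliency: within each pooling window, an activation is sampled with probability proportional to its saliency. The sketch below is a hypothetical NumPy illustration of that idea, not the authors' implementation; the function name and 2×2 window size are assumptions.

```python
import numpy as np

def saliency_stochastic_pool(feat, sal, k=2, rng=None):
    """Hypothetical sketch: k-by-k stochastic pooling in which, inside
    each window, one activation is sampled with probability proportional
    to the corresponding saliency value (uniform when the window's
    saliency sums to zero).  Not the paper's exact layer."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = feat.shape
    out = np.empty((h // k, w // k))
    for i in range(0, h - h % k, k):
        for j in range(0, w - w % k, k):
            f = feat[i:i + k, j:j + k].ravel()
            s = sal[i:i + k, j:j + k].ravel().astype(float)
            # sampling distribution over the window, driven by saliency
            p = s / s.sum() if s.sum() > 0 else np.full(s.size, 1 / s.size)
            out[i // k, j // k] = f[rng.choice(s.size, p=p)]
    return out
```

Because the pooled value is sampled rather than taken deterministically, two forward passes over the same input can differ — which is why the evaluation protocol below averages over many test runs.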
2: Sample Selection and Data Sources:
The dataset consists of 5,327 images from the Mex-Culture Database, categorized into 67 classes of Mexican architectural structures. A subset of 284 images was used for psycho-visual experiments to record gaze fixations.
3: List of Experimental Equipment and Materials:
An eye-tracker system for recording gaze fixations, computer systems for image processing and CNN training, software for saliency map generation (e.g., GBVS, and the LSD algorithm for SMUIC), and the AlexNet model.
4: Experimental Procedures and Operational Workflow:
Conduct psycho-visual experiments to record gaze fixations on reference images; generate subjective saliency maps from the fixations; propagate saliency to the remaining images using SIFT keypoint matching and homography estimation with RANSAC; train AlexNet with saliency maps inserted into the pooling layers; apply data augmentation (rotations and flips) to the training set; evaluate the models on the validation and test sets.
5: Data Analysis Methods:
Accuracy and its standard deviation are computed over 1,000 test runs of the best model for each saliency-map type, to account for the randomness introduced by the saliency pooling layer.
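The reporting convention amounts to a mean ± sample standard deviation over repeated stochastic evaluations. A minimal sketch, with simulated accuracies standing in for real test runs (the figures below are illustrative, not the paper's data):

```python
import numpy as np

def summarize_accuracy(accs):
    """Mean and sample standard deviation of repeated evaluations."""
    accs = np.asarray(accs, dtype=float)
    return accs.mean(), accs.std(ddof=1)

# Simulated stand-in for 1,000 test runs of a stochastic model
rng = np.random.default_rng(42)
runs = rng.normal(loc=88.8, scale=0.4, size=1000)
mean, std = summarize_accuracy(runs)
print(f"{mean:.2f} ± {std:.2f} %")
```

Averaging over many runs is needed precisely because a stochastic pooling layer makes single-run accuracy an unreliable point estimate.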