Research Objective
To address the limitation of average pooling in temporal action proposal methods, which ignores the varying significance of different video clips, by introducing a Temporal Attention Network (TAN) model.
Research Findings
The proposed TAN model outperforms existing state-of-the-art methods in temporal action proposal, demonstrating the effectiveness of temporal attention in feature aggregation. The learned attention weights provide meaningful insights into the significance of different video clips.
Limitations
The method's efficiency and effectiveness are demonstrated only on a single dataset (THUMOS-14); its generalizability to other datasets or real-world scenarios is not explored.
1: Experimental Design and Method Selection:
The TAN model is designed with two cascaded attention blocks for video clip feature aggregation.
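The summary does not give the exact formulation of the attention blocks, but the core idea, replacing average pooling with learned, cascaded attention over clip features, can be sketched as follows. The scoring vector `w1` and matrix `W2` stand in for learned parameters; the two-block cascade, where the second block conditions its scores on the first block's summary, is an assumption about the architecture, not the paper's verified design.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(clip_feats, w):
    """One attention block: weigh clips by relevance instead of averaging.

    clip_feats: (T, D) array of per-clip features.
    w: (D,) scoring vector (a stand-in for a learned attention layer).
    Returns the attention-weighted summary (D,) and the weights (T,).
    """
    scores = clip_feats @ w        # (T,) unnormalized relevance scores
    alpha = softmax(scores)        # attention weights, sum to 1
    return alpha @ clip_feats, alpha

def cascaded_attention(clip_feats, w1, W2):
    """Two cascaded blocks: the second block's scores are conditioned
    on the summary produced by the first (assumed cascade structure)."""
    g, _ = attention_pool(clip_feats, w1)  # first-block summary
    scores = clip_feats @ (W2 @ g)         # second-block scores use g
    alpha = softmax(scores)
    return alpha @ clip_feats, alpha
```

Unlike average pooling, which assigns every clip the weight 1/T, the attention weights `alpha` let a clip containing the action dominate the aggregated feature.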
2: Sample Selection and Data Sources:
The THUMOS-14 dataset is used, containing 200 videos in the validation set and 213 videos in the test set from 20 action classes.
3: List of Experimental Equipment and Materials:
An Nvidia GeForce 1080 GPU is used for training and evaluation.
4: Experimental Procedures and Operational Workflow:
Videos are divided into clips, features are extracted for each clip, and candidate segments at multiple temporal scales are generated to form the pyramid. Temporal attention is then applied to weigh and combine the clip features within each segment.
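The segment-generation step of the workflow above can be sketched as enumerating windows of several temporal scales over the clip sequence. The specific scales and the 50% stride overlap below are illustrative assumptions; the paper's actual pyramid configuration is not given in this summary.

```python
def pyramid_segments(num_clips, scales=(1, 2, 4, 8)):
    """Enumerate candidate segments at multiple temporal scales.

    Each segment is a (start, end) clip-index range. The scale set and
    the 50%-overlap stride are assumptions for illustration.
    """
    segments = []
    for s in scales:
        stride = max(1, s // 2)  # 50% overlap between neighboring windows
        for start in range(0, num_clips - s + 1, stride):
            segments.append((start, start + s))
    return segments
```

Each resulting segment would then be scored as an action proposal after its clip features are aggregated by the attention blocks.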
5: Data Analysis Methods:
Performance is evaluated using AR-AN curves (average recall versus average number of proposals) and mean Average Precision (mAP) for action localization.
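The AR side of the AR-AN metric can be sketched as follows: for the top-AN proposals, recall is computed at a range of temporal-IoU thresholds and averaged. The threshold range 0.5 to 0.95 in steps of 0.05 is the convention commonly used for this metric and is assumed here, not taken from the summary.

```python
import numpy as np

def temporal_iou(a, b):
    """Intersection-over-union of two (start, end) temporal intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = max(a[1], b[1]) - min(a[0], b[0])
    return inter / union if union > 0 else 0.0

def average_recall(proposals, ground_truth, an,
                   iou_thresholds=np.arange(0.5, 1.0, 0.05)):
    """Average recall of the top-AN proposals over IoU thresholds.

    proposals: list of (start, end), assumed sorted by confidence.
    ground_truth: list of (start, end) annotated action instances.
    """
    top = proposals[:an]
    recalls = []
    for t in iou_thresholds:
        hits = sum(any(temporal_iou(g, p) >= t for p in top)
                   for g in ground_truth)
        recalls.append(hits / len(ground_truth))
    return float(np.mean(recalls))
```

Sweeping `an` over a range of proposal counts and plotting `average_recall` against it yields the AR-AN curve used to compare proposal methods.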