TY - JOUR
T1 - Weakly Supervised Salient Object Detection with Spatiotemporal Cascade Neural Networks
AU - Tang, Yi
AU - Zou, Wenbin
AU - Jin, Zhi
AU - Chen, Yuhuan
AU - Hua, Yang
AU - Li, Xia
PY - 2018/07/25
Y1 - 2018/07/25
N2 - Recently, deep learning techniques have substantially
boosted the performance of salient object detection in
still images. However, salient object detection in videos,
whether with traditional handcrafted features or deep learning
features, has not been fully investigated, probably due to the
lack of sufficient manually labeled video data for saliency
modeling, especially for data-driven deep learning. This paper
proposes a novel weakly supervised approach to salient object
detection in video, which can learn a robust saliency prediction
model by using very limited manually labeled data and a large
amount of weakly labeled data that could be easily generated in
a supervised approach. Furthermore, we propose a spatiotemporal
cascade neural network (SCNN) architecture for saliency modeling,
in which two fully convolutional networks are cascaded to evaluate
visual saliency from both spatial and temporal cues, leading to
the optimal video saliency prediction. The proposed approach
is extensively evaluated on widely used challenging datasets,
and the experiments demonstrate that it substantially outperforms
state-of-the-art salient object detection models.
AB - Recently, deep learning techniques have substantially
boosted the performance of salient object detection in
still images. However, salient object detection in videos,
whether with traditional handcrafted features or deep learning
features, has not been fully investigated, probably due to the
lack of sufficient manually labeled video data for saliency
modeling, especially for data-driven deep learning. This paper
proposes a novel weakly supervised approach to salient object
detection in video, which can learn a robust saliency prediction
model by using very limited manually labeled data and a large
amount of weakly labeled data that could be easily generated in
a supervised approach. Furthermore, we propose a spatiotemporal
cascade neural network (SCNN) architecture for saliency modeling,
in which two fully convolutional networks are cascaded to evaluate
visual saliency from both spatial and temporal cues, leading to
the optimal video saliency prediction. The proposed approach
is extensively evaluated on widely used challenging datasets,
and the experiments demonstrate that it substantially outperforms
state-of-the-art salient object detection models.
U2 - 10.1109/TCSVT.2018.2859773
DO - 10.1109/TCSVT.2018.2859773
M3 - Article
SN - 1051-8215
JO - IEEE Transactions on Circuits and Systems for Video Technology
JF - IEEE Transactions on Circuits and Systems for Video Technology
ER -