TY - GEN
T1 - Self-supervised multi-object tracking with cycle-consistency
AU - Yin, Yuanhang
AU - Hua, Yang
AU - Song, Tao
AU - Ma, Ruhui
AU - Guan, Haibing
PY - 2023/3/31
Y1 - 2023/3/31
N2 - Multi-object tracking is a challenging video task that requires both locating the objects in each frame and associating them across frames, and it is usually approached with the tracking-by-detection paradigm. Supervised multi-object tracking methods have made stunning progress recently; however, the expensive annotation costs for bounding boxes and track ID labels limit the robustness and generalization ability of these models. In this paper, we learn a novel multi-object tracker using only unlabeled videos by designing a self-supervisory learning signal for an association model. Specifically, inspired by the cycle-consistency used in video correspondence learning, we propose to track the objects forwards and backwards, i.e., each detection in the first frame is supposed to be matched with itself after the forward-backward tracking. We utilize this cycle-consistency as the self-supervisory learning signal for our proposed multi-object tracker. Experiments conducted on the MOT17 dataset show that our model is effective in extracting discriminative association features, and our tracker achieves competitive performance compared to other trackers using the same pre-generated detections, including UNS20 [1], Tracktor++ [2], FAMNet [8], and CenterTrack [31].
AB - Multi-object tracking is a challenging video task that requires both locating the objects in each frame and associating them across frames, and it is usually approached with the tracking-by-detection paradigm. Supervised multi-object tracking methods have made stunning progress recently; however, the expensive annotation costs for bounding boxes and track ID labels limit the robustness and generalization ability of these models. In this paper, we learn a novel multi-object tracker using only unlabeled videos by designing a self-supervisory learning signal for an association model. Specifically, inspired by the cycle-consistency used in video correspondence learning, we propose to track the objects forwards and backwards, i.e., each detection in the first frame is supposed to be matched with itself after the forward-backward tracking. We utilize this cycle-consistency as the self-supervisory learning signal for our proposed multi-object tracker. Experiments conducted on the MOT17 dataset show that our model is effective in extracting discriminative association features, and our tracker achieves competitive performance compared to other trackers using the same pre-generated detections, including UNS20 [1], Tracktor++ [2], FAMNet [8], and CenterTrack [31].
KW - Cycle-consistency
KW - Multi-object Tracking
KW - Self-supervised learning
U2 - 10.1007/978-3-031-27818-1_40
DO - 10.1007/978-3-031-27818-1_40
M3 - Conference contribution
AN - SCOPUS:85152558842
SN - 9783031278174
VL - 13834
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 483
EP - 495
BT - MultiMedia Modeling - 29th International Conference, MMM 2023, Proceedings
A2 - Dang-Nguyen, Duc-Tien
A2 - Gurrin, Cathal
A2 - Smeaton, Alan F.
A2 - Larson, Martha
A2 - Rudinac, Stevan
A2 - Dao, Minh-Son
A2 - Trattner, Christoph
A2 - Chen, Phoebe
PB - Springer Science and Business Media Deutschland GmbH
T2 - 29th International Conference on MultiMedia Modeling, MMM 2023
Y2 - 9 January 2023 through 12 January 2023
ER -