To intelligently analyze and understand video content, a key step is to accurately perceive the motion of the objects of interest in videos. To this end, the task of object tracking, which aims to determine the position and status of the object of interest in consecutive video frames, is very important and has received great research interest in the last decade. Although numerous algorithms have been proposed for object tracking in RGB videos, most of them may fail to track the object when the information from the RGB video is unreliable (e.g., in dim environments or under large illumination changes). To address this issue, and motivated by the popularity of dual-camera systems that capture RGB and infrared videos, this paper presents a feature representation and fusion model that combines the feature representations of the object in the RGB and infrared modalities for object tracking. Specifically, the proposed model is able to (1) perform feature representation of objects in different modalities by leveraging the robustness of sparse representation, and (2) combine the representations by exploiting the modality correlation. Extensive experiments demonstrate the effectiveness of the proposed method.
Bibliographical note
Funding Information:
This work is partially supported by Hong Kong RGC General Research Fund HKBU 12254316. The work of H. Zhou was supported in part by UK EPSRC under Grant EP/N508664/1, Grant EP/R007187/1, and Grant EP/N011074/1 and in part by the Royal Society-Newton Advanced Fellowship under Grant NA160342.
© 2018 Elsevier B.V.
Copyright 2020 Elsevier B.V., All rights reserved.
ASJC Scopus subject areas
- Signal Processing
- Computer Vision and Pattern Recognition
- Artificial Intelligence