Point-to-set distance metric learning on deep representations for visual tracking

Shengping Zhang, Yuankai Qi, Feng Jiang, Xiangyuan Lan, Pong C. Yuen, Huiyu Zhou

Research output: Contribution to journal › Article

31 Citations (Scopus)

Abstract

For autonomous driving applications, a car must be able to track objects in the scene in order to estimate where and how they will move, so that the tracker embedded in the car can alert it in time for effective collision avoidance. Traditional discriminative object tracking methods usually train a binary classifier via a support vector machine (SVM) scheme to distinguish the target from its background. Despite demonstrated success, the performance of SVM-based trackers is limited because classification depends only on the support vectors (SVs), whereas the target's dynamic appearance may resemble training samples that were not selected as SVs, especially when the training samples are not linearly separable. In such cases, the tracker may drift to the background and eventually lose the target. To address this problem, we propose to integrate point-to-set/image-to-imageSet distance metric learning (DML) into visual tracking and to take full advantage of all the training samples when determining the best target candidate. The point-to-set DML is conducted on convolutional neural network features of the training data extracted from the starting frames. When a new frame arrives, target candidates are first projected to the common subspace using the learned mapping functions, and the candidate with the minimal distance to the target template sets is then selected as the tracking result. Extensive experimental results show that, even without model updates, the proposed method achieves favorable performance on challenging image sequences compared with several state-of-the-art trackers.
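
The candidate-selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the linear maps `W_point` and `W_set`, the function names, and the choice of "Euclidean distance to the nearest projected template" as the point-to-set distance are all assumptions made for illustration, since the abstract does not specify the learned mapping functions or the exact distance.

```python
import math

def project(W, x):
    """Apply an assumed linear map W (a list of rows) to feature vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def point_to_set_distance(point, template_set):
    """Assumed definition: Euclidean distance to the nearest set member."""
    return min(math.dist(point, t) for t in template_set)

def select_candidate(candidates, templates, W_point, W_set):
    """Return (index of best candidate, list of point-to-set distances).

    Candidates and templates are projected into a common subspace with
    separate (hypothetical) maps; the candidate with the smallest
    distance to the projected template set is selected.
    """
    projected_templates = [project(W_set, t) for t in templates]
    distances = [
        point_to_set_distance(project(W_point, c), projected_templates)
        for c in candidates
    ]
    best = min(range(len(candidates)), key=distances.__getitem__)
    return best, distances

# Toy usage with identity maps and 2-D "features" (illustrative values only).
identity = [[1.0, 0.0], [0.0, 1.0]]
templates = [[0.0, 0.0], [1.0, 0.0]]
candidates = [[3.0, 4.0], [1.0, 1.0], [0.5, 0.0]]
best, distances = select_candidate(candidates, templates, identity, identity)
```

In this toy setting every template contributes to the decision, which mirrors the abstract's point that all training samples, not only support vectors, are used when scoring candidates.
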
Original language: English
Pages (from-to): 187-198
Journal: IEEE Transactions on Intelligent Transportation Systems
Volume: 19
Issue number: 1
Early online date: 20 Nov 2017
DOI: 10.1109/TITS.2017.2766093
Publication status: Published - Jan 2018

Cite this

Zhang, Shengping; Qi, Yuankai; Jiang, Feng; Lan, Xiangyuan; Yuen, Pong C.; Zhou, Huiyu. / Point-to-set distance metric learning on deep representations for visual tracking. In: IEEE Transactions on Intelligent Transportation Systems. 2018; Vol. 19, No. 1. pp. 187-198.
@article{f2dfdeab829f46b79011848a88ac1b4c,
title = "Point-to-set distance metric learning on deep representations for visual tracking",
author = "Shengping Zhang and Yuankai Qi and Feng Jiang and Xiangyuan Lan and Yuen, {Pong C.} and Huiyu Zhou",
year = "2018",
month = "1",
doi = "10.1109/TITS.2017.2766093",
language = "English",
volume = "19",
pages = "187--198",
journal = "IEEE Transactions on Intelligent Transportation Systems",
issn = "1524-9050",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "1",

}

Point-to-set distance metric learning on deep representations for visual tracking. / Zhang, Shengping; Qi, Yuankai; Jiang, Feng; Lan, Xiangyuan; Yuen, Pong C.; Zhou, Huiyu.

In: IEEE Transactions on Intelligent Transportation Systems, Vol. 19, No. 1, 01.2018, p. 187-198.

TY - JOUR

T1 - Point-to-set distance metric learning on deep representations for visual tracking

AU - Zhang, Shengping

AU - Qi, Yuankai

AU - Jiang, Feng

AU - Lan, Xiangyuan

AU - Yuen, Pong C.

AU - Zhou, Huiyu

PY - 2018/1

Y1 - 2018/1

U2 - 10.1109/TITS.2017.2766093

DO - 10.1109/TITS.2017.2766093

M3 - Article

VL - 19

SP - 187

EP - 198

JO - IEEE Transactions on Intelligent Transportation Systems

JF - IEEE Transactions on Intelligent Transportation Systems

SN - 1524-9050

IS - 1

ER -