Person re-identification involves recognizing a person across non-overlapping camera views, under differences in pose, illumination, and camera characteristics. We propose to tackle this problem by training a deep convolutional network to represent a person’s appearance as a low-dimensional feature vector that is invariant to the common appearance variations encountered in re-identification. Specifically, a Siamese-network architecture is used to train a feature extraction network on pairs of similar and dissimilar images. We show that a novel multi-task learning objective is crucial for regularizing the network parameters and preventing over-fitting caused by the small size of the training dataset. We complement the verification task, which is at the heart of re-identification, by training the network to jointly perform verification and identification and to recognise attributes related to the clothing and pose of the person in each image. Additionally, we show that the proposed approach performs well even in the challenging cross-dataset scenario, which may better reflect expected real-world performance.
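The joint objective described above can be illustrated with a minimal sketch. The abstract does not give the loss functions or their weights, so everything here is an assumption for illustration: a contrastive loss stands in for verification, softmax cross-entropy for identification, and binary cross-entropy for attributes. The function names, the dictionary keys, the margin, and the weights `w_verif`, `w_id`, and `w_attr` are all hypothetical.

```python
import math

def contrastive_loss(feat_a, feat_b, same, margin=1.0):
    """Verification term (assumed form): pull matching pairs together,
    push non-matching pairs at least `margin` apart."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(feat_a, feat_b)))
    return dist ** 2 if same else max(0.0, margin - dist) ** 2

def softmax_cross_entropy(logits, label):
    """Identification term (assumed form): cross-entropy over identity classes."""
    top = max(logits)
    log_z = top + math.log(sum(math.exp(x - top) for x in logits))
    return log_z - logits[label]

def attribute_loss(logits, targets):
    """Attribute term (assumed form): mean binary cross-entropy,
    computed from logits in a numerically stable way."""
    total = 0.0
    for x, t in zip(logits, targets):
        total += max(x, 0.0) - x * t + math.log1p(math.exp(-abs(x)))
    return total / len(logits)

def multi_task_loss(pair, w_verif=1.0, w_id=1.0, w_attr=1.0):
    """Weighted sum of the three task losses for one image pair.
    The weights are hypothetical hyperparameters, not values from the paper."""
    verif = contrastive_loss(pair["feat_a"], pair["feat_b"], pair["same"])
    ident = sum(
        softmax_cross_entropy(pair["id_logits_" + k], pair["id_" + k]) for k in "ab"
    )
    attr = sum(
        attribute_loss(pair["attr_logits_" + k], pair["attr_" + k]) for k in "ab"
    )
    return w_verif * verif + w_id * ident + w_attr * attr
```

In this sketch both images of a pair contribute identification and attribute terms, while the verification term is computed once per pair from the two feature vectors. The auxiliary terms play the regularizing role the abstract attributes to the multi-task objective.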
|Journal|IEEE Transactions on Circuits and Systems for Video Technology
|Early online date|20 Oct 2016
|Published - Mar 2017