Robust Multimodal Person Identification With Limited Training Data

Niall McLaughlin, Ming Ji, Danny Crookes

Research output: Contribution to journal › Article › peer-review

16 Citations (Scopus)
307 Downloads (Pure)


This paper presents a novel method of audio-visual feature-level fusion for person identification where both the speech and facial modalities may be corrupted, and there is a lack of prior knowledge about the corruption. Furthermore, we assume there is only a limited amount of training data for each modality (e.g., a short training speech segment and a single training facial image for each person). A new multimodal feature representation and a modified cosine similarity are introduced to combine and compare bimodal features with limited training data, as well as vastly differing data rates and feature sizes. Optimal feature selection and multicondition training are used to reduce the mismatch between training and testing, thereby making the system robust to unknown bimodal corruption. Experiments have been carried out on a bimodal dataset created from the SPIDRE speaker recognition database and the AR face recognition database, with variable noise corruption of the speech and occlusion in the face images. The system's speaker identification performance on the SPIDRE database, and its facial identification performance on the AR database, are comparable with the literature. Combining both modalities using the new method of multimodal fusion leads to significantly improved accuracy over the unimodal systems, even when both modalities have been corrupted. The new method also shows improved identification accuracy compared with bimodal systems based on multicondition model training or missing-feature decoding alone.
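
As a rough illustration of what feature-level fusion followed by a similarity comparison can look like, the following minimal sketch normalizes each modality's feature vector, concatenates the two, and picks the enrolled identity with the highest score. It is not the paper's method: it uses the standard cosine similarity rather than the modified one introduced in the paper, and the function names (`l2_normalize`, `fuse`, `identify`), the per-modality normalization, and the toy feature values are all assumptions made for this example.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def fuse(speech_feat, face_feat):
    """Feature-level fusion: normalize each modality, then concatenate.

    Normalizing per modality keeps the larger feature vector from
    dominating purely because of its size (an illustrative choice,
    not the representation defined in the paper).
    """
    return l2_normalize(speech_feat) + l2_normalize(face_feat)

def cosine_similarity(a, b):
    """Standard cosine similarity (not the paper's modified variant)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def identify(test_fused, enrolled):
    """Return the enrolled identity whose fused template is most similar."""
    return max(enrolled, key=lambda name: cosine_similarity(test_fused, enrolled[name]))

# Toy data: hypothetical 3-dim speech and 4-dim face features per person.
enrolled = {
    "alice": fuse([1.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]),
    "bob": fuse([0.0, 1.0, 0.0], [0.0, 0.0, 1.0, 0.0]),
}

# The probe's speech features are ambiguous between the two people
# (as if corrupted by noise), while its face features are fairly clean.
probe = fuse([0.6, 0.6, 0.1], [0.1, 0.9, 0.2, 0.0])
best_match = identify(probe, enrolled)
print(best_match)
```

In this toy case the speech features alone cannot separate the two identities, so the decision rests on the face half of the fused vector. That is the basic motivation for combining modalities when either one may be corrupted.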
Original language: English
Article number: 6461532
Pages (from-to): 214-224
Number of pages: 11
Journal: IEEE Transactions on Human Machine Systems
Issue number: 2
Early online date: 13 Feb 2013
Publication status: Published - Mar 2013

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Human Factors and Ergonomics
  • Signal Processing
  • Artificial Intelligence
  • Control and Systems Engineering
  • Human-Computer Interaction
  • Computer Science Applications

