Robust bimodal person identification using face and speech with limited training data and corruption of both modalities

N. McLaughlin, Ming Ji, D. Crookes

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Abstract

This paper presents a novel method of audio-visual fusion for person identification where both the speech and facial modalities may be corrupted, and there is a lack of prior knowledge about the corruption. Furthermore, we assume there is a limited amount of training data for each modality (e.g., a short training speech segment and a single training facial image for each person). A new representation and a modified cosine similarity are introduced for combining and comparing bimodal features with limited training data, as well as vastly differing data rates and feature sizes. Optimal feature selection and multicondition training are used to reduce the mismatch between training and testing, thereby making the system robust to unknown bimodal corruption. Experiments have been carried out on a bimodal data set created from the SPIDRE and AR databases, with variable noise corruption of speech and occlusion in the face images. In these experiments, the new method demonstrated improved recognition accuracy.
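To make the fusion idea concrete, below is a minimal sketch of cosine-similarity-based bimodal scoring. It assumes (the abstract does not specify) that each modality's features are L2-normalised and compared separately, with the two scores combined by a weight; the function names, the gallery structure, and the fusion weight `w_face` are hypothetical illustrations, not the paper's exact representation or modified similarity measure.

```python
# Hedged sketch: per-modality cosine similarity with weighted score fusion.
# Assumption: separate normalisation per modality keeps the (much larger)
# speech feature vector from dominating the single-image face features.
import numpy as np

def modality_cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two feature vectors of one modality."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def fused_score(face_test, face_ref, speech_test, speech_ref, w_face=0.5):
    """Weighted fusion of per-modality cosine scores.

    w_face is a hypothetical fusion weight; in the paper, optimal feature
    selection and multicondition training address how each modality should
    contribute under unknown corruption.
    """
    s_face = modality_cosine(face_test, face_ref)
    s_speech = modality_cosine(speech_test, speech_ref)
    return w_face * s_face + (1.0 - w_face) * s_speech

def identify(test_face, test_speech, gallery):
    """Return the enrolled person whose references best match the test data.

    gallery: dict mapping person_id -> (face_ref, speech_ref), reflecting the
    limited-enrolment setting (one face image, one short speech segment).
    """
    return max(gallery,
               key=lambda pid: fused_score(test_face, gallery[pid][0],
                                           test_speech, gallery[pid][1]))
```

The key design point the sketch illustrates is that normalising and scoring each modality independently sidesteps the mismatch in data rates and feature sizes between speech and face features before any combination takes place.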
Original language: English
Title of host publication: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Pages: 585-588
Number of pages: 4
Publication status: Published - 01 Jan 2011

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation
