CLOSE—A Data-Driven Approach to Speech Separation

Ming Ji, Ramji Srinivasan, Danny Crookes, Ayeh Jafari

Research output: Contribution to journal › Article › peer-review

20 Citations (Scopus)

Abstract

This paper studies single-channel speech separation, assuming unknown, arbitrary temporal dynamics for the speech signals to be separated. A data-driven approach is described, which matches each mixed speech segment against a composite training segment to separate the underlying clean speech segments. To advance the separation accuracy, the new approach seeks and separates the longest mixed speech segments with matching composite training segments. Lengthening the mixed speech segments to match reduces the uncertainty of the constituent training segments, and hence the error of separation. For convenience, we call the new approach Composition of Longest Segments, or CLOSE. The CLOSE method includes a data-driven approach to model long-range temporal dynamics of speech signals, and a statistical approach to identify the longest mixed speech segments with matching composite training segments. Experiments are conducted on the Wall Street Journal database, for separating mixtures of two simultaneous large-vocabulary speech utterances spoken by two different speakers. The results are evaluated using various objective and subjective measures, including the challenge of large-vocabulary continuous speech recognition. It is shown that the new separation approach leads to significant improvement in all these measures.
Original language: English
Article number: 6473839
Pages (from-to): 1355-1368
Number of pages: 14
Journal: IEEE Transactions on Audio, Speech, and Language Processing
Volume: 21
Issue number: 7
Publication status: Published - Jul 2013

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Acoustics and Ultrasonics
