This paper considers the separation and recognition of overlapped speech sentences from a single-channel observation. We propose a system that combines several techniques: a missing-feature approach for improving robustness to crosstalk and noise, a Wiener filter for speech enhancement, hidden Markov models for speech reconstruction, and speaker-dependent/-independent modeling for speaker and speech recognition. We develop the system on the Speech Separation Challenge database, where the task is to separate and recognize two mixed sentences without assuming advance knowledge of the speakers' identities or of the signal-to-noise ratio. The paper is an extended version of a previous conference paper submitted for the challenge.
Ji, M., Hazen, T., & Glass, J. R. (2010). Combining missing-feature theory, speech enhancement, and speaker-dependent/-independent modeling for speech separation. Computer Speech & Language, 24(1), 67-76. https://doi.org/10.1016/j.csl.2007.12.004