Conventional speech enhancement methods, based on frame, multi-frame or segment estimation, require knowledge about the noise. This paper presents a new method which aims to reduce or effectively remove this requirement. It is shown that, by using the Zero-mean Normalized Correlation Coefficient (ZNCC) as the comparison measure, and by extending the effective length of speech segment matching to sentence-long speech utterances, it is possible to obtain an accurate speech estimate from noise without requiring specific knowledge about the noise. The new method could therefore be used to deal with unpredictable noise, or with noise for which no suitable training data are available. This paper focuses on realizing and evaluating this potential. To overcome the data sparsity problem, we propose a novel realization that integrates full-sentence speech correlation with clean speech recognition, formulated as a constrained maximization problem. We then propose an efficient algorithm that solves this constrained maximization problem to produce speech sentence estimates. For evaluation, we build the new system on one training data set and test it on two different test data sets across two databases, for a range of different noises including highly nonstationary ones. It is shown that the new approach, without any estimation of the noise, is able to significantly outperform conventional methods which use optimized noise tracking, in terms of various objective measures including automatic speech recognition.
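For readers unfamiliar with the comparison measure named in the abstract, the ZNCC between two equal-length sequences is the inner product of their mean-removed versions divided by the product of their norms. The following is a minimal Python sketch of that standard definition only. It is not the authors' implementation; the function name `zncc`, the use of flat lists of numbers, and the choice to return 0.0 for a constant input are assumptions made for illustration.

```python
import math


def zncc(x, y):
    """Zero-mean normalized correlation coefficient of two equal-length sequences.

    Hypothetical helper for illustration; the paper applies the measure to
    speech segments, whose exact feature representation is not given here.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")

    # Remove the mean of each sequence (the "zero-mean" step).
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    xc = [v - mean_x for v in x]
    yc = [v - mean_y for v in y]

    # Normalize the inner product by the product of the two norms.
    num = sum(a * b for a, b in zip(xc, yc))
    den = math.sqrt(sum(a * a for a in xc) * sum(b * b for b in yc))

    # Assumption: treat a constant (zero-variance) input as uncorrelated.
    return num / den if den > 0.0 else 0.0


if __name__ == "__main__":
    clean = [0.1, 0.9, 0.4, 0.7, 0.2]
    # A gain change and a constant offset leave the ZNCC essentially at 1.
    rescaled = [2.0 * v + 3.0 for v in clean]
    print(zncc(clean, rescaled))
```

Because the mean is removed and the result is normalized, the score is unaffected by a positive gain change or a constant offset applied to either sequence, which is the property that makes it a candidate measure when the interference is not modeled explicitly.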
|Number of pages|13|
|Journal|IEEE/ACM Transactions on Audio, Speech, and Language Processing|
|Early online date|11 Jan 2017|
|Publication status|Published - Mar 2017|