Abstract
We describe the theory and implementation of full-sentence speech correlation for speech recognition, and demonstrate its superior robustness to unseen/untrained noise. For the Aurora 2 data, trained with only clean speech, the new method performs competitively against the state of the art with multicondition training and adaptation, and achieves the lowest word error rate at very low SNR (-5 dB). Further experiments with highly nonstationary noise (pop song, broadcast news, etc.) show the surprising ability of the new method to handle unpredictable noise. The new method adds several novel developments to our previous research, including the modeling of the speaker characteristics along with other acoustic and semantic features of speech for separating speech from noise, and a novel Viterbi algorithm to implement full-sentence correlation for speech recognition.

Index Terms: speech recognition, noise robustness, full-sentence correlation, unseen/unpredictable noise
Original language | English |
---|---|
Title of host publication | Interspeech 2019: Proceedings |
Pages | 436-440 |
Number of pages | 5 |
Publication status | Published - Sep 2019 |