Abstract
Parsing complex acoustic scenes involves an intricate interplay between bottom-up, stimulus-driven salient elements in the scene and top-down, goal-directed mechanisms that shift our attention to particular parts of the scene. Here, we present a framework for exploring the interaction between these two processes in a simulated cocktail-party setting. The model shows improved digit recognition in a multi-talker environment when the goal is to track the source uttering the highest value. This work highlights the relevance of both data-driven and goal-driven processes in tackling real multi-talker, multi-source sound analysis.
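The paper itself does not specify its equations here, but the core idea of the abstract, fusing a bottom-up saliency score with a top-down, goal-driven bias to decide which talker to attend to, can be illustrated with a minimal toy sketch. All names, weights, and the linear fusion rule below are illustrative assumptions, not the authors' actual model.

```python
def attend(saliency, goal_bias, alpha=0.5):
    """Pick the source index with the highest combined attention score.

    saliency  -- bottom-up, stimulus-driven scores per source
                 (e.g. loudness or onset strength); illustrative only
    goal_bias -- top-down, goal-directed weights per source
                 (e.g. belief that a source utters the highest digit)
    alpha     -- assumed trade-off between bottom-up and top-down evidence
    """
    scores = [alpha * s + (1 - alpha) * b
              for s, b in zip(saliency, goal_bias)]
    return max(range(len(scores)), key=scores.__getitem__)

# Three talkers: talker 0 is acoustically most salient (loudest),
# but the task goal biases attention toward talker 1.
saliency = [0.9, 0.4, 0.3]
goal_bias = [0.1, 0.95, 0.2]
print(attend(saliency, goal_bias, alpha=1.0))  # pure bottom-up: 0
print(attend(saliency, goal_bias, alpha=0.3))  # top-down dominates: 1
```

With `alpha` near 1 the selection is purely stimulus-driven; lowering it lets the goal override a louder but task-irrelevant talker, which is the interaction the abstract describes.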
Original language | English |
---|---|
Title of host publication | 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012 - Proceedings |
Pages | 145-148 |
Number of pages | 4 |
Publication status | Published - 23 Oct 2012 |
Externally published | Yes |
Event | IEEE International Conference on Acoustics, Speech, and Signal Processing 2012, Kyoto, Japan, 25 Mar 2012 → 30 Mar 2012 (https://doi.org/10.1109/ICASSP15465.2012) |
Conference
Conference | IEEE International Conference on Acoustics, Speech, and Signal Processing 2012 |
---|---|
Abbreviated title | ICASSP'2012 |
Country/Territory | Japan |
City | Kyoto |
Period | 25/03/2012 → 30/03/2012 |
Keywords
- Attention
- Auditory Scene Analysis
- Cognition
- Digit Recognition
- Saliency
ASJC Scopus subject areas
- Software
- Signal Processing
- Electrical and Electronic Engineering