Abstract
In this paper we present a convolutional neural network (CNN)-based model for human head pose estimation in low-resolution multi-modal RGB-D data. We pose the problem as one of classification of human gazing direction. We further fine-tune a regressor based on the learned deep classifier. Next, we combine the two models (classification and regression) to estimate an approximate regression confidence. We present state-of-the-art results on datasets that span the range from high-resolution human-robot interaction data (close-up faces plus depth information) to challenging low-resolution outdoor surveillance data. We build upon our robust head-pose estimation and further introduce a new visual attention model to recover interaction with the environment. Using this probabilistic model, we show that many higher-level scene understanding tasks, such as detection of human-human and human-scene interaction, can be achieved. Our solution runs in real-time on commercial hardware.
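To illustrate the kind of architecture the abstract describes, a minimal sketch is shown below: a shared CNN backbone over RGB-D input with a classification head over discretized gazing-direction bins and a regression head for continuous angles, where the classifier's softmax peak is used as an approximate confidence for the regressed output. This is not the authors' implementation; the bin count, input resolution, layer sizes, and the specific way the two heads are combined are illustrative assumptions.

```python
# Illustrative sketch only (not the paper's code): classification + regression
# heads on a shared CNN backbone, with classifier confidence attached to the
# regressed head-pose angles. All hyperparameters here are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HeadPoseNet(nn.Module):
    def __init__(self, n_bins=8):
        super().__init__()
        # Shared convolutional backbone over low-resolution RGB-D input
        # (4 channels: RGB + depth).
        self.backbone = nn.Sequential(
            nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
        )
        feat_dim = 64 * 4 * 4
        # Classification head: coarse gazing-direction bins.
        self.classifier = nn.Linear(feat_dim, n_bins)
        # Regression head: continuous pan/tilt angles, fine-tuned on the
        # features learned by the deep classifier.
        self.regressor = nn.Linear(feat_dim, 2)

    def forward(self, x):
        feats = self.backbone(x)
        logits = self.classifier(feats)
        angles = self.regressor(feats)
        # One simple way to combine the two models: take the classifier's
        # softmax peak as an approximate confidence for the regressed angles.
        confidence = F.softmax(logits, dim=1).max(dim=1).values
        return logits, angles, confidence

# Usage: a batch of two 64x64 RGB-D crops.
model = HeadPoseNet()
logits, angles, conf = model(torch.randn(2, 4, 64, 64))
```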
| Original language | English |
| --- | --- |
| Pages (from-to) | 2094-2107 |
| Number of pages | 14 |
| Journal | IEEE Transactions on Multimedia |
| Volume | 17 |
| Issue number | 11 |
| Early online date | 28 Sep 2015 |
| DOIs | |
| Publication status | Published - Nov 2015 |
Bibliographical note
"This work was supported by the Engineering and Physical Sciences Research Council (EPSRC) Grant number EP/K014277/1, the MOD University Defence Research Collaboration in Signal Processing."Keywords
- Convolutional neural networks (CNNs), deep learning, gaze direction, head-pose, RGB-D