3D Human Pose Estimation using Iterative Conditional Squeeze and Excitation Networks

Research output: Contribution to journal › Article › peer-review



We propose a new method for single-camera real-world 3D human pose estimation. Our method combines multi-task training with iterative pose refinement based on a novel conditional attention mechanism. For iterative pose refinement, the output of each convolutional layer is conditioned on the latest pose estimate by a Conditioned Squeeze-and-Excitation network architecture that incorporates novel feedback connections. Multi-task training on both an in-the-wild 2D pose dataset and a controlled 3D pose dataset enables real-world 3D pose estimation without a large-scale in-the-wild 3D pose dataset, which is not currently available. Experiments on several real-world datasets, as well as on the Human 3.6 Million and HumanEva-I datasets, show that the combination of the attention mechanism, the iterative refinement scheme and multi-task training achieves robust and competitive performance with only a simple network architecture. In addition, we show that our method is efficient enough to run on commodity hardware, producing pose estimates in real time.
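
The abstract does not give the exact form of the conditioning, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It shows a standard squeeze-and-excitation block (global average pooling, a two-layer excitation MLP, sigmoid channel gates) in which, as an assumption, the latest pose estimate is appended to the pooled channel descriptor before excitation, plus a feedback loop that feeds each new estimate back into the block. The names `conditioned_se`, `refine`, `w1`, `w2` and `w_out`, the additive residual update, and the linear output head are all hypothetical; plain Python lists are used so the example needs no deep-learning library.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(w, v):
    # Dense matrix-vector product; w is a list of rows.
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def conditioned_se(features, pose, w1, w2):
    """Hypothetical conditioned SE block.

    features: C channels, each a flat list of H*W activations.
    pose:     flat list of the current joint coordinates.
    w1:       hidden x (C + len(pose)) weights; w2: C x hidden weights.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [sum(ch) / len(ch) for ch in features]
    # Condition (assumption): append the latest pose estimate to the descriptor.
    z = squeezed + list(pose)
    # Excitation: FC -> ReLU -> FC -> sigmoid yields one gate in (0, 1) per channel.
    hidden = [max(0.0, h) for h in matvec(w1, z)]
    gates = [sigmoid(g) for g in matvec(w2, hidden)]
    # Scale: multiply every activation of channel c by gate c.
    scaled = [[a * g for a in ch] for ch, g in zip(features, gates)]
    return scaled, gates


def refine(features, pose0, w1, w2, w_out, steps=3):
    """Hypothetical feedback loop: each pass is conditioned on the previous pose."""
    pose = list(pose0)
    for _ in range(steps):
        gated, _ = conditioned_se(features, pose, w1, w2)
        pooled = [sum(ch) / len(ch) for ch in gated]
        # Assumed linear head (len(pose) x C) predicting a residual correction.
        delta = matvec(w_out, pooled)
        pose = [p + d for p, d in zip(pose, delta)]
    return pose


if __name__ == "__main__":
    random.seed(0)
    C, HW, J = 8, 16, 17  # 8 channels, 4x4 map, 17 joints (illustrative sizes)
    P = 3 * J  # x, y, z per joint
    hidden_size = 4
    feats = [[random.gauss(0, 1) for _ in range(HW)] for _ in range(C)]
    w1 = [[random.gauss(0, 0.1) for _ in range(C + P)] for _ in range(hidden_size)]
    w2 = [[random.gauss(0, 0.1) for _ in range(hidden_size)] for _ in range(C)]
    w_out = [[random.gauss(0, 0.01) for _ in range(C)] for _ in range(P)]
    print(refine(feats, [0.0] * P, w1, w2, w_out)[:3])
```

Because the pose enters the excitation MLP, two different pose estimates can produce different channel gates for the same feature map. This is the sense in which the sketch makes the attention "conditional"; a real network would learn `w1`, `w2` and `w_out` rather than sample them randomly.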
Original language: English
Pages (from-to): 1-13
Journal: IEEE Transactions on Cybernetics
Early online date: 05 Feb 2020
Publication status: Early online date - 05 Feb 2020

