TY - JOUR
T1 - 3D Human Pose Estimation using Iterative Conditional Squeeze and Excitation Networks
AU - McLaughlin, Niall
AU - Martinez-del-Rincon, Jesus
AU - Miller, Paul
PY - 2020/2/5
Y1 - 2020/2/5
N2 - We propose a new method for single-camera real-world 3D human pose estimation. Our method uses multi-task training together with iterative pose refinement using a novel conditional attention mechanism. For iterative pose refinement, the output of each convolutional layer is conditioned on the latest pose estimate, using a Conditioned Squeeze-and-Excitation network architecture that incorporates novel feedback connections. Multi-task training on both an in-the-wild 2D pose dataset and a controlled 3D pose dataset allows for real-world 3D pose estimation without the need for a large-scale in-the-wild 3D pose dataset, which is unavailable. Experiments are performed on several real-world datasets, as well as the Human 3.6 Million and HumanEva-I datasets, to show that the combined attention mechanism, iterative refinement scheme and multi-task training allow us to achieve robust and competitive performance with only a simple network architecture. In addition, we show that our method is efficient enough to run on commodity hardware, producing pose estimates in real-time.
AB - We propose a new method for single-camera real-world 3D human pose estimation. Our method uses multi-task training together with iterative pose refinement using a novel conditional attention mechanism. For iterative pose refinement, the output of each convolutional layer is conditioned on the latest pose estimate, using a Conditioned Squeeze-and-Excitation network architecture that incorporates novel feedback connections. Multi-task training on both an in-the-wild 2D pose dataset and a controlled 3D pose dataset allows for real-world 3D pose estimation without the need for a large-scale in-the-wild 3D pose dataset, which is unavailable. Experiments are performed on several real-world datasets, as well as the Human 3.6 Million and HumanEva-I datasets, to show that the combined attention mechanism, iterative refinement scheme and multi-task training allow us to achieve robust and competitive performance with only a simple network architecture. In addition, we show that our method is efficient enough to run on commodity hardware, producing pose estimates in real-time.
U2 - 10.1109/TCYB.2020.2964992
DO - 10.1109/TCYB.2020.2964992
M3 - Article
SP - 1
EP - 13
JO - IEEE Transactions on Cybernetics
JF - IEEE Transactions on Cybernetics
SN - 2168-2267
ER -