Exploring the Spatial Hierarchy of Mixture Models for Human Pose Estimation

Human pose estimation requires a versatile yet well-constrained spatial model for grouping locally ambiguous parts together to produce a globally consistent hypothesis. Previous works either use local deformable models deviating from a certain template, or use a global mixture representation in the pose space. In this paper, we propose a new hierarchical spatial model that can capture an exponential number of poses with a compact mixture representation on each part. Using latent nodes, it can represent high-order spatial relationships among parts with exact inference. Unlike recent hierarchical models that associate each latent node with a mixture of appearance templates (like HoG), we use the hierarchical structure as a pure spatial prior, avoiding the large and often confounding appearance space. We verify the effectiveness of this model in three ways. First, samples representing human-like poses can be drawn from our model, showing its ability to capture high-order dependencies among parts. Second, our model reconstructs unseen poses more accurately than a nearest-neighbor pose representation. Finally, our model achieves state-of-the-art performance on three challenging datasets, and substantially outperforms recent hierarchical models.

Publications

"Exploring the Spatial Hierarchy of Mixture Models for Human Pose Estimation"
Yuandong Tian, C. Lawrence Zitnick, and Srinivasa G. Narasimhan
Proc. of European Conference on Computer Vision (ECCV),
Oct, 2012.
[PDF]

Code and Data

Trained models
Spotlight

Illustration

	The hierarchical model: We have built a hierarchical model for human pose estimation which encodes high-order relationships among parts.
	The graphical model: The undirected graphical model associated with the hierarchical model. Each node consists of a position variable p_j that identifies where the part is, and a type variable z_j. The type variable characterizes what the part looks like and how its child parts are arranged spatially.
	Type compatibility: The hierarchical model captures the compatibility between parent and child types, so only reasonable configurations receive a high score, while the appearance of parts is shared to reduce the number of parameters in the model.
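To make the role of the type variables concrete, the following is a minimal sketch of exact max-sum inference over types on a tree, assuming each node has a per-type score and each parent-child edge has a type compatibility table. The function name `best_types` and the tables `children`, `unary`, and `compat` are hypothetical and are not taken from the released code; position variables p_j are left out to keep the example short.

```python
def best_types(children, unary, compat, root=0):
    """Max-sum dynamic programming over part types on a tree.

    children: {node: [child, ...]}; unary: {node: [score per type]};
    compat: {(parent, child): table[parent_type][child_type]}.
    All three tables are hypothetical stand-ins, not the paper's parameters.
    """
    back = {}  # back[child][parent_type] -> best child type

    def subtree(j):
        # score[z] = best total score of the subtree rooted at j when z_j = z
        score = list(unary[j])
        for c in children.get(j, []):
            child = subtree(c)
            table = compat[(j, c)]
            back[c] = []
            for zp in range(len(score)):
                vals = [child[zc] + table[zp][zc] for zc in range(len(child))]
                zc_best = max(range(len(vals)), key=vals.__getitem__)
                back[c].append(zc_best)
                score[zp] += vals[zc_best]
        return score

    root_score = subtree(root)
    z_root = max(range(len(root_score)), key=root_score.__getitem__)
    types, stack = {root: z_root}, [root]
    while stack:  # backtrack from the root to read off the best child types
        j = stack.pop()
        for c in children.get(j, []):
            types[c] = back[c][types[j]]
            stack.append(c)
    return root_score[z_root], types


# Toy example: a root (0) with two children (1, 2), two types per node.
toy_children = {0: [1, 2]}
toy_unary = {0: [0.0, 0.0], 1: [1.0, 0.0], 2: [0.0, 0.5]}
toy_compat = {
    (0, 1): [[0.0, -5.0], [-5.0, 2.0]],  # mismatched types are penalized
    (0, 2): [[1.0, 0.0], [0.0, 1.0]],
}
print(best_types(toy_children, toy_unary, toy_compat))
```

Because the graph is a tree, each edge is visited once and the cost grows with the square of the number of types per node, not with the number of joint configurations. Position variables would be handled in the same recursion as a second index on each node.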

	Samples from the hierarchical model: From this model, it is possible to sample reasonable human poses. Compared with samples from previous approaches, ours look more natural.
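Drawing a pose from a tree-structured model can be sketched as ancestral sampling: pick the root type, then walk down the tree choosing each child's type given its parent's type and placing the child relative to the parent. The sketch below assumes the model has already been converted into conditional type tables and per-type mean offsets. The names `sample_pose`, `type_prior`, `type_cond`, and `offset` are hypothetical, and the Gaussian position noise is an illustrative choice, not the paper's procedure.

```python
import random


def sample_pose(children, type_prior, type_cond, offset, root=0, noise=0.0, rng=None):
    """Ancestral sampling of part types and 2D positions on a tree.

    type_prior: weights over root types.
    type_cond[(parent, child)][parent_type]: weights over child types.
    offset[(parent, child)][parent_type]: mean (dx, dy) of the child relative
    to the parent, so the parent's type decides how its children are arranged.
    All tables are hypothetical stand-ins for learned parameters.
    """
    rng = rng or random.Random()
    z = {root: rng.choices(range(len(type_prior)), weights=type_prior)[0]}
    p = {root: (0.0, 0.0)}  # place the root at the origin
    stack = [root]
    while stack:
        j = stack.pop()
        for c in children.get(j, []):
            weights = type_cond[(j, c)][z[j]]
            z[c] = rng.choices(range(len(weights)), weights=weights)[0]
            dx, dy = offset[(j, c)][z[j]]
            p[c] = (
                p[j][0] + dx + rng.gauss(0.0, noise),
                p[j][1] + dy + rng.gauss(0.0, noise),
            )
            stack.append(c)
    return z, p


# Toy example: a torso (0) with one arm (1); two types per part.
toy_children = {0: [1]}
toy_prior = [0.5, 0.5]
toy_cond = {(0, 1): [[0.9, 0.1], [0.1, 0.9]]}
toy_offset = {(0, 1): [(0.0, 1.0), (2.0, 0.0)]}
print(sample_pose(toy_children, toy_prior, toy_cond, toy_offset,
                  noise=0.1, rng=random.Random(0)))
```

Since every child is sampled conditioned on its parent's type, incompatible parent-child type pairs are drawn rarely, which is what keeps the sampled configurations plausible.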

Results

	PCP performance on three benchmark datasets: Our method achieves state-of-the-art performance on three benchmark datasets (PARSE, Leeds Sports, and UIUC People) in terms of PCP, the percentage of correctly detected parts.
	Sample pose estimation results: Pose estimation results on the PARSE and Leeds Sports datasets.
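The PCP score mentioned above can be computed with a short function. The sketch below assumes one common criterion: a part counts as correct when both of its predicted endpoints lie within half the ground-truth part length of the corresponding true endpoints. Which PCP variant was used for the reported numbers is not stated on this page, so the threshold and the both-endpoints rule are assumptions, and the function name `pcp` is hypothetical.

```python
import math


def pcp(pred, truth, thresh=0.5):
    """Percentage of Correct Parts under an assumed both-endpoints criterion.

    pred, truth: lists of parts, each part a pair of (x, y) endpoints,
    listed in the same order.
    """
    correct = 0
    for (pa, pb), (ta, tb) in zip(pred, truth):
        limit = thresh * math.dist(ta, tb)  # tolerance scales with part length
        if math.dist(pa, ta) <= limit and math.dist(pb, tb) <= limit:
            correct += 1
    return 100.0 * correct / len(truth)


# Toy example: the first part is within tolerance, the second is not.
toy_truth = [((0, 0), (0, 10)), ((0, 0), (10, 0))]
toy_pred = [((1, 0), (0, 12)), ((0, 6), (10, 0))]
print(pcp(toy_pred, toy_truth))  # 50.0
```

A looser variant sometimes used in the literature averages the two endpoint errors before comparing against the tolerance, which generally yields higher scores, so numbers are only comparable when the same variant is used.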