State-of-the-art approaches for articulated human pose estimation are rooted in parts-based graphical models. These models are often restricted to tree-structured representations and simple parametric potentials in order to enable tractable inference. However, these simple dependencies fail to capture all the interactions between body parts. While models with more complex interactions can be defined, learning the parameters of these models remains challenging, as it requires intractable or approximate inference. In this paper, instead of performing inference on a learned graphical model, we build upon the inference machine framework and present a method for articulated human pose estimation. Our approach incorporates rich spatial interactions among multiple parts, as well as information across parts of different scales. Additionally, the modular framework of our approach enables both ease of implementation, without the need for specialized optimization solvers, and efficient inference. We analyze our approach on two challenging datasets with large pose variation and demonstrate state-of-the-art performance on these benchmarks.
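The inference-machine idea summarized above (a sequence of modular predictors, each refining per-part belief maps using context from the previous stage) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the class names, the linear-softmax stand-in predictor, and the simple concatenation of beliefs as context features are all assumptions made for clarity.

```python
import numpy as np

class Stage:
    """One stage of the sequence: a stand-in predictor mapping
    features at each location to a belief over body parts.
    (A placeholder for the learned classifiers in the actual method.)"""
    def __init__(self, weights):
        self.weights = weights  # shape: (feature_dim, num_parts)

    def predict(self, features):
        # Softmax over parts at each image location.
        scores = features @ self.weights
        e = np.exp(scores - scores.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

class PoseMachine:
    """Sequential prediction: each stage sees the image features plus
    context derived from the previous stage's beliefs, so rich
    inter-part interactions are captured without a graphical model."""
    def __init__(self, stages):
        self.stages = stages

    def predict(self, image_features, num_parts):
        n_locs = image_features.shape[0]
        # Initialize with uniform beliefs over parts at every location.
        beliefs = np.full((n_locs, num_parts), 1.0 / num_parts)
        for stage in self.stages:
            # Context features: here simply the previous beliefs,
            # appended to the raw image features (an assumption; the
            # actual context encoding is richer).
            x = np.concatenate([image_features, beliefs], axis=1)
            beliefs = stage.predict(x)
        return beliefs
```

Because each stage is just a supervised predictor, the whole pipeline can be trained stage by stage with off-the-shelf learners, which is what makes the framework easy to implement without specialized optimization solvers.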
* Please note that the evaluation in Table 1 of the main document was performed with the looser variant of the Percentage Correct Parts (PCP) metric. Please refer to the addendum for updated evaluations with the alternate stricter variant of PCP.
This material is based upon work supported by the National Science Foundation under Grants No. 1353120 and 1029679 and the NSF NRI Purposeful Prediction project.