The recent ubiquity of high-framerate (120 fps and higher) handheld cameras creates an opportunity to study human grasping in greater detail than normal-speed cameras allow. We first collected 91 slow-motion interactions with objects in a convenience store setting. We then annotated the actions through the lenses of several existing manipulation taxonomies.
We found that manipulation, particularly the process of forming a grasp, is complex and happens quickly. Our dataset shows that people deal with clutter in many different ways in order to form a strong grasp of an object. It also reveals several errors and how people recover from them. Although annotating motions in detail is time-consuming, the annotation systems we used still leave out important aspects of manipulation actions, such as how the environment can function as a "finger" of sorts, how different parts of the hand can be involved in different grasping tasks, and high-level intent.