Social Scene Understanding
from Social Cameras

Mechanical Engineering
Carnegie Mellon University

(a) In a social scene, such as a wedding reception, people interact with others by sending visible social signals, such as gaze direction, facial expressions, or body gestures. These social signals are a strong cue for understanding social scenes. In this thesis, we present a computational framework for social cognition: the ability to understand social scenes by perceiving, modeling, and predicting social signals. (b) Cameras are socially immersed in the form of smartphones, camcorders, or wearable cameras. These social cameras are ideal sensors for capturing social scenes because they encode the gaze behaviors of the people who hold or wear them.

Summary

Humans interact with one another by sending visible social signals, such as facial expressions, body gestures, and gaze orientation. Social cognition (the ability to perceive, model, and predict such social signals) enables people to understand social interactions and to plan their behavior in accordance with this understanding. In this thesis, we establish a computational foundation for social cognition from social cameras, i.e., cameras held or worn by members of a social group, such as smartphones, camcorders, or wearable cameras, which encode the gaze behavior of their holders or wearers.

We present a computational framework to perceive social signals, to model the relationships among them, and to predict social behaviors. In the first part, we present a framework for the 3D reconstruction of three types of social signals: gaze movement, body motion, and general scene motion. This representation provides a computational basis for analyzing social scenes. In the second part, we develop a relational model of attentive behaviors from the reconstructed social signals. The occurrence of 3D points of joint attention is used to define the relationship. This relational model enables us to predict gaze behavior at any location and time. We apply our framework to real-world social scenes including sporting events, meetings, social games, and parties.
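
To make the notion of a 3D point of joint attention concrete, the following is a minimal sketch, not the method developed in the thesis. It assumes each social camera contributes a gaze ray given by a 3D origin (the camera center) and a direction (roughly the optical axis), treats each ray as an infinite line, and returns the single point that minimizes the sum of squared distances to all of those lines. The function name `joint_attention_point` and its helper are hypothetical, and a real scene may contain several points of joint attention, which this sketch does not handle.

```python
import math


def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def joint_attention_point(origins, directions):
    """Hypothetical helper: least-squares 3D point closest to a set of gaze lines.

    For a line with origin o and unit direction u, the matrix P = I - u u^T
    projects onto the plane orthogonal to u, so |P (x - o)| is the distance
    from x to the line. Minimizing the sum of squared distances gives the
    normal equations (sum P_i) x = sum P_i o_i.
    """
    A = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for o, d in zip(origins, directions):
        norm = math.sqrt(sum(c * c for c in d))
        u = [c / norm for c in d]  # normalize the gaze direction
        for r in range(3):
            for c in range(3):
                p = (1.0 if r == c else 0.0) - u[r] * u[c]
                A[r][c] += p
                b[r] += p * o[c]
    det = _det3(A)
    if abs(det) < 1e-12:
        raise ValueError("gaze lines are (nearly) parallel; no unique closest point")
    # Solve the 3x3 system with Cramer's rule to stay dependency-free.
    point = []
    for k in range(3):
        Ak = [[b[r] if c == k else A[r][c] for c in range(3)] for r in range(3)]
        point.append(_det3(Ak) / det)
    return point


# Three illustrative gaze lines (made-up values) that all pass through one point.
origins = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
directions = [(1.0, 1.0, 1.0), (-1.0, 1.0, 1.0), (1.0, -1.0, 1.0)]
estimate = joint_attention_point(origins, directions)
```

Because the three example lines intersect exactly, `estimate` coincides with their common point up to floating-point error; with noisy gaze directions the same call would return the closest point in the least-squares sense.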

Committee Members

Yaser Sheikh (chair)
Jessica Hodgins
Levent Burak Kara
Kenji Shimada
Christoph Bregler (external member)

Document

Hyun Soo Park, "Social Scene Understanding from Social Cameras", Carnegie Mellon University, 2014

Presentation (April 28, 2014)

PDF (9.7 MB), video-embedded PPTX (1.8 GB)

Acknowledgments

I owe a great intellectual debt to Prof. Yaser Sheikh, who has given me tremendous guidance. He has been an ideal role model as a mentor; his encouragement and endless patience allowed me to focus on my research and to dream of pursuing an academic career. I am honored to have had the opportunity to work with him and am proud of our work. I also thank my committee members, Prof. Jessica Hodgins, Prof. Kenji Shimada, Prof. Levent Burak Kara, and Prof. Christoph Bregler, for their valuable comments and advice.

I have been fortunate to collaborate with many researchers from Disney Research Pittsburgh, Intel, and Microsoft Research. In particular, I would like to acknowledge Dr. Shiratori, who played a key role in building my insight into structure from motion, which constitutes the basis of my thesis. I also thank all our group members for their advice and help, including Natasha Kholgade, Tomas Simon, Varun Ramakrishna, Yair Movshovitz-Attias, Minh Vo, Zijun Wei, and Hanbyul Joo. I am indebted to the KDisTech members, including Sungwook Yang, Junsung Kim, Jongho Lee, and Hyunggi Cho, who have broadened my research horizons.

Finally, I would like to express my great appreciation to my family and, in particular, to my beloved wife, Soo Jin Kang, for her tireless support and sacrifices.

References

- Theory
[1] H. S. Park, T. Shiratori, I. Matthews, and Y. Sheikh, "3D Reconstruction of a Moving Point from a Series of 2D Projections", ECCV 2010.
[2] H. S. Park and Y. Sheikh, "3D Reconstruction of a Smooth Articulated Trajectory from a Monocular Image Sequence", ICCV 2011.
[3] H. S. Park, E. Jain, and Y. Sheikh, "3D Social Saliency from Head-mounted Cameras", NIPS 2012.
[4] H. S. Park, E. Jain, and Y. Sheikh, "Predicting Primary Gaze Behavior using Social Saliency Fields", ICCV 2013.
- Applications
[5] T. Shiratori, H. S. Park, L. Sigal, Y. Sheikh, and J. Hodgins, "Motion Capture from Body-Mounted Cameras", SIGGRAPH 2011.
[6] H. Joo, H. S. Park, and Y. Sheikh, "MAP Visibility Estimation for Large-Scale Dynamic 3D Reconstruction", CVPR 2014.
[7] *I. Arev, *H. S. Park, Y. Sheikh, J. Hodgins, and A. Shamir, "Automatic Editing of Footage from Multiple Social Cameras", SIGGRAPH 2014.
[8] *H. S. Park, *Y. Wang, E. Nurvitadhi, J. C. Hoe, Y. Sheikh, and M. Chen, "3D Point Cloud Reduction using Mixed-integer Quadratic Programming", CVPRw 2013.