Headshot: An App That Helps People Take Pictures of Themselves

Julia Schwarz (grad)
andrew id: julenka

For a quick overview and demo, watch the video below:

Update: I have released my Windows Phone Face Detection code as open source at http://facedetectwp7.codeplex.com/

Link to final paper is here

Project Description

Cameras are excellent at recording subjects when the subject is not the photographer him/herself. Self portraits are still difficult to take, especially in impromptu situations and on mobile devices, where people do not have access to shutter triggers and tripods. Front-facing cameras on mobile phones begin to solve this problem, but these have much lower (VGA-quality) resolution than rear-facing cameras, and are unlikely to improve any time soon. This project aims to ameliorate the challenges of self-photography by building a mobile phone application that uses face detection to tell users when their heads are centered (or in another pre-specified position) as they're taking a photo.

Interface

The interface for this app is designed to be minimal as the user's goal is to take a photo. When opened, instructions are displayed. The user then positions the target where they want their head to be, and as they take a photo of themselves the app detects where their face is in the photo and gives feedback to guide the user so their head is in the correct position. I am exploring three different techniques for giving feedback:

  1. Tell the user how to move the camera using words (e.g. "tilt left, tilt up")
  2. Tell the user how far their face is from desired location using pitch
  3. Tell user how far their face is from desired location using vibration

Each method has advantages and disadvantages. An additional challenge is communicating when no face is detected at all.
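The first two feedback mappings can be sketched roughly as follows. This is an illustrative Python sketch, not the actual Windows Phone C# code; the names (`face_center`, `target`) and the specific thresholds and frequencies are hypothetical.

```python
# Hypothetical sketch of feedback techniques 1 and 2 above.
# face_center / target are (x, y) pixel coordinates; None means no face found.

def word_feedback(face_center, target, tolerance=20):
    """Technique 1: directional words such as 'tilt left, tilt up'."""
    if face_center is None:
        return "no face detected"          # the extra case mentioned above
    dx = face_center[0] - target[0]
    dy = face_center[1] - target[1]
    words = []
    if abs(dx) > tolerance:
        words.append("tilt left" if dx > 0 else "tilt right")
    if abs(dy) > tolerance:
        words.append("tilt up" if dy > 0 else "tilt down")
    return ", ".join(words) if words else "hold still"

def pitch_feedback(face_center, target, base_hz=220.0, max_hz=880.0, max_dist=300.0):
    """Technique 2: map distance from the target to an audio pitch.

    Closer to the target -> lower tone; the frequency range is arbitrary.
    """
    dx = face_center[0] - target[0]
    dy = face_center[1] - target[1]
    dist = min((dx * dx + dy * dy) ** 0.5, max_dist)
    return base_hz + (max_hz - base_hz) * (dist / max_dist)
```

The vibration technique (3) would use the same distance-to-intensity mapping as `pitch_feedback`, driving vibration strength or pulse rate instead of frequency.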

Screenshot of the interface (no camera data is displayed, as this screenshot was not taken on an actual device):

A mockup of the headshot interface explaining the interaction (click for larger photo):

Approach

I built this app as a Windows Phone application. It performs the following steps:

  1. Allow user to specify where their head should be in photo
  2. Capture the camera stream from phone
  3. For each frame, perform face detection and get list of face locations
  4. Give feedback to user to help them correctly position their camera
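Steps 2-4 form a simple per-frame loop, sketched below in Python for clarity (the real implementation is C# against the Windows Phone camera APIs; `detect_faces` and `give_feedback` are hypothetical stand-ins):

```python
# Illustrative per-frame pipeline for steps 2-4 above (not the actual C# code).

def process_frame(frame, target, detect_faces, give_feedback):
    """Run one iteration: detect faces, then guide the user toward `target`.

    detect_faces(frame) is assumed to return a list of (x, y, w, h) boxes;
    give_feedback(center, target) renders words / pitch / vibration.
    """
    faces = detect_faces(frame)            # step 3: face locations in the frame
    if not faces:
        give_feedback(None, target)        # communicate "no face detected"
        return None
    x, y, w, h = faces[0]                  # assume the first box is the subject
    center = (x + w // 2, y + h // 2)
    give_feedback(center, target)          # step 4: guide the user
    return center
```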

Data Used

Building an accurate face detector using the Viola-Jones algorithm (see below) requires large amounts of training data. Face data is available in the CMU face database. Fortunately, pre-trained models from the OpenCV library exist, so training is not required as long as you can read the XML model files (which is not easy!).

Algorithms Used

Performing Face Detection

The primary challenge with Headshot was implementing the face detection algorithm. I tried two different algorithms:

  1. Color-Matching: FaceLight is an existing managed (C#) library that does face detection by finding the center points of skin-colored regions. While easy to implement, this approach works quite poorly: it looks for skin color, not faces, and is very sensitive to lighting. The approach is well described at http://facelight.codeplex.com/.
  2. Viola-Jones: This is the most popular face detection algorithm because of its speed and accuracy; there is a great Wikipedia article describing it. The algorithm is designed to work for arbitrary objects but tends to be used for faces.

There are existing libraries that use this algorithm to do face detection; unfortunately, none of the publicly available ones work on Windows Phone. This is because the primary library, OpenCV, is written in C++, while Windows Phone code has to be entirely managed (written in C#). There is an internal library available from Microsoft, but I would never be able to release code built on it, so I did not want to use it. Instead, I wrote a C# port of the OpenCV face detection algorithm that takes the XML model files from OpenCV and builds a face detector from them. I have been able to perform face detection using my library, and my face detection code is publicly available at http://facedetectwp7.codeplex.com/.
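The core of a Viola-Jones detector built from a pre-trained model is a cascade of stages, each summing the votes of weak classifiers over Haar-like feature values. The sketch below (in Python, with a deliberately simplified data layout, not OpenCV's actual XML structure or my C# port) shows the evaluation logic; real cascades also use integral images and classifier trees, omitted here for brevity.

```python
# Simplified sketch of cascade evaluation in Viola-Jones (hypothetical layout):
# a stage is (stage_threshold, [weak classifiers]); each weak classifier is
# (feature_fn, threshold, left_value, right_value), where feature_fn computes
# one Haar-like feature over the candidate window.

def evaluate_cascade(window, stages):
    """Return True iff the window passes every stage of the cascade."""
    for stage_threshold, weak_classifiers in stages:
        stage_sum = 0.0
        for feature_fn, threshold, left, right in weak_classifiers:
            # Each weak classifier contributes left or right depending on
            # whether its feature value clears the classifier's threshold.
            stage_sum += left if feature_fn(window) < threshold else right
        if stage_sum < stage_threshold:
            return False          # rejected early: this is the key to speed
    return True
```

Most non-face windows are rejected by the first few cheap stages, which is why the cascade structure makes detection fast enough for real-time use.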

Pre-Processing Images

I implemented histogram equalization to maximize contrast and thereby improve the detection rate. In practice I found that histogram equalization made little difference.
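For reference, histogram equalization remaps pixel intensities so the cumulative distribution is stretched over the full range. A minimal Python sketch (on a flat list of 8-bit grayscale values, not the actual C# implementation):

```python
# Sketch of histogram equalization for 8-bit grayscale pixels (values 0-255).

def equalize(pixels, levels=256):
    # Build the intensity histogram.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Cumulative distribution function (CDF).
    cdf, total = [], 0
    for count in hist:
        total += count
        cdf.append(total)
    cdf_min = next(c for c in cdf if c > 0)
    n = len(pixels)
    if n == cdf_min:                       # constant image: nothing to spread
        return list(pixels)
    # Standard remapping: stretch the CDF over the full intensity range.
    return [round((cdf[p] - cdf_min) / (n - cdf_min) * (levels - 1))
            for p in pixels]
```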


Evaluation

I evaluated the efficacy of both my face detection algorithms and the Headshot application itself, though for research purposes the more interesting contribution is the latter.

Performance of Algorithms

I compared the three different approaches I used for face detection using 180 images from the CMU face database. Results are below. As expected, color-based detection failed because the images were in black and white. The proprietary algorithm was faster than my algorithm but slightly less accurate (90% vs. 98%). This could be because the model I used for my algorithm may have been trained on images from the CMU face database, as this database is publicly available. In practice I found my detection algorithm to perform roughly as well as the proprietary algorithm, though it was slower.
                                           Light-based   Proprietary (internal Microsoft SDK)   My Face Detection Library
  Time to finish detection for an image       10 ms                   250 ms                            1744 ms
  Percentage of faces correctly detected       0%                      90%                                98%

User Study (aka "Does Headshot Actually Work?")

To evaluate whether Headshot actually helps people take better photos, I ran an informal 10-person user study with four conditions: using Headshot vs. not using Headshot, crossed with the target in the center of the picture vs. off-center.

Study results

[Figure: Target in center (distance in pixels, audio feedback condition on the left-hand side)]
[Figure: Target on left side of screen (distance in pixels, audio feedback condition on the left-hand side)]

This preliminary study indicates that Headshot helps users position their head more accurately when the target is not in the center of the screen. When the target is centered, Headshot offers little benefit, likely because people are already good at taking pictures of themselves with their head in the middle of the shot.


Final Report

I have accomplished all the tasks I set out to do for this project. The completed tasks are listed below.

Tasks Completed

  1. Implemented light-based face detection.
  2. Implemented Viola-Jones face detection using two different methods: 1) an existing (but proprietary) library, 2) my own C# port of OpenCV face detection.
  3. Tuned both Viola-Jones face detection algorithms to minimize latency.
  4. Implemented a functional version of the application with word feedback to guide the user.
  5. Implemented basic image processing (i.e. histogram equalization) to improve face detection.
  6. Implemented one additional feedback technique (using pitch to communicate distance). I found this to be completely useless without directional feedback.
  7. Tuned my C# OpenCV detector so that it performs better (right now it still misses faces sometimes).
  8. Improved the performance of the OpenCV detector (it is at about 100 ms per detection; I would like to drop it to 50 ms).
  9. Evaluated the performance of the OpenCV detector against the proprietary and light-based detectors.
  10. Ran a small user study validating the effectiveness of Headshot.
  11. Polished the application.
  12. Made the C# OpenCV detector publicly available.

Biggest Challenge

The biggest challenge for me was implementing a robust, fast face detection algorithm, as I didn't know how Viola-Jones worked or how to interpret the model stored in the XML OpenCV file. Only after long hours of thinking about the problem did I find a Java-based solution, which helped tremendously. Even then, tuning that detector so that it worked quickly enough to be usable was quite a challenge!


Future Work

I am interested in the idea of helping photographers take pictures of themselves. I hope to do another project in this same vein called RemoteShot: an application that makes taking group photos easier by using a proxy device (a mobile phone) to show users what the main camera sees, and allowing users to trigger the camera shutter from the proxy device.