Evaluating Surface Normal Predictions

Over the years, I've gotten a number of questions about evaluating surface normal results. This document is aimed at helping answer questions like "are you doing X or Y?" and especially "why isn't my number in the same ballpark?". The overall criterion introduced in [1] is as follows:

for each valid pixel, compute the angular error with respect
to the ground-truth and then aggregate this error over the test set.

Different definitions of each term in this criterion (valid, angular error, ground-truth, aggregate) change the results. Here, I clarify each term and give a list of common issues I have seen.

Valid: This is crucial: while interpolated depth is perhaps useful for RGBD understanding, surface normal prediction can only be evaluated at locations actually measured by the Kinect. In the NYU dataset, these locations are given by the rawDepths variable.
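If you work in Python rather than MATLAB, building the valid mask can be sketched with NumPy as below. The rawDepths array here is an illustrative stand-in, and the zero-means-unmeasured convention is my assumption; check it against your copy of the data.

```python
import numpy as np

# Illustrative stand-in for the NYU rawDepths variable: an HxW depth
# map with 0 at pixels the Kinect did not measure (the zero-means-
# unmeasured convention is an assumption; verify it on your data).
rawDepths = np.array([[0.0, 1.2],
                      [2.4, 0.0]])

# Evaluate only where the sensor actually returned a measurement.
valid = rawDepths > 0
```

All downstream error statistics should then be computed only over pixels where this mask is true.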

Angular error: this is the angle between two unit vectors, or acos(x'*y). This is what is used in all of our papers and in every other paper I am aware of. Given two HxWx3 maps N and P of unit normals, it can be computed in degrees as

E = acosd(min(1, max(-1, sum(N.*P, 3))));

Here, sum(N.*P,3) computes a map of the dot-products, and acosd converts them to degrees. The min/max clamps the dot-product to [-1, 1], which is necessary for numerical reasons: floating-point error can push it slightly outside the domain of acos.
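The same computation can be sketched in Python with NumPy (the function name angular_error_deg is mine, not from the original code):

```python
import numpy as np

def angular_error_deg(N, P):
    """Per-pixel angle in degrees between two HxWx3 unit-normal maps.

    NumPy rendering of the MATLAB one-liner: the clamp to [-1, 1]
    guards arccos against dot products that drift slightly out of
    range due to floating-point error.
    """
    dot = np.clip(np.sum(N * P, axis=2), -1.0, 1.0)
    return np.degrees(np.arccos(dot))

# Tiny sanity check on a 1x2 "image": identical normals give 0 degrees,
# perpendicular normals give 90 degrees.
N = np.array([[[0, 0, 1], [0, 0, 1]]], dtype=float)
P = np.array([[[0, 0, 1], [1, 0, 0]]], dtype=float)
E = angular_error_deg(N, P)
```

Note that both inputs must already be unit-normalized for the dot product to equal the cosine of the angle.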

Ground-truth: Surface normals are estimated, not directly measured, so there is no single ground-truth. My recent work has used the ground-truth from [2] since it is of very high quality.

Aggregate: We introduced six metrics in [1]. If E is the vector of all per-pixel errors over the dataset, you compute them via mean(E), median(E), mean(E.^2).^0.5 (the RMSE), and mean(E<t) for the thresholds t = 11.25, 22.5, and 30 degrees.
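These aggregates can be sketched in NumPy as follows. The thresholds 11.25, 22.5, and 30 degrees are the ones commonly used with these metrics; summarize_errors is an illustrative name, not part of any released code.

```python
import numpy as np

def summarize_errors(E):
    """Summary statistics over a flat array of per-pixel angular
    errors in degrees: mean, median, RMSE, and the fraction of
    pixels under each threshold (thresholds assumed to be the
    commonly used 11.25/22.5/30 degrees)."""
    E = np.asarray(E, dtype=float).ravel()
    return {
        "mean": E.mean(),
        "median": np.median(E),
        "rmse": np.sqrt(np.mean(E ** 2)),
        "11.25": np.mean(E < 11.25),
        "22.5": np.mean(E < 22.5),
        "30": np.mean(E < 30.0),
    }

# Toy example on four per-pixel errors (degrees).
stats = summarize_errors([5.0, 10.0, 20.0, 45.0])
```

Remember that E should contain only the errors at valid pixels, concatenated over the whole test set.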

Common Issues: There are a number of subtleties that often cause issues the first time around:

Sample Code: As a demonstration, here is some sample evaluation code: evalSimple.m. You'll have to fill in some of the data loading, but it should give you an idea of the full pipeline.
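evalSimple.m itself is MATLAB and not reproduced here; the following is a rough NumPy sketch of the same pipeline. The function name, argument names, and the rawDepths > 0 validity convention are my assumptions, not the original script.

```python
import numpy as np

def evaluate(pred_normals, gt_normals, raw_depths):
    """End-to-end sketch: mask to sensor-measured pixels, compute
    per-pixel angular error in degrees, then aggregate.

    pred_normals, gt_normals: HxWx3 unit-normal maps.
    raw_depths: HxW, assumed 0 at unmeasured pixels.
    """
    valid = raw_depths > 0
    dot = np.clip(np.sum(pred_normals * gt_normals, axis=2), -1.0, 1.0)
    E = np.degrees(np.arccos(dot))[valid]
    return {
        "mean": E.mean(),
        "median": np.median(E),
        "rmse": np.sqrt(np.mean(E ** 2)),
        **{f"<{t}": np.mean(E < t) for t in (11.25, 22.5, 30.0)},
    }

# Toy 1x2 example: the second pixel is unmeasured and excluded, so
# only the (perfectly predicted) first pixel contributes.
stats = evaluate(
    np.array([[[0, 0, 1], [0, 0, 1]]], dtype=float),
    np.array([[[0, 0, 1], [1, 0, 0]]], dtype=float),
    np.array([[1.0, 0.0]]),
)
```

In a real evaluation, you would accumulate the masked errors over every test image before computing the statistics once, rather than per image.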

-David Fouhey, January 2016

[1] Data-Driven 3D Primitives for Single Image Understanding. D. Fouhey, A. Gupta, M. Hebert. ICCV 2013.
[2] Discriminatively Trained Dense Surface Normal Estimation. Ľ. Ladický, B. Zeisl, M. Pollefeys. ECCV 2014.