I perform Normalized Cross Correlation (NCC) in the spatial domain over a limited range of shifts. My NCC code is implemented as a mex function for
speed. It performs standard spatial correlation and uses integral images (summed-area tables) to compute the normalization coefficient, as suggested in
Fast Normalized Cross-Correlation.
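As a rough illustration of the integral-image idea, here is a minimal sketch in C, not my actual mex code. The function names `build_integral` and `window_sum` are made up for this example, and the arrays are assumed to be column-major doubles. The table is padded with one row and one column of zeros so that any window sum needs only four lookups.

```c
#include <stddef.h>

/* Build a summed-area table S of size (rows+1) x (cols+1), column-major.
 * S[r + c*(rows+1)] holds the sum of img over rows 0..r-1 and columns 0..c-1,
 * so the first row and first column of S are all zeros. */
void build_integral(const double *img, size_t rows, size_t cols, double *S)
{
    size_t ld = rows + 1;                  /* leading dimension of S */
    for (size_t r = 0; r <= rows; ++r)
        S[r] = 0.0;                        /* zero first column */
    for (size_t c = 1; c <= cols; ++c) {
        double colsum = 0.0;               /* running sum down image column c-1 */
        S[c * ld] = 0.0;                   /* zero first row */
        for (size_t r = 1; r <= rows; ++r) {
            colsum += img[(r - 1) + (c - 1) * rows];
            S[r + c * ld] = S[r + (c - 1) * ld] + colsum;
        }
    }
}

/* Sum of img over the h-by-w window whose top-left corner is (r0, c0).
 * Costs four table lookups regardless of the window size. */
double window_sum(const double *S, size_t rows,
                  size_t r0, size_t c0, size_t h, size_t w)
{
    size_t ld = rows + 1;
    return S[(r0 + h) + (c0 + w) * ld] - S[r0 + (c0 + w) * ld]
         - S[(r0 + h) + c0 * ld]       + S[r0 + c0 * ld];
}
```

Building a second table over the squared pixel values gives the local energy under the template in the same way. Together the two sums supply the image-dependent part of the NCC denominator at every shift.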
Creating mex files is relatively straightforward. The main pitfall
is that MATLAB (like Fortran and Fortran-derived numerical libraries
such as the reference BLAS) stores matrices in column-major order
rather than the row-major order of native C/C++ arrays.
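To make the difference concrete, here is a small sketch of the two indexing schemes. The helper names are invented for this example. In a real mex file the pointer and dimensions would typically come from `mxGetPr`, `mxGetM`, and `mxGetN`, but the code below uses plain arrays so it compiles on its own.

```c
#include <stddef.h>

/* Element (r, c) of a column-major matrix with `rows` rows (MATLAB layout):
 * consecutive memory locations walk down a column. */
double get_colmajor(const double *a, size_t rows, size_t r, size_t c)
{
    return a[r + c * rows];
}

/* Element (r, c) of a row-major matrix with `cols` columns (native C layout):
 * consecutive memory locations walk along a row. */
double get_rowmajor(const double *a, size_t cols, size_t r, size_t c)
{
    return a[r * cols + c];
}

/* Sum a column-major matrix with the row index in the inner loop,
 * so that memory is read contiguously. */
double sum_colmajor(const double *a, size_t rows, size_t cols)
{
    double total = 0.0;
    for (size_t c = 0; c < cols; ++c)
        for (size_t r = 0; r < rows; ++r)
            total += a[r + c * rows];
    return total;
}
```

The same six values in memory therefore describe different 2x3 matrices depending on which convention the code assumes. Mixing the two up transposes the data without any warning.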
To make use of multiple cores, I've multithreaded the mex function with OpenMP, a shared-memory multiprocessing API. OpenMP is supported by GCC, Visual Studio, and the Intel compilers.
To parallelize the four nested for loops required for spatial correlation, all that was required was a single compiler directive:
where private(...) lists every variable that must be private to each thread
in the code block (i.e., nested loop indices, intermediate sums, etc.).
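The following is a minimal sketch of how such a directive can sit on the outermost of four nested loops. It is not my actual code. The function name `correlate`, the variable names, and the choice to cover every fully overlapping shift are assumptions made for the example, and the normalization step is left out to keep it short.

```c
#include <stddef.h>

/* Unnormalized spatial cross-correlation of a column-major image (ir x ic)
 * with a column-major template (tr x tc) over all fully overlapping shifts.
 * out must hold (ir - tr + 1) * (ic - tc + 1) doubles, column-major. */
void correlate(const double *img, int ir, int ic,
               const double *tpl, int tr, int tc, double *out)
{
    int orows = ir - tr + 1;
    int ocols = ic - tc + 1;
    int u, v, x, y;   /* u, v: shift indices; x, y: template indices */
    double sum;       /* intermediate sum for one shift */

    /* The parallel loop index v is private automatically; the inner
     * indices and the accumulator must be listed explicitly. */
    #pragma omp parallel for private(u, x, y, sum)
    for (v = 0; v < ocols; ++v) {
        for (u = 0; u < orows; ++u) {
            sum = 0.0;
            for (y = 0; y < tc; ++y)
                for (x = 0; x < tr; ++x)
                    sum += img[(u + x) + (v + y) * ir] * tpl[x + y * tr];
            out[u + v * orows] = sum;
        }
    }
}
```

Each thread writes to a distinct set of output columns, so no further synchronization is needed. OpenMP has to be enabled at compile time (for example `-fopenmp` with GCC or `/openmp` with Visual Studio). Without the flag the pragma is ignored and the loops simply run serially.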
Usage for my NCC mex file is: [correlationPlane] = spatial_subsearch(data,template,ROI).