15-663 Project 2

Pratch Piyawongwisal (ppiyawon)

Eulerian Video Magnification

Overview

In this project, I implemented the Eulerian Video Magnification technique (Wu et al., 2012) for revealing subtle temporal variations in videos that are difficult to see with the naked eye.

Approach

As shown in the figure below, the input video is first decomposed into different spatial frequency bands using a Laplacian pyramid. Next, the sequence of each pixel's values over time is passed through a temporal band-pass filter to extract the frequency band of interest. The resulting signal is amplified and added back to the original frames. Lastly, the pyramid is collapsed to generate the output video.

Laplacian Pyramid

I read the frames of the input video in RGB format and construct a Laplacian pyramid for each frame. First, I create a Gaussian pyramid from the finest to the coarsest level. At each level, I call impyramid() with the 'reduce' option to compute the next level (blurred and half-sized). I then apply impyramid() with the 'expand' option to upscale the half-sized image back to full size and subtract it from the current Gaussian level to obtain the Laplacian level.
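The pyramid construction above can be sketched in Python/NumPy as a rough analogue of MATLAB's impyramid; the reduce_level/expand_level helpers, the Gaussian sigma, and the brightness factor are my assumptions, not the original code:

```python
import numpy as np
from scipy import ndimage

def reduce_level(img):
    # Blur, then subsample by 2 -- analogue of impyramid(..., 'reduce').
    return ndimage.gaussian_filter(img, sigma=1)[::2, ::2]

def expand_level(img, shape):
    # Zero-stuff back to `shape`, then blur -- analogue of impyramid(..., 'expand').
    up = np.zeros(shape)
    up[::2, ::2] = img
    return ndimage.gaussian_filter(up, sigma=1) * 4  # *4 compensates the zero-stuffing

def laplacian_pyramid(img, levels):
    # Finest level first; the last entry is the coarsest Gaussian level.
    pyr, cur = [], img.astype(float)
    for _ in range(levels - 1):
        nxt = reduce_level(cur)
        pyr.append(cur - expand_level(nxt, cur.shape))
        cur = nxt
    pyr.append(cur)
    return pyr
```

Keeping the coarsest Gaussian level as the last pyramid entry makes later reconstruction exact, since each Laplacian level stores precisely the detail lost by one reduce/expand round trip.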

Temporal Filtering

Once the Laplacian pyramid is created, I apply a temporal band-pass filter to the n coarsest levels of the pyramid (those containing the lowest spatial frequencies). For each pixel, I take its R, G, B values over time and compute the FFT of these vectors to obtain the pixel's temporal frequencies. I then generate an Nth-order Butterworth band-pass filter with low cutoff frequency Fc1 and high cutoff frequency Fc2. The filter is converted into the frequency domain with the freqz() function and multiplied with the pixel's temporal frequency vector, along with the amplification factor alpha. This magnifies the frequency band [Fc1, Fc2] that we are interested in. In addition to the Butterworth filter, I also tried an ideal filter, where I zero out all entries of the temporal frequency vector outside the band and amplify the entries inside it.
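The ideal-filter variant can be sketched as follows (a minimal NumPy version, not the original MATLAB code; the function name and signature are my own, and amplification by alpha would be applied to the returned band-passed signal):

```python
import numpy as np

def ideal_bandpass(signal, fps, fc1, fc2, axis=0):
    # signal: array with time along `axis` (e.g. frames x H x W x 3).
    # Keep only temporal frequencies with |f| in [fc1, fc2] Hz; zero the rest.
    n = signal.shape[axis]
    freqs = np.fft.fftfreq(n, d=1.0 / fps)
    mask = (np.abs(freqs) >= fc1) & (np.abs(freqs) <= fc2)
    spectrum = np.fft.fft(signal, axis=axis)
    shape = [1] * signal.ndim
    shape[axis] = n                      # broadcast the mask along time only
    spectrum *= mask.reshape(shape)
    return np.real(np.fft.ifft(spectrum, axis=axis))
```

Masking both positive and negative frequencies (via the absolute value) keeps the inverse FFT real-valued, so no imaginary residue leaks into the output frames.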

Reconstruction

The reconstruction process is simple. I take the inverse Fourier transform of the amplified frequency signal to get the pixel values back in the spatial domain, average the original and amplified signals, and add the result back to the Laplacian frames. Lastly, I reconstruct the video from the modified Laplacian pyramid by starting from the coarsest Gaussian level and recursively upscaling it and adding the next Laplacian level.
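The collapse step can be sketched like this (a self-contained NumPy sketch; expand_level is a hypothetical analogue of impyramid's 'expand', and the sigma and scaling factor are my assumptions):

```python
import numpy as np
from scipy import ndimage

def expand_level(img, shape):
    # Zero-stuff to `shape`, then blur -- analogue of impyramid(..., 'expand').
    up = np.zeros(shape)
    up[::2, ::2] = img
    return ndimage.gaussian_filter(up, sigma=1) * 4

def collapse_pyramid(pyr):
    # pyr[:-1] are Laplacian levels (finest first); pyr[-1] is the coarsest
    # Gaussian level. Expand and add, mirroring the construction step.
    cur = pyr[-1]
    for lap in reversed(pyr[:-1]):
        cur = lap + expand_level(cur, lap.shape)
    return cur
```

Because each Laplacian level was defined as the difference against the same expand operation, the collapse recovers the original frame exactly when no amplification is applied.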

Results

Parameters:

filter    filter type (Ideal or Butterworth)
size      number of Laplacian pyramid levels
cutoff    number of coarsest pyramid levels (lowest spatial frequency bands) to temporally filter
Fc1, Fc2  low and high temporal cutoff frequencies (Hz)
alpha     amplification factor
N         order of the Butterworth filter

Face.mp4

528×592, 301 frames, 30 fps
[filter, size, cutoff, Fc1, Fc2, alpha] = [Ideal, 8, 4, 0.8, 1, 50]

Baby2.mp4

640×352, 899 frames, 30 fps
[filter, size, cutoff, Fc1, Fc2, alpha, N] = [Butterworth, 10, 4, 2.33, 2.67, 75, 256]

Bells and Whistles

Subway.mp4

640×352, 243 frames, 30 fps
[filter, size, cutoff, Fc1, Fc2, alpha] = [Ideal, 8, 4, 3.2, 6, 25]

Leg.mp4

640×480, 201 frames, 30 fps
[filter, size, cutoff, Fc1, Fc2, alpha] = [Ideal, 8, 4, 1.5, 2.5, 20]