Project 2: Image Resizing by Seam Carving

William Wedler
15-463 Fall 2007

Overview

Digital images are often viewed in many different display devices with a variety of resolutions. Variation of resolution makes viewing images difficult because they usually are resized to accommodate limited space. Simple attempts at resizing include scaling and cropping. Scaling reduces perceivable detail and cropping can not be done automatically. Also, cropping alters the image composition and is not always desirable. This project is an implementation of a different image resizing approach called seam carving. This resizing method is presented by Avidan, S and Shamir, A (Seam Carving for Content-Aware Image Resizing).

Seam carving allows a change in size of the image by modifying the least noticeable pixels in an image. A typical application for seam carving is to reduce the size of an image along one dimension. This can be done by finding one pixel wide paths from the top to the bottom of the image and removing those paths. If the pixels in those paths are similar to surrounding pixels, then their removal may be unnoticed. Other seam carving applications include increasing the size of an image, changing the size of an image in two dimensions and even object removal.

Process

Reducing the size of an image is accomplished by removing pixels that will go unnoticed. The pixels to be removed are determined by finding the path across the image with the lowest total energy, which is the sum of the energy of each pixel along the path. Consider making the image shown below one pixel narrower. One pixel in each row of the image is removed, reducing the width of the image by one pixel. The red line near the right edge shows the minimum energy path from the top to the bottom of the image. Each pixel in the path is removed from the image for an overall reduction in width. To further reduce the width of the image, the pixels from the next lowest energy path will be removed. This process can continue until the desired width is reached.

Minimum energy path through an image

Approach

The resizing process involves assigning energy to each pixel and then finding the path across the image that minimizes total energy. Finding the minimum path can be costly, but is not difficult. Assigning energy to each pixel is difficult because the definition of energy used can lead to poor results.

Assigning energy to each pixel

Each pixel has energy based on the sum of the x and y derivatives at that point. The derivatives are computed by taking the finite divided difference between the pixel and its neighbors. The energies from each color channel are computed and then summed for the overall energy of the pixel. Finding an appropriate energy function can drastically affect the results. One problem with this energy function is that it does not take into account vertical lines in the image. A minimum path may not cross through a vertical line directly, but if part of the path is on one side of a line and part of the path is on the other, then the end result may be a skew in the vertical line. Examples of artifacts that appear are given in the results section.

The x and y derivatives where taken using the MATLAB conv2 function to convolve the approximated derivative of a 9x9 Gaussian with the image. The derivative is approximated as a central finite divided difference. This method works well for interior pixels, but pixels along the edge do not get appropriate values, since a central finite divided difference is not appropriate. To improve the results, the image size is expanded by reflecting pixels near the edge, across the edge and then cropping out the expanded region after the convolution. This help remove unrealistic energies near the edges without adding much computation time.

Consider the photo of the St. Louis Arch from Flickr. The energy of each pixel is shown in the figure below. Since the energy function is based on the first derivative, edges have a very high weight and the outline of the arch can be seen.

Image input to the energy function.

Plot of the energy mapping of each pixel in the Arch photo.

Finding the minimum path using Dynamic Programming

Finding the path through the image with the minimum total energy is a direct application of Dynamic Programming. If the width of the image is being reduced, the path will travel top to bottom and the total number of pixels in the path will equal the height of the image. The minimum path is found by storing the optimal paths to reach each pixel in a cost table. The table is constructed by top-down memoizing, and the path to a pixel is inferred by back tracing through the table.

The following figure shows the same Arch photo as input to the path energy minimization function. Below is a figure of the cost table constructed from the energy values at each pixel.

This cost table was constructed for horizontal carving, where seams from the top to the bottom of the image are removed.

The memoizing table represents the optimal cost to reach each pixel in the image. The table has dimensions that match the size of the image, but the width and height may be switched depending on the direction of the path through the image. For a path that is top to bottom, the table is sized MxN where M is the width of the image and N is the height of the image. The cost values in each table cell are the energy of the corresponding pixel plus the total energy so far of the optimum path to reach that cell. Evaluating the value at each table cell is a recursive function defined as:
T(i,j) = e(i,j)+MIN( T(i, j-1), <-- moving straight down
                             T(i+1, j-1), <-- moving down and to the left
                             T(i-1, j-1) <-- moving down and to the right
                             )
Where e(i,j) is the energy of the pixel located in the image at (i,j). This function considers the three possible paths to reach cell (i,j) and uses the path with the lowest total energy so far. The three paths are moving straight down, moving down and to the left and moving down and to the right. When (i,j) is on the edge, some of these directions may not be possible and they are omitted from the formula. Summing the energy of the path so far with the energy of the current pixel updates the energy of the path and that is the cost that is entered into T.

The optimal path, which minimizes total energy of each pixel in the path is found by back tracing through the table. First, the minimum cost in the last row is determined and that is the end point of the path. Then, the adjacent cells of the above row are compared and the path follows the direction of the minimum of those cell values. The path is determined by tracing back to the first row of the table.

The red lines are seams used to vertically carve the Arch photo to reduce the height (left) and width (right).

Bells and Whistles

Alternative Energy Functions.

Avidan, et al present several possible energy functions. I started with the simplest and decided to try a few of the others for comparison. In total, the energy functions that I considered were:

e1
e1 is the sum of the magnitude of x and y partial derivatives.
This is the simplest and gave good results.
HoG
HoG is e1 divided by a histogram of gradients measure.
This method was used for several examples in the paper. The intention is to attract a seam to an edge in the picture but to not cross it. This is done by dividing by the max of a histogram of gradients measure to reduce the cost near an edge. But then the cost is multiplied by the e1 energy, so it is high at an edge.
e2
e2 used the second derivative combined with e1.
Covered in lecture, a convolution with the Laplacian of a Gaussian gives an approximate second derivative. I based this energy function on computing the second derivative and I attempted to get the same effect as HoG: attract a seam to an edge but no cross an edge.
entropy
Entropy is calculated along a seam.
This is a very different approach because it is not based on gradients, but on the distribution of color and brightness. The optimal seam is one with the greatest uniformity, so entropy is minimized.

Results

The initial implementation with the e1 energy function was used to reduce the width of a sample image, shown below. The image was resized with few artifacts. The most noticeable is the deformation of the suitcase.

Twins Painting

Painting by Annie Wedler in the fall of 2007

original size

Original Image

  1. reduced size

    Reduced Image

  2. reduced size

    Further Reduction of width

Bunk Bed Painting

Painting by Annie Wedler in the fall of 2007

Original Image

  1. Resized Image

  2. Further Resized

  3. Vertically Resized

Closet Painting

Painting by Annie Wedler in the fall of 2007

Original Image

  1. Resized Width

  2. Resized Further

  3. Resized Height

Arch Sunset

original size

Original Image

  1. original size

    Horizontally Resized

  2. original size

    Vertically Resized

Solar Powered Boat Race

Carnegie Mellon Solar Splash endurance even race in the summer of 2007.

Original Image


Electric boat sprint

Carnegie Mellon Solar Splash sprint even race in the summer of 2007.

original image

Original Image


West End Overlook

Pictures like these of Pittsburgh are not taken from a helicopter, but from a small park called the West End Overlook. This photo is from Flickr.

original image

Original Image


Undesirable Results

The cost function is a good method for weighing importance of sections of an image, but it is not perfect. There are many examples of images that will not be resized well using seam carving. Bad results occur when geometry of a figure is modified noticeably and when a section of an important figure is removed. These results are undesirable and an adjusted energy function or some other resizing method should be used.

Noticeable geometry modifications

Some images have very structured geometry and seam removal can disrupt this geometry.

e1 energy function artifact

This image has seemingly insignificant data, the dark floor in the painting, near an important area, the edge of the painting itself. The seam carving algorithm crosses the edge of the painting and distorts it. This distortion is very noticeable even though only 10 pixels were removed, and is highlighted in red.

People are very good at perceiving curvature and continuity, so seam carving an image of the St. Louis Arch is likely to have poor results.

Original photo of the St. Louis Arch, from Flickr

Resized photo with irregular geometry that is very noticeable due to the elegant form the arch is supposed to have.

original size

This Arch photo has nicer geometry and the energy function was modified to avoid edges more. A single resizing was successful, but resizing too much leads to disappearing in the lower right leg of the Arch.

Disappearing figures from the image

The strength of seam carving is removing areas of the image that will go unnoticed. But there are many cases where the energy function will lead to the carving of an area of the image that is important. People, for example, are almost always important subjects of photographs, and when the energy function maps low energy to a person, the result is unexpected disappearing.

Photo taken when Carnegie Mellon Solar Splash was testing at Lake Arthur in the spring of 2006.

This image was taken at sunset and the shadows make the subjects in the foreground as dark as the background.

The paths of the carved seams are shown in red. A majority of them go through the person on the left (Mike) even though he is an important subject.

As expected from the seams, Mike has been almost completely removed!


Bells and Whistles:

Alternate Energy Functions

The two energy functions I completed are e1 and e2. I attempted the HoG energy function, but when run on an example image, it was too slow. The power of the e1 and e2 functions is that most of the calculation is done by convolution. The HoG function is non-linear and uses the MAX function to find the maximum histogram value, in addition to separating the window out into discreet histogram bins. Finally, I did not implement the entropy function because dynamic programming would not find the maximum entropy path based on my definition for entropy. Dynamic programming is a greedy algorithm and assumes a locally optimal solution is globally optimal. However, when considering an optimal direction for a path to take, the entropy not only depends on the pixels that are already included in the path, but also the pixels that will be included if a given path is to be taken. However, dynamic programming may have given a good enough answer, where even though entropy is not minimized, a path that is close to minimal may be found. Even though an entropy based energy function may have had good results, I did not implement it, since it would have been slow like the HoG energy function.

Function Energy Mapping Horizontal Carving Results
e1
e2

Increasing Image Size

Image size can be increased just as well as decreased by adding pixels along a seam instead of removing them. I first calculated N paths that would be removed when reducing the size of an image. With those paths, however, I replaced each pixel with two pixels, increasing the overall size by 1 pixel. The two pixels were inserted by iterating through the path and taking the average of the left pixel and the current pixel and the average of the current pixel and the right pixel. For vertical carving instead of horizontal carving, the above and below pixels were used instead of right and left pixels.

This method certainly increased the size of the images, but I did not find good examples where it worked better than simply scaling the image. Seam carving causes features to become slightly stretched even though its use should reduce the amount of stretching that occurs.

Original Image

Width increased by inserting average pixels along seams

Width increased by scaling performed in photoshop

For comparison, the seam carved and scaled methods are shown for reducing the width of the image. Seam carving results in features appearing less squished in the image.

Width reduced by seam carving

Width decreased by scaling performed in Photoshop