15-463 Fall 2007
Digital images are often viewed in many different display devices with a variety of resolutions. Variation of resolution makes viewing images difficult because they usually are resized to accommodate limited space. Simple attempts at resizing include scaling and cropping. Scaling reduces perceivable detail and cropping can not be done automatically. Also, cropping alters the image composition and is not always desirable. This project is an implementation of a different image resizing approach called seam carving. This resizing method is presented by Avidan, S and Shamir, A (Seam Carving for Content-Aware Image Resizing).
Seam carving allows a change in size of the image by modifying the least noticeable pixels in an image. A typical application for seam carving is to reduce the size of an image along one dimension. This can be done by finding one pixel wide paths from the top to the bottom of the image and removing those paths. If the pixels in those paths are similar to surrounding pixels, then their removal may be unnoticed. Other seam carving applications include increasing the size of an image, changing the size of an image in two dimensions and even object removal.
Reducing the size of an image is accomplished by removing pixels that will go unnoticed. The pixels to be removed are determined by finding the path across the image with the lowest total energy, which is the sum of the energy of each pixel along the path. Consider making the image shown below one pixel narrower. One pixel in each row of the image is removed, reducing the width of the image by one pixel. The red line near the right edge shows the minimum energy path from the top to the bottom of the image. Each pixel in the path is removed from the image for an overall reduction in width. To further reduce the width of the image, the pixels from the next lowest energy path will be removed. This process can continue until the desired width is reached.
The resizing process involves assigning energy to each pixel and then finding the path across the image that minimizes total energy. Finding the minimum path can be costly, but is not difficult. Assigning energy to each pixel is difficult because the definition of energy used can lead to poor results.
Each pixel has energy based on the sum of the x and y derivatives at that point. The derivatives are computed by taking the finite divided difference between the pixel and its neighbors. The energies from each color channel are computed and then summed for the overall energy of the pixel. Finding an appropriate energy function can drastically affect the results. One problem with this energy function is that it does not take into account vertical lines in the image. A minimum path may not cross through a vertical line directly, but if part of the path is on one side of a line and part of the path is on the other, then the end result may be a skew in the vertical line. Examples of artifacts that appear are given in the results section.
The x and y derivatives where taken using the MATLAB conv2 function to convolve the approximated derivative of a 9x9 Gaussian with the image. The derivative is approximated as a central finite divided difference. This method works well for interior pixels, but pixels along the edge do not get appropriate values, since a central finite divided difference is not appropriate. To improve the results, the image size is expanded by reflecting pixels near the edge, across the edge and then cropping out the expanded region after the convolution. This help remove unrealistic energies near the edges without adding much computation time.
Consider the photo of the St. Louis Arch from Flickr. The energy of each pixel is shown in the figure below. Since the energy function is based on the first derivative, edges have a very high weight and the outline of the arch can be seen.Image input to the energy function. Plot of the energy mapping of each pixel in the Arch photo.
Finding the path through the image with the minimum total energy is a direct application of Dynamic Programming. If the width of the image is being reduced, the path will travel top to bottom and the total number of pixels in the path will equal the height of the image. The minimum path is found by storing the optimal paths to reach each pixel in a cost table. The table is constructed by top-down memoizing, and the path to a pixel is inferred by back tracing through the table.
The following figure shows the same Arch photo as input to the path energy minimization function. Below is a figure of the cost table constructed from the energy values at each pixel.
The memoizing table represents the optimal cost to reach each pixel in the image. The table has dimensions that match the size of the image, but the width and height may be switched depending on the direction of the path through the image. For a path that is top to bottom, the table is sized MxN where M is the width of the image and N is the height of the image. The cost values in each table cell are the energy of the corresponding pixel plus the total energy so far of the optimum path to reach that cell. Evaluating the value at each table cell is a recursive function defined as:
T(i,j) = e(i,j)+MIN( T(i, j-1), <-- moving straight down
T(i+1, j-1), <-- moving down and to the left
T(i-1, j-1) <-- moving down and to the right
Where e(i,j) is the energy of the pixel located in the image at (i,j). This function considers the three possible paths to reach cell (i,j) and uses the path with the lowest total energy so far. The three paths are moving straight down, moving down and to the left and moving down and to the right. When (i,j) is on the edge, some of these directions may not be possible and they are omitted from the formula. Summing the energy of the path so far with the energy of the current pixel updates the energy of the path and that is the cost that is entered into T.
The optimal path, which minimizes total energy of each pixel in the path is found by back tracing through the table. First, the minimum cost in the last row is determined and that is the end point of the path. Then, the adjacent cells of the above row are compared and the path follows the direction of the minimum of those cell values. The path is determined by tracing back to the first row of the table.
Avidan, et al present several possible energy functions. I started with the simplest and decided to try a few of the others for comparison. In total, the energy functions that I considered were:
The initial implementation with the e1 energy function was used to reduce the width of a sample image, shown below. The image was resized with few artifacts. The most noticeable is the deformation of the suitcase.
Painting by Annie Wedler in the fall of 2007
Painting by Annie Wedler in the fall of 2007
Painting by Annie Wedler in the fall of 2007
Carnegie Mellon Solar Splash endurance even race in the summer of 2007.
Carnegie Mellon Solar Splash sprint even race in the summer of 2007.
Pictures like these of Pittsburgh are not taken from a helicopter, but from a small park called the West End Overlook. This photo is from Flickr.
The cost function is a good method for weighing importance of sections of an image, but it is not perfect. There are many examples of images that will not be resized well using seam carving. Bad results occur when geometry of a figure is modified noticeably and when a section of an important figure is removed. These results are undesirable and an adjusted energy function or some other resizing method should be used.
Some images have very structured geometry and seam removal can disrupt this geometry.
People are very good at perceiving curvature and continuity, so seam carving an image of the St. Louis Arch is likely to have poor results.
The strength of seam carving is removing areas of the image that will go unnoticed. But there are many cases where the energy function will lead to the carving of an area of the image that is important. People, for example, are almost always important subjects of photographs, and when the energy function maps low energy to a person, the result is unexpected disappearing.
Photo taken when Carnegie Mellon Solar Splash was testing at Lake Arthur in the spring of 2006.
As expected from the seams, Mike has been almost completely removed!
The two energy functions I completed are e1 and e2. I attempted the HoG energy function, but when run on an example image, it was too slow. The power of the e1 and e2 functions is that most of the calculation is done by convolution. The HoG function is non-linear and uses the MAX function to find the maximum histogram value, in addition to separating the window out into discreet histogram bins. Finally, I did not implement the entropy function because dynamic programming would not find the maximum entropy path based on my definition for entropy. Dynamic programming is a greedy algorithm and assumes a locally optimal solution is globally optimal. However, when considering an optimal direction for a path to take, the entropy not only depends on the pixels that are already included in the path, but also the pixels that will be included if a given path is to be taken. However, dynamic programming may have given a good enough answer, where even though entropy is not minimized, a path that is close to minimal may be found. Even though an entropy based energy function may have had good results, I did not implement it, since it would have been slow like the HoG energy function.
|Function||Energy Mapping||Horizontal Carving Results|
Image size can be increased just as well as decreased by adding pixels along a seam instead of removing them. I first calculated N paths that would be removed when reducing the size of an image. With those paths, however, I replaced each pixel with two pixels, increasing the overall size by 1 pixel. The two pixels were inserted by iterating through the path and taking the average of the left pixel and the current pixel and the average of the current pixel and the right pixel. For vertical carving instead of horizontal carving, the above and below pixels were used instead of right and left pixels.
This method certainly increased the size of the images, but I did not find good examples where it worked better than simply scaling the image. Seam carving causes features to become slightly stretched even though its use should reduce the amount of stretching that occurs.
For comparison, the seam carved and scaled methods are shown for reducing the width of the image. Seam carving results in features appearing less squished in the image.