Computational Photography

Zack Fleischman

Interactive PNG Extractor


The goal of this lab is an application that lets the user easily pick out the part of a picture that is of interest and save only that part, with an alpha channel. The intended use is extracting PNG images for use as sprites in games.

This is done via a few different methods. There are 2 main operating modes: selection and deselection. Selection lets the user explicitly mark the part of the picture they want to keep, whereas deselection lets the user mark the portions of the picture that they do NOT want in their final image.
Within these 2 modes, there are 2 main tools. The first has the user simply click the object they want or don't want. The program then does its best, based on edge detection, to select or deselect the clicked-on object. This can be adjusted via 2 parameters (see below). The second is a free selection method, where the user traces out a shape that is closed automatically, and everything inside the shape is either selected or deselected.
The user is presented with an easy-to-use GUI, complete with UNDO and graphical file selection.

This project was an effort to explore 3 different problems for me.
  1. Figuring out how to construct Matlab Graphical User Interfaces and how they interact with the main program
  2. Figuring out how to select "an object" based off the user providing a point (click) inside the object
  3. Figuring out how to determine what is "inside" a freely constructed shape that can be overlapping, convex, concave or anything you can think of
These 3 problems each led to trials and tribulations, with some successes and some failures, all of which are detailed below!

1.) Matlab GUIs

I have done some GUI work before, but never in Matlab, so I was interested in figuring it out. As for constructing the actual GUI, Matlab has a really easy utility called "guide" which lets you drag and drop UI pieces onto a background panel and then sets up a code skeleton for you. The real magic is that a "handles" structure is passed around the code, through which you can access every UI control and property you can imagine. So by manipulating these properties and registering "callback" functions for events such as button clicks, building the interface was reasonably straightforward. One interesting thing that took me a while to get used to was the use of handles to figures and axes. These can be passed to functions such as imshow and pretty much anything else used for displaying. This also necessitated passing around a lot of global variables. Overall, I like the control you get with Java GUIs better, but it's not bad considering the sheer number of functions already available.

2.) Object Selection

This was the first thing that attracted me to the project, and the others came naturally as I worked on it. The main idea was to use Matlab's built-in "edge" function to generate a mask for the edges, and then, based on an input point into that mask, construct a new mask that represents the object clicked on. The most difficult part of this is that most of the time the edge is not perfect and has holes. My first implementation was the flood fill algorithm, which I left in as an option in the GUI since it was a pain in the butt and it still creates some interesting, if not intended, effects. This essentially started at a point, took a look at all 4 of its cardinal neighbors, and marked the ones that weren't "edges" and added them to the queue to be recursed on. This was pretty slow and the recursion was too much for the stack, so I reimplemented it as an iterative process with several optimizations, including spreading out across a whole row until you reach an edge on either side and only then adding to the queue. However, because the edge has holes that the fill leaks through, this did not lead to satisfactory results. So I looked for different options.
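
As an illustration, the row-spreading flood fill described above can be sketched roughly as follows. This is a minimal sketch in Python rather than the project's Matlab code; the function name flood_fill_rows and the list-of-lists edge mask are assumptions made for the example, and the project's other optimizations are not reproduced.

```python
from collections import deque


def flood_fill_rows(edges, seed):
    """Return a mask of every non-edge pixel reachable from seed.

    edges: list of rows of booleans, True where the edge detector fired.
    seed:  (row, col) of the user's click.
    """
    height, width = len(edges), len(edges[0])
    mask = [[False] * width for _ in range(height)]
    r0, c0 = seed
    if edges[r0][c0]:
        return mask  # clicked directly on an edge: nothing to fill

    queue = deque([(r0, c0)])
    while queue:
        r, c = queue.popleft()
        if mask[r][c]:
            continue  # already covered by an earlier row span
        # Spread left and right along the row until an edge or the border.
        left = c
        while left > 0 and not edges[r][left - 1]:
            left -= 1
        right = c
        while right < width - 1 and not edges[r][right + 1]:
            right += 1
        for x in range(left, right + 1):
            mask[r][x] = True
            # Queue the cardinal neighbours directly above and below the span.
            for nr in (r - 1, r + 1):
                if 0 <= nr < height and not edges[nr][x] and not mask[nr][x]:
                    queue.append((nr, x))
    return mask
```

Because the fill follows every connected non-edge pixel, a single gap in the edge mask is enough for it to spill out of the clicked object.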

The next algorithm that I used spreads outwards in an ever-expanding square, adding one to the dimension each step, and follows rules based on whether the previous step had hit an edge to determine if the pixel in question is inside the shape or not. The problem that this algorithm ran into is that it could not traverse "corners" or "bends" in objects. Otherwise it worked great, though. It did not spread too far through holes, and it worked pretty fast. Still, I was not satisfied.

I fixed the problem with the bends by keeping track of all points marked as inside the shape, and then randomly choosing K of them as new starting points from which to repeat my expanding square algorithm. Doing this, I gather a whole new set of points that were not hit on the previous iteration (unless of course the whole shape has already been mapped out). I can thus iterate this process as many times as needed to get a "fuller" shape without expanding too much through the holes.
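
The exact rules of the expanding square are not spelled out here, so the following Python sketch (not the project's Matlab code) fills them in with one assumed rule: a pixel on the current square ring counts as inside if it is not an edge and touches an inside pixel of the previous ring. The names ring, grow_square and select_object, and the parameters iterations, k and max_radius, are all hypothetical.

```python
import random


def ring(r0, c0, radius):
    """Yield the pixels on the square ring at the given radius around (r0, c0)."""
    for c in range(c0 - radius, c0 + radius + 1):
        yield r0 - radius, c
        yield r0 + radius, c
    for r in range(r0 - radius + 1, r0 + radius):
        yield r, c0 - radius
        yield r, c0 + radius


def grow_square(edges, seed, mask, max_radius):
    """Mark pixels ring by ring, moving only outwards from seed (assumed rule)."""
    height, width = len(edges), len(edges[0])
    r0, c0 = seed
    if edges[r0][c0]:
        return
    mask[r0][c0] = True
    previous = {(r0, c0)}
    for radius in range(1, max_radius + 1):
        current = set()
        for r, c in ring(r0, c0, radius):
            if not (0 <= r < height and 0 <= c < width) or edges[r][c]:
                continue
            # Inside only if it touches an inside pixel of the previous ring.
            touches = any((r + dr, c + dc) in previous
                          for dr in (-1, 0, 1) for dc in (-1, 0, 1))
            if touches:
                current.add((r, c))
                mask[r][c] = True
        if not current:
            break  # every path outwards has hit an edge
        previous = current


def select_object(edges, seed, iterations=3, k=5, max_radius=50, rng=None):
    """Grow from the click, then regrow from k random inside points per iteration."""
    rng = rng or random.Random(0)
    height, width = len(edges), len(edges[0])
    mask = [[False] * width for _ in range(height)]
    grow_square(edges, seed, mask, max_radius)
    for _ in range(iterations):
        marked = [(r, c) for r in range(height) for c in range(width) if mask[r][c]]
        if not marked:
            break
        for new_seed in rng.sample(marked, min(k, len(marked))):
            grow_square(edges, new_seed, mask, max_radius)
    return mask
```

Under this assumed rule a single growth can only move away from its seed, which is why it stalls at a bend; regrowing from points already marked lets the selection continue around the corner.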

Unfortunately, I wasn't able to implement the optimizations that I know exist and that would make it possible to iterate quickly, so I had to limit the iterations to a small number, which leads to artifacts.

3.) Freely Constructed Shape Recognition

So to fix the issue with jagged edges caused by the object recognition, I also gave the user the opportunity to just draw out any area they want. It closes automatically, and anything inside it will either be selected or deselected depending on the option chosen. The difficult part is this: given a list of points that define a closed shape, how do I tell which points are inside the shape and which are outside it? I first tried to do this at the pixel level, inserting new points until I had a point on every pixel. However, there were way too many edge cases, the quality of the results was poor, and it was slow.

This prompted me to abstract a level and just deal with it on the geometrical level. I ended up defining the shape as n connected edges. If you cast a line through the shape, you can determine whether a given point along the line is inside the shape by counting how many edges of the shape the line has intersected before reaching it: an odd count means inside, an even count means outside. Therefore, I cast a ray for every horizontal row and calculated which stretches of it were in and which were out. I had this working nicely, but unfortunately somewhere along the line I introduced a bug, and it will occasionally miss a horizontal scan or 2. I'm not sure why, and it bugs me since I had it working at one point. But alas, the theory is sound :)
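
The ray-casting idea above is the standard even-odd scanline fill, which can be sketched as follows. This is a minimal Python sketch, not the project's Matlab code; the name fill_polygon, the (x, y) vertex format, and sampling each row at the pixel centres are assumptions made for the example.

```python
import math


def fill_polygon(points, width, height):
    """Even-odd scanline fill of a closed shape.

    points: list of (x, y) vertices; the last vertex is joined back to the first.
    Returns a height x width mask, True for pixels whose centre is inside.
    """
    mask = [[False] * width for _ in range(height)]
    n = len(points)
    for row in range(height):
        y = row + 0.5  # sample at the pixel centre (an assumption of this sketch)
        crossings = []
        for i in range(n):
            (x1, y1), (x2, y2) = points[i], points[(i + 1) % n]
            # Half-open test: the edge counts only if its ends lie on opposite
            # sides of the scan line, so horizontal edges are never counted.
            if (y1 <= y) != (y2 <= y):
                crossings.append(x1 + (y - y1) * (x2 - x1) / (y2 - y1))
        crossings.sort()
        # Inside between the 1st and 2nd crossing, outside between 2nd and 3rd, ...
        for x_in, x_out in zip(crossings[0::2], crossings[1::2]):
            start = max(0, math.ceil(x_in - 0.5))
            end = min(width, math.ceil(x_out - 0.5))
            for col in range(start, end):
                mask[row][col] = True
    return mask
```

The half-open comparison means a vertex shared by two edges is counted consistently, so each scan line always sees an even number of crossings. Vertex handling is a common place for this kind of fill to drop or double a scan line, though whether that explains the bug mentioned above is not known.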


So using these tools, it is reasonably easy to extract a part of an image and then save it as a PNG with transparency and autocropping.
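
The final step, turning the selection mask into transparency and cropping to the selected region, might look roughly like this. It is a minimal Python sketch, not the project's Matlab code; the name crop_with_alpha and the list-of-tuples image format are assumptions made for the example, and writing the actual PNG file is left to an image library.

```python
def crop_with_alpha(image, mask):
    """Crop to the bounding box of the mask and attach an alpha value per pixel.

    image: rows of (r, g, b) tuples.
    mask:  rows of booleans, True for pixels the user selected.
    Returns rows of (r, g, b, a) tuples, or [] if nothing is selected.
    """
    rows = [r for r, line in enumerate(mask) if any(line)]
    if not rows:
        return []
    cols = [c for c in range(len(mask[0])) if any(line[c] for line in mask)]
    top, bottom, left, right = rows[0], rows[-1], cols[0], cols[-1]
    return [
        # Selected pixels become fully opaque, everything else fully transparent.
        [image[r][c] + ((255,) if mask[r][c] else (0,))
         for c in range(left, right + 1)]
        for r in range(top, bottom + 1)
    ]
```

In Matlab, the equivalent last step would typically pass the mask to imwrite through its 'Alpha' parameter when writing a PNG.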

Picture Gallery

Starting Photos:

Flood Fill:
User Interface: