This project was an attempt to parallelize a computer vision object recognition algorithm. Given a database of edge-detected object models and a previously unseen edge-detected image, this algorithm attempts to find any and all models which appear in the new image. Matching is performed using an approximation of the Hausdorff Fraction; see Figure 1 for a simple example. The image search is performed in a hierarchical scheme which attempts to minimize comparisons by quickly eliminating regions of the image from consideration. [1]
In this paper we will discuss the pros and cons of different methods
of parallelization and describe the factors that motivated our
design decisions. We will then describe our parallel implementation
using Split-C
in detail; this will be followed by a performance analysis. We also
discuss the suitability of this algorithm for use in facial recognition and
present our results. We conclude with a discussion of some possible
extensions to both the parallel implementation and the facial recognition
aspects.
This method uses an approximation to the Hausdorff Fraction to
measure the similarity between two edge images of the same size.
This approximation is computed using a subspace representation
of the two edge images. The distance for which the fraction is
computed is factored in by dilating one of the images by that
distance. For this application, one of the images in the match
will be a model from a stored database while the other will be
a region of a new image. To create the subspace, we first convert
each of the model images into a column vector by reading the pixels
in column major order. We then construct a matrix from these vectors.
We compute the eigenvectors and eigenvalues of the covariance of
this matrix, select the k largest eigenvalues, and form a matrix
from their corresponding eigenvectors. This matrix is used to project
each of the model images and any region of the new image into
the subspace. The subspace representations of the two images will
each be k-length vectors. The primary step in computing
the approximation to the Hausdorff Fraction is a dot product of
these two k-length vectors. This method is described in
detail in [1].
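To make this step concrete, the following is a minimal C sketch of
the projection and comparison, assuming the k chosen eigenvectors are
stored row-wise; the names and sizes (basis, K, N, project,
approx_fraction) are illustrative, not taken from the original
implementation.

    #define K 20      /* subspace dimension (illustrative)           */
    #define N 4096    /* pixels per edge image, e.g. 64x64 (assumed) */

    /* Precomputed K x N matrix holding one eigenvector per row. */
    static double basis[K][N];

    /* Project a column-major flattened edge image into the subspace. */
    static void project(const double img[N], double out[K])
    {
        for (int i = 0; i < K; i++) {
            out[i] = 0.0;
            for (int j = 0; j < N; j++)
                out[i] += basis[i][j] * img[j];
        }
    }

    /* The approximate Hausdorff Fraction reduces to a dot product
       of the two k-length subspace vectors.                         */
    static double approx_fraction(const double a[K], const double b[K])
    {
        double dot = 0.0;
        for (int i = 0; i < K; i++)
            dot += a[i] * b[i];
        return dot;
    }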
The subspace representations of the database models are precomputed
and stored along with all of the matrices and data needed to perform
the projection operation. Regions of the image being searched must
be projected into the subspace at run time. This projection
operation is computationally very expensive. Therefore, both the
search algorithm and the parallelization method attempt to minimize
the number of projections performed. Note that the approximation
is only accurate to within 0.05 of the true Hausdorff Fraction,
so final decisions about the relative quality of a match are made
using the true bi-directional Hausdorff Fraction.
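Under the usual definition, one direction of the true fraction is the
proportion of model edge pixels that lie within the chosen distance
of some image edge pixel; the bi-directional fraction repeats this
with the two images' roles swapped. A hedged C sketch, assuming the
dilated image edge map has been precomputed:

    /* Fraction of model edge pixels that fall on the image edge map
       dilated by the match distance (names are illustrative).       */
    static double directional_fraction(const unsigned char *model,
                                       const unsigned char *dilated_image,
                                       int npixels)
    {
        int model_pts = 0, hits = 0;
        for (int i = 0; i < npixels; i++) {
            if (model[i]) {
                model_pts++;
                if (dilated_image[i])
                    hits++;
            }
        }
        return model_pts ? (double)hits / model_pts : 0.0;
    }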
The search algorithm presented in [1] attempts to minimize the number of projections and comparisons by performing a coarse-to-fine examination of the new image. The new image is subdivided into cells where each cell represents a set of translations of the model with respect to the image. Each model is translated to the center of the cell and compared with that region of the image dilated by the radius of the cell (see Figure 2). This amounts to a rough comparison of this model with every possible translation in the cell. If the resulting fraction is less than some threshold, then the set of translations represented by that cell can be ruled out as a match with that model. If the resulting fraction is not below the threshold, then the algorithm subdivides the cell into four quadrants and considers each of those. This is repeated until the model is ruled out or the algorithm has recursed down to the point where each cell is one translation (see Figure 3). If a model cannot be ruled out by the approximation at the lowest level, then the true bi-directional Hausdorff Fraction is calculated. If the model still cannot be ruled out, it is marked as a possible match and the fraction and position are stored for later consideration.
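The recursion can be sketched as follows; approx_match,
true_fraction, record_match, and the power-of-two cell size are
assumptions for illustration, not details of the original code.

    typedef struct { int x, y, size; } cell;  /* size x size translations */

    typedef struct Model Model;               /* opaque model handle      */
    extern double approx_match(const Model *m, cell c);
    extern double true_fraction(const Model *m, int x, int y);
    extern void   record_match(const Model *m, int x, int y, double f);

    /* Coarse-to-fine search of one cell for one model; size is
       assumed to be a power of two.                               */
    static void search_cell(cell c, const Model *m, double threshold)
    {
        if (approx_match(m, c) < threshold)
            return;                           /* whole cell ruled out */

        if (c.size == 1) {                    /* a single translation */
            double f = true_fraction(m, c.x, c.y);
            if (f >= threshold)
                record_match(m, c.x, c.y, f); /* possible match       */
            return;
        }

        /* Subdivide into four quadrants and consider each. */
        int h = c.size / 2;
        search_cell((cell){c.x,     c.y,     h}, m, threshold);
        search_cell((cell){c.x + h, c.y,     h}, m, threshold);
        search_cell((cell){c.x,     c.y + h, h}, m, threshold);
        search_cell((cell){c.x + h, c.y + h, h}, m, threshold);
    }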
The first major task in parallelizing this object recognition algorithm was deciding how to distribute the processing amongst the processors. One option was to distribute the search cells across the processors. Thus each processor would be assigned a group of cells and would search each of these cells for all models in the database. This method had some attractive features. Cells are independent entities, as are all image projections associated with them. The projections could be computed and used locally for each of the models, and no communication between processors would be necessary. However, this method requires each processor to have access to the entire database of models. Our test database contained 30 views each of 20 models, for a total of 600 model views. For a database of that size, having a complete set of models at each processor is not unreasonable. However, any real application would require a much larger database, which would preclude the use of this method.
Our second option was to distribute the models across the processors. Thus each processor would have a subset of the models in the database and would search every cell in the image for its set of models. This method is much more tolerant of a large database than the first, but it introduces some significant problems. Each processor is searching through all of the cells in the image. As such, every processor will need most of the image projections calculated. Image projections are very expensive, so having each processor compute all of the projections locally is unacceptable. Instead it is necessary to have processors store projections in a global data structure as they are calculated so other processors can access them. This introduces significant network traffic into the parallel version. However, since this method can handle a large database much more effectively, it became the only choice for implementation.
Once this implementation was chosen, we were faced with two primary issues in parallelization. The first was how to efficiently compute and share the image projections. To prevent processors from needing the same projections simultaneously, we stagger the starting points of the processors across the image being searched. In the ideal case, the processors move from cell to cell, in lock step, without any conflicts. If two processors do happen to need the same projection at the same time, and the projection has not yet been calculated, a simple locking mechanism ensures that only one does the calculation while the other waits. When a given processor needs a projection, it first checks whether the projection has been calculated. If so, it fetches the projection and uses it. If not, it locks the projection, calculates it, and stores it in the global structure. If it fails to acquire the lock, another processor must be computing the projection, so it waits until that computation completes.
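The protocol can be sketched with shared-memory primitives. The real
implementation kept entries in a Split-C global structure; the
POSIX-threads version below is a stand-in that shows the same
check/lock/compute sequence (proj_entry, compute_projection, and K
are illustrative names, not from the original code).

    #include <pthread.h>
    #include <stdbool.h>

    #define K 20                    /* subspace dimension (illustrative) */

    typedef struct {
        pthread_mutex_t lock;       /* serializes computing this entry   */
        volatile bool   ready;      /* set once the projection is stored */
        double          coeff[K];   /* the k-length subspace vector      */
    } proj_entry;

    extern void compute_projection(const unsigned char *region, double out[K]);

    /* Return the projection for one region, computing it at most once.
       (A production version would use real atomics for the flag.)      */
    static const double *get_projection(proj_entry *e,
                                        const unsigned char *region)
    {
        if (e->ready)                       /* fast path: already there */
            return e->coeff;

        if (pthread_mutex_trylock(&e->lock) == 0) {
            if (!e->ready) {                /* we won the race          */
                compute_projection(region, e->coeff);
                e->ready = true;
            }
            pthread_mutex_unlock(&e->lock);
        } else {
            /* Another processor is computing it; wait until it finishes. */
            pthread_mutex_lock(&e->lock);
            pthread_mutex_unlock(&e->lock);
        }
        return e->coeff;
    }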
The second issue was load balance. At first this appeared
simple, since a large number of models can easily be distributed
across many processors. The problem with
this simple solution arose because of the organization of our
database. The 30 views of each of the 20 models were stored sequentially
in the database. A given processor would be assigned most of the
views of a few objects. If one of those objects appeared in the
image, that processor would have many more close matches than
other processors and would recurse more deeply into the image.
This created a distinct load imbalance. We solved this problem
by spreading the views of each object across the processors. In
a real application it would be sensible to store the views of
an object out of order to solve this problem. With this change in
place we achieved a much more acceptable load balance. Figure
4 shows the load balance of the algorithm before and after this
correction. It is important to note, however, that it is impossible
to predict which models will cause the deepest searches so there
will always be a certain amount of load imbalance.
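A minimal sketch of the interleaved assignment, where MYPROC and
PROCS are Split-C's processor id and processor count, nviews would be
600 in our tests, and assign_to_local_set is a hypothetical helper:

    /* Stride by PROCS so consecutive views of the same object land
       on different processors.                                      */
    static void assign_views(int nviews)
    {
        for (int v = MYPROC; v < nviews; v += PROCS)
            assign_to_local_set(v);
    }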
Once our Split-C implementation was complete, we measured its performance on the SP-2 massively parallel processor. We also compiled the sequential version for the SP-2 and measured its performance. The graph in Figure 5 shows the performance results in terms of speedup over the sequential version.
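For reference, the speedup on N processors is the usual ratio of
sequential to parallel running time:

    speedup(N) = T_sequential / T_parallel(N)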
The graph clearly shows that our parallel implementation achieves less than ideal performance. Ideal performance would be a speedup of N for a test on N processors. These results were not as good as we had hoped, so we attempted to identify the source of overhead which was limiting our performance. We separately measured the time spent in each phase of the search. The breakdown of time for different numbers of processors is shown in Figure 6.
The single-processor graph shows that the parallel version, when run on one processor, performs nearly as well as the sequential version. However, when more processors are added, we see that the time spent fetching projections from the global structure is a significant source of overhead. As the number of processors increases, fewer of the projections a processor needs are calculated locally, so more global accesses are required. The graphs show that the overhead associated with fetching projections grows with the number of processors.
To solve this problem, we would like to fetch the projections
using a split-phase access and do some other calculations while
waiting for the data to arrive. However, the nature of the algorithm
prevents us from knowing in advance which projections we need.
Thus by the time we know we need a projection we have no other
calculations to perform while waiting for the data.
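For illustration, this is what a split-phase fetch looks like in
Split-C when the address is known in advance (gproj is a hypothetical
global pointer into the shared projection table):

    void prefetch_example(double *global gproj)
    {
        double local_copy;

        local_copy := *gproj;  /* initiate the get; does not block  */
        /* ...independent work would overlap here, if we had any... */
        sync();                /* wait for the outstanding get      */
    }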
With our parallel implementation complete, we began to investigate the usefulness of this method for facial recognition. Our goal was to see if the algorithm could successfully find faces in a scene using the approximation to the Hausdorff fraction. One test method we considered was inserting face models from our database directly into scenes and then seeing if we could match them. This works well but is very artificial. Instead, we use images of people who do appear in the database, but poses of those people that are not themselves in the database, and see if we can still find those faces effectively. This is much more representative of a real-world situation.
We started the construction of our database with a set of 400 images. This set consisted of 10 poses each of 40 different people. We wanted to exclude some of these poses from the database so that we could search for them in scenes. We elected to save the odd-numbered poses for matching and construct the database from the 200 even-numbered poses. The resulting database is shown in Figure 7. We used this database to construct a subspace and the needed supporting data, as described in [1].
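The split itself is straightforward; a sketch, with hypothetical
helpers add_to_database and add_to_test_set:

    extern void add_to_database(int person, int pose);
    extern void add_to_test_set(int person, int pose);

    /* Poses are numbered 1..10 for each of the 40 people: even poses
       build the database, odd poses are held out as search targets. */
    static void split_poses(void)
    {
        for (int person = 0; person < 40; person++)
            for (int pose = 1; pose <= 10; pose++)
                if (pose % 2 == 0)
                    add_to_database(person, pose);
                else
                    add_to_test_set(person, pose);
    }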
Our next step was to create a set of images to search. We didn't have access to the people who appear in our database, so we decided to create synthetic images by inserting the edge images (the ones we kept out of the database) into images which contained other faces of roughly the same size. We attempted to select images which contained a significant number of faces and additional clutter to introduce the possibility of false matches.
We have included several of our synthetic test images and the results of our searches on them. Each set of images includes the original grayscale image and the corresponding edge image. Each following pair of images consists of the synthetic image created by inserting the face we wish to search for, and the same image with the best match overlaid in red. The notation "X/Y" associated with the inserted face indicates that we inserted pose Y of person X. In instances where the search fails to identify the correct face, the correct face is shown in blue while the result of the search is shown in red. Coordinates of the match are given as column followed by row, with the origin at the top left.
The results of our experiments with facial recognition were mixed. We were pleased that the algorithm often successfully selected the correct person to match the face in the scene. However, the algorithm does sometimes fail, incorrectly selecting another face or, in rare cases, something that is not even a face. The key result of our tests was more subtle: when we examined all of the possible matches which survived the Hausdorff-approximation-based search, we found that the correct result was nearly always present.
Our test set was not large enough to support statistical conclusions, but we believe that the search algorithm is effective in quickly eliminating non-matches without eliminating good matches. The final decision as to which element of the database is the best match is made with the true Hausdorff fraction, and we believe this is where the algorithm fails. Thus it may be possible to use the approximate Hausdorff method to quickly search the image, and then use a more robust face matching algorithm to pick the best match from the survivors. The method developed by Turk et al. [3] might be a good choice for this application.
It has been established that searching a scene for a given set of models can be conducted efficiently by using the approximation to the Hausdorff fraction. In the first part of this project we have shown that this algorithm can be effectively parallelized. Thus, with enough processors at our disposal we can quickly search a new scene for models in the database.
Further we have determined that this algorithm can be successfully applied to the problem of facial recognition. Given a database of faces and a scene, it is possible to use this algorithm to search the image for people who appear in the database. This algorithm has potential applications as a means of identifying unknown individuals. This could be used by law enforcement agencies for searching criminal records. Companies could use it to verify the identity of employees as part of their security. More general problems such as face detection could be addressed with a variant of this algorithm which searches for face-like structures in a scene. This work could also be extended to search for a given face in a sequence of images.
The primary limiting factor in this parallel algorithm is the time spent fetching projections from the global structure. As we have already stated, predicting the need for a given projection would allow us to perform a split-phase access and avoid wasting time waiting for the network to supply the data. Future revisions to this implementation should include some means of projection prediction. A simplistic approach would be to fetch all projections that might be needed; the danger is that such a scheme would significantly increase network traffic and congestion. A better approach would be to employ something akin to a high-accuracy branch prediction scheme.
Another possible approach to improving this algorithm might be to use a breadth-first search rather than a depth-first one. This would make it possible to predict accurately which projections will be needed. However, it would also require storing a list of surviving models for each cell at each level as the search proceeds. The memory requirements of these lists would be significant and perhaps prohibitive.
Yet another solution would be to use the first method of parallelization described above. We would distribute the cells across the processors. Each processor could compute the projection for its cells as needed and store them. The problem of storing all of the models could be solved by subdividing the database into model sets. A processor would consider only one set at a time. When nearing completion of a set, the processor could begin reading the next set from a file. This would keep memory requirements to a minimum. However, the demands on the file system might become the limiting factor in the algorithm with this implementation.
We see three main areas for future work with respect to facial
recognition. First, a larger set of test images could be used in an
effort to produce real statistics on recognition performance,
although such statistics would remain approximate because recognition
performance is difficult to quantify precisely. Second, real test
images (as opposed to our synthetically-generated ones) could be
used. This would require the people in the database to appear
naturally in the test images; most likely a new face database would
need to be created. It would also be interesting to see how the
search algorithm performs under changing lighting conditions. Third,
the true Hausdorff fraction that is computed to eliminate matches is
not necessarily the best measure for facial recognition, since
edge-detected images of faces are not necessarily a good basis for
final matching. Perhaps a better method would be a grayscale-based
matching scheme; the method used in [3] might be a good candidate.
Certainly, edge-detection-based matching appears to be useful for
quickly and efficiently narrowing the search.
References
Source Code