Newsgroups: sci.image.processing
Path: cantaloupe.srv.cs.cmu.edu!das-news2.harvard.edu!news2.near.net!news.mathworks.com!gatech!swrinde!sdd.hp.com!hp-pcd!hpcvsnz!briand
From: briand@cv.hp.com (Brian Dixon)
Subject: Re: object-searching in an image
Sender: news@hpcvsnz.cv.hp.com (News )
Message-ID: <D8zywG.IIp@hpcvsnz.cv.hp.com>
Date: Mon, 22 May 1995 20:43:28 GMT
References: <woellik.52.000F9272@s15mhi.tu-graz.ac.at>
Nntp-Posting-Host: hpcvsgen.cv.hp.com
Organization: Hewlett-Packard
X-Newsreader: TIN [version 1.1 PL9.4]
Lines: 121

There are different ways of finding known objects within a picture.
A common one is edge-based pattern matching, usually based on an
edge-enhanced image (Roberts, Sobel, Laplacian, gradient-magnitude
images in general, etc.) and an edge-enhanced template of the known
object.  Typical algorithms use some matching or scoring method that
assigns a 'score' to each pixel location within the image, e.g. a
Hough transform modified to work with your template rather than just
lines.  These methods are usually intolerant of scale and rotation
changes and must be used carefully to make up for these shortcomings.
For example, if you want to find a square or rectangle, use 'corner'
templates to find the corners rather than a similar square or
rectangle, which will surely miss if your picture is rotated or at a
different magnification than the one used for training the template.

The normalized correlation template matching that you mention is very
tolerant of illumination variation such as overall brightness changes,
slightly less tolerant of varying shadows (angle-of-incidence changes),
and reasonably tolerant of scale and rotation (within a few degrees,
depending on image and template features).  In general it works pretty
well and is worth using.  So much depends on *your* particular situation.
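To make the "normalized" part concrete, here is a minimal sketch of a
normalized correlation score at a single pixel location (the function
name and use of numpy are mine, not from any particular vision package).
Subtracting the means and dividing by the norms is exactly what buys
the tolerance to overall brightness changes described above:

```python
import numpy as np

def ncc_score(image, template, r, c):
    """Normalized correlation score for the template centered at (r, c).

    Assumes (r, c) is far enough from the border that the whole
    template fits inside the image.  A perfect match scores 1.0.
    """
    h, w = template.shape
    patch = image[r - h // 2 : r - h // 2 + h,
                  c - w // 2 : c - w // 2 + w].astype(float)
    t = template.astype(float)
    # remove the mean so a uniform brightness shift has no effect
    patch = patch - patch.mean()
    t = t - t.mean()
    # divide by the norms so a uniform contrast change has no effect
    denom = np.sqrt((patch * patch).sum() * (t * t).sum())
    return (patch * t).sum() / denom if denom else 0.0
```

Note that adding a constant brightness offset to the whole image leaves
the score unchanged, which is the tolerance claimed above.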

The normalized correlation template matching algorithm convolves a
template with an image to produce an output image that represents the
result of the transform.  If the convolution is done for every pixel,
then the output image can be normalized to a proper range of displayable
values so you can view the result of the transform.  Normally you don't
need to normalize the output image in order to do your object search
(only if you are going to view it).
The template is just a piece of another image that represents the object
that you want to find.  For example, if you want to find basketballs in
images, then you cut just the basketball itself out of a representative
image and use that for your template... unchanged in most cases.
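In array terms, "cutting out" the template is just taking a sub-image.
A quick sketch (the coordinates are hypothetical, just to show the
slicing; the random image stands in for a real training picture):

```python
import numpy as np

# Stand-in for a representative training image (200x300 grayscale).
training_image = np.random.randint(0, 256, size=(200, 300), dtype=np.uint8)

# Hypothetical coordinates: suppose the basketball occupies
# rows 40-79 and columns 100-139 of the training image.
template = training_image[40:80, 100:140].copy()
```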

Theoretically, the algorithm works as follows.  (Assume the convolution
is only done for the set of pixels in your image that are 'far enough'
from the edge of the picture, so you don't have to worry about pixels
outside the image being convolved with your template.)  Produce a new
image where each pixel value is determined by centering the template
over the corresponding pixel in the original image, multiplying each
pixel 'under' the template by the value found in the template, and then
summing all of those products.  (Strictly speaking this is a correlation
rather than a convolution, since the template isn't flipped, but for a
symmetric template the two are identical.)  Best shown by example:
image A is convolved with template B, with the result in C:

 A:  1  2  3  4  5     B:  1  2  1     C:
     6  7  8  9 10         2  4  2             112 128 144
    11 12 13 14 15         1  2  1             192 208 224
    16 17 18 19 20                             272 288 304
    21 22 23 24 25

Try placing the template (B) over a pixel in the image (A), multiply
and sum everything, and see if your result matches the corresponding
pixel in the output image (C).  (Note that 'C' does not include values
for convolution locations that would have caused the template to extend
outside the original image... if you want to get close to the border,
that's a different topic.)
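The worked example above is easy to check in a few lines of code.  This
sketch computes the 'valid' region only, i.e. just the centers far
enough from the border, exactly as described:

```python
import numpy as np

A = np.arange(1, 26).reshape(5, 5)        # the 5x5 image from the example
B = np.array([[1, 2, 1],
              [2, 4, 2],
              [1, 2, 1]])                 # the 3x3 template

# 'Valid' correlation: center the template only where it fits entirely.
C = np.empty((3, 3), dtype=int)
for i in range(3):
    for j in range(3):
        C[i, j] = (A[i:i + 3, j:j + 3] * B).sum()

print(C)
# [[112 128 144]
#  [192 208 224]
#  [272 288 304]]

# The best match is the highest value in the transform image.
best = np.unravel_index(C.argmax(), C.shape)   # (2, 2) in C's coordinates,
                                               # i.e. row 4, column 4 of A
```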

Now that you have your transform image, the best match of the template
within the original image is the point with the highest value in the
transform image; e.g. row 4, column 4 in the original image has the
value '304' in the transform and is the point of best match.

In reality, some tricks are used to speed things up.  For example, do
the transform only at points of a regular grid for a 'rough find'.  A
common approach is to calculate transform values only at every 16th
pixel, then pick the highest value as the best 'rough' match.  Starting
at the 'rough' match point, do the convolution on the 8 neighbors that
surround that point; the highest transform value of those (and the
original) is your new find.  Repeat the convolution on the unconvolved
pixels surrounding each new find until the highest transform value
within the 32-pixel region is found, i.e. the +- 16 pixel rows and
columns that define the 'between grid point' area that hasn't been
searched as of completion of the 'rough find'.  This technique is called
'hill climbing', and the concept of breaking the original image into a
rough grid followed by a 'fine search' like I described is fairly common
and executes much quicker than the theoretically correct method above.
I've also heard of using different grid sizes, using only the strongly
connected neighbors (4), and other variations to optimize the search
execution time.  This is the kind of stuff the vision computer and
software people won't give you all the details on.  You can get pretty
close with my algorithm, though, and optimize it if you need to.
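The rough-grid-plus-hill-climbing idea above can be sketched like this.
It's a minimal version under simplifying assumptions of my own: a plain
(unnormalized) correlation score, a clean image where the global peak
has a smooth neighborhood, and no cap on how far the climb can wander
(a real implementation would also guard against stopping on a local
maximum, e.g. by climbing from the top few rough matches):

```python
import numpy as np

def score(image, template, r, c):
    # Plain correlation score with the template centered at (r, c);
    # assumes (r, c) is far enough from the border.
    h, w = template.shape
    patch = image[r - h // 2 : r - h // 2 + h,
                  c - w // 2 : c - w // 2 + w]
    return (patch * template).sum()

def hill_climb_search(image, template, step=16):
    h, w = template.shape
    r0, c0 = h // 2, w // 2                       # first legal center
    r1, c1 = image.shape[0] - h // 2, image.shape[1] - w // 2

    # Rough find: score only every `step`th pixel and keep the best.
    best, best_s = None, -np.inf
    for r in range(r0, r1, step):
        for c in range(c0, c1, step):
            s = score(image, template, r, c)
            if s > best_s:
                best, best_s = (r, c), s

    # Fine search: climb to the best of the 8 neighbors, repeat until
    # no neighbor beats the current find.
    improved = True
    while improved:
        improved = False
        r, c = best
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if r0 <= nr < r1 and c0 <= nc < c1:
                    s = score(image, template, nr, nc)
                    if s > best_s:
                        best, best_s = (nr, nc), s
                        improved = True
    return best
```

With a 16-pixel grid this does a handful of full-template scores plus a
short climb, instead of one score per pixel, which is where the speedup
over the exhaustive method comes from.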

So anyway, those are the two most common search algorithms, but there
are many, many more floating around for the 'general' case, and even
MORE for specialized situations where you have a priori knowledge of
the image.

Good luck,
Brian Dixon

-----------------------------------------------------------------------------

Woellik Helmut (woellik@s15mhi.tu-graz.ac.at) wrote:
: Hi there,

: I am looking for a method for doing object finding in a 
: gray scaled picture. The search object is well known
: (size, shape and brightness), but in the given image
: it can differ from it. This is because of little shadow
: effects, noise or dirty parts on the object, or even 
: hidden and joined parts.
: My solution now (and it works very well) is to operate 
: with a lot of histograms, followed by similar edge 
: detection. The disadvantage is the computation time: 
: one object of, say, 20*20 pixels in a 300*200 image 
: needs more than 2 minutes to find!

: My question now: is there any good and FAST algorithm 
: for doing this job? (I heard of the Normalized-
: Correlation-Search-Technique, but I do not know more
: about it...)
: I also wonder about any books/papers on this topic.

: Thanks.

: --
: Woellik Helmut          woellik@mhi.tu-graz.ac.at
: Institut fuer Foerdertechnik TU-GRAZ
: --

--
Brian Dixon, Machine Vision Engineer, Hewlett Packard (Corvallis, Oregon)
503-715-3143 (wk), briand@cv.hp.com (email). "Opinions & attitudes are mine!"
