Interest Point Detectors
We use point detectors extended to find maxima in scale space.
An image L is a function from (x,y) to an intensity value. The scale space
form of L is a function from (x,y,scale) to an intensity value. It can be
thought of as a stack of the same image with increasing amounts
of blur.
Each detector searches for maxima of some function on the texture in the
image L at a given scale. It then determines if a detected point is also
near a maximum along the scale dimension. Our implementation searches
for (x,y) maxima at scales that are powers of sqrt(2). Then, regardless of
the type of function used to find (x,y) maxima, it uses scale normalized
Laplacian filters to search for a nearby maximum along the scale direction.
There are 20 scale levels of Laplacian filters between any two scale levels of
the primary (x,y) function.
To make the responses of functions comparable across scale, we use
scale normalization as described by Lindeberg. We obtain derivatives of
the intensity function L using derivative of Gaussian kernels. For each
order of derivation, we multiply the kernel once by the sigma value of
the Gaussian. For example, the Laplacian of Gaussian is a second derivative
based kernel, so we multiply it by sigma*sigma.
Functions
Below are brief summaries of some of the functions. For details, see the
papers by Mikolajczyk and Schmid.
Harris Function
The Harris function approximates the product of the eigenvalues of the
squared gradient matrix. If [dx,dy]T is the gradient at some
point, then the squared gradient matrix D is:
[dx*dx dx*dy]
[dy*dx dy*dy]
The Harris function sums the squared gradient matrices Di in a window around
a point using a Gaussian:
A = sum(Gi*Di).
It then computes a value for the given
point using the trace and determinant of resuling matrix:
det(A) - 0.06 * trace(A).
The Harris value will be large if the product of the eignevalues of A is
large.
There are two scales involved in computing the Harris function. One is
the derivation scale used to compute the gradient at each given point. The
other is the integration scale used to sum up the squared gradient matrices
in a window around a given point. The integration scale is the
characteristic scale of the function. Our implementation
chooses a fixed ratio between the two scales:
(derivation scale) = 0.7125 * (integration scale).
Trace of Hessian
The second derivative matrix (Hessian) at a given point is:
[dxx dxy]
[dyx dyy]
The trace of Hessian is (dxx+dyy).
We use a combination of a 1D Gaussian blurring filter and a 1D second
derivative of Gaussian filter to find the values for dxx
and dyy. The response of (dxx+dyy) is the same as convolving
with a 2D Laplacian of Gaussian kernel. The advantage of computing
the trace of Hessian is that it can be done entirely with 1D filters, and
so is more efficient.
Thresholding
After collecting the maxima of a given function, the next step is to
discard weak responses. Picking thresholds is an art. A common method
is to use some percentage of the strongest response value. Instead,
our implementation uses some multiple of standard deviation from zero,
computed on the responses themselves. This method works better than
the common method, but for the Hessian it starts to break down as scales
get larger than about 30 pixels. Therefore, we switch over to another
method of choosing threshold at larger scales of Hessian. Part of this
hack is to use all the points (no threshold at all) when there are less
than 20 of them.
Below we show some response histograms to
help justify the scale selection method and also explain why it breaks down.
The histograms each have 100 bins. The horizontal scale is response value
for the given function, and the vertical scale is either raw number of points
in a given bin or percent of total points that fell in that bin. The
histograms for the Hessian function were computed on the
first image in the shoe dataset, and the Harris histograms
were computed on the
first image
in Krystian Mikolajczyk's graffiti6 dataset.
- Hessian, sigma=1.4
-- This graph shows the typical shape of a response histogram. It suggests
that the response falls off exponentially. While modeling this as a Gaussian
distribution may not be excatly correct, it is easy to compute.
- Hessian, sigma=1.4, zoomed in near 0
-- This is a closer look at the beginning of the histogram. We again use 100
bins, but on a smaller range of response values.
- Hessian, all scales, only maxima
-- This diagram is similar to the previous two, but shows a wide range of
scales. The curves are all plotted as probability distributions (sum to 1)
rather than raw counts, which makes them easier to compare. Notice that as
scale increases, the distributions become flatter. The "spiked" appearance
of the distributions for the last few scales is due to the fact that there
are fewer points than bins. As the distribution becomes flatter (more
uniform) the standard deviation moves further from zero, thus raising
the threshold. This method seems to improve repeatability, but eventually
(if the threshold multiple is large enough) it eliminates all points,
which is not helpful.
- Hessian, all scales, full image
-- This histogram contains all the points in the image, rather than just the
maxima. It shows that the distribution of the maxima is characteristic of
the repsonses as a whole.
- Harris, sigma=1.4
-- Here we see that the Harris response drops off sharply and more quickly than
the Hessian. However, it still appears to be an exponential curve.
- Harris, sigma=1.4, zoomed in near 0
-- The distribution holds the same shape, even after zooming in by many
orders of magnitude.
- Harris, sigma=1.4, zoomed in more
- Harris, all scales, only maxima
-- The distribution of maxima has such a strong spike that there is very little
change across a wide range of scales.
- Harris, all scales, full image
-- The distribution of all the responses shows some of the same flattening
at larger scales, but not as pronounced as for the Hessian.