Interest Point Detectors

We use point detectors extended to find maxima in scale space. An image L is a function from (x,y) to an intensity value. The scale space form of L is a function from (x,y,scale) to an intensity value. It can be thought of as a stack of the same image with increasing amounts of blur.

Each detector searches for maxima of some function on the texture in the image L at a given scale. It then determines if a detected point is also near a maximum along the scale dimension. Our implementation searches for (x,y) maxima at scales that are powers of sqrt(2). Then, regardless of the type of function used to find (x,y) maxima, it uses scale normalized Laplacian filters to search for a nearby maximum along the scale direction. There are 20 scale levels of Laplacian filters between any two scale levels of the primary (x,y) function.

To make the responses of functions comparable across scale, we use scale normalization as described by Lindeberg. We obtain derivatives of the intensity function L using derivative of Gaussian kernels. For each order of derivation, we multiply the kernel once by the sigma value of the Gaussian. For example, the Laplacian of Gaussian is a second derivative based kernel, so we multiply it by sigma*sigma.

Functions

Below are brief summaries of some of the functions. For details, see the papers by Mikolajczyk and Schmid.

Harris Function

The Harris function approximates the product of the eigenvalues of the squared gradient matrix. If [dx,dy]T is the gradient at some point, then the squared gradient matrix D is:
     [dx*dx  dx*dy]
     [dy*dx  dy*dy]
The Harris function sums the squared gradient matrices Di in a window around a point using a Gaussian:
A = sum(Gi*Di).
It then computes a value for the given point using the trace and determinant of resuling matrix:
det(A) - 0.06 * trace(A).
The Harris value will be large if the product of the eignevalues of A is large.

There are two scales involved in computing the Harris function. One is the derivation scale used to compute the gradient at each given point. The other is the integration scale used to sum up the squared gradient matrices in a window around a given point. The integration scale is the characteristic scale of the function. Our implementation chooses a fixed ratio between the two scales:
(derivation scale) = 0.7125 * (integration scale).

Trace of Hessian

The second derivative matrix (Hessian) at a given point is:
     [dxx  dxy]
     [dyx  dyy]
The trace of Hessian is (dxx+dyy). We use a combination of a 1D Gaussian blurring filter and a 1D second derivative of Gaussian filter to find the values for dxx and dyy. The response of (dxx+dyy) is the same as convolving with a 2D Laplacian of Gaussian kernel. The advantage of computing the trace of Hessian is that it can be done entirely with 1D filters, and so is more efficient.

Thresholding

After collecting the maxima of a given function, the next step is to discard weak responses. Picking thresholds is an art. A common method is to use some percentage of the strongest response value. Instead, our implementation uses some multiple of standard deviation from zero, computed on the responses themselves. This method works better than the common method, but for the Hessian it starts to break down as scales get larger than about 30 pixels. Therefore, we switch over to another method of choosing threshold at larger scales of Hessian. Part of this hack is to use all the points (no threshold at all) when there are less than 20 of them.

Below we show some response histograms to help justify the scale selection method and also explain why it breaks down. The histograms each have 100 bins. The horizontal scale is response value for the given function, and the vertical scale is either raw number of points in a given bin or percent of total points that fell in that bin. The histograms for the Hessian function were computed on the first image in the shoe dataset, and the Harris histograms were computed on the first image in Krystian Mikolajczyk's graffiti6 dataset.