Affine-Invariant Patches

Mikolajczyk and Schmid advanced a method of finding salient points in images and adapting them to the local texture so that they can be found repeatably, even under large changes of viewpoint. The adapted points should actually be thought of as patches, because they include a description of the shape and scale of the surrounding texture, as well as the position of the point itself. Affine-invariant patches form the basis for our modeling method.

We find affine-invariant patches in two steps:

Below are some results of our implementation.

Repeatability Tests

Mikolajczyk and Schmid prescribe a test for the quality of the affine adaptation based on the repeatability of the points it finds. We assume that the correct homography H between two images L and L' is given. A point x in L is repeated if its projection x' in L' lies within some threshold distance of the nearest detected point in L'. For the tests below, the threshold is 3 pixels. The repeatability rate is given as a percentage: the ratio of the number of points that actually repeat to the number of points that could possibly repeat between the two images.
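The position test described above can be sketched in a few lines of Python. This is only an illustration, not the test program used for the results below. The function names are hypothetical, and the rule for which points "could possibly repeat" (those whose projection lands inside L') is an assumption on our part.

```python
import math

def project(H, point):
    """Map a 2D point through the 3x3 homography H (nested lists)."""
    x, y = point
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)

def repeatability(H, points_a, points_b, size_b, threshold=3.0):
    """Percent of points detected in L that repeat in L'.

    points_a, points_b: detected (x, y) positions in L and L'.
    size_b: (width, height) of L'.
    """
    width, height = size_b
    # Assumption: a point "could possibly repeat" only if its projection
    # falls inside the bounds of L'.
    candidates = [p for p in (project(H, q) for q in points_a)
                  if 0 <= p[0] < width and 0 <= p[1] < height]
    if not candidates or not points_b:
        return 0.0
    repeated = 0
    for p in candidates:
        # Distance from the projected point to the nearest detection in L'.
        nearest = min(math.hypot(p[0] - q[0], p[1] - q[1]) for q in points_b)
        if nearest <= threshold:
            repeated += 1
    return 100.0 * repeated / len(candidates)
```

For example, with an identity homography, three points in L of which one projects outside L', and one of the remaining two lying within 3 pixels of a detection in L', the function returns 50.0.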

In addition to testing if the position repeats, it is also important to test if the shape repeats. Let U be the 2x2 shape matrix associated with x in L. U induces an ellipse in L with equation
$y^T U^T U y = 1$.
This shape can be projected into L' and compared with the ellipse induced by the shape matrix associated with the nearest point in L'. To compare ellipses, we find the ratio of the area of their intersection to the area of their union. The choice of shape threshold determines whether the ellipses are sufficiently similar.
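A minimal Python sketch of the shape comparison follows. It rests on two assumptions that are ours rather than part of the test as published: the homography is replaced by its local affine approximation (its Jacobian A at the ellipse center) when carrying the shape into L', so that the shape matrix becomes U A^{-1}, and the intersection and union areas are estimated by counting samples on a regular grid. All function names are hypothetical.

```python
import math

def transfer(H, center, U):
    """Carry an ellipse (center, 2x2 shape matrix U) from L into L'.

    Assumption: H is linearized at the center; A is its Jacobian there.
    """
    x, y = center
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    xp = (H[0][0] * x + H[0][1] * y + H[0][2]) / w
    yp = (H[1][0] * x + H[1][1] * y + H[1][2]) / w
    # Jacobian A = [[a, b], [c, d]] of the projective map at (x, y).
    a = (H[0][0] - xp * H[2][0]) / w
    b = (H[0][1] - xp * H[2][1]) / w
    c = (H[1][0] - yp * H[2][0]) / w
    d = (H[1][1] - yp * H[2][1]) / w
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    # New shape matrix: U times the inverse of A.
    Up = [[U[i][0] * inv[0][j] + U[i][1] * inv[1][j] for j in range(2)]
          for i in range(2)]
    return (xp, yp), Up

def inside(center, U, point):
    """True if (p - c)^T U^T U (p - c) <= 1."""
    dx, dy = point[0] - center[0], point[1] - center[1]
    s = U[0][0] * dx + U[0][1] * dy
    t = U[1][0] * dx + U[1][1] * dy
    return s * s + t * t <= 1.0

def half_extents(U):
    """Half-width and half-height of the ellipse's bounding box."""
    m00 = U[0][0] ** 2 + U[1][0] ** 2            # entries of M = U^T U
    m01 = U[0][0] * U[0][1] + U[1][0] * U[1][1]
    m11 = U[0][1] ** 2 + U[1][1] ** 2
    det = m00 * m11 - m01 * m01
    return math.sqrt(m11 / det), math.sqrt(m00 / det)

def overlap_ratio(c1, U1, c2, U2, steps=400):
    """Estimate intersection area / union area by grid sampling."""
    hx1, hy1 = half_extents(U1)
    hx2, hy2 = half_extents(U2)
    x_min, x_max = min(c1[0] - hx1, c2[0] - hx2), max(c1[0] + hx1, c2[0] + hx2)
    y_min, y_max = min(c1[1] - hy1, c2[1] - hy2), max(c1[1] + hy1, c2[1] + hy2)
    dx, dy = (x_max - x_min) / steps, (y_max - y_min) / steps
    inter = union = 0
    for i in range(steps):
        px = x_min + (i + 0.5) * dx
        for j in range(steps):
            p = (px, y_min + (j + 0.5) * dy)
            in1, in2 = inside(c1, U1, p), inside(c2, U2, p)
            inter += in1 and in2
            union += in1 or in2
    return inter / union if union else 0.0
```

Under this sketch, two identical ellipses give a ratio of 1, disjoint ellipses give 0, and a unit circle nested inside a concentric circle of radius 2 gives approximately 0.25. The grid resolution `steps` trades accuracy for running time.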

Test Images

We tested repeatability on the graffiti6 dataset collected by Krystian Mikolajczyk. The set includes ground-truth homographies. Here are the images:

We tested 12 different combinations of point detectors and affine-adaptation settings. Six of these used binaries graciously provided by Krystian Mikolajczyk. The other six used our own implementations. All the tests were done with the program vibes_test.ln provided by Krystian. The methods tested (along with their abbreviations) were:

The following tables give percent repeatability between img1 and each of the other images.

Shape threshold = 0.1:
        HK    HK/F  HK/M  H     H/F   H/M   SK    SK/F  SK/M  S     S/F   S/M
img2    2.180.385.61010242.10.5114111638
img3    0     0.78  2.1   0     2.5   9     0     3.5   6.5   0     6.3   25
img4    0     0.7   1     0     1     4.7   0     0.87  2.8   0     2.2   16
img5    0     0     0.89  0     0.45  1.7   0     0.1   1.5   0     0.67  7.8
img6    0     0     0     0     0     0.38  0     0.1   0.37  0     0.22  3.1
img7    14    3.9   1.5   15    8     7     20    6.9   5.4   11    16    17
img8    0     0     0.3   0     0.15  2.1   0     0.15  0.92  0     0.89  9.8

Shape threshold = 0.55:
        HK    HK/F  HK/M  H     H/F   H/M   SK    SK/F  SK/M  S     S/F   S/M
img2    46    45    38    49    53    62    58    56    47    45    56    56
img3    31    27    27    36    37    44    47    40    34    38    42    41
img4    24    21    21    26    26    33    33    27    21    28    28    34
img5    14    10    13    15    14    20    19    15    14    15    17    22
img6    2.7   3.9   4.9   4.4   7.5   8.3   3.8   5.5   5.2   3.5   5.7   10
img7    31    28    21    31    31    29    43    42    31    18    37    36
img8    11    8.9   11    14    14    16    17    13    10    13    17    22