Video Modeling and Recognition

Here are some initial results from our work on modeling and matching video.

Bear and Dog

Run Lola Run

Region tracking from "Run Lola Run". Tracks are colorized according to rigid component (as detected by the motion segmentation program):

Patches from the model, reprojected:

Stills from Corner 2:

Stills from Train 2:

Bear

Teddy bear with head wobbling:

Teddy bear by itself:

Matching experiment on Bear

Here is an image showing two rigid models of the bear brought into register. One model is from the bear-dog video, and the other is from the video of the bear by itself. The image on the right has been projected through the same camera matrix as the image on the left; this camera matrix is the first recovered pose for the left image. Red lines connect the patches that were matched between the two models and used for registration.

We measure the quality of the match by the repeat rate, which is the ratio of the number of matches to the number of possible matches. The repeat rate for this experiment is 210 / min(2385, 3701) = 0.0880503, or about 8.8%.
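The repeat rate arithmetic can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name `repeat_rate` and its parameter names are made up here, and reading 2385 and 3701 as the patch counts of the two models is an assumption based on the formula, not something stated above.

```python
def repeat_rate(num_matches, size_a, size_b):
    """Ratio of actual matches to the largest possible number of matches.

    size_a and size_b are assumed to be the patch counts of the two models;
    at most min(size_a, size_b) patches can be matched one-to-one.
    """
    possible = min(size_a, size_b)
    if possible == 0:
        raise ValueError("both models must contain at least one patch")
    return num_matches / possible


# Numbers from the bear matching experiment.
rate = repeat_rate(210, 2385, 3701)
print(f"{rate:.7f}")  # prints 0.0880503
```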

Dinosaurs

This clip shows some of the problems with tracking affine-invariant interest points on objects with less texture and more specularity.

Technique

To do the modeling, we need to locate points on the surfaces of the objects and track them through the video frames. You can see some of the results of tracking above. We seed points in each frame using a combination of Laplacian and Harris-Laplacian detectors. We affine-adapt these points and use the Birchfield tracker to find them in subsequent frames. While tracking the interest points, we keep the patch around each point aligned with its corresponding patch in the first frame using Levenberg-Marquardt (LM) optimization.

Here are videos of a few selected patches. The videos show each patch evolving as the point is tracked. The idea is to have as little change in appearance as possible. These clips are highly magnified (roughly 10x to 20x, depending on the scale of the point), so the motion you see here is actually quite small in the real image.
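To give a feel for the alignment step, here is a minimal sketch of Levenberg-Marquardt patch alignment. It is not the code used for the results above, and it makes several simplifying assumptions: "LM" is taken to mean Levenberg-Marquardt, the warp is a pure 2D translation rather than an affine map, the "image" is a synthetic Gaussian blob that can be sampled at any real-valued position, and the Jacobian is estimated by finite differences. All names (`blob`, `residuals`, `lm_align`) are hypothetical.

```python
import math


def blob(x, y, sigma=3.0):
    """Synthetic smooth image: a Gaussian blob centered at the origin."""
    return math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))


def residuals(frame, template, grid, p):
    """Frame sampled at the shifted grid positions, minus the template."""
    tx, ty = p
    return [frame(x + tx, y + ty) - t for (x, y), t in zip(grid, template)]


def lm_align(frame, template, grid, p0=(0.0, 0.0), iters=50, lam=1e-3, eps=1e-4):
    """Estimate the translation p that best aligns frame with template."""
    p = list(p0)
    r = residuals(frame, template, grid, p)
    cost = sum(v * v for v in r)
    for _ in range(iters):
        # Jacobian columns by central differences, one per parameter.
        cols = []
        for k in range(2):
            hi, lo = list(p), list(p)
            hi[k] += eps
            lo[k] -= eps
            rh = residuals(frame, template, grid, hi)
            rl = residuals(frame, template, grid, lo)
            cols.append([(a - b) / (2.0 * eps) for a, b in zip(rh, rl)])
        # Normal equations J^T J and gradient J^T r for the 2-parameter case.
        a11 = sum(v * v for v in cols[0])
        a12 = sum(u * v for u, v in zip(cols[0], cols[1]))
        a22 = sum(v * v for v in cols[1])
        g1 = sum(u * v for u, v in zip(cols[0], r))
        g2 = sum(u * v for u, v in zip(cols[1], r))
        # Damp the diagonal, then solve the 2x2 system in closed form.
        d11, d22 = a11 * (1.0 + lam), a22 * (1.0 + lam)
        det = d11 * d22 - a12 * a12
        if det == 0.0:
            break
        s1 = -(d22 * g1 - a12 * g2) / det
        s2 = -(d11 * g2 - a12 * g1) / det
        trial = [p[0] + s1, p[1] + s2]
        r_trial = residuals(frame, template, grid, trial)
        c_trial = sum(v * v for v in r_trial)
        if c_trial < cost:
            # Step accepted: keep it and trust the quadratic model more.
            p, r, cost = trial, r_trial, c_trial
            lam = max(lam / 10.0, 1e-12)
            if abs(s1) + abs(s2) < 1e-10:
                break
        else:
            # Step rejected: increase damping (closer to gradient descent).
            lam *= 10.0
    return (p[0], p[1]), cost


# Template patch: 9x9 samples of the blob, standing in for the first frame.
grid = [(x, y) for y in range(-4, 5) for x in range(-4, 5)]
template = [blob(x, y) for x, y in grid]


def frame(x, y):
    """Later frame: the same blob, moved by the true shift (0.7, -0.4)."""
    return blob(x - 0.7, y + 0.4)


shift, cost = lm_align(frame, template, grid)
print(shift, cost)  # the recovered shift should be close to (0.7, -0.4)
```

A real tracker would optimize more warp parameters and sample pixel images with interpolation, but the accept/reject logic with the damping factor `lam` is the part that makes this Levenberg-Marquardt rather than plain Gauss-Newton.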