Video Modeling and Recognition
Here are some initial results from our work on modeling and matching video.
Bear and Dog
Run Lola Run
Region tracking from "Run Lola Run".
Tracks colorized according to rigid component (as detected by the
motion segmentation program):
Patches from model, reprojected:
Stills from Corner 2:
Stills from Train 2:
Bear
Teddy bear with head wobbling:
Teddy bear by itself:
Matching experiment on Bear
Here is an image showing two rigid models of the bear brought into register.
One model is from the bear-dog video, and the other is from the bear by
itself.
The image on the right has been projected through the same camera matrix as
the image on the left. The camera matrix is the first recovered pose for the
left image. Red lines connect the patches that were matched between the
two models and used for registration. We measure the quality of the match
by the repeat rate, which is the ratio of the number of matches to the
number of possible matches (the smaller of the two counts in the formula
below). The repeat rate for this experiment is:
210 / min(2385, 3701) = 210 / 2385 = 0.0880503
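As a minimal sketch, the repeat rate arithmetic can be written as a small
Python function. The function and parameter names are ours, chosen for
illustration, and the reading of the two numbers inside min() as model sizes
is an assumption based on the formula above.

```python
def repeat_rate(num_matches, size_a, size_b):
    """Ratio of matches found to the number of possible matches.

    The number of possible matches is assumed to be the smaller of the
    two sizes, as in the min(...) of the formula above.
    """
    possible = min(size_a, size_b)
    if possible == 0:
        raise ValueError("at least one model is empty; repeat rate undefined")
    return num_matches / possible


# Numbers from the bear matching experiment.
bear_rate = repeat_rate(210, 2385, 3701)
```

With the numbers from this experiment, bear_rate agrees with the value
reported above to the printed precision.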
Dinosaurs
This clip shows some of the problems with tracking affine invariant interest
points on objects with less texture and more specularity.
Technique
To do the modeling, we need to locate points on the surfaces of the objects
and track them through the video frames. You can see some of the results of
tracking above. We seed points in each frame using a combination of
Laplacian and Harris-Laplacian detectors. We affine-adapt these points
and use the Birchfield tracker to find them in subsequent frames.
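To give a feel for the kind of measure a Harris-based detector relies on, here
is a rough sketch of the Harris cornerness response at a single pixel and a
single scale. It is not our detector: it leaves out scale selection, affine
adaptation and Gaussian weighting, and the function names, the 3x3 window and
the constant k = 0.04 are assumptions made for illustration.

```python
def harris_response(img, r, c, k=0.04):
    """Harris cornerness at pixel (r, c) of a 2-D list of grey values.

    Gradients are central differences and the structure tensor is summed
    over an unweighted 3x3 window, so (r, c) must lie at least two
    pixels inside the image border.
    """
    sxx = syy = sxy = 0.0
    for i in range(r - 1, r + 2):
        for j in range(c - 1, c + 2):
            ix = (img[i][j + 1] - img[i][j - 1]) / 2.0  # horizontal gradient
            iy = (img[i + 1][j] - img[i - 1][j]) / 2.0  # vertical gradient
            sxx += ix * ix
            syy += iy * iy
            sxy += ix * iy
    det = sxx * syy - sxy * sxy
    trace = sxx + syy
    return det - k * trace * trace


def make_corner_image(size=11, start=5):
    """Synthetic test image: a bright square filling the lower-right part."""
    return [[1.0 if (r >= start and c >= start) else 0.0
             for c in range(size)]
            for r in range(size)]
```

On the synthetic image from make_corner_image(), the response is positive at
the corner of the square (5, 5), negative on a straight edge such as (8, 5),
and zero in the flat region around (2, 2).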
While tracking the interest points, we keep the patch around each point aligned
with its respective patch in the first frame using Levenberg-Marquardt (LM)
optimization. Here are videos of a few selected patches. The videos show each
patch evolving as the point is tracked. The idea is to have as little change
in appearance as possible. These clips are highly magnified (roughly 10x to
20x, depending on the scale of the point), so the motion you see here is
actually quite small in the real image.
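The alignment step can be illustrated with a toy Levenberg-Marquardt loop.
This is a sketch under strong assumptions, not our implementation: it
estimates only a 2-D translation (no affine parameters), the "image" is a
synthetic Gaussian blob, derivatives are taken by finite differences, and all
names (blob, align_translation, and so on) are made up for the example.

```python
import math


def blob(x, y):
    """Synthetic smooth intensity: a Gaussian blob centred at the origin."""
    return math.exp(-(x * x + y * y) / 8.0)


def sum_sq(values):
    return sum(v * v for v in values)


def align_translation(current, reference, coords, iters=50, h=1e-4):
    """Find the shift p such that current(x + p) matches reference at coords."""
    def residuals(p):
        return [current(x + p[0], y + p[1]) - ref
                for (x, y), ref in zip(coords, reference)]

    p = [0.0, 0.0]
    lam = 1e-3                      # damping factor
    res = residuals(p)
    cost = sum_sq(res)
    for _ in range(iters):
        # Accumulate J^T J and J^T r with finite-difference image gradients.
        a11 = a12 = a22 = g1 = g2 = 0.0
        for (x, y), r in zip(coords, res):
            px, py = x + p[0], y + p[1]
            jx = (current(px + h, py) - current(px - h, py)) / (2 * h)
            jy = (current(px, py + h) - current(px, py - h)) / (2 * h)
            a11 += jx * jx
            a12 += jx * jy
            a22 += jy * jy
            g1 += jx * r
            g2 += jy * r
        # Damped normal equations: (J^T J + lam * diag(J^T J)) s = -J^T r.
        d11 = a11 * (1.0 + lam)
        d22 = a22 * (1.0 + lam)
        det = d11 * d22 - a12 * a12
        if det == 0.0:
            break
        s0 = -(d22 * g1 - a12 * g2) / det
        s1 = -(d11 * g2 - a12 * g1) / det
        trial = [p[0] + s0, p[1] + s1]
        trial_res = residuals(trial)
        trial_cost = sum_sq(trial_res)
        if trial_cost < cost:       # accept the step and trust the model more
            p, res, cost = trial, trial_res, trial_cost
            lam /= 10.0
        else:                       # reject the step and damp more strongly
            lam *= 10.0
        if abs(s0) + abs(s1) < 1e-12:
            break
    return p
```

To try it, sample a reference patch from blob on a small grid, for example
integer coordinates from -3 to 3 in x and y, and pass a shifted copy such as
lambda x, y: blob(x - 0.7, y + 0.4) as current. The returned shift should then
be close to (0.7, -0.4).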