Figure
1.
Sketch of hypothesis build:
a)
Initialization: The closest match in the database is identified for
each scene feature.
b) Search for a new pairing: starting from the latest assignment,
we look for an unassigned feature in the model image. This feature
is mapped to its best appearance match in the scene, under the condition
that this new pairing is coherent with the pose predicted by the hypothesis.
The pose is then updated based on the new match.
Main
Results.
We compare the performance of our probabilistic search method to that
of Lowe's algorithm on a training set consisting of 100 images of toys
and common kitchen items, with a single image per object. The test set
contained images of single objects and complicated scenes that include
several objects. We used a resolution of 480x320 for training images
and test images of single objects, and 800x533 for complex scenes.
Our algorithm achieved a performance similar to Lowe's system, with
a detection rate of 85\%. Since our method verifies the coherence of
each match by scoring the resulting partial hypothesis, our false alarm
rate was lower than that of Lowe's method.
In order to measure the accuracy of the pose transformations estimated
by each method, the training and test images were manually marked with
ground truth information. An ellipse was fitted around objects present
in a scene, and a canonical orientation was chosen for each of them.
We measured the accuracy of the transformation with the distance, in
pixels, between the predicted positions of the ellipses in a scene,
and the ground truth previously recorded. The error was averaged across
points on the ellipse and across test images. We obtained a mean error
of 45 pixels for our method, and 56 pixels for Lowe's algorithm.
Our approach requires examination and valuation of a number of partial
and complete hypotheses that are much higher than with Lowe's method.
As a result, the probabilistic algorithm is the slower of the two methods.
Our unoptimized code for Lowe's method takes in average 2 seconds on
a Pentium 4 running at 2.4GHz to identify objects in an 800x533 image,
while our probabilistic algorithm requires on average 10 seconds for
the same image.

Figure
2. Example
of result for a textured object included in a complex scene.
a) Initial object
b) Result of Lowe's algorithm. Since the stuffed bear is a
textured object, detection of similar features can occur in many locations,
leading to incorrect pairings. As a result, the frame transformation,
estimated only from the features positions, is inaccurate.
c) Result of the probabilistic search.
Conclusion.
We
introduced a probabilistic framework that explains scenes measurements
with the presence of objects from a database, and with background detections.
Our approach benefits from a high number of matches, which leads to
a high confidence on the presence of objects in the scene. The valuation
of each complete or partial hypothesis allows us to verify the coherence
of each tentative new pairing, thus leading to a lower false alarm rate
than that obtained by Lowe.
In the current implementation, the parameters of the location terms
in the foreground density are set to arbitrary values. Future work will
include actual learning of those parameters from several observations
of the same features across multiple images. Independent parameters
should ultimately be used for each feature in the database.
Another direction for future research is the relative importance of
pose and appearance in the incremental hypothesis build process. When
a hypothesis is initialized and consists of a single match, the uncertainty
in the predicted frame position is very high, and decreases as more
pairings are added. A next version of the hypothesis search method will
put a higher weight on appearance-based pairings at the beginning of
the search, and a higher weight on location verification when numerous
correspondences have been chosen.
References
[1]. D.G. Lowe, “Object recognition from local scale-invariant
features”, Proc. Int. Conf. Comp. Vision, Corfu, Greece, 1999
[2]. M. Weber, M. Welling and P. Perona, “Unsupervised learning
of models for recognition”, Proc. 6th Europ. Conf. Comp. Vis.,
2000.
[3]. J. Pearl, “Heuristics”, Addison-Wesley, 1984.
