Towards an Integrated Model of Saliency-based Attention and Object Recognition
Dirk Walther, Maximilian Riesenhuber, Tomaso Poggio, Laurent Itti, Christof Koch
Abstract.
We are working on an integrated model of the dorsal ("where") and
ventral ("what") pathways of the primate visual processing system and
of the interaction between these two pathways. To model visual search behavior
in primates, we integrate and extend the saliency-based model for bottom-up
attention by Itti and Koch (Nature Reviews Neuroscience 2001;2(3):194-203)
and the HMAX hierarchical model for object recognition by Riesenhuber
and Poggio (Nature Neuroscience 1999;2:1019-1025).

Figure 1. Schematic of the attentional modulation experiment. The stimulus composed
of two paper clip images is subjected to the saliency-based bottom-up
attention algorithm - the most salient location is determined (blue
circle). Around this focus of attention, the modulation function is
created that is subsequently used to modulate the V4 layer in the HMAX
object recognition model. In this example the recognition model is trained
for the paper clip that is not attended to in the first step. Hence
its response is very low (response a), much lower than the recognition
threshold (dashed line). Inhibition of return (IOR) inhibits the originally
attended location such that the next most salient location (yellow circle)
can win the competition for salience. Accordingly, a different modulation
function is used to modulate the HMAX recognition model. As this time
the correct paper clip is being attended to, the response of the recognition
model is high (response b), much higher than the recognition threshold.
In this example, the target paper clip was detected at the second attended
location. Up to five locations are taken into account in the computer
experiments.
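
The attend, recognize, inhibit cycle described in this caption can be illustrated with a minimal sketch. It is not the authors' implementation: the saliency map is a plain nested list, the recognition model is stood in for by a hypothetical `respond` callback, and the names `most_salient`, `inhibit_return`, `attend_and_recognize`, and `ior_radius` are assumptions made for illustration. Only the limit of five attended locations and the comparison against a recognition threshold come from the text.

```python
from typing import Callable, List, Optional, Tuple

Location = Tuple[int, int]  # (row, column) in the saliency map


def most_salient(saliency: List[List[float]]) -> Location:
    """Return the position of the largest value in a 2-D saliency map."""
    best = (0, 0)
    for r, row in enumerate(saliency):
        for c, value in enumerate(row):
            if value > saliency[best[0]][best[1]]:
                best = (r, c)
    return best


def inhibit_return(saliency: List[List[float]], focus: Location, radius: int) -> None:
    """Inhibition of return: zero the map within `radius` of `focus` (in place)."""
    fr, fc = focus
    for r, row in enumerate(saliency):
        for c in range(len(row)):
            if (r - fr) ** 2 + (c - fc) ** 2 <= radius ** 2:
                row[c] = 0.0


def attend_and_recognize(
    saliency: List[List[float]],
    respond: Callable[[Location], float],
    threshold: float,
    ior_radius: int = 1,   # assumed size of the inhibited region
    max_shifts: int = 5,   # "up to five locations" in the caption
) -> Optional[Tuple[Location, int]]:
    """Visit locations in order of decreasing salience.

    Returns (location, shift number) for the first attended location whose
    recognition response exceeds `threshold`, or None if none does.
    """
    working = [list(row) for row in saliency]  # leave the caller's map untouched
    for shift in range(1, max_shifts + 1):
        focus = most_salient(working)
        if respond(focus) > threshold:
            return focus, shift
        inhibit_return(working, focus, ior_radius)
    return None


# Toy stimulus: a distractor at (1, 1) is more salient than the target at (2, 4).
saliency_map = [
    [0.1, 0.2, 0.1, 0.0, 0.0],
    [0.2, 0.9, 0.2, 0.0, 0.1],
    [0.1, 0.2, 0.1, 0.3, 0.7],
    [0.0, 0.0, 0.1, 0.2, 0.3],
]
target = (2, 4)
result = attend_and_recognize(
    saliency_map,
    respond=lambda focus: 0.95 if focus == target else 0.05,
    threshold=0.5,
)
print(result)
```

As in the figure, the distractor wins the first competition, its response stays below threshold, and the target is found on the second shift once the first location has been inhibited.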

Figure 2. Recognition rates for stimuli with varying distance
between the two paper clip images with (yellow) and without the use
of attentional modulation. An improvement of up to 100% is achieved
with the attentional modulation paradigm.

In the combined model we use saliency-based attention to modulate object
recognition at the V4 level. Interesting regions in the visual scene
are successively selected by a rapidly shiftable focus of attention
(FOA). The activity of each V4 neuron is inhibited as a function
of its distance from the current FOA (Figure 1). Recognition rates for
stimuli composed of two paper clip objects typically increase twofold
compared to previous experiments without attention (Neuron 1999;24(1):87-93).
To achieve this improvement, depressing the V4 activity outside
the focus of attention by as little as 20% proves sufficient.
With 10% activity modulation, recognition still improves by 70%. We find
that the twofold increase in recognition rate is robust over a large
range of modulation strengths of the V4 activity.
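
To make the distance-dependent inhibition concrete, here is a minimal sketch of one possible modulation function. The text only states that V4 activity is depressed according to distance from the FOA and that a depression of 20% is sufficient. The Gaussian falloff, the `sigma` width parameter, and the function names `modulation_gain` and `modulate_v4` are assumptions made for illustration, not the model's actual definition.

```python
import math
from typing import List, Tuple


def modulation_gain(distance: float, depth: float = 0.2, sigma: float = 1.0) -> float:
    """Multiplicative gain for a unit at `distance` from the focus of attention.

    The gain is 1 at the focus and falls towards 1 - depth far away from it.
    The Gaussian shape and `sigma` are assumed; `depth` is the fractional
    depression of activity outside the focus (0.2 for 20%).
    """
    return 1.0 - depth * (1.0 - math.exp(-distance ** 2 / (2.0 * sigma ** 2)))


def modulate_v4(
    activity: List[List[float]],
    focus: Tuple[int, int],
    depth: float = 0.2,
    sigma: float = 1.0,
) -> List[List[float]]:
    """Scale a 2-D grid of V4-like activities by their distance from `focus`."""
    fr, fc = focus
    return [
        [
            a * modulation_gain(math.hypot(r - fr, c - fc), depth, sigma)
            for c, a in enumerate(row)
        ]
        for r, row in enumerate(activity)
    ]


# Uniform activity on a 3 x 7 grid, attended at (1, 1).
activity = [[1.0] * 7 for _ in range(3)]
modulated = modulate_v4(activity, focus=(1, 1), depth=0.2, sigma=1.0)
print([round(v, 3) for v in modulated[1]])
```

With these assumed parameters, the unit at the focus keeps its full activity while units several positions away are scaled to roughly 80% of their original value. Passing `depth=0.1` gives the weaker 10% modulation mentioned above.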
We aim to develop a system that can explore a natural scene and look
for specific objects in a cluttered environment. Such a system has a
wide range of possible applications, e.g. automated vehicle
navigation, surveillance, or video conferencing systems.
One of our next steps towards the goal of an integrated neuromorphic
vision system is the processing of real world images. In addition to
the integration of the saliency-based bottom-up attention system with
object recognition, we are going to integrate components of top-down
attentional interaction into our model in order to bias the attention
system towards the features relevant for detecting the target object.
To realistically simulate both covert and overt attention shifts, we
are already using a pan-tilt video camera with our model for mimicking
saccadic eye movements. In the current implementation, the camera
looks around the room, focusing on the most interesting (i.e.
most salient) location in its visual field.