Caltech
Center for Neuromorphic Systems Engineering


Towards an Integrated Model of Saliency-based Attention and Object Recognition
Dirk Walther, Maximilian Riesenhuber, Tomaso Poggio, Laurent Itti, Christof Koch

Abstract. We are developing an integrated model of the dorsal ("where") and ventral ("what") pathways of the primate visual system and of the interaction between these two pathways. To model visual search behavior in primates, we integrate and extend the saliency-based model of bottom-up attention by Itti and Koch (Nature Reviews Neuroscience 2001;2(3):194-203) and the HMAX hierarchical model of object recognition by Riesenhuber and Poggio (Nature Neuroscience 1999;2:1019-1025).

Figure 1. Schematic of the attentional modulation experiment. The stimulus, composed of two paper clip images, is passed to the saliency-based bottom-up attention algorithm, which determines the most salient location (blue circle). A modulation function is created around this focus of attention and is then used to modulate the V4 layer of the HMAX object recognition model. In this example, the recognition model is trained for the paper clip that is not attended to in the first step, so its response is very low (response a), far below the recognition threshold (dashed line). Inhibition of return (IOR) then suppresses the originally attended location so that the next most salient location (yellow circle) can win the competition for salience. Accordingly, a different modulation function is applied to the HMAX recognition model. Because the correct paper clip is now attended to, the response of the recognition model is high (response b), well above the recognition threshold. In this example, the target paper clip was detected at the second attended location. Up to five locations are taken into account in the computer experiments.
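The attend, test, and inhibit cycle described in Figure 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the model's actual implementation: the function name `scan_with_ior`, the disc-shaped inhibition region, the `ior_radius` parameter, and the toy `recognize` callback are all hypothetical. Only the limit of five attended locations and the comparison against a recognition threshold are taken from the caption.

```python
def scan_with_ior(saliency, recognize, threshold, ior_radius=1, max_shifts=5):
    """Visit locations in order of decreasing saliency until one is recognized.

    saliency   -- 2D list of non-negative saliency values
    recognize  -- hypothetical callback: (row, col) -> recognition response
    threshold  -- recognition threshold (the dashed line in Figure 1)
    ior_radius -- assumed radius of the inhibition-of-return disc
    max_shifts -- at most this many locations are attended (five in the text)
    """
    sal = [row[:] for row in saliency]  # work on a copy of the map
    rows, cols = len(sal), len(sal[0])
    visited = []
    for _ in range(max_shifts):
        # Winner-take-all: the most salient location becomes the focus.
        r, c = max(
            ((i, j) for i in range(rows) for j in range(cols)),
            key=lambda p: sal[p[0]][p[1]],
        )
        visited.append((r, c))
        if recognize((r, c)) > threshold:
            return (r, c), visited
        # Inhibition of return: suppress a disc around the attended
        # location so the next most salient location can win.
        for i in range(rows):
            for j in range(cols):
                if (i - r) ** 2 + (j - c) ** 2 <= ior_radius ** 2:
                    sal[i][j] = 0.0
    return None, visited


# Toy example: two "paper clips", the less salient one is the target.
saliency_map = [
    [0.1, 0.2, 0.1, 0.0, 0.0],
    [0.2, 0.9, 0.2, 0.0, 0.0],
    [0.1, 0.2, 0.1, 0.0, 0.6],
]
toy_recognize = lambda loc: 0.9 if loc == (2, 4) else 0.1
found, visited = scan_with_ior(saliency_map, toy_recognize, threshold=0.5)
```

In this toy run the distractor at (1, 1) is attended first and rejected, and the target at (2, 4) is found on the second shift, mirroring responses a and b in the figure.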


Figure 2. Recognition rates for stimuli with varying distance between the two paper clip images, with (yellow) and without attentional modulation. Attentional modulation improves the recognition rate by up to 100%.

In the combined model we use saliency-based attention to modulate object recognition at the V4 level. Interesting regions in the visual scene are selected successively by a rapidly shiftable focus of attention (FOA). The activity of each V4 neuron is inhibited according to its distance from the current FOA (Figure 1). Recognition rates for stimuli composed of two paper clip objects typically increase twofold compared to previous experiments without attention (Neuron 1999;24(1):87-93). To achieve this improvement, depressing V4 activity outside the focus of attention by as little as 20% is sufficient. With 10% activity modulation, recognition still improves by 70%. The twofold increase in recognition rate is robust over a large range of V4 modulation strengths.
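To make the distance-dependent inhibition concrete, the following sketch builds a gain map that leaves activity at the FOA untouched and depresses it by at most `strength` (for example 0.2 for the 20% case) far from the FOA. The Gaussian falloff, the `sigma` parameter, and the function names `modulation` and `modulate` are assumptions chosen for illustration. The text above states only that inhibition depends on distance from the FOA and does not specify the shape of the function.

```python
import math


def modulation(shape, foa, strength=0.2, sigma=2.0):
    """Return a gain map: 1.0 at the FOA, approaching 1 - strength far away.

    shape    -- (rows, cols) of the V4 layer being modulated
    foa      -- (row, col) of the current focus of attention
    strength -- maximum fractional depression outside the FOA (0.2 = 20%)
    sigma    -- assumed width of the Gaussian falloff, in grid units
    """
    rows, cols = shape
    gain = []
    for i in range(rows):
        row = []
        for j in range(cols):
            d2 = (i - foa[0]) ** 2 + (j - foa[1]) ** 2
            # Gaussian profile (an assumption): no inhibition at distance 0,
            # full inhibition of `strength` at large distances.
            row.append(1.0 - strength * (1.0 - math.exp(-d2 / (2.0 * sigma ** 2))))
        gain.append(row)
    return gain


def modulate(activity, gain):
    """Multiply each unit's activity by the gain at its location."""
    return [[a * g for a, g in zip(a_row, g_row)]
            for a_row, g_row in zip(activity, gain)]


# Uniform toy activity on a 5x5 layer, attended at the center.
gain_map = modulation((5, 5), foa=(2, 2), strength=0.2, sigma=1.0)
v4_activity = [[1.0] * 5 for _ in range(5)]
modulated = modulate(v4_activity, gain_map)
```

With these settings the unit at the FOA keeps its full activity, while units in the corners are scaled to just above 0.8, which corresponds to the 20% depression mentioned above.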

We aim to develop a system that can explore a natural scene and search for specific objects in a cluttered environment. Possible applications for such a system include automated vehicle navigation, surveillance, and video conferencing systems.

One of our next steps towards an integrated neuromorphic vision system is the processing of real-world images. In addition to integrating the saliency-based bottom-up attention system with object recognition, we plan to add components of top-down attentional interaction to the model in order to bias the attention system towards the features relevant for detecting the target object.

To realistically simulate both covert and overt attention shifts, we are already using a pan-tilt video camera with our model to mimic saccadic eye movements. In the current implementation, the camera looks around the room, focusing on the most interesting (i.e. most salient) location in its visual field.


