Most models of object recognition assume that objects appear in isolation in the
field of view. However, in our everyday experience we are usually confronted
with scenes that are cluttered with a variety of objects – some
relevant for our actions, some not. The brain copes with this
overwhelming flood of visual information by serializing the processing
of objects through mechanisms of visual attention. Attentional selection
of objects is often modeled using all-or-nothing switching of neuronal
connection pathways from the attended region of the retinal input to
the recognition units. However, there is little physiological evidence
for such all-or-none modulation in early areas. We have developed a
combined model for spatial attention and object recognition in which
the recognition system monitors the entire visual field, but attentional
modulation by as little as 20% at a high level is sufficient to recognize
multiple objects.
A first important step is the approximate extraction of the extent of
the attended object. We have extended our model of bottom-up saliency-based
attention to this end. Once we have determined the most salient location
in the input image, we trace the source of its saliency back, first to the
conspicuity map and then to the feature map that contributes most to the
saliency of the attended location. Since the feature map
is much sparser than the saliency map, we can segment the shape of the
attended object in this feature map, thereby obtaining a mask that is
used for object-based inhibition of return as well as for modulating
the activity of cell populations in the object recognition system.
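
The trace-back and segmentation steps described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions that are not stated in the text: all maps share one resolution, the saliency map is the plain average of the conspicuity maps, and the object is segmented by a 4-connected flood fill with a relative threshold. The names `trace_back`, `segment_region`, `attend` and `rel_threshold` are hypothetical.

```python
from collections import deque

import numpy as np


def trace_back(conspicuity_maps, feature_maps, loc):
    """Find the conspicuity map, and within it the feature map, that is strongest at loc."""
    winner_c = max(conspicuity_maps, key=lambda name: conspicuity_maps[name][loc])
    candidates = feature_maps[winner_c]
    winner_f = max(range(len(candidates)), key=lambda i: candidates[i][loc])
    return winner_c, winner_f


def segment_region(fmap, seed, rel_threshold=0.1):
    """Flood-fill the 4-connected region around seed (assumed segmentation rule).

    A pixel joins the region if its value is at least rel_threshold times
    the value at the seed.
    """
    height, width = fmap.shape
    mask = np.zeros(fmap.shape, dtype=bool)
    if fmap[seed] <= 0:
        return mask  # nothing to segment at this location
    threshold = rel_threshold * fmap[seed]
    mask[seed] = True
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
            inside = 0 <= ny < height and 0 <= nx < width
            if inside and not mask[ny, nx] and fmap[ny, nx] >= threshold:
                mask[ny, nx] = True
                queue.append((ny, nx))
    return mask


def attend(conspicuity_maps, feature_maps):
    """Return the most salient location, the winning maps, and the object mask."""
    # Assumption: saliency is the unweighted mean of the conspicuity maps.
    saliency = sum(conspicuity_maps.values()) / len(conspicuity_maps)
    flat_index = np.argmax(saliency)
    loc = tuple(int(i) for i in np.unravel_index(flat_index, saliency.shape))
    winner_c, winner_f = trace_back(conspicuity_maps, feature_maps, loc)
    mask = segment_region(feature_maps[winner_c][winner_f], loc)
    return loc, (winner_c, winner_f), mask
```

Here `conspicuity_maps` is a dictionary from a channel name to a 2D array, and `feature_maps` maps the same names to lists of 2D arrays. Segmenting in the winning feature map rather than in the saliency map follows the sparseness argument made in the text.
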
We modulate the activity of neuron populations at the S2 level
of processing in our hierarchical model for object recognition. The
rationale for choosing S2 is twofold: it is motivated both biologically and
computationally. The S2 layer corresponds in its function approximately to
area V4 in the primate visual cortex. There have been a number of reports
from electrophysiology [1-5] and psychophysics [6, 7] that show attentional
modulation of V4 activity. Hence, the S2 level is a natural choice for
modulating recognition. From a computational point of view, it is efficient
to apply the modulation at a level as high up in the hierarchy as possible
that still has some spatial resolution, i.e. S2. This way, the computation
that is required to obtain the activations of the S2 units from the
input image needs to be done only once for each image. When the system
attends to the next location in the image, only the computation upwards
from S2 needs to be repeated.
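
The computational argument can be illustrated with a small Python sketch: the S2 activations are computed once, and each shift of attention only applies a new gain mask and repeats the pooling above S2. The exact form of the modulation is not given in the text, so the sketch assumes that activity outside the attended mask is scaled by `1 - modulation`. A global maximum over positions stands in for the stages above S2. The function names are hypothetical.

```python
import numpy as np


def modulate_s2(s2, mask, modulation=0.2):
    """Scale S2 activity outside the attended region by (1 - modulation).

    s2 has shape (features, height, width); mask is a boolean (height, width)
    array. The multiplicative form is an assumption made for this sketch.
    """
    gain = np.where(mask, 1.0, 1.0 - modulation)
    return s2 * gain  # the gain broadcasts over the feature axis


def pool_above_s2(s2):
    """Stand-in for the stages above S2: max over all positions per feature."""
    return s2.reshape(s2.shape[0], -1).max(axis=1)


def recognize_sequence(s2, masks, modulation=0.2):
    """Reuse one cached S2 array for every attended location."""
    return [pool_above_s2(modulate_s2(s2, mask, modulation)) for mask in masks]
```

With two objects whose S2 responses are 0.9 and 1.0 for the same feature, a 20% modulation is enough for the attended object to win the maximum in either case, because the unattended response is scaled to 0.8 or 0.72 respectively.
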
Activation outside the focus of attention (FOA) is entirely suppressed. As little as 20%
attentional modulation is sufficient to boost the recognition performance
significantly.
Using the model described above, we were able to process images containing
multiple paperclip stimuli. Remarkably, as little as 20% modulation of the
activity of neuron populations at the S2 level of HMAX [10] was sufficient
to recognize both paperclips in almost all of the images containing two
paperclips.
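
Processing two objects one after the other relies on object-based inhibition of return. A minimal Python sketch of such a serial scan follows, assuming that inhibition simply zeroes the saliency over the whole object mask. The function `scan_objects` and its `segment` callback (which returns a boolean mask for a given location) are hypothetical.

```python
import numpy as np


def scan_objects(saliency, segment, n_fixations=2):
    """Attend to the most salient objects in turn, inhibiting each one afterwards."""
    saliency = np.array(saliency, dtype=float)  # work on a copy
    fixations = []
    for _ in range(n_fixations):
        flat_index = np.argmax(saliency)
        loc = tuple(int(i) for i in np.unravel_index(flat_index, saliency.shape))
        mask = segment(loc)
        fixations.append((loc, mask))
        # Object-based inhibition of return: suppress the whole object,
        # and the attended location itself in case the mask is empty.
        saliency[mask] = 0.0
        saliency[loc] = 0.0
    return fixations
```

Each returned mask could then be used to modulate recognition for that fixation.
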
References
1. Reynolds, J.H., T. Pasternak, and R. Desimone, Attention increases
sensitivity of V4 neurons. Neuron, 2000. 26(3): p. 703-714.
2. Treue, S., Neural correlates of attention in primate visual cortex.
Trends in Neurosciences, 2001. 24(5): p. 295-300.
3. Connor, C.E., D.C. Preddie, J.L. Gallant, and D.C. Van Essen, Spatial
attention effects in macaque area V4. Journal of Neuroscience, 1997.
17(9): p. 3201-3214.
4. Motter, B.C., Neural correlates of attentive selection for color
or luminance in extrastriate area V4. Journal of Neuroscience, 1994.
14(4): p. 2178-2189.
5. Luck, S.J., L. Chelazzi, S.A. Hillyard, and R. Desimone, Neural mechanisms
of spatial selective attention in areas V1, V2, and V4 of macaque visual
cortex. Journal of Neurophysiology, 1997. 77(1): p. 24-42.
6. Intriligator, J. and P. Cavanagh, The spatial resolution of visual
attention. Cognitive Psychology, 2001. 43(3): p. 171-216.
7. Braun, J., Visual search among items of different salience: removal
of visual attention mimics a lesion in extrastriate area V4. Journal
of Neuroscience, 1994. 14(2): p. 554-567.
8. Itti, L., C. Koch, and E. Niebur, A model of saliency-based visual
attention for rapid scene analysis. IEEE Transactions on Pattern Analysis
and Machine Intelligence, 1998. 20(11): p. 1254-1259.
9. Itti, L. and C. Koch, Computational modelling of visual attention.
Nature Reviews Neuroscience, 2001. 2(3): p. 194-203.
10. Riesenhuber, M. and T. Poggio, Hierarchical models of object recognition
in cortex. Nature Neuroscience, 1999. 2(11): p. 1019-1025.