
Caltech
Center for Neuromorphic Systems Engineering


Rapid Natural Scene Categorization without Attention
Fei Fei Li, Rufin VanRullen, Christof Koch, Pietro Perona

Visual attention plays an important role as we move through the world and recognize different objects. So what happens when attention is taken away? Are we still able to recognize scenes or objects? Our study finds that certain high-level tasks, such as natural scene categorization, can still be performed with little or no attention.

Abstract.
What can we see when we do not pay attention? While attention is not necessary for some detection tasks on simple synthetic stimuli, without it we are "blind" even to major aspects of a natural complex scene. It would thus appear that only visual tasks that have an explanation in the early stages of the visual system may be carried out without attention. We report on a complex visual task that requires no attention. Our subjects can rapidly detect animals in briefly presented natural scenes while simultaneously performing another visual task that demands full attention. By comparison, they are unable to discriminate large 'T's from 'L's in the same conditions. We conclude that attention may not be necessary for some visual tasks that are associated with 'high level' cortical areas.

Motivation.
Psychologists have long known that certain visual search tasks do not require attention. A hallmark of inattentive vision is that it operates in parallel: an inattentive task may be carried out simultaneously with other visual tasks, and target detection does not become more difficult as the number of distractors increases. For example, the following three panels show three inattentive tasks taken from the 1998 study by Braun and colleagues.


However, none of the known inattentive tasks approaches the sophistication of everyday vision, where complex scenes must be scrutinized in order to assess high-level properties such as the presence of danger or the structure of a social interaction. On the other hand, previous studies have suggested that our visual system processes complex visual scenes remarkably quickly (Thorpe et al., 1996). In general, humans can reliably distinguish a natural scene containing one or more target objects (such as animals) from a scene that contains no target object in as little as 150 msec. These two lines of research motivate us to ask whether attention plays a critical role in such rapid natural scene categorization.

Main Results.
We conducted a psychophysical experiment on human subjects. The following figure is a schematic illustration of one trial of the main experiment, along with samples of target and distractor images. An attentionally demanding letter discrimination task is presented at the center of the visual field. After the central stimulus onset asynchrony (SOA), the central stimulus (a combination of Ts and Ls) is replaced by a perceptual mask (five Fs). Subjects are instructed to report whether all five letters are the same or one of them is different. In the peripheral natural scene categorization task, an image is presented peripherally for a very brief time (27 msec). Subjects make a speeded response to the presence of animals. In the dual-task condition, subjects are required to perform both tasks concurrently.

As illustrated by the following figure, our results show that there is little or no attentional cost to the natural scene categorization task when attention is withdrawn by the central letter discrimination task. Here we show the normalized average performance of all subjects, each represented by one dot of each color. Red dots indicate performance on trained images. Blue dots indicate performance on completely novel images. The horizontal axis represents performance on the central (attentionally demanding) task. The vertical axis represents performance on the peripheral task (natural scene categorization). For each subject, single-task performance on the peripheral task and on the central task is independently normalized to 100%. Our goal is to compare the dual-task performances, represented by the red and blue dots. Clustering of the dots near the (100,100) corner indicates that subjects can perform the central letter task and the peripheral categorization task simultaneously.
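The normalization described above can be sketched in a few lines. This page does not give the exact formula, so the linear rescaling below is an assumption: it keeps chance level (taken here to be 50% for a two-alternative task) fixed and maps each subject's single-task accuracy to 100%. The function name and the accuracy values are hypothetical and serve only as illustration.

```python
def normalize_dual_task(dual: float, single: float, chance: float = 50.0) -> float:
    """Rescale a dual-task accuracy (in percent) against single-task accuracy.

    Assumed convention (not stated on this page): chance level stays where
    it is and the subject's single-task accuracy maps to 100.
    """
    if single <= chance:
        raise ValueError("single-task accuracy must exceed chance level")
    return chance + (100.0 - chance) * (dual - chance) / (single - chance)


if __name__ == "__main__":
    # Hypothetical accuracies for one subject, in percent correct.
    central = normalize_dual_task(dual=77.0, single=80.0)
    peripheral = normalize_dual_task(dual=74.0, single=76.0)
    # A point close to (100, 100) means little or no dual-task cost.
    print(f"central: {central:.1f}, peripheral: {peripheral:.1f}")
```

Under this convention, a subject whose dual-task accuracy equals the single-task accuracy on both tasks lands exactly at (100, 100), and a subject who falls to chance on one task lands at 50 on that axis.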

Conclusion.
We reported a study in which withdrawing attention entails little or no cost to the performance of a complex visual task, namely natural scene categorization. We also conducted a series of control experiments that compared this result with performance on seemingly simpler visual tasks that use synthetic stimuli. Contrary to our intuition, natural scene categorization appears to entail the least attentional demand. These results provide strong motivation for future studies of attentional and recognition models of the human visual system.

