Models of visual object categorization in humans
Robert J. Peters, Fabrizio Gabbiani, Christof Koch
Abstract.
Previous studies of exemplar, prototype, and decision-bound models of
visual object categorization have not resolved the importance of memory
capacity and flexibility of decision surfaces in human categorization
behavior. We have compared these previous models with our new roaming
exemplar model (RXM), according to their abilities to match human observers'
categorizations of various 2-D image contours. Unlike past comparisons
among categorization models, we explicitly accounted for memory capacity
by penalizing models for their number of free parameters with the Akaike
information criterion. This revealed that a successful model of human
categorization--such as the RXM--did not require a large memory capacity
if the orientation of its decision boundary was unconstrained, suggesting
that an efficient computer implementation of object categorization could
also rely on limited memory storage.
Motivation.
Object categorization is one of the primary tasks of the human visual
system. Successful categorization of visual stimuli is a result of sensory
processing and prior visual experience that is used for conscious cognition.
Psychological models of categorization typically make categorization
decisions using a mechanism based on a multidimensional representation
of incoming stimuli, plus possible auxiliary representations, such as
memory traces. This process is controlled by a number of free parameters,
which are fitted with the goal of matching human categorization behavior.
However, a simple statistical comparison between models may ignore important
differences in the neurobiological implications of the models. For example,
one highly successful model, the generalized context model (GCM), assumes
that all training images are stored in memory; a literal interpretation
of the GCM might conclude that the neuronal substrate of categorization
also scales linearly with the number of exemplars in a category, or
that categorization in biological systems involves only brute-force
memorization, without any category-level abstraction. To provide a more
detailed look at such issues, we have developed a new roaming exemplar
model (RXM) that draws from neural networks and exemplar-based models
of categorization. In contrast to previous exemplar-based models, the
RXM's memory traces are free parameters, allowing us to control for
memory capacity when judging a model's goodness-of-fit. Thus, using
human categorization performance as the goal, we compare several computational
models of categorization, providing new insights regarding the key qualities
of a successful categorization model.
Research

We used
three types of schematic, line-drawn visual stimuli (see figure above):
Brunswik faces and tropical fish outlines, which have been used previously,
plus a new set of "cartoon face" images. Each type of visual object
was parameterized along four dimensions comprising the stimulus parameter
space. Different groups of objects were assigned to configurations,
which contained equal numbers of training exemplars assigned to each
of two categories, as well as an additional number of testing exemplars.
The training exemplars from the two categories were always chosen so
as to be linearly separable in the objects' parameter space; that is,
the members of the two categories could be separated by some 3-D hyperplane
in the 4-D parameter space. The categorization experiments consisted
of a training phase and a testing phase. In both phases, subjects viewed
a series of objects presented one at a time. Each object was presented
for 2s, followed by 2s of blank screen. During each 4s trial, subjects
pressed one of two buttons indicating to which category the object belonged.
In the training phase, subjects were shown only the training exemplars
from the two categories of objects, and were given feedback on their
responses in the form of a high- or low-pitch tone indicating whether
the response was correct or incorrect, respectively. Subjects performed
training blocks of 100 trials until they scored at least 85% correct
on a training block. Once subjects reached this criterion, they moved
into the testing phase, in which they were shown the previously unseen
testing exemplars, in addition to the training exemplars that they had
viewed during the training phase. Subjects received no feedback on their
responses during the testing phase.
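The linear-separability condition on the training exemplars can be illustrated with a short check. The following is a minimal sketch, not the procedure used in the study: it assumes exemplars are given as 4-tuples of stimulus parameters and uses the classic perceptron rule (a technique chosen here for illustration) to search for a separating hyperplane. The function name and the epoch cap are hypothetical. A return value of None only means that no hyperplane was found within the cap; it is not a proof that none exists.

```python
def find_separating_hyperplane(points_a, points_b, max_epochs=1000):
    """Perceptron search for (w, b) with w.x + b > 0 on A and < 0 on B.

    Returns (w, b) if a full pass makes no mistakes, else None.
    """
    dim = len(points_a[0])
    w = [0.0] * dim
    b = 0.0
    # Label category A as +1 and category B as -1.
    labelled = [(p, 1) for p in points_a] + [(p, -1) for p in points_b]
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in labelled:
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:
                # Misclassified (or on the boundary): nudge the hyperplane.
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:
            return w, b
    return None


# Hypothetical 4-D training exemplars, separable along the first dimension.
category_a = [(0, 0, 0, 0), (0, 1, 0, 1), (0, 0, 1, 1)]
category_b = [(1, 0, 0, 0), (1, 1, 1, 0), (1, 0, 1, 1)]
print(find_separating_hyperplane(category_a, category_b) is not None)
```

For linearly separable data the perceptron rule is guaranteed to converge, so a sufficiently large epoch cap will find a hyperplane whenever one exists.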

We tested
several categorization models (see figure above) by fitting them to
match the human observers' response profiles from the testing phase
of the categorization tasks. Each model receives input in the 4-D stimulus
parameter space, and produces an output that represents a categorization
probability for the input object. The models we tested can be summarized
as follows:
Exemplar
models compute the distance in feature space between a test exemplar
and each of a set of stored exemplars. The test exemplar is classified
into the category for which the sum of these distances is smallest.
Different types of exemplar models have different ways of choosing the
stored exemplars:
All-exemplar
model, in which the set of stored exemplars is identical to the
set of training exemplars; this model has the highest possible memory
demand.
Prototype model, in which the one stored exemplar per category
is the arithmetic mean of the training exemplars from that category;
this model has a low and constant memory demand.
Roaming-exemplar model[n], in which each category has n stored
exemplars, which must lie within the polygon that circumscribes the
training exemplars (dotted lines). The number of stored exemplars, n,
can be chosen to control the memory demand of the model.
Boundary
models learn a linear or quadratic boundary that separates the categories
in feature space, and then classify new objects according to their
distance from this boundary.
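The exemplar-model family described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the fitted models from the study: it uses plain Euclidean distance (the actual distance metric and any scaling parameters are not specified here), applies the summed-distance rule as worded above, and returns a hard category label rather than a categorization probability. The function names and the toy data are hypothetical. The all-exemplar and prototype models differ only in which exemplars are stored.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two points in stimulus parameter space."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def prototype(exemplars):
    """Arithmetic mean of a category's training exemplars."""
    n = len(exemplars)
    return tuple(sum(column) / n for column in zip(*exemplars))


def classify(test, stored):
    """Return the category whose stored exemplars have the smallest
    summed distance to the test exemplar.

    `stored` maps a category label to its list of stored exemplars.
    """
    totals = {
        category: sum(euclidean(test, e) for e in exemplars)
        for category, exemplars in stored.items()
    }
    return min(totals, key=totals.get)


# Hypothetical 4-D training exemplars, two per category.
training = {
    "A": [(0, 0, 0, 0), (0, 2, 0, 0)],
    "B": [(4, 0, 0, 0), (4, 2, 0, 0)],
}

# All-exemplar model: every training exemplar is stored.
all_exemplar_store = training
# Prototype model: one stored exemplar (the mean) per category.
prototype_store = {c: [prototype(e)] for c, e in training.items()}

test_object = (1, 1, 0, 0)
print(classify(test_object, all_exemplar_store))  # prints "A"
print(classify(test_object, prototype_store))     # prints "A"
```

A roaming-exemplar store would have the same shape as `prototype_store`, except that the n stored points per category are fitted rather than computed from the training set.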

We fitted
subjects' categorization probabilities with versions of the roaming-exemplar
model using 1, 2, 3, 6, and 10 stored exemplars, as well as the all-exemplar,
prototype, and linear boundary models, and assessed these fits with
two measures (see figure above):
1. The loglikelihood, which represents the overall fitting error but is not
corrected for the number of free parameters, and
2. The Akaike information criterion (AIC), which includes a penalty
for the number of free parameters, thereby allowing unbiased comparisons
among models with different numbers of free parameters.
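The two measures can be written out concretely. The sketch below assumes a binomial likelihood over per-stimulus response counts, which is one common choice and is not confirmed as the likelihood used in the study, together with the standard definition AIC = 2k - 2 ln L, where k is the number of free parameters. Note that the text treats the loglikelihood as a fitting error for which lower is better, which suggests a negative log-likelihood; the code uses the conventional signed log-likelihood, for which higher is better. Function and argument names are hypothetical.

```python
import math


def log_likelihood(predicted_probs, counts_a, counts_total):
    """Binomial log-likelihood (up to a constant) of observed responses.

    predicted_probs[i]: model probability of a category-A response to stimulus i
    counts_a[i]:        number of observed category-A responses to stimulus i
    counts_total[i]:    number of presentations of stimulus i
    """
    ll = 0.0
    for p, k, n in zip(predicted_probs, counts_a, counts_total):
        # Clip to avoid log(0) when a model predicts 0 or 1 exactly.
        p = min(max(p, 1e-9), 1.0 - 1e-9)
        ll += k * math.log(p) + (n - k) * math.log(1.0 - p)
    return ll


def aic(log_lik, n_free_params):
    """Akaike information criterion; lower values indicate a better model."""
    return 2 * n_free_params - 2 * log_lik


# Made-up numbers: a richer model fits slightly better but pays for it.
small_model = aic(log_lik=-100.0, n_free_params=4)
large_model = aic(log_lik=-95.0, n_free_params=40)
print(small_model, large_model)  # prints "208.0 270.0"
```

In this made-up comparison the larger model has the higher log-likelihood, yet the smaller model has the lower (better) AIC because the penalty term outweighs the gain in fit.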
When the
model fits were assessed with the loglikelihood (above left), we found
that the all-exemplar and boundary models both obtained better (lower)
scores than the prototype model. All of the roaming-exemplar models
obtained better scores than the all-exemplar, boundary, and prototype
models. In addition, there were large improvements in the fit of the
RXM[n] as the number of stored exemplars increased.
In contrast, when the model fits were assessed with the AIC to account
for their number of free parameters (above right), the RXM with one
stored exemplar (RXM[1]) obtained a better (lower) score than all other
models, including all-exemplar models, prototype models, boundary models,
and versions of the RXM with more than one stored exemplar. Moreover,
increasing the number of stored exemplars in the RXM[n] was detrimental
to the AIC goodness of fit, so that the RXM[6] and RXM[10] fit much
worse than any of the other models. A detailed analysis showed that
the RXM[1], despite its low memory capacity, was able to outperform
the other models because it had better flexibility in the shape and
orientation of its decision surfaces.
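A rough parameter count helps explain why adding stored exemplars hurts the AIC score. The sketch below assumes that each stored exemplar contributes one free parameter per stimulus dimension (4) for each of the two categories, and it ignores any other fitted parameters the models may have; the exact parameter counts used in the study are not given here, so the numbers are illustrative only.

```python
N_CATEGORIES = 2   # two categories per configuration
N_DIMENSIONS = 4   # stimuli are parameterized along four dimensions


def rxm_exemplar_parameters(n_per_category):
    """Free parameters contributed by the stored exemplars of RXM[n]
    (assumption: one coordinate per dimension per stored exemplar)."""
    return n_per_category * N_CATEGORIES * N_DIMENSIONS


def aic_penalty(n_free_params):
    """Penalty term of the AIC, i.e. the 2k in AIC = 2k - 2 ln L."""
    return 2 * n_free_params


for n in (1, 2, 3, 6, 10):
    k = rxm_exemplar_parameters(n)
    print(f"RXM[{n}]: {k} exemplar parameters, AIC penalty {aic_penalty(k)}")
```

Under these assumptions, moving from RXM[1] to RXM[10] raises the penalty from 16 to 160, so the log-likelihood would have to improve by more than 72 units just to break even.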

In the
RXM, the parameters which describe the stored exemplars become free
parameters of the model, and can be incorporated into comparisons among
models using statistical measures such as the Akaike information criterion.
This allowed us to address the importance of memory by comparing different
versions of the RXM with different numbers of stored exemplars. With
this framework, we can now provide a better answer as to why models
which are otherwise appealing in their conceptual simplicity, such as
prototype models, are consistently outperformed by all-exemplar models:
all-exemplar models allow better flexibility in matching the shape and
orientation of decision surfaces to those used by human observers (see
figure above). Our results show that the goodness-of-fit of all-exemplar
models can even be improved by allowing "roaming" stored exemplars,
and thus an unconstrained decision boundary, without committing to potentially
unreasonable memory demands or to a lack of category-level abstraction.
This is an important step toward the goal of developing models of object
recognition that can perform as well as human observers, yet also be
implemented in a computationally efficient manner--we have shown that
such an implementation does not need an exorbitant memory capacity as
long as it has sufficient flexibility in its learning algorithm.