|
[back]
The
Bin Model for Generalization
Alexander Nicholson,
Xubo Song, Yaser Abu-Mostafa
Abstract.
The problem of overfitting the data is attacked by using the Bin Model
analysis. This provides a method of bounding generalization error without
sacrificing valuable training data.
Motivation
& Aims. In the standard learning scenario, we are given a finite
set of examples with targets that were generated according to some rule,
and the task is to extract the rule from the examples. It is often the
case that we can fit the data well, but have failed to extract the rule
so that outputs given on new data are incorrect. This is the problem
of overfitting the data.
We would
like to give a measure of how well we expect our hypothesis to perform
on new data, that is, what sort of generalization we can achieve. This
is most commonly done by reserving a portion of the data set for directly
estimating the generalization error. When only a small number of examples
are available, this can be undesirable. The Bin Model analysis attempts
to provide a method of bounding generalization error without sacrificing
valuable training data.
Research & Achievements. The Bin Model - We consider binary classification
problems - the target function f(x) maps input vectors x to {0,1}. We
represent each candidate function g(x) by a bin. Each bin is characterized
by a parameter , which
is the probability that g(x) disagrees with f(x) on a point randomly
chosen from the input space.
A bin can be visualized as containing red and green marbles. The probability
of picking a red marble is .
When we draw i.i.d. examples from the input space and test them on the
functions f and g, we are virtually drawing marbles from a bin and checking
their colors. We can consider each sample as a Bernoulli trial with
probability of failure
(red marble) and probability (1- )
of success (green marble).
When we look at a class of hypothesis functions - our learning model,
we have a set of 's
which we can consider as being drawn from a distribution, p( ).
The -distribution
indicates the suitability of the learning model for approximating the
target function.
It
is worth noticing that the -distribution
embodies all of the information needed to analyze the learning problem.
We do not need the information about the specific forms that the candidate
functions assume. Nor do we need to know anything about the inputs and
input distribution. All that we do is to draw marbles (equivalent to i.i.d.
inputs) from the bins characterized by the -distribution. This makes it
possible to model learning machines in a very simple way and without loss
of generality.
Because of its simplicity, the bin model is useful for giving us ideas
about generalization. The -distribution
summarizes the generalization behavior entirely, so, although it is dependent
on the learning model and target function, we can expect functions with
similar -distributions
to have similar generalization performance, regardless of other model
complexity measures. This result, although contrary to common conceptions
about complexity and overfitting, has been experimentally verified, and
is illustrated by the following figure.
Currently under investigation are ways of estimating the distribution
in practical learning situations, applications of the Bin Model analysis
to continuous and noisy target functions, and the effects of certain learning
algorithms on the generalization results.
Publications/References
Bin Model for Neural Networks. Abu-Mostafa, Y. and Song, X. In:
Proceedings of the International Conference on Neural Information Processing
(ICONIP'96), Hong Kong, 1996, pp. 169-173.
top
|