Home

Caltech
Center for Neuromorphic Systems Engineering

Home
Research
News
People

[back]

The Bin Model for Generalization
Alexander Nicholson, Xubo Song, Yaser Abu-Mostafa

Abstract. The problem of overfitting the data is attacked by using the Bin Model analysis. This provides a method of bounding generalization error without sacrificing valuable training data.

Motivation & Aims. In the standard learning scenario, we are given a finite set of examples with targets that were generated according to some rule, and the task is to extract the rule from the examples. It is often the case that we can fit the data well, but have failed to extract the rule so that outputs given on new data are incorrect. This is the problem of overfitting the data.

We would like to give a measure of how well we expect our hypothesis to perform on new data, that is, what sort of generalization we can achieve. This is most commonly done by reserving a portion of the data set for directly estimating the generalization error. When only a small number of examples are available, this can be undesirable. The Bin Model analysis attempts to provide a method of bounding generalization error without sacrificing valuable training data.

Research & Achievements.
The Bin Model - We consider binary classification problems - the target function f(x) maps input vectors x to {0,1}. We represent each candidate function g(x) by a bin. Each bin is characterized by a parameter , which is the probability that g(x) disagrees with f(x) on a point randomly chosen from the input space.

A bin can be visualized as containing red and green marbles. The probability of picking a red marble is . When we draw i.i.d. examples from the input space and test them on the functions f and g, we are virtually drawing marbles from a bin and checking their colors. We can consider each sample as a Bernoulli trial with probability of failure (red marble) and probability (1-) of success (green marble).

When we look at a class of hypothesis functions - our learning model, we have a set of 's which we can consider as being drawn from a distribution, p().

The -distribution indicates the suitability of the learning model for approximating the target function.

It is worth noticing that the -distribution embodies all of the information needed to analyze the learning problem. We do not need the information about the specific forms that the candidate functions assume. Nor do we need to know anything about the inputs and input distribution. All that we do is to draw marbles (equivalent to i.i.d. inputs) from the bins characterized by the -distribution. This makes it possible to model learning machines in a very simple way and without loss of generality.

Because of its simplicity, the bin model is useful for giving us ideas about generalization. The -distribution summarizes the generalization behavior entirely, so, although it is dependent on the learning model and target function, we can expect functions with similar -distributions to have similar generalization performance, regardless of other model complexity measures. This result, although contrary to common conceptions about complexity and overfitting, has been experimentally verified, and is illustrated by the following figure.

Currently under investigation are ways of estimating the distribution in practical learning situations, applications of the Bin Model analysis to continuous and noisy target functions, and the effects of certain learning algorithms on the generalization results.

Publications/References

Bin Model for Neural Networks. Abu-Mostafa, Y. and Song, X. In: Proceedings of the International Conference on Neural Information Processing (ICONIP'96), Hong Kong, 1996, pp. 169-173.

top