Detection of Human Motion in a Cluttered Scene
Yang Song, Luis Goncalves, and Pietro Perona
Abstract.
Humans are the most important component of a machine's environment.
We develop an algorithm that automatically generates models of human motion
from unlabeled real image sequences. Experiments show that the resulting
models can successfully detect and label humans in image sequences
with clutter and occlusion.
Motivation.
Humans are the most important component of a machine's environment.
Detecting and interpreting human presence, actions and activities is
one of the most valuable functions of our own visual system. Endowing
machines with the same ability would enable a great number of useful
industrial applications ranging from convenient non-contact user interfaces
for consumer products, to on-board safety systems for automobiles, and
surveillance systems for stores and museums.
A system for interpreting human activity must, first of all, be able
to detect human presence. A second important task is to localize the
visible parts of the body and assign appropriate labels to the corresponding
regions of the image -- for brevity we call this the labeling task.
Given a labeling, the different parts of the body may be tracked in time.
Their trajectories and/or spatiotemporal energy patterns then allow
actions and activities to be classified.
We focus here on detection and labeling. We studied this problem in
the context of a `generalized Johansson problem' in our previous project.
There, the positions and velocities of point features are the input to a
system that decides whether human motion is present. The system also assigns
probabilistic labels (the main parts of the body plus a generic background
label) to the detected features. The method was shown to be fast and
robust both to extraneous clutter and to undetected body parts, and
to work well on a number of grayscale image sequences. Here we address
the problem of unsupervised learning of model structure.
Research. We restrict our attention to triangulated models, since
they both account for much of the correlation between the random variables
that represent the position and motion of each body part and yield
efficient algorithms. Our goal is to learn the best triangulated model,
i.e., the one that reaches maximum likelihood with respect to the training
data. We approach the problem in two settings: when the training features
are labeled, i.e., the parts of the model and the correspondence between
the parts and observed features are known (e.g., from a motion-capture
system), and when the training features are unlabeled, i.e., the training
features include both useful foreground parts and background clutter
and the correspondence between the parts and detected features is unknown
(e.g., when they are acquired with a monocular camera and no human intervention
is practical). The method is based on maximum likelihood. Treating the
labeling of the data as hidden variables, we can apply a variant of the EM
algorithm. A greedy algorithm is developed to search for the optimal
structure of the decomposable model based on the (conditional) differential
entropy of the variables. Our algorithm leads to systems able to learn models
of human motion completely automatically from real image sequences, i.e., from
unlabeled training features with clutter and occlusion. We conducted
experiments both on motion-captured data and on grayscale image sequences.
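
To make the entropy-based greedy search concrete, here is a minimal sketch for the labeled setting. It is an illustration under stated assumptions, not the procedure from the papers listed below: it assumes the part variables are jointly Gaussian (so differential entropies follow from covariance determinants), it starts from the triple of parts with the lowest joint entropy, and it then repeatedly attaches the unused part A to an existing edge (B, C) that minimizes the conditional entropy h(A | B, C) = h(A, B, C) - h(B, C). The function names, the data layout, and the `dim` parameter are all hypothetical.

```python
import itertools
import numpy as np

def gaussian_entropy(cov):
    """Differential entropy (in nats) of a Gaussian with covariance `cov`."""
    cov = np.atleast_2d(cov)
    d = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (d * np.log(2 * np.pi * np.e) + logdet)

def joint_entropy(cov, parts, dim):
    """Entropy of the joint over `parts`; part p occupies columns p*dim..(p+1)*dim-1."""
    idx = np.concatenate([np.arange(p * dim, (p + 1) * dim) for p in parts])
    return gaussian_entropy(cov[np.ix_(idx, idx)])

def greedy_triangulated(data, n_parts, dim):
    """Greedy structure search (assumed Gaussian model, labeled data).

    data: array of shape (n_samples, n_parts * dim), one block of `dim`
          columns per part (e.g. dim=4 for x, y, vx, vy).
    Returns (base_triangle, attachments), where each attachment is
    (new_part, (parent_b, parent_c)).
    """
    cov = np.cov(data, rowvar=False)

    def h(parts):
        return joint_entropy(cov, parts, dim)

    # Assumption: seed with the triangle of smallest joint entropy.
    base = min(itertools.combinations(range(n_parts), 3), key=h)
    used = set(base)
    edges = set(itertools.combinations(sorted(base), 2))
    attachments = []

    while len(used) < n_parts:
        best = None
        for a in range(n_parts):
            if a in used:
                continue
            for (b, c) in edges:
                # Conditional entropy h(a | b, c) under the Gaussian assumption.
                cost = h((a, b, c)) - h((b, c))
                if best is None or cost < best[0]:
                    best = (cost, a, (b, c))
        _, a, (b, c) = best
        attachments.append((a, (b, c)))
        used.add(a)
        # The new triangle contributes two new edges to attach to later.
        edges.add(tuple(sorted((a, b))))
        edges.add(tuple(sorted((a, c))))
    return base, attachments
```

Because every new part is attached to an existing edge, the resulting graph is a chain of triangles that can be eliminated one vertex at a time, which is what keeps inference efficient. In the unlabeled setting the covariance would have to be estimated with the labeling treated as hidden (for example inside an EM loop); this sketch does not cover that case.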
Figure 1 shows sample frames from the body and chair moving
sequences. The dots (black or white) are the features selected
by the Lucas-Tomasi-Kanade algorithm on two frames. The white dots are the
most human-like configuration found by the automatically learned model.
Figure 2 shows the automatically learned model. Figure 2
(a) gives the mean positions and mean velocities (shown as arrows) of
the composed parts selected by the algorithm. Figure 2 (b) shows
the learned decomposable triangulated probabilistic structure. The results
show that the automatically learned model can successfully detect and
label humans, performing even better than the hand-crafted model.



Publications/References.
Learning Probabilistic Structure for Human Motion Detection. Y Song,
L Goncalves, P Perona. In: Proc. of IEEE CVPR'01, to appear, December
2001
Unsupervised Learning of Human Motion Models. Y Song, L Goncalves, P Perona.
In: NIPS 14, to appear, December 2001
Monocular Perception of Biological Motion - Detection and Labeling.
Y Song, L Goncalves, E Di Bernardo, P Perona. In: Proc. of ICCV'99,
pp 805-812, Sept 1999, Corfu, Greece.
Monocular Perception of Biological Motion - Clutter and Partial Occlusion.
Y Song, L Goncalves, P Perona. In: Proc. ECCV'00, Vol 2, pp 719-733,
June/July 2000.
Towards Detection of Human Motion. Y Song, X Feng, P Perona.
In: Proc. IEEE CVPR'00, Vol 1, pp 810-817, June 2000.