Caltech
Center for Neuromorphic Systems Engineering

Detection of Human Motion in a Cluttered Scene
Yang Song, Luis Goncalves, and Pietro Perona

Abstract. Humans are the most important component of a machine's environment. We develop an algorithm that can generate models of human motion automatically from unlabeled real image sequences. Experiments show that the resulting models can successfully detect and label humans in image sequences with clutter and occlusion.

Motivation. Humans are the most important component of a machine's environment. Detecting and interpreting human presence, actions and activities is one of the most valuable functions of our own visual system. Endowing machines with the same ability would enable a great number of useful industrial applications ranging from convenient non-contact user interfaces for consumer products, to on-board safety systems for automobiles, and surveillance systems for stores and museums.

A system for interpreting human activity must, first of all, be able to detect human presence. A second important task is to localize the visible parts of the body and assign appropriate labels to the corresponding regions of the image; for brevity we call this the labeling task. Given a labeling, the different parts of the body may be tracked over time. Their trajectories and/or spatiotemporal energy patterns then allow actions and activities to be classified.

We focus here on detection and labeling. This problem was studied in the context of a 'generalized Johansson problem' in our previous project. In that work, the positions and velocities of point features are the input to a system that decides whether human motion is present. The system also assigns probabilistic labels (the main parts of the body plus a generic background label) to the detected features. The method was shown to be fast and robust both to extraneous clutter and to undetected body parts, and it was demonstrated to work well on a number of grayscale image sequences. Here we address the problem of unsupervised learning of the model structure.

Research. We restrict our attention to triangulated models, since they account for much of the correlation between the random variables that represent the position and motion of each body part while still yielding efficient algorithms. Our goal is to learn the best triangulated model, i.e., the one that maximizes the likelihood of the training data. We approach the problem in two settings. In the first, the training features are labeled: the parts of the model and the correspondence between the parts and the observed features are known (e.g., from a motion-capture system). In the second, the training features are unlabeled: they include both useful foreground parts and background clutter, and the correspondence between the parts and the detected features is unknown (e.g., when they are acquired with a monocular camera and no human intervention is practical).

The method is based on maximum likelihood. Taking the labeling of the data as hidden variables, a variant of the EM algorithm can be applied. A greedy algorithm is developed to search for the optimal structure of the decomposable model based on the (conditional) differential entropy of the variables. Our algorithm leads to systems able to learn models of human motion completely automatically from real image sequences, that is, from unlabeled training features with clutter and occlusion.

We conducted experiments both on motion-capture data and on grayscale image sequences. Figure 1 shows sample frames from the body and chair moving sequences. The dots (either in black or in white) are the features selected by the Lucas-Tomasi-Kanade algorithm on two frames. The white dots are the most human-like configuration found by the automatically learned model. Figure 2 shows the automatically learned model. Figure 2 (a) gives the mean positions and mean velocities (shown as arrows) of the composed parts selected by the algorithm. Figure 2 (b) shows the learned decomposable triangulated probabilistic structure.

The results show that the automatically learned model can successfully detect and label humans, and that it performs even better than the hand-crafted model.
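The entropy-based greedy search described above can be illustrated with a minimal sketch. It assumes labeled training data and jointly Gaussian part variables, so that each differential entropy has a closed form in the sample covariance. Under that assumption, maximizing the likelihood of a decomposable triangulated model roughly amounts to minimizing h(B, C) for a base edge plus the sum of h(A | B, C) over the parts that are attached one at a time. The function names, the choice of the lowest-entropy pair as the starting edge, and the data layout (`dim` consecutive columns per part) are assumptions made for illustration and are not taken from the publications listed on this page.

```python
import itertools
import numpy as np


def gaussian_entropy(cov):
    """Differential entropy (in nats) of a Gaussian with covariance `cov`."""
    cov = np.atleast_2d(cov)
    d = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (d * np.log(2.0 * np.pi * np.e) + logdet)


def joint_entropy(cov, parts, dim):
    """Joint entropy of the given parts; part p owns columns p*dim .. p*dim+dim-1."""
    idx = [p * dim + k for p in parts for k in range(dim)]
    return gaussian_entropy(cov[np.ix_(idx, idx)])


def greedy_triangulated_structure(data, dim):
    """Greedily build a triangulated structure from labeled data.

    data: array of shape (n_samples, n_parts * dim).
    Returns (base_edge, order, total_entropy), where `order` lists
    (new_part, (parent_b, parent_c)) in the order the parts were attached.
    """
    n_parts = data.shape[1] // dim
    cov = np.cov(data, rowvar=False, bias=True)  # maximum-likelihood covariance

    # Assumption: start from the pair of parts with the lowest joint entropy.
    base = min(itertools.combinations(range(n_parts), 2),
               key=lambda e: joint_entropy(cov, e, dim))
    edges = [base]
    remaining = set(range(n_parts)) - set(base)
    total = joint_entropy(cov, base, dim)
    order = []

    while remaining:
        best = None
        for a in sorted(remaining):
            for (b, c) in edges:
                # Conditional entropy h(A | B, C) = h(A, B, C) - h(B, C).
                h = (joint_entropy(cov, (a, b, c), dim)
                     - joint_entropy(cov, (b, c), dim))
                if best is None or h < best[0]:
                    best = (h, a, (b, c))
        h, a, (b, c) = best
        order.append((a, (b, c)))
        edges.extend([(a, b), (a, c)])  # the new triangle exposes two new edges
        remaining.remove(a)
        total += h

    return base, order, total
```

With labeled features stacked as rows, a call such as `greedy_triangulated_structure(data, dim=4)` would treat each part as a hypothetical (x, y, vx, vy) vector. The unlabeled setting would additionally need an EM loop over the hidden labeling, which this sketch does not attempt.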











Publications/References.
Learning Probabilistic Structure for Human Motion Detection.
Y Song, L Goncalves, P Perona. In: Proc. of IEEE CVPR'01, to appear, December 2001.

Unsupervised Learning of Human Motion Models.
Y Song, L Goncalves, P Perona. In: NIPS 14, to appear, December 2001.

Monocular Perception of Biological Motion - Detection and Labeling.
Y Song, L Goncalves, E Di Bernardo, P Perona. In: Proc. of ICCV'99, pp 805-812, Sept 1999, Corfu, Greece.

Monocular Perception of Biological Motion - Clutter and Partial Occlusion.
Y Song, L Goncalves, P Perona. In: Proc. of ECCV'00, Vol 2, pp 719-733, June/July 2000.

Towards Detection of Human Motion.
Y Song, X Feng, P Perona. In: Proc. of IEEE CVPR'00, Vol 1, pp 810-817, June 2000.


 
