Caltech
Center for Neuromorphic Systems Engineering

Human Action Classification
Xiaolin Feng, Pietro Perona

Motivation and Aims. The human body is a high-dimensional articulated structure. Its motion is complicated but interesting to study. In this project, we aim to classify human actions and to identify the postures that each action consists of.

Learning the Action. In our view, an action is a temporal sequence of human body configurations. A body configuration is represented by 10 rectangular body parts together with their motions. To segment the body parts in the training data, we made a specially colored costume, shown below (the rectangles in the right figure are the estimated shapes of the body parts):



Different actions can share very similar configurations at certain times. We define movelets as the basic configurations from which all possible actions are composed. We learn the movelets by stacking the configurations of all actions of interest together and applying k-means to them in a single pass. Samples of the movelets constructed from training actions of human walking and running are shown below:


While the k-means algorithm returns the estimated mean configurations as movelets, it also labels each configuration of each action with one of the movelets. Each training action can therefore be viewed as a temporal transition from movelet to movelet. A hidden Markov model is applied directly to learn the transition matrix, which represents the temporal path of the action through the set of movelets.
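The two learning steps described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the feature vectors are hypothetical 2-D toy values standing in for the real body-part features, the k-means initialization is a deterministic simplification, and the transition matrix is estimated by counting movelet-to-movelet transitions with a small smoothing constant (the page does not say how the matrix is fitted). The names `kmeans` and `estimate_transitions` are made up for this example.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=50):
    """Plain k-means (Lloyd's algorithm) on equal-length feature vectors."""
    # Assumption: deterministic init from evenly spaced samples (a simplification).
    centers = [list(points[i * len(points) // k]) for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: label each configuration with its nearest movelet.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        # Update step: each movelet becomes the mean of its assigned configurations.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster ends up empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels

def estimate_transitions(label_seq, k, smoothing=1e-3):
    """Row-normalized transition counts between consecutive movelet labels."""
    counts = [[smoothing] * k for _ in range(k)]
    for prev, nxt in zip(label_seq, label_seq[1:]):
        counts[prev][nxt] += 1.0
    return [[c / sum(row) for c in row] for row in counts]

# Hypothetical toy data: one feature vector per frame, one list per action.
walking = [[0.0, 0.1], [0.1, 0.0], [1.0, 1.1], [0.0, 0.0]]
running = [[1.1, 1.0], [5.0, 5.1], [5.1, 5.0], [1.0, 1.0]]

# Stack the configurations of all actions and cluster them in one pass.
movelets, labels = kmeans(walking + running, k=3)

# Split the labels back per action: each action is now a movelet sequence.
walking_labels = labels[:len(walking)]
running_labels = labels[len(walking):]

# One transition matrix per action, over the shared set of movelets.
A_walk = estimate_transitions(walking_labels, k=3)
A_run = estimate_transitions(running_labels, k=3)
```

Because the movelets are learned from the stacked data, both actions are described over the same movelet indices, which is what makes their transition matrices comparable at classification time.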

Action Classification. Normally, an action that we watch with our eyes is full of texture, motion, shape, and shading information. An example:




The question we are interested in is the following: can you recognize the action from shape and motion information alone, when all other information is removed? That is, can you still recognize the action from the following sequence:

The answer is simply 'yes' for this example. But how can we make a computer do it? For each configuration in the sequence, we estimate the likelihood that it matches each movelet in our dataset. We then compare the test sequence with all the actions we have learned. Since the transition matrix of an action constrains the possible paths of that action among the movelets, a hidden Markov model can tell us how likely it is that a learned action explains the test sequence, as well as the best possible path. The test sequence is classified as the action with the maximum likelihood.
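The maximum-likelihood decision above can be illustrated with the standard scaled forward algorithm for hidden Markov models. This is a minimal sketch, assuming two movelets, a uniform initial distribution, and made-up transition matrices and per-frame match likelihoods. The function names `log_likelihood` and `classify` are hypothetical, and the sketch covers only the likelihood score, not the recovery of the best path.

```python
import math

def log_likelihood(obs_lik, trans, prior):
    """Scaled forward algorithm.

    obs_lik[t][j]: likelihood that frame t matches movelet j.
    trans[i][j]:   probability of moving from movelet i to movelet j.
    prior[j]:      probability of starting in movelet j.
    Returns the log-likelihood of the whole sequence under this action model.
    """
    n = len(prior)
    alpha = [prior[j] * obs_lik[0][j] for j in range(n)]
    total = 0.0
    for t in range(len(obs_lik)):
        if t > 0:
            # Propagate through the transition matrix, then weight by the frame match.
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * obs_lik[t][j]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # this action cannot explain the sequence
        total += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid numerical underflow
    return total

def classify(obs_lik, models):
    """Pick the action whose model gives the test sequence the highest likelihood."""
    return max(models, key=lambda name: log_likelihood(obs_lik, *models[name]))

# Hypothetical action models: (transition matrix, initial distribution).
models = {
    "walking": ([[0.1, 0.9], [0.9, 0.1]], [0.5, 0.5]),  # alternates between movelets
    "running": ([[0.9, 0.1], [0.1, 0.9]], [0.5, 0.5]),  # tends to stay in one movelet
}

# Hypothetical test sequence whose frames alternate between movelet 0 and movelet 1.
obs_lik = [[0.9, 0.1], [0.1, 0.9], [0.9, 0.1], [0.1, 0.9]]

best_action = classify(obs_lik, models)
```

Working in log space with per-frame rescaling keeps the score stable for long sequences, where the raw product of probabilities would underflow.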

Experiments. We learned 9 periodic human actions from one subject: stepping, walking, and running, each captured by the camera at view angles of 45, 90, and 135 degrees. Test sequences of the same types of action were performed by 5 different subjects. We obtain a correct classification rate above 95% on the actions captured at the 90 degree view angle. For the same type of action taken at the 45 and 135 degree view angles, we found that the two are not separable. If we treat them as the same action, i.e., walking at a 45 degree view angle is the same as walking at 135 degrees but different from walking at 90 degrees, we obtain a 100% classification rate on them. We also learned 8 nonperiodic human reaching actions: reaching in 8 different directions. Training was done on 1 subject, and test sequences were captured from 5 different subjects. We obtain a 100% classification rate over all of these reaching actions.

Reference.
[1] X. Feng and P. Perona, "Human Action Classification", in preparation.

