Motivation and Aims. The human body is a high-dimensional articulated
structure. Its motion is very complicated but interesting to study. In
this project, we aim to classify human actions and also to identify the
postures that each action consists of.
Learning the Action. In our view, an action is a temporal sequence of
different human body configurations. A body configuration is
represented by 10 rectangular body parts together with their
motions. To segment the body parts in the training data, the subject
wears a specially colored costume, as shown below (the rectangles in
the right figure are the estimated shapes of the body parts):

Different actions can share very similar configurations at certain times.
We define movelets as the basic configurations from which all possible
actions are composed. We learn the movelets by stacking the configurations
of all actions of interest together and applying k-means to them all at
once. Samples of the movelets constructed from training actions of human
walking and running are shown below:
While the k-means algorithm returns the estimated mean configurations as
movelets, it also labels each configuration of each action with one of
the movelets. Therefore, each training action can be viewed as a temporal
transition from movelet to movelet. A hidden Markov model is applied
directly to learn the transition matrix, which represents the temporal
path of the action through the set of movelets.
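The learning step described above can be sketched roughly as follows.
This is a minimal illustration, not the authors' implementation: it
assumes each configuration has already been flattened into a plain list
of numbers, and it estimates each action's transition matrix by simply
counting movelet-to-movelet transitions in the k-means labels, which
stands in for the hidden Markov model training mentioned in the text.
The function names, the smoothing constant eps and the iteration count
are all hypothetical choices.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two configuration vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centers):
    """Index of the center closest to the given point."""
    return min(range(len(centers)), key=lambda j: sq_dist(point, centers[j]))

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        labels = [nearest(p, centers) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    # Final assignment so that labels match the returned centers.
    return centers, [nearest(p, centers) for p in points]

def transition_matrix(label_seq, k, eps=1e-3):
    """Row-normalised transition counts; eps is assumed additive smoothing."""
    counts = [[eps] * k for _ in range(k)]
    for a, b in zip(label_seq, label_seq[1:]):
        counts[a][b] += 1.0
    return [[c / sum(row) for c in row] for row in counts]

def learn_movelets(actions, k):
    """actions: dict mapping action name -> list of per-frame configurations."""
    names = list(actions)
    # Stack the configurations of all actions and cluster them in one pass.
    stacked = [cfg for name in names for cfg in actions[name]]
    movelets, labels = kmeans(stacked, k)
    models, start = {}, 0
    for name in names:
        n = len(actions[name])
        # Each action becomes a sequence of movelet indices.
        models[name] = transition_matrix(labels[start:start + n], k)
        start += n
    return movelets, models
```

Calling learn_movelets with a dictionary of training sequences returns
the shared movelets and one transition matrix per action. Counting
transitions in hard labels is the simplest possible estimate; a full
hidden Markov model fit would also account for uncertainty in the
movelet assignment.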
Action Classification. Normally, an action as seen by the eye is rich
in texture, motion, shape and shading information. An example is:
What we are interested in is the following question: can you recognize
the action from the shape and motion information alone, when the other
information is removed? That is, can you still recognize the action from
the following sequence:

The answer is simply 'yes' for this example. But how can we let the
computer do it? For each configuration in the sequence, we estimate the
likelihood that it matches each movelet in our dataset. We then compare
this test sequence with all the actions we have learned. Since the
transition matrix of an action constrains the possible paths of that
action among the movelets, the hidden Markov model gives both the
likelihood that this action explains the test sequence and the most
likely path through the movelets. The test sequence is assigned to the
action with the maximum likelihood.
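The classification rule above can be illustrated with a short sketch.
It is written under several assumptions that are not in the text: the
per-movelet likelihood is taken to be an unnormalised isotropic Gaussian
around the movelet (with a hypothetical width sigma), the initial state
distribution is uniform, and all function names are invented for this
example. The sequence likelihood uses the standard scaled forward
algorithm and the best path uses the standard Viterbi algorithm.

```python
import math

TINY = 1e-300  # floor that avoids log(0) in the Viterbi recursion

def emission_probs(config, movelets, sigma=1.0):
    """Likelihood of one configuration under each movelet (assumed Gaussian)."""
    return [math.exp(-sum((x - m) ** 2 for x, m in zip(config, mov))
                     / (2.0 * sigma ** 2)) for mov in movelets]

def log_likelihood(emissions, trans):
    """Scaled forward algorithm; assumes a uniform initial distribution."""
    k = len(trans)
    alpha = [e / k for e in emissions[0]]
    loglik = 0.0
    for t in range(len(emissions)):
        if t > 0:
            alpha = [emissions[t][j] * sum(alpha[i] * trans[i][j] for i in range(k))
                     for j in range(k)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        alpha = [a / scale for a in alpha]
        loglik += math.log(scale)
    return loglik

def viterbi(emissions, trans):
    """Most likely movelet path, computed in the log domain."""
    k = len(trans)
    delta = [math.log(max(e, TINY)) - math.log(k) for e in emissions[0]]
    back = []
    for e in emissions[1:]:
        ptr, new = [], []
        for j in range(k):
            scores = [delta[i] + math.log(max(trans[i][j], TINY)) for i in range(k)]
            best = max(range(k), key=lambda i: scores[i])
            ptr.append(best)
            new.append(scores[best] + math.log(max(e[j], TINY)))
        back.append(ptr)
        delta = new
    # Trace the best final state back to the first frame.
    path = [max(range(k), key=lambda j: delta[j])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

def classify(sequence, movelets, models, sigma=1.0):
    """Return the action with the highest log-likelihood, plus all scores."""
    emissions = [emission_probs(c, movelets, sigma) for c in sequence]
    scores = {name: log_likelihood(emissions, T) for name, T in models.items()}
    return max(scores, key=scores.get), scores
```

Here models maps each action name to its transition matrix over the
shared movelets. Because every action is scored against the same
emission likelihoods, only the transition matrices distinguish one
action from another.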
Experiments. We learned 9 periodic human actions from one subject:
stepping, walking and running, each captured by the camera at view
angles of 45, 90 and 135 degrees. The test sequences of the same types
of action were performed by 5 different subjects. We obtain a correct
classification rate above 95% on the actions captured at the 90 degree
view angle. For the same type of action taken at the 45 and 135 degree
view angles, we found that the two are not separable. If we treat them
as the same action, i.e., walking at the 45 degree view angle is the
same as walking at the 135 degree view angle but different from walking
at the 90 degree view angle, we obtain a 100% classification rate on
them. We also learned 8 nonperiodic human reaching actions: reaching
towards 8 different directions. Training was done on 1 subject, and the
test sequences were captured from 5 different subjects. We obtain a 100%
classification rate over all these reaching actions.
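The effect of treating the 45 and 135 degree views as one class can be
made concrete with a small sketch. The label format "action_angle" and
both function names are assumptions introduced for this example only,
and the labels in the usage note are toy values, not experimental data.

```python
def merge_views(label):
    """Map an assumed 'action_angle' label so 45 and 135 degrees share a class."""
    action, angle = label.rsplit("_", 1)
    if angle in ("45", "135"):
        return action + "_45/135"
    return label

def classification_rate(true_labels, predicted_labels, merge=False):
    """Fraction of sequences whose predicted label equals the true label."""
    if merge:
        true_labels = [merge_views(t) for t in true_labels]
        predicted_labels = [merge_views(p) for p in predicted_labels]
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)
```

For instance, a toy prediction of "walking_135" for a true label of
"walking_45" counts as an error when merge is False and as correct when
merge is True.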
Reference. [1] X. Feng and P. Perona, "Human Action Classification",
in preparation.