Caltech
Center for Neuromorphic Systems Engineering


Gesture Recognition
George Panotopoulos, Dinkar Gupta, Demetri Psaltis, Pietro Perona

Abstract. Although a personal computer today has orders of magnitude more processing capacity than it did some ten years ago, you still interface with it by the same means, namely a keyboard and a pointing device. In this project we investigate the design of an interface based on human gestures. The system we envision is not limited to a particular user and should be able to learn new gestures.

Motivation and Aims. Improvements in computers' memory and processing capacity make it possible to implement new interfaces that let us interact with computers in a more user-friendly, intuitive way. We want to implement a system that allows users to input information by performing gestures in front of a camera, an accessory that is becoming increasingly popular on personal computers. This system should have two important characteristics. First, it should be user-independent, meaning that it should recognize the same gesture as such even when it is performed by different users. Second, it should be expandable, meaning that it should be able to learn any new gesture presented to it.

Research.
A gesture captured by a camera is encoded as a sequence of frames, each containing an array of pixels, so we can treat the gesture as a sequence of arrays. By stacking these arrays we create a 3D space whose third dimension is essentially time. The motion characteristics of a gesture are thus incorporated in this third dimension, and we can apply pattern classification techniques to the 3D space. To achieve user-independence we should extract features that are common to all users and remain fairly constant over repetitions. To achieve expandability we should implement a system that selects such features automatically.

In our approach each gesture is encoded as a sequence of 30 frames. A simple segmentation algorithm is applied to each frame so that the data is binary, with 1 signaling the presence of the hand and 0 its absence. The frames are then stacked, producing a binary 3D space.
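The encoding described above can be illustrated with a minimal sketch. It assumes grayscale frames given as nested lists and a fixed intensity threshold as the segmentation rule; the segmentation algorithm actually used in the project is not specified here, and the names `segment_frame`, `stack_frames` and `hand_voxels` are hypothetical.

```python
# Sketch only: the fixed threshold is an assumed stand-in for the project's
# "simple segmentation algorithm", which is not described in detail.

def segment_frame(frame, threshold=128):
    """Return a binary mask: 1 where the pixel is at least `threshold`, else 0."""
    return [[1 if pixel >= threshold else 0 for pixel in row] for row in frame]

def stack_frames(frames, threshold=128):
    """Stack the segmented frames into a volume indexed as volume[t][y][x]."""
    return [segment_frame(frame, threshold) for frame in frames]

def hand_voxels(volume):
    """List the (x, y, t) coordinates of every voxel marked as hand."""
    return [(x, y, t)
            for t, mask in enumerate(volume)
            for y, row in enumerate(mask)
            for x, value in enumerate(row)
            if value == 1]

# Toy gesture: 3 frames of 2x3 pixels with a bright blob moving to the right.
# A real gesture in the text would have 30 frames.
frames = [
    [[200, 10, 10], [200, 10, 10]],
    [[10, 200, 10], [10, 200, 10]],
    [[10, 10, 200], [10, 10, 200]],
]
volume = stack_frames(frames)
print(hand_voxels(volume))
```

In the resulting volume the horizontal motion of the blob shows up as a slanted structure along the time axis, which is the kind of spatio-temporal shape that 3D feature operators can respond to.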



For feature selection we used the Förstner corner and circle operators, appropriately modified to operate in 3D space. Applying these operators to the training part of our database produces the data used by our classification algorithm. Because the operators are very general, they return a very large number of hits. To make sure that these hits do not correspond to the same feature or to noise, we employ a clustering algorithm and accept as features only those occurrences that cluster well over our training database.
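The filtering step can be sketched as follows. The clustering algorithm used in the project is not named, so this example substitutes a simple greedy, radius-based clustering and treats a cluster as a feature when it contains hits from a large enough fraction of the training samples. The function names and the parameters `radius` and `min_fraction` are assumptions made for illustration.

```python
import math

def centroid(points):
    """Mean position of a list of (x, y, t) points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def cluster_hits(hits, radius):
    """Greedy clustering (an assumed stand-in for the project's algorithm).

    `hits` is a list of (sample_id, (x, y, t)) operator responses. A hit joins
    the first cluster whose centroid lies within `radius`; otherwise it starts
    a new cluster.
    """
    clusters = []
    for sample_id, point in hits:
        for cluster in clusters:
            if math.dist(point, centroid(cluster["points"])) <= radius:
                cluster["points"].append(point)
                cluster["samples"].add(sample_id)
                break
        else:
            clusters.append({"points": [point], "samples": {sample_id}})
    return clusters

def select_features(hits, radius, num_samples, min_fraction):
    """Keep centroids of clusters hit in at least `min_fraction` of the samples."""
    return [centroid(c["points"])
            for c in cluster_hits(hits, radius)
            if len(c["samples"]) / num_samples >= min_fraction]

# Toy data: three training samples agree on one location; two hits are isolated.
hits = [
    (0, (10, 10, 5)), (1, (11, 10, 5)), (2, (10, 11, 6)),
    (0, (40, 40, 20)),
    (1, (2, 30, 12)),
]
features = select_features(hits, radius=3.0, num_samples=3, min_fraction=2 / 3)
print(features)
```

Only the location on which all three samples agree survives; the two isolated hits are discarded as likely noise.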

Once the features have been extracted we proceed to classification. We propose two different approaches for this step. The first is similar to the Constellation Model (see Learning Object Class Models). The second is a Divide and Conquer implementation of Neural Networks (NN).

For the Constellation Model approach we assume that the location of each feature follows a 3D Gaussian distribution. During training we estimate the probability of detection of each feature, as well as the mean and standard deviation of its distribution. During classification we extract the possible constellations for a given gesture and estimate which one has the highest probability. In our particular case the number of features makes an exhaustive search over combinations too time-consuming; therefore each feature is assigned only to the distribution it most likely originates from, and then the overall probabilistic score is computed.
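A minimal sketch of this simplified scoring is given below. It assumes axis-aligned Gaussians (independent x, y and t), a log-likelihood score, and a penalty of log(1 - p) for a feature that is not detected; these choices, the dictionary layout and the function names are illustrative assumptions, not the project's exact formulation.

```python
import math

def log_gaussian(point, mean, std):
    """Log density of an axis-aligned 3D Gaussian (assumes independent axes)."""
    return sum(-0.5 * ((p - m) / s) ** 2 - math.log(s * math.sqrt(2 * math.pi))
               for p, m, s in zip(point, mean, std))

def score(detections, model):
    """Greedy score: each detection goes to its most likely model feature.

    `model` is a list of dicts with keys "mean", "std" and "p_detect",
    where 0 < p_detect < 1.
    """
    best = {}  # feature index -> best log density among its assigned detections
    for point in detections:
        dens = [log_gaussian(point, f["mean"], f["std"]) for f in model]
        k = max(range(len(model)), key=dens.__getitem__)
        best[k] = max(best.get(k, -math.inf), dens[k])
    total = 0.0
    for k, f in enumerate(model):
        if k in best:
            total += math.log(f["p_detect"]) + best[k]
        else:
            total += math.log(1.0 - f["p_detect"])  # feature was missed
    return total

def classify(detections, models):
    """Return the name of the gesture model with the highest score."""
    return max(models, key=lambda name: score(detections, models[name]))

# Two toy gesture models with two features each (all values are made up).
models = {
    "wave": [
        {"mean": (10, 10, 5), "std": (2, 2, 2), "p_detect": 0.9},
        {"mean": (30, 10, 20), "std": (2, 2, 2), "p_detect": 0.9},
    ],
    "circle": [
        {"mean": (20, 25, 10), "std": (2, 2, 2), "p_detect": 0.9},
        {"mean": (20, 5, 25), "std": (2, 2, 2), "p_detect": 0.9},
    ],
}
detections = [(11, 9, 5), (29, 11, 19)]
print(classify(detections, models))
```

Because each detection is assigned independently, the cost grows with the number of detections times the number of model features, rather than with the number of possible constellations.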

For the Divide and Conquer approach we use Neural Networks to perform the classification. Since the dimensionality of the input space is large and we do not want to limit the number of possible networks, we would need a fairly sizeable NN to perform the task with acceptable performance. Our idea is to break the general question "which gesture is this?" down into several simpler questions. Each simple question can be answered by a simple NN with reasonable performance; once it is answered we proceed to the next, more specific question, which depends on the previous answer. Note that each answer is probabilistic: the NN indicates how probable each possible answer is. This procedure can be visualized as a tree. At each node of the tree we ask a question, and depending on the answer we proceed to one of that node's children. Each question is simple enough to be answered by our elementary NN with an acceptably low probability of error. When we reach the leaves of the tree we ask the simplest possible question, namely "is this gesture X?". If the answer confirms our hypothesis, it determines the output of the system. If not, we go back to the parent node and follow the next most probable path. An added advantage of this approach is that it maps easily onto reconfigurable processors, such as the OPGA.
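The tree traversal with backtracking can be sketched as below. The small neural networks are replaced by hypothetical stand-in functions that return fixed probabilities, and the leaf question "is this gesture X?" is modeled by a `verify` callback; the `Node` class and all numbers are assumptions made for illustration only.

```python
class Node:
    """Tree node. Internal nodes hold a classifier, leaves hold a gesture label."""
    def __init__(self, classifier=None, children=None, label=None):
        self.classifier = classifier  # callable: sample -> one probability per child
        self.children = children or []
        self.label = label

def search(node, sample, verify):
    """Follow the most probable child first; backtrack when a leaf is rejected.

    Returns the label of the first leaf confirmed by `verify`, or None.
    """
    if not node.children:
        return node.label if verify(node.label, sample) else None
    probs = node.classifier(sample)
    order = sorted(range(len(node.children)), key=lambda i: probs[i], reverse=True)
    for i in order:
        result = search(node.children[i], sample, verify)
        if result is not None:
            return result
    return None  # nothing confirmed below this node: return to the parent

# Stand-ins for trained networks (fixed outputs, purely illustrative).
left = Node(classifier=lambda s: [0.7, 0.3],
            children=[Node(label="wave"), Node(label="point")])
right = Node(classifier=lambda s: [0.2, 0.8],
             children=[Node(label="circle"), Node(label="fist")])
root = Node(classifier=lambda s: [0.6, 0.4], children=[left, right])

visited = []

def verify(label, sample):
    """Stand-in for the leaf network answering "is this gesture X?"."""
    visited.append(label)
    return sample["leaf_scores"][label] >= 0.5

sample = {"leaf_scores": {"wave": 0.1, "point": 0.2, "circle": 0.3, "fist": 0.9}}
result = search(root, sample, verify)
print(result, visited)
```

In this toy run the root prefers the wrong subtree, both of its leaves reject the hypothesis, and the search returns to the root and follows the next most probable path before reaching a confirmed leaf.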

Achievements.
We have fully implemented the Constellation Model approach and tested it on a gesture database composed of 4 subjects performing 2 gestures each. The resulting rate of correct classification is 60%, a figure limited mainly by the simplicity of the features we have extracted. The Divide and Conquer NN was tested on digit classification and was found to outperform comparable "vanilla flavor" Neural Networks. An analytical model of the probabilistic behavior of the classification system was derived and found to be in good agreement with simulation results.

Future Research. Now that we have a complete gesture classification system, we intend to identify its weakest elements and improve them in order to raise the overall performance of the system. Our first improvement will be to select more complex features, which will reduce the number of hits per sample. Once this is done we want to compare the merits of the two classification approaches.

Publications/References
Computer Gesture Recognition: Using the Constellation Method. Dinkar Gupta, Caltech Undergraduate Research Journal, Vol 1, April 2001.

