Objects from different semantic categories often have different motion signatures. Humans, animals, birds, cars, and airplanes, for example, have very distinct patterns of motion. Therefore, along with spatially salient features, temporal patterns of movement can be used as features to determine the category of an unlabeled object.
In this project, we explore the suitability of spatio-temporal features for unsupervised classification of objects from different semantic categories, as observed in videos. We also evaluate the predictive capability of the learned model on unseen examples.
In the three weeks since the project proposal, we have constructed our dataset and obtained an off-the-shelf spatio-temporal feature extractor. We have also obtained implementations of two unsupervised, generative, bag-of-features classifiers, namely Probabilistic Latent Semantic Analysis (pLSA) and Latent Dirichlet Allocation (LDA).
Our dataset consists of videos of three categories of objects in motion: sprinting cheetahs, walking cows, and crawling human babies. Most of these videos were obtained from YouTube and were cut from longer videos by retaining only the sections that show the exemplars in their characteristic motion. If time permits, we will add more semantic categories to the dataset.
In this project, we employ the spatio-temporal feature extraction technique outlined in [1]. In this technique, interest points are detected in videos by applying a Gaussian smoothing filter in the spatial plane and a 1D Gabor filter along the temporal dimension. Regions where the resulting response function takes high values are considered interest points. In particular, visually salient regions exhibiting complex motion patterns evoke strong responses, while non-salient regions undergoing simple translation evoke weak responses.
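The temporal part of the detection step can be illustrated at a single pixel. The following is a minimal sketch, assuming the response is formed from a quadrature pair of 1D Gabor filters (even and odd phase) whose squared outputs are summed, and assuming spatial Gaussian smoothing has already been applied to the input. The function names, the kernel radius, and the default values of tau and omega are illustrative choices, not parameters taken from [1] or from the implementation in [2].

```python
import math


def gabor_pair(tau, omega):
    """Build even and odd 1D Gabor kernels over time (a quadrature pair).

    tau sets the width of the Gaussian envelope (in frames) and omega the
    oscillation frequency (in cycles per frame). Both are assumed values.
    """
    radius = int(math.ceil(2 * tau))  # assumed truncation of the envelope
    ts = range(-radius, radius + 1)
    envelope = [math.exp(-(t * t) / (tau * tau)) for t in ts]
    even = [-math.cos(2 * math.pi * t * omega) * e for t, e in zip(ts, envelope)]
    odd = [-math.sin(2 * math.pi * t * omega) * e for t, e in zip(ts, envelope)]
    return even, odd


def filter_same(signal, kernel):
    """Correlate signal with kernel, zero-padded, keeping the input length."""
    radius = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = i + k - radius
            if 0 <= j < len(signal):
                acc += weight * signal[j]
        out.append(acc)
    return out


def temporal_response(intensity, tau=2.0, omega=0.25):
    """Response at one pixel: sum of squared even and odd filter outputs.

    intensity is the (already spatially smoothed) brightness of the pixel
    in consecutive frames.
    """
    even, odd = gabor_pair(tau, omega)
    r_even = filter_same(intensity, even)
    r_odd = filter_same(intensity, odd)
    return [a * a + b * b for a, b in zip(r_even, r_odd)]
```

In this sketch, a pixel whose brightness oscillates near the filter frequency yields a much larger response than a pixel with constant brightness; interest points would then be taken at local maxima of the response over the whole video.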
Cuboids in X-Y-T space are extracted around these interest points, and each cuboid is converted into a feature descriptor vector using one of several methods. The dimensionality of these descriptors is reduced using principal component analysis (PCA). Please refer to [1] for the details.
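The cuboid and dimensionality-reduction steps can be sketched as follows. This is a minimal illustration, assuming the simplest possible descriptor (the flattened pixel values of the cuboid) and a PCA that keeps only the first principal component, found by power iteration. The function names, the video layout video[t][y][x], and the cuboid half-sizes are hypothetical; the descriptors actually used in [1] and [2] may differ.

```python
import math


def extract_cuboid(video, x, y, t, half_xy=1, half_t=1):
    """Flatten the X-Y-T block centred on (x, y, t) into one vector.

    video is indexed as video[t][y][x]. Returns None when the cuboid
    would extend past the border of the video.
    """
    n_t, n_y, n_x = len(video), len(video[0]), len(video[0][0])
    inside = (half_t <= t < n_t - half_t
              and half_xy <= y < n_y - half_xy
              and half_xy <= x < n_x - half_xy)
    if not inside:
        return None
    return [float(video[tt][yy][xx])
            for tt in range(t - half_t, t + half_t + 1)
            for yy in range(y - half_xy, y + half_xy + 1)
            for xx in range(x - half_xy, x + half_xy + 1)]


def top_principal_component(vectors, iterations=100):
    """Return (mean, unit direction) of the first principal component.

    Uses power iteration on the scatter matrix X^T X of the centred data,
    applied implicitly as X^T (X w).
    """
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[j] for v in vectors) / n for j in range(d)]
    centred = [[v[j] - mean[j] for j in range(d)] for v in vectors]
    w = [1.0 + j for j in range(d)]  # deterministic starting direction
    norm = math.sqrt(sum(c * c for c in w))
    w = [c / norm for c in w]
    for _ in range(iterations):
        proj = [sum(a * b for a, b in zip(row, w)) for row in centred]
        new_w = [sum(p * row[j] for p, row in zip(proj, centred))
                 for j in range(d)]
        norm = math.sqrt(sum(c * c for c in new_w))
        if norm == 0.0:  # data has no variance; keep the current direction
            break
        w = [c / norm for c in new_w]
    return mean, w


def project(vector, mean, component):
    """Coordinate of a descriptor along the principal component."""
    return sum((v - m) * c for v, m, c in zip(vector, mean, component))
```

Keeping more than one component would require deflation or a full eigendecomposition; a single component is used here only to keep the sketch short.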
[2] provides a Matlab implementation of the feature extraction algorithm. With the permission of the original authors, we have obtained the implementation and intend to use it in our project. The feature extraction package depends on a Matlab toolkit, also made available by the authors.
We intend to use a C++ implementation of the pLSA algorithm, written previously by Ashok for his research. The implementation follows the technical specification of the algorithm in [4]. However, its EM formulation is naïve and may be prone to overfitting. [4] suggests tempered annealing as a means of overcoming overfitting, but we do not intend to implement this modification.
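For reference, the EM iteration for pLSA can be sketched as follows. This is a minimal illustration in Python, not the C++ implementation mentioned above, and it assumes the asymmetric parameterization with P(z|d) and P(w|z). The beta parameter shows one common way of writing a tempered E-step (the posterior is raised to the power beta before normalization); with beta = 1.0 it reduces to plain EM. All function and parameter names are hypothetical.

```python
import math
import random


def normalise(values):
    """Scale non-negative values to sum to 1 (uniform if all are zero)."""
    total = sum(values)
    if total == 0.0:
        return [1.0 / len(values)] * len(values)
    return [v / total for v in values]


def plsa(counts, n_topics, n_iter=50, beta=1.0, seed=0):
    """Fit pLSA by EM on a document-by-word count matrix.

    counts[d][w] is how often visual word w occurs in video d.
    Returns (p_z_d, p_w_z): P(z|d) per document and P(w|z) per topic.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    p_z_d = [normalise([rng.random() for _ in range(n_topics)])
             for _ in range(n_docs)]
    p_w_z = [normalise([rng.random() for _ in range(n_words)])
             for _ in range(n_topics)]
    for _ in range(n_iter):
        new_wz = [[0.0] * n_words for _ in range(n_topics)]
        new_zd = [[0.0] * n_topics for _ in range(n_docs)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: posterior over topics for this (d, w) pair,
                # optionally tempered by beta (assumed form).
                post = normalise([(p_z_d[d][z] * p_w_z[z][w]) ** beta
                                  for z in range(n_topics)])
                # Accumulate expected counts for the M-step.
                for z in range(n_topics):
                    new_wz[z][w] += n * post[z]
                    new_zd[d][z] += n * post[z]
        # M-step: renormalise the expected counts.
        p_w_z = [normalise(row) for row in new_wz]
        p_z_d = [normalise(row) for row in new_zd]
    return p_z_d, p_w_z


def log_likelihood(counts, p_z_d, p_w_z):
    """Log-likelihood of the counts under the fitted model."""
    total = 0.0
    for d, row in enumerate(counts):
        for w, n in enumerate(row):
            if n > 0:
                p = sum(p_z_d[d][z] * p_w_z[z][w] for z in range(len(p_w_z)))
                total += n * math.log(p)
    return total
```

A video would then be assigned to the latent topic z with the largest P(z|d). Choosing beta below 1 flattens the posteriors, which is the kind of smoothing tempering is meant to provide against overfitting.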
At the suggestion of the course instructor, we will use the implementation of Latent Dirichlet Allocation [5] provided by the authors in [3] and compare its results with those of the pLSA implementation.