Project Proposal

Multi-class Object Categorization in Images

Lu He   Tuobin Wang

Motivation

Nowadays, online images are retrieved according to their keywords. However, current approaches that tag an image with keywords by analyzing nearby text are either cumbersome or inaccurate. For retrieving online images without tags, automatic understanding of an image's content is essential. Image understanding is the process of automatically interpreting the objects in an image to determine what is actually happening in it; this may include identifying what the objects are, their spatial relationships to each other, and so on. Researchers in computer vision and related fields study object categorization, which makes retrieval of web images without tags possible. Our project addresses a variation of this problem.

Objective

Our goal in this project is to automatically recognize object classes in images. In one typical scenario, a user selects a region of interest and our system returns a corresponding object class label for that region, which we call interactive object categorization. In another scenario, the user "touches" an object with a single click and our system associates a category label with it. In addition, our system can automatically list the object classes contained in an image without any user interaction.

Method

Due to large differences in pose, size, illumination, and viewpoint across images, object categorization in images is challenging. To address this problem, we plan to combine appearance-based information with spatial-layout information (also called shape-context information) to generate features.
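As a minimal illustration of combining appearance with spatial layout (the concrete feature design here is a hypothetical sketch, not our final choice), one simple option is to append each pixel's normalized image coordinates to its appearance descriptor:

```python
import numpy as np

def combine_appearance_and_layout(appearance, shape):
    """Append normalized (row, col) coordinates to per-pixel appearance
    features, giving a crude spatial-layout cue.

    appearance: (H*W, D) array of per-pixel appearance descriptors
    shape:      (H, W) of the source image (H, W > 1)
    """
    h, w = shape
    rows, cols = np.mgrid[0:h, 0:w]
    # Normalize coordinates into [0, 1] so layout and appearance are comparable.
    layout = np.column_stack([rows.ravel() / (h - 1), cols.ravel() / (w - 1)])
    return np.hstack([appearance, layout])  # shape (H*W, D + 2)
```

A richer shape-context descriptor would replace the two coordinate columns, but the concatenation pattern stays the same.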

Our project uses both image processing and machine learning techniques. The image processing part applies a bank of filters to generate a set of filter responses for feature selection. These filter responses are then clustered with the K-means algorithm to produce an initial set of object classes (also called a texton dictionary). For the machine learning part, one applicable method is to map the initial texton dictionary into a smaller one by modeling each cluster histogram as a Gaussian distribution and applying Bayesian methodology to maximize the conditional probability of the ground-truth labels given the histograms; this ensures that the mapped texton dictionary is both compact and discriminative, and the method has a clean mathematical justification. An alternative approach is to combine color, texture, shape, and location potentials in a CRF model, where the hidden nodes could be clusters and the local observations could be characteristics of each cluster, such as color. Boosting may be applied when training the CRF to simultaneously select useful features and learn their weights.
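The filter-response and texton-clustering stage can be sketched as follows. This is a simplified illustration, assuming a small hand-picked filter bank (Gaussians, their first derivatives, and Laplacians of Gaussian) and scikit-learn's K-means; the actual filter bank and cluster count would be tuned during the project.

```python
import numpy as np
from scipy import ndimage
from sklearn.cluster import KMeans

def filter_responses(gray):
    """Per-pixel responses to a small filter bank over a grayscale image."""
    responses = []
    for sigma in (1.0, 2.0, 4.0):
        responses.append(ndimage.gaussian_filter(gray, sigma))
        responses.append(ndimage.gaussian_filter(gray, sigma, order=(0, 1)))  # x-derivative
        responses.append(ndimage.gaussian_filter(gray, sigma, order=(1, 0)))  # y-derivative
        responses.append(ndimage.gaussian_laplace(gray, sigma))
    # Stack into a (num_pixels, num_filters) feature matrix.
    return np.stack(responses, axis=-1).reshape(-1, len(responses))

def build_texton_dictionary(images, k=32, seed=0):
    """Cluster pixel-wise filter responses into k textons with K-means."""
    feats = np.vstack([filter_responses(img) for img in images])
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(feats)
    return km  # km.cluster_centers_ is the texton dictionary

def texton_histogram(gray, km):
    """Represent an image (or region) by its normalized texton histogram."""
    labels = km.predict(filter_responses(gray))
    hist = np.bincount(labels, minlength=km.n_clusters).astype(float)
    return hist / hist.sum()
```

The resulting per-region texton histograms are exactly the inputs that the subsequent dictionary-compression step or the CRF potentials would consume.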

Dataset

Our available datasets include the MSRC 21-class database, the 7-class Corel subset, and the 7-class Sowerby database used in [1].

Timeline

Apr. 15 – Apr. 20 Analyze the project and its feasibility.
Apr. 21 – Apr. 29 Read some related work, design the model.
Apr. 30 – May 07 Design modules and coding.
May 08 – May 18 Coding.
May 19 – May 25 Debug and Test.
May 26 – May 30 Prepare presentation and demo.

References

[1]. X. He, R. S. Zemel, and M. Á. Carreira-Perpiñán. Multiscale conditional random fields for image labeling. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, volume 2, pages 695–702, June 2004.

[2]. J. Shotton, J. Winn, C. Rother, and A. Criminisi. TextonBoost for image understanding: multi-class object recognition and segmentation by jointly modeling appearance, shape and context. International Journal of Computer Vision (IJCV), special issue, Springer, 2009.

[3]. S. Savarese, A. Criminisi, and J. Winn. Discriminative object class models of appearance and shape by correlatons. In Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), Jan. 2006.

[4]. J. Winn, A. Criminisi, and T. Minka. Object categorization by learned universal visual dictionary. In Proc. IEEE Intl. Conf. on Computer Vision (ICCV), Jan. 2005.