Project Proposal
Multi-class Object
Categorization in Images
Lu He Tuobin Wang
Motivation
Nowadays, online images are retrieved according to their keywords. However, current approaches to tagging an image with keywords by analyzing nearby text are either labor-intensive or inaccurate. For online image retrieval without tags, automatic understanding of an image's content is essential. Image understanding is the process of automatically interpreting the objects in an image to determine what is actually happening in it; this may include identifying what the objects are, their spatial relationships to each other, and so on. Researchers in computer vision and related fields study object categorization, which makes web image retrieval without tags possible. Our project addresses a variation of this problem.
Objective
Our goal in this project is to automatically recognize object classes in images. In one typical scenario, the user selects a region of interest and our system returns the object class label of that region; we call this interactive object categorization. In another scenario, the user "touches" an object with a single click and our system associates a category label with it. In addition, our system can automatically list the object classes contained in an image without any user interaction.
Method
Object categorization in images is challenging due to large variations in pose, size, illumination, and viewpoint. To address this, we plan to combine appearance-based information with spatial-layout information (also called shape-context information) to generate features.
Our project uses both image processing and machine learning techniques. The image processing part applies a bank of filters to generate a set of filter responses for feature selection. These responses are then clustered by the K-means algorithm to form an initial set of cluster centers (also called a texton dictionary). For the machine learning part, one applicable method is to map the initial texton dictionary to a smaller dictionary by modeling each cluster histogram as a Gaussian distribution and applying a Bayesian methodology that maximizes the conditional probability of the ground-truth labels given the histograms; this ensures that the mapped texton dictionary is both compact and discriminative, and the method admits a clean mathematical justification. An alternative approach is to combine color, texture, shape, and location potentials in a CRF model, where the hidden nodes could be clusters and the local observations could be characteristics of each cluster, such as color. Boosting may be applied when training the CRF to simultaneously select useful features and learn their weights.
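The texton-dictionary step described above can be sketched in a few lines. The following is a minimal pure-Python illustration, not our implementation: the 2-D toy "responses" stand in for real filter-bank outputs (which would be 17-D or higher, one dimension per filter), and the `kmeans` and `dist2` helpers are written here for the example rather than taken from any library.

```python
import random

def dist2(p, q):
    """Squared Euclidean distance between two response vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=20):
    """Lloyd's K-means with a deterministic farthest-first initialization."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    for _ in range(iters):
        # Assign each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda i: dist2(p, centers[i]))].append(p)
        # Recompute each center as the mean of its cluster.
        centers = [tuple(sum(d) / len(cl) for d in zip(*cl)) if cl else centers[i]
                   for i, cl in enumerate(clusters)]
    return centers

# Toy "filter responses": 2-D vectors drawn around two well-separated modes.
rng = random.Random(1)
responses = ([(rng.gauss(0.0, 0.1), rng.gauss(0.0, 0.1)) for _ in range(50)] +
             [(rng.gauss(3.0, 0.1), rng.gauss(3.0, 0.1)) for _ in range(50)])
textons = kmeans(responses, k=2)          # the "texton dictionary"
print(sorted(round(c[0]) for c in textons))  # prints [0, 3]

# Texton histogram for a region: count nearest-texton assignments.
# (These histograms are what the Bayesian mapping step would operate on.)
region = responses[:50]  # e.g. a region drawn entirely from the first mode
hist = [0] * len(textons)
for p in region:
    hist[min(range(len(textons)), key=lambda i: dist2(p, textons[i]))] += 1
print(hist)  # prints [50, 0]
```

In practice the clustering would run over responses from many training images, with k in the hundreds rather than 2; the per-region histograms would then be the input to the dictionary-compression or CRF stage.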
Dataset
Our available datasets include the MSRC 21-class database, the 7-class Corel subset, and the 7-class Sowerby database used in [1].
Timeline
Apr. 15 – Apr. 20  Analyze the project and its feasibility.
Apr. 21 – Apr. 29  Read related work; design the model.
Apr. 30 – May 07   Design modules and begin coding.
May 08 – May 18    Coding.
May 19 – May 25    Debug and test.
May 26 – May 30    Prepare presentation and demo.
References
[1] X. He, R. S. Zemel, and M. Á. Carreira-Perpiñán. Multiscale conditional random fields for image labeling. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, volume 2, pages 695–702, June 2004.