Multi-class Object Categorization in Images

CS 34/134 Project Milestone:

Lu He, Tuobin Wang

May 11, 2009

 

Accomplishments so far:

1. GUI Design:

We have completed a simple GUI for our project (Fig. 1), in which the user can load an image from his/her local computer (Fig. 2) and select a region (Fig. 3); the finished project will then show the recognition result automatically in the text field on the right (Fig. 4) (to be done).

Fig. 1: Simple GUI    Fig. 2: Load an image
Fig. 3: Select a region    Fig. 4: Image recognition

 

2. Preprocessing of training images:

In our project, each training image is converted to the CIE L,a,b color space and then convolved with a bank of 17 filters to generate features. The filter bank consists of 3 Gaussians, 4 Laplacians of Gaussian (LoG), and 4 first-order derivatives of Gaussians. The three Gaussian kernels (σ = 1, 2, 4) are applied to each of the L, a, and b channels; the four LoG kernels (σ = 1, 2, 4, 8) are applied to the L channel only; and the four derivative-of-Gaussian kernels are divided into an x-aligned and a y-aligned set, each with σ = 2, 4. As a result, each pixel in the image is associated with a 17-dimensional feature vector.
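The filter bank described above can be sketched as follows. This is an assumed implementation using scipy.ndimage, not our actual project code; the function name and channel assignments for the derivative filters are illustrative.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, gaussian_laplace

def filter_bank_responses(lab):
    """lab: H x W x 3 array in CIE L, a, b channel order.
    Returns an H x W x 17 array of per-pixel features."""
    L, a, b = lab[..., 0], lab[..., 1], lab[..., 2]
    feats = []
    # 3 Gaussians (sigma = 1, 2, 4), each applied to L, a, and b -> 9 responses
    for sigma in (1, 2, 4):
        for ch in (L, a, b):
            feats.append(gaussian_filter(ch, sigma))
    # 4 Laplacians of Gaussian (sigma = 1, 2, 4, 8) on L only -> 4 responses
    for sigma in (1, 2, 4, 8):
        feats.append(gaussian_laplace(L, sigma))
    # x- and y-aligned first derivatives of Gaussian (sigma = 2, 4) -> 4 responses
    for sigma in (2, 4):
        feats.append(gaussian_filter(L, sigma, order=(0, 1)))  # derivative along x
        feats.append(gaussian_filter(L, sigma, order=(1, 0)))  # derivative along y
    return np.stack(feats, axis=-1)  # 9 + 4 + 4 = 17 channels
```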

 

3. K-means clustering:

After all training images are preprocessed, the whole set of feature vectors from all the training images is clustered with the unsupervised K-means algorithm, using a large initial value of K (on the order of thousands) and the Mahalanobis distance between features. After running K-means on all the training images, we finally obtain K clusters.
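One way to realize K-means under the Mahalanobis metric is to whiten the features with the inverse covariance of the whole feature set and then run ordinary Euclidean K-means, which is equivalent. The sketch below assumes this whitening approach; it is illustrative, not our project code.

```python
import numpy as np

def mahalanobis_kmeans(X, k, iters=20, seed=0):
    """X: N x D feature matrix. Returns (centers, labels); centers live in
    the whitened space, where Euclidean distance equals Mahalanobis distance."""
    rng = np.random.default_rng(seed)
    cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])  # regularized
    W = np.linalg.cholesky(np.linalg.inv(cov))  # whitening transform
    Z = (X - X.mean(0)) @ W                     # ||Zi - Zj|| is Mahalanobis
    centers = Z[rng.choice(len(Z), k, replace=False)]
    for _ in range(iters):
        d = ((Z[:, None, :] - centers[None]) ** 2).sum(-1)  # N x k distances
        labels = d.argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = Z[labels == j].mean(0)
    return centers, labels
```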

 

4. Universal Visual Dictionary (UVD):

After clustering, we compute the covariance of each cluster; the set of these K clusters together with their associated covariances constitutes the initial UVD.
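Forming the initial UVD from the clustering output can be sketched as below. The function name is illustrative; the fallback to an identity covariance for degenerate clusters is an assumption, not from the original.

```python
import numpy as np

def build_uvd(X, labels, centers):
    """X: N x D features, labels: N cluster assignments, centers: K x D.
    Returns the initial UVD as a list of (center, covariance) pairs."""
    uvd = []
    for j, c in enumerate(centers):
        pts = X[labels == j]
        # Use the sample covariance when the cluster has enough points,
        # otherwise fall back to the identity (assumed handling).
        cov = np.cov(pts, rowvar=False) if len(pts) > 1 else np.eye(X.shape[1])
        uvd.append((c, cov))
    return uvd
```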

           

5. Building histograms

[Figure: an original training image (left) and its ground-truth segmentation (right)]

In the figures above, the left image is an original image from the training dataset and the right one is the corresponding ground truth. As you may notice, the ground-truth image has been manually segmented into five regions, each marked with one of three colors (one color per object class).

After creating the initial UVD, we build a histogram over this UVD for every training region marked in the ground truth.
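Building the histogram for one labeled region can be sketched as follows. This assumes a hard assignment of each pixel to its nearest dictionary entry (Euclidean in this sketch); the helper name and normalization are illustrative.

```python
import numpy as np

def region_histogram(features, mask, centers):
    """features: H x W x D per-pixel features; mask: H x W boolean region mask;
    centers: K x D dictionary entries. Returns a normalized K-bin histogram."""
    X = features[mask]                                   # pixels in the region
    d = ((X[:, None, :] - centers[None]) ** 2).sum(-1)   # squared distances
    nearest = d.argmin(1)                                # dictionary index per pixel
    hist = np.bincount(nearest, minlength=len(centers)).astype(float)
    return hist / hist.sum()
```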

 

Things to be done:

Since the size K of the initial UVD is very large, our goal is to learn a mapping that merges bins of the initial UVD into a final UVD of size T (T << K). We decided to use the model and the algorithm proposed in [1], which makes the mapped representation both compact and discriminative.

 


Our model maximizes the posterior probability of the ground-truth labels given all the histograms, P(c|{H}), obtained via Bayes' rule. The denominator penalizes mappings that reduce discriminative power, while the numerator favors mappings that lead to intra-class compactness. The key quantity is therefore the conditional probability of all the histograms given the ground-truth labels, P({H}|c). We model the histogram of each class with a Gaussian distribution, and we apply the variance-stabilizing transformation of a multinomial so that the variance is constant rather than linearly dependent on the mean. For each class we define a prior, and we multiply these terms together to compute P({H}|c).
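The Bayes' rule decomposition referred to above can be sketched as follows; this is a generic restatement in the notation of the text, not an equation copied from [1]:

```latex
P(c \mid \{H\}) \;=\; \frac{P(\{H\} \mid c)\, P(c)}{P(\{H\})}
```

Maximizing the left-hand side over candidate mappings trades off the class-conditional fit in the numerator against the pooled evidence in the denominator.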

Our algorithm iteratively merges the pair of textons in the UVD whose merge most improves P(c|{H}); when no such pair can be found, it stops, and the result is the final UVD.
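The greedy merging loop can be sketched as below. Here `score(mapping)` is a hypothetical stand-in for evaluating P(c|{H}) under a candidate bin mapping; the merge bookkeeping is illustrative, not our project code.

```python
def greedy_merge(k, score):
    """Start with the identity mapping over k bins and repeatedly merge the
    pair of bins that most improves `score`; stop when no merge helps."""
    mapping = list(range(k))            # bin index -> merged-bin index
    best = score(mapping)
    improved = True
    while improved:
        improved = False
        bins = sorted(set(mapping))
        best_pair, best_val = None, best
        for i, a in enumerate(bins):    # try every candidate pair (a, b)
            for b in bins[i + 1:]:
                trial = [a if m == b else m for m in mapping]
                v = score(trial)
                if v > best_val:        # keep the strictly best improvement
                    best_pair, best_val = (a, b), v
        if best_pair is not None:
            a, b = best_pair
            mapping = [a if m == b else m for m in mapping]
            best = best_val
            improved = True
    return mapping
```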

Comment:
Relative to the timeline we submitted in our proposal, we think the project is on track. According to our schedule, we need about one more week of programming to finish coding.

  

Reference:
[1] J. Winn, A. Criminisi, and T. Minka, "Object Categorization by Learned Universal Visual Dictionary," in Proc. IEEE Int. Conf. on Computer Vision (ICCV), 2005.