Overview
Images are used more and more frequently in electronic documents to embed textual information, since they usually leave a more vivid impression on readers. Automatically extracting text from born-digital images is therefore an interesting prospect, as it would provide the enabling technology for a number of applications such as retrieval of Web content and content filtering (e.g. of advertisements or spam emails). Another important application is license plate recognition (LPR), which is used in various security and traffic applications, such as access-control systems.
Goal
The objective of this project is to use image processing and machine learning methods to extract and recognize the textual information embedded in optical images. These images are usually corrupted by noise, distorted to some degree, and may use different fonts and colors, so a robust word recognition approach is needed to achieve high performance. I first plan to study and apply several existing algorithms to this problem and explore their strengths and weaknesses. I will then investigate other possible methods and see whether improvements can be made over the existing approaches.
Method
1. Preprocessing and normalization:
Many methods can be used in this step. To convert images to binary or grey-level form and to remove noise, techniques from image filtering theory can be applied, for example thresholding on grey-level values. Geometric distortion can be handled with skew correction methods, and thinning or skeletonization methods are often applied for normalization.
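As a minimal sketch of this step (assuming OpenCV is available; the file names are only placeholders), grey-level conversion, median filtering for noise removal, and Otsu thresholding for binarization could look like this:

    import cv2

    # Load a word image (the file name is a placeholder).
    img = cv2.imread("word.png")

    # Convert to grey-level.
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Suppress salt-and-pepper noise with a small median filter.
    denoised = cv2.medianBlur(gray, 3)

    # Binarize with Otsu's method, which chooses the grey-level threshold automatically.
    _, binary = cv2.threshold(denoised, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    cv2.imwrite("word_binary.png", binary)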
2. Feature extraction:
A large number of feature extraction methods have been proposed in the literature. For example, the raw pixel matrix itself can be used directly, or principal component analysis (PCA) can be applied to the set of pixel vectors.
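As an illustrative sketch, assuming the character images have already been binarized and resized to a fixed size and that scikit-learn is available (the array file name is hypothetical), PCA features could be computed as follows:

    import numpy as np
    from sklearn.decomposition import PCA

    # Hypothetical array of N fixed-size character images, shape (N, 32, 32).
    images = np.load("char_images.npy")

    # Flatten each image into a raw pixel vector.
    X = images.reshape(len(images), -1).astype(np.float64)

    # Project the pixel vectors onto the first 40 principal components.
    pca = PCA(n_components=40)
    features = pca.fit_transform(X)

    print(features.shape)  # (N, 40)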
3. Isolated character classification:
k-nearest-neighbor classifier
Bayes classifier
polynomial classifier
neural network
support vector machine
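As a concrete example of one of the classifiers listed above, a k-nearest-neighbor character classifier over the PCA features could be sketched as follows (the feature and label files are hypothetical placeholders):

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.metrics import accuracy_score

    # Hypothetical PCA feature vectors and character labels for the two splits.
    X_train = np.load("train_features.npy")
    y_train = np.load("train_labels.npy")
    X_test = np.load("test_features.npy")
    y_test = np.load("test_labels.npy")

    # Classify each character by majority vote among its 5 nearest neighbors.
    knn = KNeighborsClassifier(n_neighbors=5)
    knn.fit(X_train, y_train)
    predictions = knn.predict(X_test)

    print("character accuracy:", accuracy_score(y_test, predictions))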
4. Word classification:
Holistic classifier
Segmentation-based approaches
HMM-based recognition
I will first segment the word images into isolated characters and apply the approaches listed in step 3 to classify each character individually. I will then try a holistic classifier, in which the image of a given word is treated as a single entity and classified as a whole against a dictionary of possible words. The two kinds of methods will be analyzed and compared.
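For the segmentation-based route, one simple option (an assumption for illustration, not a fixed design choice) is to cut the binarized word image at columns containing no foreground pixels, i.e. at gaps in the vertical projection profile:

    import numpy as np

    def segment_characters(binary):
        """Split a binarized word image (text = 1, background = 0) into character
        sub-images using gaps in the vertical projection profile. This is a rough
        sketch; touching or slanted characters would need a more robust method."""
        profile = binary.sum(axis=0)        # number of text pixels in each column
        in_char, start, boxes = False, 0, []
        for col, count in enumerate(profile):
            if count > 0 and not in_char:   # entering a character
                in_char, start = True, col
            elif count == 0 and in_char:    # leaving a character
                in_char = False
                boxes.append((start, col))
        if in_char:                         # last character touches the right edge
            boxes.append((start, len(profile)))
        return [binary[:, a:b] for a, b in boxes]

Each returned sub-image can then be normalized and passed to the isolated character classifiers from step 3.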
Dataset
The dataset used in this project is from ICDAR 2011 Robust Reading Competition.
The training set contains 3583 word images cut from the original scene images, together with a single text file giving the ground-truth transcription of every image. The test set contains 918 images, but its ground-truth text file is not provided. Since the training set is large, I can split it in two and use one part as a test set.
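A hedged sketch of that split with scikit-learn (the lists below are toy placeholders, not the real file names) could be:

    from sklearn.model_selection import train_test_split

    # Placeholder parallel lists: word image file names and their transcriptions.
    image_files = ["word_1.png", "word_2.png", "word_3.png", "word_4.png", "word_5.png"]
    transcriptions = ["the", "quick", "brown", "fox", "jumps"]

    # Hold out 20% of the training images as an internal test set.
    train_files, test_files, train_gt, test_gt = train_test_split(
        image_files, transcriptions, test_size=0.2, random_state=0
    )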
Another dataset that can be used is from the ICDAR 2003 Robust Reading Competition.
Its training set contains 849 word images cut from the original images with a single ground-truth transcription file, and its test set contains 1190 images for which the ground-truth text file is provided.
Timeline
Milestone:
Complete the image processing procedure: remove noise, correct distortion, enhance blurred images, convert images to grey-level or binary, and segment them into separate characters or digits. Also implement one existing algorithm for word recognition.
Final:
Try different methods, compare their performance, analyze the results, and make improvements.
References
A. Shahab, F. Shafait, A. Dengel, "ICDAR 2011 Robust Reading Competition Challenge 2: Reading Text in Scene Images," Proc. International Conference on Document Analysis and Recognition (ICDAR), pp. 1491-1496, Sept. 2011.
J. M. White, G. D. Rohrer, "Image Thresholding for Optical Character Recognition and Other Applications Requiring Character Image Extraction," IBM Journal of Research and Development, Vol. 27, No. 4, p. 400, 1983.
Ø. D. Trier, A. K. Jain, T. Taxt, "Feature Extraction Methods for Character Recognition -- A Survey," Pattern Recognition, Vol. 29, No. 4, pp. 641-662, 1996.
H. Bunke, "Recognition of Cursive Roman Handwriting: Past, Present and Future," Proc. Seventh International Conference on Document Analysis and Recognition (ICDAR), Vol. 1, pp. 448-459, Aug. 2003.
S.-L. Chang, L.-S. Chen, Y.-C. Chung, S.-W. Chen, "Automatic License Plate Recognition," IEEE Transactions on Intelligent Transportation Systems, Vol. 5, No. 1, March 2004.
R. Singh, C. S. Yadav, P. Verma, V. Yadav, "Optical Character Recognition (OCR) for Printed Devnagari Script Using Artificial Neural Network," International Journal of Computer Science & Communication, Vol. 1, No. 1, pp. 91-95, January-June 2010.