Simple Geometric Shape Sketch Recognition

PING LIN

The Problem

Free-hand sketch recognition has been studied for a long time [1-4]. Because the objects to be recognized are so diverse, no single method has emerged as "the" method to use, and commercially available tools are not yet mature enough to attract wide attention. However, as pen-based digital devices such as PDAs and Tablet PCs become increasingly popular, the need for sketch recognition keeps rising.

In this project, we would like to identify suitable machine-learning techniques for recognizing, in real time, free-hand sketches of simple 2D geometric shapes on a pen-based device such as a Tablet PC. The intention is to confine the recognition domain to simple geometric shapes such as circles, rectangles, triangles, and arrows, so that the problem remains tractable within the time constraints, while keeping the system as general-purpose as possible so that it can later be extended to other symbol sets and larger alphabets. In the end, simple schematic graphs consisting of these shapes should be drawn, recognized, and beautified by the system.

General Framework

The input to the system is a timestamped signal; each signal point is a pair of (x,y) coordinates on the screen. In this project, however, we are not going to exploit the temporal structure. First, an HMM or a Dynamic Bayes Net is much more complicated to implement and is probably overkill for simple shape recognition. More importantly, anything relying on temporal information constrains the order in which strokes are drawn, which is undesirable for geometric shape recognition. So, in this project, the input is treated merely as a 2D image. The recognition system consists of the following main stages:

Preprocessing:

This stage includes filtering the input signal (for instance, threshold filtering for connectivity), normalizing for size and slant, and compensating for obvious deficiencies of the signal (for instance, endpoint refinement), so that the processed signal is cleaner, normalized, and more suitable for the subsequent processing.
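As a rough illustration of the size-normalization step, the sketch below rescales the sampled points into a unit box and paints them onto a small binary grid. The function names, the 16x16 default grid size, and the choice to keep the aspect ratio are assumptions made for this example; filtering, slant correction, and endpoint refinement are not shown.

```python
def normalize(points):
    """Translate and scale (x, y) points into the unit box, keeping aspect ratio.

    Also returns the original top-left corner and extent, so that a later
    stage can place its output back at the original location and size.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    min_x, min_y = min(xs), min(ys)
    # Use the larger side as the scale; fall back to 1.0 for a single point.
    extent = max(max(xs) - min_x, max(ys) - min_y) or 1.0
    scaled = [((x - min_x) / extent, (y - min_y) / extent) for x, y in points]
    return scaled, (min_x, min_y), extent


def rasterize(unit_points, size=16):
    """Mark each normalized point on a size x size binary image.

    Only the sampled points are set; gaps between samples are not connected.
    The order of the points is ignored, so stroke order has no effect.
    """
    image = [[0] * size for _ in range(size)]
    for x, y in unit_points:
        col = min(int(x * size), size - 1)
        row = min(int(y * size), size - 1)
        image[row][col] = 1
    return image


points = [(10, 20), (30, 20), (30, 30)]
scaled, origin, extent = normalize(points)
image = rasterize(scaled, size=4)
```

Because the image depends only on which cells are marked, drawing the same points in a different order yields the same result.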

Low-level processing:

This stage extracts feature information from the normalized 2D image so that an abstract feature representation can be fed to the machine-learning method at the high-level processing stage.
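The proposal does not fix which features will be used. Purely as an illustration of what a feature vector computed from a 2D image could look like, the sketch below uses zoning: the image is divided into a grid of zones and the fraction of ink pixels in each zone becomes one feature. The function name and the zone count are assumptions for this example, not a choice made in this project.

```python
def zone_features(image, zones=2):
    """Return the ink density of each zone, scanned row by row.

    image is a list of rows of 0/1 values; zones is the number of
    zones along each axis, giving zones * zones features in total.
    """
    height, width = len(image), len(image[0])
    features = []
    for zr in range(zones):
        r0, r1 = zr * height // zones, (zr + 1) * height // zones
        for zc in range(zones):
            c0, c1 = zc * width // zones, (zc + 1) * width // zones
            ink = sum(image[r][c] for r in range(r0, r1) for c in range(c0, c1))
            area = (r1 - r0) * (c1 - c0)
            features.append(ink / area if area else 0.0)
    return features


image = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 0],
]
features = zone_features(image, zones=2)
```

The resulting fixed-length vector is the kind of input a learning method at the next stage could consume, whatever features are finally chosen.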

If this were a more general sketch recognition system, there would probably be additional needs at this stage, such as segmenting the signal into strokes or finding corners, and then classifying stroke by stroke to obtain an abstract feature representation of the strokes in a suitable language.

This is not needed here, since in this project each input image is naturally separated by the pause the user leaves between two objects to be recognized. The input is thus segmented by the user, and during the pause the system needs to recognize the shape and output a beautified one. This is exactly what "real-time" means for this recognizer.
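A minimal sketch of this pause-based segmentation, assuming the samples arrive as (t, x, y) tuples with t in seconds. The 0.8-second threshold and the function name are hypothetical; a real value would have to be tuned on actual pen input.

```python
def split_on_pauses(samples, pause=0.8):
    """Group (t, x, y) samples into objects separated by long pauses.

    A gap longer than `pause` seconds between consecutive samples closes
    the current object. Timestamps are used only to find the gaps and
    are dropped from the output.
    """
    objects, current = [], []
    last_t = None
    for t, x, y in samples:
        if last_t is not None and t - last_t > pause and current:
            objects.append(current)
            current = []
        current.append((x, y))
        last_t = t
    if current:
        objects.append(current)
    return objects


samples = [(0.0, 0, 0), (0.1, 1, 0), (2.0, 5, 5), (2.1, 6, 5)]
objects = split_on_pauses(samples)
```

Each returned group can then be passed through the recognition stages on its own, without any stroke-level segmentation.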

High-level recognition:

Given the abstract representation of the input signal, machine-learning techniques can be applied. The result is the recognized shape, and we need to output the beautified shape to the screen according to the location and size information obtained in the preprocessing stage.
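The beautification step could look roughly like the sketch below, which maps a recognized label plus the bounding box saved during preprocessing to the vertices of an idealized shape. The label strings, the function signature, and the templates themselves (an axis-aligned rectangle, an isosceles triangle, a polygonal approximation of a circle) are assumptions for illustration only.

```python
import math


def beautify(label, origin, width, height, circle_segments=16):
    """Return polygon vertices for an idealized shape inside a bounding box.

    origin is the top-left corner in screen coordinates (y grows downward).
    """
    x0, y0 = origin
    if label == "rectangle":
        return [(x0, y0), (x0 + width, y0),
                (x0 + width, y0 + height), (x0, y0 + height)]
    if label == "triangle":
        # Isosceles triangle with its apex centered on the top edge.
        return [(x0 + width / 2, y0), (x0 + width, y0 + height),
                (x0, y0 + height)]
    if label == "circle":
        # Approximate the circle by a regular polygon inscribed in the box.
        cx, cy = x0 + width / 2, y0 + height / 2
        radius = min(width, height) / 2
        return [(cx + radius * math.cos(2 * math.pi * k / circle_segments),
                 cy + radius * math.sin(2 * math.pi * k / circle_segments))
                for k in range(circle_segments)]
    raise ValueError(f"no template for shape {label!r}")


rectangle = beautify("rectangle", (10, 20), 20, 10)
circle = beautify("circle", (0, 0), 10, 10)
```

Keeping the templates separate from the classifier means that supporting a new shape only requires a new label and a new template.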

Possible Methods 

Many high-level learning methods have been proposed in the literature. Some are specific to geometric shape recognition; others are general-purpose and suitable both for geometric shapes and for other symbols, including characters, letters, math symbols, and diagram symbols.

The challenge is that, while we would like to choose a method that is not too specific so that the system can be extended easily, too general a method may be too complex to finish in time and too computationally expensive for real-time recognition, which is the current goal of this project.

The plan is to start with the more general methods. The current candidate is manifold learning, or more specifically, kernel Isomap [5]. If this method turns out to be unrealistic for a real-time implementation of the current task, simpler methods such as SVM will be used instead [4,6].

Data Sets

Very few sketch recognition datasets are available, and the ones that are available [7] are too complicated for this project. The training data sets for this project will therefore be generated by hand, since no extensive training is anticipated.

By Milestone

I will have finished the preprocessing and low-level feature extraction, tried different high-level learning algorithms, and decided which learning method will be used in the final version.

References

[1] Randall Davis. Sketch Understanding in Design: Overview of Work at the MIT AI Lab. In Sketch Understanding, Papers from the 2002 AAAI Spring Symposium, pp.24-31. Stanford, California, March 25-27 2002.

[2] Hammond, T., Eoff, B., Paulson, B., Wolin, A., Dahmen, K., Johnston, J., and Rajan, P. Free-Sketch Recognition: Putting the CHI in Sketching. 26th Annual SIGCHI Conference on Human Factors in Computing Systems (CHI 2008) Works In Progress, Florence, Italy, April, 2008, pp. 3027--3032.

[3] Tevfik Metin Sezgin. Sketch Interpretation Using Multiscale Stochastic Models of Temporal Patterns. Ph.D. Thesis, Massachusetts Institute of Technology. May 2006.

[4] Michael Oltmans. Envisioning Sketch Recognition: A Local Feature Based Approach to Recognizing Informal Sketches. Ph.D. Thesis, Massachusetts Institute of Technology. Cambridge, MA, May 2007.

[5] Choi, H., and Hammond, T. Sketch Recognition based on Manifold Learning. 23rd Annual AAAI Conference on Artificial Intelligence: Student Abstracts, Chicago, Illinois, July, 2008, pp. 1786--1787.

[6] Heloise Hwawen Hse and A. Richard Newton. Sketched symbol recognition using Zernike moments. In ICPR (1), pages 367-370, 2004. doi: 10.1109/ICPR.2004.1334128. URL http://csdl.computer.org/comp/proceedings/icpr/2004/2128/01/212810367abs.htm.

[7] ETCHA Sketches. http://rationale.csail.mit.edu/ETCHASketches/.