Automatic Initialization of the TLD Object Tracker

Louis H. Buck
Thayer School of Engineering
Dartmouth College
louis.h.buck@dartmouth.edu

Motivation

TLD is a long-term, real-time tracker designed to be robust to partial and complete occlusions as well as changes in perspective and scale [4]. The algorithm is of interest to my research in object tracking using machine vision on a quadrotor micro air vehicle (MAV). Currently, the TLD algorithm must be initialized by a user who selects a region of interest (ROI) in a frame of the video sequence. This prevents the algorithm from being used in completely autonomous tracking applications, or in applications where the operator cannot provide the necessary ROI input. The purpose of this project is to recreate TLD in MATLAB from the theory described in the literature [1][2][3] and to implement a separate pre-trained object-class detector that can automatically initialize TLD in the presence of an object of the trained class. Ideally, an operator would be able to choose an object class (such as a car or person) that he or she wants the MAV to recognize and follow. The operator would then place the MAV in position to survey an area, where it would hover and wait until an object of interest came into view. At this point, the object-class detector would recognize the object's presence and initialize TLD to track it.

Background

The TLD object tracker was developed by Z. Kalal, J. Matas and K. Mikolajczyk at the University of Surrey and the Czech Technical University. It is called TLD because it uses three components in parallel to accomplish the task of long-term tracking: tracking, learning and detection [4].

Tracking
The tracker used is a Median Flow tracker based on the Lucas-Kanade optical flow algorithm. The tracker estimates the trajectory of the object based solely on the frame-to-frame movement of key points, independent of the system's object model.
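The update step of such a tracker can be sketched as follows. This is a minimal illustration, assuming the key points have already been matched between two frames by an optical flow routine (not shown); the function name median_flow_update and the (x, y, w, h) box layout are choices made for this example only. The box is shifted by the median point displacement and resized by the median ratio of pairwise point distances, so a few badly tracked points do not corrupt the estimate.

```python
from itertools import combinations
from math import hypot
from statistics import median


def median_flow_update(box, pts_prev, pts_next):
    """Move and rescale a box from matched key points.

    box is (x, y, w, h); pts_prev[i] and pts_next[i] are the (x, y)
    positions of the same key point in the previous and current frame.
    """
    # Translation: median of the per-point displacements.
    dx = median(q[0] - p[0] for p, q in zip(pts_prev, pts_next))
    dy = median(q[1] - p[1] for p, q in zip(pts_prev, pts_next))

    # Scale: median ratio of pairwise distances after vs. before.
    ratios = []
    for i, j in combinations(range(len(pts_prev)), 2):
        d_prev = hypot(pts_prev[i][0] - pts_prev[j][0],
                       pts_prev[i][1] - pts_prev[j][1])
        d_next = hypot(pts_next[i][0] - pts_next[j][0],
                       pts_next[i][1] - pts_next[j][1])
        if d_prev > 0:
            ratios.append(d_next / d_prev)
    scale = median(ratios) if ratios else 1.0

    # Shift the box centre, then resize the box around it.
    x, y, w, h = box
    cx, cy = x + w / 2.0 + dx, y + h / 2.0 + dy
    new_w, new_h = w * scale, h * scale
    return (cx - new_w / 2.0, cy - new_h / 2.0, new_w, new_h)


if __name__ == "__main__":
    prev = [(0, 0), (4, 0), (0, 4), (4, 4), (2, 2)]
    # Four points move by (2, 3); the last one is a tracking failure.
    nxt = [(2, 3), (6, 3), (2, 7), (6, 7), (50, 50)]
    print(median_flow_update((0, 0, 10, 10), prev, nxt))
```

Because both the translation and the scale are medians, the single outlier in the usage example has no effect on the resulting box.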

Detection
The detector is a random forest classifier based on a collection of 2-bit binary patterns. These binary patterns are discretizations of the gradients across randomly sized and located pixel patches (called groups) within the region of interest. Each group yields its own decision tree within the random forest, the leaves of which represent different positive representations the detector has found within that pixel patch.
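One way to picture the feature is sketched below. The exact encoding is an assumption made for illustration: each rectangle yields one bit for the sign of the horizontal intensity difference (left half versus right half) and one bit for the vertical difference (top half versus bottom half). The codes of all rectangles in a group are concatenated into an integer that selects a leaf of that group's tree. The names region_sum, two_bit_bp and leaf_index are hypothetical.

```python
def region_sum(img, x, y, w, h):
    """Sum of pixel intensities in the w-by-h rectangle at (x, y)."""
    return sum(img[r][c] for r in range(y, y + h) for c in range(x, x + w))


def two_bit_bp(img, x, y, w, h):
    """Return a code in 0..3 describing the coarse gradient direction."""
    left = region_sum(img, x, y, w // 2, h)
    right = region_sum(img, x + w // 2, y, w - w // 2, h)
    top = region_sum(img, x, y, w, h // 2)
    bottom = region_sum(img, x, y + h // 2, w, h - h // 2)
    # High bit: horizontal comparison. Low bit: vertical comparison.
    return (int(left > right) << 1) | int(top > bottom)


def leaf_index(img, group):
    """Concatenate the 2-bit codes of every rectangle in a group."""
    index = 0
    for (x, y, w, h) in group:
        index = (index << 2) | two_bit_bp(img, x, y, w, h)
    return index


if __name__ == "__main__":
    # Tiny grayscale image stored as a list of rows.
    img = [[9, 9, 1, 1],
           [9, 9, 1, 1],
           [5, 5, 0, 0],
           [5, 5, 0, 0]]
    group = [(0, 0, 4, 4), (2, 0, 2, 2)]
    print(leaf_index(img, group))
```

A group of k rectangles addresses 4^k leaves, so the group size trades discriminative power against the amount of training data needed to populate the leaves.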

Learning
The learning that takes place is a semi-supervised process that fuses results from the object detector and tracker to iteratively improve the object model. False negatives close to the tracked trajectory extend the decision trees (grow the forest) by positively labeling the tracked patch and retraining the model. False positives far from the tracked trajectory prune the forest by removing leaves that led to the false identification.
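The labeling rule described above can be sketched as follows. This is only an illustration: "close to" and "far from" the trajectory are expressed through bounding-box overlap, and the two thresholds (near and far) are assumed values, not figures taken from the TLD papers. The function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def pn_labels(tracked_box, detections, near=0.6, far=0.2):
    """Return (new_positives, new_negatives) for one frame.

    near and far are assumed overlap thresholds for this sketch.
    """
    overlaps = [iou(tracked_box, d) for d in detections]
    # False negative: the detector missed the tracked patch, so the
    # patch is added as a positive example (grows the forest).
    missed = all(o < near for o in overlaps)
    new_positives = [tracked_box] if missed else []
    # False positives: detections far from the trajectory become
    # negative examples (prunes the forest).
    new_negatives = [d for d, o in zip(detections, overlaps) if o < far]
    return new_positives, new_negatives


if __name__ == "__main__":
    tracked = (0, 0, 10, 10)
    print(pn_labels(tracked, [(1, 1, 10, 10), (50, 50, 10, 10)]))
    print(pn_labels(tracked, [(50, 50, 10, 10)]))
```

In the first call the detector agrees with the tracker, so only the distant detection is relabeled; in the second call the tracked patch is also returned as a new positive example.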

The tracker is initiated by the user bounding the object of interest with a box in a single frame. The initial random forest model is trained with 100 different affine transformations of this single labeled example [1][2][3][4].
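Generating the transformed copies of the initial box might look like the sketch below. The parameter ranges (rotation, scale and shift) are assumptions chosen for illustration, and the warp is restricted to rotation, isotropic scale and translation, which is only a subset of general affine transformations. Only the box corners are warped here; resampling the image pixels under each transform is left out.

```python
import math
import random


def random_affine(rng, max_angle_deg=10.0, max_scale=0.1, max_shift=2.0):
    """Draw a random 2x3 matrix (rotation, scale, shift in pixels)."""
    angle = math.radians(rng.uniform(-max_angle_deg, max_angle_deg))
    scale = 1.0 + rng.uniform(-max_scale, max_scale)
    tx = rng.uniform(-max_shift, max_shift)
    ty = rng.uniform(-max_shift, max_shift)
    a, b = scale * math.cos(angle), scale * math.sin(angle)
    return [[a, -b, tx], [b, a, ty]]


def warp_box_corners(box, m):
    """Apply matrix m to the corners of an (x, y, w, h) box about its centre."""
    x, y, w, h = box
    cx, cy = x + w / 2.0, y + h / 2.0
    corners = [(x, y), (x + w, y), (x + w, y + h), (x, y + h)]
    warped = []
    for px, py in corners:
        ux, uy = px - cx, py - cy
        warped.append((m[0][0] * ux + m[0][1] * uy + m[0][2] + cx,
                       m[1][0] * ux + m[1][1] * uy + m[1][2] + cy))
    return warped


def initial_warps(box, n=100, seed=0):
    """Return n randomly warped corner sets for the user-selected box."""
    rng = random.Random(seed)
    return [warp_box_corners(box, random_affine(rng)) for _ in range(n)]


if __name__ == "__main__":
    warps = initial_warps((10, 20, 40, 30))
    print(len(warps), warps[0])
```

Each warped corner set defines a patch that would be labeled positive when training the initial forest.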

Method

The machine learning methods used to create the TLD are described in detail above, and can be summarized as semi-supervised learning of structured data using a random forest classifier. The initializing object-class detector needs to recognize categories of objects as opposed to specific instances of a single object. From preliminary literature searches it appears that a boosting algorithm such as AdaBoost would work well [5][6]. Other viable options include part-based models [8] or bag-of-features [9].
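To make the boosting option concrete, the sketch below trains discrete AdaBoost over single-feature threshold classifiers (decision stumps). It is a generic textbook version, not the detector of [5] or [6]; the feature vectors are placeholders for whatever image features the object-class detector would eventually use, and all function names are hypothetical.

```python
import math


def stump_predict(row, feature, threshold, polarity):
    """Predict +1 or -1 from a single thresholded feature."""
    return polarity if row[feature] >= threshold else -polarity


def train_stump(X, y, w):
    """Find the stump with the lowest weighted error; labels are +1/-1."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            for pol in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(row, f, thr, pol) != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, pol)
    return best


def adaboost(X, y, rounds=10):
    """Return a list of (alpha, feature, threshold, polarity) stumps."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, f, thr, pol = train_stump(X, y, w)
        if err >= 0.5:  # no stump beats chance; stop early
            break
        err = max(err, 1e-10)  # avoid division by zero for perfect stumps
        alpha = 0.5 * math.log((1.0 - err) / err)
        model.append((alpha, f, thr, pol))
        # Increase the weight of misclassified examples, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(row, f, thr, pol))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(row, f, thr, pol)
                for alpha, f, thr, pol in model)
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    X = [[1, 5], [2, 4], [3, 7], [6, 1], [7, 3], [8, 2]]
    y = [-1, -1, -1, 1, 1, 1]
    model = adaboost(X, y, rounds=5)
    print([predict(model, row) for row in X])
```

Stumps are used here because they are cheap to evaluate, which matters if the detector has to scan every frame while the MAV waits for an object to appear.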

Data

The algorithm will be tested on one of the data sets used by the creators of TLD in order to remove potential causes of discrepancy between my implementation and theirs. Specifically, the dirt bike video sequence [7] will be used, since abundant similar training data of dirt bikers is available on YouTube [10]. The most useful data sets for my research, however, would be aerial views of people or vehicles, so if time permits and a source is identified, the algorithm will be trained and tested on such data sets as well.

Timeline

The proposed work plan is as follows:

        4/17    Lucas-Kanade Optical Tracker
        4/24    Random forest classifier
        5/1     P-N Learning
        5/8*    Working TLD Tracker
        5/15    Development and training of object classifier
        5/22    Working automatic initialization
        5/29    Poster presentation
*Project milestone

References

[1] Z. Kalal, J. Matas, and K. Mikolajczyk. P-N Learning: Bootstrapping Binary Classifiers by Structural Constraints. IEEE Conference on Computer Vision and Pattern Recognition, 2010.
[2] Z. Kalal, J. Matas, and K. Mikolajczyk. Online Learning of Robust Object Detectors During Unstable Tracking. IEEE Conference on Computer Vision Workshops, 2009.
[3] Z. Kalal, J. Matas, and K. Mikolajczyk. Forward-Backward Error: Automatic Detection of Tracking Failures. International Conference on Pattern Recognition, 2010.
[4] Z. Kalal, J. Matas, and K. Mikolajczyk. Tracking Learning Detection Demo Poster.
[5] M. Stojmenovic. Real Time Machine Learning Based Car Detection in Images with Fast Training. Machine Vision and Applications, 2006.
[6] J. Renno, D. Makris, and G. Jones. Object Classification in Visual Surveillance Using Adaboost. University of Surrey.
[7] https://github.com/zk00006/OpenTLD/tree/master/_input
[8] X. Xia and S. Zhang. Statistical Part-Based Models for Object Category Recognition. International Conference on Machine Learning, 2009.
[9] C. Schmid. Bag-of-Features for Category Classification. http://www.di.ens.fr/willow/events/cvml2011/materials/CVML2011_Cordelia_bof.pdf
[10] http://www.youtube.com/watch?v=iBSBd0UWfnk