Robust Visual Servoing
Scott Teuscher – CS134 Spring 2011
Our current micro unmanned vehicle system uses a custom-built open-loop servoing system to track regions of interest within an image; the tracking mechanism captures high-quality still images for computing structure from motion. We have also developed a custom closed-loop, inertially stabilized gimbaled camera system, but the open-loop system was ultimately chosen for the simplicity of its control structure. The purpose of this project is to use reinforcement learning to reduce the need for hand-crafting the nonlinear closed-loop controller and to eliminate the need for expensive inertial sensors, while meeting or exceeding the performance of the open-loop system.
Figure 1 - Closed Loop Gimbaled Camera
The problem is highly analogous to an optimal control problem, but unlike classical optimal control it does not require long, tedious derivations or solutions to the Hamilton-Jacobi-Bellman equation. An approximation of the optimal controller can be obtained with Q-learning algorithms and their variants, which can be regarded as heuristic dynamic programming techniques. The problem statement includes the controller but not the automatic recognition of regions of interest; it is assumed that the initial locations of the regions of interest in an image are given as input to the controller. One of the main concerns is adapting Q-learning to continuous states and actions. This will be addressed with the techniques of Gaskett et al. [1], who implement wire-fitted neural network Q-learning. An alternative is Neural Fitted Q (NFQ) [2], which uses a multi-layer perceptron that can be trained offline. The controller will be evaluated against the open-loop system using the following metrics; minimal sketches of the metrics and of both learning approaches follow the list.
1 – Dynamic response to a single event, with criteria taken from second-order systems: rise time, percent overshoot, and settling time.
2 – Time on target, i.e., time within an error threshold.
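To make these criteria measurable, the following is a minimal sketch of how they might be computed, assuming Python with NumPy, a sampled response that rises toward a positive final value, and a 2% settling band; the function and signal names are illustrative, not part of the existing system.

```python
import numpy as np

def step_metrics(t, y, settle_band=0.02):
    """Criterion 1: second-order-style step-response metrics from a
    sampled response y(t) that has settled near its final value."""
    y_final = y[-1]
    # Rise time: first crossing of 10% to first crossing of 90% of final value.
    t_10 = t[np.argmax(y >= 0.1 * y_final)]
    t_90 = t[np.argmax(y >= 0.9 * y_final)]
    # Percent overshoot relative to the final value.
    overshoot = 100.0 * (y.max() - y_final) / abs(y_final)
    # Settling time: time of the last sample outside the +/- band.
    outside = np.abs(y - y_final) > settle_band * abs(y_final)
    settling = t[np.nonzero(outside)[0][-1]] if outside.any() else t[0]
    return t_90 - t_10, overshoot, settling

def time_on_target(t, err, threshold):
    """Criterion 2: total time the pointing error stays within threshold."""
    dt = np.diff(t, prepend=t[0])
    return dt[np.abs(err) <= threshold].sum()
```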
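Returning to the learning algorithms: the wire-fitting interpolator of [1] can be sketched compactly. For each state, a neural network outputs a small set of "wires", i.e., candidate (action, Q-value) pairs, and the Q-value of an arbitrary continuous action is interpolated among them. The sketch below assumes Python with NumPy; the smoothing constant c and the function name are illustrative.

```python
import numpy as np

def wire_fit_q(action, wire_actions, wire_values, c=0.1, eps=1e-6):
    """Interpolate Q(s, a) for a continuous action from the 'wires'
    (candidate action, Q-value pairs) proposed for the current state.

    action:       query action, shape (da,)
    wire_actions: candidate actions, shape (n_wires, da)
    wire_values:  Q-values at the candidate actions, shape (n_wires,)
    """
    q_max = wire_values.max()
    # Distance to each wire, pulled down near high-valued wires so the
    # interpolated surface peaks exactly at the best wire.
    dist = (np.sum((action - wire_actions) ** 2, axis=1)
            + c * (q_max - wire_values) + eps)
    # Moving least-squares interpolation: exact at each wire.
    return np.sum(wire_values / dist) / np.sum(1.0 / dist)
```

A convenient property of this interpolation, noted in [1], is that its maximum over actions coincides with the highest-valued wire, so greedy continuous action selection requires no inner optimization.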
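A corresponding sketch of Neural Fitted Q iteration [2], with scikit-learn's MLPRegressor standing in for the RPROP-trained network of the original paper and a finite set of candidate gimbal commands approximating the maximization over continuous actions:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def nfq(transitions, candidate_actions, gamma=0.95, n_iterations=20):
    """transitions: list of (state, action, reward, next_state) tuples;
    candidate_actions: finite set of gimbal commands used to
    approximate the max over actions."""
    n = len(transitions)
    s  = np.array([t[0] for t in transitions]).reshape(n, -1)
    a  = np.array([t[1] for t in transitions]).reshape(n, -1)
    r  = np.array([t[2] for t in transitions], dtype=float)
    s2 = np.array([t[3] for t in transitions]).reshape(n, -1)
    x = np.hstack([s, a])                 # Q-network input is (state, action)

    q_net = MLPRegressor(hidden_layer_sizes=(20, 20), max_iter=2000)
    q_net.fit(x, r)                       # bootstrap on immediate reward
    for _ in range(n_iterations):
        # Value of each successor state: max over the candidate actions.
        q_next = np.max([
            q_net.predict(np.hstack([s2, np.tile(u, (n, 1))]))
            for u in candidate_actions], axis=0)
        # Re-fit the whole network on the batch of updated targets (offline step).
        q_net.fit(x, r + gamma * q_next)
    return q_net
```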
No data sets currently exist, but we maintain a custom aircraft simulator for real-time playback of telemetry data. The gimbaled camera system can be mounted on the simulator with a fixed target, or a target can be fixed to the simulator, for automatic generation of training data; a sketch of the logging loop follows.
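The loop below sketches how that automatic generation might produce the (state, action, reward, next state) transitions consumed by the fitting code above. The simulator and controller interfaces and the reward (negative pixel offset of the region of interest from the image center) are assumptions for illustration, not the existing system's API.

```python
import numpy as np

def log_episode(simulator, controller, n_steps, dt=0.02):
    """Record transitions from one playback run on the simulator rig.
    'simulator' and 'controller' are hypothetical stand-ins for the
    telemetry-playback rig and the gimbal policy under test."""
    transitions = []
    s = simulator.observe()                # e.g. ROI offset and rates, in pixels
    for _ in range(n_steps):
        a = controller.act(s)              # pan/tilt rate command
        s2 = simulator.step(a, dt)
        reward = -np.linalg.norm(s2[:2])   # assumed reward: ROI offset from center
        transitions.append((s, a, reward, s2))
        s = s2
    return transitions
```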
Figure 2 - Flight Simulator
April 21 – Automated data collection system fully operational.
May 10 – Initial fitting of the selected algorithms, plus ground-truth measurements of the evaluation criteria for the open-loop system.
May 24 – Full evaluation of the implemented controllers against the criteria above.
[1] C. Gaskett, D. Wettergreen, and A. Zelinsky, "Q-Learning in Continuous State and Action Spaces," Proceedings of the 12th Australian Joint Conference on Artificial Intelligence (AI'99), Sydney, Australia, December 1999. Springer-Verlag.
[2] M. Riedmiller, "Neural Fitted Q Iteration – First Experiences with a Data Efficient Neural Reinforcement Learning Method," Machine Learning: ECML 2005, Springer, 2005.