Machine Learning Project Proposal

Flight Delay Prediction

Huiting Yu


Problem Description

Considerable effort has gone into predicting airline delays in recent years, since delays matter to both the airline industry and commercial passengers. Flight delays can be related to many factors, and this project aims to investigate how significant each of these factors is using machine learning techniques. The most significant factors can then be used to predict future flight delays, and the predictions can be evaluated on held-out test data. Classification methods will be used to predict the likelihood of a delay. The expected length of a delay is also worth predicting. Linear regression is a reasonable choice for this, since each flight has numeric attributes (distance, time, etc.).


Datasets

The Bureau of Transportation Statistics has over 20 years of flight data available online. The flight data include many attributes, such as carrier, distance, departure time, and arrival time. Only a subset of the attributes will be used in training and testing, since some are intuitively not relevant to delays.
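
As a rough illustration of the attribute selection described above, the sketch below keeps a few columns and derives a delayed/not-delayed label from the arrival delay. The column names, the sample rows, and the 15-minute cutoff are all assumptions made for this example, not details taken from the actual data files.

```python
import csv
import io

# Hypothetical column names and made-up rows; a real export has its own headers.
SAMPLE_CSV = """carrier,distance,sched_dep_hour,tail_number,arr_delay_min
AA,733,8,N101,-3
UA,1846,17,N202,42
DL,214,6,N303,15
"""

KEEP = ["carrier", "distance", "sched_dep_hour"]  # assumed relevant attributes
DELAY_THRESHOLD_MIN = 15  # assumed cutoff for calling a flight delayed


def load_examples(text):
    """Return (features, is_delayed, delay_minutes) for each row."""
    examples = []
    for row in csv.DictReader(io.StringIO(text)):
        # Keep only the selected attributes; everything else is dropped.
        features = {name: row[name] for name in KEEP}
        minutes = float(row["arr_delay_min"])
        examples.append((features, minutes >= DELAY_THRESHOLD_MIN, minutes))
    return examples


examples = load_examples(SAMPLE_CSV)
```

The boolean label would feed the classification task, and the raw delay in minutes would feed the regression task.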


Methods

To predict the likelihood of a delay, classification methods such as a Bayes classifier and NBTree can be used.
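
The proposal does not say which Bayes classifier variant will be used. As one possibility, here is a minimal sketch of a naive Bayes classifier over categorical attributes with add-one smoothing. The attribute names (`carrier`, `period`) and the six training rows are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute, label) -> value counts
    values_seen = defaultdict(set)       # attribute -> distinct values
    for row, label in zip(rows, labels):
        for attr, value in row.items():
            value_counts[(attr, label)][value] += 1
            values_seen[attr].add(value)
    return class_counts, value_counts, values_seen


def predict_proba(model, row):
    """Return P(label | row) for each label, using add-one smoothing."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    log_scores = {}
    for label, n in class_counts.items():
        score = math.log(n / total)
        for attr, value in row.items():
            count = value_counts[(attr, label)][value]
            score += math.log((count + 1) / (n + len(values_seen[attr])))
        log_scores[label] = score
    # Normalize by subtracting the largest log score before exponentiating.
    top = max(log_scores.values())
    exp_scores = {k: math.exp(v - top) for k, v in log_scores.items()}
    norm = sum(exp_scores.values())
    return {k: v / norm for k, v in exp_scores.items()}


# Made-up training data with hypothetical attribute names.
rows = [
    {"carrier": "AA", "period": "evening"},
    {"carrier": "AA", "period": "evening"},
    {"carrier": "UA", "period": "morning"},
    {"carrier": "UA", "period": "morning"},
    {"carrier": "AA", "period": "morning"},
    {"carrier": "UA", "period": "evening"},
]
labels = ["delayed", "delayed", "on_time", "on_time", "on_time", "delayed"]

model = train_naive_bayes(rows, labels)
probs = predict_proba(model, {"carrier": "AA", "period": "evening"})
```

The returned dictionary gives a probability per class, which matches the goal of predicting the likelihood of a delay instead of only a hard label.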

To predict the length of a delay, linear regression can be used, with the numeric attributes of the flights serving as inputs in training and testing.
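
To make the regression idea concrete, the sketch below fits an ordinary least-squares line with a single numeric predictor. This is a simplification, since the project would use several numeric attributes at once. The predictor (scheduled departure hour) and all data values are made up for the example.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Predicted delay in minutes for predictor value x."""
    return slope * x + intercept


# Made-up data: scheduled departure hour vs. delay in minutes.
hours = [6, 9, 12, 15, 18]
delays = [2, 8, 14, 20, 26]

slope, intercept = fit_line(hours, delays)
estimate = predict(slope, intercept, 21)
```

With several attributes, the same least-squares criterion applies, but the fit is usually computed with a linear algebra library.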


Milestone

By May 8, I hope to classify flights as delayed or non-delayed using a Bayes classifier and one or two other classification methods, and to compare their efficiency and accuracy.
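
One possible way to run the comparison is a small harness that reports test accuracy and training time for any classifier exposed as a pair of train and predict functions. The sketch below assumes that interface. The majority-class baseline and the data are hypothetical stand-ins, not the methods or data of the project.

```python
import time
from collections import Counter


def evaluate(train_fn, predict_fn, train_set, test_set):
    """Return (accuracy, training_seconds) for one classifier."""
    start = time.perf_counter()
    model = train_fn(train_set)
    train_seconds = time.perf_counter() - start
    correct = sum(predict_fn(model, x) == y for x, y in test_set)
    return correct / len(test_set), train_seconds


# Hypothetical baseline: always predict the most common training label.
def train_majority(train_set):
    return Counter(label for _, label in train_set).most_common(1)[0][0]


def predict_majority(model, features):
    return model


# Made-up (features, label) pairs.
train_set = [
    ({"hour": 18}, "delayed"),
    ({"hour": 7}, "on_time"),
    ({"hour": 9}, "on_time"),
]
test_set = [({"hour": 8}, "on_time"), ({"hour": 19}, "delayed")]

accuracy, seconds = evaluate(train_majority, predict_majority, train_set, test_set)
```

Calling `evaluate` once per classifier on the same split would give directly comparable accuracy and timing figures.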


References

[1] Bayesian Network Analysis of Flight Delays

[2] Alarming Large Scale of Flight Delays: an Application of Machine Learning

[3] Bureau of Transportation Statistics