The NBA is one of the core constituents of modern professional sports, with around $4.1 billion in revenue in the 2010-2011 season [1]. Meanwhile, professional betting, another billion-dollar industry, depends heavily on accurate prediction of game outcomes [2]. The goal of this project is to implement machine learning algorithms that predict the outcome of a game or series from the statistics of the two NBA teams involved.
The NBA dataset was downloaded from www.databasebasketball.com. The raw data contain yearly NBA statistics for players, teams and coaches, covering both regular seasons and playoffs. I use only the 2008-2009 and 2009-2010 seasons for consistency, because the composition of the league has kept changing since the 2000s as old teams left and new teams joined.
Features | wins | d_ast | d_pts | d_fgm | o_reb |
---|---|---|---|---|---|
Accuracy | 0.6939 | 0.6293 | 0.6199 | 0.613 | 0.611 |
Y is the series outcome between two teams. For example, suppose the Boston Celtics play two games against the Los Angeles Lakers in their series. If the Celtics win both games, their series outcome is 1; in the opposite case it is 0. If the series is drawn, the outcome is determined by the most dominant feature, i.e. each team's wins in the previous season: the team with the higher win rate gets a series outcome of 1. Because the database lacks cumulative in-season data, only the series outcome, not the individual game outcome, can be used as Y. In other words, the same X cannot be used to predict both the win and the loss of a series that ends with one win and one loss. This is a limitation of the dataset that I discuss later.
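The labeling rule above can be sketched as follows. This is a minimal illustration rather than the code used in the project: the function name `series_labels`, the `(team_a, team_b, winner)` game format and the `prev_wins` dictionary are all assumptions made for the example. The text does not say what happens when two teams also have equal previous-season wins, so the sketch settles that case arbitrarily.

```python
from collections import defaultdict

def series_labels(games, prev_wins):
    """Build the series outcome Y for every ordered pair of teams.

    games:     list of (team_a, team_b, winner) tuples (hypothetical format)
    prev_wins: dict mapping team -> wins in the previous season
    Returns a dict mapping (team, opponent) -> 1 or 0.
    """
    # Count wins per team within each unordered pairing.
    tally = defaultdict(lambda: defaultdict(int))
    for team_a, team_b, winner in games:
        pair = tuple(sorted((team_a, team_b)))
        tally[pair][winner] += 1

    labels = {}
    for (a, b), wins in tally.items():
        if wins[a] != wins[b]:
            a_label = 1 if wins[a] > wins[b] else 0
        else:
            # Drawn series: fall back to previous-season wins.
            # Assumption: if those are also equal, team b gets label 1.
            a_label = 1 if prev_wins[a] > prev_wins[b] else 0
        labels[(a, b)] = a_label
        labels[(b, a)] = 1 - a_label
    return labels
```

Each series yields two labeled examples, one from each team's point of view, so the two labels always sum to 1.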
To put the prediction accuracy in context, I examined several related works on game outcome prediction. Papamichael et al. [3] reported up to 73% accuracy in predicting NBA games with linear regression. Hamadani [4] used logistic regression to predict NFL games with an accuracy of 64.8%. Radha-Krishna [5] predicted soccer matches with an accuracy of 65.5% using neural networks. In this project, I plan to implement five machine learning methods for binary classification.
The table below gives the prediction results of the four algorithms implemented so far. All methods except linear regression achieve satisfactory series prediction accuracy. Game prediction accuracy is considerably lower because of the limitation of the collected data. Overall, AdaBoost appears to be the best classification algorithm for both game and series prediction.
Algorithms | Linear | Logistic | AdaBoost | KNN |
---|---|---|---|---|
Games Prediction Accuracy | 0.661 | 0.6797 | 0.7016 | 0.6894 |
Series Prediction Accuracy | 0.5655 | 0.8368 | 0.8966 | 0.8506 |
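As an illustration of one of the compared methods, here is a minimal k-nearest-neighbours classifier together with the accuracy measure reported in the table. It is a sketch under assumptions, not the project's implementation: the Euclidean distance, the default `k=3` and the function names `knn_predict` and `accuracy` are choices made for the example, and the feature vectors are assumed to be plain numeric tuples.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest
    training points (Euclidean distance is an assumed choice)."""
    neighbours = sorted(
        (math.dist(row, x), label) for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

With binary labels, an odd `k` avoids tied votes.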
In this project I assume that a team that wins a series wins all the games in that series. The obvious bad case is a drawn series, for which this assumption can only yield a fixed 50% game prediction accuracy. I therefore tried using KNN to classify series as win/lose/draw instead of the original win/lose. A team that wins or loses a series is still assumed to win or lose all of its games, but a team that draws a series is assumed to win and lose an equal number of games. The figure below shows the results of this three-class series model compared with the binary classification model. Although the game prediction accuracy increases slightly, the series accuracy drops substantially (approximately 60%) compared with the win/lose model. Because I have not yet implemented neural networks, it is hard to decide at this point whether the win/lose/draw model or the win/lose model should be used to predict game outcomes.
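The way a series-level prediction translates into game-level accuracy can be sketched as below. This is an illustrative reading of the assumption above, not the project's code: the function name `game_accuracy` and the `(predicted, wins, losses)` input format are hypothetical. Since games within a series are not ordered in this model, the sketch counts the number of predicted wins and losses that can be matched to actual ones.

```python
def game_accuracy(series):
    """Game-level accuracy implied by series-level predictions.

    series: list of (predicted, wins, losses), where predicted is
            "win", "lose" or "draw" and wins/losses are the actual
            game counts for the team in that series.
    """
    correct = 0
    total = 0
    for predicted, wins, losses in series:
        n = wins + losses
        if predicted == "win":
            pred_wins = n        # assumed to win every game
        elif predicted == "lose":
            pred_wins = 0        # assumed to lose every game
        else:
            pred_wins = n / 2    # draw: equal wins and losses
        pred_losses = n - pred_wins
        # Matched games: predicted wins that occurred plus predicted
        # losses that occurred.
        correct += min(wins, pred_wins) + min(losses, pred_losses)
        total += n
    return correct / total
```

For a series that ends one win and one loss, a "win" or "lose" prediction scores 0.5 under this counting, while a "draw" prediction scores 1.0, which is the motivation for the three-class model.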
[1] Sports Industry Overview
[2] McMurray, S. (1995). Basketball's new high-tech guru. U.S. News and World Report, December 11, 1995, pp. 79-80.
[3] Michael Papamichael, Matthew Beckler and Hongfei Wang, NBA Oracle, 2009.
[4] Hamadani, Babak. Predicting the outcome of NFL games using machine learning. Project Report for CS229, Stanford University.
[5] Balla, Radha-Krishna. Soccer Match Result Prediction using Neural Networks. Project report for CS534.
[6] Alan McCabe, An Artificially Intelligent Sports Tipper, in Proceedings of the 15th Australian Joint Conference on Artificial Intelligence, 2002.
[7] Yoav Freund, Robert E. Schapire. "A Decision-Theoretic Generalization of on-Line Learning and an Application to Boosting", 1995.
[8] Paul A. Viola, Michael J. Jones, "Robust Real-Time Face Detection", ICCV 2001, Vol. 2, pp. 747.