Chemistry Reactions among NBA Players

Hao Xianan, Weijia Mao, Ruixuan Hou

Overview

Basketball is a team sport. Although dozens of statistics evaluate the performance of an individual player, such as field goal percentage, average rebounds, and average steals, few measures evaluate how well players cooperate as a team. The goal of our project is therefore to find a way to evaluate the teamwork of players. With such a model, we could help an NBA manager choose the best combination of players and acquire the player who best fits the current team.

Methods

We state the problem as the following model, assuming a team of 10 players. X is the n-dimensional feature vector of each player; it contains n significant features that properly evaluate that player. Z is a 10-dimensional vector in which the value 1 means the corresponding player is on court and the value 0 means he is not. We define a function S(Z) that combines each on-court player's individual performance score with a score for his performance together with each of the other players on court. Summing these measures gives the score of a particular 5-player combination. The problem then becomes how to learn this function and how to choose the combination that maximizes the score.
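As a rough illustration of the selection step, the sketch below assumes S(Z) is the sum of an individual score for every on-court player plus a pairwise score for every pair of on-court players. The names lineup_score, best_lineup, individual, and pairwise are hypothetical, and the scores are taken as given rather than learned from the feature vectors X. With 10 players there are only 252 possible 5-player combinations, so an exhaustive search is enough.

```python
from itertools import combinations


def lineup_score(lineup, individual, pairwise):
    """Score one lineup: individual terms plus pairwise terms.

    individual: dict mapping player -> individual score (assumed given).
    pairwise:   dict mapping frozenset({a, b}) -> pair score (assumed given);
                pairs that are missing contribute 0.
    """
    solo = sum(individual[p] for p in lineup)
    pairs = sum(
        pairwise.get(frozenset((a, b)), 0.0)
        for a, b in combinations(lineup, 2)
    )
    return solo + pairs


def best_lineup(players, individual, pairwise, size=5):
    """Exhaustively search all lineups of the given size for the top score."""
    return max(
        combinations(players, size),
        key=lambda lineup: lineup_score(lineup, individual, pairwise),
    )
```

A large pairwise term can pull two individually weak players into the best lineup, which is the kind of effect the model is meant to capture.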

The problem can also be stated as learning an optimal subset of a ground set. We will use the method for learning optimal subsets [1] to construct a function that maximizes the similarity between the predicted player combination and the actual optimal combination. (In our training set, the optimal combination is labeled.)
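The proposal does not say how similarity between the predicted and the labeled combination is measured. As one possible choice, and purely as an assumption (it is not taken from [1]), the sketch below uses the Jaccard similarity between the two player sets and defines a loss as one minus that similarity. The function names are hypothetical.

```python
def jaccard_similarity(predicted, actual):
    """Size of the intersection divided by size of the union of two sets."""
    predicted, actual = set(predicted), set(actual)
    union = predicted | actual
    if not union:
        # Two empty sets are treated as identical.
        return 1.0
    return len(predicted & actual) / len(union)


def set_loss(predicted, actual):
    """Loss that is 0 for identical sets and 1 for disjoint sets."""
    return 1.0 - jaccard_similarity(predicted, actual)
```

Two 5-player lineups that share four players have six distinct players in total, so their similarity under this measure is 4/6.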

We also plan to use the metric-learning-to-rank method proposed in [2], which optimizes rankings of data, to help us address the problem described above.

Data Set

Detailed player statistics are available on NBA.com [3]. In addition, enthusiasts have published detailed lists of every matchup of one unit against another. From these lists we can determine which set of players was on court during a particular time span and the points and rebounds they recorded. This data set lets us know the true performance of each combination of players. Matchup data for seasons 2005/2006 through 2011/2012 can be downloaded from http://basketballvalue.com/downloads.php [4].

To facilitate our analysis, we plan to import the data into a database. We can then retrieve the statistics we need with a simple query.
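A minimal sketch of such a query, using Python's built-in sqlite3 module, might look like the following. The table name stints, its columns, and the three sample rows are hypothetical and made up for illustration; they do not reflect the actual layout of the downloaded matchup files.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical stints table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE stints (
               lineup TEXT NOT NULL,          -- key naming the five players
               seconds INTEGER NOT NULL,      -- time the unit spent on court
               points_for INTEGER NOT NULL,
               points_against INTEGER NOT NULL
           )"""
    )
    # Made-up rows, for illustration only.
    rows = [
        ("A-B-C-D-E", 300, 12, 8),
        ("A-B-C-D-E", 200, 6, 7),
        ("A-B-C-D-F", 400, 9, 9),
    ]
    conn.executemany("INSERT INTO stints VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def lineup_totals(conn):
    """Total court time and net points per lineup, best net first."""
    query = """SELECT lineup,
                      SUM(seconds),
                      SUM(points_for) - SUM(points_against) AS net
               FROM stints
               GROUP BY lineup
               ORDER BY net DESC, lineup"""
    return conn.execute(query).fetchall()
```

Grouping by lineup collapses the separate time spans of the same five players into one row, which is the per-combination performance figure the models need.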

Timeline

Finish all data collection and processing by February 3.

Implement the optimal subset selection model and validate it by February 19 (Milestone).

Finish the second model by February 28.

Finish the validation and testing by March 7.

References

[1] Y. Guo, C. Gomes, Learning Optimal Subsets with Implicit User Preferences, Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI), 2009

[2] B. McFee, G. Lanckriet, Metric Learning to Rank, International Conference on Machine Learning (ICML), 2010

[3] NBA official website: www.NBA.com

[4] http://basketballvalue.com/downloads.php