The other day, while searching for this year’s NBA 2K ratings, I came across this article by Adam Yudelman at Nylon Calculus, a basketball analytics blog. He used player-level statistics from one season to forecast player ratings in the following year’s game (for example, using 2012-2013 stats to forecast 2K14 ratings). Looking at the technical details of his forecasting model, I noticed two things: he was using a large number of player-level variables, and he was producing his forecasts with an unrestricted regression (simple OLS). However, there is considerable uncertainty about which variables actually help predict future ratings, since we don’t know which stats the developers look at when setting them. For example, the developers at 2K Sports might look at points per game (PPG), points per 36 minutes (PP36), points per 100 possessions (PP100), or all three. Given this model uncertainty, I thought I might be able to improve on Adam’s forecasts using a technique called Bayesian Model Averaging (BMA), which I have used in my economics research.
The idea behind BMA is relatively simple. Since we don’t know which variables belong in the model, we average the results from many models, each of which contains a different subset of the candidate variables. The average is weighted, so models that fit the data better receive more weight. Each model’s weight is its posterior probability, which is proportional to its marginal likelihood times its prior probability. The marginal likelihood plays a role similar to the AIC or BIC in a traditional OLS framework: it rewards fit while penalizing extra parameters.
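For reference, here is the standard textbook form of BMA. The post does not state the exact priors used for the 2K forecasts, so treat this as a generic statement rather than the precise setup. With candidate models $M_1, \dots, M_K$ and data $y$, each model’s weight is its posterior probability, and the BMA forecast of a new rating $\tilde{y}$ is the weighted average of the individual models’ forecasts:

$$
P(M_k \mid y) = \frac{p(y \mid M_k)\, P(M_k)}{\sum_{j=1}^{K} p(y \mid M_j)\, P(M_j)},
\qquad
\mathrm{E}[\tilde{y} \mid y] = \sum_{k=1}^{K} P(M_k \mid y)\, \mathrm{E}[\tilde{y} \mid y, M_k].
$$

A common large-sample shortcut treats the marginal likelihood as approximately proportional to $\exp(-\tfrac{1}{2}\mathrm{BIC}_k)$, which is where the connection to the BIC comes from.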
To see why BMA can improve out-of-sample forecasts, consider the following example. Suppose the developers look only at PPG; that is, they ignore PP36 and PP100 when setting ratings, but the person trying to predict the ratings doesn’t know that. This person runs OLS with all three variables included. The estimated coefficients on PP36 and PP100 will likely be close to, but not exactly, zero. Now imagine this person adding dozens of similar variables to the regression, none of which the developers actually pay attention to. They would now have dozens of small but nonzero coefficients, and because there are so many of them, together they could add a substantial amount of “noise” to the forecasts.
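To make the “noise” argument concrete, here is a small simulation on made-up data. The data-generating process, the coefficient values, the sample sizes, and the function name `simulate_oos_rmse` are all hypothetical and have nothing to do with Adam’s data. The rating depends only on a PPG stand-in, and the sketch compares the out-of-sample RMSE of OLS with that single predictor against OLS that also includes 40 irrelevant predictors, both fit on only 60 training players.

```python
import numpy as np

def simulate_oos_rmse(n_train=60, n_test=2000, n_noise=40, seed=0):
    """Out-of-sample RMSE of OLS using only the relevant predictor vs. OLS that
    also includes n_noise predictors with no effect on the outcome (hypothetical data)."""
    rng = np.random.default_rng(seed)
    n = n_train + n_test
    ppg = rng.normal(size=n)                    # stand-in for the one stat the developers use
    ignored = rng.normal(size=(n, n_noise))     # stand-ins for stats the developers ignore
    rating = 75 + 3.0 * ppg + rng.normal(scale=2.0, size=n)

    def oos_rmse(features):
        design = np.column_stack([np.ones(n), features])   # prepend an intercept column
        beta, *_ = np.linalg.lstsq(design[:n_train], rating[:n_train], rcond=None)
        errors = rating[n_train:] - design[n_train:] @ beta  # errors on held-out players
        return float(np.sqrt(np.mean(errors ** 2)))

    return oos_rmse(ppg), oos_rmse(np.column_stack([ppg, ignored]))

small, full = simulate_oos_rmse()
print(f"OLS, PPG only:          {small:.2f}")
print(f"OLS, PPG + 40 ignored:  {full:.2f}")
```

With this many irrelevant regressors relative to the training sample, the second RMSE should come out well above the first, even though each extra coefficient is individually “close to zero.”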
BMA deals with this problem by weighting the models probabilistically. The model that includes all of the variables may fit no better than the model that uses only PPG. Because the Bayesian framework has a built-in penalty for including extra variables, the PPG-only model would receive much more weight than the full model. When you average across the two, the coefficients on all the other variables are pulled further toward zero: they are exactly zero in the PPG-only model, which carries more than half of the total weight. As a result, the “noise” that comes from including all of the extra variables in OLS is dampened under BMA.
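The weighting-and-shrinking mechanism can be sketched for a handful of variables by enumerating every subset. This sketch swaps in the BIC approximation to the marginal likelihood with a uniform prior over models; that is an assumption for illustration, not necessarily the priors used for the 2K forecasts. The data and the helper names `ols_bic` and `bma_bic` are hypothetical. Excluded variables get a coefficient of exactly zero in each model, so the weighted average pulls irrelevant coefficients toward zero.

```python
import itertools
import numpy as np

def ols_bic(X, y):
    """Fit OLS and return (BIC, coefficients). X must already contain an intercept column."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    # Gaussian-likelihood BIC, up to an additive constant shared by all models
    return n * np.log(rss / n) + k * np.log(n), beta

def bma_bic(X, y):
    """Exhaustive BMA over all 2**p subsets of the columns of X.

    Each model gets weight proportional to exp(-BIC / 2) (uniform prior over
    models). Returns the BMA coefficients, each variable's posterior inclusion
    probability, and the model weights.
    """
    n, p = X.shape
    masks = np.array(list(itertools.product([False, True], repeat=p)))
    bics = np.empty(len(masks))
    coefs = np.zeros((len(masks), p))   # excluded variables keep a coefficient of exactly 0
    for m, mask in enumerate(masks):
        cols = np.flatnonzero(mask)
        design = np.column_stack([np.ones(n), X[:, cols]])
        bics[m], beta = ols_bic(design, y)
        coefs[m, cols] = beta[1:]       # beta[0] is the intercept
    weights = np.exp(-0.5 * (bics - bics.min()))  # subtract the minimum for numerical stability
    weights /= weights.sum()
    bma_coef = weights @ coefs
    inclusion_prob = weights @ masks.astype(float)
    return bma_coef, inclusion_prob, weights

# Hypothetical data: column 0 plays the role of PPG (it drives the rating);
# columns 1-3 are stats the "developers" ignore.
rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 4))
y = 75 + 3.0 * X[:, 0] + rng.normal(scale=2.0, size=n)

coef, incl, w = bma_bic(X, y)
print("BMA coefficients:        ", np.round(coef, 2))
print("inclusion probabilities: ", np.round(incl, 2))
```

Full enumeration is only practical for small numbers of variables; the design choice here is clarity over scale.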
When I implemented BMA for the 2K ratings, I used three sets of variables. First, I used the same variables that Adam did; I call this model BMA. Next, I used all of the variables that Adam had collected; I call this model BMA Full. Finally, I used all of the collected variables along with the previous year’s 2K rating; I call this model BMA Full with Lag. The number of candidate variables is large in all three models: 27 in the first, 55 in the second, and 56 in the third.
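A quick bit of arithmetic shows why those variable counts matter: with k candidate variables, each one is either in or out of a model, so there are 2^k possible regressions. Spaces this large cannot be enumerated, so BMA is usually run with a sampler that visits the high-probability models (for example, Markov chain Monte Carlo model composition) rather than by visiting every model; the post does not say which sampler was used. The helper `model_space_size` is hypothetical.

```python
def model_space_size(k: int) -> int:
    """Number of distinct regressions when each of k candidate variables is included or excluded."""
    return 2 ** k

for label, k in [("BMA", 27), ("BMA Full", 55), ("BMA Full with Lag", 56)]:
    print(f"{label}: {k} variables -> {model_space_size(k):,} candidate models")

# prints:
# BMA: 27 variables -> 134,217,728 candidate models
# BMA Full: 55 variables -> 36,028,797,018,963,968 candidate models
# BMA Full with Lag: 56 variables -> 72,057,594,037,927,936 candidate models
```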
| Model | RMSFE | MAFE |
|---|---|---|
| BMA Full with Lag | 3.41 | 2.55 |
| OLS Full with Lag | 3.71 | 2.78 |
I found that performing BMA with the same variables that Adam included improved forecasting accuracy by roughly 5%, reducing the root mean squared forecast error (RMSFE) from 4.68 to 4.47 and the mean absolute forecast error (MAFE) from 3.75 to 3.54. (You can think of the RMSFE as roughly the standard deviation of the forecast errors, actual minus predicted, when those errors average out to about zero.) While the gain is real, it is quantitatively small. In the BMA Full model, I found gains of similar size, with the RMSFE falling from 4.62 under OLS to 4.37 under BMA. Finally, including the previous year’s 2K rating improved the performance of both approaches: the RMSFE fell to 3.71 with OLS, and further to 3.41 with BMA. There are two graphical illustrations below.
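The two error metrics and the relative gains quoted in the paragraph above can be computed as follows. The function names are hypothetical helpers, not code from the original analysis. One nuance worth seeing in code: the RMSFE equals the standard deviation of the errors only when the errors average to zero, so a forecaster who is always exactly 2 points low has an error standard deviation of 0 but an RMSFE of 2.

```python
import numpy as np

def rmsfe(actual, predicted):
    """Root mean squared forecast error: square root of the mean of (actual - predicted)**2."""
    errors = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean(errors ** 2)))

def mafe(actual, predicted):
    """Mean absolute forecast error: mean of |actual - predicted|."""
    errors = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(errors)))

def pct_improvement(old, new):
    """Percentage reduction in an error metric when moving from old to new."""
    return 100.0 * (old - new) / old

# Relative gains implied by the OLS -> BMA error figures quoted in the text
for label, old, new in [
    ("RMSFE, Adam's variables", 4.68, 4.47),
    ("MAFE, Adam's variables", 3.75, 3.54),
    ("RMSFE, full variable set", 4.62, 4.37),
    ("RMSFE, full set with lag", 3.71, 3.41),
]:
    print(f"{label}: {pct_improvement(old, new):.1f}% lower under BMA")
```

The loop prints reductions of 4.5%, 5.6%, 5.4% and 8.1%, so the lagged model shows the largest relative gain from switching to BMA.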
The top image is a scatter plot of forecasted versus actual ratings. If we predicted with perfect accuracy, all of the points would lie on the 45-degree line. Forecasting accuracy improves when we use the model that includes the lag, since the points cluster more tightly around the 45-degree line.
The bottom image shows the posterior distributions of Nicolas Batum’s 2K15 rating from all three BMA models. In his case, the forecast improves as we move from BMA to BMA Full to BMA Full with Lag. Under the BMA model, the posterior forecast is wide and centered around 81 (his actual rating, marked by the black bar, was 79). In the BMA Full model, the mean of the distribution shifts to the left, and the peak sits on top of his actual rating. However, this forecast is still fairly wide, with a 95% credible interval of about 71-87. In the BMA Full with Lag model, the distribution remains centered around his true rating, and the credible interval shrinks to 74-84, indicating that we are now estimating his rating more precisely.
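A BMA posterior predictive distribution is a mixture of the individual models’ predictive distributions, and a 95% credible interval can be read off its 2.5th and 97.5th percentiles. The sketch below approximates each model’s predictive as a normal distribution, which is a simplifying assumption (exact conjugate results are typically Student-t). The weights, means, and standard deviations are hypothetical numbers, not taken from the Batum forecasts, and the function names are illustrative.

```python
import numpy as np

def bma_predictive_draws(weights, means, sds, n_draws=100_000, seed=0):
    """Sample a BMA predictive approximated as a normal mixture: pick model k
    with probability weights[k], then draw from Normal(means[k], sds[k])."""
    rng = np.random.default_rng(seed)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    means = np.asarray(means, dtype=float)
    sds = np.asarray(sds, dtype=float)
    k = rng.choice(len(weights), size=n_draws, p=weights)  # model index for each draw
    return rng.normal(means[k], sds[k])

def credible_interval(draws, level=0.95):
    """Equal-tailed credible interval from posterior draws."""
    tail = 100.0 * (1.0 - level) / 2.0
    lower, upper = np.percentile(draws, [tail, 100.0 - tail])
    return float(lower), float(upper)

# Hypothetical per-model forecasts for a single player
draws = bma_predictive_draws(weights=[0.6, 0.3, 0.1], means=[79.0, 80.0, 77.0], sds=[2.5, 2.5, 3.0])
lo, hi = credible_interval(draws)
print(f"posterior mean {draws.mean():.1f}, 95% credible interval {lo:.1f} to {hi:.1f}")
```

Tighter per-model forecasts, such as those a lagged rating might provide, translate directly into a narrower interval.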
In this post I showed off the power of BMA in the context of forecasting NBA 2K ratings. Specifically, when there is uncertainty about which variables to include in a regression model, taking a weighted average across the possible models can improve out-of-sample forecasting accuracy. Although this application is just for fun, the same logic carries over to more serious settings, such as identifying correlates of economic growth or forecasting recessions.
Finally, I would like to thank Adam Yudelman again for sharing his data with me.