Friday, July 17, 2020

Predicting NBA Team Scoring Rates with Splits

Using splits when analyzing sports is, generally, something that should be done with caution. Entire seasons' worth of player and team performance data are subject to noise, so breaking that data into specific splits and drawing conclusions from them leads to even noisier results.

With that being said, I am going to see if I can use team scoring rate splits to better predict next year's overall scoring efficiency. The impetus behind this idea was that certain splits might better isolate how effective an NBA offense is at scoring efficiently. For example, one of the splits I looked at was scoring efficiency on possessions that started with a dead ball. The theory is that looking at just these possessions could give more signal about future offensive efficiency, because this split does not depend on the opponent missing a shot or on where the rebound lands after the miss. The catch is that, in general, teams have only about one-fourth to one-third as many possessions following a dead ball as possessions that start after an opponent miss.
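A rough way to see why the smaller dead-ball sample matters: if each possession is treated as an independent draw with the same spread of points, the standard error of a points-per-100-possessions rate shrinks with the square root of the number of possessions. The sketch below assumes that simplification, and the per-possession spread (`SD_PER_POSSESSION`) and the possession counts are hypothetical values chosen for illustration, not figures from this study.

```python
import math


def se_per_100(sd_points_per_possession: float, possessions: int) -> float:
    """Standard error of a points-per-100-possessions rate, assuming
    independent possessions that share a common standard deviation."""
    return 100.0 * sd_points_per_possession / math.sqrt(possessions)


# Hypothetical inputs for illustration only; these are not measured values.
SD_PER_POSSESSION = 1.1  # assumed spread of points scored on one possession
AFTER_MISS = 3000        # assumed possessions after opponent misses in a season
DEAD_BALL_SHARES = (1 / 4, 1 / 3)  # dead-ball possessions relative to after-miss

se_miss = se_per_100(SD_PER_POSSESSION, AFTER_MISS)
for share in DEAD_BALL_SHARES:
    dead_ball = round(AFTER_MISS * share)
    se_dead = se_per_100(SD_PER_POSSESSION, dead_ball)
    print(f"{dead_ball} dead-ball possessions: SE {se_dead:.2f} "
          f"vs {se_miss:.2f} after misses (ratio {se_dead / se_miss:.2f})")
```

Whatever spread is assumed, the ratio depends only on the possession counts: a split with one-fourth the possessions has twice the standard error, and one with one-third has about 1.73 times.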

I looked at how all-situations scoring efficiency in year N can be predicted by scoring efficiency in all situations, after an opponent miss, after a dead ball, and after a steal in year N-1. I included scoring efficiency after steals to show that offensive value generated off steals is not something teams should count on year over year, since it is highly dependent on the opponent and transition efficiency in general is noisy. I built six simple linear models to predict year N all-situations offensive efficiency (measured in points per 100 possessions). The first five use year N-1 data adjusted for minutes continuity between years N and N-1: scoring efficiency in all situations, scoring efficiency after opponent misses, scoring efficiency after dead balls, scoring efficiency after steals, and a model using all three of the splits. The sixth looks at the change in overall scoring efficiency using just minutes continuity between years N and N-1. The minutes continuity data can be found here.
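For readers who want to try this kind of comparison, here is a minimal sketch in Python with NumPy that fits each model by ordinary least squares and reports adjusted R squared, which penalizes a model for every extra predictor. It runs on synthetic team seasons, not the data behind this post, and it assumes that "adjusting for minutes continuity" means adding continuity plus a predictor-by-continuity interaction term; that is a hypothetical choice, and the helper names (`adjusted_r2`, `with_continuity`) are illustrative.

```python
import numpy as np


def adjusted_r2(X: np.ndarray, y: np.ndarray) -> float:
    """Fit ordinary least squares with an intercept and return adjusted R-squared."""
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    # Penalize extra predictors: 1 - (1 - R^2)(n - 1)/(n - p - 1)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Synthetic team seasons for illustration only; these are NOT the post's data.
rng = np.random.default_rng(0)
n = 300
continuity = rng.uniform(0.3, 0.9, n)         # hypothetical share of minutes returning
prior_all = rng.normal(110, 3, n)             # hypothetical year N-1 points per 100
prior_miss = prior_all + rng.normal(0, 2, n)  # split with extra noise
prior_dead = prior_all + rng.normal(0, 4, n)  # smaller sample, so noisier still
prior_steal = rng.normal(125, 8, n)           # assumed unrelated to underlying skill
next_all = 110 + continuity * (prior_all - 110) + rng.normal(0, 2.5, n)


def with_continuity(x: np.ndarray) -> np.ndarray:
    """Assumed continuity adjustment: centered predictor, continuity, and interaction."""
    xc = x - x.mean()
    return np.column_stack([xc, continuity, xc * continuity])


mixed = np.column_stack([prior_miss, prior_dead, prior_steal, continuity])
models = {
    "all situations": (with_continuity(prior_all), next_all),
    "after misses": (with_continuity(prior_miss), next_all),
    "after dead balls": (with_continuity(prior_dead), next_all),
    "after steals": (with_continuity(prior_steal), next_all),
    "mixed splits": (mixed, next_all),
    # Continuity alone, predicting the year-over-year change in efficiency
    "continuity only": (continuity.reshape(-1, 1), next_all - prior_all),
}

results = {name: adjusted_r2(X, y) for name, (X, y) in models.items()}
for name, value in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{name:>16}: adjusted R^2 = {value:.3f}")
```

With real data, the synthetic arrays would be replaced by one row per team season; the values this sketch prints reflect only how the fake data were generated.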

The following represents the adjusted R squared values for each linear model:
Of all the models, using just the prior year's overall scoring efficiency is the most predictive, followed by the mixed-split model and the model using efficiency after opponent misses. First, my assumption that performance after dead balls might give more signal about a team's "true" offensive efficiency does not seem to hold much weight: that model was barely more predictive than the model using scoring efficiency after steals. I would attribute this to the sample issue I alluded to above, since a given team season had only about one-fourth to one-third as many possessions after dead balls as after misses. Second, not breaking the data down into splits proves to be the best way to handicap a team's offensive efficiency in the following year.

Keep in mind that this is a very simple way to approach such handicapping; even after adjusting for minutes continuity, the R squared between all-situations scoring efficiency in years N and N-1 is still only 0.38. What is missing from this analysis is a more granular treatment of minutes continuity: the way I accounted for it considers neither the quality of the returning players nor changes in player ability due to aging. This shows up in the model using just roster continuity, which has an adjusted R squared of only 0.012. Incorporating player value and performance (through one or a blend of the many publicly available RAPM variants) and player aging should yield a better correlation.

What can be gleaned from this study? It reaffirms that using splits to predict team performance is not a worthwhile endeavor. Splits can be useful for describing past team and player performance, but their contributions to that past performance (whether positive or negative) are noise for the purposes of prediction and should be regressed heavily toward the mean.
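One simple way to regress a split toward the mean is a weighted average of the observed rate and the league average, where the weight on the observed rate grows with the number of possessions. This is a generic shrinkage sketch, not a method used in this post; `regress_to_mean`, the 110 league mean, and the `STABILIZER` constant of 2,000 possessions are hypothetical values for illustration.

```python
def regress_to_mean(observed: float, league_mean: float, possessions: int,
                    stabilizer: int) -> float:
    """Shrink an observed points-per-100 rate toward the league mean.

    The weight on the observed value is possessions / (possessions + stabilizer),
    so small samples are pulled harder toward the mean than large ones.
    `stabilizer` is a hypothetical tuning constant, not an estimate from the post.
    """
    weight = possessions / (possessions + stabilizer)
    return league_mean + weight * (observed - league_mean)


# Hypothetical example: the same +6 edge over an assumed 110 league mean,
# measured on a large split and on a small split.
LEAGUE_MEAN = 110.0
STABILIZER = 2000  # assumed; larger values mean heavier regression

print(regress_to_mean(116.0, LEAGUE_MEAN, 3000, STABILIZER))  # after-miss-sized sample
print(regress_to_mean(116.0, LEAGUE_MEAN, 1000, STABILIZER))  # dead-ball-sized sample
```

With these assumed numbers, the same +6 edge shrinks to +3.6 on 3,000 possessions but to +2.0 on 1,000 possessions, which is the sense in which small splits deserve heavier regression.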