For all your fancy-pants statistical needs.

Praise for The Basketball Distribution:

"...confusing." - CBS
"...quite the pun master." - ESPN

Nathan's Normal Distribution Formula

Here's the gist of my formula:

I take the Points Per Possession scored by and scored against any given team (adjusted for difficulty of competition, as published at www.kenpom.com). According to these stats, every team is predicted to beat (or lose to) each other team by a specific Points Per Possession margin. For every game a team has played, I take the ACTUAL Points Per Possession margin and subtract the predicted margin from it. I then adjust this data according to the recency of the game (more recent games get more weight) and how terrible the opponent was (if you play a cupcake, there's no reason to beat them by 40 when you can beat them by 20, and sometimes you beat a team by 40 out of their frustration when on average you would probably beat them by 20).
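As a rough sketch of that step, here is one way the per-game value could be computed. The function name, the two weight parameters, and the choice to multiply the weights in are all assumptions made for illustration; the post does not say exactly how the recency and opponent adjustments are applied, and the sample numbers are made up.

```python
def adjusted_residual(actual_margin, predicted_margin,
                      recency_weight=1.0, opponent_weight=1.0):
    """Actual minus predicted margin (per 100 possessions), scaled by weights.

    recency_weight and opponent_weight are hypothetical knobs: the post only
    says recent games count more and blowouts of weak teams count less.
    """
    raw = actual_margin - predicted_margin
    return raw * recency_weight * opponent_weight


# Made-up example: predicted to win by 20 per 100 possessions, actually won by 30.
unweighted = adjusted_residual(30.0, 20.0)
# Same game, but discounted by half because the opponent was a cupcake.
discounted = adjusted_residual(30.0, 20.0, opponent_weight=0.5)
print(unweighted, discounted)
```

A positive value means the team did better than predicted, a negative value means worse.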

This gives us a list of values. For example, UNC's list of values (per 100 possessions) is:

Pennsylvania : -11.51
Kentucky : 7.82
UC Santa Barbara : -0.44
Oregon : 3.75
Notre Dame : 5.95
NC Asheville : 15.92
Michigan St. : 34.24
Oral Roberts : -10.4
Evansville : -4.63
Valparaiso : -5.77
Rutgers : -3.93
Nevada : 6.8
Boston College : -30.6



I can assume a normal distribution for this data (although less safely than for the unadjusted data, since I have already adjusted these values) thanks to the Central Limit Theorem: whether a team does much worse or much better than predicted is more up to random chance than to any of the data used here (i.e. there are millions of factors we could use to predict a coin toss, but the toss still comes up heads only about one out of every 2 times... and it would be really hard to find those factors!).

Since this data falls under the normal distribution, we can apply probabilities to it!
The standard deviation of this data is 15.35, and the average is .28 (the average would be an accurate representation of how much I think kenpom.com under- or overestimates a team, but right now it also reflects the fact that I don't know how he adjusts his data for recency). Using the spreadsheet function NORMDIST, we can then estimate the percentage of samples in which a team predicted to win would lose, and vice versa.
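For anyone who wants to check the spread without a spreadsheet, here is a minimal sketch using Python's standard library on UNC's list above. The variable names are mine. The sample standard deviation of the list as printed comes out at about 15.35; the plain unweighted mean of the printed values lands nearer 0.55 than .28, so the quoted average presumably reflects weighting or rounding that isn't shown in the list.

```python
from statistics import mean, stdev

# UNC's values per 100 possessions, copied from the list above.
unc_values = {
    "Pennsylvania": -11.51,
    "Kentucky": 7.82,
    "UC Santa Barbara": -0.44,
    "Oregon": 3.75,
    "Notre Dame": 5.95,
    "NC Asheville": 15.92,
    "Michigan St.": 34.24,
    "Oral Roberts": -10.4,
    "Evansville": -4.63,
    "Valparaiso": -5.77,
    "Rutgers": -3.93,
    "Nevada": 6.8,
    "Boston College": -30.6,
}

values = list(unc_values.values())
unc_sd = stdev(values)    # sample standard deviation (n - 1 in the denominator)
unc_mean = mean(values)   # plain, unweighted average
print(f"sd = {unc_sd:.2f}, mean = {unc_mean:.2f}")
```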

For example, if the Boston College-UNC game were to be played again (at UNC), the following would result:

North Carolina would be predicted to win by an efficiency margin of 22.27 points for every 100 possessions. We must split this between both teams (to find the share of samples in which BOTH teams could possibly do well enough / badly enough to change the predicted outcome). So that means we need to find the probability that Boston College does well enough, and that Carolina does badly enough.

By using the NORMDIST function, we find that in only 5% of all of Boston College's samples (adjusted for their consistency, as we require the standard deviation for this calculation) could they do well enough to make up the 11.14 points per 100 possessions that they would have to gain. However, in 23% of all samples, North Carolina could do badly enough to cough up the extra 11.14 points per 100 possessions they would need to lose. Each team's samples add up to 100%, so out of the TOTAL number of samples (200%), we get the following result:

Boston College has a 13.74% chance of winning, and North Carolina has an 86.26% chance of winning.
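The arithmetic above can be sketched in Python, with `statistics.NormalDist` standing in for the spreadsheet's cumulative NORMDIST. This is a sketch under assumptions: the function and variable names are mine, I am assuming the cumulative form of NORMDIST was used, and Boston College's own average and standard deviation aren't given here, so their 5% is plugged in as a rounded constant. That rounding is why the result lands near 14% rather than exactly 13.74%.

```python
from statistics import NormalDist

UNC_MEAN = 0.28           # UNC's average, from the post
UNC_SD = 15.35            # UNC's standard deviation, from the post
PREDICTED_MARGIN = 22.27  # UNC's predicted margin per 100 possessions

# Split the margin evenly: each team has to account for half of it.
half_margin = PREDICTED_MARGIN / 2

# Chance that UNC plays at least half_margin worse than predicted.
p_unc_collapse = NormalDist(mu=UNC_MEAN, sigma=UNC_SD).cdf(-half_margin)

# Chance that Boston College plays at least half_margin better than predicted.
# Rounded figure quoted in the post; BC's mean and sd are not given.
P_BC_SURGE = 0.05


def upset_probability(p_underdog_surge, p_favorite_collapse):
    """Average the two probabilities, i.e. their sum out of 200%."""
    return (p_underdog_surge + p_favorite_collapse) / 2


p_bc_win = upset_probability(P_BC_SURGE, p_unc_collapse)
print(f"UNC collapse: {p_unc_collapse:.1%}")
print(f"BC win: {p_bc_win:.1%}, UNC win: {1 - p_bc_win:.1%}")
```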

Now, it isn't right to apply this data to the UNC game everyone witnessed a few days ago, for one reason only: UNC could cough up that many points per 100 possessions because RECENTLY they have been the most inconsistent they've been all season, namely against Boston College!


But this formula can be applied to any team, and it helps us understand when an underdog isn't really that much of an underdog.
