For all your fancy-pants statistical needs.

Praise for The Basketball Distribution:

"...confusing." - CBS
"...quite the pun master." - ESPN

Fixing the current models...

Ken Pomeroy's ratings and the LRMC (Logistic Regression/Markov Chain) model for predicting NCAA winners are both incredibly accurate, but each has its own flaws -- for similar reasons.

Ken Pomeroy's measure eliminates pace, but this takes away a good portion of basketball's psychological strategy of gaining a large lead. For example, a team that is up by 10 points at halftime could have gotten there either by being very efficient and slow (which Kenpom would measure as good) or by being less efficient but very fast (which Kenpom does not measure as a good thing). While it is not GOOD to be inefficient, the ten-point barrier is still in the way for the losing team, despite the inefficiency. The comeback is still required, and it is a psychological barrier.
Furthermore, Ken Pomeroy (using Bill James' pythagorean expectation formula instead of Dean Oliver's formula, which includes standard deviation) asserts that the better head-to-head team should have the higher winning percentage. This is a false assumption even when all anomalies are taken care of. The fact of the matter is that whichever team is more efficient is more likely to win a given game; but the slower a team plays, the more likely it is to lose to a worse team (the smaller the point barrier to overcome, the more likely that the worse team can overcome it). This is indirectly mentioned by Dean Oliver in Basketball on Paper in his section on standard deviation, variance, and covariance.
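To make the pace argument concrete, here is a rough sketch of why fewer possessions help the worse team. It assumes (my assumption, not something taken from Kenpom or Dean Oliver) that each possession's point differential is independent with a fixed average edge and a fixed standard deviation, so the game margin is roughly normal. The function name and the numbers 0.05 and 1.4 are made up for illustration only.

```python
import math


def normal_cdf(z):
    """Standard normal CDF (the same quantity Excel's NORMSDIST returns)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def favorite_win_prob(edge_per_poss, sd_per_poss, possessions):
    """Chance the more efficient team wins, under the independence assumption.

    edge_per_poss: average point differential per possession (hypothetical)
    sd_per_poss:   standard deviation of that differential (hypothetical)
    possessions:   number of possessions in the game (the pace)
    """
    if sd_per_poss <= 0 or possessions <= 0:
        raise ValueError("sd_per_poss and possessions must be positive")
    mean_margin = edge_per_poss * possessions           # grows with pace
    sd_margin = sd_per_poss * math.sqrt(possessions)    # grows only with sqrt(pace)
    return normal_cdf(mean_margin / sd_margin)


# Same efficiency edge at three different paces (illustrative numbers only).
for pace in (55, 65, 75):
    print(pace, round(favorite_win_prob(0.05, 1.4, pace), 3))
```

Under these assumptions the better team's win chance rises with pace, because the expected margin grows faster than its spread. That is the same point as above: a slow game leaves a smaller point barrier for the worse team to overcome.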
The LRMC, on the other hand, only uses point margin in its calculations. While this works well with the idea of a 'point barrier,' it might not work as well head-to-head since, as we know, greater efficiency will always lead to more points, even if the margin is small. Furthermore, slow teams might also be ranked low by the LRMC, which would ignore the higher percentage of wins that slow teams get against better teams.

Thirdly -- neither of these formulas takes consistency into account.

So to make the best model, we must use one that includes consistency and combines the idea of a 'point barrier' with efficiency. It must also produce two rankings: head-to-head and overall win%.


I finally bought Dean Oliver's book, "Basketball on Paper" -- and I think I am onto a breakthrough!

Dean Oliver actually uses a statistical formula based on standard deviation to estimate the percent chance that a team will win a game! Unfortunately, Dean has not done nearly as much as Ken Pomeroy in terms of figuring out how to adjust statistics for quality of opponents. So currently I am working on a system that turns Kenpom's predictions into a chance of winning.

Dean Oliver's formula is as follows:

Win% = NormsDist(Point Margin / Standard Deviation of Point Margin)
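For anyone not working in a spreadsheet, here is a small sketch of that formula in Python. NormsDist is the spreadsheet name for the standard normal cumulative distribution, which can be written with math.erf. The function names are my own, and the sample numbers are only there to show the shape of the result.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, equivalent to the spreadsheet NORMSDIST(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def win_pct(point_margin, sd_point_margin):
    """Win% = NormsDist(point margin / standard deviation of point margin)."""
    if sd_point_margin <= 0:
        raise ValueError("standard deviation must be positive")
    return normal_cdf(point_margin / sd_point_margin)


# A team whose average margin equals one standard deviation (made-up numbers).
print(round(win_pct(10.0, 10.0), 4))
```

A margin of zero gives exactly 50%, and the same average margin is worth less to a team with a larger standard deviation, which is how consistency enters the picture.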

So perhaps if I used Kenpom's predictions --

Chance of Win% = NormsDist(Predicted Point Margin / Standard Deviation of Actual Minus Predicted Point Margins of both Teams)
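Here is one way that calculation could be sketched. It reads "of both Teams" as pooling each team's (actual minus predicted) margins into a single list and taking the sample standard deviation. That pooling is my interpretation, and other readings are possible. The residual lists are invented placeholders, not real Kenpom data.

```python
import math
import statistics


def normal_cdf(z):
    """Standard normal CDF, equivalent to the spreadsheet NORMSDIST(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def chance_of_win(predicted_margin, residuals_a, residuals_b):
    """Chance of win from a predicted margin and both teams' prediction errors.

    residuals_a / residuals_b: per-game (actual margin - predicted margin)
    for each team. Pooling them is an assumption made for this sketch.
    """
    pooled = list(residuals_a) + list(residuals_b)
    if len(pooled) < 2:
        raise ValueError("need at least two residuals to compute a deviation")
    sd = statistics.stdev(pooled)  # sample standard deviation
    if sd == 0:
        raise ValueError("residuals have no spread")
    return normal_cdf(predicted_margin / sd)


# Hypothetical residuals for two teams and a hypothetical 6-point prediction.
team_a = [3.0, -5.0, 8.0, -2.0, 1.0]
team_b = [-7.0, 4.0, 10.0, -3.0, -1.0]
print(round(chance_of_win(6.0, team_a, team_b), 3))
```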

This gives us a good number (for example, 2008-2009 North Carolina posts an on-season 96.5%, whereas the Log5 formula puts them at 97.7%) -- but I am still unable to adjust for opponents' inconsistency. Oh well -- it's definitely a step in the right direction!
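For comparison, the Log5 formula mentioned above (Bill James' head-to-head formula) can be sketched as follows. It takes each team's winning percentage and returns the chance that the first team beats the second. The function name and the sample inputs are mine and are not the North Carolina figures.

```python
def log5(win_pct_a, win_pct_b):
    """Bill James' Log5: chance that team A beats team B.

    P = (a - a*b) / (a + b - 2*a*b), where a and b are winning percentages
    expressed as fractions between 0 and 1.
    """
    denom = win_pct_a + win_pct_b - 2.0 * win_pct_a * win_pct_b
    if denom == 0:
        raise ValueError("undefined when both teams are at 0 or both at 1")
    return (win_pct_a - win_pct_a * win_pct_b) / denom


# Against a .500 opponent, Log5 simply returns the team's own winning percentage.
print(round(log5(0.9, 0.5), 3))
```

Note that Log5 uses only the two winning percentages, so unlike the standard deviation approach it has no place to put a team's consistency.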

