For all your fancy-pants statistical needs.

Praise for The Basketball Distribution:

"...confusing." - CBS
"...quite the pun master." - ESPN

Heels vs. Badgers, Predicted Four-Factors

Hey, that title kinda rhymes.


I've been trying to create adjusted four-factors for quite a while now, and I've finally settled on a method that is somewhat sound. First, I adjusted each team's four-factors for strength of schedule. Then I adjusted for home-court advantage using a similar method. Here are my predicted results for UNC v Wisconsin.



Team    eFG%    TO%     OR%     FTR     Efficiency
UNC     48.2    17.4    31.8    39.5    101.9
WISC    49.7    19.1    33.4     9.9    100.4

All the margins here are very slight, except for Free-Throw-Rate, which seems about right considering the Heels' and Badgers' status quo.*

Caveat: If we adjust Free-Throw-Rate for FT%, UNC's efficiency would most certainly drop in this formula. The four factors explain 95-99% of a team's efficiency. The remainder mostly comes from OR% being overrated when eFG% is "artificially" boosted by 3-pointers (this is why FG% is a necessary evil...), and free throw percentage.
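
For anyone wondering how 3-pointers can "artificially" boost eFG%, here is a minimal sketch using the standard effective field goal definition (a made three counts as 1.5 makes). The function names and the shooting line are made up for illustration; none of this is the adjusted four-factors method itself.

```python
def fg_pct(fgm, fga):
    """Plain field goal percentage, on a 0-100 scale."""
    return 100.0 * fgm / fga


def efg_pct(fgm, three_pm, fga):
    """Effective FG%: each made three counts as 1.5 made shots."""
    return 100.0 * (fgm + 0.5 * three_pm) / fga


# Hypothetical shooting line: 10 makes on 25 attempts, 6 of the makes from three.
makes, threes, attempts = 10, 6, 25
print("FG% :", fg_pct(makes, attempts))
print("eFG%:", efg_pct(makes, threes, attempts))
```

With the same number of misses available to rebound, the two percentages tell different stories, which is roughly why plain FG% is still worth keeping around next to OR%.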




*-What is the plural of quo? "Quotient"? "Quos"?? I sure hope it's not "quos."

Introducing The Holy Grail II: Offense & Defense (featuring UNC v. UNLV)

Accurate +/- data thanks to Adrian Atkinson (@freeportkid on twitter), editor of the Tar Heel Tip-off.

In this year's College Basketball Prospectus, I introduced "The Holy Grail" - a very simple formula for estimating true player offensive impact per possession based on Offensive Rating and Possessions Used. Not only does this describe player production (R^2 value of .65 against "true" player production), but it only requires a couple of inputs.

Unfortunately, this does not give us the whole picture: it ignores defense, and it does not admit that "intangibles" captured by traditional plus-minus could be in effect. Plus-minus is especially useful when trying to analyze one game: the more causal/correlative statistics we have to offset any issues of sample size, the better**. Enter my new NCAA stat, which we will simply call "Efficiency Impact."

By using statistics to predict "true" player production, we can estimate a player's overall impact (offense + defense) rather than just offense. For an idea of the depth of this formula, these are the main parts of its tabulation:

  • Offensive Rating & Possessions Used
  • Defensive Rebound Percentage
  • Steal Percentage (Steals/Opponent Poss)
  • Block Percentage (Blocks / Opponent 2FGA)
  • Assist Rate (Assists / Team FGM)
  • Offensive & Defensive Efficiencies, both On & Off-Court
    and more...
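
The three rate stats in the list above can be sketched directly from the ratios given in parentheses. This is only a rough illustration: the function names and the sample counts are hypothetical, and I am assuming the opponent and team totals are counted only while the player is on the floor. The full Efficiency Impact formula is not reproduced here.

```python
def pct(numerator, denominator):
    """Ratio on a 0-100 scale; returns 0.0 when there were no opportunities."""
    return 100.0 * numerator / denominator if denominator else 0.0


def steal_pct(steals, opp_possessions):
    # Steals / Opponent Poss
    return pct(steals, opp_possessions)


def block_pct(blocks, opp_two_point_attempts):
    # Blocks / Opponent 2FGA
    return pct(blocks, opp_two_point_attempts)


def assist_rate(assists, team_field_goals_made):
    # Assists / Team FGM
    return pct(assists, team_field_goals_made)


# Hypothetical single-game line (assumed on-court totals, not real data).
print(steal_pct(2, 40), block_pct(1, 20), assist_rate(3, 12))
```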

READER'S NOTE: I have adjusted the +/- to be taken into account based on the number of possessions played; for NBA players, I estimated the amount of "noise" (inaccuracy) introduced based on low sample size, and adjusted accordingly. The same methodology is used here.

Here are the main inputs for UNC in the UNLV game:
(In the "adjusted" categories, positive is always good, negative is always bad).

Player               MIN%   AST%   STL%   BLK%   DR%    POS%   ORTG    adj offense (on)   adj defense (on)   adj offense (off)   adj defense (off)
Hairston, P.J.       35.0    0.0    0.0    7.3    0.0   18.0   204.9    1.6               -0.6                0.7                 0.1
McAdoo, James M.     45.0   17.2    8.4    0.0   15.8   16.8   114.9    1.4               -0.3                0.6                 0.2
Bullock, Reggie      47.5   10.7    2.6    0.0   15.0   15.9   110.9    0.9               -0.3                0.5                 0.2
Hubert, Desmond       2.5    0.0    0.0    0.0    0.0   50.6   151.5    0.0                0.0               -0.1                 0.2
Watts, Justin         5.0    0.0    0.0    0.0   47.6    0.0     0.0   -0.2               -0.1               -0.1                 0.2
Strickland, Dexter   72.5   13.0    1.7    0.0    6.5   15.7   122.0   -0.4               -0.7               -0.4                 0.0
Marshall, Kendall    77.5   42.7    0.0    0.0   12.2   14.6   106.8   -0.5               -0.6               -0.5                 0.1
Henson, John*        80.0    0.0    0.0    3.2   17.8   22.1    75.3   -0.2               -0.5               -0.3                 0.1
Zeller, Tyler*       60.0    0.0    0.0    0.0   27.7   14.7    63.9   -0.7               -0.8               -0.5                 0.0
Barnes, Harrison*    75.0    6.6    1.6    0.0    6.3   28.6    87.2   -1.5               -0.4               -1.2                 0.2

And here are the results:


Player               Statistical +/- per 100   Adj. +/- per 100   Efficiency Impact (per 100)   Efficiency Impact (Game)
Hairston, P.J.        18.81                      2.26                5.13                          2.09
McAdoo, James M.      12.40                      2.23                3.91                          2.05
Bullock, Reggie       -0.58                      1.45                0.92                          0.51
Hubert, Desmond       31.05                      0.18                5.67                          0.17
Watts, Justin          3.98                     -0.20                0.51                          0.03
Strickland, Dexter     2.00                     -1.73               -1.45                         -1.23
Marshall, Kendall      0.33                     -1.84               -1.87                         -1.69
Henson, John*         -7.06                     -0.89               -2.38                         -2.22
Zeller, Tyler*        -8.64                     -2.24               -3.73                         -2.61
Barnes, Harrison*     -3.84                     -3.36               -3.90                         -3.41

Hairston's insanely high offensive rating (204.9) on nearly 20% of UNC's possessions during his minutes, combined with UNC's overall improvement in efficiency, makes Hairston the Heels' leader for the game. The big 3 boys*, on the other hand, didn't even break 90 in terms of ORTG, and played for much of the game.

Desmond Hubert only really played in one possession, but grabbed an offensive board and made a free throw, thus the high usage/high ORTG and impact per 100.

I will continue posting these, especially for Carolina games as Adrian's +/- data is more reliable than StatSheet's, but I am willing to analyze more games.




* - I don't think anyone calls them that, but I just did.
** - In this prediction formula, I have largely canceled out statistics that covary heavily, leading to coefficients that have very low p-values. Together, these lead to a rating that is well-adjusted (for example, I have found time and time again that Offensive Rating overrates players' shooting efficiency, so this formula inserts a negative term against True Shooting Percentage). Using plus-minus data is similar: why should we trust a player's box-score rating if their team did considerably worse while they were on the floor?

Tidy Text: Top Teams' Toughness Tabulation, 11/28/2011


Here's a quick look at how the top-ten-Pomeroy teams are faring early in the season.

I took each team's wins, and adjusted them for strength of opponent, home-court-advantage, and most notably, diminishing returns (for example, winning by 40 then by 20 makes you look like a +23 team, rather than a +30 team*).


Tm                   Adjusted Win%
Wisconsin (2)        0.991
Ohio St. (3)         0.985
Kentucky (1)         0.977
Syracuse (5)         0.948
Alabama (10)         0.947
Florida (8)          0.941
Duke (6)             0.937
Louisville (7)       0.926
Missouri (9)         0.924
North Carolina (5)   0.921

The top 3 are playing like the top 3 (ish), Bama is playing quite well, and UNC has been lagging behind.




*This is a pretty simple excel calculation. Each game returns an adjusted efficiency margin by home-court-advantage/opponent strength, which I assume has a game-standard deviation of 16. I plug this number into the NormDist function in excel like so

=NormDist(x=Adj.Margin, mean=0, st.dev=16, cumulative=TRUE)

So when we average a 40-point win and a 20-point win (assuming a pace of 72 possessions), we get the following:

40-pt-win = .99974 win%
20-pt-win = .95873 win%

Average these two win%, and you get .97923.
The NormDist function regresses in a pretty intuitive way (theoretically, we could say that it estimates the team's "real" win%). And, intuitively, plugging this Win% back in reverse does NOT give us +30 points, but rather +23.5 points. And so on.
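
For the non-Excel crowd, here is a rough Python sketch of the same calculation using the standard library's NormalDist. One assumption is mine: to land on the win% figures quoted above, the point margin has to be converted to a per-100-possession margin (using the 72-possession pace) before it goes into the normal CDF. The constant and function names are just for illustration.

```python
from statistics import NormalDist

GAME_SD = 16.0  # assumed single-game standard deviation of adjusted margin
PACE = 72       # assumed possessions per game, as in the example above

game = NormalDist(mu=0.0, sigma=GAME_SD)


def per_100(points, pace=PACE):
    """Convert a point margin into a margin per 100 possessions."""
    return points * 100.0 / pace


def win_pct(points, pace=PACE):
    """Equivalent of NormDist(margin, 0, 16, TRUE) on the per-100 margin."""
    return game.cdf(per_100(points, pace))


def implied_margin(pct, pace=PACE):
    """Reverse step: turn a win% back into a point margin at this pace."""
    return game.inv_cdf(pct) * pace / 100.0


p40 = win_pct(40)
p20 = win_pct(20)
avg = (p40 + p20) / 2
print(round(p40, 5), round(p20, 5), round(avg, 5))
print(round(implied_margin(avg), 1))
```

If the per-100 assumption is right, the printed values should come out close to the .99974, .95873, and +23.5 quoted above, instead of the +30 you would get from averaging the two margins directly.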


The Media Audit 11/16/11: "Commodores," "Bruins," and other words one would rarely use outside discussing NCAA sport.

Stuck inside with a terrible cold, so we get a blog post!
Today we are going to look at the following three 'claims' by media or popular convention, and support or refute based on the hard data:

Claim #1: Coach K has won 903 games.

Okay, just kidding. I wish him only luck as a human being, and I wish him only ill in basketball. Moving on to the real...

Claim #1: Vanderbilt is a top-caliber team.
There are a few obvious reasons I bring this up.
  1. The Commodores were #7 in the preseason polls
  2. The Commodores were #9 in Ken Pomeroy's preseason rankings (buy the book!)
  3. The Commodores have dropped to #18 in the polls and #19 in the Pomeroy rankings after losing to Cleveland State...at home...by 13.
Couple thoughts:
Obviously, we don't have a lot of information on Vanderbilt's true ability, as they have only played three games. However, with a tough schedule ahead (Kentucky 2x, Florida 2x, Marquette, Louisville), Vandy needs to shape up if they want to keep their losses to a minimum. Shape up, you say? Yes:
  1. Vandy shot 39.2% in effective FG% against a Cleveland St. in the post-Jarvis Varnardo era.
  2. Vandy turned the ball over on over 30% of their possessions in the same game.
  3. Biggest consistent issue: field goal defense. Vandy allowed 54.7% eFG against Oregon, 54.6% against Cleveland St, and 51% against Bucknell. For comparison, the worst average eFG defense last year was Central Arkansas, who allowed 56.2% against even weaker competition.
Vandy is starting off much worse than average at forcing misses. The best team last year with below-average eFG defense was Marquette, who did so against the ridiculously stacked Big East. Suffice it to say, Vandy's gotta step things up.

Claim #2: Belmont is not a top-caliber team.
The Bruins received votes on Monday, but did not make it into the top 25. Suffering from Davidson syndrome, Belmont decided to schedule against super-tough Duke, and somewhat-less-tough Memphis, and lost both times. As I mused earlier, Belmont's one-point loss against Duke was honestly very very impressive. The sixteen point loss at Memphis was perhaps less impressive, but Belmont had pulled within seven at the 4:30 mark in the 2nd half, on the road no less.

ESPN wrote that Belmont "put up a good fight," so perhaps the media still has their eye on the Bruins, although they will be hard to follow until March, when their fate is determined by their RPI...

Claim #3: Ben Howland should be fired.
I am a very conservative statistician, when I am being honest. To that end, I find it very difficult to say that any one coach should be fired after a couple of crappy seasons. Let's look at the facts, by UCLA's Pomeroy rankings:

2004: 125
2005: 66
2006: 3 (lost to nat'l champs Florida in...the National Championship)
2007: 6 (lost to nat'l champs Florida in the Final Four)
2008: 3 (lost to nat'l runner-ups Memphis in the Final Four)
2009: 12 (crushed by Nova in the round of 32)
2010: 109
2011: 54

UCLA has most definitely had a down couple of years after having three of their best seasons in a long, long time. So let's look at this year:

2012 (so far): 
  • #93 Pomeroy 
  • Lost to #171 Loyola Marymount at home by 11
  • Lost to #112 Middle Tennessee at home by 20
  • Allowed 66.7% eFG against these teams
Allowing 66.7% from the field is in the bottom 20 in the country, and honestly is probably somewhat due to luck on Loyola/MTSU's parts. 

But with a moderately tough conference schedule and a home game against Texas soon, UCLA is in danger of starting out 1 and 3 against D-I opponents. Pomeroy's projections have the Bruins at under .500 in conference and under .500 overall. Scary times for Howland's squad, but given the talent I'm not all too surprised.

Bonus Claim #4: J'Covan Brown played an amazing game last night.
One of my roommates (who I will refer to as "Mur-Dog") just asked, "How good is J'Covan Brown?" So now I will tell you, offensively, how good his game was last night:

With an offensive rating of 157 on 29.5% of his team's possessions while on the court, against a slightly-better-than-average defense, J'Covan Brown's estimated offensive impact per 100 possessions was +20.7. Which on its own is better than any offense in the country (the highest adjusted offense is Ohio State, whose offense is roughly +18.8 better than average). Over the course of the game, he played 90% of the game, which boosted their offense by a total of +18.7 points per 100 possessions. DANG.

----

As soon as I reformat my hard drive (got my first PC virus in two years...), I will start discussing players by their overall (defensive + offensive) impacts. Hint: John Henson is sort of the best player in the country in his first two games. Color me surprised, offensively.

"My Team Should be #4, not #10!"

Howdy, folks.

Just thought you should know that we can demonstrate, mathematically, that there is little observable difference between teams in the top 25...er...top 32...er....top 15.

If the top 32 teams played .500 teams for 34 games, this would be their expected record (according to Ken Pomeroy's preseason numbers):



Rank   Team                Wins per 34   Losses per 34
1      Kentucky            33            1
2      Ohio St.            32            2
2      North Carolina      32            2
2      Duke                32            2
5      Syracuse            31            3
5      Connecticut         31            3
5      Pittsburgh          31            3
5      Louisville          31            3
9      Vanderbilt          30            4
9      Wisconsin           30            4
9      Kansas              30            4
9      Florida             30            4
9      Temple              30            4
9      Missouri            30            4
15     Baylor              29            5
15     Xavier              29            5
15     Gonzaga             29            5
15     Nevada Las Vegas    29            5
15     Purdue              29            5
15     Memphis             29            5
15     Marquette           29            5
15     Michigan            29            5
15     St. Mary's          29            5
15     Michigan St.        29            5
15     Miami FL            29            5
15     West Virginia       29            5
15     Belmont             29            5
15     Florida St.         29            5
15     New Mexico          29            5
15     Texas               29            5
15     Texas A&M           29            5
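
To get a feel for how a table like the one above falls out, here is a minimal sketch. It assumes each team has a single win probability against a .500 opponent and that the 34 games are independent; the two probabilities used are hypothetical stand-ins, not Pomeroy's actual preseason numbers, and the function names are my own.

```python
import math


def expected_record(p_win, games=34):
    """Expected wins and losses, rounded to whole games."""
    wins = round(p_win * games)
    return wins, games - wins


def win_total_sd(p_win, games=34):
    """Binomial standard deviation of the season win total."""
    return math.sqrt(games * p_win * (1 - p_win))


# Hypothetical win probabilities against a .500 team (illustration only).
for label, p in [("top-of-the-list team", 0.97), ("15th-ish team", 0.85)]:
    wins, losses = expected_record(p)
    print(label, wins, losses, round(win_total_sd(p), 1))
```

Under these assumptions the gap in expected wins between the two teams is only about two standard deviations of a single season's luck for the lower-rated team, which is the sense in which the difference is hard to observe in wins and losses.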


True, Texas A&M would very likely lose to Kentucky (giving Pomeroy's preseason ratings the benefit of the doubt here), but over the course of a season, there is little observable difference between the two (at least in terms of wins and losses). And given that college games usually deviate by 10 to 12 points from the predicted point margin, it is even true that, over time, there is little observable difference between the 25th and 1st teams -- even to algorithmic ranking systems.

I say all this not to oppose ranking -- I love ranking teams and players. It's what I do. But to get up in a tizzy about who is #1 and who is #5 in rankings, especially pollster rankings, is honestly pointless (and honestly unknowable). Furthermore, as you get further and further from #1, teams become closer and closer in skill level -- i.e., taking the ratings as accurate, the difference between UK and OSU is significantly greater than the difference between UT and TA&M (what you would expect if team strength is roughly normally distributed, as the central limit theorem suggests: few teams out in the tail, lots bunched toward the middle).

So next time you want to bicker about the pollsters, take a step back and say: "The difference between team #4 and team #10 is actually quite small," over and over.

College Basketball Prospectus 2011-2012



It's out! I am very proud to be listed as a contributor on this book alongside some of my favorite sportswriters/statisticians/sportswrititicians.


Buy it here!
http://www.basketballprospectus.com/products/cbp2011/

Why John Henson Scares Me

EDIT: Hello, people from The Devil's Den. Aptly-titled! Just an FYI, I ran my NBA-style regression on these numbers (which I used to get extremely accurate numbers in the 2011 NBA finals) and Henson's "Roland Rating" here on offense honestly looks more like -1.5 than -17.5 when we adjust for randomness.

This is going to be a very short blog post. As a Tar Heel fan, John Henson scares the heck out of me.


Some of you who read this will scream out "PLUS MINUS IS FALLIBLE." And true, correlation does NOT imply causation. But this discrepancy appears to be very significant. Around 90% of Henson's on-court/off-court performance was tracked by StatSheet last season, and there are some interesting results.


The Heels seem to be about 3.2 points per 100 possessions better on defense with Henson on the floor.
Offense, on the other hand, is a very scary story that I saw with my own eyes all season long:


(These are slightly estimated & rounded, based on UNC's average pace)



The Heels are nearly EIGHTEEN POINTS worse (per 100) on offense with Henson on the floor. Poor foul-shooting and over-shooting from inside the lane without passing to more efficient teammates are what appear to be the cause, just from watching.


Now, with this in mind, it might seem like the Heels would be worse with Henson (offensively) in every game, but that figure is more like 68% of the time. So this data at least suggests that there is maybe more to the picture than meets the eye.




I saw that Henson played miserably in his 11 minutes against William & Mary (who UNC ended up clobbering). So I removed unranked and non-ACC teams, and got the following:










So. Yeah. Any ideas?
