For all your fancy-pants statistical needs.

Praise for The Basketball Distribution:

"...confusing." - CBS
"...quite the pun master." - ESPN

Offensive Decision% version 2

I've updated my Offensive Decision% formula to include Offensive Rebounds. A field goal attempt now counts as only one 'decision' if the player does not rebound his own miss. So the denominator now subtracts Offensive Rebounds x Shot% as an estimate of how many offensive rebounds a player grabs off his own missed field goals. Here's the entire formula:
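Here is a minimal Python sketch of the version-2 calculation as described in words above. It assumes the numerator is unchanged from version 1, and `shot_pct` is a hypothetical stand-in for Shot%, taken here to be the player's share of his team's field goal attempts while he is on the court. Treat it as an illustration, not the exact published formula.

```python
def offensive_decision_pct_v2(fgm, fga, ast, tov, ftm, fta, orb, shot_pct):
    """Offensive Decision% v2 = good decisions / total decisions.

    ASSUMPTION: shot_pct is the player's share of team field goal attempts
    while on the court, so orb * shot_pct estimates how many of his offensive
    rebounds came off his own misses (those misses no longer count as a
    separate decision).
    """
    good = fgm + ast + 0.44 * ftm                               # unchanged from v1
    total = fga + ast + tov + 0.44 * fta - orb * shot_pct       # v2 denominator
    return good / total


# Hypothetical season line, not a real player.
print(round(offensive_decision_pct_v2(500, 1000, 300, 150, 200, 250, 100, 0.25), 4))
```

With `shot_pct=0` the function reduces to the original version-1 ratio.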

Here are the current results for the NBA, with those playing 500 minutes or more.

NCAA team offensive efficiency impacts

I have previously estimated how much individual statistics (specifically, the Four Factors plus two more) impact efficiency. My prior method was lazy and inaccurate at adjusting for Strength of Schedule. The new method adjusts each factor rating separately. For the math, scroll to the bottom.* EDIT: Yes, the factor totals do not EXACTLY equal (Adjusted Offensive Rating - League Average Offensive Rating), but they are close (R^2 of .99, to be precise).

But here's what you really want.
NCAA adjusted offensive four factors

*The original method took (Deduced Efficiency - Deduced Efficiency with league-average stat) and multiplied it by (Adjusted Efficiency / Raw Efficiency). The new method is a little more complex. I found that each stat didn't impact efficiency as much as I thought, since the factors interact with one another. Here's what I found:

When predicting the change in efficiency (relative to average), the following weights appear: eFG&FG+ = 1.065833, TO%+ = 1.088916, OR%+ = 0.935664, FTR&FT+ = 0.38507

Each individual output has to be multiplied by its coefficient. However, I still needed to adjust for strength of schedule. To do this, I computed (Adjusted Offense - Raw Offense) for each team to get its Schedule Adjustment Factor. I then scaled the four coefficients so that they sum to one (fg = 0.306672, to = 0.313314, or = 0.269218, ft = 0.110796), which determines each factor's share of the Schedule Adjustment. Here's how eFG% & FG% look:

eFG&FG+=1.065833 * [(Deduced Efficiency - Deduced efficiency with average eFG% and FG%) + .306672*Schedule Adjustment ]
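The weighting above can be sketched in Python. This assumes the "deduced efficiency" values come from a separate model that isn't reproduced here, so they are plain inputs; the factor keys (`fg`, `to`, `or`, `ft`) and function names are hypothetical. It also shows the original method for comparison.

```python
# Coefficients from the post.
COEFS = {"fg": 1.065833, "to": 1.088916, "or": 0.935664, "ft": 0.38507}

# Each factor's share of the schedule adjustment: coefficients scaled to sum to one.
TOTAL = sum(COEFS.values())
SHARES = {name: coef / TOTAL for name, coef in COEFS.items()}


def factor_impact(factor, deduced_eff, deduced_with_avg, adj_eff, raw_eff):
    """New method: coef * [(deduced gap) + share * Schedule Adjustment].

    deduced_with_avg is the deduced efficiency with this factor set to
    league average (an input here, since the deduction model is not shown).
    """
    schedule_adjustment = adj_eff - raw_eff
    gap = deduced_eff - deduced_with_avg
    return COEFS[factor] * (gap + SHARES[factor] * schedule_adjustment)


def factor_impact_old(deduced_eff, deduced_with_avg, adj_eff, raw_eff):
    """Original method: deduced gap scaled by Adjusted / Raw efficiency."""
    return (deduced_eff - deduced_with_avg) * (adj_eff / raw_eff)


for name, share in SHARES.items():
    print(name, round(share, 6))
```

Scaling the coefficients by their sum reproduces the fg/to/or/ft weights listed above.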

NCAA player offensive ratings

Just put together another round of player offensive ratings - data via It adjusts for quality of defense and usage% - the final number is an estimate of how efficient an average team's offense would be with the player on the court.

Here are the results

2010 Tourney "All-or-nothing" Ratings

This is using my pre-tournament LRMC simulation from last year. See my last blog post for the method. I still haven't come up with a good name for it.
How does "effective likelihood of doing better than statistically expected" sound?

Here are the first-, second-, and third-round components (r1, r2, and r3):

The first-round component gives us the most meaningful information (as the later rounds heavily favor better teams). Let's take a look at the results:

1) Xavier: as a 6-seed, beat 11-seed Minnesota and 3-seed Pittsburgh
2) Washington: as an 11-seed, beat 6-seed Marquette and 3-seed New Mexico
3) Marquette: lost to the (more-volatile) Washington
4) Utah St: lost to Texas A&M (who was just 0.07 lower in volatility)
5) Minnesota: lost to (highest-volatility) Xavier

other notables: the official Cinderella of 2010, Butler, was #7 (volatility of 0.48). Also, Cornell (who beat the East's 5- and 4-seeds as a 12-seed) was ranked 19th, with 0.4.

On to the second round:

1) West Virginia: made it to the Final Four as a 2-seed.
2) BYU: fell to Kansas St, who was 5th in volatility
3) Duke: Won the tournament...
4) Kentucky: Didn't make it past West Virginia, but succeeded as a (statistically) overrated team
5) Kansas St: Fell to Butler in the Elite 8 - pulled through in a pretty tough bracket though (statistically)

other notables: Butler is the highest-ranked 5-seed in 2nd-round volatility.

The third round doesn't tell us much new information, although Duke is the highest-ranked team here (in a bracket that statistically favored Kansas).

Anyways, the information here is hard to quantify, but I think some important things can be learned, especially from the first-round component!

Team Volatility

This is going to be a shortish post considering the amount of new analysis I'm introducing, but I would like to start offering some tools to help predict even the strangest of occurrences. For example, it would have been statistical folly to predict Northern Iowa or Cornell to win as many games as they did in 2010; I want to predict the next Cornell!

So let's go in order of depth.
First, basic probabilities: has some phenomenal pre-selection simulation projections for the tournament, giving individual probabilities for each team making it to round X.

From these we can find AVERAGE PROJECTED WINS: simply sum together each of the 6 probabilities to find the mean-expected wins each team will have in the tournament.
From this, we can do some theory: given that a team wins at least Y games, how many wins would it THEN be projected to have? I call this "Average Projected Wins with Y games secure." This would be estimated like so:

=Y games won + sum(probabilities of the rest of the tournament)/(probability of winning Y games)

So for two games secure, the math would be:

=2 + sum(probabilities of winning the 3rd, 4th, 5th, and 6th games)/(probability of winning the 2nd game)
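As a sketch of the two calculations above in Python, assume a team's projections come as a list where `probs[k-1]` is the probability of winning at least k games (k = 1..6). The conditional step uses P(win k | already won Y) = P(win k) / P(win Y) for k > Y. The example probabilities are made up.

```python
def average_projected_wins(probs):
    """Mean expected tournament wins: the sum of the six round probabilities.

    ASSUMPTION: probs[k - 1] is the probability of winning at least k games.
    """
    return sum(probs)


def wins_with_games_secure(probs, y):
    """Average Projected Wins given the team has already won y games."""
    if y == 0:
        return average_projected_wins(probs)
    p_secure = probs[y - 1]  # probability of winning y games
    # Later rounds, conditioned on having already won y games.
    return y + sum(probs[y:]) / p_secure


# Hypothetical team, not real projections.
team = [0.8, 0.5, 0.25, 0.1, 0.04, 0.01]
print(average_projected_wins(team))     # about 1.7
print(wins_with_games_secure(team, 2))  # about 2.8
```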

From this, we can build a hybrid statistic that I like to call Volatility: the marginal wins gained from winning through a specific round of the tournament, TIMES the probability of winning through that round. We get it by subtracting our starting average (zero games secure) from the "games secure" average for that round.
For example, one team's volatility in the first round would be:
=[(2-win secure average wins) -( 0-wins secure average wins)] * odds of winning those first two games
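The volatility formula above fits in a few lines of Python. This is a sketch assuming the same input layout as before (`probs[k-1]` = probability of winning at least k games); `y = 2` matches the first-round example, and the team's numbers are hypothetical.

```python
def volatility(probs, y):
    """[(y games secure avg wins) - (0 games secure avg wins)] * P(win y games).

    ASSUMPTION: probs[k - 1] is the probability of winning at least k games.
    """
    if y < 1:
        raise ValueError("y must be at least 1")
    base = sum(probs)                       # zero games secure
    p_secure = probs[y - 1]                 # odds of winning the first y games
    secure = y + sum(probs[y:]) / p_secure  # y games secure
    return (secure - base) * p_secure


team = [0.8, 0.5, 0.25, 0.1, 0.04, 0.01]  # hypothetical team
print(round(volatility(team, 2), 2))      # 0.55
```

Here the team gains 1.1 projected wins by getting through two games, and it has a 50% chance of doing so.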

The first three rounds tell us the most; later rounds are skewed by higher-quality teams having much higher odds of winning the games beforehand. On the right are the top ten teams by "first-round volatility," given the projected field of teams.

This tells us, roughly, which team will benefit the most if it can overcome early obstacles. A better use of this method would be to subtract the ESPN National Bracket "average wins" rather than my statistical "zero wins secure average." That gives us a better picture of which teams will do better than most people expect, and therefore, which team will help you destroy everyone in your office pool!

Offensive Decision%

Finally, some good old fashioned statistics that don't have really good theory behind them!
Often-times, when I'm watching a basketball game, I mentally determine who is making the most good decisions and the most bad decisions on offense.

So here's a basic metric of what my eyes see, and I call it Offensive Decision %. It basically measures, poorly, Good Offensive Decisions / Total Offensive Decisions.

=(FGM + Assists + .44 * FTM) / (FGA + Assists + TO + .44 * FTA)
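The formula translates directly into a Python function. The 0.44 weight is the usual factor for converting free throws into possessions; the example box-score line is hypothetical.

```python
def offensive_decision_pct(fgm, fga, ast, tov, ftm, fta):
    """Offensive Decision% (v1) = good decisions / total decisions."""
    # 0.44 * free throws approximates the number of trips to the line.
    good = fgm + ast + 0.44 * ftm
    total = fga + ast + tov + 0.44 * fta
    return good / total


print(round(offensive_decision_pct(500, 1000, 300, 150, 200, 250), 4))
```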

And here are the top NBA players (as of earlier this week) with median minutes played or more.

Estimated defensive rating formula, with Usage%!

EDIT/UPDATE: This formula, like Dean Oliver's, is based on some good theory, but the more I have examined it, the clearer it is that it's a very poor measure of defensive success. If you need a quick fix, the following explains player defense better than the formula described:

(Points Allowed On Court / Possessions Played) - (Points Allowed Off Court / Possessions Off Court)
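A one-function Python sketch of that on/off comparison. The optional `per` scaling to points per 100 possessions is my own addition for readability; a negative result means the team allows fewer points with the player on the court.

```python
def on_off_defense(pts_allowed_on, poss_on, pts_allowed_off, poss_off, per=1.0):
    """(Points allowed per possession on court) - (per possession off court).

    per=100 expresses the difference per 100 possessions (an added option).
    """
    on_rate = pts_allowed_on / poss_on
    off_rate = pts_allowed_off / poss_off
    return (on_rate - off_rate) * per


# Hypothetical totals: 1.00 per possession on court vs 0.90 off court.
print(round(on_off_defense(100, 100, 90, 100, per=100), 1))  # 10.0
```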

Woo! This one took a lot of work, but I think I have all of the theoretical errors taken care of. It's very similar to Dean Oliver's box-score formula, but with a few important adjustments:

-'Points allowed' are assigned individually from the estimated output per possession in units of 0, 1, 2, and 3 points, following Ryan Parker's bachelor's essay (rather than only assigning players Stop values that add a marginal 'DefensivePointsPerScoringPossession' per stop).

-For each possession outcome (0, 1, 2, and 3 points allowed), we estimate the effectiveness of the player's defense (via blocks, defensive rebounds, turnovers, and player fouls) and also adjust more intuitively for our unknowns (most importantly, field goal misses not forced by a block). This keeps us from shoving 100% of the Team Defensive Rating into the final step of the formula.

-Defensive possessions used are calculated from the marginal used possessions in our estimates; the base rating still lies close to 20%, but is modified only in part by blocks/stls/pf/dr.

This is for college ball, since that's where Ryan's estimates of possession-endings come from; however, the forced free throws come from my NBA-team-estimate (which is pretty lazy currently).

quick definitions for the uninformed:
DFG% = opponent's Field Goals Made / opponent's Field Goal Attempts
DOR% = opponent's Offensive Rebounds / (opp. off. reb + team def. reb)
PF = player personal fouls
dFTA = Free Throw Attempts by opponents
tmBlk = team blocks
DR = player defensive rebounds
dFT% = opponent's Free Throws Made / opponent's Free Throw Attempts
dFGA = opp's field goal attempts
dFGM = opp's field goals made
d3PM = opp's made three pointers
Stl = player steals
Blk = player blocks
tmStl = team steals
dTO = opp's turnovers
Poss = team possessions, as estimated here

Let the math begin!
tMin% (team minute %)= .2 * minutes / game minutes = minutes / team minutes
(this is our basic estimate of player defensive involvement for the whole game in places where we can't assume otherwise)

PossPI (possessions played in)= Team Possessions * tMin% * 5
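A Python sketch of the two minute-based quantities above. The 40-minute default for `game_minutes` is an assumption (a regulation college game with no overtime).

```python
def team_minute_pct(minutes, game_minutes=40.0):
    """tMin%: the player's share of the team's total player-minutes.

    .2 * minutes / game minutes == minutes / (5 * game minutes).
    ASSUMPTION: game_minutes defaults to a 40-minute college game.
    """
    return 0.2 * minutes / game_minutes


def possessions_played_in(team_poss, minutes, game_minutes=40.0):
    """PossPI = Team Possessions * tMin% * 5 (possessions while on the court)."""
    return team_poss * team_minute_pct(minutes, game_minutes) * 5


print(round(possessions_played_in(70, 20), 6))  # 35.0: half the game, half the possessions
```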

FMW (forced-miss-weight) = (dfg%*(1-dor%)) / (dfg%*(1-dor%)+(1-dfg%)*dor%)
(same as Dean Oliver's formula - distributes credit of missed field goal to the one guarding and the one getting the defensive rebound. Guarding man gets FMW, defensive rebounder gets 1-FMW).
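The forced-miss weight in Python, with the credit split between the defender and the rebounder spelled out. The input percentages are hypothetical, expressed as fractions.

```python
def forced_miss_weight(dfg_pct, dor_pct):
    """FMW = dFG%*(1-DOR%) / (dFG%*(1-DOR%) + (1-dFG%)*DOR%)."""
    forcing = dfg_pct * (1 - dor_pct)
    rebounding = (1 - dfg_pct) * dor_pct
    return forcing / (forcing + rebounding)


fmw = forced_miss_weight(0.45, 0.30)
print(round(fmw, 5), round(1 - fmw, 5))  # 0.65625 0.34375 (guarding man, rebounder)
```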

eFFTA (estimated forced free-throw-attempts) = (.6033*PF^1.2132)
(This is the basic estimate I derived from NBA team-level data.)

FFTA (forced free-throw-attempts) = eFFTA * (dFTA / team's sum of eFFTA)
(This scales the prior number so that the team's total forced free throw attempts equal its opponents' actual free throw attempts.)
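A sketch of the two free-throw steps in Python: estimate eFFTA for every player from personal fouls, then rescale so the team total matches the opponents' actual free throw attempts. The list-of-fouls input format and the function name are hypothetical.

```python
def forced_fta(player_fouls, opp_fta):
    """FFTA for each player on a team.

    player_fouls: personal fouls (PF) for each player on the team.
    opp_fta: dFTA, the opponents' actual free throw attempts.
    """
    # eFFTA = .6033 * PF^1.2132 (NBA team-level estimate).
    estimates = [0.6033 * pf ** 1.2132 for pf in player_fouls]
    total = sum(estimates)
    if total == 0:
        return [0.0 for _ in player_fouls]
    # Scale so the team's forced FTA sum to the opponents' actual FTA.
    return [e * opp_fta / total for e in estimates]


shares = forced_fta([10, 20, 30, 0], 50)
print(round(sum(shares), 6))  # 50.0
```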

FMstops (stops from forced misses) = (Blk + tMin%*(dFGA-dFGM-tmBlk))*FMW*(1-dOR%) + DR*(1-FMW)

(Defensive rebounds are worth 1-FMW, blocks are worth FMW, and we assume that all other opponent misses can be distributed equally by minutes played. My NBA team data showed zero correlation between blocks and non-blocked field goal misses.)
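The FMstops formula in Python, term by term. All inputs are single-game or season totals as defined above; the example numbers are hypothetical.

```python
def fm_stops(blk, dr, t_min_pct, dfga, dfgm, tm_blk, fmw, dor_pct):
    """FMstops = (Blk + tMin%*(dFGA-dFGM-tmBlk))*FMW*(1-dOR%) + DR*(1-FMW)."""
    # Opponent misses not caused by a team block, spread evenly by minute share.
    non_block_misses = dfga - dfgm - tm_blk
    forced_misses = blk + t_min_pct * non_block_misses
    # Guarding man's FMW share of forced misses that the opponent didn't rebound.
    guarding_credit = forced_misses * fmw * (1 - dor_pct)
    # Rebounder's (1 - FMW) share of each defensive rebound.
    rebound_credit = dr * (1 - fmw)
    return guarding_credit + rebound_credit


print(round(fm_stops(1, 5, 0.2, 60, 25, 5, 0.65625, 0.3), 6))  # 4.934375
```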

0pdp (zero-points-defensive possession)
=FMstops + .27*(FFTA-FFTA*dFT%) + Stl + tMin%*(dTO-tmStl)

(Gives each player full credit for his steals, then distributes all other turnovers equally by minutes. NBA team data also seemed to show no correlation between steals and non-steal turnovers. This, like the rest of the 'pdp' formulas, is based on the possession-ending estimates in Parker's bachelor's essay.)
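A Python sketch of 0pdp, taking FMstops and FFTA as already computed. The function name and the example inputs are hypothetical.

```python
def zero_point_possessions(fm_stops, ffta, dft_pct, stl, t_min_pct, dto, tm_stl):
    """0pdp = FMstops + .27*(FFTA - FFTA*dFT%) + Stl + tMin%*(dTO - tmStl)."""
    missed_ft_credit = 0.27 * (ffta - ffta * dft_pct)  # share of forced FTs missed
    other_turnovers = t_min_pct * (dto - tm_stl)       # non-steal turnovers, by minutes
    return fm_stops + missed_ft_credit + stl + other_turnovers


print(round(zero_point_possessions(4.934375, 4, 0.7, 2, 0.2, 15, 7), 6))  # 8.858375
```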

1pdp (one-point defensive possession)
=.35*FFTA - .25*FFTA*dFT%

2pdp (two-point defensive possession)
=.95*tMin%*(dFGM-d3PM+(tmBlk-Blk)) - Blk + .36*FFTA*dFT%

This spreads out 2-pointers made among all players, but trades out the appropriate credit for blocks. This might look a little counter-intuitive, so I might talk a bit more about this in comments or a later post. Also, we assume that each player only blocks 2-pointers.

3pdp (three-point defensive possession)
=tMin%*(d3PM+.02*(dFGM-d3PM)) + .03*FFTA*dFT%

dPA=1pdp + 2*2pdp + 3*3pdp
(Defensive points allowed. 1 for 1-point possessions, etc)
dPOSS=0pdp + 1pdp + 2pdp + 3pdp
(Total defensive-possessions the player is credited for ending.)
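The scoring-possession formulas and the two totals can be sketched in Python as below. The last function is an assumption on my part, not spelled out above: it treats the rating as points allowed per 100 credited possessions and defensive Usage% as dPOSS / PossPI.

```python
def scoring_possessions(t_min_pct, ffta, dft_pct, blk, tm_blk, dfgm, d3pm):
    """Return (1pdp, 2pdp, 3pdp) using the coefficients in the post."""
    one = 0.35 * ffta - 0.25 * ffta * dft_pct
    two = (0.95 * t_min_pct * (dfgm - d3pm + (tm_blk - blk))
           - blk + 0.36 * ffta * dft_pct)
    three = t_min_pct * (d3pm + 0.02 * (dfgm - d3pm)) + 0.03 * ffta * dft_pct
    return one, two, three


def defensive_totals(zero, one, two, three):
    """dPA = 1pdp + 2*2pdp + 3*3pdp; dPOSS = 0pdp + 1pdp + 2pdp + 3pdp."""
    dpa = one + 2 * two + 3 * three
    dposs = zero + one + two + three
    return dpa, dposs


def rating_and_usage(dpa, dposs, poss_played_in):
    """ASSUMPTION: rating per 100 credited possessions; usage = dPOSS / PossPI."""
    return 100 * dpa / dposs, dposs / poss_played_in


one, two, three = scoring_possessions(0.2, 4, 0.7, 1, 5, 25, 6)
dpa, dposs = defensive_totals(8.858375, one, two, three)
print(round(dpa, 3), round(dposs, 6))  # 13.536 15.296375
```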


Here's the formula in action (from Saturday's Carolina game):

Edit: If you're wondering how effective this really is, check out the ratings applied to NBA players with median minutes or more, converted to defensive win shares. Compare these with basketball-reference's 2010-2011 season leaders in Defensive Win Shares.

Losing Larry Drew II

EDIT: I accidentally named Strickland in the paragraph on defensive plus-minus rather than Drew. Now fixed.

The North Carolina Tar Heels just lost Larry Drew II, who is transferring after playing some pretty decent basketball (according to that article).

Let's take a moment and look at Larry Drew's estimated offensive impact.

Drew used 15% of the Tar Heels' possessions over 57.5% of each game, with the team's lowest Offensive Rating. From that, I estimate that losing Drew will raise Carolina's 'Raw' Offensive Efficiency to 107.88 (from 106.45). Depending on how you adjust it, Drew's absence would add between 1.4 and 1.5 points per 100 possessions to Carolina's 'Adjusted Offensive Rating'.*
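One simple way to run this kind of estimate, sketched in Python, is to assume the player's possessions get used at the rest of the team's efficiency. That method is a hypothetical illustration, not necessarily the one behind the numbers above, and no ORtg for Drew is plugged in here.

```python
def efficiency_without_player(team_eff, usage, minutes_share, player_ortg):
    """Team raw efficiency (points per 100) with the player removed.

    ASSUMPTION: the player's possessions are absorbed at the rest of the
    team's efficiency, so team_eff = share*player + (1 - share)*rest.
    """
    share = usage * minutes_share  # player's share of all team possessions
    return (team_eff - share * player_ortg) / (1 - share)


# A player who matches the team's efficiency changes nothing.
print(round(efficiency_without_player(106.45, 0.15, 0.575, 106.45), 2))  # 106.45
```

The lower the departing player's ORtg relative to the team, the larger the bump in the team's raw efficiency.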

Also, I ran StatSheet's plus-minus data and found that (weighted by minutes played) while Drew was on the court, Carolina averaged a point margin of 2.3 per 40 minutes. With his replacement point guards on the court, they averaged 10.8 points per 40 minutes. In other words, Drew's on-court presence cost Carolina 8.5 points per 40 minutes.

But Larry Drew's main claim to fame was his defensive prowess. There are no truly good defensive stats for players like Drew, but we have to assume that he contributed some to Carolina's defense. Let's try to take a closer look:

Some quick stats from his Pomeroy page: I'll rank him among the three players who run point the most (Marshall, Strickland, and Drew).

Defensive Rebound%: Drew takes the lead at 9.3%, in close second is Strickland's 8.7%. Marshall isn't far behind at 7.4%
Block%: Ha! Marshall is the only one recording noticeable blocks, with 0.3%.
Steal%: Drew posts an impressive 2.7, but Strickland and Marshall have him beat at 3.1 and
Fouls Committed per 40: While fouling helps in some situations, Carolina's best Four-Factor stat is how rarely their opponents get to the line. This will likely only improve, as Drew's modest 3.4 is bested by Marshall's 2.3 and Strickland's 2.6.
Defensive Plus Minus: Not going to rank players (takes too long to get these numbers), but with Drew on the court, Carolina allowed 40.9 points per 40 minutes. Off the court, Carolina allowed only 28.0 points per 40 minutes. That means that with Drew on the floor, Carolina did 12.9 points per 40 worse on defense.

It's never a very good idea to only use plus minus when looking at players, but NET +/- can tell us some reasonably accurate things about the effect of substituting players. As long as Carolina can emotionally push through this, losing Drew could actually win them an extra game or two. I just pray that the boys stay out of foul trouble and don't get fatigued now that a lot of minutes have to be filled.

Furthermore, I think that I would personally stick with Strickland, not Marshall. While Marshall posts an insane assist rate of 42.8 (compared to Strickland's 12.6), I'll take Strickland's TO% of 18.4 over Marshall's sloppy 32.9 any day.

That is all!

*One way of adjusting is just adding the 1.4 to the raw numbers. But if I use the ratio of UNC's Adjusted Efficiency to Actual Efficiency (1.034), the impact goes from -1.43 to -1.47.

NCAA Love...or How I Learned To Keep Worrying About Maryland Not Making It...

The following is my listing of projected NCAA seeds (from versus LRMC seeds (LRMC ranking + .75):


Maryland still gets the short end of the stick, and Oklahoma St. gets too much love.


About Me

I wish my heart were as often large as my hands.