For all your fancy-pants statistical needs.

Praise for The Basketball Distribution:

"...confusing." - CBS
"...quite the pun master." - ESPN

Saving My Beloved FG% from Death...

I am constantly frustrated by the current basketball community's lack of love for FG%. In Pomeroy's analysis, he looked at a very small sample of just one team. But I think it can be argued that FG% is still an important metric for two things: discussion of NBA players and of possessions.

First, a quick numerical analysis:

I took the top 310 or so NBA players and looked at how 'underrated' they are by their FG%. If we just look at the difference between eFG% and FG% we see that nearly 40% of players are underrated by 5% or more. But this is only a small part of the picture. Nobody who eyeballs the FG% of a guard thinks that a player is only shooting two-pointers. So I set up a regression for estimating eFG% via FG%. This allows us to look closer at the difference in 'roughly-expected' eFG% and eFG% itself.

Done this way, only about 10% of these players are off by 5% or more. If we get rid of the 'overrated players' (we assume that people aren't going to overrate a player's FG% in their minds), that drops to 7.7%. So at least on the player level, it's reasonable to say that FG% is pretty good for eyeballing efficiency via shooting.
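
The comparison above can be sketched in a few lines of Python. This is a minimal sketch, not the original analysis: the one-variable least-squares fit, the function names, and the reading of 'off by 5%' as five percentage points are all assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def share_off_by(fg, efg, threshold=0.05, underrated_only=False):
    """Share of players whose actual eFG% misses the FG%-based estimate.

    `fg` and `efg` are lists of fractions (0.47 means 47%).
    `threshold` is assumed to be in percentage points, written as a fraction.
    With `underrated_only`, only players whose actual eFG% beats the
    estimate are counted, still as a share of all players (an assumption).
    """
    intercept, slope = fit_line(fg, efg)
    residuals = [y - (intercept + slope * x) for x, y in zip(fg, efg)]
    if underrated_only:
        misses = [r for r in residuals if r >= threshold]
    else:
        misses = [r for r in residuals if abs(r) >= threshold]
    return len(misses) / len(residuals)
```

Feeding it each player's FG% and eFG% as two parallel lists, `share_off_by(fg, efg)` would correspond to the 'about 10%' figure and `share_off_by(fg, efg, underrated_only=True)` to the 7.7% figure, if the assumptions above match what was actually done.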

Still unconvinced? Here's some more pudding:

1) FG% is useful for discussing Rebound%. To compare one team's eFG% to their OR% is not as intuitive as comparing with FG%. Field goal percent gives a better picture of 'possible rebounds' than eFG%.

2) In the same vein, FG% is more important in the discussion and analysis of what ends possessions and what doesn't.

3) I will maintain that the three-point shot is harder to repeat. I have not done any analysis on this, but the theory is sound: the more difficult the shot, the harder it is to repeat. Therefore, to some degree, I would estimate that year-to-year FG% is a better predictor of out-of-sample eFG% than eFG% itself.

4) At least in the NBA, extremely high eFG% by a player is more likely to be from a big man; so in extreme cases of 'shooting well from the field' (which are often the important points of study), FG% is usually sufficient.


I know that none of this takes care of the two basic arguments against FG%: worse correlation with offensive efficiency, and 'just add .5*3pm from the box score!'


To that I say:

'Hey, eFG% isn't even really a percentage! It's just easier to type than Field Goal Points / (FGA * 2). I like my percentages to be out of 100, thank you!'
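
To make that quip concrete, here is a small Python sketch of the two ways of writing eFG%: the box-score version (add half of the made threes) and the 'field goal points' version from the line above. The function names are mine, and free throws are assumed to be excluded from field goal points.

```python
def efg_box_score(fgm, fg3m, fga):
    """eFG% the box-score way: (FGM + 0.5 * 3PM) / FGA."""
    return (fgm + 0.5 * fg3m) / fga


def efg_points(fgm, fg3m, fga):
    """eFG% as Field Goal Points / (FGA * 2)."""
    fg2m = fgm - fg3m                   # made two-pointers
    field_goal_points = 2 * fg2m + 3 * fg3m
    return field_goal_points / (fga * 2)
```

The two forms always agree, and a shooter who takes nothing but threes and makes them all comes out at 1.5, i.e. 150%, which is the sense in which eFG% is not really a percentage.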

Below and Above The Bubble

Here are the 16 teams that reside just below the bubble for the NCAA tournament according to rpiforecast.com.

(at large rank) Team, Conf., LRMC rank

(#50) Florida St., ACC, #37
(#51) Maryland, ACC, #22
(#52) Virginia Tech, ACC, #40
(#53) Gonzaga, WCC, #39
(#54) UTEP, CUSA, #62
(#55) Richmond, A10, #51
(#56) UCLA, P10, #60
(#57) Southern Miss, CUSA, #75
(#58) UAB, CUSA, #73
(#59) Duquesne, A10, #30
(#60) Miami FL, ACC, #66
(#61) Clemson, ACC, #55
(#62) Old Dominion, CAA, #72
(#63) Marshall, CUSA, #67
(#64) South Carolina, SEC, #97
(#65) Penn St., B10, #69

This is looking pretty bad for those top ACC teams - the three right below Duke, Carolina, and Boston College in at-large probability. The biggest travesty here is obviously Maryland, who is #22 in the LRMC and #14 in Pomeroy. However, if they sustain their impressive defensive efficiency, win @ Boston College, and end up with 21 wins (as Pomeroy's numbers expect), I think they'll make it in.

Now let's look at the other end of the spectrum (the last 15 in):

(#49) Northwestern, B10, #38
(#48) Butler, Horz., #31
(#47) Colorado St., MWC, #68
(#46) Georgia, SEC, #54
(#45) Xavier, A10, #78
(#44) Washington St., P10, #34
(#43) Temple, A10, #33
(#42) Central Florida, CUSA, #56
(#41) Boston College, ACC, #52
(#40) Colorado, B12, #57
(#39) Iowa St., B12, #35
(#38) St. Mary's, WCC, #15
(#37) Arizona, P10, #21
(#36) Utah St., WAC, #43
(#35) Marquette, BE, #20

Xavier grabbing an at-large bid as the #78 team makes me shudder, but what can you do.

Trade Amare??

pdf - Player Trades that would Maximize Team Wins for the last half of the season.

corresponding messy excel sheet


EDIT: I forgot to do the 'trade' itself in Excel. It assumes that an average player would affect each team the same way. Amare is the biggest name among the 'both-sides-win' top 15 players. Not Kobe Bryant.

There are only a handful of players who, by trading, increase the value of both teams. This is an application of Simpson's Paradox.

The main math involved is roughly based on 'impact' value (which I have outlined earlier) in tandem with expected wins added, where I converted team efficiency margin into wins. (Every one-point increase in efficiency margin per 100 possessions is worth roughly 2.5 wins.)
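
The margin-to-wins conversion is simple enough to sketch in Python. This is only an illustration of the rule of thumb above: the function names are hypothetical, the 'impact' calculation that produces the margin changes is not shown here, and the text does not say what span of games the 2.5 figure covers.

```python
WINS_PER_MARGIN_POINT = 2.5  # rule of thumb quoted in the text


def wins_added(margin_change):
    """Convert a change in efficiency margin (points per 100 possessions)
    into expected wins added."""
    return WINS_PER_MARGIN_POINT * margin_change


def both_sides_win(margin_change_a, margin_change_b):
    """A trade is 'both-sides-win' if each team's expected wins go up."""
    return wins_added(margin_change_a) > 0 and wins_added(margin_change_b) > 0
```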

Overrated/Underrated NCAA teams

Overrated / Underrated Teams, ranked by average difference in Humans (avg AP & ESPN polls) and Computers (average LRMC and Ken Pomeroy rankings).

https://dl.dropbox.com/u/241759/overrated.html


Note: Teams not getting votes in either poll are given an arbitrary ranking of 45. Teams getting votes but not making the top 25 get a value of 36.
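
For anyone who wants to reproduce the ranking, here is a rough Python sketch of the scoring. The sign convention (positive means overrated by humans), the per-poll handling of the 45 and 36 fill-in values, and the way poll status is encoded are my assumptions for illustration.

```python
NO_VOTES_RANK = 45      # fill-in for a team with no votes in a poll
VOTES_ONLY_RANK = 36    # fill-in for a team with votes but outside the top 25


def poll_value(entry):
    """Turn a poll entry into a number.

    `entry` is an int rank (1-25), the string "votes" for a team that is
    only receiving votes, or None for a team with no votes.
    """
    if entry is None:
        return NO_VOTES_RANK
    if entry == "votes":
        return VOTES_ONLY_RANK
    return entry


def overrated_score(ap, espn, lrmc, pomeroy):
    """Average computer rank minus average human rank.

    Positive: humans rank the team better than the computers (overrated).
    Negative: computers rank the team better than humans (underrated).
    """
    human = (poll_value(ap) + poll_value(espn)) / 2
    computer = (lrmc + pomeroy) / 2
    return computer - human
```

Sorting teams by `overrated_score` from highest to lowest would then put the most overrated teams at the top.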

Theoretically Correct RPI!

Firstly, sorry for the lack of updates. I've had a lot of personal projects in other areas that I've been working on lately. But don't worry, MVP ratings, team ratings, etc, will all be coming back soon.

For now, here's my formula for the 'theoretically correct RPI.' The NCAA uses weights that are somewhat intuitive, but also provably arbitrary/random. So without further ado, let's examine how to get the most accurate "Real Win%" from three values: Win%, Opponents' Win%, and Opponents' Opponents' Win%. This will give us an over and underrated ranking, and tell us which teams, by the NCAA's logic alone, get the short end of the at-large-bid-stick.

Using the same math behind my simple adjusted rebound percent, we will work backwards to accurately represent the three variables involved in the RPI.

First:
B2(Opponents' Real Win%) = A1*B1/(1-A1-B1+2*A1*B1)
where A1=opponents' raw win% and B1=opponents' opponents' raw win%

Then:
A3(Team's Real Win%) = A2*B2/(1-A2-B2+2*A2*B2)
where A2=raw team win%, and B2=opponents' real win%


I'll do some data mining to get this data officially for all NCAA teams soon, adjusted for home/away. Skipping the first step in this equation makes for some strange results. One caveat of this formula is that all 0% and 100% teams remain that way (e.g., Kansas gets the same value as San Diego St.). More soon!
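
The two-step formula above translates directly into Python. This is a minimal sketch with function names of my own choosing; win percentages are passed as fractions between 0 and 1, and no home/away adjustment is attempted.

```python
def adjust(a, b):
    """Apply A*B / (1 - A - B + 2*A*B) to two win fractions.

    The denominator equals A*B + (1-A)*(1-B), so it is zero only when one
    input is exactly 0 and the other exactly 1.
    """
    denominator = 1 - a - b + 2 * a * b
    if denominator == 0:
        raise ValueError("undefined when one input is 0 and the other is 1")
    return a * b / denominator


def real_win_pct(team, opponents, opponents_opponents):
    """Two-step 'Real Win%' from the three RPI inputs."""
    # Step 1: adjust opponents' raw win% by their own opponents' raw win%.
    opponents_real = adjust(opponents, opponents_opponents)
    # Step 2: adjust the team's raw win% by the opponents' real win%.
    return adjust(team, opponents_real)
```

Two properties fall out of the formula: a schedule of exactly average (50%) opponents leaves a team's win% unchanged, and a team at 0% or 100% stays there no matter who it played, which is the caveat mentioned above.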
