For all your fancy-pants statistical needs.

Praise for The Basketball Distribution:

"...confusing." - CBS
"...quite the pun master." - ESPN
Wahoo!

I've just invented a formula for a Player's Efficiency Margin!



Now I need to get back to work.....

Best NBA players

I haven't had a chance to adjust for tempo, but here are my ten most valuable and ten best NBA players, according to how many points they boost their team's margin by per minute.

The first is simply how their team is affected overall; the second is their points minus the points scored by whomever they are defending (all per minute).


here tis:
http://dl.getdropbox.com/u/241759/nbastuff.html

Using the Four Factors (and Pace and FT%) to Estimate Point Margin/Efficiency Margin!

The 'Four Factors' are the keys to success in winning a basketball game. They are, generally put, how often you make your baskets, how often you get offensive rebounds, how often you get to the free throw line, and how often you turn the ball over (similarly, how often you let your opponents do the same things). The better these four areas are, the better your team will be. Kenpom.com lists them for each team on their game plan page (example here).

The total number of possessions, Free Throw Percent, and Field Goal Percent are also needed to calculate the actual score, and therefore are also important to winning a game, but we are only given Possessions from Kenpom's four factors data. (Free Throw Percent can be estimated by each team's average.)

My equations are CLOSE, but require a bit of alteration by viewing the regression lines between actual and predicted numbers....but check out the huge formula!

Here's a quick primer of definitions:

Poss=Possessions
FTR=Free throw Rate (FTA/FGA)
eFG%=Effective Field Goal Percentage
(FG%=Field Goal Percent)
OR%=Offensive Rebound percentage
TO%=Turnover percent
(TO=Turnovers)

The first thing we need to find is Free Throws Attempted! We can estimate it thusly:

FTA=(Poss-TO%*Poss)/((1/FTR)*(eFG%+(1-eFG%)*(1-OR%))+.475)

From this we can get:

FGA=FTA/FTR

I'll explain where this comes from later -- but now we have the estimated number of free throws attempted and field goals attempted.

Then you just calculate the final score in the following way:

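Final Score=FTA*FT%+2*FGA*eFG%

(The 2 is there because eFG% is makes per attempt, with a made three counted as 1.5 makes, so each "effective make" is worth two points.)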

After doing this for both teams, you can predict a point margin (or estimate one from a previous game).

To find the efficiency of a team, simply divide the final score by possessions played!
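Here is a minimal Python sketch of the recipe above, not a polished tool. The function name and argument names are made up for illustration, every rate is passed as a fraction (.5094, not 50.94), and the FTA line uses the form with .475 added outside the (1/FTR) term, as in step 6 of the derivation further down. Field goal points are counted as 2*FGA*eFG%, which is an assumption based on the standard definition of eFG%.

```python
def estimate_score(poss, efg, to_rate, or_rate, ftr, ft_pct):
    """Estimate FTA, FGA, points and efficiency from four-factors inputs.

    All rates are fractions between 0 and 1.
    """
    # Share of field goal attempts that end the possession:
    # makes, plus misses that the offense does not rebound.
    fga_end_share = efg + (1 - efg) * (1 - or_rate)
    fta = (poss - to_rate * poss) / (fga_end_share / ftr + 0.475)
    fga = fta / ftr
    # Assumption: eFG% counts a made three as 1.5 makes,
    # so field goal points are 2 * FGA * eFG%.
    points = fta * ft_pct + 2 * fga * efg
    return {"fta": fta, "fga": fga, "points": points, "efficiency": points / poss}


# Four-factor lines from the North Carolina / Michigan St. prediction post.
unc = estimate_score(74, 0.5094, 0.1707, 0.3237, 0.4268, 0.77)
msu = estimate_score(74, 0.4826, 0.2069, 0.3703, 0.3248, 0.71)
margin = unc["points"] - msu["points"]
print(round(margin, 1))
```

With those inputs the sketch gives Carolina an edge of about six points, in the neighborhood of the "Carolina by 7" pick.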


Now for the really nerdy part: WHY DOES THIS WORK??

Algebra!

No seriously, check it out:

Premises:
1) Possession change (the calculation for total possessions) can only happen in the following ways: when a team gets a defensive rebound, when the ball is turned over, or when the ball goes in the hoop. Also, we estimate that a free throw attempt ends a possession 47.5 percent of the time (a factor of .475).
2) We can represent Field Goal Attempt possession changes in the following way: FGA*(FG%+(1-FG%)*(1-OR%))

That is to say, the other team gets the ball next when you make a shot (FG%), or when you miss it (1-FG%) and don't get the offensive rebound (1-OR%).
3) We can therefore estimate possessions in the following way:

Possessions=FGA*(FG%+(1-FG%)*(1-OR%))+.475*FTA+TO

4) Unfortunately, the four factors do not offer us FG%, but a close number, eFG%, so we can turn the formula into this:

Possessions=FGA*(eFG%+(1-eFG%)*(1-OR%))+.475*FTA+(TO%*Poss)

5) Since we don't have FTA or FGA, we need to use FTR to get rid of one of these two variables to solve for the other. Let's try FGA.

Since FTR=FTA/FGA, FGA=FTA/FTR

This gives us:

Possessions=FTA/FTR*(eFG%+(1-eFG%)*(1-OR%))+.475*FTA+(TO%*Poss)

6) Factor it and use algebra!

Possessions-(TO%*Poss)=FTA*((1/FTR)*(eFG%+(1-eFG%)*(1-OR%))+.475)
=>
(Possessions-(TO%*Poss))/((1/FTR)*(eFG%+(1-eFG%)*(1-OR%))+.475)=FTA

7) Hooray!
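If you don't trust the algebra, here is a quick Python check of it, offered as a sketch only. It builds possessions from a box score using premise 3, then runs the solved equation from step 6 and confirms the same FTA comes back out. The box score numbers are made up, the function names are hypothetical, and it uses plain FG% so the round trip is exact.

```python
def possessions(fga, fta, turnovers, fg_pct, or_rate):
    """Premise 3: possessions ended by field goal attempts, free throws, turnovers."""
    return fga * (fg_pct + (1 - fg_pct) * (1 - or_rate)) + 0.475 * fta + turnovers


def solve_fta(poss, to_rate, fg_pct, or_rate, ftr):
    """Step 6: the same equation solved for FTA, using FGA = FTA / FTR."""
    share = fg_pct + (1 - fg_pct) * (1 - or_rate)
    return (poss - to_rate * poss) / ((1 / ftr) * share + 0.475)


# Made-up box score, used only to check the algebra.
fga, fta, turnovers, fg_pct, or_rate = 60.0, 24.0, 13.0, 0.45, 0.33
poss = possessions(fga, fta, turnovers, fg_pct, or_rate)
recovered = solve_fta(poss, turnovers / poss, fg_pct, or_rate, fta / fga)
print(round(recovered, 6))  # 24.0
```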



Now, why is this important??
For me, it helps predict the final score MORE accurately. Kenpom only keeps adjusted stats for offensive and defensive efficiencies, and those might not be a very accurate representation of how a team works. For example, if a team REALLY relies on not fouling (like UConn did this season) to make up for a stinky factor (like UConn's not forcing turnovers), it is more likely to fare poorly against good teams that are good at drawing fouls (like Georgetown, Syracuse, and Michigan St.).

And so we move forward!

Fixing Kenpom.....

This year, Kenpom.com's stats weren't as accurate as I had hoped in predicting the tournament....

But fortunately, I have come up with a stat that fixes some of the errors....
The biggest two problems I found were:
1) that Memphis' best defenses (and Gonzaga's best offenses) came from absolutely crushing terrible teams. Beating Poop St. by 9000 or my Cat by 2390209 isn't exactly an important stat, kenpom.
2) Your Pyth is only adjusted by schedule for your opponents, and not your opponents' opponents. In this way, beating Gonzaga is worth more than beating North Carolina........

But I came up with a fairly good stat for #2! I used Kenpom's derivation for adjusting averages and added a step to it to add opponents' opponents. The new stat is:

Adjusted Offensive Efficiency=Raw Offensive Efficiency * Opponents' Opponents' Raw Offensive Efficiency / Opponents' Raw Defensive Efficiency

similarly,

Adjusted Defensive Efficiency=Raw Defensive Efficiency * Opponents' Opponents' Raw Defensive Efficiency / Opponents' Raw Offensive Efficiency



This is important because: if (for offense) you only adjust to your opponents' raw defensive efficiency, you assume that their raw defensive efficiency is the best measure of how they really play!
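A rough Python sketch of the two adjusted-efficiency formulas above. The formulas don't say how to combine a whole schedule of opponents into one number, so this assumes a plain average over the opponents you pass in; the function names and the sample figures are made up for illustration.

```python
from statistics import mean


def adjusted_offense(raw_off, opp_raw_def, opp_opp_raw_off):
    """Raw offense, scaled by opponents' opponents' offense over opponents' defense.

    Assumption: each list is averaged with a simple, unweighted mean.
    """
    return raw_off * mean(opp_opp_raw_off) / mean(opp_raw_def)


def adjusted_defense(raw_def, opp_raw_off, opp_opp_raw_def):
    """Mirror image of adjusted_offense for the defensive side."""
    return raw_def * mean(opp_opp_raw_def) / mean(opp_raw_off)


# Made-up efficiencies (points per 100 possessions) for a two-game schedule.
print(adjusted_offense(110.0, [98.0, 102.0], [104.0, 106.0]))
print(adjusted_defense(95.0, [101.0, 99.0], [97.0, 103.0]))
```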

My prediction

mean prediction:

Carolina by 7

74 possessions

Team             eFG%    TO%     OR%     FTR
North Carolina   50.94   17.07   32.37   42.68
Michigan St.     48.26   20.69   37.03   32.48



North Carolina - 77% from FT
Michigan St - 71% from FT





In order for Michigan State to win (by two)

71 possessions

Team             eFG%    TO%     OR%     FTR
North Carolina   48.9    17.75   31.08   40.97
Michigan St.     50.19   19.86   38.51   33.78


North Carolina - 71% from FT
Michigan St. - 71% from FT

The Probability of Having a Good Run

probability of a good run

(((1-to%)*(1-nofga%)*fg%+(1-to%)*(1-nofga%)*(1-fg%)*or%*fg%)*2*eFG%/100)^3

Not predicting the unpredictable.

I do not have any magical power that will tell you how the unpredictable things in this tournament will play out. I can't tell you what things outside of the norm are going to happen.

But I can tell you some things to watch out for.
One of those is consistency.

Consistency shows how often a team plays as expected. More accurately, you might call it how well Ken Pomeroy's stats (adjusted by me) can predict a team.
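For the curious, here is one way the consistency number could be computed in Python. This is a sketch under an assumption: it treats "standard deviation from expected point margin" as the root-mean-square gap between actual and expected margins, game by game. A standard deviation around the team's average miss would be another reasonable reading. The function name and sample margins are made up.

```python
import math


def consistency(actual_margins, expected_margins):
    """Root-mean-square gap between actual and expected point margins.

    Bigger means the team strays further from what the ratings expect.
    """
    misses = [a - e for a, e in zip(actual_margins, expected_margins)]
    return math.sqrt(sum(m * m for m in misses) / len(misses))


# Made-up results for a four-game stretch.
print(consistency([12, -3, 20, 1], [8, 4, 6, 5]))
```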



So here are the top 25 teams you shouldn't be surprised by when they do something crazy different than their ranking presupposes on Kenpom.com

(16 and 15 seeds not included)






Rank  Team              Standard deviation from expected point margin
1     Cornell           12.96
2     Western Kentucky  12.59
3     Louisville        12.55
4     Michigan St.      12.41
5     North Carolina    12.31
6     Louisiana St.     12.14
7     Arizona St.       12.11
8     West Virginia     11.94
9     Clemson           11.74
10    Marquette         11.56
11    Gonzaga           11.55
12    Tennessee         11.35
13    Kansas            11.28
14    Akron             11.27
15    Maryland          10.98
16    Wake Forest       10.74
17    Michigan          10.73
18    Missouri          10.63
19    Northern Iowa     10.34
20    Boston College    10.28
21    Syracuse          10.26
22    Brigham Young     10.22
23    Mississippi St.   10.18
24    UCLA              10.17
25    Illinois          10.08

Some things to note.

I ran about 5,000 simulations of the bracket today in Excel, to find who was most likely to win the championship (the math was a little too complicated to enter in, so I just made a simulation program).
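The simulation itself lived in Excel, so what follows is only a Python sketch of the same idea: play the bracket many times, flip a weighted coin for each game, and count how often each team wins it all. The function names, the four-team field, and the toy win-probability model are all made up for illustration and are not the ratings used on this blog.

```python
import random
from collections import Counter


def play_bracket(teams, win_prob, rng):
    """Play one single-elimination bracket and return the champion.

    teams is in bracket order (neighbors meet first) and its length
    must be a power of two.
    """
    field = list(teams)
    while len(field) > 1:
        field = [a if rng.random() < win_prob(a, b) else b
                 for a, b in zip(field[0::2], field[1::2])]
    return field[0]


def title_odds(teams, win_prob, n_sims=5000, seed=0):
    """Share of simulated brackets won by each team."""
    rng = random.Random(seed)
    wins = Counter(play_bracket(teams, win_prob, rng) for _ in range(n_sims))
    return {team: wins[team] / n_sims for team in teams}


# Hypothetical strengths for a four-team field, for illustration only.
strength = {"A": 4.0, "B": 1.0, "C": 2.0, "D": 2.0}


def toy_win_prob(a, b):
    # Placeholder model, not the blog's: chance that a beats b.
    return strength[a] / (strength[a] + strength[b])


odds = title_odds(list(strength), toy_win_prob)
print(max(odds, key=odds.get))
```

Swapping in a real win-probability function and the full 64-team list in bracket order is all it would take to scale this up.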

Here is a short list of some important things to know: http://lazydrumhead.googlepages.com/simresults.html


And here are some other things to note:
1) North Carolina's bracket is ridiculously difficult (Gonzaga and Arizona State are underseeded, and Oklahoma also poses a big threat).
2) I guess you could say that Connecticut's bracket is ridiculously difficult, because Memphis is the overwhelming favorite to win the NC in general.

Here are my top-10 predicted closest games:


1. Marquette v. Missouri - 2nd round
2. ASU v. Oklahoma - Sweet 16
3. Syracuse v. ASU - 2nd round
4. Kansas v. WVU - 2nd round
5. FSU v. Wisconsin - 1st round
6. Louisville v. WVU - elite 8
7. Tennessee v. Oklahoma St. - 1st round
8. Memphis v. Pittsburgh - championship game
9. Washington v. Purdue - 2nd round
10. Gonzaga v. North Carolina - sweet 16

Info on The Bracket-Maker

In order to use the bracket-maker, you must have Microsoft Excel or OpenOffice Calc.
To purchase it, click the "Buy" button up at the top of this blog, and you'll receive an email shortly with download instructions.

Here's a quick rundown of how the bracket-maker works:

The day has come!

My random-bracket generator is complete!

Five bucks!


Ok, now here's the picture you need to see:


Oh my goodness....

Joe Lunardi is picking the FIFTH-best conference to get the most NCAA tourney bids.

• Big Ten (7): Michigan State, Illinois, Purdue, Wisconsin, Minnesota, Michigan, Ohio State
• Big East (6): Pittsburgh, Connecticut, Villanova, Syracuse, Marquette, West Virginia
• ACC (6): North Carolina, Duke, Wake Forest, Clemson, Florida State, Boston College

And Penn State is only four teams off the bubble.


The Big Ten qualifiers just have a ton of losses and some random wins against BETTER teams (UCLA, Duke, Louisville).

Overrated/Underrated, continued.

I've been reading up on ESPN's "Giant Killer" stuff (I am using my 30-day trial for Insider) and they make some pretty good points. So I'll just auction off this information:

According to ESPN:
Florida State, Butler, and Xavier all have great potential to fall far.

and my, my, look who's predicted to knock off some giants:
San Diego St., VCU, Cleveland State

VCU, as we all know, owned Duke a couple years back. And Cleveland State shocked the world (barely) this week as they clinched an automatic NCAA tournament bid by winning the Horizon League tournament over highly-touted Butler.


While this is all well and good, none of these 3 giant killers are above the 50% stat that they often require to kill giants. So I'm just gonna delve into some more over and underrated territory.

I know I should be patient and wait for the brackets, but some important things are here:

Overrated:
Florida State:
Likely Seed: 5
Likely Opponents' Seeds: 12, 4
Kenpom Rating (Seed): #37 (10)

LSU:
Likely Seed: 6
Likely Opponents' Seeds: 11, 3
Kenpom Rating (Seed): #40 (10)

Utah St:
Likely Seed: 10
Likely Opponents' Seeds: 7
Kenpom Rating (Seed): #59 (15)

Dayton:
Likely Seed: 9
Likely Opponents' Seeds: 8
Kenpom Rating (Seed): #83 (21 aka 16)

Siena:
Likely Seed: 9
Likely Opponents' Seeds: 8
Kenpom Rating (Seed): #63 (16)

Underrated:

West Virginia (still):
Likely Seed: 6
Likely Opponents' Seeds: 6,3,2
Kenpom Rating (Seed): #8 (2)


Gonzaga:
Likely Seed: 4
Likely Opponents' Seeds: 5, 1, 2
Kenpom Rating (Seed): #4 (1)

Purdue:
Likely Seed: 5
Likely Opponents' Seeds: 12, 4, 1
Kenpom Rating (Seed): #16 (4)

San Diego St.:
Likely Seed: 11
Likely Opponents' Seeds: 6, 3
Kenpom Rating (Seed): #32 (8)
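A side note on the "Kenpom Rating (Seed)" figures in these lists: every pair shown is consistent with dividing the Kenpom rank by four and rounding up (four teams per seed line). That rule is inferred from the numbers here rather than spelled out anywhere, so treat this Python sketch, and its function names, as an assumption.

```python
import math


def kenpom_seed(rank):
    """Seed line a team would get if seeds were handed out by Kenpom rank.

    Assumption: four teams per seed line, so rank 1-4 is a 1 seed,
    rank 5-8 is a 2 seed, and so on. Not capped at 16.
    """
    return math.ceil(rank / 4)


def seed_gap(rank, likely_seed):
    """Kenpom seed minus likely seed; negative means underseeded."""
    return kenpom_seed(rank) - likely_seed


print(kenpom_seed(37), seed_gap(37, 5))  # Florida State: 10, 5
print(kenpom_seed(8), seed_gap(8, 6))    # West Virginia: 2, -4
```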

wahoo!

my bracket-maker is pretty much ready!


once selection sunday is over, I can put in the data and have it spit out random brackets!

Cool features:

1) I don't strictly use Kenpom's data. The data automatically adjusts here for teams I think he overrates or underrates, according to the regression line of their over/underperformance.
2) You can tell my program how random you want it to be. At 100% randomness, a team's chance of moving on is exactly equal to my predicted chance of them winning. For example, if Providence were playing Louisville, my program gives Providence a 28% chance of moving on. If Providence gets 'lucky,' they move on in the bracket.

But at 0% randomness, the better team always wins. Or, the team my system predicts to win, always moves on to the next round.
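Here is a small Python sketch of how a randomness dial like this could work. The two endpoints come straight from the description above; what happens in between is an assumption (a straight-line blend between the predicted chance and "the favorite always wins"), and the function names are made up.

```python
import random


def advance_prob(p_win, randomness):
    """Chance a team advances, for a randomness setting between 0 and 1."""
    # At 0% randomness the predicted favorite always moves on.
    sure_thing = 1.0 if p_win >= 0.5 else 0.0
    # Assumption: blend linearly between the two extremes.
    return randomness * p_win + (1 - randomness) * sure_thing


def advances(p_win, randomness, rng=random):
    """Roll the dice once for a single game."""
    return rng.random() < advance_prob(p_win, randomness)


# Providence's 28% chance against Louisville at three dial settings.
for setting in (1.0, 0.5, 0.0):
    print(setting, advance_prob(0.28, setting))
```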

I'll be selling this thing for $5 via Paypal, and for free if you help me collect the data! :)

Weighted Goodness

Here's a pretty rough ranking of mine for the top teams in the country, based on the best 3 predictors of NCAA tournament performance: Adjusted Scoring Margin, the LRMC, and Kenpom.com

I can actually use Kenpom stats to generate a much more accurate Adjusted Scoring Margin but I decided to use ESPN insiders'


Anyways, here's Nathan's top 25 teams.

LAZY

Here are the top 10 teams from Kenpom.com, sorted by Provable Laziness

The definition of lazy here is:

"A lazy team stops playing defense when their offense is doing well, and vice versa."



Top 10 Kenpom Teams, Sorted by Laziness




(Correlation and slope of Actual-Expected Defensive efficiency and Actual-Expected Offensive efficiency)

(higher is lazier)





Provable Laziness Factor (slope x R value)
1 Pittsburgh 0.39
2 Connecticut 0.17
3 Memphis 0.12
4 Duke 0.11
5 UCLA 0.1
6 Kansas 0.02
7 West Virginia 0.02
8 North Carolina 0.01
9 Gonzaga 0
10 Louisville 0

Ups and Downs...

The under and overseeded.


Here's how I figger:
How good they really are: http://kenpom.com


I just subtract their actual predicted seeding from their 'kenpom seeding'.

Underseeded:

West Virginia:
Kenpom 'seed': 2
Predicted seed: 7
Seed difference: -5

Gonzaga:
Kenpom 'seed': 2
Predicted seed: 6
Seed difference: -4

Michigan:
Kenpom 'seed': 15
Predicted seed: 12
Seed difference: -5

Brigham Young:
Kenpom 'seed': 3
Predicted seed: 8
Seed difference: -5


Overseeded:

Oklahoma:
Kenpom 'seed': 4
Predicted seed: 1
Seed difference: 3

LSU:
Kenpom 'seed': 10
Predicted seed: 6
Seed difference: 4

Dayton:
Kenpom 'seed': 21 (!)
Predicted seed: 9
Seed difference: 12

Fifteen.

Fifteen = The number of teams predicted to finish within 6 points of Memphis (the #1 Kenpom team).

Fourteen = The number of teams predicted to finish within 6 points of, or better than, North Carolina (the #2 Kenpom team).



Carolina is my favorite to win it all.

how often does a team do better after doing worse, and vice versa

yes, I have made a stat for this (% of sinusoidality)

for the top 5 teams

Memphis - 76%
North Carolina - 79%
Pittsburgh - 63%
Connecticut - 64%
Duke - 67%

Carolina is the most apt to perform worse after performing well. Etc.
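The exact recipe for the sinusoidality stat isn't written down here, so the Python sketch below is just one possible reading: take a team's game-by-game performance against expectation, and count how often an improvement is followed by a decline or the other way around. The function name, the handling of flat stretches, and the sample numbers are all assumptions.

```python
def sinusoidality(performances):
    """Share of game-to-game changes that reverse the previous change.

    performances is a list of per-game numbers (for example actual minus
    expected margin), in schedule order. Needs at least three games.
    """
    if len(performances) < 3:
        raise ValueError("need at least three games")
    changes = [b - a for a, b in zip(performances, performances[1:])]
    # A flip is an up followed by a down, or a down followed by an up.
    # Assumption: a flat stretch (change of zero) never counts as a flip.
    flips = sum(1 for x, y in zip(changes, changes[1:]) if x * y < 0)
    return flips / (len(changes) - 1)


# Made-up run of games: up, down, up, up, down.
print(sinusoidality([2.0, 6.0, -1.0, 3.0, 8.0, 0.0]))  # 0.75
```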


Upset Watch!

"Upset" watch for today.
(Games that Kenpom predicts to be the opposite of what the polls say---just click any team to see the predicted score of their future games).

-Unranked Davidson over #22 Butler, 68-66
-Unranked Georgetown over #11, 73-68
-Unranked USC over #19 Washington, 72-71
-Unranked Texas over #2 Oklahoma, 73-72 (hehe!!)
-Unranked St. Marys over #23 Utah St, 67-65


Let's go Texas!
