For all your fancy-pants statistical needs.

Praise for The Basketball Distribution:

"...confusing." - CBS
"...quite the pun master." - ESPN

The One-Seeds

I told my friend Stephen that Kentucky will not be a 1-seed come tournament time.

That was a pretty dumb thing to say.

I picked the top few teams that I thought might make #1 seeds and ran some analysis on their stats from Kenpom.com.

Anyways, here's my #1 seed bracketology: http://spreadsheets.google.com/pub?key=tdf4HIaf_vWtQhoNCxjHLdQ&single=true&gid=0&output=html

Time Left on Shot Clock

By doing some simple multiplication and division of stats from kenpom.com, we can estimate the mean/median/expected number of seconds left on the shot clock when a team's possession will end.


I expect a high standard deviation of this number for most teams, but it is interesting to look at.

Here are the results (Internet Explorer might be required; hopefully not).

Adjusted Player Offensive Ratings

I adjusted Ken Pomeroy's 100 most efficient college players (with a minimum of 40% minutes played) for opponents' quality of defense.

The results are here.

(EDIT: the Usage% represents how many of a team's possessions a player ends up 'using' via shots, turnovers, etc. Players under 20% are below average in usage. I will soon adjust only those who are above the 20% mark.)

Texas v. UNC

Ken Pomeroy's stats predict Carolina to lose to Texas by 20 points. Here's the basic info you need to know about these two teams:

1) Texas' point margin vs. predicted has a standard deviation of about 8.07 points
2) North Carolina's point margin vs. predicted has a standard deviation of about 9.86

This gives us an average of 8.97 for the two teams, which means that, according to the normal distribution, there is:

- a 68.2% chance that Carolina's final margin is between -11 and -29
- a 95.4% chance that Carolina's final margin is between -2 and -38

Simply by using standard deviations, Carolina has a 1.29% chance of winning, less than Ken Pomeroy's estimate (using the Log5 method) of 5%.
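The arithmetic above can be checked in a few lines of Python. This is only a sketch using the numbers quoted in this post; the variable names are illustrative, and it assumes (as the post does) that the final margin is normally distributed around the predicted margin with the averaged standard deviation.

```python
from statistics import NormalDist

predicted_margin = -20.0        # Carolina's predicted margin against Texas
sd = (8.07 + 9.86) / 2          # average of the two teams' standard deviations

margin = NormalDist(mu=predicted_margin, sigma=sd)

# One- and two-standard-deviation ranges for Carolina's final margin
one_sd = (predicted_margin - sd, predicted_margin + sd)
two_sd = (predicted_margin - 2 * sd, predicted_margin + 2 * sd)

# Carolina wins if the final margin ends up above zero
p_unc_win = 1 - margin.cdf(0)

print(f"68% range: {one_sd[0]:.0f} to {one_sd[1]:.0f}")
print(f"95% range: {two_sd[0]:.0f} to {two_sd[1]:.0f}")
print(f"chance Carolina wins: {p_unc_win:.2%}")
```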

Nathan's Statistical Rankings

Here is a link to my statistical rating of college basketball teams, according to my best possible model given the stats I currently have (which is similar in nature to the LRMC model, and similar in appearance to Sagarin ratings).


http://tinyurl.com/nathansrankings

Hopefully in January I will have a model adjusted including diminishing returns, consistency, and 'game point margin' which accurately reflects the 'real score' of a game, rather than one that was altered in the last 30 seconds to a game-insignificant-degree. (To do this, we will use Bill James' "time statistically over" stat from Statsheet.com).

UNC's terrible 2nd halves

Carolina is beating their opponents by .36 points per possession in the first half of their games.
But in the second half, they average -.04 points per possession.

Not good!

Oliver-Adjusted PlusMinus

Rather than using the standard model for plus-minus, I am going to start using one of my own, which I will test on data I get on the homeschool girls team I do stats on.

Currently, each player has their own value, which I call 'net': (Points Produced - Points 'Allowed') / Possessions Played In, based on their Dean Oliver ORTG and DRTG stats.

I don't strictly use the DRTG version points 'allowed,' which is =(1-stop%)*DptsPerScPoss*Dposs.
Dean Oliver admits the weaknesses of his formula, so I average this with the stats on "Defensive points allowed" that the girls' team keeps, based on how many points they were the primary reason for allowing (before converting this number to a per 100 possession number).

This "net" stat is the one I primarily use to evaluate players in practices, but in games I also use the good ol' plus-minus stat. For my purposes, I do (PlusMinus on court / possessions played) - (PlusMinus off court / possessions not played) to give a sort of "teamwork" value. The plus-minus stat is prone to lots of error on its own. Especially given how I measure it, I expect a lot of error for players who play very low or very high minutes. That is to say, those who play closer to 0% or 100% of the minutes than to 50% have fewer samples of EITHER possessions played or possessions not played. Furthermore, plus-minus does not account for improvements or declines in the team that have relatively little to do with the player in question.

Knowing the biases of both of these formulas, I thought I might use them in tandem with on-court offensive efficiency and defensive efficiency (which I actually have for the girls' team now). Furthermore, the coefficients used by people like Dan Rosenbaum in statistical plus-minus are good things to use, but using Dean Oliver's ORTG and DRTG as the modifying numbers in a formula is likely to be much more representative of a player's statistical throughput.

So, without further ado, I would like to present my method for finding the Oliver-Adjusted PlusMinus.

(Of note: I have data on players that simple box scores absolutely do not give, and that would be relatively time-consuming to extract from play-by-play: the players on the court at any given time, and the efficiency that 5-(wo)man group has offensively and defensively.)

All of this data will be extracted on a player-by-player basis from each separate substitution (i.e., each individual 5-man lineup; duplicates must also be taken care of).

First, we must estimate what Kevin Pelton calls the 'fudge factor' (ff) to get a predicted Offensive efficiency. We first sum all the players' on-court Usage% from the game in question; this is the 'fudge factor'. This sum is now our divisor: we now divide each player's game usage % by the fudge factor, which now gives us a sum Usg% of One. Then each player is assigned a predicted points produced, by multiplying their Usg% by their game ORTG.

For each player, we can get a predicted teammates' points produced simply by summing their teammates' ORTG*(usg%/ff)*(possessions played until next substitution). This gives us PREDICTEDteammatePProd, or P.

Then also for each individual player, we subtract their personal predicted points produced (ORTG*(usg%/ff)*possessions played until next substitution) from the actual amount of points produced in the current lineup, to give us an estimated ACTUAL number of teammates' points produced. This gives us eACTUALteammatePtsProduced or, A


By then doing P/A, we have a coefficient that estimates how a player benefits their team offensively, in effect, an adjustment factor for their teammates' ORTG, which we can call OTC. We can do the same with defensive stats (except usage is stuck at the estimated 20%), to get a coefficient for effect on teammates' DRTG, which we can call DTC.
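The offensive steps above can be sketched roughly as follows. This is an illustration, not a finished implementation: the function name and the lineup layout are made up for the example, and it assumes ORTG is expressed in points per 100 possessions (so it is divided by 100 before being multiplied by possessions).

```python
def stint_coefficients(lineup, possessions, actual_points):
    """lineup maps player -> (usage, ortg) for one 5-player stint.

    possessions   : possessions played until the next substitution
    actual_points : points the lineup actually produced in that stint
    """
    # The 'fudge factor' is the sum of the on-court usage rates
    ff = sum(usage for usage, _ in lineup.values())

    # Predicted points produced by each player in this stint
    # (assumption: ORTG is per 100 possessions)
    predicted = {
        name: (ortg / 100) * (usage / ff) * possessions
        for name, (usage, ortg) in lineup.items()
    }

    result = {}
    for name in lineup:
        # P: predicted points produced by the four teammates
        p = sum(v for other, v in predicted.items() if other != name)
        # A: estimated actual teammates' points produced
        a = actual_points - predicted[name]
        # OTC as written in the post (P/A); undefined if A is zero
        result[name] = {"P": p, "A": a, "OTC": p / a}
    return result


# Hypothetical stint: five players, 12 possessions, 14 points scored
example = {
    "PG": (0.24, 108.0),
    "SG": (0.22, 101.0),
    "SF": (0.18, 95.0),
    "PF": (0.20, 112.0),
    "C": (0.21, 99.0),
}
for player, values in stint_coefficients(example, 12, 14.0).items():
    print(player, {k: round(v, 2) for k, v in values.items()})
```

Dividing each usage by the fudge factor is what forces the five shares to sum to one, so the five predicted totals add up to a single lineup prediction.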

I foresee some problems that will come about from this, and both of these will likely have to be adjusted from the obvious future stat Oc and Dc, which is how much a player's teammates affected their own ORTG and DRTG.

After bridging this unfortunately difficult gap, we can then combine a player's personal [(Points Produced / Oc) - (Points Allowed / Dc) + (Average Teammates' Points Produced * OTc) - (Average Teammates' Points Allowed * DTc) ]/(Possessions Played in*100) to get an adjusted PlusMinus.

Hopefully I can get there soon...

!

UNC vs FIU
based on simple stats predicted by Accuscore.com

efficiencies:

FIU: 77
UNC: 113

Theoretically Correct RPI, Part I

The Rating Percentage Index, or 'RPI' is College Basketball's tamer BCS computer system (i.e. - it's unfair). By weighting three simple numbers, it assigns each team a score.

RPI = (.25 * Team's Winning %) + (.5 * Opponents' Winning %) + (.25 * Opponents' Opponents' Win%)

However, a better methodology would be to reverse-engineer Bill James' Log5 method and adjust for a team's schedule.

The simpler (non-adjusted) version looks like this:
WPct = .500 + A - B (http://www.diamond-mind.com/articles/playoff2002.htm) which means: Team's Win% = .5 + Real Win% - Opponents' Real Win%

A little explanation is required here: Bill James' values for A and B are based on how often each team beats teams in general. That is to say, they are the winning percentages a team would have if it had played EVERY team on equal footing (a perfectly adjusted strength of schedule).
The number we want is "Real Win%," so with some algebra, we get:
Rwin%=Twin%-.5+O.Rwin%
where O.Rwin% roughly equals:
O.Rwin%=O.Twin%-.5+O.OTwin%
(which we get by estimating that the "O.OTwin%," or "Opponents' Opponents' Team Winning%," is roughly equal to their "Real" win%).
Therefore, a team's "real" win% roughly equals:
Rwin%=Twin%-.5+(O.Twin%-.5+O.OTwin%)
=Twin%+O.Twin%+O.OTwin%-1
This shows us that a team's win%, opponents' win%, and opponents' opponents' win% are all roughly EQUALLY weighted in figuring out their 'real' value.
So a better simple RPI would be: RPI = Team's Winning % + Opponents' Winning % + Opponents' Opponents' Win% - 1
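To see how the two weightings differ, here is a small sketch that computes both for one team. The function names and the sample winning percentages are made up for illustration; winning percentages are entered as fractions (e.g. .800).

```python
def standard_rpi(win_pct, opp_win_pct, opp_opp_win_pct):
    """The usual RPI weighting: 25% / 50% / 25%."""
    return 0.25 * win_pct + 0.5 * opp_win_pct + 0.25 * opp_opp_win_pct


def simple_real_win_pct(win_pct, opp_win_pct, opp_opp_win_pct):
    """The equally weighted version derived above."""
    return win_pct + opp_win_pct + opp_opp_win_pct - 1


# Hypothetical team: wins 80% of its games against a slightly
# above-average schedule
team = (0.800, 0.550, 0.520)

print(f"standard RPI:   {standard_rpi(*team):.3f}")
print(f"'real' win pct: {simple_real_win_pct(*team):.3f}")
```

Note that a team at exactly .500 whose opponents and opponents' opponents are also at .500 comes out at .500 under both versions.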


In the next post, we will examine the 'normally-adjusted' version.

Fixing the current models...

Ken Pomeroy and the LRMC (logistic regression-markov chain) models for predicting NCAA winners are both incredibly accurate, but they both have their own flaws -- for similar reasons.


Ken Pomeroy's measure eliminates pace, but this takes away a good portion of basketball's psychological strategy of gaining a large lead. For example, at halftime, a team that is up by 10 points could have either been very efficient and slow (which Kenpom would measure as good) or less efficient but very fast - (which Kenpom does not measure as a good thing). While it is not GOOD to be inefficient, the barrier of ten points is still in the way for the losing team, despite the inefficiency. The comeback is still required, and is a psychological barrier.
Furthermore, Ken Pomeroy (using Bill James' pythagorean expectation formula instead of Dean Oliver's formula including standard deviation) asserts that the head-to-head better team should have the higher winning percentage. This is a false assumption even when all anomalies are taken care of. The fact of the matter is that whichever team is most efficient is most likely to win in a competition, but the slower the team is, the more likely they are to lose to a worse team (the smaller the point barrier to overcome, the more likely that the worse team can overcome it). This is indirectly mentioned by Dean Oliver in Basketball on Paper in his section on standard deviation, variance, and covariance, etc.
The LRMC, on the other hand, only uses point margin in its calculations. While this effectively works with the idea of a 'point barrier,' it might not work as well in head-to-head, since, as we know, greater efficiency will always lead to more points, even if the margin is small. Furthermore, teams that are slow might also be ranked low on the LRMC, which would ignore the higher percentage of wins that slow teams get against better teams.

Thirdly, neither of these formulas takes consistency into account.

So to make the best model, we must use one that includes consistency, but combines the idea of a 'point barrier' and efficiency. It must also be two rankings: head to head and overall win%.

Win%=????

I finally bought Dean Oliver's book, "Basketball on Paper" -- and I think I am onto a breakthrough!

Dean Oliver actually uses a statistical formula based on standard deviation to prove the percent chance that a team will win a game! Unfortunately, Dean has not done nearly as much as Ken Pomeroy in terms of figuring out how to adjust statistics for quality of opponents. So currently I am working on a system that turns Kenpom's predictions into chance of win.

Dean Oliver's formula is as follows:

Win%=NormsDist(Point Margin/Standard Deviation of Point Margin)

so if I perhaps used Kenpom's predictions--

Chance of Win%=Normsdist(Predicted Point Margin/Standard Deviation of Actual Minus Predicted Point Margins of both Teams)


This gives us a good number (for example, 2008-2009 North Carolina posts an on-season 96.5%, whereas the Log5 formula puts them at 97.7%) -- but I am still unable to adjust for opponents' inconsistency. Oh well -- it's definitely a step in the right direction!
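Here is a rough sketch of that calculation starting from game logs. The residual lists (actual margin minus Kenpom's predicted margin, one number per game) are invented for the example, and the two teams' standard deviations are combined by simple averaging, which is one possible reading of "of both teams" rather than a settled choice.

```python
from statistics import NormalDist, stdev


def win_chance(predicted_margin, residuals_a, residuals_b):
    """Chance that team A wins, given its predicted margin over team B.

    residuals_a / residuals_b: actual-minus-predicted point margins
    from each team's earlier games.
    """
    # Assumption: average the two teams' standard deviations
    sd = (stdev(residuals_a) + stdev(residuals_b)) / 2
    # Equivalent of Excel's NORMSDIST(predicted margin / sd)
    return NormalDist().cdf(predicted_margin / sd)


# Hypothetical residuals for two teams
team_a = [4.0, -9.0, 12.0, -3.0, 7.0, -11.0]
team_b = [-6.0, 10.0, 2.0, -14.0, 8.0, 1.0]

print(f"{win_chance(6.0, team_a, team_b):.1%}")
```

A predicted margin of zero always comes out to 50%, and the less consistent the two teams are, the closer any prediction gets pulled toward 50%.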

Fixing errors & improving accuracy

Much of my conjecture over my past few posts has been flawed, but at least it's an attempt at growing.

Here, I will lay out the foundations of my current, modified EMA (or Efficiency Margin Added).
This shows each player's increase in the points per possession his team scores on the floor.

First, I gotta define a few things:

ON = PlusMinus (+/-) while a player is on the court
OFF = PlusMinus (+/-) while a player is off the court
Net = The points a player scores, minus the points his man scores
Min%=percent of game a player plays in, or percent of minutes played

Here are two basic estimates of how much one player helps his teammates (which I call TMA, or Teammate Margin Added). This is a factor based on their substitution, i.e. how many points a team stands to gain from a player being in (how good they are on the court minus how good they are off the court).

TMA1=2 x (Teammates' Net While On Court - Teammates' Average Net)
^for this one, we estimate that they add just as much while they are on the court as their team loses while they are off the court

and
TMA2=Teammates' Net While On Court - (4/5 x OFF)
^i.e. their four teammates make up roughly 4/5 of the point margin while a player is off the court



So we get our estimated Teammate Margin Added (eTMA) by averaging these two estimates.

Now, we need to find out what a player's Net is, adjusted for how good his teammates are (adjusted Net, or aNet):

aNet=Net-(1/4) x (All Teammates' Total eTMA)x Min%

The 1/4 multiplier is because each player helps a sum total of four teammates while on the court.

Then, a player's overall Point Margin Added (PMA) simply adds our two estimates:

PMA=eTMA+aNet

and per possession, we calculate EMA as

EMA=PMA/(Team Possessions Played x Min%)
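Chained together, the formulas above look something like this. It is a sketch only: the function and argument names are invented for the example, Min% is entered as a fraction, and the sample numbers are hypothetical.

```python
def efficiency_margin_added(net, min_pct, teammates_net_on, teammates_avg_net,
                            off, teammates_total_etma, team_possessions):
    """Follow the post's formulas from TMA1/TMA2 through to EMA."""
    # Two estimates of Teammate Margin Added
    tma1 = 2 * (teammates_net_on - teammates_avg_net)
    tma2 = teammates_net_on - (4 / 5) * off
    etma = (tma1 + tma2) / 2

    # Net adjusted for how much the teammates help this player
    a_net = net - (1 / 4) * teammates_total_etma * min_pct

    # Point Margin Added, then per possession played
    pma = etma + a_net
    return pma / (team_possessions * min_pct)


# Hypothetical player who plays half the game
ema = efficiency_margin_added(
    net=6, min_pct=0.5, teammates_net_on=4, teammates_avg_net=2,
    off=-5, teammates_total_etma=8, team_possessions=70,
)
print(round(ema, 3))
```

The division at the end means a player with 0% of the minutes has no defined EMA, so bench players who never entered the game need to be skipped.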

Top 25 NBA Players

 

Nathan's Most Efficient Basketball Players

THA/48=Teammate Help Added Every 48 Minutes Played
PHA/48=Personal Help Added Every 48 Minutes Played

#   Team  Player      THA/48  PHA/48  THA+PHA/48
1   MIA   Wade          1.49   23.19       24.68
2   CLE   James         2.00   22.59       24.59
3   BOS   Garnett       1.36   20.76       22.13
4   NOH   Paul          3.98   12.85       16.83
5   PHO   Nash          1.95   14.33       16.28
6   UTA   Kirilenko    -1.26   16.25       14.99
7   LAL   Odom          8.55    4.81       13.36
8   PHO   Stoudemire   -5.81   18.53       12.72
9   PHO   O'Neal       -2.88   15.10       12.22
10  HOU   Yao           2.48    9.19       11.67
11  CHI   Gordon        0.59   10.97       11.56
12  ORL   Howard       -2.23   13.58       11.36
13  CLE   Ilgauskas     6.98    3.80       10.78
14  PHO   Hill          2.52    8.22       10.74
15  DET   Hamilton     -2.79   13.48       10.69
16  LAL   Bryant        0.21    9.99       10.21
17  DET   Wallace       6.69    3.25        9.95
18  BOS   R.Allen       1.99    6.98        8.97
19  CHI   Noah          5.34    3.58        8.92
20  LAL   Bynum        -1.07    9.66        8.59
21  POR   Roy           0.33    8.14        8.47
22  IND   Granger      -1.82   10.29        8.47
23  UTA   Millsap       2.49    5.69        8.19
24  PHI   Iguodala      4.90    3.19        8.09
25  MIL   Sessions     -4.55   12.38        7.83

Point Margin Above Replacement Player, or PMARP, and its adjusted counterpart, EMARP


FM=Final Margin (of your team)
Net=points scored - points scored by your man
Min%=percent of minutes played
+/-=PlusMinus=point margin change while you're on the floor
OFF=point margin change while you're off the floor

2 categories:

1) Teammate Help Added=THA
THA=(1-Min%)x(FM-Net)-(4/5)x(OFF)
TS THA=Team Sum THA=Combined THA of entire team

2) Personal Help Added=PHA
PHA=Net-(TS THA-THA)*Min%

SHA=Sum Help Added=THA+PHA

SHAPP=Sum Help Added Per Possession=(THA+PHA)/(Team Possessions Played x Minute%)

GameAdjusted SHAPP=SHAPP-(Team Efficiency Margin/5)
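Because PHA depends on the whole team's THA, these formulas have to be run for the full roster at once. A rough sketch, with invented names and a made-up two-player "roster" just to keep the example short; it assumes Min% is a fraction and that Team Efficiency Margin is in points per possession.

```python
def help_added(players, final_margin, team_possessions, team_eff_margin):
    """players maps name -> {"net": ..., "min_pct": ..., "off": ...}."""
    # Teammate Help Added for every player
    tha = {
        name: (1 - s["min_pct"]) * (final_margin - s["net"]) - (4 / 5) * s["off"]
        for name, s in players.items()
    }
    ts_tha = sum(tha.values())  # Team Sum THA

    results = {}
    for name, s in players.items():
        pha = s["net"] - (ts_tha - tha[name]) * s["min_pct"]
        sha = tha[name] + pha
        shapp = sha / (team_possessions * s["min_pct"])
        results[name] = {
            "THA": tha[name],
            "PHA": pha,
            "SHA": sha,
            "SHAPP": shapp,
            "GameAdjusted SHAPP": shapp - team_eff_margin / 5,
        }
    return results


# Hypothetical inputs (a real roster would have every player who appeared)
roster = {
    "A": {"net": 4, "min_pct": 0.5, "off": 2},
    "B": {"net": -2, "min_pct": 0.5, "off": 6},
}
for name, row in help_added(roster, final_margin=8, team_possessions=70,
                            team_eff_margin=0.1).items():
    print(name, {k: round(v, 3) for k, v in row.items()})
```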

Player's Efficiency Margin, Explained

Here's the formula, with an explanation of what it is and how it works.



What it is:
This represents how many points a player is worth (how much they boost the point margin by) when they are on the floor for one possession per team (i.e., your team gets the ball, and your opponent's team gets the ball).

How it works:
This is the representation of two measures: the first is how many points a player scores, minus how many points their man scores (The NetTangibles value). This is an obvious good measure for how good a player might be overall (if they can score 20 points, they aren't worth much if they let their man score 50 every game).

The second is the more complicated one: it estimates how much they help their teammates (their intangibles). We take the expected help based on how many points their teammates and man's teammates scored (times the % of minutes the player played) combined with their NetTangibles, and subtract this from how much help the player actually gave (their PlusMinus value). This helps to accurately record how much a player's presence on the court actually benefitted their teammates above when they were off the court. This value will be higher for good point guards, worse for players that turn the ball over a lot, etcetera.

Then we simply estimate how many possessions the player was on the court for (a fairly accurate measure, Google 'estimating possessions played') and divide.

Edit: BUT ALAS, there is one problem with this formula. As a player's minute% reaches 100%, the only value that comes into question is their Net Points (that is to say, their intangibles become zero). So our BEST GUESS for how much a player helps a team when he plays every minute of the game is simply their Actual Plus/Minus (PMa). So our best guess is a weighting between the prior formula and their Actual Plus/Minus, based on their minutes played:


Wahoo!

I've just invented a formula for a Player's Efficiency Margin!



Now I need to get back to work.....

Best NBA players

I haven't had a chance to adjust for tempo, but here are my ten most valuable and ten best NBA players, according to how many points they boost their team's margin by per minute.

The first is simply how their team is overall affected, the second is their points minus their points allowed by whomever they are defending (all per-minute).


here tis:
http://dl.getdropbox.com/u/241759/nbastuff.html

Using the Four Factors (and Pace and FT%) to Estimate Point Margin/Efficiency Margin!

The 'Four Factors' are the keys to success in winning a basketball game. They are, generally put, how often you make your baskets, how often you get offensive rebounds, how often you get to the free throw line, and how often you turn the ball over (similarly, how often you let your opponents do the same things). The better these four areas are, the better your team will be. Kenpom.com lists them for each team on their game plan page (example here).

The total number of possessions, Free Throw Percent, and Field Goal Percent are also needed to calculate the actual score, and therefore are also important to winning a game, but we are only given Possessions from Kenpom's four factors data. (Free Throw Percent can be estimated by each team's average.)

My equations are CLOSE, but require a bit of alteration by viewing the regression lines between actual and predicted numbers....but check out the huge formula!

Here's a quick primer of definitions:

Poss=Possessions
FTR=Free throw Rate (FTA/FGA)
eFG%=Effective Field Goal Percentage
(FG%=Field Goal Percent)
OR%=Offensive Rebound percentage
TO%=Turnover percent
(TO=Turnovers)

The first thing we need to find is Free Throws Attempted! We can estimate it thusly:

FTA=(Poss-TO%*Poss)/((1/FTR)*(eFG%+(1-eFG%)*(1-OR%))+.475)

From this we can get:

FGA=FTA/FTR

I'll explain where this comes from later -- but now we have the estimated number of free throws attempted and field goals attempted.

Then you just calculate the final score in the following way:

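Final Score=FTA*FT%+2*FGA*eFG%

(The factor of 2 is there because eFG% counts a made three as 1.5 made shots, so 2*FGA*eFG% is the number of points scored from the field.)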

After doing this for both teams, you can predict a point margin (or estimate one from a previous game).

To find the efficiency of a team, simply divide the final score by possessions played!
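As a quick sanity check, the estimate can be coded up and run on the four-factor numbers from the "My prediction" post on this blog (74 possessions, North Carolina vs. Michigan St.). This is a sketch with an invented function name; percentages are entered as fractions, the .475 sits outside the (1/FTR) term as in the derivation, and field-goal points are counted as 2*FGA*eFG%.

```python
def estimate_score(poss, efg, to_pct, or_pct, ftr, ft_pct):
    """Estimate FTA, FGA, and final score from the four factors."""
    # Share of field goal attempts that end the possession:
    # makes, plus misses that are not rebounded by the offense
    shot_ending = efg + (1 - efg) * (1 - or_pct)

    fta = (poss - to_pct * poss) / ((1 / ftr) * shot_ending + 0.475)
    fga = fta / ftr
    score = fta * ft_pct + 2 * fga * efg
    return fta, fga, score


unc = estimate_score(74, 0.5094, 0.1707, 0.3237, 0.4268, 0.77)
msu = estimate_score(74, 0.4826, 0.2069, 0.3703, 0.3248, 0.71)

print(f"UNC: {unc[2]:.1f} points, {unc[2] / 74 * 100:.1f} per 100 possessions")
print(f"MSU: {msu[2]:.1f} points, {msu[2] / 74 * 100:.1f} per 100 possessions")
print(f"margin: {unc[2] - msu[2]:.1f}")
```

With these inputs the estimated margin comes out at about six points in Carolina's favor, in the same neighborhood as the "Carolina by 7" mean prediction.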


Now for the really nerdy part: WHY DOES THIS WORK??

Algebra!

No seriously, check it out:

Premises:
1) Possession change (the calculation for total possessions) can only happen in the following ways: when a team gets a defensive rebound, when the ball is turned over, or when the ball goes in the hoop. Also, we estimate that a possession ends on .475 of free throw attempts (47.5 percent of the time).
2) We can represent Field Goal Attempt possession changes in the following way: FGA*(FG%+(1-FG%)*(1-OR%))

That is to say, when you make a shot (FG% means when the ball goes in) and when you miss it (1-FG%) and don't get an offensive rebound, (1-OR%) the other team gets the ball next.
3) We can therefore estimate possessions in the following way:

Possessions=FGA*(FG%+(1-FG%)*(1-OR%))+.475*FTA+TO

4) Unfortunately, the four factors do not offer us FG%, but a close number, eFG%, so we can turn the formula into this:

Possessions=FGA*(eFG%+(1-eFG%)*(1-OR%))+.475*FTA+(TO%*Poss)

5) Since we don't have FTA or FGA, we need to use FTR to get rid of one of these two variables to solve for the other. Let's try FGA.

SINCE FTR=FTA/FGA..........FGA=FTA/FTR

This gives us:

Possessions=FTA/FTR*(eFG%+(1-eFG%)*(1-OR%))+.475*FTA+(TO%*Poss)

6) Factor it and use algebra!

Possessions-(TO%*Poss)=FTA*((1/FTR)*(eFG%+(1-eFG%)*(1-OR%))+.475)
=>
(Possessions-(TO%*Poss))/((1/FTR)*(eFG%+(1-eFG%)*(1-OR%))+.475)=FTA

7) Hooray!



Now, why is this important??
For me, it helps us predict the final score MORE accurately. Kenpom only keeps adjusted stats for offensive and defensive efficiencies; however, this might not be a very accurate representation of how a team works. For example, if one team REALLY relies on not fouling (like UConn did this season) to make up for a stinky factor (like UConn did with not forcing turnovers), they are more likely to fare poorly against good teams that are good at drawing fouls (like Georgetown, Syracuse, and Michigan St.).

And so we move forward!

Fixing Kenpom.....

This year, Kenpom.com's stats weren't as accurate as I had hoped in predicting the tournament....

But fortunately, I have come up with a stat that fixes some of the errors....
The biggest two problems I found were:
1) that Memphis' best defenses (and Gonzaga's best offenses) came from absolutely crushing terrible teams. Beating Poop St. by 9000 or my Cat by 2390209 isn't exactly an important stat, kenpom.
2) Your Pyth is only adjusted by schedule for your opponents, and not your opponents' opponents. In this way, beating Gonzaga is worth more than beating North Carolina........

But I came up with a fairly good stat for #2! I used Kenpom's derivation for adjusting averages and added a step to it to add opponents' opponents. The new stat is:

Adjusted Offensive Efficiency=Raw Offensive Efficiency * Opponents' Opponents' Raw Offensive Efficiency / Opponents' Raw Defensive Efficiency

similarly,

Adjusted Defensive Efficiency=Raw Defensive Efficiency * Opponents' Opponents' Raw Defensive Efficiency / Opponents' Raw Offensive Efficiency



This is important because: if (for offense) you only adjust to your opponents' raw defensive efficiency, you assume that their raw defensive efficiency is the best measure of how they really play!
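In code, the extra step is just one more ratio. The function names and the sample efficiencies (points per 100 possessions) below are made up for illustration, and the opponent numbers are assumed to be schedule averages.

```python
def adjusted_offense(raw_off, opp_raw_def, opp_opp_raw_off):
    """Raw offense, scaled by how good the opponents' defenses look
    once the offenses THEY faced are taken into account."""
    return raw_off * opp_opp_raw_off / opp_raw_def


def adjusted_defense(raw_def, opp_raw_off, opp_opp_raw_def):
    """Same idea for defense (lower is better)."""
    return raw_def * opp_opp_raw_def / opp_raw_off


# Hypothetical team: 110 raw offense against opponents who allow 95,
# but those opponents faced weak offenses (98 on average)
print(round(adjusted_offense(110.0, 95.0, 98.0), 1))
print(round(adjusted_defense(92.0, 104.0, 101.0), 1))
```

If the opponents' opponents are exactly as good as the opponents' raw numbers suggest (the two values are equal), the adjustment cancels out and the raw efficiency is returned unchanged.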

My prediction

mean prediction:

Carolina by 7

74 possessions

Team            eFG%   TO%    OR%    FTR
North Carolina  50.94  17.07  32.37  42.68
Michigan St.    48.26  20.69  37.03  32.48

North Carolina - 77% from FT
Michigan St. - 71% from FT





In order for Michigan State to win (by two)

71 possessions

Team            eFG%   TO%    OR%    FTR
North Carolina  48.9   17.75  31.08  40.97
Michigan St.    50.19  19.86  38.51  33.78

North Carolina - 71% from FT
Michigan St. - 71% from FT

The Probability of Having a Good Run

probability of a good run

(((1-to%)*(1-nofga%)*FG%+(1-to%)*(1-nofga%)*(1-fg%)*or%*fg%)*2*eFG%/100)^3

Not predicting the unpredictable.

I do not have any magical power that will tell you how the unpredictable things in this tournament will play out. I can't tell you what things outside of the norm are going to happen.

But I can tell you some things to watch out for.
One of those is consistency.

Consistency shows how often a team plays as expected. More accurately, you might call it how well Ken Pomeroy's stats (adjusted by me) can predict a team.



So here are the top 25 teams you shouldn't be surprised by when they do something crazy different than their ranking presupposes on Kenpom.com

(16 and 15 seeds not included)






#   Team              Standard deviation from expected point margin
1   Cornell           12.96
2   Western Kentucky  12.59
3   Louisville        12.55
4   Michigan St.      12.41
5   North Carolina    12.31
6   Louisiana St.     12.14
7   Arizona St.       12.11
8   West Virginia     11.94
9   Clemson           11.74
10  Marquette         11.56
11  Gonzaga           11.55
12  Tennessee         11.35
13  Kansas            11.28
14  Akron             11.27
15  Maryland          10.98
16  Wake Forest       10.74
17  Michigan          10.73
18  Missouri          10.63
19  Northern Iowa     10.34
20  Boston College    10.28
21  Syracuse          10.26
22  Brigham Young     10.22
23  Mississippi St.   10.18
24  UCLA              10.17
25  Illinois          10.08
Some things to note.

I ran about 5,000 simulations of the bracket today in Excel, to find who was most likely to win the championship (the math was a little too complicated to enter in, so I just made a simulation program).

Here is a short list of some important things to know: http://lazydrumhead.googlepages.com/simresults.html
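For anyone curious what that kind of simulation looks like outside of Excel, here is a toy version in Python. Everything in it is hypothetical: the four placeholder teams, their ratings (expected point margin against an average team), and the 11-point standard deviation. It only illustrates the idea of replaying a bracket thousands of times and counting champions.

```python
import random
from collections import Counter
from statistics import NormalDist


def win_prob(rating_a, rating_b, sd=11.0):
    """Chance that A beats B, from the rating gap and a game-to-game SD."""
    return NormalDist().cdf((rating_a - rating_b) / sd)


def play_bracket(teams, ratings, rng):
    """Play one single-elimination bracket; teams are listed in bracket
    order and the field size must be a power of two."""
    field = list(teams)
    while len(field) > 1:
        next_round = []
        for a, b in zip(field[0::2], field[1::2]):
            a_wins = rng.random() < win_prob(ratings[a], ratings[b])
            next_round.append(a if a_wins else b)
        field = next_round
    return field[0]


def simulate(teams, ratings, n=5000, seed=0):
    """Count how often each team wins the title over n simulated brackets."""
    rng = random.Random(seed)
    return Counter(play_bracket(teams, ratings, rng) for _ in range(n))


teams = ["Team A", "Team B", "Team C", "Team D"]
ratings = {"Team A": 20.0, "Team B": 0.0, "Team C": 5.0, "Team D": 3.0}

for team, titles in simulate(teams, ratings).most_common():
    print(f"{team}: {titles / 5000:.1%}")
```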


And here are some other things to note:
1) North Carolina's bracket is ridiculously difficult (Gonzaga and Arizona State are underseeded, and Oklahoma also poses a big threat).
2) I guess you could say that Connecticut's bracket is ridiculously difficult, because Memphis is the overwhelming favorite to win the NC in general.

Here are my top-10 predicted closest games:


1. Marquette v. Missouri - 2nd round
2. ASU v. Oklahoma - Sweet 16
3. Syracuse v. ASU - 2nd round
4. Kansas v. WVU - 2nd round
5. FSU v. Wisconsin - 1st round
6. Louisville v. WVU - elite 8
7. Tennessee v. Oklahoma St. - 1st round
8. Memphis v. Pittsburgh - championship game
9. Washington v. Purdue - 2nd round
10. Gonzaga v. North Carolina - sweet 16

Info on The Bracket-Maker

In order to use the bracketmaker, you must have Microsoft Excel or Openoffice Calc.
To purchase it, click the "Buy" button up at the top of this blog, and you'll receive an email shortly with download instructions.

Here's a quick rundown of how the bracket-maker works:

The day has come!

My random-bracket generator is complete!

Five bucks!


Ok, now here's the picture you need to see:


Oh my goodness....

Joe Lunardi is picking the FIFTH-best conference to get the most NCAA tourney bids.

• Big Ten (7): Michigan State, Illinois, Purdue, Wisconsin, Minnesota, Michigan, Ohio State
• Big East (6): Pittsburgh, Connecticut, Villanova, Syracuse, Marquette, West Virginia
• ACC (6): North Carolina, Duke, Wake Forest, Clemson, Florida State, Boston College

And Penn State is only four teams off the bubble.


The Big Ten qualifiers just have a ton of losses and some random wins against BETTER teams (UCLA, Duke, Louisville).

Overrated/Underrated, continued.

I've been reading up on ESPN's "Giant Killer" stuff (I am using my 30-day trial for Insider) and they make some pretty good points. So I'll just auction off this information:

According to ESPN:
Florida State, Butler, and Xavier all have great potential to fall far.

and my, my, look who's predicted to knock off some giants:
San Diego St., VCU, Cleveland State

VCU, as we all know, owned Duke a couple years back. And Cleveland State shocked the world (barely) this week as they clinched an automatic NCAA tournament bid by winning the Horizon League tournament over highly touted Butler.


While this is all well and good, none of these 3 giant killers are above the 50% stat that they often require to kill giants. So I'm just gonna delve into some more over and underrated territory.

I know I should be patient and wait for the brackets, but some important things are here:

Overrated:
Florida State:
Likely Seed: 5
Likely Opponents' Seeds: 12, 4
Kenpom Rating (Seed): #37 (10)

LSU:
Likely Seed: 6
Likely Opponents' Seeds: 11, 3
Kenpom Rating (Seed): #40 (10)

Utah St:
Likely Seed: 10
Likely Opponents' Seeds: 7
Kenpom Rating (Seed): #59 (15)

Dayton:
Likely Seed: 9
Likely Opponents' Seeds: 8
Kenpom Rating (Seed): #83 (21 aka 16)

Siena:
Likely Seed: 9
Likely Opponents' Seeds: 8
Kenpom Rating (Seed): #63 (16)

Underrated:

West Virginia (still):
Likely Seed: 6
Likely Opponents' Seeds: 6,3,2
Kenpom Rating (Seed): #8 (2)


Gonzaga:
Likely Seed: 4
Likely Opponents' Seeds: 5, 1, 2
Kenpom Rating (Seed): #4 (1)

Purdue:
Likely Seed: 5
Likely Opponents' Seeds: 12, 4, 1
Kenpom Rating (Seed): #16 (4)

San Diego St.:
Likely Seed: 11
Likely Opponents' Seeds: 6, 3
Kenpom Rating (Seed): #32 (8)

wahoo!

my bracket-maker is pretty much ready!


Once Selection Sunday is over, I can put in the data and have it spit out random brackets!

Cool features:

1) I don't strictly use Kenpom's data. The data automatically adjusts here for teams I think he overrates or underrates, according to the regression line of their over/underperformance.
2) You can tell my program how random you want it to be. At 100% randomness, a team's chance of moving on is exactly equal to my predicted chance of them winning. For example, if Providence were playing Louisville, my program gives Providence a 28% chance of moving on. If Providence gets 'lucky,' they move on in the bracket.

But at 0% randomness, the better team always wins. Or, the team my system predicts to win, always moves on to the next round.
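One simple way to get that behavior is to slide the favorite's chance of advancing between the predicted probability and 100%. The sketch below assumes a straight-line blend between the two ends, which is only a guess at how the settings in between could work; the function names are invented, and the 72% for Louisville is just the flip side of the 28% example above.

```python
import random


def advance_probability(p_favorite, randomness):
    """Chance that the predicted winner advances.

    p_favorite : predicted win probability of the favorite (0.5 to 1.0)
    randomness : 1.0 = use the prediction as-is, 0.0 = favorite always wins
    """
    # Assumption: linear blend between the two extremes
    return 1.0 - randomness * (1.0 - p_favorite)


def pick_winner(favorite, underdog, p_favorite, randomness, rng=random):
    """Pick who moves on to the next round."""
    if rng.random() < advance_probability(p_favorite, randomness):
        return favorite
    return underdog


# Providence has a 28% chance against Louisville
for setting in (1.0, 0.5, 0.0):
    p = advance_probability(0.72, setting)
    print(f"randomness {setting:.0%}: Louisville advances {p:.0%} of the time")
```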

I'll be selling this thing for $5 via Paypal, and for free if you help me collect the data! :)

Weighted Goodness

Here's a pretty rough ranking of mine for the top teams in the country, based on the best 3 predictors of NCAA tournament performance: Adjusted Scoring Margin, the LRMC, and Kenpom.com

I can actually use Kenpom stats to generate a much more accurate Adjusted Scoring Margin but I decided to use ESPN insiders'


Anyways, here's Nathan's top 25 teams.

LAZY

Here are the top 10 teams from Kenpom.com, sorted by Provable Laziness

The definition of lazy here is:

"A lazy team stops playing defense when their offense is doing well, and vice versa."



Top 10 Kenpom Teams, Sorted by Laziness




(Correlation and slope of Actual-Expected Defensive efficiency and Actual-Expected Offensive efficiency)

(higher is lazier)





Provable Laziness Factor (slope x R value)
1 Pittsburgh 0.39
2 Connecticut 0.17
3 Memphis 0.12
4 Duke 0.11
5 UCLA 0.1
6 Kansas 0.02
7 West Virginia 0.02
8 North Carolina 0.01
9 Gonzaga 0
10 Louisville 0
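For reference, a slope-times-R factor like this can be computed from game-by-game residuals as follows. This is a sketch under assumptions: the function name is invented, the defensive residual (points allowed above expected) is regressed on the offensive residual, and the sample games are made up.

```python
from math import sqrt


def laziness_factor(off_resid, def_resid):
    """Slope x correlation between offensive and defensive residuals.

    off_resid : actual minus expected offensive efficiency, per game
    def_resid : actual minus expected defensive efficiency, per game
                (positive = allowed more than expected)
    """
    n = len(off_resid)
    mean_x = sum(off_resid) / n
    mean_y = sum(def_resid) / n

    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(off_resid, def_resid))
    sxx = sum((x - mean_x) ** 2 for x in off_resid)
    syy = sum((y - mean_y) ** 2 for y in def_resid)

    slope = sxy / sxx            # least-squares slope of defense on offense
    r = sxy / sqrt(sxx * syy)    # Pearson correlation
    return slope * r


# Hypothetical team whose defense slips when its offense is hot
offense = [8.0, -3.0, 12.0, -6.0, 2.0, 5.0]
defense = [5.0, -1.0, 6.0, -4.0, 3.0, 0.0]
print(round(laziness_factor(offense, defense), 2))
```

Since the slope and the correlation always share a sign, their product can never be negative, so a factor near zero covers both "no relationship" and a weak relationship in either direction.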

Ups and Downs...

The under and overseeded.


Here's how I figger:
How good they really are: http://kenpom.com


I just subtract their actual predicted seeding from their 'kenpom seeding.'

Underseeded:

West Virginia:
Kenpom 'seed': 2
Predicted seed: 7
Seed difference: -5

Gonzaga:
Kenpom 'seed': 2
Predicted seed: 6
Seed difference: -4

Michigan:
Kenpom 'seed': 15
Predicted seed: 12
Seed difference: -5

Brigham Young:
Kenpom 'seed': 3
Predicted seed: 8
Seed difference: -5


Overseeded:

Oklahoma:
Kenpom 'seed': 4
Predicted seed: 1
Seed difference: 3

LSU:
Kenpom 'seed': 10
Predicted seed: 6
Seed difference: 4

Dayton:
Kenpom 'seed': 21 (!)
Predicted seed: 9
Seed difference: 12
