For all your fancy-pants statistical needs.

Praise for The Basketball Distribution:

"...confusing." - CBS
"...quite the pun master." - ESPN

ELRS

So....some people say the 3-point shot is inconsistent. I tend to agree, at least a little bit.

So I have come up with a rating system that shows how effectively a team plays the low-risk 2 pointer.

Check it out here.

Selection Sunday Predictions....

I know this is a long ways off, but I enjoy using kenpom.com's expected record (often vs. realtimerpi.com's rankings) to figure out who might win, and what they'll be seeded. I feel like this year, the Big East will suffer from the ACC's illness of last year: having too tough a conference schedule.

Bubblers: -these will be up in the air-

Kansas St.
Ohio St.
Utah
Georgetown*
Syracuse
Southern Cal

Lucky At-Larges:

Virginia Military
Utah St.
Siena

Snubs:

Notre Dame


Underseeded/Sleepers:

Gonzaga
Georgetown*
West Virginia

Overseeded:

Connecticut (barely)
Oklahoma (barely)
Xavier (barely)
Clemson
Michigan St.



This is very preliminary, and humorously far-removed from March, but I would like to know how close to this the brackets tourn (pun accidental) out.



*=couldn't decide

WFU v. Duke, Part II: Weaknesses

Duke's Weaknesses:
#1: Shot defense

This may be counter-intuitive, as Duke has one of the best AVERAGES for defensive efficiency in the nation (81.2 points / 100 possessions, #2). However, examine this:


Duke's performance in guarding the shot (or equally, how well their opponents shoot from the floor) has an extremely high correlation (as far as basketball stats go) to their final score. So, the BEST way to overcome Duke's efficiency is through good 3-point and 2-point shooting. Similarly, but not as highly correlated (R^2=.20), is an opponent's defensive rebounding %, or:

#2: Duke's Offensive Rebounding


Also, if Kyle Singler gets in foul trouble, Duke loses their leader in points, assists, rebounds, and steals.

 Nothing Wake Forest does correlates as well as Duke's 'weaknesses,' but a couple of things stand out:

1) Beating them to the foul line is a plus.
The Virginia Tech game, Wake's only loss, was the most lopsided Free-Throw Rate game I have seen with any good team. While Wake had a miserable 18.5, VT had an insane 74! Both of these were by far the worst on Wake's schedule. In general, this is their highest-correlated 'changer of expected point margin'.

That's about it.





Duke v. WFU, UPDATED AGAIN

COMPLETELY UPDATED: WOOPS. I thought the game was at Duke. And I had entered the pace incorrectly.

Accuscore: Wake, 78-76, (60%-40%)
Kenpom: Duke, 74-71, (63%-47%)
Nathan: Wake, 75-74, (49%-51%) -- I know, it's backwards, just deal with it.

16-8-4-1

Based on the predicted NCAA tournament bracket by Joe Lunardi (http://sports.espn.go.com/ncb/bracketology)

Using Kenpom.com's data. Let's just say Pittsburgh has it rough.

16:  Wake Forest-Georgetown, Memphis-Oklahoma, Pitt-Gonzaga, Xavier-UNC, Duke-West Virginia, Purdue-Louisville, Connecticut-Arizona St, Kansas-Michigan State

8: Wake Forest-Memphis, Pittsburgh-UNC, Duke-Louisville, Arizona State-Kansas

4: Duke-ASU, UNC-Memphis

2: Duke-UNC

1: Duke


That would be a crazy tournament. There were several games that had teams ranked one spot over the other in these matchups.

One problem with kenpom.com

Is that teams play towards a specific point margin, not a specific efficiency margin, and especially not a specific point margin. A good example of this is the Notre Dame - UNC game.

UNC's expected defensive performance against Notre Dame should have yielded 100.2 points / 100 possessions. Instead, they allowed 119.8 points / 100 possessions. This is a factor for why UNC's defense is as low (12th) as it is. The problem was, UNC didn't play defense simply because they didn't have to! If your offense has a very good day, and you are up by 20 or 30, there isn't a great need for playing your best defense. While this isn't smart when you're up 5 or 10, when you're up 20 or 30, teams can play more relaxed (reducing injuries) and can allow bench-players to have more time (giving them experience).

As it just so happens, UNC was having an insane offensive day: 140.5 points per 100 possessions! UNC began dominating Notre Dame by being up about 18 or so. At one point, the margin got down to 11 or so, but UNC answered with an 'offense-for-defense' and boosted their margin more. What does this tell us:

In perhaps a small amount, UNC's offense is as overrated as their defense is underrated. A better representation of a team's performance would be their efficiency margin. Carolina was probably predicted to win by about 10 points per 100 possessions, when in actuality they won by around 20 points per 100 possessions. 

As long as the teams' efficiency margins are adjusted for strength-of-schedule, I think this would be a better rating system than kenpom's, as it involves what goes through the players' and coaches' heads. Scoring margin, as we have discussed, is a false indicator of team-matchup winners, and therefore would work less.


nathan's sweet 16

if the brackets were to be split up where these teams were the appropriate 1-4 seeds, this would be my sweet 16, elite 8, and final four


1 Duke
2 UNC
3 Gonzaga
4 Pittsburgh
5 Georgetown
6 Missouri
7 Connecticut
8 Arizona St.
9 Memphis
10 UCLA
11 Louisville
12 Wake Forest
13 West Virginia
14 Oklahoma
15 Purdue
16 Xavier




Clemson @ UNC

Historically, Clemson has a 0% chance of winning. They have never won in the Dean Dome.

But here are the computer-predicted scores/% chances of winning.


Accuscore: UNC-90 (87% win), Clemson-75
Kenpom.com: UNC-88 (84% win), Clemson-77

Nathan's win%: UNC: 70% Clemson: 30%



% chance of final margin being within __ points-

1: 4%
3: 12%
6: 24%
9: 35%
12: 47%
13: 51%
15: 58%

Also, I have a new predictor for 3 of kenpom's four factors:


Team            eFG%   TO%    OR%    DR% (1-Opp OR%)   PPP
North Carolina  52.4   19.6   44.1   68.0              1.100
Clemson         48.8   21.0   32.0   56.0              0.957








We'll see how it goes! UNC just has to take advantage of their rebounding advantage.

Right now....

Right now, Duke is the best team in the nation. So who has the best chance of beating them?

Rank  Team            Predicted Point Margin  % chance of neutral win
1     North Carolina  -2.15                   48.50%
2     Gonzaga         -1.75                   48.13%
3     Pittsburgh      -3.6                    43.70%
4     Georgetown      -3.33                   43.60%
5     Arizona St.     -4.25                   41.95%
6     Wake Forest     -4.21                   41.84%
7     Missouri        -4.44                   41.30%
8     Connecticut     -4.25                   40.42%
9     West Virginia   -5.45                   40.24%
10    Memphis         -5.25                   38.76%
11    Louisville      -5.71                   38.47%
12    UCLA            -5.43                   37.99%



There you have it. UNC has a higher % chance of winning than Gonzaga, despite the larger predicted deficit, because the normal distribution assumes that Boston College was an off-day, and Michigan State was generally a pretty good day.

Effective Efficiency

This measures the predicted minimum number of possessions an average team would need near the end to come back and beat the team listed.

As of 1/18


Rank Team Effective Efficiency
1) North Carolina 8.39
2) Duke 8
3) Gonzaga 7.37
4) Missouri 7.15
5) Wake Forest 7.03
6) Connecticut 6.92
7) Pittsburgh 6.78
8) Georgetown 6.74
9) Memphis 6.2
10) Oklahoma 6.07
11) Arizona St. 6.06
12) UCLA 6.02
13) West Virginia 5.78
14) Louisville 5.75
15) Brigham Young 5.75
16) Marquette 5.7
17) Washington 5.64
18) Clemson 5.62
19) Michigan St. 5.6
20) Kansas 5.49
21) Xavier 5.32
22) California 5.3
23) Syracuse 5.17
24) Kentucky 5.14
25) Purdue 5.12


This is roughly the data used with the LRMC: http://www2.isye.gatech.edu/~jsokol/lrmc/
The likely difference comes from how the strength-of-schedule is adjusted for.

Hmmmmm.....

Kenpom has made a "Fanmatch" formula, to find which games will BE THE MOST EXCITING....which sounds a bit similar to my last post.

Thankfully, he trusts his stats WAYY too much. And he assumes that higher-scoring games will be more exciting.

Unfortunately, despite faster paces being more 'exciting,' what is FAR more exciting is a CLOSE GAME at the end, which fast paces decrease the chance of.
(i.e., 2 teams averaging 1.1 and 1.2 points per possession would score 55 and 60 @50 possessions, a 5-point game---whereas they would score 110 and 120 @100 possessions, a 10 point game)
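
The arithmetic in that example is easy to check. Here is a minimal sketch in Python; `final_margin` is a made-up helper name, and it assumes each team's points per possession stays the same no matter how fast the game is played:

```python
def final_margin(ppp_a: float, ppp_b: float, possessions: int) -> float:
    """Point margin between two teams after a given number of possessions."""
    return (ppp_b - ppp_a) * possessions


# The two teams from the example: 1.1 and 1.2 points per possession.
for possessions in (50, 100):
    margin = final_margin(1.1, 1.2, possessions)
    print(f"{possessions} possessions: {margin:.0f}-point game")
```

Same efficiencies, twice the possessions, twice the margin.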

If you like faster games more than close games, then oh well.

So out of a small amount of anger, I will give MY FanMatches against his fanmatches.

And I'll tell you my formula: the % chance that the final scoring margin will be +/- 9.


Day: Team1 at Team2- My% (His%)

Friday: Loyola MD at Manhattan - 60% (29%)
Saturday: Wake Forest at Clemson- 66% (89%)
Sunday: Minnesota at Northwestern - 66% (75%)

The Underr/Overrateds, and UNC-Duke, continued

4 Underrateds, and 2 Overrateds:

1. Gonzaga - Polls: Unranked, Kenpom.com: #2

The Zags have gradually shown their prowess again after losing to Portland State. The press should know better than to treat a team who loses to Connecticut by 6 like they stink. Their 'unranked-ness' clearly comes from the 1-4 sputter they had a few games ago, but only the Portland State game should have been very unexpected. Losing to Utah by three is not that bad, tsk tsk. Gonzaga has one of the best groups of teams beaten: Oklahoma St., Maryland, Tennessee (twice), and Washington State. Oh well.

2. West Virginia - Polls: Unranked, Kenpom.com: #7

The Mountaineers are good for two reasons: Destroyed Ohio State and Cleveland State, and had their worst loss to a then-confident Davidson (#46 kenpom). Getting killed by Marquette was quite the downer, but in Big-East play, AT Marquette, what should be expected? Now, West Virginia might be overrated from their absolute-crushing of ~100s teams (poor Seton Hall), and it does remain to be seen if they can be very consistent, but I think the Mountaineers will soon show Georgetown that they are a much better team than ESPN and the Coaches think.

3. Missouri - Polls: Unranked, Kenpom.com: #11

Missouri, like West Virginia, has had some troubles with consistency. But beating USC (#33 kenpom) by 11 on a Neutral Court, and destroying Cal (#20 kenpom) by 27 at home, have definitely shown some insane spark. Yes, they've done a lot of cupcake-eating, and yes, they lost to a mediocre Nebraska (#71 kenpom) on the road, but such performance isn't that strange in basketball from most top-caliber teams.

4. Memphis - Polls: Unranked, Kenpom.com: #12

I know, I know. Nathan, calling Memphis, UNDERRATED? This year has definitely presented Memphis' toughest competition. And they lost a huge part of their great team from last year. Memphis' biggest drop was at home against Syracuse (#32 kenpom, #8 in the polls!) only by seven! And they took Georgetown (#4 kenpom.com, #12 polls) all the way to overtime AT Georgetown. I can't understand why they aren't ranked. Maybe their streak of 7-0 will increase to 8-0 by Monday, with a relatively tough game against UAB.

Overrated:

Only two teams are in the top 25 of the polls, and not in the top 40 of Kenpom.

1. St. Mary's - Polls: #25 (ESPN), Kenpom: #62

This team has beaten ONE top-40 team. San Diego State (#39). By 3. On a neutral court. They've only PLAYED 3 top 100 teams, losing one of those (#81 Texas El-Paso, on a neutral court by THIRTEEN). Seriously. Oh well, if they can notch one more in the "L" column, they'll be off for good.

2. Michigan - Polls: #25 and #25, Kenpom.com: #51

Michigan is a bit of a mystery to me. They have had TWO top-10 Kenpom wins (UCLA and Duke). They're the only team I can think of that has done this. Nevertheless, they beat perennial bottom-feeders Savannah State (#323 kenpom) only in OVERTIME AT MICHIGAN. Ouch. Inconsistent much? What about #237 Indiana? On the road, Michigan needed an overtime to beat them as well. Despite having some of the best wins in the nation, Michigan needs to do consistently better.



UNC-DUKE, part 3240923490.

I have come up with yet another cool stat. It's called "% chance of close game" and you get to choose how close the game is!

Duke is still predicted to beat UNC by 6 at Duke, but this brings up some interesting questions....what is the % chance that it will be a close game? Using my new-and-improved formula, and this new addition to it, let's see what we get:

Chance of win-  Duke: 62%, UNC: 38%

Chance of game being within __ points:

Overtime(.49): 3%
1: 6%
2: 12%
3: 18%
4: 25%
5: 31%
6: 36%
7: 42%
8: 47%
9: 52%


Aaaand here's a graph:



That's all, folks.

Why Pitt would beat UNC in the NCAA tournament, but UNC has a better chance of getting to the finals.

short answer: because UNC has a faster-paced game

long answer:

UNC has a greater chance of making the game a higher scoring margin, and thereby putting the game out of range, due to their faster-paced game. The faster you are (at high efficiencies), the higher your point margin. Here is a chart with the # of TOTAL NCAA teams that could come within 1, 3, 6, 9, or 12 points of each team (or beat them).

Then I did some weighted averaging and rounding to give us the number of teams each would beat in really close, then semi-close, games.

How many teams can perform against you

Within x points or > (x = 1, 3, 6, 9, 12), by overall rank:

Rk  Team            1   3   6   9   12  Tight or worse  Close-med close or worse
1   Duke            0   2   10  21  39  3               20
2   Pittsburgh      4   8   19  37  58  9               34
3   North Carolina  5   8   16  29  43  8               27
4   West Virginia   7   10  26  42  67  12              41
5   Arizona St.     7   11  26  42  66  12              40
6   Gonzaga         7   11  26  42  66  12              40
7   Connecticut     7   11  24  38  64  12              38
8   Georgetown      7   14  28  47  72  14              44
9   Wake Forest     10  17  32  48  71  17              46
10  Brigham Young   11  18  34  51  73  18              49

Wake Forest v. UNC pt II

Here are the motivating factors:

Wake Forest:
-wants to prove that they should have been televised (this is their first televised game)
-wants to prove themselves against yet another top-team (right after beating BYU at BYU)

North Carolina:
-North Carolina wants to prove that their offense can still be insane, and their defense can be tighter (especially by making smarter traps)
-North Carolina wants to definitively prove that the BC game was a massive fluke

Weaknesses:
WFU: Offense is worse than Eastern Kentucky, Western Kentucky, Boston College, and North Dakota State (47th). They don't play well offensively in games where they draw fouls -- if UNC is aggressive, Wake will have issues. They rely a lot on forcing turnovers. If Lawson gets back on his low-turnover track, they will be doomed. Also, Wake has a tendency to do worse in faster-paced games (which Carolina brings).

UNC: Has similar turnover issues, but not as much. But if Ty Lawson can bring his stealing game above what Wake Forest can stop, UNC will have a dandy day. Also, if they don't get defensive rebounds, their entire defense falls apart. Again, their traps continually seem to be played unintelligently (I am not entirely sure what Roy is thinking when he turns traps into the game-changing strategy....) If Carolina's offense goes in the trend of the Oral Roberts and Boston College games, Wake Forest will have no problem shutting them down, with their insane average-opponent-prediction of .833 points per possession.

My prediction: Carolina - 89, Wake Forest - 82

How To Predict The Final Score

I have a program that can do this pretty automatically, but you can do it on your own!



There are five steps to finding the final score of any game:

1. Find each team's (adjusted) Offensive and Defensive Efficiency (their Points Per 100 Possessions) - from kenpom.com


For Example:
Duke's Offense - 115.4
Duke's Defense - 80.7

North Carolina's Offense - 119.8
North Carolina's Defense - 87.6

2. Find out where the game is being played. If it's a neutral game (not at either team's court), you don't have to adjust the data - you can find this at any sports website, but at kenpom.com, if you click on a team, then click schedule, you can tell if it's (H)ome, (A)way, or (N)eutral

For the home team, add 1.4% to their offense, and subtract 1.4% from their defense.
For the away team, subtract 1.4% from their offense, and add 1.4% to their defense.

For Example:
Duke's Home Offense = 115.4+(.014x115.4)=117.02
Duke's Home Defense = 80.7-(.014x80.7)=79.57

North Carolina's Away Offense = 119.8-(.014x119.8)=118.12
North Carolina's Away Defense = 87.6+(.014x87.6)=88.83

3. Find each team's adjusted pace (Possessions per game), and the League Average Pace from Kenpom.com's stats page

Duke's Pace = 71.5
North Carolina's Pace = 76.3
League Average = 67.6

Multiply these together, and divide by the league average.
Duke vs. UNC Pace = 71.5 x 76.3/67.6 = 80.7 possessions

4. For each team, multiply their Offense with their Opponent's Defense, then Divide by the League Average

League Average (just so happens to be)=100 points (per 100 possessions)

Duke's PPP=Duke O vs. UNC D=117.02x88.83/100=103.95 points per 100 possessions this game
North Carolina's PPP=UNC's O vs. Duke's D=118.12x79.57/100=93.99 points per 100 possessions this game

5. Multiply the predicted PPP (from #4) by the predicted game pace, and divide by 100 to get the final score

Duke's Points=103.95x80.7/100= 84 points
North Carolina's Points=93.99x80.7/100= 76 points
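
If you'd rather let the computer do the multiplying, the five steps fit in a few lines. This is only a sketch of the steps as written above, not the actual program mentioned at the top of this post; `predict_score` and `HOME_EDGE` are names made up for the illustration, and the inputs are the example numbers from steps 1 and 3.

```python
HOME_EDGE = 0.014  # the 1.4% home-court adjustment from step 2


def predict_score(home_off, home_def, away_off, away_def,
                  home_pace, away_pace, league_pace,
                  league_eff=100.0, neutral=False):
    """Return (home_points, away_points) following steps 1-5.

    Efficiencies are points per 100 possessions; paces are possessions per game.
    """
    edge = 0.0 if neutral else HOME_EDGE
    # Step 2: the home team gets better on both ends, the away team gets worse.
    home_off, home_def = home_off * (1 + edge), home_def * (1 - edge)
    away_off, away_def = away_off * (1 - edge), away_def * (1 + edge)
    # Step 3: predicted possessions for this game.
    pace = home_pace * away_pace / league_pace
    # Step 4: each offense against the other team's defense.
    home_eff = home_off * away_def / league_eff
    away_eff = away_off * home_def / league_eff
    # Step 5: turn points per 100 possessions into points.
    return home_eff * pace / 100, away_eff * pace / 100


duke, unc = predict_score(115.4, 80.7, 119.8, 87.6, 71.5, 76.3, 67.6)
print(f"Duke {duke:.0f}, North Carolina {unc:.0f}")
```

Run with the example numbers, this rounds to the same 84-76 as the hand calculation. Pass `neutral=True` to skip the step 2 adjustment.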

I still think UNC will win, though. :)


Ok, so....why does this work?
It actually has to do with fractions canceling out. Let's find out why:

The adjusted numbers are actually based on percentages. For example,
UNC's adjusted offensive efficiency is found by doing the following:

UNC's Avg Offense/Opponents' Avg. Defense x The League Avg Offense/Opponent's Avg Defense x Your Opponents' Average Defense

What this gives us is 3 numbers:

1. UNC's % of offense vs. opponents' defense
2. the league average offense% compared to UNC's opponents' defense
3. and the opponents' defense.

The first number tells us how well UNC has played against their opponents. The second tells us how an average team would play against UNC's opponents. Thirdly, again, we have the opponents' defense. If a team plays worse than the average team, they should be ranked lower accordingly. That's how the adjustments work.

When we cancel out the opponents' defense from the equation, we are left with:
UNC's average offense x the league's average efficiency / UNC's opponents' average defense. Look familiar? It's the opposite of our efficiency prediction calculation. (This calculation, by the way, works the same with defense, and pace.)

Now: the predicted efficiencies. Let's explain the opposite.

When UNC plays Duke, we compare UNC's offense to the average team, then Duke's Defense to the average team. The formula here is:

UNC's offense/League Average Defense x Duke's Defense/League Average Offense x Average Efficiency of all teams. This accurately portrays how UNC plays against the average, combined with how Duke plays against the average, in percents (since they are percents of the average, we then multiply by the league average).

This cancels out in much the same way, giving us UNC's Offense x Duke's Defense / League Average Efficiency.
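
Written out with symbols, the cancellation looks like this. The letters are shorthand introduced just for this sketch (they aren't kenpom.com notation): O is UNC's offense, D is Duke's defense, and L is the league average efficiency, assumed to be the same number for offense and defense.

```latex
\[
\frac{O}{L} \times \frac{D}{L} \times L \;=\; \frac{O \times D}{L}
\]
% With the step 4 numbers (O = 118.12, D = 79.57, L = 100):
\[
\frac{118.12 \times 79.57}{100} \approx 93.99
\]
```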




There you have it!

Wake Forest - North Carolina Game Preview

Wow, this one's gonna be close.

The official Kenpom Prediction is 84-83, Wake Forest wins.
My formula now makes a very slight adjustment to how I think the data should be used, but still puts Wake Forest as the 84-83 victor, in 85 possessions.

Overtime is actually very difficult for two teams to achieve, since it requires a lot of things to happen just the right way. On average, the team who ends up doing the slightly better job, (or even just the one who wins the tip-off) will win in regulation by 1 more often than going into overtime. That being said, if the game DOES go into overtime, my heart sets the final score at 96-94 UNC.

The thing most likely to make or break this game is UNC's defense, with a standard deviation of 15.19 points/100 possessions (comparing actual and expected values). Their lackluster defense (lackluster compared to the other top teams) can be noted for also being difficult to predict: it is often times much better and much worse than predicted.

The percent chance of winning this game is going to be close, but I'll give you what my formula spits out anyways:


Chance that North Carolina can 1/2 make up for their disadvantage:
48.4%

Chance that Wake Forest can 1/2 cough up their advantage:
47.6%

Which, through all samples gives us:

Wake Forest - 52%
North Carolina - 48%



Just for fun, here's my adjusted predicted score at different paces:

@ 75 possessions: WFU-74, UNC-73
@ 80 possessions: WFU-79, UNC-78
@ 90 possessions: WFU-89, UNC-88
@ 95 possessions: WFU-94, UNC-93

Nathan's Normal Distribution Formula

Here's the gist of my formula:

I take the statistic of Points Per Possession scored and scored against any given team (adjusted for how difficult play is at www.kenpom.com). Every given team, according to these stats, is predicted to beat other teams by a specific Points Per Possession margin. I take the predicted Points Per Possession Margin and subtract that from the ACTUAL result for all games a team has played. I then adjust this data according to the recency of the game (more recent has more weight) and how terrible their opponents were (if they play a cupcake, there's no reason to beat them by 40 when you can beat them by 20, and sometimes you can beat a team by 40 out of their frustration when on average you would probably beat them by 20).

This gives us a list of values. For example, UNC's list of values (per 100 possessions) is:

Pennsylvania : -11.51
Kentucky : 7.82
UC Santa Barbara: -0.44
Oregon : 3.75
Notre Dame : 5.95
NC Asheville : 15.92
Michigan St. : 34.24
Oral Roberts : -10.4
Evansville : -4.63
Valparaiso : -5.77
Rutgers : -3.93
Nevada : 6.8
Boston College : -30.6



I can assume a normal distribution of this data (although less so than the unadjusted data, since I didn't mess around with this) thanks to the Central Limit Theorem, since teams doing much worse or much better than average is more up to random chance than any of the data used here (i.e. there are millions of factors we could use to predict a coin toss, but the toss still on average only gets heads one out of every 2 times....and it would be really hard to find those factors!).

Since this data falls under the normal distribution, we can apply probabilities to it!
The standard deviation of this data is 15.35, and the average is .28 (the average would be an accurate representation of how I think kenpom.com under or overestimates a team, but right now it also takes into account the fact that I don't know how he adjusts his data for recency). By using the spreadsheet function NORMDIST, we can then predict the total % of samples in which any given team predicted to win would lose, and vice versa.
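
You can check the standard deviation yourself from the list above. A quick sketch using Python's `statistics` module in place of a spreadsheet (`unc_values` is just a name picked for the list):

```python
from statistics import mean, stdev

# UNC's actual-minus-predicted margins (per 100 possessions), copied from the list above.
unc_values = [-11.51, 7.82, -0.44, 3.75, 5.95, 15.92, 34.24,
              -10.4, -4.63, -5.77, -3.93, 6.8, -30.6]

print(f"standard deviation: {stdev(unc_values):.2f}")  # sample standard deviation
print(f"plain average: {mean(unc_values):.2f}")
```

The sample standard deviation comes out to the 15.35 quoted above. The plain average of the list comes out near .55 rather than .28, so the .28 presumably includes some weighting that isn't reproduced in this sketch.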

For example, if the Boston College-UNC game were to be played again (at UNC), the following would result:

North Carolina would be predicted to win by an efficiency margin of 22.27 points for every 100 possessions. We must split this between both teams (to find the number of samples in which BOTH teams could possibly do well enough / badly enough to change the predicted outcome). So that means we need to find the probability that Boston College can do well enough, and that Carolina can do badly enough.

By using the NORMDIST function, we find that in only 5% of all of Boston College's samples (adjusted for their consistency, as we require the standard deviation for this calculation) could they do well enough to make up for the 11.14 points per 100 possessions that they would have to get. However, in 23% of all samples, North Carolina could do badly enough to cough up the extra 11.14 points per 100 possessions they need to lose. So out of the TOTAL number of samples (200%), we get the following result:

Boston College has a 13.74% chance of winning, and North Carolina has an 86.26% chance of winning.
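
Here is roughly what that looks like outside of a spreadsheet. This is a sketch under a few assumptions: Python's `statistics.NormalDist` stands in for NORMDIST, `tail_probability` is a made-up helper name, UNC's average (.28) and standard deviation (15.35) are the ones quoted above, and Boston College's 5% is taken as given (rounded), since their list of values isn't shown here.

```python
from statistics import NormalDist


def tail_probability(points_needed: float, mean: float, sd: float) -> float:
    """Chance a team lands at least `points_needed` below its prediction
    (the cumulative normal distribution evaluated at -points_needed)."""
    return NormalDist(mean, sd).cdf(-points_needed)


half_margin = 22.27 / 2  # split the predicted margin between the two teams
unc_coughs_up = tail_probability(half_margin, mean=0.28, sd=15.35)
bc_makes_up = 0.05  # quoted above, rounded

bc_win = (unc_coughs_up + bc_makes_up) / 2  # out of the 200% total
print(f"UNC coughs it up: {unc_coughs_up:.0%}")
print(f"Boston College wins: {bc_win:.0%}")
```

This lands on about 23% for UNC and about 14% for Boston College, close to the 13.74% above; the small gap comes from using the rounded 5%.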

Now, this data is incorrect to apply to the UNC game everyone witnessed a few days ago for one reason only: the reason UNC could cough up that many PPP is because RECENTLY they have been the most inconsistent all season, namely -- against Boston College!


But this formula can be applied to any team, and help us understand when an underdog isn't really that much of an underdog.
