EDIT: I fixed the strength-of-schedule adjustment.
Sorry, no team analysis today. Too few games have been played for me to feel comfortable running variance or multivariate-regression analyses.
Instead, here's a quick peek at the formula by which players will be rated (offensively) on my new site (coming soon!).
Adjusted Player Offensive Rating =
Poss% x (Player ORtg x LeagueAvgEfficiency) / (player's team SOS of opponents' D) + (1 - Poss%) x LeagueAvgEfficiency
This basically shows how an average team would benefit (offensively) by replacing one of their players with the player in question. However, most of the values will be very close to the league average (I assume), so we will use a Net value to better isolate the player's value.
Net Offensive Rating =
Adjusted Player Offensive Rating - LeagueAvgEfficiency
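If you want to play with this before the site goes up, here's a minimal Python sketch of the two formulas above. The function names and the sample numbers at the bottom are mine, purely for illustration:

```python
def adjusted_player_ortg(poss_pct, player_ortg, opp_d_sos, league_avg_eff):
    """Adjusted Player Offensive Rating: the player's possessions are scored at his
    SOS-adjusted rating, the rest at the league-average efficiency."""
    adj_ortg = player_ortg * league_avg_eff / opp_d_sos  # strength-of-schedule adjustment
    return poss_pct * adj_ortg + (1 - poss_pct) * league_avg_eff

def net_offensive_rating(adj_player_ortg, league_avg_eff):
    """How much better (or worse) than average the 'average team plus this player'
    offense would be, per 100 possessions."""
    return adj_player_ortg - league_avg_eff

# Illustrative numbers only (not real data):
league_avg = 101.0
rating = adjusted_player_ortg(0.25, 118.0, 98.5, league_avg)
print(round(rating, 1), round(net_offensive_rating(rating, league_avg), 1))
```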
I will hopefully soon do the same with defensive rating, although Ken Pomeroy does not calculate these. I'll have to improvise.
EDIT: Fixed the system. Kyrie Irving posts an 8.8 instead of a 12.3.
Stay tuned.
Praise for The Basketball Distribution:
"...confusing." - CBS
"...quite the pun master." - ESPN
Quick note on Four Factors
If you look at the data from the NCAA four factors analysis (in my prior posts and in David Hess' posts) you might be thinking, correctly:
"This data explains where a team's points come from, but does not explain precisely how they could improve."
Then someone might respond:
"Now wait a second, doesn't this tell us how a team could improve? I mean, all Michigan State needs to do is take better care of the ball to win their games; the numbers say so!"
While the above answer is correct, it is important to realize that most teams don't have data points in Four-Factors ratings as striking as Michigan State's poor ball-handling.
What I will suggest is a continuation of what I have done in the past: figuring out how variable a team's factors are, and what causes this. For example, one might assume that a team that thrives on 3-pointers (cough *Northwestern*) has much more variability in its offensive rating than one that thrives on 2-pointers, under the old adage, "live by the three, die by the three." And I suppose it would make more sense to say that we can predict how a team's overall efficiency decreases against certain opponents via a Four-Factor regression of individual games*.
Coming soon! (Gotta finish exams first...)
* By this I mean to run a linear/logistic regression to see how much an opponent's factors influence the team's factors.
"This data explains where a team's points come from, but does not explain precisely how they could improve."
Then someone might respond:
"Now wait a second, doesn't this tell us how a team could improve? I mean, all Michigan State needs to do is take better care of the ball to win their games; the numbers say so!"
While the above answer is correct, it is important to realize that most teams don't have data points in Four-Factors ratings as striking as Michigan State's poor ball-handling.
What I will suggest is a continuation of what I have done in the past: figuring out how variable a team's factors are, and what causes this. For example, one might assume that a team thriving off 3-pointers (cough *Northwestern*) has much more variability in predicting offensive rating than one who thrives off 2-pointers, under the old adage, "si on vie par le trois, on mort par le trois." And I suppose it would make more sense to say that we can predict how a team's overall efficiency decreases against certain opponents via Four-Factor regression of individual games*.
Coming soon! (Gotta finish exams first...)
* By this I mean to run a linear/logistic regression to see how much an opponent's factors influence the team's factors.
Team Impacts, Part Deux
I know I haven't recently been naming any teams, any players, or any specific cases...be patient!
Per David Hess's suggestion, I am now adjusting in an 'error-free' and strength-of-schedule-adjusted environment. To do this, we plug in a team's statistical offensive efficiency (different from the regression model):
St.Offense = (avgFGpoints + avgFTpoints) / avgPoss
This has a lower error than the regression model since it includes FT% and raw FG%; the only error in the equation comes from rounding, miscalculated possessions, and the lack of adjustment for 'team' rebounds. So instead of comparing regressed efficiencies with actual efficiencies, I compare statistical offense against the same statistical offense with the league average substituted in for a given factor. The factors are labeled a little differently this time around, per the deduced equation:
1) FG% and eFG% (eFG% does not accurately count missed vs. made shots)
2) TO%
3) OR%
4) FT% and FTR% (FTR does not accurately count made free throws)
I then adjust this for estimated strength of schedule (Adj.Factor = Adj.Offense / St.Offense) to more accurately represent how the team plays.
So the end result is:
Impact of factor(s)=Adj.Offense - [St.Offense with factor(s) replaced with average]*Adj.Factor
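For the spreadsheet-averse, here's a rough Python sketch of that bookkeeping. The helper names and the sample inputs are mine, and `st_off_factor_replaced` stands in for whatever league-average substitution you make for the factor(s) in question; everything is kept on the same per-possession scale (per-100 works too, as long as you're consistent):

```python
def st_offense(avg_fg_points, avg_ft_points, avg_poss):
    # Statistical offensive efficiency: points actually generated per possession
    return (avg_fg_points + avg_ft_points) / avg_poss

def factor_impact(adj_offense, st_off, st_off_factor_replaced):
    # Impact = Adj.Offense - [St.Offense with factor(s) replaced] * Adj.Factor
    adj_factor = adj_offense / st_off  # strength-of-schedule scaling
    return adj_offense - st_off_factor_replaced * adj_factor

# Made-up per-game averages purely for illustration:
st_off = st_offense(avg_fg_points=58.0, avg_ft_points=14.0, avg_poss=68.0)
print(round(factor_impact(adj_offense=1.10, st_off=st_off, st_off_factor_replaced=1.02), 3))
```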
Here are the results in HTML and editable/searchable Excel format.
NCAA Four-Factor Impact
Lots of credit here goes to David Hess (aka @AudacityOfHoops) for his work on a simple estimation of how turnovers affect efficiency. Check out his pretty blog!
Given the limitations of that formula, I decided to take it a step further: how much does EACH of the four factors affect a team's offensive performance? Because every time I check out Ken Pomeroy's team four factors, I want to better quantify those green-or-red bits of data.
I've come up with a way to quantify how each team's deviation from the league mean in the four factors affects its overall offensive efficiency.
The same can easily be done for defense, but for right now, I'm just going to focus on offense:
WARNING: BORING MATH
I ran a regression (which David and I have done before) of the four factors on offensive efficiency. For each team, I took their four factors, save for the one in question, and multiplied them by the regression estimates, replacing the one in question with the league average. Finally, I subtracted this predicted value from their raw offense. This gives us an estimate of how a team's deviation from the mean in each of the Four Factors affects their overall offense.
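In code, that procedure looks roughly like the sketch below. The coefficients and team numbers are placeholders only, not the actual regression estimates (those live in the spreadsheet):

```python
FACTORS = ("eFG%", "TO%", "OR%", "FTR")

def factor_impact(team_factors, league_factors, coefs, raw_offense, factor):
    """Swap the league average in for one factor, re-predict offense with the
    regression estimates, and subtract that prediction from the team's raw offense."""
    replaced = dict(team_factors)
    replaced[factor] = league_factors[factor]
    predicted = coefs["intercept"] + sum(coefs[f] * replaced[f] for f in FACTORS)
    return raw_offense - predicted

# Placeholder coefficients and numbers, purely for illustration (not the real estimates):
coefs = {"intercept": 1.8, "eFG%": 2.0, "TO%": -1.3, "OR%": 0.7, "FTR": 0.1}
team = {"eFG%": 52.0, "TO%": 18.0, "OR%": 35.0, "FTR": 38.0}
league = {"eFG%": 49.5, "TO%": 20.5, "OR%": 33.0, "FTR": 37.5}
print(round(factor_impact(team, league, coefs, raw_offense=108.0, factor="TO%"), 2))
```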
/BORING MATH
Here's the great news:
1) I made an Excel spreadsheet so you can easily plug this in for any team without having to scour for them (just enter the team under "Team")
2) I used the same color scheme as Ken Pomeroy's numbers :)
3) I also made a PDF for those who don't want to use Excel.
Editable Excel File
PDF File
Offensive Impacts
EDIT/UPDATE: This old formula has some truth to it, but I now have a much more accurate method, as described in the College Basketball Prospectus 2011-2012 book.
There's a very simple stat that estimates how much a player affects their team's overall offensive rating, using Dean Oliver's Individual Offensive Rating (as is posted for all teams' significant players on Kenpom.com)
Formula for offensive impact =
Team ORtg - (Team ORtg - %Poss*%Min*Player ORtg) / (1 - %Poss*%Min)
(Which estimates the impact a player has on his team's overall Offensive Rating)
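If you'd rather not fight the spreadsheet, here's a small Python sketch of that formula. The team ORtg of ~99 in the example is an assumption on my part, chosen so the output lands near the Tyler Zeller row below:

```python
def offensive_impact(team_ortg, player_ortg, poss_pct, min_pct):
    """Estimated change in team ORtg attributable to the player: team ORtg minus an
    estimate of what the team's ORtg would be with the player's possessions removed."""
    share = poss_pct * min_pct                      # share of team possessions the player uses
    without_player = (team_ortg - share * player_ortg) / (1 - share)
    return team_ortg - without_player

# e.g. a player at 69.4 %Min, 23.4 %Poss, 119.2 ORtg on a team with an assumed ~99 ORtg:
print(round(offensive_impact(99.0, 119.2, 0.234, 0.694), 2))   # ~3.9
```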
Here it is for UNC and Duke:
North Carolina
Player | %Min | ORtg | %Poss | Offensive Impact
Tyler Zeller | 69.4 | 119.2 | 23.4 | 3.93
Reggie Bullock | 32.2 | 106.3 | 20.7 | 0.54
Justin Watts | 29.4 | 106.1 | 14.8 | 0.34
Leslie McDonald | 35.3 | 97.7 | 18.7 | -0.06
Kendall Marshall | 35.9 | 95.5 | 19.2 | -0.23
Harrison Barnes | 69.4 | 96.8 | 22.9 | -0.34
Justin Knox | 37.2 | 93.9 | 24 | -0.46
Dexter Strickland | 62.8 | 93.7 | 17 | -0.58
John Henson | 61.6 | 94.6 | 25.5 | -0.74
Larry Drew | 63.8 | 76.8 | 14.6 | -2.21

Duke
Player | %Min | ORtg | %Poss | Offensive Impact
Kyrie Irving | 72.2 | 128.8 | 25.2 | 3.12
Andre Dawkins | 57.8 | 144.3 | 13.2 | 2.44
Seth Curry | 44.1 | 117.3 | 18 | 0.22
Ryan Kelly | 35.9 | 116 | 13.5 | 0.07
Tyler Thornton | 12.8 | 88.7 | 13.3 | -0.45
Nolan Smith | 76.6 | 113 | 27.5 | -0.45
Josh Hairston | 14.4 | 90.5 | 13 | -0.46
Kyle Singler | 78.8 | 112.1 | 21.2 | -0.52
Miles Plumlee | 36.6 | 90.3 | 17.8 | -1.69
Mason Plumlee | 66.3 | 101.6 | 21 | -2.11
Now these stats don't exactly compare (a player with a +2 on a bad team is not as good as a player with +2 on a good team) - but this allows you to estimate what current substitutions do for a team, offensively (per 100 possessions).
100th post, and a recap of my latest tweets....
100th post!
Hoorah! This blog and twitter have given me a wee voice with which to share the math that runs through my head. Holla to my loyal few!
Updated Ratings
NBA Ratings as of 11/29
This iteration includes predicted wins (assuming an average season's worth of home & away opponents).
Recent Twitterings:
-Tyler Zeller's offensive impact. This is based on the formulas in my prior post, alongside some basic estimates of what a player's teammates produce. The method here does not encapsulate all usable offensive statistics the way Dean Oliver's offensive rating does, although I have done that in the past. Perhaps I should just stick with that?
-Good News for the 76ers and Bad News for the Magic -- although other stats-heads would likely tell you a similar story.
-The Bobcats (I know I said Hornets....gimme a break) are consistent -- and therefore consistently sub-par. The top of a 95% confidence interval maxes out the H...Bobcats at ~41 wins.
Finally - if anyone's interested, I can keep updating NBA league-wide win probabilities (which are probably more accurate than the expected output from my point ratings).
win-probabilities
Statistically, we can try to estimate a team's overall win% against an average team, and say that's their adjusted Win% (similar to Ken Pomeroy's Pythagorean win%). But this is only part of the picture.
Here I have used my consistency and adjusted ratings to predict home and away win probabilities for every NBA matchup. Instead of predicting how a team would fare against an average team, I predict how they fare on average against every team.
Here are the results (the home teams are the rows, the away teams are the columns).
I plug the following into the NormDist function:
value=Rating(hometeam)-Rating(awayteam)+HomeCourtAdv
mean=0
standard deviation=sqrt(team1consistency^2+team2consistency^2)
(^this estimates overall standard deviation of the two teams' performance, assuming a covariance of zero.)
cumulative?=1
Speaking of which: Some of you may have thought in the past, "This guy doesn't plug stuff into the NormDist function correctly!" And you would be correct. Technically, I should plug in a value of zero and a mean of an estimated point margin. But in order to find that team's win probability, I would do 1-Normdist(0,est. margin). But this requires more typing, so I use the equivalent, Normdist(est. margin,0).
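For the curious, here's roughly what that calculation looks like outside of Excel, using Python's NormalDist in place of NormDist. The ratings, home-court value, and consistency numbers in the example are made up:

```python
from math import sqrt
from statistics import NormalDist

def home_win_probability(home_rating, away_rating, home_court_adv,
                         home_consistency, away_consistency):
    """Mirrors the NormDist inputs listed above: expected margin against a combined
    standard deviation, assuming the two teams' game-to-game variation is independent."""
    expected_margin = home_rating - away_rating + home_court_adv
    combined_sd = sqrt(home_consistency ** 2 + away_consistency ** 2)
    # Same as 1 - NormDist(0, expected_margin, sd, 1), per the note above
    return NormalDist(mu=0, sigma=combined_sd).cdf(expected_margin)

# Illustrative values only (not real ratings):
print(round(home_win_probability(3.0, -1.5, 3.2, 8.0, 9.0), 3))
```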
Recency....
I think I need to adjust my ratings for recency! ( http://tinyurl.com/BBstatsNBA )
Here's a sample: the games on Nov. 26th.
(Predictions->Games->Updated Predictions)
Predictions (before games) on 11/26:
DENVER>CHICAGO by 3.4
PHOENIX>LA CLIPPERS by 7.7
LA LAKERS>UTAH by .5
MEMPHIS>GOLDEN STATE by 4.1
PORTLAND>NEW ORLEANS by 1.3
ORLANDO>CLEVELAND by 9.8
CHARLOTTE>HOUSTON by 2.8
BOSTON>TORONTO by 8.9
MILWAUKEE>DETROIT by 6.2
MIAMI>PHILADELPHIA by 9.8
INDIANA>OKLAHOMA CITY by 3.3
SAN ANTONIO>DALLAS by 6.3
Actual spreads on 11/26 (*=predicted correctly):
*DENVER>CHICAGO by 1
*PHOENIX>LA CLIPPERS by 8
UTAH>LA LAKERS by 6
*MEMPHIS>GOLDEN STATE by 5
NEW ORLEANS>PORTLAND by 19
*ORLANDO>CLEVELAND by 11
*CHARLOTTE>HOUSTON by 10
*BOSTON>TORONTO by 9
DETROIT>MILWAUKEE by 14
*MIAMI>PHILADELPHIA by 9
OKLAHOMA CITY>INDIANA by 4
DALLAS>SAN ANTONIO by 9
Updated spreads on 11/26 (after inputting the actual spreads):
*=retrodictively correct
*DENVER>CHICAGO by 3.1
*PHOENIX>LA CLIPPERS by 7.6
*UTAH>LA LAKERS by .4
GOLDEN STATE>MEMPHIS by 1.6
New Orleans @ Portland = tie
*ORLANDO>CLEVELAND by 9.9
*CHARLOTTE>HOUSTON by 3.6
*BOSTON>TORONTO by 9
*DETROIT>MILWAUKEE by 1.4
*MIAMI>PHILADELPHIA by 9.9
INDIANA>OKLAHOMA CITY by 2.3
SAN ANTONIO>DALLAS by 4.2
My very own ratings!
I have finally done something I've been wanting to do for a long time: make my own ratings system! I have found a source of easily-updated data, and a way to VERY quickly update my ratings!
Here's how it goes, as of games through 11/21.
To do list: adjust for tempo*, adjust for recency.
Adjusting for recency will have to be well-thought out...perhaps finding what weights most accurately predict more recent games/etc.
Tempo -- I'm not sure I'll be able to do this. To add this to my data set will most certainly be a pain, and I'm not quite sure that efficiency margin is a better measure of team quality than point margin -- or vice versa (as I have discussed previously -- relating to NCAA ball).
Turning 4-factors into efficiency (and vice versa) - Part I
I used the definitions of the 4 factors (and other statistics) to derive a formula that gives us Points and Points Per Possession from only the following stats:
-eFG%
-OR%
-TO%
-FTR(%)
-FT%*
-FG%*
I like to just use the 4 factors (as they're found on the game plan page of Ken Pomeroy's team pages) -- we can estimate FT% and FG% depending on how far we are into the season. FT% would be estimated by the team's average FT% (or the league's FT%), and FG% would be estimated by: [the team (or league) ratio of average FG%/average eFG%] x [Game eFG%].
So here's the formula otherwise:
To get points, you simply add FT points and FG points. To get efficiency of course, we add the two and divide by possessions played.
In the upcoming weeks, I'll be using Ken Pomeroy's estimates for efficiency to predict 4-factors according to this method. Later, I'll reverse the process (in hopes to get a better picture at how teams control one another's four factors to get the resultant efficiency).
In case you were wondering: the large bracketed term equals Field Goals Attempted.
In the first formula, we simply multiply FGA by (eFG% x 2), as eFG% = (points from field goals per FGA) / 2.
In the second formula, we simply multiply FTR (FTA/FGA) by the FGA term to cancel the FGAs out and give us FTA. Then FTA are multiplied by FT% to give us total points from free throws.
The large bracketed term was derived like so:
Possessions = FGA - OR + .44*FTA + TO
Poss = FGA - FGA*(1-FG%)*OR% + .44*(FGA*FTR%) + TO
(Poss - TO) = FGA*(1 - (1-FG%)*OR% + .44*FTR%)
then divided to give us:
FGA = (Poss - TO)/(.44*FTR% + 1 - (1-FG%)*OR%)
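Putting the derivation together, here's a quick Python sketch that turns the factor inputs into points and points per possession. The variable names and the sample inputs are mine, just for illustration:

```python
def points_from_factors(poss, efg, orb_pct, to_pct, ftr, ft_pct, fg_pct):
    """Reconstruct points and points-per-possession from the factor definitions,
    following the FGA derivation above. Percentages are fractions (e.g. 0.52)."""
    turnovers = to_pct * poss
    fga = (poss - turnovers) / (0.44 * ftr + 1 - (1 - fg_pct) * orb_pct)
    fg_points = fga * efg * 2          # eFG% = (points from FGs per FGA) / 2
    ft_points = fga * ftr * ft_pct     # FTA = FTR * FGA, each made FT is 1 point
    points = fg_points + ft_points
    return points, points / poss

# Illustrative inputs only:
print(points_from_factors(67, 0.52, 0.34, 0.19, 0.35, 0.71, 0.46))
```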
Redesign!
Blogger has kindly put out some new tools (and a sick new template!) that will hopefully make this blog a little less clunky.
Some current happenings in statistics land:
Check out my Carolina-Duke win probability meter. This will update incrementally as the season goes on, and is based on the probabilities demonstrated by using Ken Pomeroy's predictions, which for the '10-'11 season is only available as of yet by purchasing College Basketball Prospectus '10-'11.
Also, I've got a formula for estimating end-of-season raw efficiencies (where efficiency is points scored per 100 possessions) based on two variables:
1) The number of games a team has played
2) The team's current raw efficiency average (offensive or defensive)
Where g=games played, solve this: x=.141*LN(g)+.466
Where x gives you the weight a team's current average should have compared to the league's average. i.e.
Est. Final Avg= x*Current Avg + (1-x)*League Avg
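In code form (a quick sketch, with made-up sample numbers):

```python
from math import log

def estimated_final_efficiency(games_played, current_avg, league_avg):
    """Weight the team's current raw efficiency against the league average,
    with the weight growing logarithmically as more games are played."""
    x = 0.141 * log(games_played) + 0.466   # LN() in the spreadsheet formula
    return x * current_avg + (1 - x) * league_avg

# e.g. a team at 112.0 through 8 games in a league averaging 101.0 (made-up numbers):
print(round(estimated_final_efficiency(8, 112.0, 101.0), 1))
```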
Have fun and be safe!
Best Offensive Players
This adjusts a player's Offensive Plus Minus for their teammates.
We do this based on a Linear Regression that includes:
-a player's contributions (Dean Oliver's Offensive Rating and Usage%)
-his teammates' contributions (based on four estimates)
The four estimates of his teammates are as follows:
1) Total Offense (on-court) = TeamORTG(on-court) - playerUsg% * playerORTG
2) Average Offense (on-court) = (TeamORTG(on-court) - playerUsg% * playerORTG) / (1 - playerUsg%)
3) Total Offense (season) = TeamORTG(season) - playerUsg% * playerORTG * min%
4) Average Offense (season) = (TeamORTG(season) - playerUsg% * playerORTG * min%) / (1 - playerUsg% * min%)
We take these variables versus a player's On-Court Offensive Efficiency, to give us the following regression (which has an R^2 value of >.99 since the x values are based on splitting the y value up):
Source | Value
Intercept | -5.674
USG% | 33.516
ORtg | 0.193
TMO-1 | 0.302
TMO-2 | 0.554
TMO-3 | -0.009
TMO-4 | 0.011
Since Plus-Minus is the more inclusive stat, we will take that stat and adjust from there. We simply take the player's Offensive +/-, subtract the TMO variables (times the coefficients), then add the average TMO variables times the coefficients.
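Here's a short Python sketch of that adjustment, using the TMO coefficients from the table above. The player's +/- and the TMO values in the example are invented for illustration:

```python
# TMO-1 through TMO-4 coefficients from the regression table above
TMO_COEFS = (0.302, 0.554, -0.009, 0.011)

def adjusted_offensive_plus_minus(off_plus_minus, player_tmo, league_avg_tmo):
    """Strip out the part of a player's offensive +/- explained by his teammates'
    offense (TMO-1..TMO-4), then credit him with league-average teammates instead."""
    teammate_part = sum(c * t for c, t in zip(TMO_COEFS, player_tmo))
    average_part = sum(c * t for c, t in zip(TMO_COEFS, league_avg_tmo))
    return off_plus_minus - teammate_part + average_part

# Illustrative TMO values only (not real data):
print(round(adjusted_offensive_plus_minus(4.2, (78.0, 104.0, 75.0, 103.0),
                                               (76.0, 102.0, 76.0, 102.0)), 2))
```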
The results can be found here: http://dl.dropbox.com/u/241759/adjusted%20offense.pdf
Box-Out%
This is fun:
http://dl.dropbox.com/u/241759/boxoutpercent.pdf - for all NBA players in 2010 with 25+ minutes per game, who have played 40+ games.
I created a stat that represents the % of the time a player gets a rebound versus the % of the time his man gets the rebound.
Their man is assumed to be an average player whose rebounding percent (offensive reb% while player in question is on defense, etc) is ~80% the rebounding percent at the player's position, and 20% the average rebounding percentage of all other players. For example:
Oklahoma City's Russell Westbrook (great offensive rebounder for a point guard) collects 6% of all available rebounds while he is on offense. His 'man' is likely to be a point guard, but there is a chance (here estimated to be 20%) that his man will be a different player.
Point guards collect ~10.2% of all available defensive rebounds, while the rest of their team gets on average around 16%. So, (80%*.102)+(20%*.16)=.1016, or ~10.2%.
His 6% versus his 'man's' 10.2% gives him an offensive boxout% of 37.1% by dividing like this:
Westbrook's 6% Offensive rebounds / (His 6% Offensive Rebounds + 'Man's' 10.2% offensive rebounds) = 37.1%
Finally, the two are averaged. This gives us the total percent of rebounds the player gets, versus their 'man' (a weighted average does not do this).
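A quick sketch of the calculation, with the Westbrook numbers from above plugged in (the function names are mine):

```python
def box_out_pct(player_reb_pct, man_reb_pct):
    """Share of the player-vs-his-man rebounding battle that the player wins."""
    return player_reb_pct / (player_reb_pct + man_reb_pct)

def estimated_man_reb_pct(position_avg, others_avg, same_position_weight=0.80):
    """The 'man' is ~80% a player at the same position, ~20% an average other player."""
    return same_position_weight * position_avg + (1 - same_position_weight) * others_avg

# Westbrook-style offensive example from the post: his ~6% OR% against his man's ~10.2% DR%
print(round(box_out_pct(0.06, 0.102), 3))   # ~0.37
```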
Adjusted Offensive Efficiencies
Here we estimate each player's Adjusted (against average competition) Offensive Efficiencies
(The formula is simply (Raw Offensive Efficiency x Average Team Efficiency)/ Opponent's Defensive Efficiency. This is based on the assumption that Raw Team & Player Offensive Efficiency can be described as Real Offensive Efficiency x Real Opponents' Defensive Efficiency / League Efficiency average).
The results above 20% of possessions used are here: http://dl.dropbox.com/u/241759/adjusted%20offensive%20efficiencies.htm
Monte Carlo Methods
Since my blog is so ugly, I don't really like updating it. But here's a quick rundown of my monte carlo simulation method.
1) The Ken Pomeroy simulation.
https://dl.dropbox.com/u/241759/bracket_simulation.htm
Ken Pomeroy has set up his statistics in a simple way to find point margin (see one of my way old posts). For several reasons that I have mentioned on this site, I do not agree with his % chance of win statistic, so instead I stick with Dean Oliver's approach (the one I learned in statistics class). Simply, in Excel, I tell it to look at the normal distribution of the expected outcome between the two teams. Assuming a standard deviation of 10.9 points (which is roughly what we find for most teams in Pomeroy's ratings, and the number found by the LRMC paper), we tell the computer:
=Normdist(x, 0, 10.9, 1)
where X is the expected point margin.
Then, we tell the computer to create one random tournament. For each game, the computer generates a random number between 0 and 1. If the value surpasses the better team's win% (i.e., if it chose .91 while the better team's win% was .9), the worse team moves on.
Then, by setting up a macro, I record the number of times each team makes it to each round. We simply divide the number of times a team reaches any given round by the total number of trials to get the % chance that the team makes it to that round of the tournament.
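For anyone who wants to replicate this without the Excel macro, here's a rough Python sketch of the same idea on a toy four-team bracket. The ratings are made up, and 10.9 is the assumed standard deviation discussed above:

```python
import math
import random
from statistics import NormalDist

SD = 10.9  # assumed per-game standard deviation, per the discussion above

def win_prob(margin):
    # P(the first team wins) given its expected point margin, a la Normdist(x, 0, 10.9, 1)
    return NormalDist(mu=0, sigma=SD).cdf(margin)

def play_round(field, ratings):
    # Adjacent teams in the bracket play; a random draw decides who advances
    winners = []
    for a, b in zip(field[::2], field[1::2]):
        winners.append(a if random.random() < win_prob(ratings[a] - ratings[b]) else b)
    return winners

def simulate(bracket, ratings, trials=10000):
    rounds = int(math.log2(len(bracket)))
    reach = {t: [0] * (rounds + 1) for t in bracket}   # reach[t][r] = times t survives round r
    for _ in range(trials):
        field = list(bracket)
        for r in range(1, rounds + 1):
            field = play_round(field, ratings)
            for t in field:
                reach[t][r] += 1
    return {t: [c / trials for c in counts[1:]] for t, counts in reach.items()}

# Tiny illustrative 4-team bracket with made-up ratings (not real Pomeroy numbers):
ratings = {"A": 25.0, "B": 20.0, "C": 18.0, "D": 12.0}
print(simulate(["A", "D", "B", "C"], ratings))
```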
2) The LRMC simulation
http://dl.dropbox.com/u/241759/lrmc_montecarlo.htm
This uses the same computer program, but different statistics.
Unfortunately, the LRMC does not post anything that we can convert to point margin or win probabilities, so we have to estimate point margin from each team's ranking. I took Jeff Sagarin's predictor ratings of all 347 teams by ranking and mapped them onto the LRMC's ranking order. While this certainly has some inaccuracies, I should say that this method was by far my best for the vast majority of the tournament. Then we subtract one rating from the other to get a predicted point margin and convert that into a win probability as above.
Hope that answers any questions!
Who needs BracketScience?
By taking the simulated # of wins (via kenpom.com's numbers) and average wins for a given seed, we can rate who will have the best performance above what their seed predicts.
Here are the results for the field.
Quick Math on the bracket
Here are my preliminary results of the bracket simulation, based on stats from Kenpom.com.
http://dl.dropbox.com/u/241759/MidwestWest.html
(not yet adjusted for teams' consistency)
Game-Changers for NCAA Tourney Teams
Here we'll be taking a look at what is likely to alter a team's predicted final score (based on Ken Pomeroy's rankings - http://kenpom.com and the LRMC's rankings - http://www2.isye.gatech.edu/~jsokol/lrmc/)
The two things we'll be doing:
1) We'll describe which parts of Ken Pomeroy's Four Factors stats (for a team's opponents) affect the predicted outcome. 'Relies on' = is ranked high in; 'does not rely on' = is ranked low in. All these numbers can be found on the team pages at Kenpom.com.
2) We'll measure a team's predictability (in terms of consistency of actual versus expected point margin with an average value of 10.9).
-Duke: (#1 Pomeroy, #2 LRMC, #2 bLRMC)
-Predictability: +1.1 points above average
-When opponents' offense relies on heavy free-throw shooting, Duke fares better. (Correlation of +.34)
-When opponents' defense DOES NOT rely on heavy defensive rebounding, Duke fares worse. (Correlation of -.28)
-Kansas: (#2 Pomeroy, #1 LRMC, #1 bLRMC)
-Predictability: -.2 points above average
-When opponents' offense relies on good field-goal shooting, Kansas fares worse. (Correlation of -.43)
-When opponents' offense relies on heavy offensive rebounding, Kansas fares better. (Correlation of +.33)
-When opponents' defense DOES NOT rely on field-goal percentage, Kansas fares worse. (Correlation of -.34)
-Wisconsin: (#3 Pomeroy, #13 LRMC, #9 bLRMC)
-Predictability: +.3 points above average
-When opponents' offense relies on good field-goal shooting, Wisconsin fares worse. (Correlation of -.31)
-When opponents' defense relies on field-goal %, Wisconsin fares better. (Correlation of +.24)
-Ohio St.: (#4 Pomeroy, #7 LRMC, #5 bLRMC)
-Predictability: -.5 points above average
-When opponents' offense relies on good field-goal shooting, Ohio St. fares better. (Correlation of +.24)
-When opponents' defense DOES NOT rely on field-goal %, Ohio St. fares worse. (Correlation of -.30)
UPSET WATCH
NCAA tournament upset watch: Murray State and Pittsburgh are the two teams who will likely be mis-seeded the worst: http://dl.dropbox.com/u/241759/upsets.htm
Point-margin-based Chance of win.
While I think using the four factors can give us a much better picture of point-margin (and therefore, chance of win), let's just look at the 2nd step right now: deriving chance of win from expected point margin.
The Log5 formula used by many people (including Ken Pomeroy) to determine a team's chance of win is fairly accurate. It is based on fitting a model to theoretical results.
Slightly more accurate, I believe, is the LRMC (logistic regression/Markov chain) steady-state formula, which does the same thing, just to a much higher degree of accuracy; steady states offer an actual theoretical explanation for the numbers based on team play rather than simply the normal distribution.
For example:
Duke's chances against Maryland, assuming a 2-pt-win-
Log5: 61%
LRMC: 59.8%
Huge difference, huh?
Finally, I must throw in my two cents: empirically, I think it is viable to say that specific teams play more consistently than others. In that way, we can alter win probabilities based on standard deviations of actual minus expected point margin (which explains the basis for this site's creation). Using those numbers (from Kenpom.com), we see that:
Duke's standard deviation of actual minus expected point margin is 9.77.
Maryland's standard deviation of actual minus expected point margin is 10.98
By Duke's numbers alone, we see their chance of win as being 58.1%
By Maryland's numbers alone, we see their chance of win as being 42.77%
By averaging these two values in their context (.581 and 1-.4277) we see that Duke's chance of winning should be around 57.7%
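Here's that Duke-Maryland calculation as a small Python sketch (the function name is mine; the inputs are the numbers quoted above):

```python
from statistics import NormalDist

def win_prob_with_consistency(expected_margin, team_sd, opp_sd):
    """Average the two teams' individual reads on the game, as in the Duke-Maryland
    example above."""
    p_team = NormalDist(mu=0, sigma=team_sd).cdf(expected_margin)       # team's own view
    p_opp_loses = 1 - NormalDist(mu=0, sigma=opp_sd).cdf(-expected_margin)  # 1 - opponent's view
    return (p_team + p_opp_loses) / 2

# Duke favored by 2 over Maryland, SDs of 9.77 and 10.98:
print(round(win_prob_with_consistency(2.0, 9.77, 10.98), 3))   # ~0.577
```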
This allows us to resolve (or at least partially resolve) two flaws in Pomeroy's predictions: the lack of accounting for consistency, and the lack of accounting for diminishing returns. The first is obvious; the second is because teams' expected play versus their actual play should reflect the error in his ratings derivations.
If I had enough time to scour through all the teams' data, I could give an adjusted Standard Deviations (or, 'Consistency') value for teams -- adjusting their consistency based on how consistent or inconsistent their opponents play.
The rules for Step One.
Let's try this jam out on UNC.
The best way to predict a team's four factors in a future game is to create a linear regression involving their four factors, and their opponent's four factors.
Unfortunately, Ken Pomeroy has not yet adjusted the Four Factors for quality of opponent play (and for good reason - it's quite complicated). So we need to estimate how strength of schedule affects actual four factors. Unfortunately, I don't have any good way to run this analysis on every team. The best theory of adjustment would apply to all teams, but since there is a good chance that individual teams affect these numbers differently, it's not entirely bad to only regress on a team-by-team basis.
The next part of this is much harder.
We need to find the standard deviation of actual versus predicted four-factors stats in order to run a Monte Carlo simulation that draws normally-distributed values for both teams' four factors plus pace (nine variables in all) and, in turn, spits out a point margin (via the regression from the previous post).
I'll be coming up with this system pretty soon, so watch out.
Step Two of the Two-Step Process
The best way to predict point margin is to first predict a team's four factors, then convert the four factors into point margin via linear regression.
The linear regression is the 2nd step, and here are the results (with an R^2 value of about .99)
(Numbers derived from http://kenpom.com)
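Since the regression output itself isn't reproduced here, below is only a sketch of how step two could be set up, run on synthetic data rather than the real kenpom.com numbers:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in data: one row per team, columns = its offensive four factors followed by its
# opponents' (defensive) four factors; the real inputs would come from kenpom.com.
X = rng.normal(size=(300, 8))
true_betas = np.array([4.0, -3.5, 1.5, 0.8, -4.0, 3.5, -1.5, -0.8])
y = X @ true_betas + rng.normal(scale=1.0, size=300)    # synthetic "point margin"

X1 = np.column_stack([np.ones(len(X)), X])              # add an intercept column
coefs, *_ = np.linalg.lstsq(X1, y, rcond=None)          # ordinary least squares
r2 = 1 - np.sum((y - X1 @ coefs) ** 2) / np.sum((y - y.mean()) ** 2)
print(coefs.round(2), round(r2, 3))
```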
Step one is a bit harder in some-ways, and should probably be done on a team-by-team basis. We'll cover that soon.
UNC's Injuries
Here's how UNC's injuries have affected their play, in terms of points. The number represents how Carolina does versus their average play.
(numbers based on Actual - Expected Point Margin, taking expected point margin from Kenpom.com)
| Davis | Zeller | Graves | Ginyard
IN | -0.1 | 0.8 | -0.3 | -0.2
OUT | -16.0 | -10.1 | -12.7 | -3.9
Difference | 15.9 | 10.9 | 12.4 | 3.7