The more runs a baseball team scores, the more games the team should win. Conversely, the fewer runs a team gives up, the more games the team should win. Bill James, probably the most celebrated advocate of applying mathematics to the analysis of Major League Baseball (MLB), an approach often called sabermetrics, studied many years of MLB standings and found that the percentage of games won by a baseball team can be well approximated by the formula
\[ \begin{equation} \label{eq:1} \frac{\textrm{runs scored}^2}{\textrm{runs scored}^2+\textrm{runs allowed}^2} = \textrm{estimate of percentage of games won.} \end{equation} \]
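This formula has several desirable properties: the predicted winning percentage always lies between 0 and 1, an increase in runs scored increases the predicted winning percentage, and a decrease in runs allowed increases the predicted winning percentage.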
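These properties can be checked numerically. The sketch below is a minimal R illustration; the helper name `pythag_win_pct` and the run totals are hypothetical and chosen only to exercise equation (1.1).

```r
# Hypothetical helper implementing equation (1.1)
pythag_win_pct = function(runs_scored, runs_allowed) {
  runs_scored^2 / (runs_scored^2 + runs_allowed^2)
}

base = pythag_win_pct(800, 800)           # equal runs scored and allowed
more_scored = pythag_win_pct(850, 800)    # score 50 more runs
fewer_allowed = pythag_win_pct(800, 750)  # allow 50 fewer runs

base                   # 0.5
more_scored > base     # more runs scored raises the prediction
fewer_allowed > base   # fewer runs allowed raises the prediction
# even lopsided run totals stay strictly between 0 and 1
pythag_win_pct(c(1, 1000), c(1000, 1))
```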
Consider a right triangle with a hypotenuse (the longest side) of length c and two other sides of lengths a and b. Recall from high school geometry that the Pythagorean Theorem states that a triangle is a right triangle if and only if \(a^2+b^2=c^2\). For example, a triangle with sides of lengths 3, 4, and 5 is a right triangle because \(3^2+4^2=5^2\). The fact that equation (1.1) adds up the squares of two numbers led Bill James to call the relationship described in (1.1) Baseball’s Pythagorean Theorem.
Let’s define \(R=\frac{\textrm{runs scored}}{\textrm{runs allowed}}\) as a team’s scoring ratio. If we divide the numerator and denominator of (1.1) by \((\textrm{runs allowed})^2\), then the value of the fraction remains unchanged and we may rewrite (1.1) as equation (1.2).
\[ \begin{equation} \label{eq:2} \frac{R^2}{R^{2}+1} = \textrm{estimate of percentage of games won.} \end{equation} \]
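Because (1.2) is just (1.1) with numerator and denominator divided by \((\textrm{runs allowed})^2\), both forms must return the same number. A minimal R check using the 2006 Detroit Tigers totals from the table below (822 runs scored, 675 allowed); the function names are hypothetical.

```r
# Equation (1.1): runs scored and runs allowed
win_pct_runs = function(rs, ra) rs^2 / (rs^2 + ra^2)
# Equation (1.2): scoring ratio R = runs scored / runs allowed
win_pct_ratio = function(R) R^2 / (R^2 + 1)

rs = 822  # 2006 Detroit Tigers, runs scored
ra = 675  # 2006 Detroit Tigers, runs allowed
round(win_pct_runs(rs, ra), 3)    # 0.597
round(win_pct_ratio(rs / ra), 3)  # 0.597
all.equal(win_pct_runs(rs, ra), win_pct_ratio(rs / ra))  # TRUE
```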
The R code below applies (1.2) to MLB teams from the 1980–2006 seasons and compares the predicted and actual winning percentages.
#to install any package
#install.packages("package.name")
library(Lahman) #for baseball stats
#load the team data
data("Teams")
#if you are using RStudio, use the View() function to see all the data
#keep seasons 1980-2006 and the columns W, L, R, RA
team_df = Teams[Teams$yearID >= 1980 & Teams$yearID <= 2006,
                c("yearID","teamID","W","L","R","RA")]
#scoring ratio (sr) = R/RA
team_df$Scoring.Ratio = team_df$R/team_df$RA
#predicted win % = sr^2/((sr^2)+1)
team_df$Predicted.Win.Pct = team_df$Scoring.Ratio^2/
  ((team_df$Scoring.Ratio^2)+1)
#actual win % = W/(W+L)
team_df$Actual.Win.Pct = team_df$W/(team_df$W+team_df$L)
#absolute error = |actual-predicted|
team_df$Absolute.Error = abs(
  team_df$Actual.Win.Pct-team_df$Predicted.Win.Pct)
yearID | teamID | W | L | R | RA | Scoring.Ratio | Predicted.Win.Pct | Actual.Win.Pct | Absolute.Error |
---|---|---|---|---|---|---|---|---|---|
2006 | ARI | 76 | 86 | 773 | 788 | 0.9809645 | 0.4903917 | 0.4691358 | 0.0212559 |
2006 | ATL | 79 | 83 | 849 | 805 | 1.0546584 | 0.5265834 | 0.4876543 | 0.0389290 |
2006 | BAL | 70 | 92 | 768 | 899 | 0.8542825 | 0.4218980 | 0.4320988 | 0.0102007 |
2006 | BOS | 86 | 76 | 820 | 825 | 0.9939394 | 0.4969605 | 0.5308642 | 0.0339037 |
2006 | CHA | 90 | 72 | 868 | 794 | 1.0931990 | 0.5444366 | 0.5555556 | 0.0111190 |
2006 | CHN | 66 | 96 | 716 | 834 | 0.8585132 | 0.4243096 | 0.4074074 | 0.0169022 |
2006 | CIN | 80 | 82 | 749 | 801 | 0.9350811 | 0.4664893 | 0.4938272 | 0.0273378 |
2006 | CLE | 78 | 84 | 870 | 782 | 1.1125320 | 0.5531180 | 0.4814815 | 0.0716366 |
2006 | COL | 76 | 86 | 813 | 812 | 1.0012315 | 0.5006154 | 0.4691358 | 0.0314796 |
2006 | DET | 95 | 67 | 822 | 675 | 1.2177778 | 0.5972586 | 0.5864198 | 0.0108388 |
2006 | FLO | 78 | 84 | 758 | 772 | 0.9818653 | 0.4908504 | 0.4814815 | 0.0093690 |
2006 | HOU | 82 | 80 | 735 | 719 | 1.0222531 | 0.5110028 | 0.5061728 | 0.0048300 |
2006 | KCA | 62 | 100 | 757 | 971 | 0.7796087 | 0.3780281 | 0.3827160 | 0.0046880 |
2006 | LAA | 89 | 73 | 766 | 732 | 1.0464481 | 0.5226852 | 0.5493827 | 0.0266975 |
2006 | LAN | 88 | 74 | 820 | 751 | 1.0918775 | 0.5438365 | 0.5432099 | 0.0006266 |
2006 | MIL | 75 | 87 | 730 | 833 | 0.8763505 | 0.4343860 | 0.4629630 | 0.0285769 |
2006 | MIN | 96 | 66 | 801 | 683 | 1.1727672 | 0.5790152 | 0.5925926 | 0.0135774 |
2006 | NYA | 97 | 65 | 930 | 767 | 1.2125163 | 0.5951738 | 0.5987654 | 0.0035916 |
#the same data frame can be built more concisely with dplyr
library(dplyr)
team_df2 = Teams %>%
  filter(yearID >= 1980, yearID <= 2006) %>%
  select(yearID, teamID, W, L, R, RA) %>%
  mutate(Scoring.Ratio = R/RA,
         Predicted.Win.Pct = Scoring.Ratio^2/((Scoring.Ratio^2)+1),
         Actual.Win.Pct = W/(W+L),
         Absolute.Error = abs(Actual.Win.Pct-Predicted.Win.Pct))
Figure 1.1 shows how well (1.2) predicts MLB teams’ winning percentages for the 1980–2006 seasons.
For example, the 2006 Detroit Tigers (DET) scored 822 runs and gave up 675 runs. Their scoring ratio was \(R=\frac{822}{675}=1.218\). Their predicted winning percentage from Baseball’s Pythagorean Theorem was \(\frac{1.218^2}{1.218^{2}+1}=.597\). The 2006 Tigers actually won 95 of their 162 games, a winning percentage of \(\frac{95}{162}=.586\). Thus (1.2) was off by 1.1% in predicting the percentage of games won by the Tigers in 2006.
For each team, define the error in winning percentage prediction as actual winning percentage minus predicted winning percentage. For example, for the 2006 Arizona Diamondbacks (ARI), error = .469 - .490 = -.021, and for the 2006 Boston Red Sox (BOS), error = .531 - .497 = .034. A positive error means that the team won more games than predicted, while a negative error means the team won fewer games than predicted. The Absolute.Error column in figure 1.1 computes the absolute value of the prediction error for each team. Recall that the absolute value of a number is simply the distance of the number from 0; that is, |5| = |-5| = 5. The absolute prediction errors for each team were averaged to obtain a measure of how well the predicted win percentages fit the actual team winning percentages. The average of the absolute forecasting errors is called the MAD (Mean Absolute Deviation)1. For this data set, the predicted winning percentages of the Pythagorean Theorem were off by an average of 2% per team.
mean(team_df$Absolute.Error)
## [1] 0.01965617
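To make the sign convention concrete, the sketch below recomputes the signed and absolute errors for the three teams discussed above (ARI, BOS, DET), using the W, L, R, and RA values from figure 1.1. The helper name `pythag_win_pct` is hypothetical, and the final MAD covers only these three teams, not the full 1980–2006 data set.

```r
# Hypothetical helper: equation (1.3), with exp = 2 giving (1.2)
pythag_win_pct = function(R, exp = 2) R^exp / (R^exp + 1)

teams = data.frame(
  teamID = c("ARI", "BOS", "DET"),
  W  = c(76, 86, 95),
  L  = c(86, 76, 67),
  R  = c(773, 820, 822),
  RA = c(788, 825, 675)
)
teams$Predicted = pythag_win_pct(teams$R / teams$RA)
teams$Actual    = teams$W / (teams$W + teams$L)
# signed error: positive means the team won more games than predicted
teams$Error     = teams$Actual - teams$Predicted
teams$Abs.Error = abs(teams$Error)

round(teams$Error, 3)   # -0.021  0.034 -0.011
mean(teams$Abs.Error)   # MAD for these three teams only
```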
Instead of blindly assuming winning percentage can be approximated by using the square of the scoring ratio, perhaps we should try a more general formula to predict winning percentage, such as
\[ \begin{equation} \label{eq:3} \frac{R^\textrm{exp}}{R^\textrm{exp}+1}. \end{equation} \]
If we vary exp (the exponent) in (1.3), we can make (1.3) better fit the actual dependence of winning percentage on scoring ratio for different sports. For baseball, we will allow exp in (1.3) to vary between 1 and 3. Of course, exp = 2 reduces to the Pythagorean Theorem.
Figure 1.2 shows how MAD changes as we vary exp between 1 and 3. We see that exp = 1.9 yields the smallest MAD (1.96%) of any equation of form (1.3). An exp value of 2 is almost as good (MAD of 1.97%), so for simplicity we will stick with Bill James’s view that exp = 2. Of course, there might be another equation that predicts winning percentage from runs scored and allowed better than the Pythagorean Theorem does. The Pythagorean Theorem is simple and intuitive, however, and works very well. After all, we are off in predicting team wins by an average of 162 \(\times\) .02 \(\approx\) 3.2, or approximately three wins per team. Therefore, I see no reason to look for a more complicated (albeit slightly more accurate) model.
#exponents from 1 to 3 in steps of 0.1
exponent = seq(1, 3, 0.1)
#for each exponent, compute the MAD of equation (1.3)
MAD = sapply(exponent, function(x){
  predicted = team_df$Scoring.Ratio^x/((team_df$Scoring.Ratio^x)+1)
  mean(abs(predicted - team_df$Actual.Win.Pct))
})
exponent | MAD |
---|---|
1.0 | 0.0317843 |
1.1 | 0.0296585 |
1.2 | 0.0276954 |
1.3 | 0.0258894 |
1.4 | 0.0242529 |
1.5 | 0.0228382 |
1.6 | 0.0216138 |
1.7 | 0.0206476 |
1.8 | 0.0199516 |
1.9 | 0.0196285 |
2.0 | 0.0196562 |
2.1 | 0.0200005 |
2.2 | 0.0206936 |
2.3 | 0.0216168 |
2.4 | 0.0228446 |
2.5 | 0.0243075 |
2.6 | 0.0260084 |
2.7 | 0.0278395 |
2.8 | 0.0297717 |
2.9 | 0.0318052 |
3.0 | 0.0338884 |
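The grid above moves in steps of 0.1. As an alternative (not the method used in the text), base R’s `optimize()` can search the interval [1, 3] continuously. The sketch below runs it on the 18 rows of 2006 data shown in figure 1.1 only, so its answer need not match the 1980–2006 result. `optimize()` returns a local minimum, and MAD is not guaranteed to have a single minimum, so a grid like the one above remains a useful cross-check.

```r
# 2006 rows from the excerpt in figure 1.1 (ARI through NYA)
mlb_2006 = data.frame(
  W  = c(76, 79, 70, 86, 90, 66, 80, 78, 76, 95, 78, 82, 62, 89, 88, 75, 96, 97),
  L  = c(86, 83, 92, 76, 72, 96, 82, 84, 86, 67, 84, 80, 100, 73, 74, 87, 66, 65),
  R  = c(773, 849, 768, 820, 868, 716, 749, 870, 813, 822, 758, 735, 757,
         766, 820, 730, 801, 930),
  RA = c(788, 805, 899, 825, 794, 834, 801, 782, 812, 675, 772, 719, 971,
         732, 751, 833, 683, 767)
)
ratio  = mlb_2006$R / mlb_2006$RA
actual = mlb_2006$W / (mlb_2006$W + mlb_2006$L)

# MAD of equation (1.3) for a given exponent
mad_for_exp = function(exponent) {
  mean(abs(ratio^exponent / (ratio^exponent + 1) - actual))
}

# continuous one-dimensional search over [1, 3]
best = optimize(mad_for_exp, interval = c(1, 3))
best$minimum    # exponent with the smallest MAD found on this excerpt
best$objective  # the corresponding MAD
```

With `team_df` from the earlier code, the same function could be pointed at `team_df$Scoring.Ratio` and `team_df$Actual.Win.Pct` to search the full 1980–2006 data.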
To test the utility of the Pythagorean Theorem (or any prediction model), we should check how well it forecasts the future. I compared the Pythagorean Theorem’s forecast for each MLB playoff series (1980–2007) against a prediction based just on games won. For each playoff series, the Pythagorean method predicts the winner to be the team with the higher scoring ratio, while the “games won” approach simply predicts the winner to be the team that won more regular-season games.
The playoff series data used below were scraped from baseball-reference.com.
library(scales) #to format percentages
#read the csv from github
all_series = read.csv(
"https://raw.githubusercontent.com/capstat/mathletics/master/Chapter_1/mlb_playoffs.csv")
#just playoffs from the years 1980 to 2007
series_80_07 = all_series[all_series$year >= 1980 &
all_series$year <= 2007 &
all_series$series != "World Series",]
#add a column for scoring ratio
series_80_07$Ratio = series_80_07$R/series_80_07$pR
#data frame for the winners and losers
winners = series_80_07[seq(1,nrow(series_80_07),2), c(1:3,5:6,12,45,33,65)]
losers = series_80_07[seq(2,nrow(series_80_07),2), c(6,12,45,33,65)]
#rename the losers columns
colnames(losers) = paste0("L", colnames(losers))
#combine the winners and losers
series_df = cbind(winners, losers)
#did the series winner have the higher regular-season win %?
#(identical records evaluate to FALSE here)
series_df$W.W.Greater = series_df$pW > series_df$LpW
#did the series winner have the higher scoring ratio?
series_df$W.Ratio.Greater = series_df$Ratio > series_df$LRatio
We found that the Pythagorean approach correctly predicted 61 of 106 playoff series (57.5%), while the “games won” approach correctly predicted the winner of only 52% (52 out of 100) of playoff series.2
The reader is probably disappointed that even the Pythagorean method correctly forecasts the outcome of less than 58% of baseball playoff series. I believe that the regular season is a relatively poor predictor of the playoffs in baseball because a team’s regular season record depends greatly on the performance of five starting pitchers. During the playoffs, teams use only three or four starting pitchers, so much of the regular season data (games involving the fourth and fifth starting pitchers) is not relevant for predicting the outcome of the playoffs.
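The two prediction rules, and the tie case from footnote 2, can be sketched in a few lines of R. The column names mirror `series_df` above (`pW`, `LpW`, `Ratio`, `LRatio`), but the three series below are hypothetical values for illustration, not real playoff data. Unlike the `>` comparison in the code above, this sketch marks tied records as `NA` so that the “games won” rule makes no prediction for them.

```r
# hypothetical series: one row per playoff series, series winner vs. loser
series = data.frame(
  pW     = c(0.600, 0.550, 0.580),  # series winner's regular-season win %
  LpW    = c(0.580, 0.570, 0.580),  # series loser's regular-season win %
  Ratio  = c(1.20, 1.10, 1.05),     # series winner's scoring ratio
  LRatio = c(1.10, 1.15, 1.00)      # series loser's scoring ratio
)
# a rule is correct when it favored the team that actually won the series;
# identical records give no "games won" prediction (NA)
games_won_correct = ifelse(series$pW == series$LpW, NA, series$pW > series$LpW)
pythag_correct    = series$Ratio > series$LRatio

sum(games_won_correct, na.rm = TRUE)  # correct "games won" picks
sum(!is.na(games_won_correct))        # series where a pick was made
mean(pythag_correct)                  # share of Pythagorean picks that were correct
```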
For anecdotal evidence that the Pythagorean Theorem forecasts a team’s future performance better than its win-loss record does, consider the 2005 Washington Nationals. On July 4, 2005, the Nationals were in first place with a record of 50–32. Extrapolating this winning percentage over the full season would have predicted a final record of 99–63. On the same date, however, the Nationals’ scoring ratio was .991, so (1.2) would have predicted a final record of 80–82. Sure enough, the poor Nationals finished 81–81.
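The arithmetic behind the two Nationals forecasts is short enough to check directly. A minimal R sketch; the helper name `pythag_win_pct` is hypothetical, and 162 games is the standard MLB season length.

```r
# Hypothetical helper: equation (1.3), with exp = 2 giving (1.2)
pythag_win_pct = function(R, exp = 2) R^exp / (R^exp + 1)
games = 162
# extrapolate the 50-32 record over a full season
naive_wins = round(games * 50 / (50 + 32))         # 99
# Pythagorean forecast from the .991 scoring ratio
pythag_wins = round(games * pythag_win_pct(0.991)) # 80
c(naive = naive_wins, pythagorean = pythag_wins)
```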
Baseball’s Pythagorean Theorem is also important because it allows us to determine how many extra wins (or losses) will result from a trade. Suppose a team has scored 850 runs during a season and has given up 800 runs. Suppose we trade a shortstop (Joe) who “created”3 150 runs for a shortstop (Greg) who created 170 runs in the same number of plate appearances. This trade will cause the team (all other things being equal) to score 20 more runs (170 - 150 = 20). Before the trade, \(R=\frac{850}{800}=1.0625\), and we would predict the team to have won \(\frac{162(1.0625)^{2}}{1+(1.0625)^{2}}=85.9\) games. After the trade, \(R=\frac{870}{800}=1.0875\), and we would predict the team to win \(\frac{162(1.0875)^{2}}{1+(1.0875)^{2}}=87.8\) games. Therefore, we estimate the trade makes our team 1.9 games better (87.8 - 85.9 = 1.9). In chapter 9, we will see how the Pythagorean Theorem can be used to help determine fair salaries for MLB players.
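The trade calculation can be wrapped in a small function so that other trades are easy to evaluate. This is a sketch under the same “all other things being equal” assumption as the text; the function name `predicted_wins` is hypothetical.

```r
# Hypothetical helper: predicted season wins from runs scored and allowed
predicted_wins = function(runs_scored, runs_allowed, games = 162, exp = 2) {
  R = runs_scored / runs_allowed
  games * R^exp / (R^exp + 1)
}
before = predicted_wins(850, 800)
after  = predicted_wins(850 + (170 - 150), 800)  # Greg replaces Joe
round(c(before = before, after = after, gain = after - before), 1)
# before 85.9, after 87.8, gain 1.9
```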
Does the Pythagorean Theorem hold for football and basketball? Daryl Morey, the general manager of the Houston Rockets, has shown that for the NFL, equation (1.3) with exp = 2.37 gives the most accurate predictions for winning percentage, while for the NBA, equation (1.3) with exp = 13.91 gives the most accurate predictions for winning percentage. Figure 1.3 gives the predicted and actual winning percentages for the NFL for the 2006–7 season, while figure 1.4 gives the predicted and actual winning percentages for the NBA for the 2006–7 season.
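To see how much the exponent matters, the sketch below applies equation (1.3) to the same scoring ratio with the baseball (2), NFL (2.37), and NBA (13.91) exponents quoted above. The ratio 1.05 is an arbitrary example, not a value from the text. For a team that outscores its opponents, a larger exponent turns the same scoring edge into a higher predicted winning percentage.

```r
# Hypothetical helper: equation (1.3)
pythag_win_pct = function(R, exp) R^exp / (R^exp + 1)
R = 1.05  # example team: scores 5% more points/runs than it allows
p_mlb = pythag_win_pct(R, 2)
p_nfl = pythag_win_pct(R, 2.37)
p_nba = pythag_win_pct(R, 13.91)
round(c(MLB = p_mlb, NFL = p_nfl, NBA = p_nba), 3)
```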
#read the csv off github
nfl_standings = read.csv(
"https://raw.githubusercontent.com/capstat/mathletics/master/Chapter_1/nfl_standings.csv")
#look at the three seasons 2005-2007
nfl_05_07 = nfl_standings[nfl_standings$Year >= 2005 &
                            nfl_standings$Year <= 2007,]
#Pythagorean win % using exp=2.7
nfl_05_07$Win.Pct.2.7 = (nfl_05_07$Ratio^2.7)/((nfl_05_07$Ratio^2.7)+1)
#Pythagorean win % using Morey's exp=2.37
nfl_05_07$Win.Pct.morely = (nfl_05_07$Ratio^2.37)/((nfl_05_07$Ratio^2.37)+1)
#absolute error, exp=2.7
nfl_05_07$Error.2.7 = abs(nfl_05_07$W.L.-nfl_05_07$Win.Pct.2.7)
#absolute error, Morey's exp=2.37
nfl_05_07$Error.morely = abs(nfl_05_07$W.L.-nfl_05_07$Win.Pct.morely)
Year | Tm | W | L | T | W.L. | PF | PA | Ratio | Win.Pct.2.7 | Win.Pct.morely | Error.2.7 | Error.morely |
---|---|---|---|---|---|---|---|---|---|---|---|---|
2007 | Arizona Cardinals | 8 | 8 | 0 | 0.500 | 404 | 399 | 1.0125313 | 0.5084053 | 0.5073781 | 0.0084053 | 0.0073781 |
2007 | Atlanta Falcons | 4 | 12 | 0 | 0.250 | 259 | 414 | 0.6256039 | 0.2198737 | 0.2475690 | 0.0301263 | 0.0024310 |
2007 | Baltimore Ravens | 5 | 11 | 0 | 0.313 | 275 | 384 | 0.7161458 | 0.2887520 | 0.3118949 | 0.0242480 | 0.0011051 |
2007 | Buffalo Bills | 7 | 9 | 0 | 0.438 | 252 | 354 | 0.7118644 | 0.2854384 | 0.3088531 | 0.1525616 | 0.1291469 |
2007 | Carolina Panthers | 7 | 9 | 0 | 0.438 | 267 | 347 | 0.7694524 | 0.3301281 | 0.3495267 | 0.1078719 | 0.0884733 |
2007 | Chicago Bears | 7 | 9 | 0 | 0.438 | 334 | 348 | 0.9597701 | 0.4723119 | 0.4756903 | 0.0343119 | 0.0376903 |
2007 | Cincinnati Bengals | 7 | 9 | 0 | 0.438 | 380 | 385 | 0.9870130 | 0.4911773 | 0.4922554 | 0.0531773 | 0.0542554 |
2007 | Cleveland Browns | 10 | 6 | 0 | 0.625 | 402 | 382 | 1.0523560 | 0.5343919 | 0.5301993 | 0.0906081 | 0.0948007 |
2007 | Dallas Cowboys | 13 | 3 | 0 | 0.813 | 455 | 325 | 1.4000000 | 0.7126880 | 0.6894264 | 0.1003120 | 0.1235736 |
2007 | Denver Broncos | 7 | 9 | 0 | 0.438 | 320 | 409 | 0.7823961 | 0.3401638 | 0.3585682 | 0.0978362 | 0.0794318 |
2007 | Detroit Lions | 7 | 9 | 0 | 0.438 | 346 | 444 | 0.7792793 | 0.3377490 | 0.3563953 | 0.1002510 | 0.0816047 |
2007 | Green Bay Packers | 13 | 3 | 0 | 0.813 | 435 | 291 | 1.4948454 | 0.7475261 | 0.7216767 | 0.0654739 | 0.0913233 |
2007 | Houston Texans | 8 | 8 | 0 | 0.500 | 379 | 384 | 0.9869792 | 0.4911541 | 0.4922351 | 0.0088459 | 0.0077649 |
2007 | Indianapolis Colts | 13 | 3 | 0 | 0.813 | 450 | 262 | 1.7175573 | 0.8115997 | 0.7827799 | 0.0014003 | 0.0302201 |
2007 | Jacksonville Jaguars | 11 | 5 | 0 | 0.688 | 411 | 304 | 1.3519737 | 0.6930095 | 0.6714411 | 0.0050095 | 0.0165589 |
2007 | Kansas City Chiefs | 4 | 12 | 0 | 0.250 | 226 | 335 | 0.6746269 | 0.2567923 | 0.2823527 | 0.0067923 | 0.0323527 |
2007 | Miami Dolphins | 1 | 15 | 0 | 0.063 | 267 | 437 | 0.6109840 | 0.2091183 | 0.2372778 | 0.1461183 | 0.1742778 |
2007 | Minnesota Vikings | 8 | 8 | 0 | 0.500 | 365 | 311 | 1.1736334 | 0.6064185 | 0.5937398 | 0.1064185 | 0.0937398 |
2007 | New England Patriots | 16 | 0 | 0 | 1.000 | 589 | 274 | 2.1496350 | 0.8875848 | 0.8598153 | 0.1124152 | 0.1401847 |
2007 | New Orleans Saints | 7 | 9 | 0 | 0.438 | 379 | 388 | 0.9768041 | 0.4841636 | 0.4860981 | 0.0461636 | 0.0480981 |
2007 | New York Giants | 10 | 6 | 0 | 0.625 | 373 | 351 | 1.0626781 | 0.5409429 | 0.5359572 | 0.0840571 | 0.0890428 |
2007 | New York Jets | 4 | 12 | 0 | 0.250 | 268 | 355 | 0.7549296 | 0.3188519 | 0.3393303 | 0.0688519 | 0.0893303 |
2007 | Oakland Raiders | 4 | 12 | 0 | 0.250 | 283 | 398 | 0.7110553 | 0.2848125 | 0.3082780 | 0.0348125 | 0.0582780 |
2007 | Philadelphia Eagles | 8 | 8 | 0 | 0.500 | 336 | 300 | 1.1200000 | 0.5759055 | 0.5667465 | 0.0759055 | 0.0667465 |
2007 | Pittsburgh Steelers | 10 | 6 | 0 | 0.625 | 393 | 269 | 1.4609665 | 0.7356665 | 0.7106335 | 0.1106665 | 0.0856335 |
2007 | San Diego Chargers | 11 | 5 | 0 | 0.688 | 412 | 284 | 1.4507042 | 0.7319488 | 0.7071861 | 0.0439488 | 0.0191861 |
2007 | San Francisco 49ers | 5 | 11 | 0 | 0.313 | 219 | 364 | 0.6016484 | 0.2023257 | 0.2307369 | 0.1106743 | 0.0822631 |
2007 | Seattle Seahawks | 10 | 6 | 0 | 0.625 | 393 | 291 | 1.3505155 | 0.6923893 | 0.6708766 | 0.0673893 | 0.0458766 |
2007 | St. Louis Rams | 3 | 13 | 0 | 0.188 | 263 | 438 | 0.6004566 | 0.2014631 | 0.2299039 | 0.0134631 | 0.0419039 |
2007 | Tampa Bay Buccaneers | 9 | 7 | 0 | 0.563 | 334 | 270 | 1.2370370 | 0.6397643 | 0.6234327 | 0.0767643 | 0.0604327 |
2007 | Tennessee Titans | 10 | 6 | 0 | 0.625 | 301 | 297 | 1.0134680 | 0.5090293 | 0.5079259 | 0.1159707 | 0.1170741 |
2007 | Washington Redskins | 9 | 7 | 0 | 0.563 | 334 | 310 | 1.0774194 | 0.5501645 | 0.5440673 | 0.0128355 | 0.0189327 |
For the 2005–7 NFL seasons, MAD was minimized by exp = 2.7, which yielded a MAD of 5.9%, while Morey’s exp = 2.37 yielded a MAD of 6.2%.
#read the csv off github
nba_standings = read.csv(
"https://raw.githubusercontent.com/capstat/mathletics/master/Chapter_1/nba_standings.csv")
#look at the three seasons ending 2005-2007 (2004-05 through 2006-07)
nba_04_07 = nba_standings[nba_standings$Year >= 2005 &
                            nba_standings$Year <= 2007,]
#Pythagorean win % using exp=15.4
nba_04_07$Win.Pct.15.4 = (nba_04_07$Ratio^15.4)/((nba_04_07$Ratio^15.4)+1)
#Pythagorean win % using Morey's exp=13.91
nba_04_07$Win.Pct.morely = (nba_04_07$Ratio^13.91)/((nba_04_07$Ratio^13.91)+1)
#absolute error, exp=15.4
nba_04_07$Error.15.4 = abs(nba_04_07$W.L.-nba_04_07$Win.Pct.15.4)
#absolute error, Morey's exp=13.91
nba_04_07$Error.morely = abs(nba_04_07$W.L.-nba_04_07$Win.Pct.morely)
Team | W | L | W.L. | GB | PS.G | PA.G | SRS | Year | Total.PF | Total.PA | Ratio | Win.Pct.15.4 | Win.Pct.morely | Error.15.4 | Error.morely |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Atlanta Hawks | 26 | 56 | 0.317 | 26 | 97.2 | 102.0 | -4.69 | 2006 | 7970 | 8364 | 0.9528934 | 0.3223299 | 0.3382306 | 0.0053299 | 0.0212306 |
Boston Celtics | 33 | 49 | 0.402 | 16 | 98.0 | 99.5 | -1.59 | 2006 | 8036 | 8159 | 0.9849246 | 0.4417831 | 0.4473719 | 0.0397831 | 0.0453719 |
Charlotte Bobcats | 26 | 56 | 0.317 | 26 | 96.9 | 100.9 | -3.90 | 2006 | 7946 | 8274 | 0.9603577 | 0.3491174 | 0.3629342 | 0.0321174 | 0.0459342 |
Chicago Bulls | 41 | 41 | 0.500 | 23 | 97.8 | 97.2 | 0.51 | 2006 | 8020 | 7970 | 1.0062735 | 0.5240590 | 0.5217343 | 0.0240590 | 0.0217343 |
Cleveland Cavaliers | 50 | 32 | 0.610 | 14 | 97.6 | 95.4 | 2.17 | 2006 | 8003 | 7823 | 1.0230091 | 0.5866963 | 0.5784539 | 0.0233037 | 0.0315461 |
Dallas Mavericks | 60 | 22 | 0.732 | 3 | 99.1 | 93.1 | 5.96 | 2006 | 8126 | 7634 | 1.0644485 | 0.7234891 | 0.7044907 | 0.0085109 | 0.0275093 |
Denver Nuggets | 44 | 38 | 0.537 | 0 | 100.3 | 100.1 | 0.36 | 2006 | 8225 | 8208 | 1.0020712 | 0.5079650 | 0.5071945 | 0.0290350 | 0.0298055 |
Detroit Pistons | 64 | 18 | 0.780 | 0 | 96.8 | 90.2 | 6.24 | 2006 | 7938 | 7396 | 1.0732829 | 0.7482159 | 0.7278504 | 0.0317841 | 0.0521496 |
Golden State Warriors | 34 | 48 | 0.415 | 20 | 98.5 | 99.8 | -1.11 | 2006 | 8077 | 8184 | 0.9869257 | 0.4495048 | 0.4543617 | 0.0345048 | 0.0393617 |
Houston Rockets | 34 | 48 | 0.415 | 29 | 90.1 | 91.7 | -1.30 | 2006 | 7388 | 7519 | 0.9825775 | 0.4327422 | 0.4391818 | 0.0177422 | 0.0241818 |
Indiana Pacers | 41 | 41 | 0.500 | 23 | 93.9 | 92.0 | 1.62 | 2006 | 7700 | 7544 | 1.0206787 | 0.5781550 | 0.5706998 | 0.0781550 | 0.0706998 |
Los Angeles Clippers | 47 | 35 | 0.573 | 7 | 97.2 | 95.6 | 1.75 | 2006 | 7970 | 7839 | 1.0167113 | 0.5634628 | 0.5573795 | 0.0095372 | 0.0156205 |
Los Angeles Lakers | 45 | 37 | 0.549 | 9 | 99.4 | 96.9 | 2.53 | 2006 | 8151 | 7946 | 1.0257991 | 0.5968286 | 0.5876636 | 0.0478286 | 0.0386636 |
Memphis Grizzlies | 49 | 33 | 0.598 | 14 | 92.2 | 88.5 | 3.74 | 2006 | 7560 | 7257 | 1.0417528 | 0.6524740 | 0.6385287 | 0.0544740 | 0.0405287 |
Miami Heat | 52 | 30 | 0.634 | 0 | 99.9 | 96.0 | 3.59 | 2006 | 8192 | 7872 | 1.0406504 | 0.6487677 | 0.6351226 | 0.0147677 | 0.0011226 |
Milwaukee Bucks | 40 | 42 | 0.488 | 24 | 97.8 | 98.8 | -1.07 | 2006 | 8020 | 8102 | 0.9898790 | 0.4609157 | 0.4646840 | 0.0270843 | 0.0233160 |
Minnesota Timberwolves | 33 | 49 | 0.402 | 11 | 91.7 | 93.6 | -1.75 | 2006 | 7519 | 7675 | 0.9796743 | 0.4215921 | 0.4290707 | 0.0195921 | 0.0270707 |
New Jersey Nets | 49 | 33 | 0.598 | 0 | 93.8 | 92.4 | 1.11 | 2006 | 7692 | 7577 | 1.0151775 | 0.5577357 | 0.5521925 | 0.0402643 | 0.0458075 |
New Orleans/Oklahoma City Hornets | 38 | 44 | 0.463 | 25 | 92.8 | 95.6 | -2.51 | 2006 | 7610 | 7839 | 0.9707871 | 0.3877973 | 0.3983356 | 0.0752027 | 0.0646644 |
New York Knicks | 23 | 59 | 0.280 | 26 | 95.6 | 102.0 | -6.30 | 2006 | 7839 | 8364 | 0.9372310 | 0.2692733 | 0.2886966 | 0.0107267 | 0.0086966 |
Orlando Magic | 36 | 46 | 0.439 | 16 | 94.9 | 96.0 | -1.26 | 2006 | 7782 | 7872 | 0.9885671 | 0.4558450 | 0.4600980 | 0.0168450 | 0.0210980 |
Philadelphia 76ers | 38 | 44 | 0.463 | 11 | 99.4 | 101.3 | -2.10 | 2006 | 8151 | 8307 | 0.9812207 | 0.4275261 | 0.4344533 | 0.0354739 | 0.0285467 |
Phoenix Suns | 54 | 28 | 0.659 | 0 | 108.4 | 102.8 | 5.48 | 2006 | 8889 | 8430 | 1.0544484 | 0.6934873 | 0.6764440 | 0.0344873 | 0.0174440 |
Portland Trail Blazers | 21 | 61 | 0.256 | 23 | 88.8 | 98.3 | -8.91 | 2006 | 7282 | 8061 | 0.9033619 | 0.1729112 | 0.1956508 | 0.0830888 | 0.0603492 |
Sacramento Kings | 44 | 38 | 0.537 | 10 | 98.9 | 97.3 | 1.61 | 2006 | 8110 | 7979 | 1.0164181 | 0.5623698 | 0.5563894 | 0.0253698 | 0.0193894 |
San Antonio Spurs | 63 | 19 | 0.768 | 0 | 95.6 | 88.8 | 6.69 | 2006 | 7839 | 7282 | 1.0764900 | 0.7567730 | 0.7359933 | 0.0112270 | 0.0320067 |
Seattle SuperSonics | 35 | 47 | 0.427 | 9 | 102.6 | 105.6 | -2.88 | 2006 | 8413 | 8659 | 0.9715903 | 0.3908251 | 0.4010957 | 0.0361749 | 0.0259043 |
Toronto Raptors | 27 | 55 | 0.329 | 22 | 101.1 | 104.0 | -3.03 | 2006 | 8290 | 8528 | 0.9720919 | 0.3927194 | 0.4028218 | 0.0637194 | 0.0738218 |
Utah Jazz | 41 | 41 | 0.500 | 3 | 92.4 | 95.0 | -2.49 | 2006 | 7577 | 7790 | 0.9726573 | 0.3948568 | 0.4047687 | 0.1051432 | 0.0952313 |
Washington Wizards | 42 | 40 | 0.512 | 10 | 101.7 | 99.8 | 1.57 | 2006 | 8339 | 8184 | 1.0189394 | 0.5717364 | 0.5648780 | 0.0597364 | 0.0528780 |
For the 2004–7 NBA seasons, exp = 15.4 best fit actual winning percentages. MAD for these seasons was 3.35% for exp = 15.4 and 3.4% for exp = 13.91. Since Morey’s values of exp are very close in accuracy to the values we found from recent seasons, we will stick with Morey’s values of exp.
These predicted winning percentages are based on regular season data. Therefore, we could look at teams that performed much better than expected during the regular season and predict that “luck would catch up with them.” This train of thought would lead us to believe that these teams would perform worse during the playoffs. Note that the Miami Heat and Dallas Mavericks both won about 8% more games than expected during the regular season. Therefore, we would have predicted Miami and Dallas to perform worse during the playoffs than their actual win-loss record indicated. Sure enough, both Dallas and Miami suffered unexpected first-round defeats. Conversely, during the regular season the San Antonio Spurs and Chicago Bulls won around 8% fewer games than the Pythagorean Theorem predicts, indicating that these teams would perform better than expected in the playoffs. Sure enough, the Bulls upset the Heat and gave the Detroit Pistons a tough time. Of course, the Spurs won the 2007 NBA title. In addition, the Pythagorean Theorem had the Spurs as by far the league’s best team (78% predicted winning percentage). Note that the team that underachieved the most was the Boston Celtics, who won nearly 9% fewer (or 7) games than predicted. Many people suggested the Celtics “tanked” games during the regular season to improve their chances of obtaining potential future superstars such as Greg Oden and Kevin Durant in the 2007 draft lottery. The fact that the Celtics won seven fewer games than expected does not prove this conjecture, but it is certainly consistent with the view that the Celtics did not go all out to win every close game.
The actual errors were not simply averaged because averaging positive and negative errors would result in positive and negative errors canceling out. For example, if one team wins 5% more games than (1.2) predicts and another team wins 5% fewer games than (1.2) predicts, the average of the errors is 0 but the average of the absolute errors is 5%. Of course, in this simple situation estimating the average error as 5% is correct while estimating the average error as 0% is nonsensical.↩
In six playoff series the opposing teams had identical win-loss records, so the “games won” approach could not make a prediction.↩
In chapters 2–4 we will explain in detail how to determine how many runs a hitter creates.↩