Chapter 1: Baseball’s Pythagorean Theorem

The more runs a baseball team scores, the more games the team should win. Conversely, the fewer runs a team gives up, the more games the team should win. Bill James, probably the most celebrated advocate of applying mathematics to analysis of Major League Baseball (often called sabermetrics), studied many years of Major League Baseball (MLB) standings and found that the percentage of games won by a baseball team can be well approximated by the formula

\[ \begin{equation} \label{eq:1} \frac{\textrm{runs scored}^2}{\textrm{runs scored}^2+\textrm{runs allowed}^2} = \textrm{estimate of percentage of games won.} \end{equation} \]

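This formula has several desirable properties: the predicted winning percentage always lies between 0 and 1, an increase in runs scored increases the predicted winning percentage, and a decrease in runs allowed increases the predicted winning percentage.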

Consider a right triangle with a hypotenuse (the longest side) of length c and two other sides of lengths a and b. Recall from high school geometry that the Pythagorean Theorem states that a triangle is a right triangle if and only if \(a^2+b^2=c^2\). For example, a triangle with sides of lengths 3, 4, and 5 is a right triangle because \(3^2+4^2=5^2\). The fact that equation (1.1) adds up the squares of two numbers led Bill James to call the relationship described in (1.1) Baseball’s Pythagorean Theorem.

Let’s define \(R=\frac{\textrm{runs scored}}{\textrm{runs allowed}}\) as a team’s scoring ratio. If we divide the numerator and denominator of (1.1) by \((\textrm{runs allowed})^2\), then the value of the fraction remains unchanged and we may rewrite (1.1) as equation (1.2).

\[ \begin{equation} \label{eq:2} \frac{R^2}{R^{2}+1} = \textrm{estimate of percentage of games won.} \end{equation} \]
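
As a quick check that (1.1) and (1.2) agree, the following sketch evaluates both forms for the 2006 Detroit Tigers, who scored 822 runs and allowed 675. The function names `pyth_from_runs` and `pyth_from_ratio` are illustrative and do not come from any package.

```r
# Equation (1.1): predicted win fraction from raw run totals
pyth_from_runs <- function(runs_scored, runs_allowed) {
  runs_scored^2 / (runs_scored^2 + runs_allowed^2)
}

# Equation (1.2): the same prediction written in terms of the scoring ratio R
pyth_from_ratio <- function(R) {
  R^2 / (R^2 + 1)
}

# 2006 Detroit Tigers: 822 runs scored, 675 runs allowed
p_runs  <- pyth_from_runs(822, 675)
p_ratio <- pyth_from_ratio(822 / 675)

# Both forms agree up to floating-point rounding
all.equal(p_runs, p_ratio)
round(p_ratio, 3)
```

Working with the scoring ratio is convenient because a single number per team is enough to produce the prediction.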

The R code below uses the Lahman package to compute the predictions of (1.2) and their errors for the 1980–2006 MLB seasons.

#to install any package 
#install.packages("package.name")
library(Lahman) #for baseball stats

#load the team data
data("Teams")
#if you are using RStudio, use the View() function to see all the data
#look at seasons 1980-2006, W,L,R,RA
team_df = Teams[Teams$yearID >= 1980 & Teams$yearID <= 2006,
                c("yearID","teamID","W","L","R","RA")]
#scoring ratio (sr) = R/RA
team_df$Scoring.Ratio = team_df$R/team_df$RA
#predicted win % = sr^2/((sr^2)+1)
team_df$Predicted.Win.Pct = team_df$Scoring.Ratio^2/
  ((team_df$Scoring.Ratio^2)+1)
#actual win % = W/(W+L)
team_df$Actual.Win.Pct = team_df$W/(team_df$W+team_df$L)
#absolute error = |actual-predicted|
team_df$Absolute.Error = abs(
  team_df$Actual.Win.Pct-team_df$Predicted.Win.Pct)
Figure 1.1. Baseball’s Pythagorean Theorem, 1980-2006.
yearID teamID W L R RA Scoring.Ratio Predicted.Win.Pct Actual.Win.Pct Absolute.Error
2006 ARI 76 86 773 788 0.9809645 0.4903917 0.4691358 0.0212559
2006 ATL 79 83 849 805 1.0546584 0.5265834 0.4876543 0.0389290
2006 BAL 70 92 768 899 0.8542825 0.4218980 0.4320988 0.0102007
2006 BOS 86 76 820 825 0.9939394 0.4969605 0.5308642 0.0339037
2006 CHA 90 72 868 794 1.0931990 0.5444366 0.5555556 0.0111190
2006 CHN 66 96 716 834 0.8585132 0.4243096 0.4074074 0.0169022
2006 CIN 80 82 749 801 0.9350811 0.4664893 0.4938272 0.0273378
2006 CLE 78 84 870 782 1.1125320 0.5531180 0.4814815 0.0716366
2006 COL 76 86 813 812 1.0012315 0.5006154 0.4691358 0.0314796
2006 DET 95 67 822 675 1.2177778 0.5972586 0.5864198 0.0108388
2006 FLO 78 84 758 772 0.9818653 0.4908504 0.4814815 0.0093690
2006 HOU 82 80 735 719 1.0222531 0.5110028 0.5061728 0.0048300
2006 KCA 62 100 757 971 0.7796087 0.3780281 0.3827160 0.0046880
2006 LAA 89 73 766 732 1.0464481 0.5226852 0.5493827 0.0266975
2006 LAN 88 74 820 751 1.0918775 0.5438365 0.5432099 0.0006266
2006 MIL 75 87 730 833 0.8763505 0.4343860 0.4629630 0.0285769
2006 MIN 96 66 801 683 1.1727672 0.5790152 0.5925926 0.0135774
2006 NYA 97 65 930 767 1.2125163 0.5951738 0.5987654 0.0035916
#people love R because the above code can be written as follows
library(dplyr)
team_df2 = Teams %>% 
  filter(yearID >= 1980, yearID <= 2006) %>%
  select(yearID, teamID, W, L, R, RA) %>%
  mutate(Scoring.Ratio = R/RA,
         Predicted.Win.Pct = Scoring.Ratio^2/((Scoring.Ratio^2)+1),
         Actual.Win.Pct = W/(W+L),
         Absolute.Error = abs(Actual.Win.Pct-Predicted.Win.Pct))

Figure 1.1 shows how well (1.2) predicts MLB teams’ winning percentages for the 1980–2006 seasons.

For example, the 2006 Detroit Tigers (DET) scored 822 runs and gave up 675 runs. Their scoring ratio was \(R=\frac{822}{675}=1.218\). Their predicted win percentage from Baseball’s Pythagorean Theorem was \(\frac{(1.218)^2}{(1.218)^{2}+1}=.597\). The 2006 Tigers actually won \(\frac{95}{162}=.586\) of their games. Thus (1.2) was off by 1.1% in predicting the percentage of games won by the Tigers in 2006.

For each team, define the error in winning percentage prediction as actual winning percentage minus predicted winning percentage. For example, for the 2006 Arizona Diamondbacks (ARI), error = .469 - .490 = -.021, and for the 2006 Boston Red Sox (BOS), error = .531 - .497 = .034. A positive error means that the team won more games than predicted, while a negative error means the team won fewer games than predicted. The Absolute.Error column in figure 1.1 computes the absolute value of the prediction error for each team. Recall that the absolute value of a number is simply the distance of the number from 0. That is, |5| = |-5| = 5. The absolute prediction errors for each team were averaged to obtain a measure of how well the predicted win percentages fit the actual team winning percentages. The average of absolute forecasting errors is called the MAD (Mean Absolute Deviation)1. For this data set, the predicted winning percentages of the Pythagorean Theorem were off by an average of 2% per team.

mean(team_df$Absolute.Error)
## [1] 0.01965617
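
To see why absolute errors are averaged rather than signed errors, the sketch below recomputes both for three 2006 teams using the wins, losses, and run totals listed in figure 1.1. The data frame `sample_df` is a hand-typed subset for illustration only, so its MAD will not match the value for the full 1980–2006 data set.

```r
# Three 2006 teams copied from figure 1.1 (illustrative subset, not the full data)
sample_df <- data.frame(
  teamID = c("ARI", "BOS", "DET"),
  W  = c(76, 86, 95),
  L  = c(86, 76, 67),
  R  = c(773, 820, 822),
  RA = c(788, 825, 675)
)

ratio     <- sample_df$R / sample_df$RA
predicted <- ratio^2 / (ratio^2 + 1)
actual    <- sample_df$W / (sample_df$W + sample_df$L)

# Signed error: positive means the team won more games than predicted
signed_error <- actual - predicted

# Signed errors partly cancel each other; absolute errors do not
mean(signed_error)
mean(abs(signed_error))
```

In this subset Boston's positive error nearly offsets the negative errors of Arizona and Detroit, so the mean signed error is close to zero even though each prediction misses by one to three percentage points.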

Instead of blindly assuming that winning percentage can be approximated using the square of the scoring ratio, perhaps we should try a more general formula to predict winning percentage, such as

\[ \begin{equation} \label{eq:3} \frac{R^\textrm{exp}}{R^\textrm{exp}+1}. \end{equation} \]

If we vary exp (exponent) in (1.3) we can make (1.3) better fit the actual dependence of winning percentage on scoring ratio for different sports. For baseball, we will allow exp in (1.3) to vary between 1 and 3. Of course, exp = 2 reduces to the Pythagorean Theorem.

Figure 1.2 shows how MAD changes as we vary exp between 1 and 3. We see that indeed exp = 1.9 yields the smallest MAD (1.96%). An exp value of 2 is almost as good (MAD of 1.97%), so for simplicity we will stick with Bill James’s view that exp = 2. Therefore, exp = 2 (or 1.9) yields the best forecasts if we use an equation of form (1.3). Of course, there might be another equation that predicts winning percentage better than the Pythagorean Theorem from runs scored and allowed. The Pythagorean Theorem is simple and intuitive, however, and works very well. After all, we are off in predicting team wins by an average of 162 \(\times\) .02, which is approximately three wins per team. Therefore, I see no reason to look for a more complicated (albeit slightly more accurate) model.

#numbers from 1-3 going up by 0.1
exponent = seq(1, 3, 0.1)
#take each exponent and plug it into this formula
MAD = sapply(exponent, function(x){
  mean(abs(
  team_df$Scoring.Ratio^x/
  ((team_df$Scoring.Ratio^x)+1)
  -team_df$Actual.Win.Pct))})
Figure 1.2. Dependence of Pythagorean Theorem accuracy on exponent.
exponent MAD
1.0 0.0317843
1.1 0.0296585
1.2 0.0276954
1.3 0.0258894
1.4 0.0242529
1.5 0.0228382
1.6 0.0216138
1.7 0.0206476
1.8 0.0199516
1.9 0.0196285
2.0 0.0196562
2.1 0.0200005
2.2 0.0206936
2.3 0.0216168
2.4 0.0228446
2.5 0.0243075
2.6 0.0260084
2.7 0.0278395
2.8 0.0297717
2.9 0.0318052
3.0 0.0338884
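
Given the `exponent` and `MAD` vectors, the best exponent on the grid can be read off programmatically. The sketch below re-enters the MAD values from figure 1.2 by hand so that it runs on its own; with the vectors computed from `team_df`, the last three lines apply unchanged.

```r
# Grid of exponents and the MAD values copied from figure 1.2
exponent <- seq(1, 3, 0.1)
MAD <- c(0.0317843, 0.0296585, 0.0276954, 0.0258894, 0.0242529,
         0.0228382, 0.0216138, 0.0206476, 0.0199516, 0.0196285,
         0.0196562, 0.0200005, 0.0206936, 0.0216168, 0.0228446,
         0.0243075, 0.0260084, 0.0278395, 0.0297717, 0.0318052,
         0.0338884)

best <- which.min(MAD)   # index of the smallest MAD
exponent[best]           # exponent that minimizes MAD on this grid
162 * MAD[best]          # average miss expressed in wins per 162-game season
```

A finer grid or a one-dimensional optimizer such as base R's `optimize()` could refine the estimate, but the figure suggests the MAD curve is quite flat between 1.9 and 2.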

How Well Does the Pythagorean Theorem Forecast?

To test the utility of the Pythagorean Theorem (or any prediction model), we should check how well it forecasts the future. I compared the Pythagorean Theorem’s forecast for each MLB playoff series (1980 – 2007) against a prediction based just on games won. For each playoff series the Pythagorean method would predict the winner to be the team with the higher scoring ratio, while the “games won” approach simply predicts the winner of a playoff series to be the team that won more games.

The MLB playoff series data used below were scraped from baseball-reference.com.

library(scales) #to format percentages
#read the csv from github
all_series = read.csv(
"https://raw.githubusercontent.com/capstat/mathletics/master/Chapter_1/mlb_playoffs.csv")
#just playoffs from the years 1980 to 2007 
series_80_07 = all_series[all_series$year >= 1980 & 
                            all_series$year <= 2007 &
                            all_series$series != "World Series",]
#add a column for scoring ratio
series_80_07$Ratio = series_80_07$R/series_80_07$pR
#data frame for the winners and losers
winners = series_80_07[seq(1,nrow(series_80_07),2), c(1:3,5:6,12,45,33,65)]
losers = series_80_07[seq(2,nrow(series_80_07),2), c(6,12,45,33,65)]
#rename the losers columns
colnames(losers) = paste0("L", colnames(losers))
#combine the winners and losers
series_df = cbind(winners, losers)
#was the winner win % greater than the loser?
series_df$W.W.Greater = series_df$pW > series_df$LpW
#was the winner scoring ratio greater than the loser?
series_df$W.Ratio.Greater = series_df$Ratio > series_df$LRatio

We found that the Pythagorean approach correctly predicted 61 of 106 playoff series (57.5%), while the “games won” approach correctly predicted the winner of only 52.0% (52 out of 100) of playoff series.2
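
One way to turn the two indicator columns into hit rates is sketched below. It uses a small made-up data frame (the numbers are not real playoff data) with the same column names as `series_df`. It also assumes that `pW` and `LpW` hold regular-season wins and that series with identical records count as "no prediction" rather than as misses; both are assumptions about how the tally was made.

```r
# Hypothetical toy data: one row per series, winner's stats vs loser's (L-prefixed)
toy_series <- data.frame(
  pW     = c(95, 88, 97, 90),          # winner's regular-season wins (assumed meaning)
  LpW    = c(92, 93, 97, 85),          # loser's regular-season wins (assumed meaning)
  Ratio  = c(1.15, 1.08, 1.10, 1.02),  # winner's scoring ratio
  LRatio = c(1.10, 1.05, 1.12, 1.06)   # loser's scoring ratio
)

# Pythagorean pick is correct when the series winner had the higher scoring ratio
ratio_correct <- toy_series$Ratio > toy_series$LRatio

# "Games won" pick: mark tied records as NA so they are not counted as misses
wins_correct <- ifelse(toy_series$pW == toy_series$LpW, NA,
                       toy_series$pW > toy_series$LpW)

mean(ratio_correct)                # share of series picked correctly by scoring ratio
mean(wins_correct, na.rm = TRUE)   # share among series where a pick was possible
sum(!is.na(wins_correct))          # number of series where a pick was possible
```

Dropping ties changes the denominator for the "games won" approach, which is why the two methods can be scored on different numbers of series.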

The reader is probably disappointed that even the Pythagorean method only correctly forecasts the outcome of less than 58% of baseball playoff series. I believe that the regular season is a relatively poor predictor of the playoffs in baseball because a team’s regular season record depends greatly on the performance of five starting pitchers. During the playoffs teams only use three or four starting pitchers, so much of the regular season data (games involving the fourth and fifth starting pitchers) are not relevant for predicting the outcome of the playoffs.

For anecdotal evidence of how the Pythagorean Theorem forecasts the future performance of a team better than a team’s win-loss record, consider the case of the 2005 Washington Nationals. On July 4, 2005, the Nationals were in first place with a record of 50–32. Extrapolating this winning percentage over a full season would have predicted a final record of 99–63. On the same date, however, the Nationals’ scoring ratio was .991, so (1.2) would have predicted a final record of 80–82. Sure enough, the poor Nationals finished 81–81.
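
The two Nationals projections can be reproduced with a few lines of arithmetic. This is a minimal sketch that applies each winning percentage to a full 162-game season; the variable names are illustrative.

```r
games <- 162

# Record on July 4, 2005
wins_so_far   <- 50
losses_so_far <- 32

# Projection 1: extrapolate the current win-loss record over a full season
wl_pct        <- wins_so_far / (wins_so_far + losses_so_far)
wl_final_wins <- round(games * wl_pct)

# Projection 2: apply equation (1.2) to the scoring ratio on the same date
R               <- 0.991
pyth_pct        <- R^2 / (R^2 + 1)
pyth_final_wins <- round(games * pyth_pct)

c(win_loss = wl_final_wins, pythagorean = pyth_final_wins)
```

The two projections differ by 19 wins, and the Pythagorean one lands within a game of the actual 81–81 finish.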

The Importance of the Pythagorean Theorem

Baseball’s Pythagorean Theorem is also important because it allows us to determine how many extra wins (or losses) will result from a trade. Suppose a team has scored 850 runs during a season and has given up 800 runs. Suppose we trade a shortstop (Joe) who “created”3 150 runs for a shortstop (Greg) who created 170 runs in the same number of plate appearances. This trade will cause the team (all other things being equal) to score 20 more runs (170 - 150 = 20). Before the trade, \(R=\frac{850}{800}=1.0625\), and we would predict the team to have won \(\frac{162(1.0625)^{2}}{1+(1.0625)^{2}}=85.9\) games. After the trade, \(R=\frac{870}{800}=1.0875\), and we would predict the team to win \(\frac{162(1.0875)^{2}}{1+(1.0875)^{2}}=87.8\) games. Therefore, we estimate the trade makes our team 1.9 games better (87.8 - 85.9 = 1.9). In chapter 9, we will see how the Pythagorean Theorem can be used to help determine fair salaries for MLB players.
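
The trade calculation above can be wrapped in a small helper so that other scenarios are easy to try. This is a sketch only; `pyth_season_wins` is a hypothetical name, and the 162-game default reflects the MLB schedule.

```r
# Predicted wins over a season from runs scored and runs allowed, using (1.2)
pyth_season_wins <- function(runs_scored, runs_allowed, games = 162) {
  R <- runs_scored / runs_allowed
  games * R^2 / (R^2 + 1)
}

before <- pyth_season_wins(850, 800)        # with Joe (150 runs created)
after  <- pyth_season_wins(850 + 20, 800)   # with Greg (170 runs created)

round(c(before = before, after = after, gain = after - before), 1)
```

Changing the second argument instead of the first gives the corresponding estimate for a trade that affects runs allowed.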

Football and Basketball “Pythagorean Theorems”

Does the Pythagorean Theorem hold for football and basketball? Daryl Morey, the general manager for the Houston Rockets, has shown that for the NFL, equation (1.3) with exp = 2.37 gives the most accurate predictions for winning percentage while for the NBA, equation (1.3) with exp = 13.91 gives the most accurate predictions for winning percentage. Figure 1.3 gives the predicted and actual winning percentages for the NFL for the 2006-7 season, while figure 1.4 gives the predicted and actual winning percentages for the NBA for the 2006–7 season.
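
To make the role of the exponent concrete, the sketch below evaluates (1.3) with Morey's exponents for one NFL team and one NBA team, using the points for and against that appear in figures 1.3 and 1.4. The function name `pyth_general` is illustrative.

```r
# Equation (1.3): predicted win fraction for scoring ratio R and a given exponent
pyth_general <- function(R, exponent = 2) {
  R^exponent / (R^exponent + 1)
}

# 2007 Arizona Cardinals (figure 1.3): 404 points for, 399 against, exponent 2.37
nfl_example <- pyth_general(404 / 399, exponent = 2.37)

# Dallas Mavericks row of figure 1.4: 8126 points for, 7634 against, exponent 13.91
nba_example <- pyth_general(8126 / 7634, exponent = 13.91)

round(c(NFL = nfl_example, NBA = nba_example), 4)
```

The Mavericks outscored opponents by only about 6%, yet the large NBA exponent turns that into a predicted winning percentage above 70%, which illustrates how much more decisive a small scoring edge is in basketball.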

The NFL standings data used below (all seasons since 1922) were scraped from football-reference.com.

#read the csv off github
nfl_standings = read.csv(
  "https://raw.githubusercontent.com/capstat/mathletics/master/Chapter_1/nfl_standings.csv")
#look at the 2005-2007 seasons
nfl_05_07 = nfl_standings[nfl_standings$Year >= 2005 &
                            nfl_standings$Year <= 2007,]
#pyt win % using exp=2.7
nfl_05_07$Win.Pct.2.7 = (nfl_05_07$Ratio^2.7)/((nfl_05_07$Ratio^2.7)+1)
#pyt win % using Morey's exp=2.37
nfl_05_07$Win.Pct.morely = (nfl_05_07$Ratio^2.37)/((nfl_05_07$Ratio^2.37)+1)
#absolute error exp=2.7
nfl_05_07$Error.2.7 = abs(nfl_05_07$W.L.-nfl_05_07$Win.Pct.2.7)
#absolute error Morey's exp=2.37
nfl_05_07$Error.morely = abs(nfl_05_07$W.L.-nfl_05_07$Win.Pct.morely)
Figure 1.3. Predicted NFL winning percentages.
Year Tm W L T W.L. PF PA Ratio Win.Pct.2.7 Win.Pct.morely Error.2.7 Error.morely
2007 Arizona Cardinals 8 8 0 0.500 404 399 1.0125313 0.5084053 0.5073781 0.0084053 0.0073781
2007 Atlanta Falcons 4 12 0 0.250 259 414 0.6256039 0.2198737 0.2475690 0.0301263 0.0024310
2007 Baltimore Ravens 5 11 0 0.313 275 384 0.7161458 0.2887520 0.3118949 0.0242480 0.0011051
2007 Buffalo Bills 7 9 0 0.438 252 354 0.7118644 0.2854384 0.3088531 0.1525616 0.1291469
2007 Carolina Panthers 7 9 0 0.438 267 347 0.7694524 0.3301281 0.3495267 0.1078719 0.0884733
2007 Chicago Bears 7 9 0 0.438 334 348 0.9597701 0.4723119 0.4756903 0.0343119 0.0376903
2007 Cincinnati Bengals 7 9 0 0.438 380 385 0.9870130 0.4911773 0.4922554 0.0531773 0.0542554
2007 Cleveland Browns 10 6 0 0.625 402 382 1.0523560 0.5343919 0.5301993 0.0906081 0.0948007
2007 Dallas Cowboys 13 3 0 0.813 455 325 1.4000000 0.7126880 0.6894264 0.1003120 0.1235736
2007 Denver Broncos 7 9 0 0.438 320 409 0.7823961 0.3401638 0.3585682 0.0978362 0.0794318
2007 Detroit Lions 7 9 0 0.438 346 444 0.7792793 0.3377490 0.3563953 0.1002510 0.0816047
2007 Green Bay Packers 13 3 0 0.813 435 291 1.4948454 0.7475261 0.7216767 0.0654739 0.0913233
2007 Houston Texans 8 8 0 0.500 379 384 0.9869792 0.4911541 0.4922351 0.0088459 0.0077649
2007 Indianapolis Colts 13 3 0 0.813 450 262 1.7175573 0.8115997 0.7827799 0.0014003 0.0302201
2007 Jacksonville Jaguars 11 5 0 0.688 411 304 1.3519737 0.6930095 0.6714411 0.0050095 0.0165589
2007 Kansas City Chiefs 4 12 0 0.250 226 335 0.6746269 0.2567923 0.2823527 0.0067923 0.0323527
2007 Miami Dolphins 1 15 0 0.063 267 437 0.6109840 0.2091183 0.2372778 0.1461183 0.1742778
2007 Minnesota Vikings 8 8 0 0.500 365 311 1.1736334 0.6064185 0.5937398 0.1064185 0.0937398
2007 New England Patriots 16 0 0 1.000 589 274 2.1496350 0.8875848 0.8598153 0.1124152 0.1401847
2007 New Orleans Saints 7 9 0 0.438 379 388 0.9768041 0.4841636 0.4860981 0.0461636 0.0480981
2007 New York Giants 10 6 0 0.625 373 351 1.0626781 0.5409429 0.5359572 0.0840571 0.0890428
2007 New York Jets 4 12 0 0.250 268 355 0.7549296 0.3188519 0.3393303 0.0688519 0.0893303
2007 Oakland Raiders 4 12 0 0.250 283 398 0.7110553 0.2848125 0.3082780 0.0348125 0.0582780
2007 Philadelphia Eagles 8 8 0 0.500 336 300 1.1200000 0.5759055 0.5667465 0.0759055 0.0667465
2007 Pittsburgh Steelers 10 6 0 0.625 393 269 1.4609665 0.7356665 0.7106335 0.1106665 0.0856335
2007 San Diego Chargers 11 5 0 0.688 412 284 1.4507042 0.7319488 0.7071861 0.0439488 0.0191861
2007 San Francisco 49ers 5 11 0 0.313 219 364 0.6016484 0.2023257 0.2307369 0.1106743 0.0822631
2007 Seattle Seahawks 10 6 0 0.625 393 291 1.3505155 0.6923893 0.6708766 0.0673893 0.0458766
2007 St. Louis Rams 3 13 0 0.188 263 438 0.6004566 0.2014631 0.2299039 0.0134631 0.0419039
2007 Tampa Bay Buccaneers 9 7 0 0.563 334 270 1.2370370 0.6397643 0.6234327 0.0767643 0.0604327
2007 Tennessee Titans 10 6 0 0.625 301 297 1.0134680 0.5090293 0.5079259 0.1159707 0.1170741
2007 Washington Redskins 9 7 0 0.563 334 310 1.0774194 0.5501645 0.5440673 0.0128355 0.0189327

For the 2005–7 NFL seasons, MAD was minimized by exp = 2.7. Exp = 2.7 yielded a MAD of 5.9%, while Morey’s exp = 2.37 yielded a MAD of 6.2%.

The NBA standings data used below (all seasons since 1950) were scraped from basketball-reference.com.

#read the csv off github
nba_standings = read.csv(
  "https://raw.githubusercontent.com/capstat/mathletics/master/Chapter_1/nba_standings.csv")
#look at years 2005-2007 (the 2004-7 seasons)
nba_04_07 = nba_standings[nba_standings$Year >= 2005 &
                            nba_standings$Year <= 2007,]
#pyt win % using exp=15.4
nba_04_07$Win.Pct.15.4 = (nba_04_07$Ratio^15.4)/((nba_04_07$Ratio^15.4)+1)
#pyt win % using Morey's exp=13.91
nba_04_07$Win.Pct.morely = (nba_04_07$Ratio^13.91)/((nba_04_07$Ratio^13.91)+1)
#absolute error exp=15.4
nba_04_07$Error.15.4 = abs(nba_04_07$W.L.-nba_04_07$Win.Pct.15.4)
#absolute error Morey's exp=13.91
nba_04_07$Error.morely = abs(nba_04_07$W.L.-nba_04_07$Win.Pct.morely)
Figure 1.4. Predicted NBA winning percentages.
Team W L W.L. GB PS.G PA.G SRS Year Total.PF Total.PA Ratio Win.Pct.15.4 Win.Pct.morely Error.15.4 Error.morely
Atlanta Hawks 26 56 0.317 26 97.2 102.0 -4.69 2006 7970 8364 0.9528934 0.3223299 0.3382306 0.0053299 0.0212306
Boston Celtics 33 49 0.402 16 98.0 99.5 -1.59 2006 8036 8159 0.9849246 0.4417831 0.4473719 0.0397831 0.0453719
Charlotte Bobcats 26 56 0.317 26 96.9 100.9 -3.90 2006 7946 8274 0.9603577 0.3491174 0.3629342 0.0321174 0.0459342
Chicago Bulls 41 41 0.500 23 97.8 97.2 0.51 2006 8020 7970 1.0062735 0.5240590 0.5217343 0.0240590 0.0217343
Cleveland Cavaliers 50 32 0.610 14 97.6 95.4 2.17 2006 8003 7823 1.0230091 0.5866963 0.5784539 0.0233037 0.0315461
Dallas Mavericks 60 22 0.732 3 99.1 93.1 5.96 2006 8126 7634 1.0644485 0.7234891 0.7044907 0.0085109 0.0275093
Denver Nuggets 44 38 0.537 0 100.3 100.1 0.36 2006 8225 8208 1.0020712 0.5079650 0.5071945 0.0290350 0.0298055
Detroit Pistons 64 18 0.780 0 96.8 90.2 6.24 2006 7938 7396 1.0732829 0.7482159 0.7278504 0.0317841 0.0521496
Golden State Warriors 34 48 0.415 20 98.5 99.8 -1.11 2006 8077 8184 0.9869257 0.4495048 0.4543617 0.0345048 0.0393617
Houston Rockets 34 48 0.415 29 90.1 91.7 -1.30 2006 7388 7519 0.9825775 0.4327422 0.4391818 0.0177422 0.0241818
Indiana Pacers 41 41 0.500 23 93.9 92.0 1.62 2006 7700 7544 1.0206787 0.5781550 0.5706998 0.0781550 0.0706998
Los Angeles Clippers 47 35 0.573 7 97.2 95.6 1.75 2006 7970 7839 1.0167113 0.5634628 0.5573795 0.0095372 0.0156205
Los Angeles Lakers 45 37 0.549 9 99.4 96.9 2.53 2006 8151 7946 1.0257991 0.5968286 0.5876636 0.0478286 0.0386636
Memphis Grizzlies 49 33 0.598 14 92.2 88.5 3.74 2006 7560 7257 1.0417528 0.6524740 0.6385287 0.0544740 0.0405287
Miami Heat 52 30 0.634 0 99.9 96.0 3.59 2006 8192 7872 1.0406504 0.6487677 0.6351226 0.0147677 0.0011226
Milwaukee Bucks 40 42 0.488 24 97.8 98.8 -1.07 2006 8020 8102 0.9898790 0.4609157 0.4646840 0.0270843 0.0233160
Minnesota Timberwolves 33 49 0.402 11 91.7 93.6 -1.75 2006 7519 7675 0.9796743 0.4215921 0.4290707 0.0195921 0.0270707
New Jersey Nets 49 33 0.598 0 93.8 92.4 1.11 2006 7692 7577 1.0151775 0.5577357 0.5521925 0.0402643 0.0458075
New Orleans/Oklahoma City Hornets 38 44 0.463 25 92.8 95.6 -2.51 2006 7610 7839 0.9707871 0.3877973 0.3983356 0.0752027 0.0646644
New York Knicks 23 59 0.280 26 95.6 102.0 -6.30 2006 7839 8364 0.9372310 0.2692733 0.2886966 0.0107267 0.0086966
Orlando Magic 36 46 0.439 16 94.9 96.0 -1.26 2006 7782 7872 0.9885671 0.4558450 0.4600980 0.0168450 0.0210980
Philadelphia 76ers 38 44 0.463 11 99.4 101.3 -2.10 2006 8151 8307 0.9812207 0.4275261 0.4344533 0.0354739 0.0285467
Phoenix Suns 54 28 0.659 0 108.4 102.8 5.48 2006 8889 8430 1.0544484 0.6934873 0.6764440 0.0344873 0.0174440
Portland Trail Blazers 21 61 0.256 23 88.8 98.3 -8.91 2006 7282 8061 0.9033619 0.1729112 0.1956508 0.0830888 0.0603492
Sacramento Kings 44 38 0.537 10 98.9 97.3 1.61 2006 8110 7979 1.0164181 0.5623698 0.5563894 0.0253698 0.0193894
San Antonio Spurs 63 19 0.768 0 95.6 88.8 6.69 2006 7839 7282 1.0764900 0.7567730 0.7359933 0.0112270 0.0320067
Seattle SuperSonics 35 47 0.427 9 102.6 105.6 -2.88 2006 8413 8659 0.9715903 0.3908251 0.4010957 0.0361749 0.0259043
Toronto Raptors 27 55 0.329 22 101.1 104.0 -3.03 2006 8290 8528 0.9720919 0.3927194 0.4028218 0.0637194 0.0738218
Utah Jazz 41 41 0.500 3 92.4 95.0 -2.49 2006 7577 7790 0.9726573 0.3948568 0.4047687 0.1051432 0.0952313
Washington Wizards 42 40 0.512 10 101.7 99.8 1.57 2006 8339 8184 1.0189394 0.5717364 0.5648780 0.0597364 0.0528780

For the 2004–7 NBA seasons, exp = 15.4 best fit actual winning percentages. MAD for these seasons was 3.35% for exp = 15.4 and 3.4% for exp = 13.91. Since Morey’s values of exp are very close in accuracy to the values we found from recent seasons, we will stick with Morey’s values of exp.

These predicted winning percentages are based on regular season data. Therefore, we could look at teams that performed much better than expected during the regular season and predict that “luck would catch up with them.” This train of thought would lead us to believe that these teams would perform worse during the playoffs. Note that the Miami Heat and Dallas Mavericks both won about 8% more games than expected during the regular season. Therefore, we would have predicted Miami and Dallas to perform worse during the playoffs than their actual win-loss records indicated. Sure enough, both Dallas and Miami suffered unexpected first-round defeats.

Conversely, during the regular season the San Antonio Spurs and Chicago Bulls won around 8% fewer games than the Pythagorean Theorem predicts, indicating that these teams would perform better than expected in the playoffs. Sure enough, the Bulls upset the Heat and gave the Detroit Pistons a tough time. Of course, the Spurs won the 2007 NBA title. In addition, the Pythagorean Theorem had the Spurs as by far the league’s best team (78% predicted winning percentage).

Note that the team that underachieved the most was the Boston Celtics, who won nearly 9% fewer games (about seven) than predicted. Many people suggested the Celtics “tanked” games during the regular season to improve their chances of obtaining potential future superstars such as Greg Oden and Kevin Durant in the 2007 draft lottery. The fact that the Celtics won seven fewer games than expected does not prove this conjecture, but it is certainly consistent with the view that the Celtics did not go all out to win every close game.


  1. The actual errors were not simply averaged because averaging positive and negative errors would result in positive and negative errors canceling out. For example, if one team wins 5% more games than (1.2) predicts and another team wins 5% fewer games than (1.2) predicts, the average of the errors is 0 but the average of the absolute errors is 5%. Of course, in this simple situation estimating the average error as 5% is correct while estimating the average error as 0% is nonsensical.

  2. In six playoff series the opposing teams had identical win-loss records so the “Games Won” approach could not make a prediction.

  3. In chapters 2-4 we will explain in detail how to determine how many runs a hitter creates.