Chapter 1: Baseball’s Pythagorean Theorem

The more runs a baseball team scores, the more games the team should win. Likewise, the fewer runs a team gives up, the more games the team should win. Bill James, probably the most celebrated advocate of applying mathematics to the analysis of Major League Baseball (MLB), an approach often called sabermetrics, studied many years of MLB standings and found that the percentage of games won by a baseball team can be well approximated by the formula

\[ \begin{equation} \label{eq:1} \frac{\textrm{runs scored}^2}{\textrm{runs scored}^2+\textrm{runs allowed}^2} = \textrm{estimate of percentage of games won.} \end{equation} \]

This formula has several desirable properties.

The predicted win percentage is always between 0 and 1.
An increase in runs scored increases predicted win percentage.
A decrease in runs allowed increases predicted win percentage.
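
These three properties can be checked numerically. The short R sketch below wraps formula (1.1) in a helper called pyth_win_pct (a name chosen here for illustration; it is not part of the original code) and evaluates it on made-up run totals rather than real team data.

```r
# Pythagorean estimate of win percentage from runs scored and runs allowed.
# `pyth_win_pct` is a hypothetical helper name used only for this sketch.
pyth_win_pct = function(runs_scored, runs_allowed) {
  runs_scored^2 / (runs_scored^2 + runs_allowed^2)
}

# illustrative run totals (not real team data)
base = pyth_win_pct(800, 750)
more_scored = pyth_win_pct(850, 750)   # score 50 more runs
fewer_allowed = pyth_win_pct(800, 700) # allow 50 fewer runs

# property 1: the estimate lies between 0 and 1
base > 0 & base < 1
# property 2: more runs scored raises the estimate
more_scored > base
# property 3: fewer runs allowed raises the estimate
fewer_allowed > base
```

All three comparisons should come back TRUE for any positive run totals, not only the ones picked here.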

Consider a right triangle with a hypotenuse (the longest side) of length c and two other sides of lengths a and b. Recall from high school geometry that the Pythagorean Theorem states that a triangle is a right triangle if and only if \(a^2+b^2=c^2\). For example, a triangle with sides of lengths 3, 4, and 5 is a right triangle because \(3^2+4^2=5^2\). The fact that equation (1.1) adds up the squares of two numbers led Bill James to call the relationship described in (1.1) Baseball’s Pythagorean Theorem.

Let’s define \(R=\frac{\textrm{runs scored}}{\textrm{runs allowed}}\) as a team’s scoring ratio. If we divide the numerator and denominator of (1.1) by \((\textrm{runs allowed})^2\), then the value of the fraction remains unchanged and we may rewrite (1.1) as equation (1.2).

\[ \begin{equation} \label{eq:2} \frac{R^2}{R^{2}+1} = \textrm{estimate of percentage of games won.} \end{equation} \]
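
Written out step by step, with \(RS\) and \(RA\) used here only as shorthand for runs scored and runs allowed, the division that turns (1.1) into (1.2) is

\[ \frac{RS^2}{RS^2+RA^2} = \frac{RS^2/RA^2}{RS^2/RA^2+RA^2/RA^2} = \frac{(RS/RA)^2}{(RS/RA)^2+1} = \frac{R^2}{R^{2}+1}. \]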

The following R code uses (1.2) to predict MLB teams’ winning percentages for the 1980–2006 seasons.

#to install any package 
#install.packages("package.name")
library(Lahman) #for baseball stats

#load the team data
data("Teams")
#if you are using RStudio, use the View() function to see all the data
#look at seasons 1980-2006, W,L,R,RA
team_df = Teams[Teams$yearID >= 1980 & Teams$yearID <= 2006,
                c("yearID","teamID","W","L","R","RA")]
#scoring ratio (sr) = R/RA
team_df$Scoring.Ratio = team_df$R/team_df$RA
#predicted win % = sr^2/((sr^2)+1)
team_df$Predicted.Win.Pct = team_df$Scoring.Ratio^2/
  ((team_df$Scoring.Ratio^2)+1)
#actual win % = W/(W+L)
team_df$Actual.Win.Pct = team_df$W/(team_df$W+team_df$L)
#absolute error = |actual-predicted|
team_df$Absolute.Error = abs(
  team_df$Actual.Win.Pct-team_df$Predicted.Win.Pct)

Figure 1.1. Baseball’s Pythagorean Theorem, 1980-2006.
yearID	teamID	W	L	R	RA	Scoring.Ratio	Predicted.Win.Pct	Actual.Win.Pct	Absolute.Error
2006	ARI	76	86	773	788	0.9809645	0.4903917	0.4691358	0.0212559
2006	ATL	79	83	849	805	1.0546584	0.5265834	0.4876543	0.0389290
2006	BAL	70	92	768	899	0.8542825	0.4218980	0.4320988	0.0102007
2006	BOS	86	76	820	825	0.9939394	0.4969605	0.5308642	0.0339037
2006	CHA	90	72	868	794	1.0931990	0.5444366	0.5555556	0.0111190
2006	CHN	66	96	716	834	0.8585132	0.4243096	0.4074074	0.0169022
2006	CIN	80	82	749	801	0.9350811	0.4664893	0.4938272	0.0273378
2006	CLE	78	84	870	782	1.1125320	0.5531180	0.4814815	0.0716366
2006	COL	76	86	813	812	1.0012315	0.5006154	0.4691358	0.0314796
2006	DET	95	67	822	675	1.2177778	0.5972586	0.5864198	0.0108388
2006	FLO	78	84	758	772	0.9818653	0.4908504	0.4814815	0.0093690
2006	HOU	82	80	735	719	1.0222531	0.5110028	0.5061728	0.0048300
2006	KCA	62	100	757	971	0.7796087	0.3780281	0.3827160	0.0046880
2006	LAA	89	73	766	732	1.0464481	0.5226852	0.5493827	0.0266975
2006	LAN	88	74	820	751	1.0918775	0.5438365	0.5432099	0.0006266
2006	MIL	75	87	730	833	0.8763505	0.4343860	0.4629630	0.0285769
2006	MIN	96	66	801	683	1.1727672	0.5790152	0.5925926	0.0135774
2006	NYA	97	65	930	767	1.2125163	0.5951738	0.5987654	0.0035916

#people love R because the above code can be written as follows
library(dplyr)
team_df2 = Teams %>% 
  filter(yearID >= 1980, yearID <= 2006) %>%
  select(yearID, teamID, W, L, R, RA) %>%
  mutate(Scoring.Ratio = R/RA,
         Predicted.Win.Pct = Scoring.Ratio^2/((Scoring.Ratio^2)+1),
         Actual.Win.Pct = W/(W+L),
         Absolute.Error = abs(Actual.Win.Pct-Predicted.Win.Pct))

Figure 1.1 shows how well (1.2) predicts MLB teams’ winning percentages for the 1980–2006 seasons.

For example, the 2006 Detroit Tigers (DET) scored 822 runs and gave up 675 runs. Their scoring ratio was \(R=\frac{822}{675}=1.218\). Their predicted win percentage from Baseball’s Pythagorean Theorem was \(\frac{1.218^2}{1.218^{2}+1}=.597\). The 2006 Tigers actually won \(\frac{95}{162}=.586\) of their games. Thus (1.2) was off by 1.1% in predicting the percentage of games won by the Tigers in 2006.

For each team, define the error in winning percentage prediction as actual winning percentage minus predicted winning percentage. For example, for the 2006 Arizona Diamondbacks (ARI), error = .469 - .490 = -.021, and for the 2006 Boston Red Sox (BOS), error = .531 - .497 = .034. A positive error means that the team won more games than predicted, while a negative error means the team won fewer games than predicted. The Absolute.Error column in figure 1.1 computes the absolute value of the prediction error for each team. Recall that the absolute value of a number is simply the distance of the number from 0. That is, |5| = |-5| = 5. The absolute prediction errors for each team were averaged to obtain a measure of how well the predicted win percentages fit the actual team winning percentages. The average of absolute forecasting errors is called the MAD (Mean Absolute Deviation).¹ For this data set, the predicted winning percentages of the Pythagorean Theorem were off by an average of 2% per team.

mean(team_df$Absolute.Error)

## [1] 0.01965617
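
As a small self-contained illustration of signed versus absolute errors, the sketch below recomputes both for three 2006 teams using the wins, losses, and run totals listed in Figure 1.1. The data frame name teams is introduced here only for the example.

```r
# 2006 figures for three teams, copied from Figure 1.1
teams = data.frame(
  teamID = c("DET", "ARI", "BOS"),
  W  = c(95, 76, 86),
  L  = c(67, 86, 76),
  R  = c(822, 773, 820),
  RA = c(675, 788, 825)
)
ratio = teams$R / teams$RA
predicted = ratio^2 / (ratio^2 + 1)
actual = teams$W / (teams$W + teams$L)
# signed error: positive means the team won more games than predicted
teams$Error = round(actual - predicted, 3)
teams$Absolute.Error = abs(teams$Error)
teams
# signed errors largely cancel when averaged, absolute errors do not
mean(teams$Error)
mean(teams$Absolute.Error)
```

The signed errors for these three teams are -.011, -.021, and .034, so their plain average is close to zero even though each prediction misses by one to three percentage points. This is why MAD averages the absolute values.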

Instead of blindly assuming winning percentage can be approximated by using the square of the scoring ratio, perhaps we should try a formula to predict winning percentage, such as

\[ \begin{equation} \label{eq:3} \frac{R^\textrm{exp}}{R^\textrm{exp}+1}. \end{equation} \]

If we vary exp (exponent) in (1.3) we can make (1.3) better fit the actual dependence of winning percentage on scoring ratio for different sports. For baseball, we will allow exp in (1.3) to vary between 1 and 3. Of course, exp = 2 reduces to the Pythagorean Theorem.

Figure 1.2 shows how MAD changes as we vary exp between 1 and 3. We see that indeed exp = 1.9 yields the smallest MAD (1.96%). An exp value of 2 is almost as good (MAD of 1.97%), so for simplicity we will stick with Bill James’s view that exp = 2. Therefore, exp = 2 (or 1.9) yields the best forecasts if we use an equation of form (1.3). Of course, there might be another equation that predicts winning percentage better than the Pythagorean Theorem from runs scored and allowed. The Pythagorean Theorem is simple and intuitive, however, and works very well. After all, we are off in predicting team wins by an average of 162 \(\times\) .02, which is approximately three wins per team. Therefore, I see no reason to look for a more complicated (albeit slightly more accurate) model.

#exponents from 1 to 3 going up by 0.1
exponent = seq(1, 3, 0.1)
#plug each exponent into formula (1.3) and compute the MAD
MAD = sapply(exponent, function(x){
  predicted = team_df$Scoring.Ratio^x/((team_df$Scoring.Ratio^x)+1)
  mean(abs(predicted-team_df$Actual.Win.Pct))
})

Figure 1.2. Dependence of Pythagorean Theorem accuracy on exponent.
exponent	MAD
1.0	0.0317843
1.1	0.0296585
1.2	0.0276954
1.3	0.0258894
1.4	0.0242529
1.5	0.0228382
1.6	0.0216138
1.7	0.0206476
1.8	0.0199516
1.9	0.0196285
2.0	0.0196562
2.1	0.0200005
2.2	0.0206936
2.3	0.0216168
2.4	0.0228446
2.5	0.0243075
2.6	0.0260084
2.7	0.0278395
2.8	0.0297717
2.9	0.0318052
3.0	0.0338884
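
The grid above moves in steps of 0.1. As a sketch of an alternative, base R's optimize() can search the interval from 1 to 3 continuously. To stay self-contained, the example uses only six 2006 teams copied from Figure 1.1, so the exponent it finds is illustrative and should not be expected to match the 1980–2006 result. The names sample_df and mad_for_exp are introduced here for the example. Note also that optimize() returns a local minimum, which is an assumption worth checking against a grid when the data change.

```r
# six 2006 teams copied from Figure 1.1 (a small subset, for illustration only)
sample_df = data.frame(
  W  = c(76, 79, 70, 86, 90, 95),
  L  = c(86, 83, 92, 76, 72, 67),
  R  = c(773, 849, 768, 820, 868, 822),
  RA = c(788, 805, 899, 825, 794, 675)
)
ratio = sample_df$R / sample_df$RA
actual = sample_df$W / (sample_df$W + sample_df$L)

# MAD of formula (1.3) for a given exponent
mad_for_exp = function(e) {
  mean(abs(ratio^e / (ratio^e + 1) - actual))
}

# one-dimensional minimization over exponents between 1 and 3
best = optimize(mad_for_exp, interval = c(1, 3))
best$minimum   # exponent with the smallest MAD found on this subset
best$objective # the MAD at that exponent
```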

How Well Does the Pythagorean Theorem Forecast?

To test the utility of the Pythagorean Theorem (or any prediction model), we should check how well it forecasts the future. I compared the Pythagorean Theorem’s forecast for each MLB playoff series (1980 – 2007) against a prediction based just on games won. For each playoff series the Pythagorean method would predict the winner to be the team with the higher scoring ratio, while the “games won” approach simply predicts the winner of a playoff series to be the team that won more games.

The MLB playoff series data used below were scraped from baseball-reference.com.

library(scales) #to format percentages
#read the csv from github
all_series = read.csv(
"https://raw.githubusercontent.com/capstat/mathletics/master/Chapter_1/mlb_playoffs.csv")
#just playoffs from the years 1980 to 2007, leaving out the World Series
series_80_07 = all_series[all_series$year >= 1980 & 
                            all_series$year <= 2007 &
                            all_series$series != "World Series",]
#add a column for scoring ratio
series_80_07$Ratio = series_80_07$R/series_80_07$pR
#data frame for the winners and losers
winners = series_80_07[seq(1,nrow(series_80_07),2), c(1:3,5:6,12,45,33,65)]
losers = series_80_07[seq(2,nrow(series_80_07),2), c(6,12,45,33,65)]
#rename the losers columns
colnames(losers) = paste0("L", colnames(losers))
#combine the winners and losers
series_df = cbind(winners, losers)
#was the winner win % greater than the loser?
series_df$W.W.Greater = series_df$pW > series_df$LpW
#was the winner scoring ratio greater than the loser?
series_df$W.Ratio.Greater = series_df$Ratio > series_df$LRatio

I found that the Pythagorean approach correctly predicted 61 of 106 playoff series (57.5%), while the “games won” approach correctly predicted the winner of only 52 of 100 playoff series (52%).²
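
The tallying step can be sketched as follows. The five series in the data frame toy are made up purely to show the mechanics and are not real playoff results. The column names mirror those in series_df, with the series winner's values in pW and Ratio and the loser's in LpW and LRatio, and pW is assumed here to hold regular-season wins. Series in which both teams had the same record are dropped before scoring the “games won” approach.

```r
# made-up regular-season wins and scoring ratios for five series,
# winner's values first; these are NOT real playoff results
toy = data.frame(
  pW     = c(95, 90, 88, 97, 92),
  LpW    = c(90, 90, 93, 89, 96),
  Ratio  = c(1.15, 1.08, 1.02, 1.20, 1.10),
  LRatio = c(1.05, 1.12, 1.09, 1.07, 1.04)
)
# Pythagorean method: correct when the winner had the higher scoring ratio
pyth_correct = toy$Ratio > toy$LRatio
# "games won" method: only series with different records can be predicted
decided = toy$pW != toy$LpW
wins_correct = toy$pW[decided] > toy$LpW[decided]
mean(pyth_correct)  # share of series the scoring ratio got right
mean(wins_correct)  # share of decidable series the win totals got right
```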

The reader is probably disappointed that even the Pythagorean method only correctly forecasts the outcome of less than 58% of baseball playoff series. I believe that the regular season is a relatively poor predictor of the playoffs in baseball because a team’s regular season record depends greatly on the performance of five starting pitchers. During the playoffs teams only use three or four starting pitchers, so much of the regular season data (games involving the fourth and fifth starting pitchers) are not relevant for predicting the outcome of the playoffs.

For anecdotal evidence of how the Pythagorean Theorem forecasts the future performance of a team better than the team’s win-loss record does, consider the case of the 2005 Washington Nationals. On July 4, 2005, the Nationals were in first place with a record of 50–32. If we extrapolate this winning percentage, we would have predicted a final record of 99–63. On the same date, however, the Nationals’ scoring ratio was .991, so (1.2) would have predicted a final record of 80–82. Sure enough, the poor Nationals finished 81–81.
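
The two projections for the Nationals can be reproduced with a few lines of R. This is a minimal sketch of the arithmetic in the paragraph above. Like the text, it applies each rate to the full 162-game season.

```r
games = 162
# record on July 4, 2005
wins = 50
losses = 32
# projection 1: extrapolate the current winning percentage
wins_from_record = round(games * wins / (wins + losses))
# projection 2: apply (1.2) to the scoring ratio on that date
ratio = 0.991
wins_from_ratio = round(games * ratio^2 / (ratio^2 + 1))
c(wins_from_record, games - wins_from_record)  # 99 63
c(wins_from_ratio, games - wins_from_ratio)    # 80 82
```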

The Importance of the Pythagorean Theorem

Baseball’s Pythagorean Theorem is also important because it allows us to determine how many extra wins (or losses) will result from a trade. Suppose a team has scored 850 runs during a season and has given up 800 runs. Suppose we trade a shortstop (Joe) who “created”³ 150 runs for a shortstop (Greg) who created 170 runs in the same number of plate appearances. This trade will cause the team (all other things being equal) to score 20 more runs (170 - 150 = 20). Before the trade, \(R=\frac{850}{800}=1.0625\), and we would predict the team to have won \(\frac{162(1.0625)^{2}}{1+(1.0625)^{2}}=85.9\) games. After the trade, \(R=\frac{870}{800}=1.0875\), and we would predict the team to win \(\frac{162(1.0875)^{2}}{1+(1.0875)^{2}}=87.8\) games. Therefore, we estimate the trade makes our team 1.9 games better (87.8 - 85.9 = 1.9). In chapter 9, we will see how the Pythagorean Theorem can be used to help determine fair salaries for MLB players.
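
The trade calculation is easy to reuse for other scenarios. The sketch below packages it in a helper called pyth_wins, a name introduced here for illustration, and reproduces the Joe-for-Greg example.

```r
# predicted wins over a season from runs scored and runs allowed
# `pyth_wins` is a hypothetical helper name used only in this sketch
pyth_wins = function(runs_scored, runs_allowed, games = 162) {
  ratio = runs_scored / runs_allowed
  games * ratio^2 / (ratio^2 + 1)
}
before = pyth_wins(850, 800)
# swap Joe's 150 runs created for Greg's 170
after = pyth_wins(850 - 150 + 170, 800)
round(c(before = before, after = after, gain = after - before), 1)
```

Rounded to one decimal place, this gives 85.9 wins before the trade, 87.8 after, and a gain of 1.9.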

Football and Basketball “Pythagorean Theorems”

Does the Pythagorean Theorem hold for football and basketball? Daryl Morey, the general manager for the Houston Rockets, has shown that for the NFL, equation (1.3) with exp = 2.37 gives the most accurate predictions for winning percentage, while for the NBA, equation (1.3) with exp = 13.91 gives the most accurate predictions for winning percentage. Figure 1.3 gives the predicted and actual winning percentages for the NFL for the 2006–7 season, while figure 1.4 gives the predicted and actual winning percentages for the NBA for the 2006–7 season.
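
To get a feel for what the different exponents mean, the sketch below evaluates (1.3) for one hypothetical team that outscores its opponents by 5 percent, using exp = 2 for MLB and Morey's values for the NFL and NBA. The helper name win_pct and the 1.05 ratio are chosen here for illustration.

```r
# formula (1.3): predicted win percentage for a scoring ratio and an exponent
win_pct = function(ratio, exponent) {
  ratio^exponent / (ratio^exponent + 1)
}
# exponents quoted in the text
exps = c(MLB = 2, NFL = 2.37, NBA = 13.91)
# a hypothetical team with a scoring ratio of 1.05
p = win_pct(1.05, exps)
round(p, 3)
```

The larger the exponent, the more a given scoring ratio above 1 is rewarded, so the same 5 percent edge corresponds to a noticeably higher predicted winning percentage in basketball than in baseball or football.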

The NFL standings data used below (all seasons since 1922) were scraped from football-reference.com.

#read the csv off github
nfl_standings = read.csv(
  "https://raw.githubusercontent.com/capstat/mathletics/master/Chapter_1/nfl_standings.csv")
#look at just the 2005 through 2007 seasons
nfl_05_07 = nfl_standings[nfl_standings$Year >= 2005 &
                            nfl_standings$Year <= 2007,]
#pyt win % using exp=2.7
nfl_05_07$Win.Pct.2.7 = (nfl_05_07$Ratio^2.7)/((nfl_05_07$Ratio^2.7)+1)
#pyt win % using Morey's exp=2.37
nfl_05_07$Win.Pct.morely = (nfl_05_07$Ratio^2.37)/((nfl_05_07$Ratio^2.37)+1)
#absolute error exp=2.7
nfl_05_07$Error.2.7 = abs(nfl_05_07$W.L.-nfl_05_07$Win.Pct.2.7)
#absolute error Morey's exp=2.37
nfl_05_07$Error.morely = abs(nfl_05_07$W.L.-nfl_05_07$Win.Pct.morely)

Figure 1.3. Predicted NFL winning percentages.
Year	Tm	W	L	W.L.	PF	PA	Ratio	Win.Pct.2.7	Win.Pct.morely	Error.2.7	Error.morely
2007	Arizona Cardinals	8	8	0.500	404	399	1.0125313	0.5084053	0.5073781	0.0084053	0.0073781
2007	Atlanta Falcons	4	12	0.250	259	414	0.6256039	0.2198737	0.2475690	0.0301263	0.0024310
2007	Baltimore Ravens	5	11	0.313	275	384	0.7161458	0.2887520	0.3118949	0.0242480	0.0011051
2007	Buffalo Bills	7	9	0.438	252	354	0.7118644	0.2854384	0.3088531	0.1525616	0.1291469
2007	Carolina Panthers	7	9	0.438	267	347	0.7694524	0.3301281	0.3495267	0.1078719	0.0884733
2007	Chicago Bears	7	9	0.438	334	348	0.9597701	0.4723119	0.4756903	0.0343119	0.0376903
2007	Cincinnati Bengals	7	9	0.438	380	385	0.9870130	0.4911773	0.4922554	0.0531773	0.0542554
2007	Cleveland Browns	10	6	0.625	402	382	1.0523560	0.5343919	0.5301993	0.0906081	0.0948007
2007	Dallas Cowboys	13	3	0.813	455	325	1.4000000	0.7126880	0.6894264	0.1003120	0.1235736
2007	Denver Broncos	7	9	0.438	320	409	0.7823961	0.3401638	0.3585682	0.0978362	0.0794318
2007	Detroit Lions	7	9	0.438	346	444	0.7792793	0.3377490	0.3563953	0.1002510	0.0816047
2007	Green Bay Packers	13	3	0.813	435	291	1.4948454	0.7475261	0.7216767	0.0654739	0.0913233
2007	Houston Texans	8	8	0.500	379	384	0.9869792	0.4911541	0.4922351	0.0088459	0.0077649
2007	Indianapolis Colts	13	3	0.813	450	262	1.7175573	0.8115997	0.7827799	0.0014003	0.0302201
2007	Jacksonville Jaguars	11	5	0.688	411	304	1.3519737	0.6930095	0.6714411	0.0050095	0.0165589
2007	Kansas City Chiefs	4	12	0.250	226	335	0.6746269	0.2567923	0.2823527	0.0067923	0.0323527
2007	Miami Dolphins	1	15	0.063	267	437	0.6109840	0.2091183	0.2372778	0.1461183	0.1742778
2007	Minnesota Vikings	8	8	0.500	365	311	1.1736334	0.6064185	0.5937398	0.1064185	0.0937398
2007	New England Patriots	16	0	1.000	589	274	2.1496350	0.8875848	0.8598153	0.1124152	0.1401847
2007	New Orleans Saints	7	9	0.438	379	388	0.9768041	0.4841636	0.4860981	0.0461636	0.0480981
2007	New York Giants	10	6	0.625	373	351	1.0626781	0.5409429	0.5359572	0.0840571	0.0890428
2007	New York Jets	4	12	0.250	268	355	0.7549296	0.3188519	0.3393303	0.0688519	0.0893303
2007	Oakland Raiders	4	12	0.250	283	398	0.7110553	0.2848125	0.3082780	0.0348125	0.0582780
2007	Philadelphia Eagles	8	8	0.500	336	300	1.1200000	0.5759055	0.5667465	0.0759055	0.0667465
2007	Pittsburgh Steelers	10	6	0.625	393	269	1.4609665	0.7356665	0.7106335	0.1106665	0.0856335
2007	San Diego Chargers	11	5	0.688	412	284	1.4507042	0.7319488	0.7071861	0.0439488	0.0191861
2007	San Francisco 49ers	5	11	0.313	219	364	0.6016484	0.2023257	0.2307369	0.1106743	0.0822631
2007	Seattle Seahawks	10	6	0.625	393	291	1.3505155	0.6923893	0.6708766	0.0673893	0.0458766
2007	St. Louis Rams	3	13	0.188	263	438	0.6004566	0.2014631	0.2299039	0.0134631	0.0419039
2007	Tampa Bay Buccaneers	9	7	0.563	334	270	1.2370370	0.6397643	0.6234327	0.0767643	0.0604327
2007	Tennessee Titans	10	6	0.625	301	297	1.0134680	0.5090293	0.5079259	0.1159707	0.1170741
2007	Washington Redskins	9	7	0.563	334	310	1.0774194	0.5501645	0.5440673	0.0128355	0.0189327

For the 2005–7 NFL seasons, MAD was minimized by exp = 2.7. Exp = 2.7 yielded a MAD of 5.9%, while Morey’s exp = 2.37 yielded a MAD of 6.2%.

The NBA standings data used below (all seasons since 1950) were scraped from basketball-reference.com.

#read the csv off github
nba_standings = read.csv(
  "https://raw.githubusercontent.com/capstat/mathletics/master/Chapter_1/nba_standings.csv")
#look at just Year 2005 through 2007
nba_04_07 = nba_standings[nba_standings$Year >= 2005 &
                            nba_standings$Year <= 2007,]
#pyt win % using exp=15.4
nba_04_07$Win.Pct.15.4 = (nba_04_07$Ratio^15.4)/((nba_04_07$Ratio^15.4)+1)
#pyt win % using Morey's exp=13.91
nba_04_07$Win.Pct.morely = (nba_04_07$Ratio^13.91)/((nba_04_07$Ratio^13.91)+1)
#absolute error exp=15.4
nba_04_07$Error.15.4 = abs(nba_04_07$W.L.-nba_04_07$Win.Pct.15.4)
#absolute error Morey's exp=13.91
nba_04_07$Error.morely = abs(nba_04_07$W.L.-nba_04_07$Win.Pct.morely)

Figure 1.4. Predicted NBA winning percentages.
Team	W	L	W.L.	GB	PS.G	PA.G	SRS	Year	Total.PF	Total.PA	Ratio	Win.Pct.15.4	Win.Pct.morely	Error.15.4	Error.morely
Atlanta Hawks	26	56	0.317	26	97.2	102.0	-4.69	2006	7970	8364	0.9528934	0.3223299	0.3382306	0.0053299	0.0212306
Boston Celtics	33	49	0.402	16	98.0	99.5	-1.59	2006	8036	8159	0.9849246	0.4417831	0.4473719	0.0397831	0.0453719
Charlotte Bobcats	26	56	0.317	26	96.9	100.9	-3.90	2006	7946	8274	0.9603577	0.3491174	0.3629342	0.0321174	0.0459342
Chicago Bulls	41	41	0.500	23	97.8	97.2	0.51	2006	8020	7970	1.0062735	0.5240590	0.5217343	0.0240590	0.0217343
Cleveland Cavaliers	50	32	0.610	14	97.6	95.4	2.17	2006	8003	7823	1.0230091	0.5866963	0.5784539	0.0233037	0.0315461
Dallas Mavericks	60	22	0.732	3	99.1	93.1	5.96	2006	8126	7634	1.0644485	0.7234891	0.7044907	0.0085109	0.0275093
Denver Nuggets	44	38	0.537	0	100.3	100.1	0.36	2006	8225	8208	1.0020712	0.5079650	0.5071945	0.0290350	0.0298055
Detroit Pistons	64	18	0.780	0	96.8	90.2	6.24	2006	7938	7396	1.0732829	0.7482159	0.7278504	0.0317841	0.0521496
Golden State Warriors	34	48	0.415	20	98.5	99.8	-1.11	2006	8077	8184	0.9869257	0.4495048	0.4543617	0.0345048	0.0393617
Houston Rockets	34	48	0.415	29	90.1	91.7	-1.30	2006	7388	7519	0.9825775	0.4327422	0.4391818	0.0177422	0.0241818
Indiana Pacers	41	41	0.500	23	93.9	92.0	1.62	2006	7700	7544	1.0206787	0.5781550	0.5706998	0.0781550	0.0706998
Los Angeles Clippers	47	35	0.573	7	97.2	95.6	1.75	2006	7970	7839	1.0167113	0.5634628	0.5573795	0.0095372	0.0156205
Los Angeles Lakers	45	37	0.549	9	99.4	96.9	2.53	2006	8151	7946	1.0257991	0.5968286	0.5876636	0.0478286	0.0386636
Memphis Grizzlies	49	33	0.598	14	92.2	88.5	3.74	2006	7560	7257	1.0417528	0.6524740	0.6385287	0.0544740	0.0405287
Miami Heat	52	30	0.634	0	99.9	96.0	3.59	2006	8192	7872	1.0406504	0.6487677	0.6351226	0.0147677	0.0011226
Milwaukee Bucks	40	42	0.488	24	97.8	98.8	-1.07	2006	8020	8102	0.9898790	0.4609157	0.4646840	0.0270843	0.0233160
Minnesota Timberwolves	33	49	0.402	11	91.7	93.6	-1.75	2006	7519	7675	0.9796743	0.4215921	0.4290707	0.0195921	0.0270707
New Jersey Nets	49	33	0.598	0	93.8	92.4	1.11	2006	7692	7577	1.0151775	0.5577357	0.5521925	0.0402643	0.0458075
New Orleans/Oklahoma City Hornets	38	44	0.463	25	92.8	95.6	-2.51	2006	7610	7839	0.9707871	0.3877973	0.3983356	0.0752027	0.0646644
New York Knicks	23	59	0.280	26	95.6	102.0	-6.30	2006	7839	8364	0.9372310	0.2692733	0.2886966	0.0107267	0.0086966
Orlando Magic	36	46	0.439	16	94.9	96.0	-1.26	2006	7782	7872	0.9885671	0.4558450	0.4600980	0.0168450	0.0210980
Philadelphia 76ers	38	44	0.463	11	99.4	101.3	-2.10	2006	8151	8307	0.9812207	0.4275261	0.4344533	0.0354739	0.0285467
Phoenix Suns	54	28	0.659	0	108.4	102.8	5.48	2006	8889	8430	1.0544484	0.6934873	0.6764440	0.0344873	0.0174440
Portland Trail Blazers	21	61	0.256	23	88.8	98.3	-8.91	2006	7282	8061	0.9033619	0.1729112	0.1956508	0.0830888	0.0603492
Sacramento Kings	44	38	0.537	10	98.9	97.3	1.61	2006	8110	7979	1.0164181	0.5623698	0.5563894	0.0253698	0.0193894
San Antonio Spurs	63	19	0.768	0	95.6	88.8	6.69	2006	7839	7282	1.0764900	0.7567730	0.7359933	0.0112270	0.0320067
Seattle SuperSonics	35	47	0.427	9	102.6	105.6	-2.88	2006	8413	8659	0.9715903	0.3908251	0.4010957	0.0361749	0.0259043
Toronto Raptors	27	55	0.329	22	101.1	104.0	-3.03	2006	8290	8528	0.9720919	0.3927194	0.4028218	0.0637194	0.0738218
Utah Jazz	41	41	0.500	3	92.4	95.0	-2.49	2006	7577	7790	0.9726573	0.3948568	0.4047687	0.1051432	0.0952313
Washington Wizards	42	40	0.512	10	101.7	99.8	1.57	2006	8339	8184	1.0189394	0.5717364	0.5648780	0.0597364	0.0528780

For the 2004–7 NBA seasons, exp = 15.4 best fit actual winning percentages. MAD for these seasons was 3.35% for exp = 15.4 and 3.4% for exp = 13.91. Since Morey’s values of exp are very close in accuracy to the values we found from recent seasons we will stick with Morey’s values of exp.

These predicted winning percentages are based on regular season data. Therefore, we could look at teams that performed much better than expected during the regular season and predict that “luck would catch up with them.” This train of thought would lead us to believe that these teams would perform worse during the playoffs. Note that the Miami Heat and Dallas Mavericks both won about 8% more games than expected during the regular season. Therefore, we would have predicted Miami and Dallas to perform worse during the playoffs than their actual win-loss records indicated. Sure enough, both Dallas and Miami suffered unexpected first-round defeats. Conversely, during the regular season the San Antonio Spurs and Chicago Bulls won around 8% fewer games than the Pythagorean Theorem predicts, indicating that these teams would perform better than expected in the playoffs. Sure enough, the Bulls upset the Heat and gave the Detroit Pistons a tough time. Of course, the Spurs won the 2007 NBA title. In addition, the Pythagorean Theorem had the Spurs as by far the league’s best team (78% predicted winning percentage).

Note that the team that underachieved the most was the Boston Celtics, who won nearly 9% fewer games (about 7) than predicted. Many people suggested the Celtics “tanked” games during the regular season to improve their chances of obtaining potential future superstars such as Greg Oden and Kevin Durant in the 2007 draft lottery. The fact that the Celtics won seven fewer games than expected does not prove this conjecture, but it is certainly consistent with the view that the Celtics did not go all out to win every close game.

¹ The actual errors were not simply averaged because averaging positive and negative errors would result in positive and negative errors canceling out. For example, if one team wins 5% more games than (1.2) predicts and another team wins 5% fewer games than (1.2) predicts, the average of the errors is 0 but the average of the absolute errors is 5%. Of course, in this simple situation estimating the average error as 5% is correct while estimating the average error as 0% is nonsensical.
² In six playoff series the opposing teams had identical win-loss records so the “Games Won” approach could not make a prediction.
³ In chapters 2-4 we will explain in detail how to determine how many runs a hitter creates.

Mathletics

by Wayne Winston, R code by Nick Capofari

January 2, 2017
