Mathletics, by Wayne Winston, is a wonderful book that explains how data analytics is used in professional sports. The data work in the book is done using Microsoft Excel. I am an R user. While reading the first few chapters of his book, I began thinking about how the analytic examples he uses could also serve as great introductory examples for anyone interested in learning R.

R is a freely available programming language used for statistical computing. R is appealing because of the price, the fact that it is open source (it is constantly being improved by an engaged community of R users around the world), and R’s awesome graphic capabilities. R is quickly becoming the go-to tool for data analytics. I have attempted to take the examples from “Mathletics” and use them to create an R tutorial.

To install R, follow this link. Once you have finished, you should also install RStudio. RStudio is an awesome integrated development environment (IDE) for programming in R.

Below are some R basics. A longer overview can be found here. R’s introduction manual can be found here. I found both of these extremely helpful while coming up with these examples and explanations.

If you have any questions or comments, please contact me.

If you know some R already, or just want to dive in, head straight to Chapter 1.

R has a robust library of open source packages. They are easy to install. Make sure you load the package after installation is finished.

```
install.packages("Lahman") #installs the package
library(Lahman) #this loads it
```

This package provides the tables from the Sean Lahman Baseball Database.
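
As a quick check that the package loaded, here is a minimal sketch that pulls the Mets' 2014 to 2016 seasons out of the package's `Teams` table. The column names (`yearID`, `teamID`, `W`, `L`) and the team code `"NYN"` are what I understand the Lahman package to use; the `requireNamespace()` guard and the name `recent_mets` are my own additions so that nothing runs if the package is missing.

```r
#only run if the Lahman package is installed
if (requireNamespace("Lahman", quietly = TRUE)) {
  library(Lahman)
  #Teams has one row per team per season
  recent_mets = Teams[Teams$teamID == "NYN" & Teams$yearID %in% 2014:2016,
                      c("yearID", "teamID", "W", "L")]
  print(recent_mets)
}
```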

Since this is an attempt to replicate work that was originally done using Microsoft Excel, a basic understanding of R’s data frame structure is needed. The simplest explanation of a data frame for an Excel user is that data frames are essentially spreadsheets. Each column represents a variable and each row contains all measured variables for the same unit.

Each column of a data frame is a vector. This is the simplest R data structure. A vector groups elements of the same type together in a specific order. You can create a vector with the *function* `c()` and assign it to a variable.

```
#basic vectors
x = c(1, 2, 3)
```
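
Once a vector exists, you can ask for its length, pull out elements by position, and do arithmetic on every element at once. A minimal sketch (the name `v` is just for illustration):

```r
#a small numeric vector
v = c(10, 20, 30)
length(v) #number of elements: 3
v[2] #second element: 20
v[c(1, 3)] #first and third elements: 10 30
v * 2 #arithmetic is applied element by element: 20 40 60
```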

Most R users use `<-` as the assignment operator. I use `=`. This link can explain the difference. When I first started programming in R, I found using `=` less confusing, so I have stuck with it. But just so you know, Google’s R style guide recommends using `<-`, as does the R community.
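
One place where the two operators behave differently is inside a function call. The sketch below is only meant to illustrate that single difference; the names `a`, `b`, and `demo_values` are made up for the example.

```r
#at the top level the two operators do the same thing
a = 5
b <- 5
identical(a, b) #TRUE

#inside a function call, = only names an argument
mean(x = c(2, 4, 6)) #returns 4
#<- inside a call also creates a variable as a side effect
mean(demo_values <- c(2, 4, 6)) #returns 4
demo_values #2 4 6
```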

```
#more vectors
y = c("My", "first", "vector")
z <- c(TRUE, FALSE, TRUE) #same as z = c(TRUE, FALSE, TRUE)
mixed_vector = c(x[1], y[2], z[3])
mixed_vector
```

`[1] "1" "first" "TRUE" `
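
Every element of `mixed_vector` is printed in quotes because a vector can hold only one type of value. When types are mixed, R converts everything to the most flexible type present. A short sketch of that rule:

```r
class(c(1, 2, 3)) #"numeric"
class(c(1, TRUE)) #"numeric": TRUE becomes 1
class(c(1, "first", TRUE)) #"character": everything becomes a string
#convert back when the strings really are numbers
as.numeric(c("1", "2")) + 1 #2 3
```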

There are other ways to create vectors when the vector follows a specific pattern.

`seq(from=2, to=10, by=2)`

`[1] 2 4 6 8 10`

`rep(c("One", "Two"), times=3)`

`[1] "One" "Two" "One" "Two" "One" "Two"`

`c(-2:5)`

`[1] -2 -1 0 1 2 3 4 5`
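
The same functions take a few other arguments that are handy for patterns. This is a small sketch rather than a full tour of the options:

```r
#ask for a number of evenly spaced values instead of a step size
seq(0, 1, length.out = 5) #0.00 0.25 0.50 0.75 1.00
#repeat each element before moving to the next one
rep(c("One", "Two"), each = 3) #"One" "One" "One" "Two" "Two" "Two"
#the colon is shorthand for seq(2014, 2016)
2014:2016 #2014 2015 2016
```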

To create a data frame, we can group vectors together.

```
Year = seq(2014, 2016)
Team = rep("New York Mets", 3)
W = c(79, 90, 87)
L = c(83, 72, 75)
mets = data.frame(Year, Team, W, L)
#if you are using RStudio, use the View() function to inspect the data frame
#View(mets)
mets
```

```
  Year          Team  W  L
1 2014 New York Mets 79 83
2 2015 New York Mets 90 72
3 2016 New York Mets 87 75
```
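
Before working with a data frame, it helps to check its size and the type of each column. A minimal sketch, using a cut-down copy of the data frame that I have called `demo` so the block runs on its own:

```r
#a small data frame built the same way as above
demo = data.frame(Year = 2014:2016, W = c(79, 90, 87), L = c(83, 72, 75))
dim(demo) #rows and columns: 3 3
colnames(demo) #"Year" "W" "L"
str(demo) #the type and first few values of every column
```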

We can access the information in a data frame in many ways.

```
#use $ to access a data frame column
mets$Year
```

`[1] 2014 2015 2016`

```
#use logical expressions with vectors
mets$W[mets$W >= 81]
```

`[1] 90 87`

```
#data frame indexing: df[row, column]
mets[2, 3]
```

`[1] 90`

`mets[ , c("W", "L")]`

```
   W  L
1 79 83
2 90 72
3 87 75
```

`mets[mets$Year==2016, c(3, 4)]`

```
   W  L
3 87 75
```

`mets[mets$W > mets$L, "Year"]`

`[1] 2015 2016`
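
Two more base R helpers cover common cases of the indexing shown above: `subset()` for filtering rows and picking columns, and `which.max()` for finding the row with the largest value. A sketch on a made-up copy of the data (the names `seasons_demo`, `winning`, and `best` are mine):

```r
seasons_demo = data.frame(Year = 2014:2016, W = c(79, 90, 87), L = c(83, 72, 75))
#subset() lets you refer to columns by name without the $
winning = subset(seasons_demo, W > L, select = c(Year, W))
winning
#which.max() returns the position of the largest value
best = seasons_demo[which.max(seasons_demo$W), "Year"]
best #2015
```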

R has many built-in functions. Some functions return one element.

`max(mets$Year)`

`[1] 2016`

`sum(mets$W)`

`[1] 256`

`mean(mets$W)`

`[1] 85.33333`

`length(mets$Team)`

`[1] 3`

Other functions return a vector the same length as the input.

`paste(mets$Year, mets$Team, sep="---")`

`[1] "2014---New York Mets" "2015---New York Mets" "2016---New York Mets"`

```
#apply a function to every element in a vector
sapply(mets$Year, function(x){ x-2000 })
```

`[1] 14 15 16`
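
For simple arithmetic like this, `sapply` is not strictly needed, because R's arithmetic operators already work on whole vectors. A quick sketch showing that both routes give the same result (`years` and `short_years` are illustrative names):

```r
years = c(2014, 2015, 2016)
short_years = years - 2000 #14 15 16
#same result as the sapply() version
identical(short_years, sapply(years, function(x){ x - 2000 })) #TRUE
```

`sapply` earns its keep when the function you want to apply is not already vectorized.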

```
#create a new column on the fly using $
mets$Games = mets$W + mets$L
#no spaces for column names, unless column name is inside ` `
mets$W.pct = round(mets$W/mets$Games, 3)
#vectorized if
mets$`Over 500` = ifelse(mets$W > mets$L, TRUE, FALSE)
```
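
A comparison such as `mets$W > mets$L` already returns a TRUE/FALSE vector, so `ifelse` is most useful when you want values other than TRUE and FALSE. A small sketch using two stand-alone vectors (`wins` and `losses` are names I made up) and arbitrary labels:

```r
wins = c(79, 90, 87)
losses = c(83, 72, 75)
#the comparison is already a TRUE/FALSE vector
wins > losses #FALSE TRUE TRUE
#ifelse() picks a value for each element
ifelse(wins > losses, "Winning", "Losing") #"Losing" "Winning" "Winning"
```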

We can add rows and columns to our data frames using the `rbind` and `cbind` functions.

```
next_year = c(2017, "New York Mets", 162, 0, 162, 1.000, TRUE)
rbind(mets, next_year)
```

```
  Year          Team   W  L Games W.pct Over 500
1 2014 New York Mets  79 83   162 0.488    FALSE
2 2015 New York Mets  90 72   162 0.556     TRUE
3 2016 New York Mets  87 75   162 0.537     TRUE
4 2017 New York Mets 162  0   162     1     TRUE
```
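
One caveat about the `rbind` example above: `c()` can hold only one type, so `next_year` is a character vector and every column of the combined data frame ends up as character. Here is a sketch of one way to keep the column types, by building the new row as a one-row data frame. The data frame `seasons` and the other names are made up for this illustration.

```r
seasons = data.frame(Year = 2014:2016, Team = "New York Mets", W = c(79, 90, 87))
#a one-row data frame keeps each value's own type
new_row = data.frame(Year = 2017, Team = "New York Mets", W = 162)
combined = rbind(seasons, new_row)
class(combined$W) #"numeric"
#a row built with c() is all strings, so the columns become character
coerced = rbind(seasons, c(2017, "New York Mets", 162))
class(coerced$W) #"character"
```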

`cbind(mets, League=rep("NL", 3))`

```
  Year          Team  W  L Games W.pct Over 500 League
1 2014 New York Mets 79 83   162 0.488    FALSE     NL
2 2015 New York Mets 90 72   162 0.556     TRUE     NL
3 2016 New York Mets 87 75   162 0.537     TRUE     NL
```

```
#our changes to the data frame were not saved
#make sure to store any changes in a variable
mets_2 = cbind(mets, Manager=rep("Terry Collins", 3))
mets_2
```

```
  Year          Team  W  L Games W.pct Over 500       Manager
1 2014 New York Mets 79 83   162 0.488    FALSE Terry Collins
2 2015 New York Mets 90 72   162 0.556     TRUE Terry Collins
3 2016 New York Mets 87 75   162 0.537     TRUE Terry Collins
```

There are many ways to import datasets into R. `read.csv` or `readLines` can be used when we have a file we would like to work with. Many times R packages come with ready-to-use datasets.
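
For the file-based route, here is a minimal sketch that writes a tiny CSV to a temporary file and reads it back, so it runs without any file of your own. The temporary path and the names `csv_path`, `from_csv`, and `raw_lines` are placeholders; with real data you would pass your own file path to `read.csv`.

```r
#write a small data frame to a temporary file, then read it back
csv_path = tempfile(fileext = ".csv") #placeholder location
write.csv(data.frame(Year = 2014:2016, W = c(79, 90, 87)), csv_path, row.names = FALSE)
#read.csv() returns a data frame; the first line of the file becomes the column names
from_csv = read.csv(csv_path)
from_csv
#readLines() returns the raw text, one string per line of the file
raw_lines = readLines(csv_path)
length(raw_lines) #4: one header line plus three rows
```

The rest of this section uses a dataset that ships with R instead.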

```
#the mtcars dataset that is included with the basic R installation
data(mtcars)
#what are we working with?
colnames(mtcars)
```

```
 [1] "mpg"  "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear"
[11] "carb"
```

`nrow(mtcars)`

`[1] 32`

```
#sneak a peek
head(mtcars) # or tail(mtcars)
```

```
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
```

```
#get a quick summary
summary(mtcars[ , c(1:4)])
```

```
      mpg             cyl             disp             hp
 Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0
 1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5
 Median :19.20   Median :6.000   Median :196.3   Median :123.0
 Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7
 3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0
 Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0
```
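
`summary` describes each column on its own. To summarize one column within groups of another, base R offers `table` and `tapply`. A short sketch, not the only way to do it (the names `cyl_counts` and `mpg_by_cyl` are mine):

```r
data(mtcars)
#how many cars have 4, 6, and 8 cylinders
cyl_counts = table(mtcars$cyl)
cyl_counts #11 7 14
#average mpg within each cylinder group
mpg_by_cyl = tapply(mtcars$mpg, mtcars$cyl, mean)
mpg_by_cyl
```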

Ok! Now we’re ready to move on to the fun stuff!