5 Prediction and Prediction Errors
Let's return to the relationship between runs and at_bats. We'll redraw our earlier scatterplot with the fitted regression line from the model superimposed on top. Recall that the equation for the regression line is:
\[\textrm{Runs} = -2789 + 0.6305 \times \textrm{At-Bats}\]
ggplot(data = mlb11, aes(x = at_bats, y = runs)) +
geom_point() +
labs(title = "Relationship between Runs and At-Bats", x = "At-Bats", y = "Runs") +
geom_smooth(method = "lm", se = FALSE)
Here, we are literally adding a layer on top of our plot. geom_smooth with method = "lm" adds the least-squares regression line, that is, the line that minimizes the sum of squared residuals. It can also display the standard error se around the line, but we suppress that for now with the se = FALSE argument. (You can set se = TRUE if you're interested in looking at it; it draws a confidence band around the regression line.)
The regression line can be used to predict \(y\) at any value of \(x\) within the range of observed \(x\) values.
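As a minimal sketch of what such a prediction looks like, the rounded coefficients from the equation above can be plugged in directly. The helper name predict_runs and the input of 5,500 at-bats are illustrative assumptions, not part of the lab; 5,500 is assumed to fall within the observed range of at_bats.

```r
# Rounded coefficients taken from the regression equation in this section
intercept <- -2789
slope <- 0.6305

# Hypothetical helper (not part of the lab): predicted runs for a given number of at-bats
predict_runs <- function(at_bats) {
  intercept + slope * at_bats
}

# Illustrative input, assumed to lie within the observed range of at_bats
predicted <- predict_runs(5500)
print(predicted)
```

If the fitted lm object is available under a name such as m1 (an assumption here), predict(m1, newdata = data.frame(at_bats = 5500)) should give a similar value; small differences would come from rounding the coefficients.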
Interpolation is when...
Extrapolation is when...
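Extrapolation is usually not recommended, because there is no guarantee that the linear relationship holds outside the range of the observed data. Predictions made within the range of the data are more reliable; the predictions at the observed \(x\) values are also what we use to compute the residuals.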
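The sketch below shows one way to check whether a new \(x\) value falls inside the observed range before predicting, and how a residual is computed from a prediction. All numbers are hypothetical stand-ins rather than values from mlb11, and is_within_range is an illustrative helper name; with the real data, range(mlb11$at_bats) would supply the bounds.

```r
# Hypothetical at-bats values standing in for the observed data (not from mlb11)
observed_at_bats <- c(5420, 5500, 5580, 5650)

# Hypothetical helper: TRUE when a new x value lies inside the observed range
is_within_range <- function(x_new, x_obs) {
  x_new >= min(x_obs) & x_new <= max(x_obs)
}

inside  <- is_within_range(5550, observed_at_bats)  # within the observed range
outside <- is_within_range(6200, observed_at_bats)  # beyond the observed range

# Residual = observed value minus predicted value at an observed x
observed_runs  <- 700                       # hypothetical observed runs
predicted_runs <- -2789 + 0.6305 * 5500     # prediction from the rounded equation
residual <- observed_runs - predicted_runs

print(c(inside = inside, outside = outside))
print(residual)
```

A positive residual means the observed value sits above the regression line, and a negative one means it sits below.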
Use the fitted model to answer the questions below.