Chapter 7: Multiple Regression
This chapter introduces the concept of multiple regression, which in many ways is similar to bivariate regression. Both methods produce conditional predictions, though multiple regression employs more than one independent X variable to predict the value of the Y variable. Just as before, the predicted value of the dependent variable is expressed in a simple equation, and in the case of least squares regression the RMSE summarizes the likely size of the residual and the R² statistic measures the fraction of total variation that is explained by the regression. Once again, the OLS regression coefficients are those that minimize the SSR.
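As a minimal sketch of these ideas, the Python code below fits a regression with two X variables by least squares and computes the SSR, RMSE, and R². The data are synthetic and purely hypothetical (they are not the chapter's example), the function names are our own, and the RMSE is assumed here to divide the SSR by the number of observations; some software divides by the number of observations minus the number of coefficients instead.

```python
import numpy as np

def design_matrix(X):
    """Prepend a column of ones (the intercept) to the X variables."""
    X = np.asarray(X, dtype=float)
    return np.column_stack([np.ones(X.shape[0]), X])

def fit_ols(X, y):
    """Return the coefficients (intercept first) that minimize the SSR."""
    coef, *_ = np.linalg.lstsq(design_matrix(X), np.asarray(y, dtype=float), rcond=None)
    return coef

def ssr(X, y, coef):
    """Sum of squared residuals for any candidate coefficient vector."""
    residuals = np.asarray(y, dtype=float) - design_matrix(X) @ coef
    return float(residuals @ residuals)

def fit_summary(X, y, coef):
    """SSR, RMSE (assumed: sqrt(SSR / n)), and R2 = 1 - SSR / SST."""
    y = np.asarray(y, dtype=float)
    s = ssr(X, y, coef)
    sst = float(((y - y.mean()) ** 2).sum())
    return {"SSR": s, "RMSE": float(np.sqrt(s / len(y))), "R2": 1.0 - s / sst}

# Hypothetical synthetic data: Y depends on two X variables plus noise.
rng = np.random.default_rng(0)
x1 = rng.uniform(0, 10, size=50)
x2 = rng.uniform(0, 10, size=50)
y = 5.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(0, 1, size=50)
X = np.column_stack([x1, x2])

coef = fit_ols(X, y)
summary = fit_summary(X, y, coef)
print("intercept and slopes:", coef)
print(summary)
```

Evaluating `ssr` at any other coefficient vector, for example `coef + np.array([0.1, 0.0, 0.0])`, gives a value at least as large as the SSR at the fitted coefficients, which is what minimizing the SSR means.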
Multiple regression introduces some new issues, however. Some of the complications are purely mathematical. Although it is relatively easy to move back and forth between the algebraic expression and the pictorial (geometric) representation of the regression line in the bivariate case, most people have difficulty translating the algebraic formulation for a multiple regression into its geometric representation as a plane (in trivariate regression) or hyperplane (when there are more than two independent variables). Furthermore, the formulas for the OLS regression coefficients become very unwieldy (we discuss them in the appendix of this chapter).
To help you deal with the additional complexities of multiple regression, we will try to keep you focused on the main issues. The central goal is still doing a good job of conditional prediction of values of the Y variable based on our knowledge of values of the X variables. Just as with bivariate regression, multiple regression can again be interpreted as a compression of a (more complicated) graph of averages. The OLS regression coefficients are still weighted sums of the Y variable. Finally, running a multiple regression on a computer is no more difficult than running a bivariate regression.
In addition to the more involved mathematics, multiple regression highlights two important conceptual issues: confounding and multicollinearity. Confounding is so important that it was already introduced in Chapter 1. We suggest that you reread the discussion of separating out the influence of price and income in the demand for cigarettes in Section 1.2.
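The statement that the OLS coefficients are weighted sums of the Y variable can be checked numerically. The sketch below, on hypothetical synthetic data, builds a matrix of weights that depends only on the X variables and shows that multiplying it by Y reproduces the least squares coefficients. It also illustrates the extreme case of perfect multicollinearity, assumed here to mean that one X variable is an exact linear combination of the others, in which case the columns of X are no longer linearly independent. All variable names are our own.

```python
import numpy as np

# Hypothetical synthetic data with two X variables.
rng = np.random.default_rng(1)
n = 40
x1 = rng.uniform(0, 10, size=n)
x2 = rng.uniform(0, 10, size=n)
y = 3.0 + 1.0 * x1 + 0.5 * x2 + rng.normal(0, 1, size=n)

# Columns: intercept, x1, x2.
design = np.column_stack([np.ones(n), x1, x2])

# The weights solve (X'X) W = X', so they depend on the X values only.
# Row j of `weights` holds the n weights that turn Y into coefficient j.
weights = np.linalg.solve(design.T @ design, design.T)
coef_from_weights = weights @ y

# Compare with a direct least squares fit.
coef_lstsq, *_ = np.linalg.lstsq(design, y, rcond=None)
print(coef_from_weights)
print(coef_lstsq)

# Perfect multicollinearity: x3 is an exact linear combination of x1 and x2,
# so adding it gives four columns but still only rank 3.
x3 = 2.0 * x1 - x2
collinear = np.column_stack([design, x3])
rank_ok = np.linalg.matrix_rank(design)
rank_collinear = np.linalg.matrix_rank(collinear)
print(rank_ok, rank_collinear)
```

Because the weights do not involve Y, changing the Y values while holding the X values fixed changes the coefficients only through these fixed weights.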
This chapter makes extensive use of a single artificial example with data on the demand for heating oil. Section 7.2 explains how least squares multiple regression is the solution to the familiar optimization problem of minimizing the SSR, where the Predicted Y variable is now based on more than one X variable. Section 7.3 comes back to the artificial example to explain the concept of confounding. Section 7.4 treats multicollinearity, which is a technical issue you need to be aware of when running your own regressions. The appendix shows how all OLS regression coefficients can be obtained from an analytic formula, which we go on to derive in the trivariate case. The appendix also states the omitted variable rule, which is a simple mathematical relationship explaining the magnitude of confounding.
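To preview how confounding shows up in the numbers, the sketch below uses one common statement of the omitted variable rule, which may differ in notation from the version given in the appendix: the slope on X1 in a regression that leaves out X2 equals the slope on X1 in the full regression plus the slope on X2 times the slope from an auxiliary regression of X2 on X1. The data are hypothetical and synthetic, and the helper `ols` is our own.

```python
import numpy as np

def ols(columns, y):
    """Least squares coefficients (intercept first) for a list of X columns."""
    design = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

# Hypothetical synthetic data in which x2 is correlated with x1.
rng = np.random.default_rng(2)
n = 200
x1 = rng.normal(0, 1, size=n)
x2 = 0.8 * x1 + rng.normal(0, 1, size=n)
y = 1.0 - 2.0 * x1 + 3.0 * x2 + rng.normal(0, 1, size=n)

# Full (trivariate) regression of y on x1 and x2.
_, b1, b2 = ols([x1, x2], y)

# Bivariate regression that omits x2.
_, short_slope = ols([x1], y)

# Auxiliary regression of the omitted variable on the included one.
_, aux_slope = ols([x1], x2)

# Slope implied by the rule as stated in the lead-in.
implied = b1 + b2 * aux_slope
print("slope on x1, x2 omitted:", short_slope)
print("implied by the rule:    ", implied)
print("slope on x1, x2 included:", b1)
```

In this sketch the bivariate slope on x1 differs sharply from the trivariate slope because x2 both affects y and moves with x1; the relationship holds exactly in the sample, not just on average.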