### Chapter 7: Multiple Regression

This chapter introduces the concept of multiple regression, which in many ways is similar to bivariate regression. Both methods produce conditional predictions, though multiple regression employs more than one independent X variable to predict the value of the Y variable. Just as before, the predicted value of the dependent variable is expressed in a simple equation; in the case of least squares regression, the RMSE summarizes the likely size of the residual and the R² statistic measures the fraction of total variation that is explained by the regression. Once again, the OLS regression coefficients are those that minimize the SSR.
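These quantities can be computed directly. The sketch below fits a least squares regression with two X variables and reports the RMSE and R² statistics; the data are made-up illustrative numbers (not the heating oil example used later in the chapter), and the RMSE here divides the SSR by n, though some texts divide by n minus the number of coefficients.

```python
import numpy as np

# Illustrative data only: predict Y from two X variables.
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
Y  = np.array([3.1, 3.9, 7.2, 7.8, 10.9])

# Design matrix with an intercept column.
X = np.column_stack([np.ones_like(X1), X1, X2])

# OLS coefficients: the values that minimize the SSR.
b, *_ = np.linalg.lstsq(X, Y, rcond=None)

residuals = Y - X @ b
SSR  = np.sum(residuals ** 2)            # sum of squared residuals
RMSE = np.sqrt(SSR / len(Y))             # likely size of a residual
TSS  = np.sum((Y - Y.mean()) ** 2)       # total variation in Y
R2   = 1 - SSR / TSS                     # fraction of variation explained

print("coefficients:", b)
print("RMSE:", RMSE, "R2:", R2)
```

Running a multiple regression in software is exactly this: supply several X columns instead of one, and the same least squares machinery produces the coefficients, RMSE, and R².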

Multiple regression introduces some new issues, however. Some of the complications
are purely mathematical. Although it is relatively easy to move back and forth
between the algebraic expression and the pictorial (geometric) representation
of the regression line in the bivariate case, most people have difficulty
translating the algebraic formulation for a multiple regression into its geometric
representation as a plane (in trivariate regression) or hyperplane (when there
are more than two independent variables). Furthermore, the formulas for the
OLS regression coefficients become very unwieldy (we discuss them in the appendix
of this chapter).

To help you deal with the additional complexities of multiple regression,
we will try to keep you focused on the main issues. The central goal is still
doing a good job of conditional prediction of values of the Y variable based
on our knowledge of values of the X variables. Just as with bivariate regression,
multiple regression can again be interpreted as a compression of a (more complicated)
graph of averages. The OLS regression coefficients are still weighted sums
of the Y variable. Finally, running a multiple regression on a computer is
no more difficult than running a bivariate regression. In addition to the
more involved mathematics, multiple regression highlights two important conceptual
issues: confounding and multicollinearity. Confounding is so important that
it was already introduced in Chapter 1. We suggest that you reread the discussion
of separating out the influence of price and income in the demand for cigarettes
in Section 1.2.
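The claim that the OLS coefficients are weighted sums of the Y variable can be checked numerically. In matrix terms the weights are the rows of (X′X)⁻¹X′, which depend only on the X data; the sketch below (with hypothetical numbers) verifies that applying those weights to Y reproduces the coefficients.

```python
import numpy as np

# Hypothetical data; any X matrix and Y vector would do for this check.
X = np.column_stack([np.ones(5),
                     [1.0, 2.0, 3.0, 4.0, 5.0],
                     [2.0, 1.0, 4.0, 3.0, 5.0]])
Y = np.array([3.1, 3.9, 7.2, 7.8, 10.9])

# OLS coefficients from the normal equations.
b = np.linalg.solve(X.T @ X, X.T @ Y)

# The same coefficients as weighted sums of Y.
# Each row of W holds the weights for one coefficient; W uses only the X data.
W = np.linalg.inv(X.T @ X) @ X.T
b_from_weights = W @ Y

print(np.allclose(b, b_from_weights))   # True
```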

This chapter makes extensive use of a single artificial example with data
on the demand for heating oil. Section 7.2 explains how least squares multiple
regression is the solution to the familiar optimization problem of minimizing
the SSR, where the Predicted Y variable is now based on more than one X variable.
Section 7.3 comes back to the artificial example to explain the concept of
confounding. Section 7.4 treats multicollinearity, which is a technical issue
you need to be aware of when running your own regressions. The appendix shows
how all OLS regression coefficients can be obtained from an analytic formula,
which we go on to derive in the trivariate case. The appendix also states
the omitted variable rule, which is a simple mathematical relationship explaining
the magnitude of confounding.
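The omitted variable rule stated in the appendix can be previewed numerically. In a trivariate setting, the slope on X1 from the short regression (X2 omitted) equals the long-regression slope on X1 plus the long-regression slope on X2 times the slope from an auxiliary regression of X2 on X1; the identity holds exactly in any sample. The sketch below uses simulated data (all names and numbers are illustrative).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
X1 = rng.normal(size=n)
X2 = 0.5 * X1 + rng.normal(size=n)           # X2 is correlated with X1
Y  = 1.0 + 2.0 * X1 + 3.0 * X2 + rng.normal(size=n)

def ols(y, *cols):
    """OLS with an intercept; returns [intercept, slope1, slope2, ...]."""
    X = np.column_stack([np.ones_like(y)] + list(cols))
    return np.linalg.lstsq(X, y, rcond=None)[0]

_, b1_long, b2_long = ols(Y, X1, X2)   # long regression: both X's included
_, b1_short         = ols(Y, X1)       # short regression: X2 omitted
_, d                = ols(X2, X1)      # auxiliary regression: X2 on X1

# Omitted variable rule: short slope = long slope + (omitted slope x d).
print(np.allclose(b1_short, b1_long + b2_long * d))   # True
```

The gap between the short and long slopes on X1 is exactly the magnitude of the confounding, which is why the rule is useful for reasoning about what omitting a variable does to a coefficient.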

**Excel Workbooks**