Chapter 7: Multiple Regression
This chapter introduces multiple regression, which in many ways is similar to bivariate regression. Both methods produce conditional predictions, but multiple regression employs more than one independent X variable to predict the value of the Y variable. Just as before, the predicted value of the dependent variable is expressed in a simple equation; in the case of least squares regression, the RMSE summarizes the likely size of the residual, and the R2 statistic measures the fraction of the total variation that is explained by the regression. Once again, the OLS regression coefficients are those that minimize the SSR.
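For concreteness, in the trivariate case (two X variables; the notation here is generic and not tied to the heating-oil example used later in the chapter), the fitted equation and the summary measures take the familiar form:

```latex
\widehat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i},
\qquad
\mathrm{SSR} = \sum_{i=1}^{n}\bigl(Y_i - \widehat{Y}_i\bigr)^2,
\qquad
R^2 = 1 - \frac{\mathrm{SSR}}{\sum_{i=1}^{n}\bigl(Y_i - \bar{Y}\bigr)^2}
```

The RMSE is the square root of the SSR divided by the number of observations (some treatments divide instead by the number of observations minus the number of coefficients); either way, it gauges the typical size of a residual.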
Multiple regression introduces some new issues, however. Some of the complications are purely mathematical. Although it is relatively easy to move back and forth between the algebraic expression and the pictorial (geometric) representation of the regression line in the bivariate case, most people have difficulty translating the algebraic formulation of a multiple regression into its geometric representation as a plane (in trivariate regression) or a hyperplane (when there are more than two independent variables). Furthermore, the formulas for the OLS regression coefficients become very unwieldy (we discuss them in the appendix to this chapter).
To help you deal with the additional complexities of multiple regression, we will try to keep you focused on the main issues. The central goal is still doing a good job of conditional prediction of values of the Y variable based on our knowledge of values of the X variables. Just as with bivariate regression, multiple regression can again be interpreted as a compression of a (more complicated) graph of averages. The OLS regression coefficients are still weighted sums of the Y variable (a small numerical sketch of this fact follows this paragraph). Finally, running a multiple regression on a computer is no more difficult than running a bivariate regression. In addition to the more involved mathematics, multiple regression highlights two important conceptual issues: confounding and multicollinearity. Confounding is so important that it was already introduced in Chapter 1. We suggest that you reread the discussion in Section 1.2 of separating out the influence of price and income in the demand for cigarettes.
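To see what it means for the coefficients to be weighted sums of the Y values, here is a minimal sketch in Python. The data are made up purely for illustration (they are not the heating-oil example), and numpy is assumed to be available; the point is that the weights are computed from the X data alone.

```python
# Sketch: OLS coefficients are weighted sums of the Y values.
# Made-up data for illustration only.
import numpy as np

rng = np.random.default_rng(0)
n = 50
x1 = rng.uniform(0, 10, n)                  # first independent variable
x2 = rng.uniform(0, 5, n)                   # second independent variable
y = 3 + 2 * x1 - 1.5 * x2 + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), x1, x2])   # design matrix with an intercept column

# The weight matrix (X'X)^(-1) X' depends only on the X data, not on Y ...
W = np.linalg.inv(X.T @ X) @ X.T            # one row of weights per coefficient

# ... so each OLS coefficient is a weighted sum of the Y values.
b = W @ y
print(b)                                     # intercept, slope on x1, slope on x2

# The same numbers from a standard least squares routine:
print(np.linalg.lstsq(X, y, rcond=None)[0])
```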
This chapter makes extensive use of a single artificial example with data on the demand for heating oil. Section 7.2 explains how least squares multiple regression is the solution to the familiar optimization problem of minimizing the SSR, where Predicted Y is now based on more than one X variable. Section 7.3 comes back to the artificial example to explain the concept of confounding. Section 7.4 treats multicollinearity, a technical issue you need to be aware of when running your own regressions. The appendix shows how all OLS regression coefficients can be obtained from an analytic formula, which we go on to derive in the trivariate case. The appendix also states the omitted variable rule, a simple mathematical relationship that explains the magnitude of confounding.
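As a preview, here is a small Python sketch of the standard version of the omitted variable rule, using invented data (not the book's example). The slope from the "short" regression that omits X2 equals the corresponding slope from the "long" regression plus the omitted variable's coefficient times the slope from an auxiliary regression of X2 on X1; the appendix states the rule precisely.

```python
# Sketch of the omitted variable rule, with made-up data.
#   Short regression:      Y on X1          -> slope g1
#   Long regression:       Y on X1 and X2   -> slopes b1, b2
#   Auxiliary regression:  X2 on X1         -> slope d1
# The rule:  g1 = b1 + b2 * d1
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(0, 1, n)
x2 = 0.8 * x1 + rng.normal(0, 1, n)         # X2 is correlated with X1, so omitting it confounds
y = 1 + 2 * x1 + 3 * x2 + rng.normal(0, 1, n)

def ols(y, *xs):
    """OLS coefficients (intercept first) from regressing y on the given columns."""
    X = np.column_stack([np.ones(len(y))] + list(xs))
    return np.linalg.lstsq(X, y, rcond=None)[0]

g0, g1 = ols(y, x1)             # short regression
b0, b1, b2 = ols(y, x1, x2)     # long regression
d0, d1 = ols(x2, x1)            # auxiliary regression

print(g1, b1 + b2 * d1)         # the two numbers match (up to floating-point rounding)
```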
  
Excel Workbooks