Chapter 2: Correlation
This chapter begins the study of describing data that contain more than one
variable. We will see how the correlation coefficient and the scatter plot can
be used to describe bivariate data.
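To make the correlation coefficient concrete, here is a minimal sketch of the sample (Pearson) correlation coefficient r, computed directly from its definition: the sum of cross-products of deviations from the means, divided by the square root of the product of the two sums of squared deviations. The function name `pearson_r` and the height/weight numbers are hypothetical, invented only for illustration.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical bivariate data, invented for illustration only
heights_cm = [160, 165, 170, 175, 180]
weights_kg = [55, 60, 62, 70, 74]
r = pearson_r(heights_cm, weights_kg)
print(f"r = {r:.3f}")  # prints r = 0.986
```

A value of r close to +1 says the points lie close to a rising straight line; a scatter plot of the same pairs would show this directly.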
Not only will you learn the meaning and usefulness of the correlation
coefficient; just as important, we will stress that there are times when the
correlation coefficient is a poor summary and should not be used. There is
no such thing as a perfect summary measure of data.
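One standard case where r is a poor summary is a strong but nonlinear relationship. The sketch below, using made-up points, takes y = x² on x values placed symmetrically about zero: y is completely determined by x, yet r comes out exactly 0, because the correlation coefficient measures only linear association. A scatter plot of these points would show a perfect parabola.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# y is an exact, but nonlinear, function of x; x is symmetric about 0
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]
r = pearson_r(xs, ys)
# r is 0: no *linear* association, despite perfect dependence
print(f"r = {r:.3f}")  # prints r = 0.000
```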
In addition, we emphasize that correlation merely indicates the degree of
linear association between two variables and should never be used to infer
causation. It is tempting to suppose that a high correlation implies some kind
of causal connection, but such an inference is wrong.
Much of this material may be familiar to students of statistics, but we
conclude the chapter with a discussion of ecological correlation, a topic
often omitted from introductory statistics courses. We show that the correlation
coefficient based on individual-level data may differ markedly from the
coefficient computed with grouped data. In economics, this is called the
aggregation problem, and it merits attention.
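The gap between individual-level and grouped correlation can be seen in a small sketch. The nine (x, y) observations and the three groups below are hypothetical, constructed only so that the effect is easy to see: within each group y falls as x rises, the individual-level r is 0.80, and the correlation of the three group means is 1.00.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical individual-level (x, y) observations in three groups
# (e.g. regions); invented purely for illustration.
groups = {
    "A": [(1, 3), (2, 2), (3, 1)],
    "B": [(4, 6), (5, 5), (6, 4)],
    "C": [(7, 9), (8, 8), (9, 7)],
}

# Individual-level correlation: use every observation
ind_x = [x for pts in groups.values() for x, _ in pts]
ind_y = [y for pts in groups.values() for _, y in pts]
r_individual = pearson_r(ind_x, ind_y)

# Ecological (grouped) correlation: use only the group means
grp_x = [sum(x for x, _ in pts) / len(pts) for pts in groups.values()]
grp_y = [sum(y for _, y in pts) / len(pts) for pts in groups.values()]
r_grouped = pearson_r(grp_x, grp_y)

print(f"individual-level r = {r_individual:.2f}")  # prints 0.80
print(f"group-mean r       = {r_grouped:.2f}")     # prints 1.00
```

Averaging within groups discards the within-group variation (here, a perfect negative relationship inside each group), so the grouped coefficient overstates how closely x and y move together for individuals.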