### Chapter 11: **The Measurement Box Model**

Regression is the dominant method of empirical analysis in economics. It has two basic applications: description and inference. The first eight chapters of this book use regression for description. Chapters 9 and 10 introduce and review tools for making statistical inference. We are now ready to see how regression is used when the data are a sample from a population.

The next few chapters prepare the ground for the study of regression as a
tool for inference and forecasting. Inference in general means reasoning from
factual knowledge or evidence. In statistics, we have a *sample* drawn
from a *population* and use the sample to infer something about the
population.

For example, suppose we have data on 1,178 people in the United States in 1989 selected at random from the adult working population. We have the level of experience and the wages of these people. Part 1 discusses the use of regression to provide a summary of the bivariate wage-experience data. Statistical inference aims at a much more ambitious goal. Instead of simply describing the relationship for those 1,178 people, we wish to discover the relationship between wage and experience for all of the adult workers in the United States. Our aim is to make educated guesses about the population based on information gathered from the sample.

Throughout our study of regression applied to inferential questions, we will
emphasize the role that chance and sampling error play in our educated
guesses, which we will call *estimates*. Although the details require
concentration and effort, the main idea is not difficult to grasp: an estimate
based on a particular sample is likely to differ from the true, unknown
population value.
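To make the idea of sampling error concrete, the following sketch builds a hypothetical population of 10,000 synthetic wages. These are made-up values drawn from a lognormal distribution, not the 1989 data described above. It then estimates the population mean from several random samples of 100. Each sample gives a different estimate, and none matches the population value exactly. The population size, sample size, and distribution are illustrative assumptions.

```python
import random
import statistics

# Hypothetical population: 10,000 synthetic wages (illustrative, not real data).
rng = random.Random(0)
population = [rng.lognormvariate(2.5, 0.5) for _ in range(10_000)]
true_mean = statistics.mean(population)  # the "true, unknown" population value

def sample_mean(pop, n, rng):
    """Estimate the population mean from a simple random sample of size n."""
    return statistics.mean(rng.sample(pop, n))

# Five different samples give five different estimates of the same parameter.
estimates = [sample_mean(population, 100, rng) for _ in range(5)]
errors = [est - true_mean for est in estimates]

print(f"population mean = {true_mean:.2f}")
for est, err in zip(estimates, errors):
    print(f"estimate = {est:6.2f}   sampling error = {err:+.2f}")
```

In practice we observe only one sample and never see `true_mean`, so the sampling error of our single estimate is unknown to us.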

We stress the importance of understanding the role of chance in an inferential
setting because regression for inference requires an explicit model of the
chance process. We do not want the student to memorize a list of rules that
must be met or, worse, assumed. Instead, our goal is true understanding of
different models of chance and their implications for regression analysis
in an inferential setting. Thus, much of the presentation in the rest of the
book is built on the idea of sampling and sampling error. Although we proceed
with caution over some difficult terrain, we do assume prior knowledge of
elementary statistical inference.

In this chapter we discuss a simple model for the data generation process
first used by astronomers as a way of combining measurements of celestial
bodies to estimate their true orbits. The problem these scientists faced was
that, despite strong theoretical reasons to believe that planets orbit along
smooth curves, their measurements did not all fit on a single curve. They
realized that the data resulted from imperfect measurements of the planets'
exact locations. The scientists’ task was to reconcile the data and
arrive at a single best estimate of the true orbit. In this endeavor
astronomers realized that, in general, it was a good practice to make use
of all the observations. The question was how. The solution ultimately depended
on arriving at a satisfactory model of the data generation process.
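A minimal sketch of such a measurement model follows. It assumes that each observation equals a fixed true value plus an error ticket drawn at random, with replacement, from a box of tickets that average zero. It then combines all the observations by taking their average. The true value (`TRUE_VALUE`), the ticket values in `ERROR_BOX`, and the number of measurements are hypothetical choices made only for illustration.

```python
import random
import statistics

TRUE_VALUE = 42.0  # hypothetical true quantity; unknown to the measurer in practice
# Hypothetical error box: tickets average zero, so measurements are unbiased.
ERROR_BOX = [-2.0, -1.0, -0.5, 0.0, 0.0, 0.5, 1.0, 2.0]

def measure(rng):
    """One measurement = true value + a ticket drawn (with replacement) from the error box."""
    return TRUE_VALUE + rng.choice(ERROR_BOX)

rng = random.Random(1)
measurements = [measure(rng) for _ in range(25)]

# Use all the observations: the sample average serves as the combined estimate.
estimate = statistics.mean(measurements)
print(f"first five measurements: {measurements[:5]}")
print(f"average of 25 measurements: {estimate:.3f} (true value {TRUE_VALUE})")
```

No single measurement is guaranteed to be close to the true value, but because the errors average zero, they tend to offset one another when all the measurements are averaged.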

We begin with this model in a book dedicated to econometrics because it serves
as an easily understandable bridge from the data generation processes of basic
statistics (what we have called the coin-flip and polling box models) to the
classical econometric model of Chapter 13. Sections 11.2 through 11.5 discuss
a univariate problem in which we measure a single quantity repeatedly. We
will show how the basic models of the data generating process reviewed in
Chapter 10 can be modified to work out the properties of the sample average
in this measurement problem. In Section 11.6, a crucial conceptual leap is
made by extending the measurement box model to the problem of the relationship
between two variables estimated via a bivariate regression.
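As a rough preview of that leap, the sketch below applies the same idea to a line. Each observed `y` is assumed to equal a point on a hypothetical true line `BETA0 + BETA1 * x` plus an error ticket from a zero-average box. The intercept and slope are then estimated by ordinary least squares. The parameter values, the error box, and the fixed `x` values 0 through 19 are assumptions for illustration only. This is not the chapter's own example.

```python
import random

# Hypothetical true line (intercept and slope are unknown in practice).
BETA0, BETA1 = 10.0, 0.5
ERROR_BOX = [-2.0, -1.0, 0.0, 1.0, 2.0]  # hypothetical error tickets averaging zero

def observe(x, rng):
    """Observed y = true value on the line at x + a ticket drawn from the error box."""
    return BETA0 + BETA1 * x + rng.choice(ERROR_BOX)

def ols(xs, ys):
    """Ordinary least squares intercept and slope for a bivariate regression."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

rng = random.Random(2)
xs = [float(x) for x in range(20)]  # x values held fixed in this sketch
ys = [observe(x, rng) for x in xs]
b0, b1 = ols(xs, ys)
print(f"estimated intercept {b0:.3f}, slope {b1:.3f}  (true: {BETA0}, {BETA1})")
```

As with the single-quantity case, the estimates differ from the true parameters because of the chance errors in the observations.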

Chapters 11 through 13 present three different descriptions of the data generation
process. In Chapter 13, we point out that, mathematically speaking, the measurement
box model of this chapter and the classical econometric model presented there
are identical. Why, then, do we distinguish between them? We do so because we wish
to stress that one must have a coherent, plausible explanation for the data
generation process before one proceeds to statistical inference. The measurement
box model of this chapter assigns very different roles to chance error than
does the classical econometric model.

This chapter also demonstrates two complementary approaches applied throughout
the rest of the book: the box model, which facilitates comprehension of the
data generating process, and Monte Carlo simulation, which enables us to approximate
the distribution of estimates obtained according to a specified data generating
process.
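The two approaches can be combined in a short Monte Carlo sketch. The code specifies a measurement box model, repeatedly generates data sets of 25 measurements, and records the average of each one. The spread of those averages approximates the sampling distribution of the estimate. It is compared with the standard analytical result for draws made with replacement: the SD of the box divided by the square root of the number of draws. The true value, box contents, sample size, and number of repetitions are hypothetical settings.

```python
import random
import statistics

TRUE_VALUE = 42.0                                        # hypothetical true quantity
ERROR_BOX = [-2.0, -1.0, -0.5, 0.0, 0.0, 0.5, 1.0, 2.0]  # hypothetical; averages zero
N_MEASUREMENTS = 25                                      # measurements per data set
N_REPETITIONS = 10_000                                   # Monte Carlo repetitions

def one_sample_average(rng):
    """Generate one data set from the measurement box model and return its average."""
    data = [TRUE_VALUE + rng.choice(ERROR_BOX) for _ in range(N_MEASUREMENTS)]
    return statistics.mean(data)

rng = random.Random(3)
averages = [one_sample_average(rng) for _ in range(N_REPETITIONS)]

# Analytical benchmark: SE of the average = SD of the box / sqrt(n).
sd_box = statistics.pstdev(ERROR_BOX)
se_theory = sd_box / N_MEASUREMENTS ** 0.5

print(f"mean of the {N_REPETITIONS} averages: {statistics.mean(averages):.4f}")
print(f"SD of the averages (Monte Carlo): {statistics.stdev(averages):.4f}")
print(f"SD(box)/sqrt(n) (analytical):     {se_theory:.4f}")
```

The box model tells us what to simulate. The simulation then approximates the distribution of the estimate without any further algebra, which is useful when no convenient formula is available.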

**Excel Workbooks**