Chapter 11: The Measurement Box Model
Regression is the dominant method of empirical analysis in economics. It has two basic applications: description and inference. The first eight chapters of this book use regression for description. Chapters 9 and 10 introduce and review tools for making statistical inference. We are now ready to see how regression is used when the data are a sample from a population.
The next few chapters prepare the ground for the study of regression as a tool for inference and forecasting. Inference in general means reasoning from factual knowledge or evidence. In statistics, we have a sample drawn from a population and use the sample to infer something about the population.
For example, suppose we have data on 1,178 people in the United States in 1989 selected at random from the adult working population. We have the level of experience and the wages of these people. Part 1 discusses the use of regression to provide a summary of the bivariate wage-experience data. Statistical inference aims at a much more ambitious goal. Instead of simply describing the relationship for those 1,178 people, we wish to discover the relationship between wage and experience for all of the adult workers in the United States. Our aim is to make educated guesses about the population based on information gathered from the sample.
Throughout our study of regression applied to inferential questions, we will emphasize the role that chance and sampling error play in our educated guesses, which we will call estimates. Although the details require concentration and effort, the main idea is not difficult to grasp: an estimate based on a particular sample is likely to differ from the true, unknown population value.
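To make the idea of sampling error concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the population is a made-up set of 100,000 hourly wages drawn from a lognormal distribution, not the 1989 data, and the function name `sample_mean_errors` is hypothetical. The sketch repeatedly draws random samples of 1,178 people and records how far each sample average falls from the population average.

```python
import random
import statistics


def sample_mean_errors(population, sample_size, n_samples, seed=0):
    """Draw repeated random samples (without replacement) and return the
    population mean along with each sample mean's deviation from it."""
    rng = random.Random(seed)
    true_mean = statistics.fmean(population)
    errors = []
    for _ in range(n_samples):
        sample = rng.sample(population, sample_size)
        errors.append(statistics.fmean(sample) - true_mean)
    return true_mean, errors


# Hypothetical population of hourly wages (an assumption, not real data).
pop_rng = random.Random(42)
population = [pop_rng.lognormvariate(2.5, 0.5) for _ in range(100_000)]

true_mean, errors = sample_mean_errors(population, sample_size=1178, n_samples=200)

print("population mean:", round(true_mean, 3))
print("largest miss among 200 samples:", round(max(abs(e) for e in errors), 3))
print("average miss (signed):", round(statistics.fmean(errors), 3))
```

In a run of this sketch, no single sample average should match the population average exactly, yet the misses should be small and roughly centered on zero. That is the pattern the chapters ahead will make precise.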
We stress the importance of understanding the role of chance in an inferential setting because regression for inference requires an explicit model of the chance process. We do not want the student to memorize a list of conditions that must be met or, worse, assumed. Instead, our goal is true understanding of different models of chance and their implications for regression analysis in an inferential setting. Thus, much of the presentation in the rest of the book is built on the idea of sampling and sampling error. Although we proceed with caution over some difficult terrain, we do count on prior knowledge of elementary statistical inference.
In this chapter we discuss a simple model for the data generation process first used by astronomers as a way of combining measurements of celestial bodies to estimate their true orbits. The problem these scientists faced was that, despite strong theoretical evidence that planets ought to orbit along smooth curves, their measurements did not all fit on a single curve. They realized that the data resulted from imperfect measurements of the exact location of the planets. The scientists’ task was somehow to reconcile the data to come up with a single best estimate of the true orbit. In this endeavor astronomers realized that, in general, it was a good practice to make use of all the observations. The question was how. The solution ultimately depended on arriving at a satisfactory model of the data generation process.
We begin with this model in a book dedicated to econometrics because it serves as an easily understandable bridge from the data generation processes of basic statistics (what we have called the coin-flip and polling box models) to the classical econometric model of Chapter 13. Sections 11.2 through 11.5 discuss a univariate problem in which we measure a single quantity repeatedly. We will show how the basic models of the data generating process reviewed in Chapter 10 can be modified to work out the properties of the sample average in this measurement problem. In Section 11.6, a crucial conceptual leap is made by extending the measurement box model to the problem of the relationship between two variables estimated via a bivariate regression.
Chapters 11 through 13 present three different descriptions of the data generation process. In Chapter 13, we point out that, mathematically speaking, the measurement box model of this chapter and the classical econometric model of Chapter 13 are identical. Why do we distinguish between them? We do so because we wish to stress that one must have a coherent, plausible explanation for the data generation process before one proceeds to statistical inference. The measurement box model of this chapter assigns a very different role to chance error than does the classical econometric model.
This chapter also demonstrates two complementary approaches applied throughout the rest of the book: the box model, which facilitates comprehension of the data generating process, and Monte Carlo simulation, which enables us to approximate the distribution of estimates obtained according to a specified data generating process.
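As a preview of the Monte Carlo approach, the following Python sketch simulates the univariate measurement problem. It rests on assumptions that we state plainly: each measurement is taken to equal a fixed true value plus a chance error drawn independently from a normal distribution with mean zero, and the particular numbers (a true value of 100, an error SD of 5, 25 measurements per sample) are invented for illustration. The function name `simulate_sample_averages` is likewise hypothetical.

```python
import random
import statistics


def simulate_sample_averages(true_value, error_sd, n_measurements, n_reps, seed=0):
    """Monte Carlo simulation of the sample average when each measurement
    is assumed to be true_value plus an independent mean-zero normal error."""
    rng = random.Random(seed)
    averages = []
    for _ in range(n_reps):
        # One repetition: take n_measurements noisy readings of the same quantity.
        measurements = [
            true_value + rng.gauss(0.0, error_sd) for _ in range(n_measurements)
        ]
        averages.append(statistics.fmean(measurements))
    return averages


# Illustrative parameter values (assumptions, not taken from any data set).
averages = simulate_sample_averages(
    true_value=100.0, error_sd=5.0, n_measurements=25, n_reps=10_000
)

print("mean of the sample averages:", round(statistics.fmean(averages), 3))
print("SD of the sample averages:", round(statistics.stdev(averages), 3))
```

Under these assumptions, the mean of the simulated averages should lie close to the true value of 100, and their SD should lie close to 5 divided by the square root of 25, that is, 1. The collection of simulated averages approximates the distribution of the estimate under the specified data generating process.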