Chapter 10: Review of Inferential Statistics
The goal of statistical inference is to use sample data to estimate a parameter (a numerical characteristic of the population) or to determine whether to believe a claim that has been made about the population. We never actually observe the parameter we are interested in; instead we use an estimate of the parameter based on data from a sample. The sample estimate is almost always different from the claimed value of the parameter. There are then two possibilities: the difference (between the estimate and the claim) may be real or it may be due to chance. Thus, the fundamental question of statistical inference becomes: Is the difference real or due to chance?
To answer the fundamental question, we require a model for the data generation process, or DGP. The DGP describes how each observation in the data set was produced. It usually contains a description of the chance process at work. Given a DGP and certain parameter values, we can calculate the probability of observing particular ranges of outcomes.
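As a rough illustration of the last point, the sketch below assumes a very simple DGP that is not taken from the text: 100 independent flips of a coin whose chance of heads is 0.5. Under that assumption it computes the probability that the number of heads falls in a given range, once exactly and once by simulation. The function names and the particular numbers (100 flips, the range 40 to 60) are hypothetical choices made only for this example.

```python
import math
import random


def prob_heads_in_range(n_flips, p_heads, low, high):
    """Exact probability that the number of heads lies in [low, high].

    Assumes independent flips with the same chance of heads (a binomial DGP).
    """
    return sum(
        math.comb(n_flips, k) * p_heads**k * (1 - p_heads) ** (n_flips - k)
        for k in range(low, high + 1)
    )


def simulate_prob(n_flips, p_heads, low, high, reps=20_000, seed=0):
    """Approximate the same probability by repeating the DGP many times."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = 0
    for _ in range(reps):
        # One run of the DGP: count the heads in n_flips flips.
        heads = sum(rng.random() < p_heads for _ in range(n_flips))
        if low <= heads <= high:
            hits += 1
    return hits / reps


# Hypothetical parameter values chosen for illustration only.
exact = prob_heads_in_range(100, 0.5, 40, 60)
approx = simulate_prob(100, 0.5, 40, 60)
print(f"exact: {exact:.4f}  simulated: {approx:.4f}")
```

The two numbers should agree closely. The simulated value is only an approximation, and it improves as the number of repetitions grows.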
In this chapter, we try to clarify these complicated issues by reviewing basic concepts of inference from introductory statistics. Our approach is somewhat unusual in that we downplay the mathematical formalism and instead emphasize the logic of statistical inference. We borrow the extremely useful metaphor of a box model from Freedman, Pisani, and Purves (1998). The box model is a way of concretely representing a random variable. In this chapter, we will distinguish between two basic types of box models – we call them coin-flip and polling box models. Though these models differ in important respects, it turns out that we can answer the fundamental question of inference in the same way with both models.
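To make the box metaphor concrete, here is a minimal sketch of a generic box: a list of numbered tickets from which draws are made at random with replacement. The contents of the box, the number of draws, and the function name are hypothetical, and the sketch does not try to capture the distinction between the coin-flip and polling variants.

```python
import random


def draw_from_box(tickets, n_draws, seed=0):
    """Draw n_draws tickets at random, with replacement, from the box."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [rng.choice(tickets) for _ in range(n_draws)]


# A hypothetical box: three tickets marked 0 and two tickets marked 1.
box = [0, 0, 0, 1, 1]
box_average = sum(box) / len(box)  # the parameter, normally unobserved

draws = draw_from_box(box, n_draws=1000)
sample_average = sum(draws) / len(draws)  # the estimate from the sample

print(f"box average: {box_average:.3f}  sample average: {sample_average:.3f}")
```

The average of the draws plays the role of the sample estimate and the average of the box plays the role of the parameter. The two will typically differ a little because of chance alone.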
In subsequent chapters, we will develop additional box models that are designed to handle the more complicated situations arising when one examines data from observational studies. We will, however, be able to use the basic strategy outlined in this chapter to answer the question of whether the difference is real or due to chance.
The next section introduces the box model as a metaphor for handling chance processes. Sections 10.3 and 10.4 introduce the two fundamental box models and demonstrate how they work. We then present a review of hypothesis testing and follow up with the concept of a consistent estimator. Finally, we explain the algebra of expectations – a set of rules that are useful for computing the expected value and standard deviation of random variables.
We will call on the box model metaphor over and over again throughout the rest of this book. We will almost always employ Monte Carlo analysis to demonstrate properties of the various box models. On occasion, we will make use of results from the algebra of expectations to provide an alternative, more rigorous derivation of these properties.
Although the experienced statistics student may wish to skip this review chapter, we recommend a quick perusal of the material if only to ensure that the box model metaphor makes sense. Of course, every student can benefit from a detailed review to sharpen the crucial skills and concepts learned in an introductory statistics course.