Why Take a Design-based Approach to Modeling Data from Complex Surveys?

David Binder

Anyone who has fitted models to data from complex surveys has faced the question of whether to use the sample design information in the analysis. This may mean using the survey weights and design-based variance estimates. (Alternatively, it may mean incorporating the design information in the model itself.) What statistical theory is available to give guidance to the analyst?

We discuss the differences between the design-based and the model-based approaches to inferences about the model parameters. A small artificial example is used to demonstrate why the model-based approach can lead to misleading results. A general theory is given to show when a design-based approach can give correct inferences for superpopulation model parameters. We also describe some cases where the general theory fails.
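
To make the contrast concrete, here is a minimal simulation sketch. It is not the artificial example referred to above; the superpopulation model (independent normal values with mean 10 and standard deviation 2), the informative Poisson sampling design (inclusion probability proportional to the outcome), and the function name `simulate_informative_sample` are all assumptions chosen purely for illustration. It compares an unweighted sample mean, which ignores the design, with a weighted (Hájek-type) mean that uses the inverse inclusion probabilities as survey weights.

```python
import random


def simulate_informative_sample(seed=0, pop_size=20000, expected_n=2000):
    """Illustrative only: draw one informative sample from one simulated population.

    Returns (population mean, unweighted sample mean, weighted sample mean).
    """
    rng = random.Random(seed)

    # Assumed superpopulation model: y_i ~ Normal(10, 2), independent.
    population = [rng.gauss(10.0, 2.0) for _ in range(pop_size)]

    # Assumed informative design: inclusion probability proportional to y
    # (floored at a small positive value so every unit can be selected).
    size = [max(y, 0.1) for y in population]
    total_size = sum(size)
    probs = [min(1.0, expected_n * s / total_size) for s in size]

    # Poisson sampling: each unit is included independently with its own probability.
    sample = [(y, p) for y, p in zip(population, probs) if rng.random() < p]

    # Analysis ignoring the design: plain sample mean.
    unweighted = sum(y for y, _ in sample) / len(sample)

    # Design-based analysis: weight each unit by 1 / (inclusion probability).
    weighted = sum(y / p for y, p in sample) / sum(1.0 / p for _, p in sample)

    pop_mean = sum(population) / pop_size
    return pop_mean, unweighted, weighted


if __name__ == "__main__":
    pop_mean, unweighted, weighted = simulate_informative_sample()
    print(f"population mean:        {pop_mean:.3f}")
    print(f"unweighted sample mean: {unweighted:.3f}")
    print(f"weighted sample mean:   {weighted:.3f}")
```

Under these assumptions, units with larger values are over-represented in the sample, so the unweighted mean tends to overstate the model mean, while the weighted mean stays close to it. The sketch covers only point estimation; design-based variance estimation is not shown.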