Survey weights
Natalie Shlomo, Professor in Social Statistics, CCSR.
Survey weights aim to:
- inflate the sample to the level of the target population
- reduce bias arising from nonresponse when the characteristics of the respondents differ from those not responding
- increase the precision of estimates by utilizing known auxiliary variables that may be correlated with the survey topics.
Nonresponse
Nonresponse to a survey occurs when a selected unit does not provide the requested information. It is outside the researcher's control and affects the quality of the survey estimates.
There are two types of nonresponse: unit nonresponse and item nonresponse. Unit nonresponse occurs when a selected individual does not provide any information; item nonresponse occurs when the individual answers some questions but not others.
Item nonresponse is typically treated by imputing the missing values. We compensate for unit nonresponse through the survey weights.
Computing survey weights
To compute survey weights, we first inflate the sample by the design weight, which is one over the first-order inclusion probability. If the nonresponse is not selective, that is, unrelated to the characteristics of the units, we can adjust the design weight by dividing it by the response rate.
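The two steps above can be sketched in a few lines of Python. This is only an illustration: the inclusion probabilities and response indicators are made up, and we assume the response rate is computed on the design-weighted sample.

```python
# Hypothetical inclusion probabilities for five sampled units;
# the last two units did not respond.
inclusion_prob = [0.01, 0.01, 0.02, 0.02, 0.05]
responded = [True, True, True, False, False]

# Design weight: one over the first-order inclusion probability.
design_weights = [1.0 / p for p in inclusion_prob]

# Design-weighted response rate (assumption: the rate is weighted,
# an unweighted rate is another common choice).
weighted_sample = sum(design_weights)
weighted_respondents = sum(
    d for d, r in zip(design_weights, responded) if r
)
response_rate = weighted_respondents / weighted_sample

# Non-selective nonresponse: divide each respondent's design weight
# by the overall response rate.
adjusted_weights = [
    d / response_rate for d, r in zip(design_weights, responded) if r
]
```

With this choice of response rate, the adjusted weights of the respondents add up to the same estimated population size as the design weights of the full sample.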
For selective nonresponse, the design weights are adjusted so that the responding sample is representative with respect to several known auxiliary variables. If these auxiliary variables are strongly correlated with the survey target variables, the weighted sample should also be representative for those variables and provide more accurate estimates of the population characteristics.
The adjustment can be calculated as the inverse of the response rate within weighting classes defined by the known auxiliary variables, or as the inverse of an estimated response propensity, typically obtained from a logistic regression model.
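A minimal sketch of the weighting-class version of this adjustment follows. The class labels, design weights, and the helper name `weighting_class_adjust` are hypothetical, and the response rate within each class is assumed to be design-weighted. The response-propensity version would instead divide each respondent's weight by a fitted probability of response; it is not shown here.

```python
from collections import defaultdict

# Hypothetical sample: (weighting class, design weight, responded?)
sample = [
    ("urban", 100.0, True),
    ("urban", 100.0, False),
    ("urban", 100.0, True),
    ("urban", 100.0, True),
    ("rural", 50.0, True),
    ("rural", 50.0, False),
]


def weighting_class_adjust(units):
    """Return (class, adjusted weight) for each respondent.

    Assumes every class contains at least one respondent; otherwise
    the class response rate is zero and classes must be collapsed.
    """
    total = defaultdict(float)       # weighted sample count per class
    responding = defaultdict(float)  # weighted respondent count per class
    for cls, weight, resp in units:
        total[cls] += weight
        if resp:
            responding[cls] += weight
    rate = {cls: responding[cls] / total[cls] for cls in total}
    # Divide each respondent's design weight by its class response rate.
    return [(cls, weight / rate[cls]) for cls, weight, resp in units if resp]


adjusted = weighting_class_adjust(sample)
```

Within each class, the respondents' adjusted weights sum to the design-weighted count of the whole class, so classes with lower response are weighted up more.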
Post-stratification
One well-known technique for calculating survey weights is post-stratification. A stratified sample design increases the precision of estimates compared to a simple random sample when the variables defining the strata are correlated with the survey target variables.
In some cases, it is not possible to carry out a stratified sample design in advance, since the relevant population characteristics may not be available on the sampling frame. However, we can divide the sample into post-strata after data collection and use the known population total in each post-stratum to calculate a survey weight for all units in that post-stratum.
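As an illustration, post-stratification rescales the weights in each post-stratum so that they sum to the known population total. The sketch below uses made-up post-stratum labels, totals, and input weights, and the function name `post_stratify` is our own.

```python
from collections import defaultdict

# Hypothetical known population totals per post-stratum.
population_totals = {"north_young": 6000.0, "north_old": 5000.0}

# Hypothetical respondents: (post-stratum, input weight).
respondents = [
    ("north_young", 100.0),
    ("north_young", 100.0),
    ("north_young", 100.0),
    ("north_old", 100.0),
    ("north_old", 100.0),
]


def post_stratify(units, totals):
    """Scale weights so each post-stratum sums to its population total."""
    weighted_count = defaultdict(float)
    for stratum, weight in units:
        weighted_count[stratum] += weight
    return [
        (stratum, weight * totals[stratum] / weighted_count[stratum])
        for stratum, weight in units
    ]


post_weights = post_stratify(respondents, population_totals)
```

Every unit in a post-stratum receives the same adjustment factor, which is the population total divided by the weighted sample count in that post-stratum.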
The post-strata are defined by cross-classifying known auxiliary variables, e.g. geography by age group by gender. Since the sample size in each post-stratum is random, there needs to be a sufficient number of units in each post-stratum. In addition, some of the auxiliary population totals may not be known for the full cross-classification defining the post-strata, but only on the margins.
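In this case, the survey weights can be calculated using iterative proportional fitting (also known as raking). The weights are rescaled to match each known marginal distribution in turn, and the cycle is repeated until all margins agree, which yields estimates of the probabilities of observing units in the cells of the complete cross-classification given the marginal distributions.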
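A minimal sketch of iterative proportional fitting on unit-level weights is given below. The variables, categories, margins, and the function name `rake` are hypothetical, and the sketch assumes the margins are mutually consistent (they sum to the same population size) and that every category contains at least one sampled unit.

```python
def rake(units, margins, n_iter=50, tol=1e-10):
    """Iterative proportional fitting of weights to known margins.

    units:   list of dicts with a 'weight' key and one key per variable
    margins: {variable: {category: known population total}}
    """
    weights = [u["weight"] for u in units]
    for _ in range(n_iter):
        max_change = 0.0
        for var, targets in margins.items():
            # Current weighted total in each category of this variable.
            current = {cat: 0.0 for cat in targets}
            for u, w in zip(units, weights):
                current[u[var]] += w
            # Rescale so this variable's margin is matched exactly.
            for i, u in enumerate(units):
                factor = targets[u[var]] / current[u[var]]
                weights[i] *= factor
                max_change = max(max_change, abs(factor - 1.0))
        if max_change < tol:
            break  # all margins already agree to within tolerance
    return weights


# Hypothetical sample with one unit per cell of region by sex.
units = [
    {"region": "north", "sex": "f", "weight": 100.0},
    {"region": "north", "sex": "m", "weight": 150.0},
    {"region": "south", "sex": "f", "weight": 120.0},
    {"region": "south", "sex": "m", "weight": 130.0},
]
margins = {
    "region": {"north": 600.0, "south": 400.0},
    "sex": {"f": 450.0, "m": 550.0},
}
raked = rake(units, margins)
```

Only the margins are used here; the totals for the four region by sex cells are never needed, which is the situation described above.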
Generalized regression estimator (GREG)
A more general method than post-stratification is the model-assisted approach called the generalized regression estimator (GREG), in which the auxiliary variables can be categorical, continuous or both. Moreover, the GREG method can provide a single weight for all units in a cluster, e.g. a single survey weight for all individuals in a household. This ensures consistency between individual and cluster-level estimates of population characteristics.
With post-stratification and the more general GREG technique, the final weighted sample counts are calibrated to the known population totals, which increases the accuracy and precision of the estimates. In addition, we obtain consistency of auxiliary totals across all surveys, which is an important feature for National Statistics Institutes.
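To make the calibration property concrete, here is a small sketch of linear GREG weights with one continuous auxiliary variable plus an intercept. The data, the population totals, and the function name `greg_weights` are hypothetical, and the sketch covers only unit-level weights, not the single-weight-per-cluster variant.

```python
def greg_weights(design_weights, x, pop_count, pop_x_total):
    """Linear GREG (calibration) weights for auxiliary vector (1, x_i).

    Each weight is d_i * (1 + lam0 + lam1 * x_i), where (lam0, lam1)
    solves M * lam = gap, M = sum of d_i * (1, x_i)(1, x_i)^T and gap is
    the known population totals minus the design-weighted totals.
    """
    s0 = sum(design_weights)
    s1 = sum(d * xi for d, xi in zip(design_weights, x))
    s2 = sum(d * xi * xi for d, xi in zip(design_weights, x))
    det = s0 * s2 - s1 * s1  # non-zero when x takes at least two values

    gap0 = pop_count - s0
    gap1 = pop_x_total - s1

    # Solve the 2x2 system by hand: lam = M^{-1} gap.
    lam0 = (s2 * gap0 - s1 * gap1) / det
    lam1 = (s0 * gap1 - s1 * gap0) / det

    return [d * (1.0 + lam0 + lam1 * xi) for d, xi in zip(design_weights, x)]


# Hypothetical respondents: design weights and ages.
d = [100.0, 100.0, 100.0, 100.0, 100.0]
age = [22.0, 27.0, 31.0, 36.0, 44.0]

# Hypothetical known population size and population total of age.
w = greg_weights(d, age, pop_count=520.0, pop_x_total=16500.0)
```

The resulting weights reproduce both the population size and the population age total exactly. Linear calibration of this kind can produce negative weights when the sample is far from the population totals, so in practice the weights are often bounded.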
Example: Point estimates from a survey of women of reproductive age in 30 Counties of China
| Variable | Category | Un-weighted | Design Weighted | Non-response Adjusted | Calibrated |
| --- | --- | --- | --- | --- | --- |
| Average Income | All | 10,717.2 | 10,883.8 | 10,380.8 | 10,351.7 |
| | Age of husband less than 38 | 10,392.5 | 10,536.9 | 10,290.1 | 10,169.4 |
| | Age of husband greater than 39 | 11,068.3 | 11,250.7 | 10,478.0 | 10,567.7 |
| Level of Education of husband (in %) | Illiterate/semi-literate | 2.5 | 2.5 | 2.4 | 2.5 |
| | Primary school | 21.4 | 21.2 | 22.0 | 22.6 |
| | Junior high school | 46.3 | 46.6 | 49.1 | 49.8 |
| | Senior and over | 29.8 | 29.7 | 26.5 | 25.1 |
| | Total | 100.0 | 100.0 | 100.0 | 100.0 |
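Each column of the table applies a different set of weights to the same responses. As a sketch of the arithmetic only, with made-up incomes and weights that are not taken from the survey above, a weighted mean is the weighted sum of the responses divided by the sum of the weights:

```python
def weighted_mean(values, weights):
    """Weighted mean: sum of w_i * y_i divided by sum of w_i."""
    return sum(w * y for w, y in zip(weights, values)) / sum(weights)


# Hypothetical incomes for three respondents.
income = [9000.0, 12000.0, 10000.0]

# Equal weights reproduce the un-weighted mean.
unweighted = weighted_mean(income, [1.0, 1.0, 1.0])

# Hypothetical survey weights shift the estimate towards the
# respondents who represent more of the population.
weighted = weighted_mean(income, [100.0, 50.0, 150.0])
```

In this toy example the second respondent has the highest income but the smallest weight, so the weighted mean is lower than the un-weighted one.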
Manchester experts and projects
The research group Survey Methods and Analysis (SMA) within the CCSR conducts cutting-edge research in survey design and estimation. We aim to work closely with all disciplines in the Social Sciences to increase awareness of the issues that arise when analyzing complex survey data.
Key publications
- Bethlehem, J. (2009) Applied Survey Methods: A Statistical Perspective. New York: Wiley.
- Lohr, S.L. (2009) Sampling: Design and Analysis, 2nd Edition. Pacific Grove: Duxbury Press.
- Särndal, C.E., Swensson, B. and Wretman, J. (1991) Model Assisted Survey Sampling. New York: Springer.