Introduction
2. Disproportionate sampling

2.1. What it is
Proportionate Stratified Sampling

1.1. What it is
2.4 Obtaining unbiased estimates from a disproportionate sample / weighting the data
2.3 Why disproportionate stratification is used
1.4 What impact proportionate stratification has on survey estimates and their standard errors
1.6 Calculating standard errors for surveys using implicit stratification
1.5 Proportionate stratification in practice (explicit and implicit stratification)
1.2. A simple example
Suppose a sample of 100 students is to be selected from a school with 2000 students, so that the sampling fraction to be used is 1 in 20. If, before drawing the sample, the school roll is divided by age and sex, and a separate sample is drawn from each age-by-sex stratum using the same sampling fraction of 1 in 20, then the sample is a proportionate stratified sample.
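
The example above can be sketched in code. This is a minimal illustration, not a production sampler: the school roll, the five age groups and the function name `proportionate_stratified_sample` are all hypothetical, and the roll is constructed so that every age-by-sex stratum holds exactly 200 students.

```python
import random
from collections import defaultdict

def proportionate_stratified_sample(units, stratum_of, interval, seed=0):
    """Draw a 1-in-`interval` systematic sample separately within each stratum."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for unit in units:
        by_stratum[stratum_of(unit)].append(unit)
    sample = []
    for stratum in sorted(by_stratum):
        members = by_stratum[stratum]
        start = rng.randrange(interval)          # random start, 0-based
        sample.extend(members[start::interval])  # same interval in every stratum
    return sample

# Hypothetical school roll: 2000 students, 5 age groups x 2 sexes (assumed data).
roll = [
    {"id": i, "age": 11 + (i % 5), "sex": "F" if (i // 5) % 2 == 0 else "M"}
    for i in range(2000)
]

sample = proportionate_stratified_sample(
    roll, lambda s: (s["age"], s["sex"]), interval=20
)
print(len(sample))  # 100 students, 10 from each of the 10 strata
```

Because the same interval is applied in every stratum, each stratum's share of the sample matches its share of the roll.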

1.3 Why proportionate stratification is used

1.7 Stratification for surveys with long fieldwork periods
2.2. A simple example
In proportionate stratification a distinction is made between ‘explicit’ and ‘implicit’ stratification.

‘Explicit stratification’ is where the population of sampling units is explicitly divided into strata and a separate sample selected per stratum. ‘Implicit stratification’ is where the population of sampling units is sorted by some characteristic(s) and then the sample is selected from the sorted list using a fixed sampling interval and a random start.
For example, a population of adults might be sorted by sex, and then, within sex, by date of birth. Suppose every nth person is then selected from the population by taking a random start between 1 and n and then every nth person after that, working down the list. This sample would then be described as a proportionate stratified sample with explicit stratification by sex and implicit stratification by date of birth. Note that for explicit stratification only categorical stratifying variables can be used (or continuous variables that have been grouped into categories).

Implicit stratification, in contrast, which only involves sorting a population rather than grouping it, tends to use continuous variables.

Large-scale surveys often use a combination of explicit and implicit stratification. The sampling frame will firstly be grouped into a number of explicit strata, and within each of these the sampling frame will be sorted by a continuous variable.
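
The combination described above (explicit strata, then a sorted list with a fixed interval and random start inside each) can be sketched as follows. The frame is randomly generated and purely hypothetical, `systematic_sample` is an illustrative name, and birth year stands in for date of birth to keep the example short.

```python
import random

def systematic_sample(sorted_units, interval, rng):
    """Select every `interval`-th unit after a random start between 1 and `interval`."""
    start = rng.randint(1, interval)          # 1-based random start
    return sorted_units[start - 1::interval]

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical sampling frame of 1000 adults (assumed data).
frame = [
    {"sex": rng.choice(["F", "M"]), "birth_year": rng.randint(1940, 2000)}
    for _ in range(1000)
]

sample = []
for sex in ("F", "M"):                        # explicit stratification by sex
    stratum = sorted(
        (p for p in frame if p["sex"] == sex),
        key=lambda p: p["birth_year"],        # implicit stratification: sort only
    )
    sample.extend(systematic_sample(stratum, 10, rng))

print(len(sample))  # close to 100, since the interval is 10
```

Sorting before the systematic draw spreads the selected cases evenly across the range of the sort variable, which is what gives implicit stratification its effect.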
Software packages that calculate standard errors for complex surveys only allow for explicit stratification. The way around this for a survey that uses implicit stratification is to:
(a) Keep the sample in the same order as it was selected in.
(b) Put achieved cases into pairs, working down the list (i.e. the first two achieved cases working down the list are the first pair, the third and fourth achieved cases are the second pair, etc.).
(c) If there is an odd number of achieved cases, put the last three achieved cases together to give a triplet.
(d) Treat each pair/triplet as if it had been selected from the same explicit stratum. So there will be about half as many explicit strata as there are achieved cases.

This ‘trick’ needs some care when calculating standard errors for sub-groups, since the approach only works if there are two achieved cases per ‘pair’. For a sub-group this can easily drop to one. One option would be to re-pair the sample for each sub-group, but this is too onerous in practice. (Add link to section on how each software package avoids this.)
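
Steps (a) to (d) above can be sketched as a small helper that labels each achieved case with a pseudo-stratum. The function name `assign_pseudo_strata` is hypothetical, and the sketch assumes the cases are already held in their original selection order.

```python
def assign_pseudo_strata(n_cases):
    """Return one pseudo-stratum label per achieved case, in selection order."""
    if n_cases < 2:
        raise ValueError("need at least two achieved cases to form a pair")
    labels = [i // 2 for i in range(n_cases)]  # cases 0,1 -> 0; cases 2,3 -> 1; ...
    if n_cases % 2 == 1:
        labels[-1] = labels[-2]                # odd count: last three form a triplet
    return labels

print(assign_pseudo_strata(6))  # [0, 0, 1, 1, 2, 2]
print(assign_pseudo_strata(7))  # [0, 0, 1, 1, 2, 2, 2]
```

The resulting labels would then be supplied to the analysis software as the explicit stratum variable.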
For large-scale government-sponsored surveys it is common practice to spread fieldwork over a period, often of a year.

Examples include:

In these cases the sample for a whole year is selected at one point in time (usually using a combination of implicit and explicit stratification) and then the primary sampling units are systematically allocated to the 12 months of the year. The allocation is done in such a way that, within each month, the original stratification is maintained.

With this design the decision on how to deal with the stratification in estimating standard errors is not so straightforward. If the pairing follows the sample stratification then, in all pairs, the two primary sampling units (PSUs) will be from different months of the survey. This means that the estimated ‘within-stratum between-PSU’ component of variance will incorporate both a genuine between-PSU element and a between-month element (which will often be a seasonal effect). Including this latter element tends to lead to over-estimated standard errors.

To avoid this one approach is to treat the sample for each month as an independent sample and then treat the sample within each month as a stratified sample. This in effect means that the original sample is resorted, firstly by month, and then within month, by the original order. The pairs are then constructed from this new list.
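
The re-sorting described above can be sketched as follows. The function name `pair_within_month` and the 24-PSU allocation are hypothetical, and folding a leftover third PSU in a month into a triplet is an assumption carried over from the pairing rule for implicit stratification, not something stated for this design.

```python
from collections import defaultdict

def pair_within_month(psus):
    """psus: list of (selection_order, month). Returns {selection_order: (month, pair)}."""
    by_month = defaultdict(list)
    for order, month in sorted(psus):      # keep the original selection order
        by_month[month].append(order)
    strata = {}
    for month in sorted(by_month):         # re-sort: month first, then original order
        orders = by_month[month]
        for pos, order in enumerate(orders):
            pair = pos // 2
            if pos == len(orders) - 1 and pos % 2 == 0 and pos > 0:
                pair -= 1                  # assumed rule: odd count gives a triplet
            strata[order] = (month, pair)
    return strata

# Hypothetical: 24 PSUs systematically allocated to months 1..12 in turn.
psus = [(i, i % 12 + 1) for i in range(24)]
strata = pair_within_month(psus)
print(strata[0], strata[12])  # (1, 0) (1, 0): both month-1 PSUs share a pair
```

Each pair now contains PSUs from the same month, so the between-month element no longer enters the within-stratum variance estimate.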
Suppose a sample of 50 white students and 50 non-white students is to be selected from a school with 2000 students, of whom 100 are non-white. To achieve this the school roll would need to be divided into two strata, white and non-white, and a separate sample selected per stratum. The sampling fraction to be applied in the white stratum would be 1 in 38; the sampling fraction to be applied in the non-white stratum would be 1 in 2.
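
The two sampling fractions follow directly from the stratum sizes, as this short check shows. The variable names are illustrative only.

```python
# Stratum population sizes and target sample sizes from the example above.
population = {"white": 2000 - 100, "non-white": 100}
sample_size = {"white": 50, "non-white": 50}

# Sampling interval k means a sampling fraction of 1 in k.
intervals = {group: population[group] / sample_size[group] for group in population}
print(intervals)  # {'white': 38.0, 'non-white': 2.0}
```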
Disproportionate stratification is used for two purposes:

A. to give larger than proportionate sample sizes in one or more sub-groups so that separate analyses by sub-group will be possible; and, far more rarely

B. to increase the precision of key survey estimates.

Disproportionate stratification will only reduce standard errors (relative to a proportionate stratified sample) if the population standard deviation for the variable of interest is higher than average within the over-sampled strata. (Standard errors will be minimised if the sampling fraction used per stratum is proportional to the population standard deviation within the stratum.)
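
The allocation rule in the parenthesis above can be written out as a formula. The notation is introduced here for illustration and is not from the original text: stratum h has population size N_h, sample size n_h, sampling fraction f_h and population standard deviation S_h, and n is the total sample size.

```latex
f_h = \frac{n_h}{N_h} \propto S_h
\quad\Longrightarrow\quad
n_h = n \cdot \frac{N_h S_h}{\sum_k N_k S_k}
```

When all the S_h are equal this reduces to proportionate allocation, with n_h = n N_h / N.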

The fact that most surveys collect data on a wide range of variables means that disproportionate stratified sampling to reduce standard errors is very rarely used, since the optimal sample design for one variable is unlikely to be optimal for others. Furthermore, the population standard deviations are often not known at the design stage.
To obtain unbiased estimates for a disproportionate stratified sample, the survey estimates have to be weighted. This is achieved within most software packages by defining a weight variable that gives a weight per case. The cases are then ‘weighted by’ this weight variable in the analysis.

The calculation of the weight is fairly straightforward: it is simply the inverse of the sampling fraction used in the stratum that the case belongs to. So, in a stratum where the sampling fraction is 1 in 10 all cases would get a weight of 10; and in a stratum where the sampling fraction is 1 in 22 all cases would get a weight of 22.

In practice the weights applied to a particular survey may be more complex than this if, for instance, within strata not all cases are selected with equal probability, or if non-response weights have been included.
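
A minimal sketch of the basic weighting step follows, using sampling fractions of 1 in 38 and 1 in 2. The outcome values are invented for illustration, the function names are hypothetical, and the sketch covers only the simple inverse-sampling-fraction weight, not unequal within-stratum probabilities or non-response weights.

```python
def design_weight(sampling_interval):
    """A sampling fraction of 1 in k gives every case in the stratum a weight of k."""
    return float(sampling_interval)

def weighted_mean(values, weights):
    """Weighted estimate of a mean (or of a proportion for 0/1 values)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Invented 0/1 outcome: 10 of 50 cases in stratum A, 30 of 50 cases in stratum B.
values = [1] * 10 + [0] * 40 + [1] * 30 + [0] * 20
weights = [design_weight(38)] * 50 + [design_weight(2)] * 50

unweighted = sum(values) / len(values)
weighted = weighted_mean(values, weights)
print(unweighted)  # 0.4
print(weighted)    # 0.22
```

The unweighted figure over-represents the over-sampled stratum; the weighted figure restores each stratum to its population share.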

Proportionate allocation is used for two reasons:

(i) to reduce standard errors for survey estimates;
(ii) to ensure that sample sizes for strata are of their expected size.

For example, almost all large-scale GB surveys that use the Postcode Address File (PAF) as a sampling frame use samples stratified by region, and within region, by a measure of relative area deprivation. The first stratifier (region) is used to ensure that the selected sample is correctly proportioned by region. (A national sample that, just by chance, happened to under- or over-represent some of the regions would be considered by many as ‘unrepresentative’.) The second stratifier (area deprivation) is used to ensure that the selected sample is correctly proportioned by area type.

In practice, many survey statisticians would argue that of the two, only the second stratifier is strictly necessary, and that the regional stratifier is largely cosmetic. This is because area deprivation is strongly correlated with many of the outcome measures social surveys collect. So ensuring that the sample has the correct area deprivation profile means there will be less sampling variance in the estimates and standard errors are almost bound to be smaller than would be the case with an unstratified sample. Put another way, if the area deprivation profile of the sample is controlled, the risk of selecting an unrepresentative sample by chance is reduced.

Region, in contrast, tends to be only weakly associated with social survey outcome measures, so stratification by region does not reduce sampling variance by very much. In other words, even if the regional profile of the sample is controlled, the risk of selecting an unrepresentative sample by chance is not significantly reduced.

In a disproportionate stratified sample, the population of sampling units is divided into sub-groups, or strata, and a sample is selected separately per stratum. Crucially, the sampling fraction is not the same within all strata: some strata are over-sampled relative to others.
Sample stratification involves two steps:

(a) divide the population of sampling units into population sub-groups, called strata

(b) select a separate sample per stratum

If the same sampling fraction is used in each stratum this is termed ‘proportionate stratified sampling’; if the sampling fraction is not the same in each stratum this is termed ‘disproportionate sampling’. More commonly the latter would be described as ‘over-sampling of one or more sub-groups’.

Proportionate stratified sampling almost always leads to an increase in survey precision (relative to a design with no stratification), although the increase will often be modest, depending upon the nature of the stratifiers. Disproportionate sampling sometimes increases precision and sometimes reduces precision. Surveys using disproportionate sampling have to utilise survey weights if they are to give unbiased cross-strata estimates.
In a proportionate stratified sample, the population of sampling units is divided into sub-groups, or strata, and a sample is selected separately per stratum. For the sampling to be proportionate, the sampling fraction (or interval) must be identical in each stratum.
Relative to taking a completely unstratified sample, taking a proportionate stratified sample is either a good thing, in that it reduces standard errors, or a neutral thing, in that standard errors don’t change. Proportionate stratification can never increase standard errors. The reasoning is as follows:

- total sampling variance can be decomposed into two components: within-strata variation and between-strata variation (the split between the two depending on how the strata are defined);
- with proportionate stratification the between-strata component of the sampling variance becomes zero, leaving only the within-strata component. So, proportionate stratification is most efficient when the stratifiers that are used split the total variance in a way that maximises the between-strata variance.
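
The two points above can be written out as a short derivation. This is a sketch under simplifying assumptions that are not in the original text: the finite population correction is ignored and every stratum is taken to be large. The notation is introduced here for illustration: W_h = N_h / N is the population share of stratum h, S_h^2 the variance within stratum h, \bar{Y}_h the stratum mean, \bar{Y} the overall mean and n the total sample size.

```latex
% Decomposition of the population variance into within- and between-strata parts
S^2 \approx \sum_h W_h S_h^2 + \sum_h W_h \left(\bar{Y}_h - \bar{Y}\right)^2

% Sampling variance of the mean: unstratified versus proportionate stratified
\operatorname{Var}(\bar{y}_{\text{unstrat}}) \approx \frac{S^2}{n},
\qquad
\operatorname{Var}(\bar{y}_{\text{prop}}) \approx \frac{1}{n} \sum_h W_h S_h^2

% The gain is exactly the between-strata term, which cannot be negative
\operatorname{Var}(\bar{y}_{\text{unstrat}}) - \operatorname{Var}(\bar{y}_{\text{prop}})
\approx \frac{1}{n} \sum_h W_h \left(\bar{Y}_h - \bar{Y}\right)^2 \ge 0
```

The gain is zero only when all stratum means are equal, which corresponds to the neutral case in which standard errors do not change.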
Sample Stratification