And so in those particular cases, we're going to use that kind of information,
our substantive understanding of what's going on,
in order to choose among those variables.
We may also have past surveys in which we can calculate what differences there are,
what percent of the variance is explained by different factors,
different auxiliary variables, in order to decide which of them explain the most
variance and are therefore the ones we want to use in our stratification.
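One common way to put a number on "what percent of the variance is explained" by a candidate auxiliary variable is the between-group share of the total sum of squares (eta-squared, the same as the R² from regressing the outcome on group indicators). Here is a minimal sketch under stated assumptions. The past-survey records, the outcome `income`, the candidate variables `region` and `urban`, and the helper `variance_explained` are all hypothetical. None of them comes from any survey package.

```python
from collections import defaultdict

def variance_explained(y, groups):
    """Between-group share of the total sum of squares of y (eta-squared)."""
    grand_mean = sum(y) / len(y)
    total_ss = sum((v - grand_mean) ** 2 for v in y)
    by_group = defaultdict(list)
    for v, g in zip(y, groups):
        by_group[g].append(v)
    # Each group contributes n_g * (group mean - grand mean)^2
    between_ss = sum(
        len(vals) * (sum(vals) / len(vals) - grand_mean) ** 2
        for vals in by_group.values()
    )
    return between_ss / total_ss

# Hypothetical past-survey records: one outcome and two candidate auxiliary variables
income = [20, 22, 25, 40, 42, 45, 60, 65]
region = ["N", "N", "S", "S", "N", "S", "N", "S"]
urban = ["rural", "rural", "rural", "town", "town", "town", "city", "city"]

candidates = {"region": region, "urban": urban}
shares = {name: variance_explained(income, g) for name, g in candidates.items()}
ranked = sorted(shares, key=shares.get, reverse=True)  # strongest stratifier first

for name in ranked:
    print(f"{name}: {shares[name]:.1%} of the variance in income")
```

With these made-up numbers, the urban/rural grouping accounts for about 98% of the variance in income and region for only about 6%. In this toy case, urbanicity would be the stronger stratification variable.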
So shown on the lower left is this idea of stratification,
these differences between the groups.
Homogeneous within, different between:
we want to capitalize on these auxiliary variables to create a
stratification with as stark and sharp a contrast between strata as possible.
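The "homogeneous within, different between" idea can be written as the standard decomposition of the population variance. This is a sketch assuming simple random sampling within strata, proportionate allocation ($n_h = n W_h$, so every stratum has sampling fraction $f = n/N$), and strata large enough that the $N_h - 1$ versus $N_h$ divisors can be ignored. Here $W_h = N_h/N$ is the stratum weight, $\bar{Y}_h$ and $\sigma_h^2$ are the stratum mean and variance, and $\bar{Y}$ and $\sigma^2$ are the overall mean and variance:

```latex
\begin{aligned}
\sigma^2 &= \sum_h W_h \sigma_h^2 + \sum_h W_h (\bar{Y}_h - \bar{Y})^2, \qquad W_h = N_h/N \\
V_{\mathrm{srs}}(\bar{y}) &\approx \frac{1-f}{n}\,\sigma^2 \\
V_{\mathrm{prop}}(\bar{y}_{st}) &\approx \frac{1-f}{n} \sum_h W_h \sigma_h^2 \\
V_{\mathrm{srs}} - V_{\mathrm{prop}} &\approx \frac{1-f}{n} \sum_h W_h (\bar{Y}_h - \bar{Y})^2 \;\ge\; 0
\end{aligned}
```

Only the within-stratum part of the variance enters the stratified estimate. The sharper the contrast between stratum means, the more variance is moved into the between-strata term that stratification removes.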
But there's a corresponding part of this, and
that has to do with multipurpose surveys.
So far I've been talking about multiple x variables in a regression model, multiple
right-hand-side variables, multiple auxiliary variables informing the strata.
But we also have to realize that in these surveys, we don't just measure one thing.
Very few of the surveys being done,
whether in government, in private industry, or
in academic settings, deal with just one outcome variable.
They have many kinds of things that they measure.
So, for example, we might have been doing this particular survey among faculty and
records, but we might be doing a much larger household survey.
Households are collections of people in a country, and I've picked a country
in the Persian Gulf (some of you may need to look this up): Qatar.
And in this particular country, maybe we're doing a national survey looking at
such things as assets, building ownership, use of expatriate labor,
expenditures on various kinds of food and housing and so on, income, health,
healthcare use, psychological well-being, social integration.
A host of factors, some of which we include because this is our one chance to
do it: let's get the data and do a multipurpose survey.
We're going to do health, but by the way,
we'd better bring in the income items as well.
We're going to do social and psychological well-being, because we think they're
related to health, and so we're going to include those as part of our survey.
And so we have many variables in that same survey.
The stratification then becomes more complicated.
But it turns out that the scheme we just looked at,
that proportionately allocated stratified sampling scheme,
gives us gains in precision, relative to a simple random sample of the same size,
for almost all the variables in a multipurpose survey.
That makes it a very good way to approach the stratification.
Not the only way, but it is a good starting strategy for
thinking about these kinds of things as we go along.
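Here is the multipurpose point in miniature. The sketch applies one proportionate allocation ($n_h = n N_h / N$) to a made-up population of 12 households in three strata. For each of several outcomes, it computes an approximate design effect: the ratio of the proportionate stratified variance to the simple-random-sampling variance for the same n. The ratio uses the large-stratum approximation $\sum_h W_h \sigma_h^2 / \sigma^2$. The stratum labels, outcomes, values, and helper names (`proportionate_allocation`, `proportionate_deff`) are all hypothetical.

```python
from statistics import pvariance

def proportionate_allocation(stratum_sizes, n):
    """Allocate n so that n_h is proportional to N_h; fractional parts are
    rounded with the largest-remainder method so the n_h sum exactly to n."""
    N = sum(stratum_sizes.values())
    exact = {h: n * N_h / N for h, N_h in stratum_sizes.items()}
    alloc = {h: int(x) for h, x in exact.items()}  # round down first
    shortfall = n - sum(alloc.values())
    # Give the leftover units to the strata with the largest remainders
    for h in sorted(exact, key=lambda h: exact[h] - alloc[h], reverse=True)[:shortfall]:
        alloc[h] += 1
    return alloc

def proportionate_deff(strata):
    """Approximate design effect of the proportionate stratified mean versus an
    SRS of the same size: sum_h W_h * sigma_h^2 / sigma^2 (large-stratum
    approximation; the common factor (1 - f) / n cancels)."""
    everyone = [v for values in strata.values() for v in values]
    N = len(everyone)
    within = sum(len(values) / N * pvariance(values) for values in strata.values())
    return within / pvariance(everyone)

# Hypothetical population: 12 households in strata A (5), B (4), C (3),
# each measured on three outcomes of a multipurpose survey
population = {
    "income":    {"A": [30, 32, 35, 33, 31], "B": [50, 55, 52, 58], "C": [80, 85, 90]},
    "food_exp":  {"A": [10, 11, 12, 10, 11], "B": [14, 13, 15, 16], "C": [18, 20, 19]},
    "wellbeing": {"A": [4, 6, 5, 7, 3],      "B": [4, 6, 5, 5],      "C": [5, 4, 6]},
}

sizes = {h: len(values) for h, values in population["income"].items()}
alloc = proportionate_allocation(sizes, n=8)  # one allocation serves every outcome
deffs = {var: proportionate_deff(strata) for var, strata in population.items()}

print(alloc)
for var, d in deffs.items():
    print(f"{var}: approximate deff = {d:.3f}")
```

Outcomes whose means differ across strata (here `income` and `food_exp`) get a design effect well below 1. An outcome unrelated to the strata (`wellbeing`, whose stratum means are equal) gets a design effect of about 1: no gain, but no loss. That pattern is consistent with a single proportionate allocation helping almost all variables at once.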