At the very end I want to say just a little bit about estimation.
That's beyond the scope of this course, but
we'll mention just a few things about it before we wrap up on this lecture.
So the first thing that one does is think about the sampling units that one's going
to have available for a particular application.
Now, many times the applications are such that other people have already done
similar work, and you can build on their work or borrow from it.
In our particular case, we're going to be looking at drawing a sample of
housing units from across the United States.
And we're going to start by using units that are readily available to us.
In this particular case, those units have nothing to do with housing units specifically;
they're land areas.
We're going to identify the primary sampling units, our starting point,
something for which there is a large enough number of units.
Not just a few of them, from which we'd take an even smaller number, but
a large number of them, from which we're going to make a selection.
And then we're going to divide the selected primary sampling units up into
secondary units.
And then we're going to select those, but
only within the selected primary sampling units.
And within those secondary units, etc., we're going to select either
third-level units or housing units, however we're going to get there.
And I'm going to take you through just a few illustrations of some of the material
that might be used for something like this.
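To make that staged selection concrete, here is a minimal sketch in Python. It assumes simple random selection at every stage, which is only one possibility and not necessarily what would be used in practice. The function name `select_multistage` and the toy frame of counties, blocks, and housing units are hypothetical, made up purely for illustration.

```python
import random


def select_multistage(frame, n_psu, n_ssu, n_hu, seed=0):
    """Select housing units in three stages from a nested frame.

    frame maps PSU -> {SSU -> [housing units]}.
    Assumption: simple random sampling without replacement at each stage.
    """
    rng = random.Random(seed)
    sample = []
    # Stage 1: select primary sampling units.
    for psu in rng.sample(sorted(frame), n_psu):
        ssus = frame[psu]
        # Stage 2: select secondary units, only within the selected PSUs.
        for ssu in rng.sample(sorted(ssus), min(n_ssu, len(ssus))):
            units = ssus[ssu]
            # Stage 3: select housing units within the selected secondary units.
            for hu in rng.sample(units, min(n_hu, len(units))):
                sample.append((psu, ssu, hu))
    return sample


# Hypothetical toy frame: 6 counties, each with 4 blocks of 10 housing units.
frame = {
    f"county_{c}": {
        f"block_{c}_{b}": [f"hu_{c}_{b}_{h}" for h in range(10)]
        for b in range(4)
    }
    for c in range(6)
}

sample = select_multistage(frame, n_psu=3, n_ssu=2, n_hu=5)
```

Notice that we never need a list of housing units for the counties or blocks we did not select; that is the practical point of working in stages.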
Now, our primary sampling units for
our particular application in the United States are subdivisions called counties.
There was a time in which not all the states had counties.
They all had subdivisions, but
sometimes they were called different kinds of things.
And actually, to this day, some of them still have those labels.
For example, in Louisiana, the counties are referred to as parishes.
But they're basic units, and this is a map of the United States showing the detailed
boundaries of all of these counties.
There are over 3,000 of them in the United States, and Alaska and
Hawaii are inserted here, and they, too, have these kinds of subdivisions.
They vary enormously in geographic size, but
we're going to think about these counties now as collections of housing units.
Not a collection of housing units where we can go and list all of them.
These counties can have hundreds of thousands,
even millions, of housing units in them.
And some can have fewer than 10,000.
So there's a lot of variation in size.
But this is a good place for starting to sample.
We've defined the unit that we're going to use as our primary sampling unit,
our first-stage selection,
knowing that there will be additional steps in the selection,
additional stages of clustering, to help us get to a point where we can have
a list of housing units from which we can draw our sample.
So, we've identified those first-stage units, and what we're going to do with
them is stratify them, just as we've done before when we dealt with elements.
We're now going to stratify the primary sampling units.
Stratification is a general-purpose tool, and we used it to assure
representation and possibly to give us gains in precision,
as you'll recall from when we did proportionately allocated stratified samples of elements.
When we do that application of stratification to these PSUs,
in this case counties, we're going to follow principles that are similar to
those that we used in element stratification.
But here, we're going to use cluster characteristics.
We're going to stratify the clusters, not the elements.
Although implicitly, the housing units are getting stratified,
because they're contained within these PSUs and
we're putting the PSUs into groups where they're relatively homogeneous.
The counties within a group are similar to one another
with respect to characteristics that we're interested in,
characteristics that describe the county itself. Is it an urban or
a rural location?
Does it have high or low rates of unemployment?
Does it have a high fraction of occupied housing units or a low one?
All sorts of things that we can identify for these primary sampling units.
Our groups will be mutually exclusive and exhaustive, just as before.
The choice of stratifying variables, the stratum boundaries, and those kinds of things
follow the same principles we talked about for stratifying elements,
but they're applied to the PSUs, as I've been saying.
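As a rough illustration of stratifying PSUs on cluster characteristics, here is a small Python sketch. The two stratifying variables (urban versus rural, high versus low unemployment) come from the examples just mentioned, but the 5% cutoff, the county records, and the function name `stratify_psus` are all assumptions made up for this example.

```python
from collections import defaultdict


def stratify_psus(psus, unemployment_cutoff=0.05):
    """Group PSUs into strata defined by county-level characteristics.

    Each PSU falls into exactly one (urbanicity, unemployment) cell, so the
    strata are mutually exclusive and exhaustive by construction.
    The cutoff value is a hypothetical boundary, not one from the lecture.
    """
    strata = defaultdict(list)
    for psu in psus:
        urbanicity = "urban" if psu["urban"] else "rural"
        if psu["unemployment"] >= unemployment_cutoff:
            unemployment = "high_unemp"
        else:
            unemployment = "low_unemp"
        strata[(urbanicity, unemployment)].append(psu["name"])
    return dict(strata)


# Hypothetical county records, for illustration only.
counties = [
    {"name": "A", "urban": True, "unemployment": 0.08},
    {"name": "B", "urban": True, "unemployment": 0.03},
    {"name": "C", "urban": False, "unemployment": 0.07},
    {"name": "D", "urban": False, "unemployment": 0.02},
    {"name": "E", "urban": True, "unemployment": 0.06},
]

strata = stratify_psus(counties)
```

The housing units never appear in this step; they get stratified only implicitly, through the county they sit in.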
Once we've created those groupings,
those strata, we're going to allocate a sample of clusters across them.
We'll figure out how many clusters to use
with the kind of approach we talked about before for cluster sampling,
where we identified the optimum subsample size and, from that, the optimum
number of primary sampling units to select, the number of clusters.
We could do the allocation across our strata proportionately.
We could do it in such a way that we have only two selections per stratum.
We could do it so that there's a different number than two but
an equal number in each of the strata, or other kinds of allocations.
We've mentioned Neyman or minimum variance kinds of allocations.
Those are very seldom used for this kind of thing, because there is not much
gain available beyond what we can get proportionately.
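Two of those allocation options can be sketched in a few lines of Python. This is only an illustration under stated assumptions: stratum size is measured here in housing units, the rounding uses a largest-remainder rule, and the function names and stratum totals are hypothetical.

```python
def allocate_proportional(stratum_sizes, n_total):
    """Allocate n_total PSU selections in proportion to stratum size.

    Assumption: fractional allocations are rounded with a largest-remainder
    rule so that the allocations sum exactly to n_total.
    """
    total = sum(stratum_sizes.values())
    exact = {h: n_total * size / total for h, size in stratum_sizes.items()}
    alloc = {h: int(x) for h, x in exact.items()}
    leftover = n_total - sum(alloc.values())
    # Hand the remaining selections to the strata with the largest remainders.
    by_remainder = sorted(exact, key=lambda h: exact[h] - alloc[h], reverse=True)
    for h in by_remainder[:leftover]:
        alloc[h] += 1
    return alloc


def allocate_equal(stratum_sizes, per_stratum=2):
    """Give every stratum the same number of selections (two by default)."""
    return {h: per_stratum for h in stratum_sizes}


# Hypothetical stratum sizes in housing units.
sizes = {
    "urban_high": 500_000,
    "urban_low": 300_000,
    "rural_high": 120_000,
    "rural_low": 80_000,
}

proportional = allocate_proportional(sizes, n_total=20)
paired = allocate_equal(sizes, per_stratum=2)
```

With these made-up sizes, the proportionate allocation of 20 selections comes out as 10, 6, 2, and 2, while the equal allocation uses 8 selections in total.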
Within each of the groups,
within each of the strata, we're going to select our samples.
And then, after data collection, we're going to compute estimates for
each of the strata, each of the groups, separately.
Not only the statistic we're interested in, let's just say a mean or
a proportion, but also the sampling variance of that mean or
that proportion within each group, and then we're going to combine them.
We're going to use those combining rules we used for stratified sampling,
but now applied to the strata of primary sampling units.
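Here is a minimal sketch of that combining step in Python, using the usual stratified rules: the overall mean is the weighted sum of stratum means, and its sampling variance is the sum of squared weights times the stratum variances. The within-stratum part rests on simplifying assumptions that are mine, not the lecture's: clusters of equal size, clusters selected by simple random sampling, and no finite population correction. The function names and numbers are hypothetical.

```python
from statistics import mean, variance


def stratum_estimate(cluster_means):
    """Mean and sampling variance of the mean within one stratum.

    Assumptions: equal-sized clusters chosen by simple random sampling, and
    the finite population correction is ignored. At least two clusters per
    stratum are needed to compute a variance.
    """
    a = len(cluster_means)
    ybar = mean(cluster_means)
    var_ybar = variance(cluster_means) / a  # s^2 among cluster means, over a
    return ybar, var_ybar


def combine_strata(weights, cluster_means_by_stratum):
    """Combine stratum estimates with the stratified sampling rules.

    weights holds W_h, each stratum's share of the population (summing to 1).
    """
    estimate = 0.0
    sampling_variance = 0.0
    for h, w in weights.items():
        ybar_h, var_h = stratum_estimate(cluster_means_by_stratum[h])
        estimate += w * ybar_h
        sampling_variance += w ** 2 * var_h
    return estimate, sampling_variance


# Hypothetical data: two strata, two selected PSUs each, one proportion per PSU.
weights = {"urban": 0.6, "rural": 0.4}
cluster_means = {"urban": [0.50, 0.70], "rural": [0.20, 0.40]}

estimate, sampling_variance = combine_strata(weights, cluster_means)
```

With two selections per stratum, the within-stratum variance comes from just the difference between the two PSU means, which is the minimum needed to estimate a variance at all.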