Hi, my name is Brian Caffo and this is Mathematical Biostatistics Boot Camp
Lecture twelve on Bootstrapping. Today, we're going to talk about the bootstrap, which is an incredibly useful, handy tool in statistics that you can use in a variety of settings. Its development more or less coincided with the personal computer revolution, and it gives us a way to avoid an awful lot of mathematics in biostatistics. Before we talk about the bootstrap, we're going to talk briefly about the jackknife, a precursor to the bootstrap that is, exactly as its name suggests, a handy little tool. The bootstrap, on the other hand, is like an
entire workshop of tools. The key idea in both the jackknife and the
bootstrap is to use the data itself, so-called resampling of the data, to get at
quantities that are difficult to get at otherwise.
For example, variances, and biases, and that sort of thing.
Now, we don't need either the bootstrap or the jackknife for something like the sample mean, where we know all its theoretical properties. But for other, less obvious statistics, we need something that does it for us, and it'd be preferable if that something didn't require a year of mathematics just to get us to the starting point. With the bootstrap, in contrast, you can dream up a statistic, decide you want to estimate a standard error for it, and start bootstrapping it immediately.
So, let's talk a little bit about what the jackknife does before we begin with the bootstrap because, historically, the jackknife came first. The first use of the jackknife was by the statistician Quenouille; I'm probably butchering his name, but I think that's how it's pronounced. He used the jackknife to estimate bias, I believe. Then, the jackknife was really popularized
and further refined by the extremely well-known statistician, John Tukey, who
we talked about a little bit in the lecture on plotting.
Tukey had numerous inventions, including the fast discrete Fourier transform. He coined the term bit, for binary digit; he was the first person to do that. And he did lots of other things; he invented the box plot. I think when you see it, you'll conclude along with me that the jackknife is a handy and incredibly clever thing for someone to think of.
So, the idea behind the jackknife, and similarly the idea behind the bootstrap, is this: you have something you don't know, like the bias of a statistic or the standard error of a statistic, and the idea is to use the data to get a sense of it. Well, what the jackknife does is it says, okay, one way to get at these quantities is to take one of the observations out, then formulate the statistic on the remainder and see how well the statistic does with that one observation pulled out. And this is very related to the idea of so-called cross validation that you frequently hear of in machine learning and statistical prediction. The jackknife tends to have a different goal, in that the goal of the jackknife tends to be bias estimation or variance estimation.
But the principle is very similar in that you're deleting observations.
Leave one out cross validation is typically used as an estimate of
prediction error. So, anyway, let's just focus on the
jackknife. And if you take classes in machine
learning or something like that, you'll talk about cross validation.
The jackknife deletes one observation at a time and calculates whatever estimate you're thinking of on the remaining n - 1 observations. Doing this for each observation in turn gives you n estimates, each based on n - 1 data points with one observation left out. It then uses these n estimates to do things like estimate biases and standard errors.
And again, we don't need this for the sample mean. We know that the sample mean
is unbiased under certain assumptions, and we know exactly what the standard error of
the sample mean is under the standard setting. So, the jackknife isn't necessary
for those settings, but it may be necessary for other ones.
So, let's just consider the jackknife for univariate data.
And let's let x1 to xn be a collection of univariate data points where we want to
estimate a parameter theta. Let theta hat be the estimate based on the full data set. Then let theta hat sub i be the estimate of theta that you obtain when you use the n - 1 observations obtained by deleting observation i, and let theta bar be the average of the leave-one-out estimates. With that notation in mind, the jackknife estimate of the bias of our statistic theta hat is just n - 1 times theta bar minus theta hat.
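Written out, with theta hat sub i the delete-one estimates and theta bar their average, that is:

$$\widehat{\mathrm{Bias}}_{\text{jack}} = (n - 1)\,\big(\bar{\theta} - \hat{\theta}\big), \qquad \bar{\theta} = \frac{1}{n}\sum_{i=1}^{n} \hat{\theta}_{i}.$$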
So, let's consider the principle of this before we get to why in the world that n - 1 is there.
So, theta hat is our estimate. Looking at how close it is to the average of the estimates where we deleted one observation at a time is exactly what gives us a sense of the bias at the population level. Then, you might wonder, where in the world does this n - 1 come from? It's a factor that is calibrated using statistics whose properties we actually know; for example, applied to the sample variance, where the bias can be worked out exactly, it gives the correct answer. So again, this estimate is really driven by how far the average delete-one estimate is from the actual estimate, and the n - 1 is just the multiplier needed to turn that difference into an estimate of the true bias. Then, the jackknife estimate of the standard error is the square root of n - 1 over n times the sum of the squared deviations of the delete-one estimates around the average of the delete-one estimates. So, it's sort of like the square root of n - 1 times the variance of the delete-one estimates.
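Written out, that standard error estimate is:

$$\widehat{\mathrm{SE}}_{\text{jack}} = \left[\frac{n - 1}{n} \sum_{i=1}^{n} \big(\hat{\theta}_{i} - \bar{\theta}\big)^{2}\right]^{1/2}.$$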
Again, you might ask about the rationale for this factor out front, the extra n - 1: why not just take the variance of the delete-one estimates as an estimate of the standard error of the statistic? Well, it turns out that because the delete-one estimates each contain the majority of the data, n - 1 of the data points, they tend to be quite close to one another, excessively close. So, the variance by itself is not a good estimate of the standard error of the statistic. We need a factor, and it was calibrated that n - 1 is a reasonable factor to do that. And the same thing is true with the bias: the delete-one statistics tend to be a little too close to one another, and unless you multiply by a factor like this, you don't get a reasonable estimate. So, let's go through an example.
So, we had 630 measurements of gray matter volume from workers at a lead manufacturing plant. The median gray matter volume wound up being about 589 cubic centimeters, and we want to estimate the bias and the standard error of the median. I'll come back to this discussion of jackknifing the median, because that's where we're going to move forward to the bootstrap. So, here's the gist of the code to
do this. Now, you don't actually have to execute the code yourself; I'll show you in a page how to do it. But you can do it in any language, not just R; you just have to figure out how to delete observations one at a time.
So, let's let n just be the number of observations we have. Theta hat is the median of these grey matter volumes. Then the jackknife estimates are the medians I obtain each time I delete the i-th observation; the sapply function does exactly that. Theta bar, exactly as in the notation from the previous couple of slides, is just the mean of these delete-one jackknife estimates. Then my bias estimate is n - 1 times the difference between theta bar and theta hat, and the standard error is the square root of n - 1 times the average squared deviation of the jackknife estimates around the average of the jackknife estimates.
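To make that concrete, here is a rough sketch of the sort of code I have in mind, where gmVol is just a stand-in name for the vector of the 630 gray matter volumes:

    # gmVol: hypothetical name for the vector of 630 gray matter volumes
    n        <- length(gmVol)                               # number of observations
    theta    <- median(gmVol)                               # estimate on the full data
    jk       <- sapply(1:n, function(i) median(gmVol[-i]))  # delete-one medians
    thetaBar <- mean(jk)                                    # average of the leave-one-out estimates
    biasEst  <- (n - 1) * (thetaBar - theta)                # jackknife bias estimate
    seEst    <- sqrt((n - 1) * mean((jk - thetaBar)^2))     # i.e., sqrt((n - 1)/n * sum of squared deviations)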
And then, on the next page, it's a lot easier to do this. [laugh] If you just use the software in the bootstrap library, you can call the jackknife function, handing it my grey matter volumes and the function I want the jackknife estimate of, the median. I assign the result to a variable, out, and then pick out the standard error and the bias calculation from it. Both methods yield an estimated bias of zero and a standard error of 9.94.
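The call on that page looks roughly like this, again with gmVol standing in for the vector of volumes:

    library(bootstrap)               # provides the jackknife() function
    out <- jackknife(gmVol, median)  # jackknife the median of the volumes
    out$jack.se                      # jackknife standard error
    out$jack.bias                    # jackknife bias estimate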
Now, there's an odd little fact. The jackknife tends to work well for smooth functions of the data, and empirical quantiles, the median being an example, often don't satisfy that requirement. In particular, the jackknife estimate of the bias of the median is always zero when the number of observations is even: with an even sample size, half of the delete-one medians equal the upper of the two middle order statistics and half equal the lower one, so their average is exactly the full-sample median.
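If you want to convince yourself of this, a quick check along these lines works for any even sample size:

    x  <- rnorm(10)                               # any even number of observations
    n  <- length(x)
    jk <- sapply(1:n, function(i) median(x[-i]))  # delete-one medians
    (n - 1) * (mean(jk) - median(x))              # zero, up to floating-point error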
So, the median is an example where the jackknife isn't that good of a thing to do. In general, if the estimate that you're getting is a nice smooth function of the data, then the jackknife will work fine; but if it's not, then it tends to work pretty poorly. In fact, there was a very well-known paper by Efron, the inventor of the bootstrap,
that illustrated this quite starkly. And the jackknife has been shown to be a
linear approximation of the bootstrap. So, if you're in some setting where it's
going to be difficult to program up the bootstrap, then doing a jackknife, which
is a pretty simple thing to do, is a handy little tool to use.
And then, just to remind you, you know, don't use the jackknife for sample
quantiles. It's a handy procedure and it works in a
lot of settings, but maybe not for sample quantiles, like the median, as it's been
shown to have some poor properties. And what could you possibly use then?
Well, why not try the bootstrap? So, let's move on to the bootstrap, which is maybe more of a complete toolbox, but certainly a less compact tool than the jackknife, exactly as the tool analogy suggests. By the way, the term bootstrap comes from this idea of pulling oneself up by one's own bootstraps, right?
And, you know, this has been discussed a lot. It's kind of an unfortunate title for a statistical procedure, because it makes it sound like the information's coming from nowhere, right? Because you can't pull yourself up by your own bootstraps; it's physically impossible.
But, you know, there's been plenty of theoretical work showing where the information from the bootstrap is coming from, and when the bootstrap is applicable. Another thing I would note is that this idea of pulling oneself up by one's own bootstraps comes from the fable of Baron Munchausen. There's a great movie called The Adventures of Baron Munchausen, and it was made by some of the people who did the Monty Python series. If you get a chance, you should, you know, in honor of this lecture, watch the Baron Munchausen movie. But at any rate, that fable is where the phrase pulling oneself up by one's own bootstraps comes from, and that's where they got the idea for the name of this procedure. So, back to the jackknife.
So, another way to think about the jackknife is through this idea of so-called pseudo observations. If you take n times theta hat minus n minus 1 times theta hat sub i, you can think of these as whatever observation i contributes to the estimate of theta. Notice that if theta hat is the sample mean, then these pseudo observations are exactly the data themselves. So, it's this idea of taking what worked in a very neat and tidy sense for the sample mean and trying to extend it to other statistics. The sample standard error of these pseudo observations is the jackknife standard error, and the mean of these pseudo observations is a sort of bias-corrected estimate of the parameter that you're interested in. So, it takes your ordinary estimate and attempts to correct the bias.
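As a small sketch of this view, reusing the hypothetical gmVol vector from before:

    # gmVol again stands in for the vector of gray matter volumes
    n      <- length(gmVol)
    theta  <- median(gmVol)                               # full-data estimate
    jk     <- sapply(1:n, function(i) median(gmVol[-i]))  # delete-one estimates
    pseudo <- n * theta - (n - 1) * jk                    # pseudo observation for each data point
    mean(pseudo)                                          # bias-corrected estimate of the median
    sd(pseudo) / sqrt(n)                                  # reproduces the jackknife standard error

If you swap median for mean in this sketch, the pseudo observations come out to be exactly the original data points.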
I have to admit, for my own thinking about the jackknife, I prefer to think about it this way, in terms of the pseudo observations, rather than in terms of the classical development of it.