0:10

Hi, this video is about what you can do if you've noticed that you have some very

large weights in your data set.

So you're using inverse probability of treatment weighting and

you notice there's some very large weights.

You're concerned about it and so what can you do about it?

So we're going to discuss various options and also in particular we'll discuss

weight truncation and the sort of trade-offs that are associated with it.

0:35

So, the first thing I would probably do if I noticed there were some individuals that

had very large weights is to actually look at their observations.

So I would identify the people with the highest weights,

I would sort of pull out their data separately and I would look at them.

And the main thing I would be trying to do is to understand, first of all, why do they have large weights?

So let's imagine you've fitted a logistic regression for example,

you have all these coefficients.

And then you also have this person's data who has a large weight, and

you can kind of look and see what is driving it to have a large weight.

1:10

It might be that a whole combination of variables happens to be unusual, that this particular combination of values just happens to be rare.

But it also could be that just one of the variables takes an extreme value, and

so I would first want to look at that and understand what is going on.

So what's unusual about them?

I would be interested in just,

does this person really have data that are all plausible?

1:36

So is there a data error, is there something wrong with it?

And then I would also want to know, well,

maybe there's something wrong with my propensity score.

So it actually could be that there's nothing wrong.

It could be that your model's correctly specified and

this person really just does happen to have a large weight.

But I would first want to make sure that that's the case,

so I would investigate these various aspects of it.
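As a concrete sketch of that first step, here is one way you might pull out the observations with the largest weights for inspection. This is Python with simulated data; the variable names and the data-generating model are hypothetical choices of mine, not something from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)                         # a single confounder
ps = 1 / (1 + np.exp(-(0.5 + 2.0 * x)))        # hypothetical fitted propensity scores
treat = rng.binomial(1, ps)

# IPTW weights: 1/ps for treated subjects, 1/(1-ps) for controls
w = np.where(treat == 1, 1 / ps, 1 / (1 - ps))

# identify the people with the highest weights and look at their data
top = np.argsort(w)[-5:][::-1]
for i in top:
    print(f"x={x[i]:+.2f}  treated={treat[i]}  ps={ps[i]:.4f}  weight={w[i]:.1f}")
```

In a real analysis you would print all of each subject's covariates here, looking for implausible values or unusual combinations.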

2:00

So I'm just going to give you one example to motivate this. Let's imagine we had just a single confounder, and let's say it's continuous.

We have a single continuous confounder X.

And let's imagine our propensity score model looks like this, where the logit of the probability of treatment is a linear function of X.

And so let's imagine that it looks like this where these tick marks here,

these red marks correspond to actual observations.

And this is sort of what the fitted curve would look like.

So you'll notice on the left side here we have probability of treatment,

so that's all between zero and one.

And our horizontal axis is the actual covariate.

And we'll see that almost all of our data are in the small range,

let's say between roughly negative 0.4 and

positive 0.4, so most of our data are right in here.

So we're estimating those betas from the previous slide from the logistic

regression model.

And it's almost entirely going to be based on how well our data

would sort of correspond to this curve just within this narrow range.

So we really want a particular set of betas that sort of fit well in that

narrow range.

However, we have this one outlier value, so

one person has a very unusual value of X.

So that person isn't going to provide very much information about the fit of this logistic regression model, about the estimation of the betas.

Because almost all of the data are in this narrow range, and there's just this one outlier observation.

And you'll see that for this person, if you believe this logistic regression

curve, their value is here, which is extremely close to 1.

So there's somebody who is very likely to get treated according to our model.

So if they actually didn't get treated, they would have an extremely large weight.

However, we really don't know much about the fit out there; there's a lot of extrapolation actually taking place outside of the range of the data.

So like, in this direction and this direction, we really don't know what

the true propensity score curve as a function of X should look

like outside of that sort of narrow range defined by those two vertical bars.

Because the data really informed the stuff inside of those vertical bars.

But outside there just really wasn't much data to tell us anything about it.

So that actual shape is determined by an assumption that we had that it's

linear on the logit scale, so that was an assumption we made.

We don't know if it's true or not, and so much of our inference here rests on extrapolation, essentially.

So now I'm drawing a hypothetical alternative curve, this red curve; it's just a hypothetical curve that I drew in.

4:58

Inside this range again,

it seems like it probably fits just as well as the black curve does.

They're practically agreeing with each other,

but it doesn't sort of rise as high, as quickly.

So out here, we now would have a value that would be quite a bit lower.

The probability of treatment isn't one anymore, it's more like a little above 0.8.

So then that person, if they weren't actually treated,

if this person wasn't treated, they would have a much lower weight.

And so the point is that there's a whole bunch of possible curves that would potentially fit really well in this area.

And the only reason we ended up with this particular black curve is,

well, the main reason is because we imposed this sort of linear on

the logit scale kind of assumption.

And that might be fine if we weren't doing such extreme extrapolating but

we have this outlier person.

So we have this outlier person here, and so we might end up with a large weight

because our model doesn't fit well out there and

we don't have enough information in our data to tell us anything about that.

So it could be that we've just made a bad assumption in our model and

that could be why we have the high weight.

Alternatively, it could also be, as I mentioned,

that this value just isn't a real value, it could be a typo.
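To see numerically why the choice of curve matters so much out in the extrapolation region, here is a small illustrative sketch. The propensity score values are made up by me; the point is just that an untreated subject's weight, 1/(1 - propensity score), blows up as the extrapolated score approaches one:

```python
import numpy as np

# Illustrative only: the same untreated outlier gets wildly different
# weights depending on which (equally well-fitting) curve we extrapolate.
for ps in [0.80, 0.95, 0.99, 0.999]:
    w = 1 / (1 - ps)  # IPTW weight for an untreated subject
    print(f"extrapolated propensity score {ps:.3f} -> weight {w:7.1f}")
```

So a curve that extrapolates the outlier to 0.999 gives a weight of about 1000, while one that extrapolates it to 0.80 gives a weight of 5, even though the two curves fit the bulk of the data equally well.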

6:16

Sometimes we'll see this with laboratory values in clinical research, where you'll sometimes just get a value that isn't plausible.

So hopefully you've done data cleaning ahead of time,

range checks to make sure your data are all plausible.

But those are two possibilities,

two of several possibilities as to why you might have an extreme weight.

So this is really something to think about,

and ways in which you might want to refine your propensity score model.

7:43

So, I mentioned a common trimming strategy.

And this is just one thing that I have seen people do,

is let's say we are interested in the 2nd and 98th percentiles.

This is somewhat arbitrary, but we are talking about the extremes.

So, imagine we are looking at treated subjects.

Well, what we could do then is, treated subjects are going to tend to have higher

values of the propensity scores than control subjects.

So for treated subjects who have very high values of the propensity score, there might be nobody in the control group like them, and we want to trim them off.

So you imagine taking the 98th percentile

of the propensity score among those in the control group, so

restrict to the control group, look at the 98th percentile.

So what are the highest values of the propensity score in the control group? Then, in both groups really, get rid of anybody above that, anybody who has a higher propensity score than that.

You would do the same thing on the opposite sort of side of things, where

8:43

you would remove control subjects who had a propensity score that was below

the 2nd percentile of the propensity score distribution from the treated subjects.

So you take the treated subjects, look at the extreme left tail of their propensity score distribution, cut it there, and remove anybody from your data set who falls below that.

So then you're left with sort of

this middle range where there should be a lot of overlap.

Doing that should eliminate some of the most extreme weights,

because the most extreme weights are going to occur out in the tail.

So this is a reasonable strategy in the sense of A,

it makes the positivity assumption more plausible, and B,

it should eliminate some of the extreme weights.

But just as a reminder, that does involve changing our population of interest,

our target population who are we making inference about.

So if we don't trim the tails, we're focusing on the population as a whole.

But if we trim the tails, we're now focused on a sub-population

who had a reasonable probability of getting either treatment,

where "reasonable" was defined by our trimming strategy.
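The trimming rule described above can be sketched in a few lines of Python. The function name and the simulated data are my own hypothetical choices; the percentile cutoffs match the 2nd/98th example from the lecture:

```python
import numpy as np

def trim_tails(ps, treat, lo_pct=2, hi_pct=98):
    """Trimming rule from the text: keep only subjects whose propensity
    score is below the control group's hi_pct percentile and above the
    treated group's lo_pct percentile."""
    hi = np.percentile(ps[treat == 0], hi_pct)
    lo = np.percentile(ps[treat == 1], lo_pct)
    return (ps >= lo) & (ps <= hi)

# hypothetical fitted propensity scores and treatment indicators
rng = np.random.default_rng(1)
ps = rng.beta(2, 2, size=500)
treat = rng.binomial(1, ps)
keep = trim_tails(ps, treat)
print(f"kept {keep.sum()} of {keep.size} subjects")
```

Restricting the analysis to the subjects where `keep` is true leaves the middle range where the two groups overlap.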

9:49

So in general, I'm pretty comfortable with this idea

because it makes sense to sort of focus inference on the sub-population of

people who had some chance of getting either treatment.

However, it's something to keep in mind, because the population becomes a little

more difficult to define, the second you start sort of eliminating people.

Okay, so you could trim the tails.

Another thing you could do, whether you trim the tails or not,

is something called weight truncation.

Okay, so here, imagine we have a population that we're going to stick with. So maybe we've trimmed the tails, maybe we haven't, but this is our population now.

We have our weights and

we might decide ahead of time that there's some maximum allowable weight.

So we might only be comfortable with let's say a weight of 100 or

it also could be based on a percentile.

So we might say, let's take the 99th percentile of the weight distribution and

not allow any weights greater than that.

And what truncating then means is setting

10:54

the weight, for anyone who had a value higher than that truncation point, to the maximum value.

So if, for example, you chose 100 as the maximum weight you'll accept,

then if somebody had a weight that was greater than 100,

you would actually change their weight to 100.

So you're actually going to replace any weight greater than your maximum,

you're going to set it to the maximum, so that's what we mean by weight truncation.
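Weight truncation itself is just a clipping operation. Here is a minimal sketch; the function name is hypothetical, and the fixed maximum of 100 and the percentile option mirror the two choices mentioned above:

```python
import numpy as np

def truncate_weights(w, max_weight=None, pct=None):
    """Cap weights at a fixed maximum (e.g. 100) or at a percentile
    of the weight distribution (e.g. pct=99)."""
    if pct is not None:
        max_weight = np.percentile(w, pct)
    return np.minimum(w, max_weight)

w = np.array([1.2, 3.5, 8.0, 40.0, 250.0])
print(truncate_weights(w, max_weight=100))  # the weight of 250 becomes 100
print(truncate_weights(w, pct=80))
```

All the other weights are left exactly as they were; only the ones above the cutoff are replaced.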

11:19

So this might sound a little strange because we're, in some sense,

we're changing data, we're just going in by hand and changing values.

So this does introduce bias. If you don't truncate, you have the problem of potentially having very noisy estimates. So you might have a very large variance, but it's unbiased, assuming your causal assumptions were met.

11:41

Whereas if you truncate, you're going to induce some bias, because you're no longer properly weighting, but potentially you'll have a much less noisy estimate, so it would have a smaller variance.

So again this is a classic kind of bias variance trade-off situation.

So if you decide to truncate, what you're deciding to do is accept some bias for

a smaller variance and you're hoping that that trade-off is worth it.

There could be situations where you truncate and

that induces only a small amount of bias but it greatly reduces the variance,

in which case, you'll probably be very happy with that trade-off.

And so a lot of times, people judge whether the trade-off was worth it based on what's known as the mean squared error.

So you could think of that as bias squared plus variance.

So it's been shown, at least in some simulation studies, that if you do truncate the weights, you can improve the mean squared error.

So you can end up with a trade-off that's acceptable.

But of course if you truncate to an extreme degree,

you'll end up creating probably too much bias.
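To make that bias-variance trade-off concrete, here is a small simulation sketch. Everything in it is a hypothetical choice of mine (the data-generating model, the cap values, the Hajek-style weighted estimator): it estimates a known treatment effect with untruncated and truncated weights, and reports squared bias, variance, and their sum, the mean squared error:

```python
import numpy as np

def expit(z):
    return 1 / (1 + np.exp(-z))

def iptw_ate(y, treat, ps, cap=np.inf):
    """Weighted difference in means, with optionally truncated weights."""
    w = np.where(treat == 1, 1 / ps, 1 / (1 - ps))
    w = np.minimum(w, cap)  # weight truncation
    t, c = treat == 1, treat == 0
    return np.average(y[t], weights=w[t]) - np.average(y[c], weights=w[c])

rng = np.random.default_rng(42)
true_ate, n, reps = 1.0, 200, 500
for cap in [np.inf, 20, 5]:
    est = []
    for _ in range(reps):
        x = rng.normal(size=n)                 # a single confounder
        ps = expit(2.5 * x)                    # true propensity score
        treat = rng.binomial(1, ps)
        y = true_ate * treat + x + rng.normal(size=n)
        est.append(iptw_ate(y, treat, ps, cap=cap))
    est = np.array(est)
    bias, var = est.mean() - true_ate, est.var()
    print(f"cap={cap:>4}: bias^2={bias**2:.4f}  var={var:.4f}  "
          f"mse={bias**2 + var:.4f}")
```

The pattern to look for is that a moderate cap trades a little squared bias for a much smaller variance, while an aggressive cap drives the bias term up.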

So there's some question about should you truncate and if so

where's the right place to truncate?

So typically what people are really trying to do is get rid of very extreme values, so it's not unusual, for example, to just truncate at, say, the 98th or 99th percentile. Alternatively, you could pick an actual cutoff value. The main thing we're trying to do is get rid of the really extreme values, the ones that are much larger than any of the others.

In which case, I think there's reason to believe that if you don't truncate too many people's weights, if it was just a small number that had extreme weights, you'll probably be better off in terms of mean squared error.