In this section, we're going to show how to investigate effect modification with multiple linear regression. So far, of confounding and effect modification, the only phenomenon we've been able to look at is confounding: with multiple linear regression, we can get estimates of an outcome-exposure relationship adjusted for potential confounders, and then compare them to their unadjusted counterparts to assess confounding. To assess effect modification, we have to go beyond looking at unadjusted and adjusted comparisons together, and I'll show you two different ways to do it in this lecture set using multiple linear regression. But the biggest take-home message is that in order to investigate effect modification, one needs to be able to see separate estimates of an outcome-exposure relationship for different levels of a potential effect modifier. Not one overall adjusted relationship, but separate relationships.
We're going to look at two different approaches to using multiple linear regression to investigate effect modification, and gain some exposure to what's called an interaction term approach, which both estimates the magnitude of potential effect modification and assesses its statistical significance, or lack thereof. In other words, this will give us a formal test of whether there is effect modification or not.
So let's just talk about effect modification in general. We've discussed it in detail in a previous lecture set, but let's talk about it a bit further. While there's usually a single direction of interest scientifically, effect modification is what I'd call a two-way street. If Z modifies the relationship between Y and X, then X modifies the relationship between Y and Z. Again, scientifically, we're generally only interested in one direction, but the way we estimate it applies to both directions when we get into the interaction term approach.
For example, if sex modifies the relationship between mortality and treatment, then treatment also modifies the relationship between mortality and sex. Scientifically, we're looking at whether we should prescribe the same thing to men and women, so our interest is in sex modifying the relationship between mortality and treatment, because we can alter the treatment but not the sex. But technically, this also means that the difference in mortality between the sexes depends on what treatment they get. Likewise, if ethnicity modifies the relationship between systolic blood pressure and age, then age modifies the relationship between systolic blood pressure and ethnicity. So we'll generally parse these results in one of the two possible directions. But I just want to remind you that it is a two-way street, in the sense that if A modifies the relationship between Y and X, then X modifies the relationship between Y and A as well.
Effect modification results are easiest to interpret when at least one of the two variables involved, the exposure or its potential modifier, is binary or categorical. For example, we might look at whether the effect of treatment, drug versus placebo, is modified by biological sex; both are binary. Or whether the association between arm circumference and height is modified by biological sex; height is continuous but sex is binary. Or whether the relationship between systolic blood pressure and age is modified by ethnicity; here age is continuous, but ethnicity is categorical, with five levels in this particular example. Effect modification can be assessed when both variables are continuous, but it's messy to interpret, so it's not generally done. Suppose we wanted to see whether the relationship between arm circumference and height was modified by weight, and both height and weight were measured as continuous.
In terms of the interpretability of the results, it would perhaps be advantageous to take weight, which was measured on a continuum, and categorize it. We'll look at an example where both variables are continuous in the additional examples section, and I'll show you that it is indeed messy to parse.
So how do we investigate effect modification with multiple linear regression? How do we estimate separate outcome-predictor relationships for different levels of a third variable? Option one is to stratify our data into subsets based on each level of one of the variables involved in our effect modification investigation. So if we want to see whether the relationship between mortality and treatment is modified by sex, we might separate our data into males and females and run separate regressions on each subset: one only on males and one only on females. Then we can compare the estimated slopes for each predictor across the subsets. Let's see what the slope for treatment looks like for males, and what it looks like for females. And if we include other predictors, let's see how their slopes compare between males and females. This is a good option, especially if the researcher is concerned that the stratifying variable, like sex or ethnicity, modifies all outcome-predictor relationships, because it lets us look at different relationships between the outcome and all other potential predictors, not just one at a time. But it involves subsetting the data and does not use all of the data in any one analysis: we run a separate regression for each subset, and each uses only the data in that subset.
Another option, which is more often used and presented in the literature, is called the interaction term approach.
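The stratified option described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the data values are made up (constructed so the sex-specific slopes come out to 0.17 and 0.15), and the helper names `simple_ols` and `fit_by_stratum` are hypothetical, not from any package.

```python
from statistics import mean

def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Made-up records: (height_cm, sex, arm_circumference_cm); sex is 1 = female, 0 = male.
records = [
    (50, 0, 11.5), (60, 0, 13.2), (70, 0, 14.9), (80, 0, 16.6),
    (50, 1, 11.0), (60, 1, 12.5), (70, 1, 14.0), (80, 1, 15.5),
]

def fit_by_stratum(records):
    """Fit a separate arm-circumference-on-height regression within each sex stratum."""
    fits = {}
    for level in sorted({sex for _, sex, _ in records}):
        # Each regression sees only the rows in its own stratum.
        subset = [(h, y) for h, s, y in records if s == level]
        heights, outcomes = zip(*subset)
        fits[level] = simple_ols(heights, outcomes)
    return fits

fits = fit_by_stratum(records)
for level, (intercept, slope) in fits.items():
    print(f"sex={level}: intercept={intercept:.2f}, slope={slope:.3f}")
```

Note that each fit uses only four of the eight rows, which is the trade-off of stratifying: no single analysis uses all of the data.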
What this involves, and we'll show the mechanics, is creating what's called an interaction term between the main exposure of interest and its potential effect modifier. We create a term that couples information about the two together, then run a regression that includes the exposure of interest, the potential effect modifier, the interaction term, and all other predictors of interest. We'll see how this plays out with several examples. What this allows for is an assessment of the specific effect modification of interest while using all of the data in the sample to estimate the other adjusted outcome-predictor relationships, if there are other predictors in the model besides the exposure and potential effect modifier. It also allows for a formal statistical hypothesis test of whether effect modification exists.
To get started with the interaction term approach and get the idea behind it, I'm going to use a regression with only two predictors: arm circumference, which we've looked at before, regressed on height and sex of children. We'll first go through the process we did before, investigating whether the relationship between arm circumference and height is confounded by sex, and show what the result of estimating adjusted associations looks like visually. Then we'll extend the analysis to look at whether the relationship between arm circumference and height is modified by sex. For this situation, with only two predictors, both of the options I laid out before give the same results: either separating the data by sex and running one arm circumference-height regression only on males and another only on females, or running one regression that includes an interaction term between height and its potential effect modifier, sex. But if there were more predictors beyond height and sex, then splitting the data into sex-specific groups and running separate regressions would give different results from the interaction term approach.
So let's start with something we've looked at before: the relationship of arm circumference with height and with sex, in both the unadjusted and adjusted contexts. These are the regression coefficients from simple linear regression models of arm circumference on height and of arm circumference on sex, and these are the adjusted estimates of each from a multiple linear regression model of arm circumference on height and sex. On the graph, the results are so closely aligned that it's hard to tell the difference visually. The solid blue line we're seeing here estimates the overall relationship between arm circumference and height, not taking any other characteristics into account, and it has a slope of 0.16. When we adjust for sex, the slope for height does not change, but there is a difference between the sex groups that we should be able to see geometrically: two separate lines relating arm circumference to height, one for males and one for females. They both have the same slope of 0.16, but they are separated by a vertical distance of 0.08, with the female line lower. Because of the scaling, you can't actually see that, so I'm going to show it on the next slide, not drawn to scale. We'll look at the unadjusted model for the relationship between arm circumference and height, and then at what we get with the adjusted model. So the unadjusted model is as follows.
The estimated mean arm circumference for a given height is 2.7, the intercept, plus 0.16 times height. This is for everyone, males and females combined, and that's this line right here with a slope of 0.16. The adjusted association is exactly the same as the unadjusted in that the relationship between arm circumference and height still has a slope of 0.16; that does not change at all when we take sex differences into account. But the slope for sex now gives the vertical distance between males and females. At any given height, the difference in average arm circumference between females and males is -0.08 centimeters, indicating that females have arm circumferences that are on average 0.08 centimeters smaller than those of males of the same height, and this difference of -0.08 is constant across the entire height range.
Here we see a situation where, after adjusting for sex, we get the same overall slope as when we didn't adjust, so there's no confounding by sex. But this model, with sex and height as separate predictors, forces the estimated adjusted slope of height to be the same for males and females. The adjusted slope could, of course, differ from the unadjusted one; it doesn't in this case. Either way, this model is pooling the information across males and females to estimate one common slope for the two sex groups. So what if we wanted to extend this further? How could we set the model up to allow for estimates of separate arm circumference-height associations depending on sex? In other words, we want one slope estimate of height for females and one for males, and we want to statistically compare the two. Here's how it works. This interaction term approach is pretty neat: it's a way of tricking the computer into doing something that will benefit us.
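For reference, the two fitted models just described can be written out as equations. In this sketch, x_1 stands for height in centimeters and x_2 for sex, assuming the coding 1 = female, 0 = male. The intercept of the adjusted model is not stated in this section, so it is left as a symbol rather than filled in.

```latex
\begin{align*}
\text{Unadjusted:}\quad \hat{y} &= 2.7 + 0.16\,x_1 \\
\text{Adjusted for sex:}\quad \hat{y} &= \hat{\beta}_0^{\text{adj}} + 0.16\,x_1 - 0.08\,x_2 \\
\text{Females minus males at the same height:}\quad
  &\left(\hat{\beta}_0^{\text{adj}} + 0.16\,x_1 - 0.08\right) - \left(\hat{\beta}_0^{\text{adj}} + 0.16\,x_1\right) = -0.08
\end{align*}
```

The height term cancels in the last line, which is why the adjusted model gives two parallel lines: the female-male difference cannot depend on height unless the model is extended.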
And the way it works, just follow me on the mechanics; we'll parse several examples to really drill it home. We're going to expand the model we fit before to include not only height and sex, but also something called an interaction term, which brings together these two things that were separately incorporated previously. The interaction term is computed by generating, in the computer, a new variable that's the product of each child's height value, x1, and his or her sex value, x2, where sex is coded as 1 for females and 0 for males. So this interaction term is only activated when we're looking at females. For males, it is equal to height x1 times 0, which is 0. It only gets turned on when we're dealing with female children.
So let's see what the overall model estimates about the relationship between arm circumference and height for the two sexes. When x2 = 0 and we're looking at males, we keep x1 as a generic placeholder because we haven't specified a height. The equation looks like this: the estimated mean arm circumference, y hat, is equal to the original intercept, beta naught hat, plus the slope beta 1 hat times height, and then we plug in a 0 for x2 in its other two occurrences, both where it's its own term and where it's a piece of the interaction term. So those two terms drop out, and the model reduces to this: mean arm circumference is equal to the intercept plus a single slope, beta 1 hat, times height. So among males, and we'll see whether it's the same for females, the single number that describes the relationship between arm circumference and height is the slope beta 1 hat. There's only one occurrence of height in this model, and its slope is beta 1 hat. Let's see what happens for females. When the sex is female, our value for the sex variable is equal to 1.
And so we're going to have more terms in this result than we did for males. Let's plug in 1 wherever we see an occurrence of x2, which is both in the term beta 2 hat times 1 and in the interaction term, beta 3 hat times height x1 times 1. If we do this, we get beta naught hat + beta 1 hat times x1 + beta 2 hat times 1 + beta 3 hat times x1 times 1. So we pick up a copy of beta 2 hat, and we pick up another copy of x1 with its coefficient of beta 3 hat. When the dust settles, if we combine like terms, the two pieces that are not connected to x1 are beta naught hat + beta 2 hat. And if we factor out the common x1, the overall piece multiplying x1 in this model is the original slope of height, beta 1 hat, plus the slope of the interaction term, beta 3 hat. So the slope of x1 for females is the slope we got for males, beta 1 hat, plus this additional piece, the slope of the interaction term.
So let's think about this for a minute. If there were no difference in the relationship between arm circumference and height for males and females, we'd expect them to have the same slope. Under this formulation, the slope for males is beta 1 hat, so in order for females to have the same slope, the interaction piece would have to be 0. So testing whether the true slope of the interaction term is equal to 0 is a test of whether there is statistically significant effect modification going on.
Now let me put in some numbers, the estimated values for beta naught hat, beta 1 hat, beta 2 hat, and beta 3 hat. When we do this, for males it simplifies pretty quickly, and the slope of height for males is 0.17. So for males, each additional centimeter of height is associated with an increase in mean arm circumference of 0.17 centimeters.
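For reference, the algebra walked through above can be written compactly. The notation follows the spoken description: x_1 is height, x_2 is sex (1 = female, 0 = male), and the interaction term is the product of the two.

```latex
\begin{align*}
\text{Full model:}\quad \hat{y} &= \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \hat{\beta}_3 (x_1 x_2) \\
\text{Males } (x_2 = 0):\quad \hat{y} &= \hat{\beta}_0 + \hat{\beta}_1 x_1 \\
\text{Females } (x_2 = 1):\quad \hat{y} &= \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 + \hat{\beta}_3 x_1
  = (\hat{\beta}_0 + \hat{\beta}_2) + (\hat{\beta}_1 + \hat{\beta}_3)\,x_1 \\
\text{Test of effect modification:}\quad & H_0\colon \beta_3 = 0 \quad \text{versus} \quad H_A\colon \beta_3 \neq 0
\end{align*}
```

So the slope of the interaction term is the female slope minus the male slope, and the slope for sex alone is the female intercept minus the male intercept.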
For females, when the dust settles, the slope is the original slope that we saw for males, 0.17, plus the slope of -0.02 for the interaction term. So this -0.02, the slope of the interaction term, estimates the difference in the slopes of height between females and males: the difference in that relationship. When we add them together, the resulting slope estimate for females is slightly less, by 0.02, than it was for males. It's 0.17 + (-0.02), or 0.15. Notice also that the intercepts are different for males and females, and I'll comment further on why in a moment: the female equation also pulls in the piece related to the individual term for sex alone. So by using this interaction term, we're able to turn on something for females that gets picked up in the height association and was not there for males.
What does this look like in terms of the picture? Again not drawn to scale, this red line is the unadjusted association, y hat = 2.7 + 0.16 times height x1. These blue lines are the ones adjusted for sex. Each has a different intercept because of that difference between males and females at any given height, but the slope is 0.16 for both. And finally, these black dotted lines are the interaction lines, the ones that give separate lines for males and females. This is the line for males here, and this is the line for females. You can see the males have the greater slope, 0.17, and the females have the slope of 0.15. And if we trace these all the way back to a height of zero, the two lines intersect the y-axis at different points. That's why we have different overall intercepts for males and females: the intercept for males is just the main intercept of the equation, and the intercept for females is the male intercept plus the slope for sex alone.
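To see the mechanics end to end, here is a minimal Python sketch of the interaction term approach with only height and sex as predictors. Everything in it is an assumption for illustration: the data are made up (constructed so the male and female slopes are 0.17 and 0.15, with arbitrary intercepts), and `solve` and `ols` are hypothetical helpers that fit least squares through the normal equations rather than calls into a statistics package.

```python
def solve(a, b):
    """Solve the square linear system a * coef = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry in this column up.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(design, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)

# Made-up records: (height_cm, sex, arm_circumference_cm); sex is 1 = female, 0 = male.
records = [
    (50, 0, 11.5), (60, 0, 13.2), (70, 0, 14.9), (80, 0, 16.6),
    (50, 1, 11.0), (60, 1, 12.5), (70, 1, 14.0), (80, 1, 15.5),
]

# Design row: intercept, height, sex, and the interaction term height * sex.
# The product is 0 for every male, so the term is only "turned on" for females.
design = [[1.0, h, s, h * s] for h, s, _ in records]
y = [arm for _, _, arm in records]

b0, b1, b2, b3 = ols(design, y)
male_intercept, male_slope = b0, b1
female_intercept, female_slope = b0 + b2, b1 + b3

print(f"males:   intercept={male_intercept:.2f}, slope={male_slope:.3f}")
print(f"females: intercept={female_intercept:.2f}, slope={female_slope:.3f}")
print(f"interaction slope (female minus male): {b3:.3f}")
```

One regression on all eight rows recovers both sex-specific lines, which is the sense in which the two approaches agree when height and sex are the only predictors. The sketch does not compute the standard error needed for the formal test of the interaction slope; a regression package would report that.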
So what you can see here is that the slopes are different, and that males' arm circumference on average increases more quickly as a function of height compared to females'. What you can also see is that the difference in arm circumference between males and females of the same height changes depending on what height I'm looking at. That's what I was talking about with the two-way street of effect modification. Here, we can see that the relationship between arm circumference and height is modified by sex, because the slope of height differs for males and females in these interaction results. But you could also say the relationship between arm circumference and sex is modified by height, because the difference in average arm circumference between males and females depends on what height we're looking at.
For those of you interested in more examples of investigating interaction with linear regression, there's another section, part two of this lecture set, where we'll look at several more, and there are several more in the additional examples section as well. But the big picture I want you to get is that in order to investigate effect modification, we need to do something that allows for the estimation of separate outcome-exposure relationships for different levels of an effect modifier. One approach, which we'll explore in more detail in the next part for those interested, is to split the data into subgroups, run separate regressions of the outcome on the exposure and other variables in each subgroup, males and females for example, and compare the results for each predictor across groups. The other is to hone in on one outcome-exposure relationship, estimating separate associations while adjusting for other variables and using all the data at once, by introducing an interaction term into a multiple regression model that may have more predictors than just the two for which we want to investigate effect modification.
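The two-way street can also be read directly off the interaction model. Subtracting the male line from the female line at the same height x_1 gives the following, where the slope for sex alone is left as a symbol because its value is not stated in this section:

```latex
\begin{align*}
\hat{y}_{\text{female}}(x_1) - \hat{y}_{\text{male}}(x_1)
  &= \left[(\hat{\beta}_0 + \hat{\beta}_2) + (\hat{\beta}_1 + \hat{\beta}_3)\,x_1\right]
   - \left[\hat{\beta}_0 + \hat{\beta}_1 x_1\right] \\
  &= \hat{\beta}_2 + \hat{\beta}_3\,x_1
\end{align*}
```

With the interaction slope of -0.02 from the example, each additional centimeter of height shifts the female-minus-male difference in mean arm circumference by -0.02 centimeters, so the same coefficient describes the effect modification in both directions.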