This section gives a few more examples of investigating effect modification with multiple linear regression. It is for those of you who are interested in looking at the mechanics of interaction terms in further detail. We'll also look at the approach where you break the data into several parts and estimate a separate regression for each portion, and we'll compare and contrast that approach with the interaction term approach.
Let's go back to some results we looked at before, where we adjusted for multiple differences in the characteristics of the children. We had the unadjusted column. We had the column where we looked at the relationship between arm circumference and height and weight only, each adjusted for the other. Then we brought in age in quartiles, and ultimately we also included sex.
So again, from the unadjusted model, if we focus on the relationship between arm circumference and height, the slope is 0.16 centimeters. In other words, the estimated mean difference in arm circumference for two groups of children who differ by one centimeter in height is 0.16 centimeters. If we go to model three, the slope of height is negative 0.10, and that's the estimated mean difference in arm circumference for two groups of children who differ by one centimeter in height, among children of the same weight, age and sex. It's a different comparison than the unadjusted one, which includes all children. Now we're restricting it to children of the same weight, age and sex.
What if a researcher is interested in whether the relationship between arm circumference and height differs for different levels of another predictor such as sex, age, or weight? For example, what if a researcher wants to see whether the relationship between arm circumference and height is different for male and female children who are otherwise the same in terms of age and weight? They can't do it based on the results we've looked at so far.
The estimated slope for height from model three, negative 0.10, comes from a model that estimates the same association between arm circumference and height for male and female children of the same age and weight. In other words, it's the adjusted mean difference in arm circumference between two groups of children who differ by one centimeter in height and are of the same sex, age and weight. It doesn't matter whether we compare males to males or females to females, as long as they're of the same sex, age and weight. This is the common estimate of the relationship between arm circumference and height for both sexes.
What could the researcher do if they wanted to investigate whether the relationship between arm circumference and height differs for different levels of another predictor such as sex, age, or weight? The first option is to split the data. If we're thinking of sex as the effect modifier, we can split the data into two subsets, males only and females only, and estimate separate regressions by sex of arm circumference on height, weight and age. We can then compare the estimated slopes for height, and their 95% confidence intervals, from these two models.
Here are the multiple linear regressions of arm circumference on height, weight and age, run separately for males and females. Let's look at the slopes of height for the two sex groups. The weight- and age-adjusted slope for height is negative 0.04 for males and negative 0.18 for females. They look different, but the 95% confidence intervals overlap. With this approach we can't formally test whether the two slopes are statistically different, but the overlapping confidence intervals are consistent with them not being statistically significantly different.
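The split-the-data option can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are simulated (they are not the lecture's data set), age is treated as continuous rather than in quartiles to keep the example short, the function names `fit_ols` and `height_slope_by_sex` are hypothetical, and the confidence intervals use the normal approximation (1.96 standard errors) rather than a t distribution.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit; returns coefficients and their standard errors."""
    X1 = np.column_stack([np.ones(len(y)), X])       # prepend an intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    dof = len(y) - X1.shape[1]                        # n minus number of coefficients
    sigma2 = resid @ resid / dof                      # residual variance estimate
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X1.T @ X1)))
    return beta, se

def height_slope_by_sex(height, weight, age, sex, arm):
    """Fit arm ~ height + weight + age separately within each sex group."""
    out = {}
    for label, code in (("male", 0), ("female", 1)):
        m = sex == code                               # rows belonging to this subset
        X = np.column_stack([height[m], weight[m], age[m]])
        beta, se = fit_ols(X, arm[m])
        b, s = beta[1], se[1]                         # index 1 is the height slope
        out[label] = (b, b - 1.96 * s, b + 1.96 * s)  # slope and approximate 95% CI
    return out

# Simulated data, for illustration only (hypothetical generating values).
rng = np.random.default_rng(0)
n = 200
sex = rng.integers(0, 2, n)                           # 0 = male, 1 = female
age = rng.uniform(0, 12, n)
height = 55 + 2.0 * age + rng.normal(0, 3, n)
weight = -2 + 0.12 * height + rng.normal(0, 0.8, n)
arm = 8 + 1.3 * weight - 0.1 * height + rng.normal(0, 0.8, n)

results = height_slope_by_sex(height, weight, age, sex, arm)
for label, (slope, lo, hi) in results.items():
    print(f"{label}: height slope {slope:.3f} (approx. 95% CI {lo:.3f} to {hi:.3f})")
```

Each subset gets its own intercept and its own slope for every predictor, which is why this option gives separate estimates for weight and age as well as for height.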
Because we've separated by sex, we also get separate, and potentially different, estimates for each of the other predictors. The relationship between arm circumference and weight after adjustment for height and age is a slope of 1.23 in males versus a slope of 1.56 in females. Again, the estimates differ in magnitude, but the confidence intervals overlap, which suggests that these are not statistically significantly different associations.
Interestingly enough, while age is a statistically significant predictor of arm circumference for males after accounting for weight and height, it is not for females. You might say: stop right there, we've found that the relationship between arm circumference and age depends on sex, because for males it's significant and for females it's not, so we should report these results separately. But I would say that, given the small sample sizes, there's overlap between males and females in the confidence intervals for each age group indicator, and the lack of significance for females may be mainly due to low power. I'm not convinced that we have differences in that relationship.
Nevertheless, one can present the separate results, so that one can look at differences in the relationships between arm circumference and height, weight and age all at once for males compared to females. However, one might say: I'm not really interested in effect modification of the arm circumference-height relationship by weight or age. I only want to look at it with regard to sex, and I want to use all the data to estimate the adjusted associations with weight and age.
Instead of splitting the data and estimating separate associations for each of these predictors for males and females, we can take the interaction approach. We end up with a result where we estimate separate slopes for the relationship between arm circumference and height for males and females, while using all the data to adjust for weight and age. This is how the researcher should present the results: not by presenting the raw output of the interaction model with the coefficient for each piece, but by combining terms where appropriate.
What we have here are the separate slopes for the relationship between arm circumference and height for males and females based on this approach. The slope for males is negative 0.10, and the slope for females is slightly lower, negative 0.11. Both are adjusted for weight and age. You can see that the confidence intervals overlap, and the t-value for testing the interaction, that is, testing that the underlying slopes of height for males and females are the same at the population level, is 0.64. Again, even after adjustment, there's no statistical evidence of effect modification of the arm circumference-height relationship by sex. But this shows how one could present effect modification by a single factor while using the information in all the data for the other predictors.
This approach involves putting an interaction term into a model that includes height, sex, weight and age as predictors. So let me show you what this model looks like, and we'll go through a little of the detail to show you the genius of this approach once again. The interaction term was created by taking the product of continuous height and binary sex, just like we did the first time around. I'm not going to show the slopes for all the adjustment variables, but they do exist. I want to focus on the relationship between arm circumference and height.
The model, and I'll just give the results up to and through the interaction term, has an overall intercept of 10.87, a slope for height on its own (x1) of negative 0.10, a slope for sex on its own (x2) of 0.83, and a slope for x1 times x2, height times sex, of negative 0.01. Let's write out what this model estimates with regard to the relationship between arm circumference and height for each sex.
Let's do it for males first; they're easy to work with here. When x2 equals 0, the sex term drops out of the model and the interaction term is 0, so all we're left with for the height portion is the overall intercept plus a slope of negative 0.10 times height. This is the estimated relationship between arm circumference and height for males, adjusted for the other characteristics, age and weight.
For females, we pick up everything. We get the negative 0.10 times x1. Because they're female, x2 equals 1, so we pick up 0.83 times 1, but that only concerns sex; there's no height piece there. Then the interaction piece is x1, height, times sex, which is now 1, so we get another copy of x1 in the model for females that we did not get for males. If we rearrange things, we collect the terms that are not attached to an x, the 10.87 and the 0.83, and we factor the coefficients of x1 into one sum, negative 0.10 plus negative 0.01. The overall slope for height in females, once we've combined terms, is negative 0.11, and the line has a different intercept as well because of the overall difference between males and females.
Let's look at one more example of the use of an interaction term. We're going to bring back something we looked at when we first talked about effect modification.
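Written out as equations, the rearrangement for the arm circumference model looks like the following. This is a sketch of the arithmetic described in the lecture: the sex coefficient is taken as 0.83, the value used when the terms are combined; the trailing dots stand for the weight and age terms, which the lecture does not show; and the block assumes the `amsmath` package for `align*` and `\text`.

```latex
\begin{align*}
\hat{y} &= 10.87 - 0.10\,x_1 + 0.83\,x_2 - 0.01\,x_1 x_2 + \dots
  && x_1 = \text{height},\ x_2 = \text{sex (1 = female, 0 = male)} \\
\text{Males } (x_2 = 0):\quad
\hat{y} &= 10.87 - 0.10\,x_1 + \dots \\
\text{Females } (x_2 = 1):\quad
\hat{y} &= (10.87 + 0.83) + (-0.10 - 0.01)\,x_1 + \dots \\
        &= 11.70 - 0.11\,x_1 + \dots
\end{align*}
```

The coefficient of the interaction term, negative 0.01, is exactly the difference between the female and male slopes for height.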
This is the example on the percentage of dead and damaged trees and elevation for 64 sites in the northeastern part of the United States. The majority of these sites were in northern states, but eight of the 64 were in southern states. What we had shown before was that when we estimated one overall relationship between damage and elevation, not taking region into account at all, we got a pretty flat line, and the slope, 0.009 per meter of elevation, was not statistically significant.
But we noticed that the southern sites tended to have less damage and were at higher elevations, so we first suspected that there may be some confounding: the overall relationship was being pulled down by the southern sites with greater elevation and less damage. When we fit the adjustment model, we included both elevation and region as predictors of damage, and the estimated common slope for southern and northern sites was now positive and statistically significant, 0.057.
If we just stopped there, we would show a relationship between damage and elevation, common to northern and southern sites, that differs from the unadjusted association, and we'd say there was confounding. The difference in intercepts is just the difference in average damage between northern and southern sites at any given elevation. You can see that it's constant because the lines are parallel. This is the association adjusted for region.
But when we took a third approach and allowed the slope of elevation to differ by region, we saw, and it's hard to see now that I've marked up the picture, that in southern sites the slope was slightly negative: with increased elevation there was decreased damage on average. That's this line here, with a slope of negative 0.02. In northern sites there's a positive slope of 0.091. So, very different qualitatively.
Of course, we can now test, using the interaction term approach, whether the slopes are statistically different as well. Let's recap these results, and then we'll show how they're found by the interaction approach.
This was the unadjusted slope: not statistically significant, with a very low R-squared. When we adjusted for region, we did much better in terms of R-squared and our result was statistically significant, but we estimated a common slope of elevation on damage for both southern and northern sites. We averaged it across the two regions and did not allow the slope to differ by region. When we finally took the third approach, allowing the slopes to differ by region, we got qualitatively different results. The confidence intervals did not overlap, and the R-squared was even higher than when we took region into account but only adjusted for it, estimating one overall common adjusted association rather than separate relationships between damage and elevation by region.
Here's what the adjustment model looked like: y hat equals negative 10.3 plus 0.06 times elevation (x1) plus negative 0.498 times region (x2). The coefficient of x2 is the average difference in the percent of dead and damaged trees between southern and northern sites of the same elevation; remember, southern sites were much less damaged than northern sites. The coefficient of x1 is the average difference in the percent of dead and damaged trees per one-meter difference in elevation among sites of the same region.
For the interaction model, we create a third term equal to elevation (x1) times region (x2), where x2 is 1 for south and 0 for north. The p-value for testing whether this interaction term is statistically significant is less than 0.001. So now we have a formal answer, as opposed to just comparing confidence intervals for the separate slope estimates: the slopes are statistically different.
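The construction just described, a third predictor equal to elevation times the region indicator, can be sketched in Python as follows. The data here are simulated purely for illustration (they are not the 64 sites from the lecture, and the generating slopes are hypothetical), and `fit_interaction` is a made-up helper name. The sketch also checks a property that holds when the model contains only the exposure, the binary modifier and their product: the region-specific lines implied by the interaction model match separate straight-line fits within each region.

```python
import numpy as np

def fit_interaction(x1, x2, y):
    """Fit y = b0 + b1*x1 + b2*x2 + b3*(x1*x2) by least squares."""
    X = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])  # last column is the interaction term
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta                                               # b0, b1, b2, b3

# Simulated sites, for illustration only (hypothetical generating values).
rng = np.random.default_rng(1)
n_north, n_south = 56, 8
region = np.r_[np.zeros(n_north), np.ones(n_south)]           # x2: 1 = south, 0 = north
elevation = np.r_[rng.uniform(400, 900, n_north),
                  rng.uniform(900, 1200, n_south)]            # x1, in meters
damage = np.where(region == 0,
                  0.08 * elevation - 30,
                  -0.03 * elevation + 45) + rng.normal(0, 5, n_north + n_south)

b0, b1, b2, b3 = fit_interaction(elevation, region, damage)
north_slope = b1            # x2 = 0: the interaction term vanishes
south_slope = b1 + b3       # x2 = 1: the interaction adds b3 to the slope

# Separate straight-line fits within each region; polyfit returns [slope, intercept].
north_fit = np.polyfit(elevation[region == 0], damage[region == 0], 1)
south_fit = np.polyfit(elevation[region == 1], damage[region == 1], 1)

print(f"north slope: interaction model {north_slope:.4f}, separate fit {north_fit[0]:.4f}")
print(f"south slope: interaction model {south_slope:.4f}, separate fit {south_fit[0]:.4f}")
```

This exact agreement is specific to a model with no other adjustment variables. When other predictors are adjusted for using all the data, as in the arm circumference example, the interaction model and the split-data regressions generally give different slopes.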
Let's do this together. You could pause here and try to write out the results separately for northern and southern sites, get the different slopes for elevation, and see if you concur, or you can watch me do it.
Let's start with northern sites. They are easier because their value for region, x2, is 0. That means the region term disappears, and the interaction term, elevation (x1) times x2, is 0 as well when x2 is 0. So for northern sites, the relationship between damage and elevation is simply negative 41.2 plus 0.09 times x1, where x1 is elevation. There's nothing else in this model, so the slope of elevation on damage for northern sites is 0.09.
For southern sites, x2 equals 1, so we have to bring in everything: negative 41.2, plus 0.09 times x1 (let's leave elevation as the unknown x1), plus negative 78.4 times 1, plus negative 0.11 times elevation x1 times region, which is now 1. If we group like terms, we get negative 41.2 plus negative 78.4, the parts without x's, and the slope for x1 is now 0.09 plus negative 0.11. When all the dust settles, we have a different intercept for the southern sites' line and a different slope of negative 0.02.
So this interaction term has allowed us, in the context of a single model, to estimate different associations between an outcome and a predictor for different levels of another variable.
To summarize, effect modification of an outcome-exposure relationship can be investigated via linear regression in two ways. Option one allows for separate estimates of all the outcome-exposure relationships for different levels of the potential effect modifier. The data can be split into separate subsets based on the levels of the potential effect modifier, separate regressions can be run on each subset, and the resulting slopes and 95 percent confidence intervals for each predictor can be compared across the models.
Option two, which is more common and which you'll see referenced in the literature, is the interaction term approach. An interaction term is created between the main exposure of interest and the potential effect modifier. Operationally, this interaction term is created by multiplying those two variables. If the modifier were categorical with more than two levels, and we'll see some examples of this in logistic regression, we would have more than one interaction term: the product of the primary predictor with each indicator for the different categories. That just extends what we've done here, but in these examples we start with the situation where the effect modifier is binary. What the interaction term allows for is the estimation of a single instance of effect modification, where the relationship between y and x is modified by z, after adjustment for the other predictors using all the data.
Investigating effect modification, with or without interaction terms, works best when at least one of the predictors involved is binary or categorical. If both predictors are continuous, it can be done, and it's okay for prediction, but in terms of showing the results, quantifying them and making them easy to interpret, it doesn't work well. I'll show an example, just to give you a heads-up on that, in the additional examples section.
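As a recap of the combining step, here is a small Python sketch that turns the four coefficients of an interaction model with a binary modifier into one line per group. The helper name `lines_by_group` is hypothetical, and the numbers plugged in are the rounded tree-damage coefficients as read out in the lecture, so the results are only as precise as that rounding.

```python
from math import isclose

def lines_by_group(b0, b1, b2, b3):
    """For y-hat = b0 + b1*x1 + b2*x2 + b3*x1*x2 with binary x2, return the
    (intercept, slope) of the x1 line when x2 = 0 and when x2 = 1."""
    line_when_zero = (b0, b1)               # x2 and the interaction both drop out
    line_when_one = (b0 + b2, b1 + b3)      # group like terms: constants, then x1 terms
    return line_when_zero, line_when_one

# Tree-damage model: x1 = elevation in meters, x2 = region (1 = south, 0 = north).
north, south = lines_by_group(-41.2, 0.09, -78.4, -0.11)

print(f"north: intercept {north[0]:.1f}, slope {north[1]:.2f}")
print(f"south: intercept {south[0]:.1f}, slope {south[1]:.2f}")
```

The same function applies to any model of this form; when other adjustment variables are present, the returned intercepts cover only the terms shown and the adjustment terms still need to be added.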