Let's start. So let's talk about investigating effect modification with multiple logistic regression. In this section, we'll again look at two different approaches to using multiple logistic regression, as we did with multiple linear regression, to formally investigate effect modification. And we'll focus more on the second option, which is using an interaction term, that is, estimating the interaction model, for estimating and assessing the significance of potential effect modification via multiple logistic regression. Just some reminders about effect modification in general; this is the same preamble I started with in the lecture on effect modification in multiple linear regression. While there's usually a singular direction of interest scientifically, effect modification is a two-way street, so to speak. If Z modifies the relationship between Y and X, then X modifies the relationship between Y and Z. So, for example, if sex modifies the relationship between mortality and treatment, then technically, treatment modifies the relationship between mortality and sex. But researchers are more likely to look in the direction of sex modifying the relationship between mortality and treatment, because they may end up making different treatment decisions based on the sex of the person. They can change the treatment easily based on the sex; they can't change the sex based on the treatment. If ethnicity modifies the relationship between systolic blood pressure and age, then age modifies the relationship between systolic blood pressure and ethnicity. And again, effect modification results are easiest to interpret when at least one of the two pieces is binary or categorical. It can be assessed when both pieces are continuous, and we showed an example of this with linear regression, but it's messy to interpret. So how do we go about investigating effect modification with multiple logistic regression?
Well, one thing the researcher could do is split the data into different subsets based on each level of one of the variables involved. So if we were looking at whether an outcome/exposure relationship was modified by sex, we might split the data out by sex and then estimate separate regressions on each subset. For logistic regression, we could either compare the estimated slopes or their exponentiated resulting odds ratios for each predictor across the different subsets. This is potentially a good option if the researchers are concerned that one measure, the stratifying variable, modifies all outcome/predictor relationships, not just one outcome/exposure relationship. Here is an example where this was done in a presentation of recent research results, from an article in the American Journal of Public Health. This looked at Asian/Pacific Islander adolescents, and the interest was whether both suicide ideation, thinking about suicide, and actually attempting suicide were related to the student's sexual identity, whether or not they self-identified as homosexual. What the researchers did was model each outcome, thinking about suicide and attempting it, as a function of self-identified homosexuality on its own. And then they modeled the same outcomes as a function of self-identified homosexuality after also including information on ethnicity, whether the person had been in a relationship involving physical abuse, whether they suffered from alcohol abuse, and whether they had feelings of hopelessness. So, for each outcome, they had the unadjusted association between suicide ideation and homosexuality, and then that association adjusted for the other characteristics. But they presented these models, these results, separately for males and females.
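As a computational aside, the subsetting approach just described can be sketched in code. This is a minimal toy illustration with made-up data and a hand-rolled gradient-ascent fitter; the function and variable names are mine, and a real analysis would of course use a statistical package rather than this sketch.

```python
import math

def fit_logistic(xs, ys, lr=0.01, iters=20000):
    """Fit log-odds(y) = b0 + b1*x by gradient ascent on the log-likelihood.
    A toy fitter for illustration only; not a substitute for real software."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += y - p        # gradient with respect to the intercept
            g1 += (y - p) * x  # gradient with respect to the slope
        b0 += lr * g0
        b1 += lr * g1
    return b0, b1

# Made-up data: an exposure x and a binary outcome y, recorded by sex
x_vals   = [1, 2, 3, 4, 5, 6, 7, 8]
y_male   = [0, 0, 1, 0, 1, 1, 1, 1]
y_female = [0, 0, 0, 0, 1, 0, 1, 1]

# "Split the data, run separate regressions on each subset"
_, slope_m = fit_logistic(x_vals, y_male)
_, slope_f = fit_logistic(x_vals, y_female)

# Compare the exponentiated slopes: odds ratios per 1-unit increase in x
or_m, or_f = math.exp(slope_m), math.exp(slope_f)
```

The comparison at the end is exactly the table-style comparison the article makes: one odds ratio per stratum, examined side by side.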
So we get different estimates of the relationship between suicide ideation and homosexuality, both unadjusted and adjusted, for boys and girls. And we can also compare the results for the control variables in the adjustment model; we get separate estimates of those for boys and girls as well, for each outcome. So again, this is what it looks like for the boys: they have a set of regressions relating the suicide ideation outcome, and also suicide attempts, to self-identified homosexuality, both unadjusted and adjusted for other characteristics. They do this separately for girls as well, and then we can compare all of the results. But what the authors were really focusing on, despite the fact that they split the data into two sets, was what the relationship between the odds of suicide ideation and attempts and identifying as homosexual looked like in boys and, separately, in girls. So let me just home in on that piece here: the odds ratios, both unadjusted and adjusted, of having attempted suicide for those who identify as homosexual compared to those who don't. Among the boys, those who identified as homosexual had five times the odds of having attempted suicide. In boys, that large increase in the estimated odds of suicide attempts was not confounded by the ethnicity of the boys, whether they'd been in an abusive relationship, whether they abused alcohol, etc. In girls there was also a positive association between suicide attempts and identifying as homosexual: 2.5 times the odds of attempting suicide for females who identify as homosexual versus those who don't, in the unadjusted version. The estimate attenuates slightly in the adjusted version, and technically is no longer statistically significant, but nevertheless it still shows a large increase in odds.
However, if we compare these for males and females, both show positive associations between identifying as homosexual and increased odds of suicide attempts, although after adjustment the association is not statistically significant for females. But the magnitude of the increased odds is notably larger for boys than for girls. Now, the confidence intervals cross over here, and that may be a function of power, of sample size, so we can't make a clear-cut conclusion about whether these are statistically different. But I think at least at face value it's interesting to note that the ratios are much larger for boys, and we could think about why that is sociologically. Another approach the researcher could take, and the authors of the last article could ostensibly have done this, because they ultimately only focused on that one predictor, self-identified homosexuality, and the difference in those odds ratios for boys and girls, and didn't concern themselves with differences in the other associations in the models that included ethnicity, etc., would be to create an interaction term between the main exposure of interest and its potential effect modifier. Then run a single regression that includes the exposure of interest, the potential effect modifier, the interaction term, and all other predictors of interest. This allows for an assessment of a specific effect modification of interest while using all of the data to estimate the other adjusted outcome/exposure relationships. And this also allows, as we saw with linear regression, for a formal statistical test of effect modification. So I'm going to start with a regression with two predictors only. We will investigate confounding and effect modification using regression and then show the results visually as well.
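To make the bookkeeping of the interaction-term approach concrete before the worked example: here is a small sketch of how the predictor set for such a model might be assembled. The function name and layout are my own, purely to illustrate the idea of the exposure, the modifier indicators, and their products sitting together in one model.

```python
def interaction_row(hdl, age_quartile):
    """Build one row of predictors for a logistic model of obesity:
    intercept, HDL (x1), age-quartile indicators (x2-x4), and the
    three HDL-by-age-quartile interaction terms."""
    x1 = hdl
    x2 = 1 if age_quartile == 2 else 0
    x3 = 1 if age_quartile == 3 else 0
    x4 = 1 if age_quartile == 4 else 0
    # Interaction terms: the exposure multiplied by each modifier indicator
    return [1, x1, x2, x3, x4, x1 * x2, x1 * x3, x1 * x4]

row_q1 = interaction_row(50, 1)  # reference group: indicators and interactions all 0
row_q2 = interaction_row(50, 2)  # x2 = 1, so the HDL-by-quartile-2 term equals HDL
```

Notice that in the reference quartile every indicator, and hence every interaction term, is zero, which is exactly the accounting we'll walk through below.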
For this situation, technically speaking, with only two predictors, the two approaches, splitting the data by age category and running separate outcome/exposure regressions in each effect modifier group, or fitting a single model with interaction terms, will give the same results. This would not be the case if we included more predictors for adjustment. So what we're going to look at is something we looked at earlier in this lecture set on logistic regression: the association between obesity and HDL levels. And we're going to include the age of the person as a potential predictor as well. We showed more extensive models when we were adjusting, but I'm going to home in on one now with only two predictors in the adjustment model: HDL cholesterol level in milligrams per deciliter and age. And I'm showing now that if we compare the unadjusted association between obesity and HDL to the association after adjusting for age, the results are almost identical. The results are very similar numerically, indicating that the relationship between obesity and HDL was not confounded by age. However, there are differences in the relative odds of obesity for the different age groups after adjusting for HDL, that is, amongst persons with the same HDL. All the groups older than the reference, the youngest group of 18 to 32, had higher odds of being obese than younger persons with the same HDL level. And this is what this looks like visually on the log odds scale. I'm plotting the log odds of obesity because, remember, that's what's ultimately being modeled in logistic regression, and the slopes we get are then exponentiated to get these adjusted and unadjusted odds ratios. But I'm going to plot on the log odds scale because that's the scale we work on to start with in the regression. And what I have in the red line here shows the unadjusted association.
This is just a model that includes only HDL as a predictor, and the slope for HDL can be found by taking the natural log of 0.967. The four subsequent, nearly parallel blue lines show the relationship adjusted for age. And, again, the results were almost identical in terms of the association between obesity and HDL, regardless of age; that's why these four lines are parallel to the overall unadjusted line, because their slopes are almost identical. But the differences in these four lines visually on the log odds scale, these are the shifts up and down, depending on which age group the subjects are in. The one with the lowest log odds at any given HDL level is the reference group of 18 to 32. And then these other three lines are shifted up respectively from that, at any given HDL level, by an amount given by the slope for the indicator of that age group. So that's what we have visually here. So just again, here's what the results look like on the log odds scale: the unadjusted association says the log odds of obesity is equal to an intercept of 1.20 plus a slope of -0.034 times HDL; that's the slope of this red line. The adjusted associations include the indicators for the age groups. The slope for HDL in this model is -0.035, and the model shifts up on the log odds scale from the reference group: the vertical shifts in those lines are given by these three slopes for the indicators of age quartiles two, three, and four. But again, the adjustment model assumes that after we've adjusted for these overall differences in age, we're estimating a common association between obesity and HDL. So how could we expand this with an interaction term approach to see whether the association between obesity and HDL differs depending on age group? Well, this looks messy, but we'll be careful to parse it.
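Collecting the numbers just quoted into equations (only the intercept and the HDL slopes were read off the output shown; I'll leave the age-indicator coefficients symbolic, since their values weren't quoted here):

```latex
\text{Unadjusted: } \ln(\widehat{\text{odds of obesity}}) = 1.20 - 0.034\,\text{HDL}

\text{Adjusted: } \ln(\widehat{\text{odds of obesity}}) = \hat{\beta}_0 - 0.035\,\text{HDL} + \hat{\beta}_2 x_2 + \hat{\beta}_3 x_3 + \hat{\beta}_4 x_4
```

Here $x_2$, $x_3$, $x_4$ are the indicators of age quartiles 2 through 4, and their coefficients are the vertical shifts of the blue lines; note that $-0.034 \approx \ln(0.967)$.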
What we would do is this: we have a main exposure of interest, which here is HDL, and we have this other variable that we included before, a multi-categorical variable with three indicators because it has four categories: the age quartile indicators. And since there are three indicators for the age quartiles, we're actually going to need three interaction terms between HDL and each of those indicators. So this first interaction term here is the interaction between the HDL variable and the indicator for age group two. The second one, which I've noted down here, is the interaction between HDL and age group three. And the last one, well, you can figure this out, and it's already written here, but just for completeness, is the interaction between HDL and age group four. So how does this work in terms of the accounting and parsing? Let's just look at a couple of different age groups and see what the resulting model gives us to represent the association between obesity and HDL. The reference age group is the easiest to work with, because all of the indicators for age groups, x2 through x4, are 0. And because each of the interaction terms involves multiplying HDL by one of these indicators, and they're all 0, the three interaction terms are all 0 as well. So we're left with this relatively simple model when we're only focusing on age quartile 1. This model says the log odds of obesity is equal to the intercept plus the slope times this predictor of HDL. So when all the dust settles for this reference group, the one-number summary of the relationship between obesity and HDL is the single number beta 1 hat. What happens for age quartile 2? Well, now we're starting to get a little more complex, because now one of our indicators is activated. In age quartile 2, x2 is equal to 1. So that means the interaction term that relates HDL and age group two together will be activated as well.
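Written out as a single equation, the interaction model just described (using the x numbering from the slide) is:

```latex
\ln(\widehat{\text{odds of obesity}}) = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \hat{\beta}_3 x_3 + \hat{\beta}_4 x_4 + \hat{\beta}_5 x_5 + \hat{\beta}_6 x_6 + \hat{\beta}_7 x_7
```

where $x_1$ is HDL, $x_2$ through $x_4$ are the indicators of age quartiles 2 through 4, and the interaction terms are $x_5 = x_1 x_2$, $x_6 = x_1 x_3$, and $x_7 = x_1 x_4$.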
That interaction term is equal to HDL, which we call x1, times this indicator, x2, and we know that for quartile 2, x2 is equal to 1. So what we get here is x1 times 1, which is equal to not 1, but x1. So we just pick up another copy of x1 in this interaction term; let's see how this plays out in the equation. The log odds of obesity is equal to this overall intercept, plus our slope times HDL, plus a slope times x2, which will equal 1, plus a slope times x5, the interaction term between HDL and age quartile 2. Substituting the value of 1 wherever we have an x2 goes like this: we get the intercept, plus the slope times HDL, plus beta 2 hat times 1, plus beta 5 hat times the interaction term, which is just HDL times 1. If you write this out and combine like terms, we can collect the beta naught hat plus beta 2 hat here, and then if we factor out the common factor of x1, its coefficient is beta 1 hat, as it was before, plus this additional piece, beta 5 hat. So when all the dust settles, the slope for HDL in age quartile 2 is beta 1 hat plus beta 5 hat. Remember, the slope in age quartile 1 was simply beta 1 hat, so this number, beta 5 hat, estimates the difference in the association, in the obesity/HDL association, for age quartile 2 compared to age quartile 1. So it's not a difference in obesity between age quartile 2 and age quartile 1, but a difference in the relationship between obesity and HDL for these two age quartiles. Similarly, for age quartile 3, we would replace x3 with a 1, because that's the indicator of age quartile 3. And our interaction term between HDL and age quartile 3 is x6, which equals x1 times x3, but again x3 is 1. So this takes on just the value of x1 for that age group. Writing this out, we again get something like this: the log odds of obesity will bring in the components that persist in the model for age quartile 3.
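The quartile 2 bookkeeping just walked through, collected in one place as a derivation (setting $x_2 = 1$ and $x_3 = x_4 = 0$, which also zeroes out $x_6$ and $x_7$):

```latex
\ln(\widehat{\text{odds}}) = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2(1) + \hat{\beta}_5(x_1 \cdot 1)
                           = (\hat{\beta}_0 + \hat{\beta}_2) + (\hat{\beta}_1 + \hat{\beta}_5)\,x_1
```

So the intercept for age quartile 2 is $\hat{\beta}_0 + \hat{\beta}_2$, and the slope for HDL is $\hat{\beta}_1 + \hat{\beta}_5$.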
We get the intercept, plus the slope times HDL, plus the slope for age quartile 3 times x3, which will be a 1, plus the slope for the interaction between HDL and age quartile 3 times that interaction term, which will ultimately be x1 times 1. Substituting in 1 wherever we see x3 gives us this result; we pick up another copy of x1 here. And so the resulting slope for HDL, the relationship between obesity and HDL in age quartile 3, is quantified by that original estimate beta 1 hat plus beta 6 hat. So, again, beta 1 hat was the association between obesity and HDL in quartile 1, and beta 6 hat is what we add to that to get the association in quartile 3. So beta 6 hat estimates the difference in that association between the two age quartiles. I'm not going to show the work for age quartile 4; I think you could figure it out and guess what it is without doing the work, but if you want to give it a whirl, then we can discuss. For those of you interested in the thrilling conclusion to this, where we actually bring in the numbers from the computer output, compute some of these quantities, and then ultimately present them on the odds ratio scale, we'll do that in part two, the next section. We'll also talk about an appropriate way to present this; you wouldn't want to simply present the full list of slopes from a model with multiple slopes, either unexponentiated or exponentiated. You'd want to put together a cogent summary with regard to the effect modification. So for those of you who are interested in going beyond the mechanics here and getting into the numeric results, that will be in the next section, along with another example for those interested.
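Once such a model has been fit, turning the group-specific slopes into odds ratios is simple arithmetic. The coefficient values below are placeholders I made up purely to show the mechanics; the actual estimates from the output appear in the next section.

```python
import math

# Hypothetical coefficient estimates (NOT the real output; placeholders only)
b1 = -0.05   # slope for HDL in the reference group (age quartile 1)
b5 = 0.010   # difference in HDL slope, quartile 2 vs quartile 1
b6 = 0.015   # difference in HDL slope, quartile 3 vs quartile 1
b7 = 0.020   # difference in HDL slope, quartile 4 vs quartile 1

# Group-specific slopes on the log odds scale, following the algebra above
slopes = {1: b1, 2: b1 + b5, 3: b1 + b6, 4: b1 + b7}

# Exponentiate to get the odds ratio per 1 mg/dL increase in HDL in each quartile
odds_ratios = {q: math.exp(s) for q, s in slopes.items()}
```

Note that the interaction coefficients (b5, b6, b7) are never exponentiated and reported on their own as "the" odds ratios; they are differences in slopes, which is why a cogent summary combines them with b1 first.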