Hi, my name is Brian Caffo, and this is Mathematical Biostatistics Boot Camp 2, lecture eight, on chi-squared tests. There are a lot of different variations of chi-squared testing. Here we're going to be talking about chi-squared testing for contingency tables, and the most classic contingency table test is testing independence. We'll relate that to testing equality of several proportions. We'll go through the natural generalizations to higher-order contingency tables, and then we'll talk about Monte Carlo variations to get exact tests of independence. We'll finish it off with a discussion of goodness-of-fit testing, which is a special kind of contingency table test that's useful for testing whether or not data arise from a particular distribution.

Okay, so we've talked about testing equality of two proportions, say binomial proportions, in a 2 by 2 table. An alternative approach to doing this is so-called chi-squared testing. The form of the chi-squared statistic is always the same: it's the sum, over the cells, of the observed cell count minus the expected cell count, squared, divided by the expected cell count. I'll talk about what that means later, and we'll talk about where the motivation for this formula comes from as well. Here the observed are the observed counts, the expected are the expected counts under the null hypothesis, and the sum is over all of the cells of the contingency table, successes and failures alike. In the 2 by 2 case for testing equality of proportions, this statistic has, under the null hypothesis, a chi-squared distribution with one degree of freedom. And it turns out, and maybe I'll assign this as homework, that the chi-squared statistic is exactly the square of the difference-in-proportions score statistic, the one where you have the common proportion in the denominator for the standard error.
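Written out, the statistic just described might look like the following. This is a sketch in my own notation, not a copy of the slide: O and E stand for the observed and expected counts in a cell, p-hat-1 and p-hat-2 for the two sample proportions, p-hat for the pooled proportion, and n1 and n2 for the two group sizes.

```latex
% Pearson chi-squared statistic: sum over all cells of the table
\[
X^2 \;=\; \sum_{\text{cells}} \frac{(O - E)^2}{E}
\]
% Score statistic for the difference in two proportions, with the
% pooled (common) proportion in the standard error; its square is X^2
\[
Z \;=\; \frac{\hat p_1 - \hat p_2}
             {\sqrt{\hat p\,(1 - \hat p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}},
\qquad X^2 \;=\; Z^2 .
\]
```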
Notice that the statistic is basically a distance. It's the distance between the observed cell counts and what you would expect to get under the model, that is, under the null hypothesis. But notice it's squared, so there's no directionality. It just measures whether things are different from what you would expect to see under the null hypothesis. So it doesn't test direction; it is effectively always testing a not-equal-to alternative.

So let's go back to our familiar example with two treatments, X and Y. Let's suppose we're comparing two equally effective treatments, and we want to determine whether one has more side effects than the other. We assigned 100 people to X and 120 people to Y, and we're going to treat the counts of people with side effects out of the totals as if they were binomial. That will be our model. The rates of side effects are two proportions, p1 for X and p2 for Y, and we're interested in whether p1 equals p2.

Okay, let's see if we can logic our way through implementing this formula. The chi-squared statistic, I've said, is the sum of the observed counts minus the expected, squared, over the expected. Well, part of it's easy, right? If we label the cells by the rows and columns of the 2 by 2 table, then the observed counts o11, o21, o12 and o22 are 44, 77, 56 and 43. That is, 44 of the 100 people on X and 77 of the 120 people on Y had side effects, and 56 and 43 did not. Now what about the expected counts? Well, let's think this through.
If the rate of side effects were the same for the two treatments, then our estimate of the common proportion, just like in the denominator of the score statistic, would have to be 121 over 220. We don't know the true value, so let's use that as our best estimate of the overall proportion of side effects regardless of treatment, because under the null hypothesis the true proportions are assumed equal.

Then how many people do we expect to see with side effects? Well, if that's the estimate of the common proportion, then out of the 100 receiving X we would expect to see 100 times that, or 55. Out of the 120 receiving the second treatment, we would expect to see 120 times this proportion, or 66. One minus this probability is 99 over 220, so that times 100, or 45, is how many people without side effects we would expect on treatment X, and 99 over 220 times 120, or 54, is what we would expect on treatment Y.

In this example the expected counts happen to come out as whole numbers, but in general they don't have to be integers in the way the observed counts must be, and they usually won't be. So, just a reminder: carry these calculations out to many decimal places rather than rounding.

So these counts here on the right-hand side represent what we would expect to see under the null hypothesis, using our best estimate of the common proportion. The margins, the 100 and the 120, are fixed by the design, and the best estimate of our common proportion is 121 over 220, so this is our best guess at what we would expect to see.
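The expected-count arithmetic above can be sketched in a few lines. This uses plain Python rather than the R used in the lecture, and the variable names are made up for illustration; the counts are the ones from the example.

```python
# Observed 2x2 table from the lecture example.
# Rows: treatments X and Y. Columns: (side effects, no side effects).
observed = [[44, 56],
            [77, 43]]

row_totals = [sum(row) for row in observed]        # fixed by design: 100, 120
col_totals = [sum(col) for col in zip(*observed)]  # 121 with, 99 without
n = sum(row_totals)                                # 220 people in total

# Pooled estimate of the common side-effect proportion under the null.
p_common = col_totals[0] / n                       # 121 / 220

# Expected count in each cell = row total * pooled column proportion.
# These are kept as floats on purpose; they need not be integers in general.
expected = [[r * c / n for c in col_totals] for r in row_totals]

for label, row in zip(["X", "Y"], expected):
    print(label, row)
```

The same recipe works unchanged for tables whose expected counts are not whole numbers, which is the usual case.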
And if these counts are very different from the observed counts, then that would be some evidence that the null hypothesis is not true. So our test statistic is 44 minus 55, squared, over 55, and so on for the other cells. Add all those up, and it turns out to be 8.96. Compare that to a chi-squared with one degree of freedom. And again, we're rejecting for large values, right? Because this is the distance between the observed and the expected counts, the further away from the expected counts we are, the bigger the test statistic is and the more it favors the alternative.

So let's do pchisq(8.96, 1, lower.tail = FALSE). It's one degree of freedom when you have a 2 by 2 table; I'm going to give you a general rule for when you don't have a 2 by 2 table. And we set lower.tail to FALSE because we want the upper probability, not the lower probability. The result is about 0.003.

The other way you can think about this, of course, is that a chi-squared with one degree of freedom is actually the square of a standard normal. It's unlikely for a standard normal to be above two or below minus two, right? There's only about a 5% chance of that happening. So a chi-squared is unlikely to be above four, the square of two and of minus two, and that has about 5% probability. So a chi-squared of about four is about the same benchmark as a normal of about two, and a chi-squared of about nine is about the same as a standard normal of about three. Again, remember that you're testing both bigger than two and less than minus two, or bigger than three and less than minus three, because the chi-squared always does a two-sided test.

So in this case the result is about 0.003, and there is some evidence to suggest that there is a difference in the rate of side effects between the two treatments, though of course the result of the chi-squared test doesn't tell you which direction it goes. Okay, so that's how we do it.
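Here is a rough sketch of that calculation, again in plain Python rather than R. It leans on the fact just mentioned, that a chi-squared with one degree of freedom is a squared standard normal, so the upper tail can be written with the complementary error function. The helper name upper_tail_1df is hypothetical, and this only covers the one-degree-of-freedom case.

```python
import math

# Observed and expected counts from the lecture example, cell by cell.
observed = [44, 56, 77, 43]
expected = [55.0, 45.0, 66.0, 54.0]

# Pearson chi-squared statistic: sum of (observed - expected)^2 / expected.
x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def upper_tail_1df(x):
    """P(chi-squared with 1 df > x), using P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2))

p_value = upper_tail_1df(x2)
print(round(x2, 2), round(p_value, 4))

# Benchmarks: chi-squared of 4 and 9 correspond to |Z| of 2 and 3.
print(round(upper_tail_1df(4), 4), round(upper_tail_1df(9), 4))

# Score statistic for the difference in proportions, pooled standard error.
n1, n2 = 100, 120
p1_hat, p2_hat = 44 / n1, 77 / n2
p_pool = (44 + 77) / (n1 + n2)
z = (p1_hat - p2_hat) / math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
print(round(z ** 2, 2))  # agrees with x2
```

The last few lines check numerically, for this one table, that the chi-squared statistic is the square of the score statistic.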
And here's some simple R code for executing it in R, so you don't have to do the calculations with a calculator. We just create a data matrix with the matrix command, and then call chisq.test(dat). You'll notice, if you do this, that you don't get exactly the same test statistic that we got. The reason is that the chi-squared approximation is an asymptotic approximation, but the counts are discrete, and you can improve the approximation by fudging a little bit. It's the same line of thinking as numerical integration, where instead of boxes you can maybe do a little better with trapezoids. It boils down to shrinking each observed-minus-expected difference a little bit, by one half, before squaring. That's called a continuity correction, basically accounting for the fact that the cell counts are discrete counts, and it can improve on the asymptotic approximation. If you put in correct = FALSE, it won't do the continuity correction, and you'll see that you get exactly the same answer we did. For didactic purposes we're not presenting the continuity correction, but when you actually do the test you want to leave correct = TRUE on there, because it does yield a better approximation. That's why it's the default in R.

Okay, so let's recap. We're going to reject if the chi-squared statistic is large. The alternative is always two sided: you're always testing whether the proportions are different. You do not divide your alpha by two, even though it is a two-sided test. Remember, the reason we divide alpha by two in the standard Gaussian case is that you're checking both bigger than and less than. Because we've squared the statistic, and the chi-squared is only a positive statistic, we don't need to do that.
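To see what the continuity correction does to the numbers, here is a minimal sketch in plain Python. The function chisq_2x2 and its correct argument are hypothetical names chosen to echo the R call; this is not R's implementation, and it assumes a 2 by 2 table, where subtracting one half from each absolute difference is the usual (Yates) form of the correction.

```python
import math

def chisq_2x2(observed, correct=True):
    """Pearson chi-squared for a 2x2 table, optionally continuity corrected.

    Returns (statistic, p_value) using the 1 degree of freedom reference
    distribution. Sketch only: assumes a 2x2 table of counts.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    x2 = 0.0
    for i in range(2):
        for j in range(2):
            e = row_totals[i] * col_totals[j] / n
            diff = abs(observed[i][j] - e)
            if correct:
                # Shrink each |observed - expected| by one half, never below 0.
                diff = max(diff - 0.5, 0.0)
            x2 += diff ** 2 / e
    p_value = math.erfc(math.sqrt(x2 / 2))  # upper tail, 1 df
    return x2, p_value

dat = [[44, 56],
       [77, 43]]

x2_raw, p_raw = chisq_2x2(dat, correct=False)   # matches the hand calculation
x2_corr, p_corr = chisq_2x2(dat, correct=True)  # slightly smaller statistic
print(round(x2_raw, 3), round(p_raw, 4))
print(round(x2_corr, 3), round(p_corr, 4))
```

The corrected statistic is always a bit smaller than the uncorrected one, so the corrected p-value is a bit larger.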
So you use alpha itself, not alpha divided by 2, for the quantile. A small chi-squared statistic implies little difference between the observed values and those expected under H0, so it supports H0. You can think of the chi-squared statistic as a distance.

And then what we'll talk about, and it's really kind of a fun subject I think, is that the chi-squared statistic and approach generalize to other kinds of tests and to larger contingency tables. It's also one of those phenomena that often occur in statistics, where the same procedure arises out of several different settings and data structures. The interpretation changes but the actual procedure stays the same, and we'll go through another one next, where we think about this problem in a different way and wind up with an identical procedure. That happens a lot in statistics. The sample mean, for example, is frequently a good estimator: whether the data are IID exponential or IID normal, it's going to estimate the population mean well. So the mean pops up all over the place. The same sort of thing happens in chi-squared testing, where you get these equivalences despite very different sampling strategies and assumptions underlying the data.

And there's a neat computational form. In the 2 by 2 case it looks like this; I'll have the notation on the next slide. Here the n i,j's are the cell counts, and you put a plus in place of an index if you're summing over that index. So these are the margins.

And here, briefly, is the notation that I'm going to be using, where i, j indexes the cell count. n plus one means summing over the first index, so that's this margin. n plus two is this margin, n one plus is this margin, and n two plus is this margin.
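The slide with the computational form isn't reproduced in this transcript, so what follows is the standard textbook shortcut for a 2 by 2 table, written in the plus notation just described. Treat it as a sketch of what the slide likely shows rather than a copy of it.

```latex
% n_{ij}: count in row i, column j; a "+" replaces the index summed over.
\[
n_{i+} = n_{i1} + n_{i2}, \qquad
n_{+j} = n_{1j} + n_{2j}, \qquad
n = n_{11} + n_{12} + n_{21} + n_{22}
\]
% Shortcut form of the chi-squared statistic for a 2 by 2 table
\[
X^2 \;=\; \frac{n\,\bigl(n_{11}\,n_{22} - n_{12}\,n_{21}\bigr)^2}
               {n_{1+}\; n_{2+}\; n_{+1}\; n_{+2}}
\]
```

With the counts from the example, this is 220 times (44 times 43 minus 56 times 77) squared, over 100 times 120 times 121 times 99, which is about 8.96, the same value as before.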
And if I need to refer to n1 and n2, let's say those are the row margins, and then n will be this corner cell right here, the total number of observations.

So, an interesting fact about the chi-squared statistic: if we transpose the table, the statistic doesn't change its value, which is kind of interesting, right? It means it doesn't care about which margin is fixed by the study design. If you errantly thought of which treatment they received as the outcome, you'd get the same thing as if you thought of side effects as the outcome. You're going to get the same test, even though only one of them matches the way in which the experiment was actually conducted. So that's interesting, and it ties into the test's utility in a lot of instances where you really want to play around with the interpretation of which variable is the outcome and which variable is the predictor. We'll talk about that when we talk about case-control studies. It's interesting that, for case-control studies, some of the fundamental work was done right here at Hopkins by a person named Cornfield.

At any rate, the chi-squared statistic can arise in several ways. You can state a model for which the chi-squared statistic is kind of the obvious thing to do. If the rows are fixed, so you have binomials, you can do it. If the columns are fixed, which is just a different kind of binomial, you can do it. Again, we're assuming binomialness of the data in either of these cases. But let's imagine that you didn't assume the data were binomial with the rows or columns fixed. Let's say you assumed that only the total sample size is fixed.
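The transposition fact is easy to check numerically. Here is a small sketch in plain Python; the function name pearson_x2 is made up for illustration, and the expected counts are computed as row total times column total over the grand total, which is the same thing as row total times the pooled column proportion.

```python
import math

def pearson_x2(table):
    """Pearson chi-squared statistic for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    x2 = 0.0
    for i, r in enumerate(row_totals):
        for j, c in enumerate(col_totals):
            e = r * c / n  # expected count for cell (i, j)
            x2 += (table[i][j] - e) ** 2 / e
    return x2

# Rows are treatments, columns are (side effects, no side effects).
by_treatment = [[44, 56],
                [77, 43]]

# Transposed: rows are (side effects, no side effects), columns are treatments.
by_outcome = [list(col) for col in zip(*by_treatment)]

x2_original = pearson_x2(by_treatment)
x2_transposed = pearson_x2(by_outcome)
print(round(x2_original, 3), round(x2_transposed, 3))
```

Both calls give the same value because the formula treats rows and columns symmetrically.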
So, as an example of that, imagine that instead of randomly assigning 100 people to receive one treatment and 120 other people to receive the other treatment, you just happened to go out and collect a bunch of people, and asked them what treatment they had and whether or not they had side effects. Now, granted, it's a different experiment, and the way in which you would interpret the result of a chi-squared test would be different. But there you might think: I sampled 220 people, and that was really the part that was fixed by the design. However people fell out in terms of side effects and treatment, well, that's part of the randomness. So I'm going to model the whole thing as if it were multinomial with four possible outcomes. They could be taking treatment X with no side effects, treatment X with side effects, treatment Y with no side effects, or treatment Y with side effects. There are four possible cells in a 2 by 2 table, and in this case neither margin was fixed.

I'll go through it right now, but you wind up with exactly the same test statistic if you assume multinomialness and a specific null hypothesis, namely independence. I find that cool. And that's why people are often a little bit loosey-goosey about their assumptions in multinomial settings, because the test applies in different ways. I do think there's a problem with that, because, yes, the number that comes out of the test statistic is the same, but the interpretation is very different. Different experiments lead to very different interpretations of the result and of the assumptions.
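One way to see why the multinomial setup gives the same statistic is to write out the expected counts under independence. This is a sketch in my own notation: the pi's are the multinomial cell probabilities and their margins, which are symbols I'm introducing here, and the n's are the cell counts and margins in the plus notation.

```latex
% Independence: each cell probability is the product of its margins,
% and the margins are estimated by the observed marginal proportions.
\[
H_0:\ \pi_{ij} = \pi_{i+}\,\pi_{+j}, \qquad
\hat\pi_{i+} = \frac{n_{i+}}{n}, \qquad
\hat\pi_{+j} = \frac{n_{+j}}{n}
\]
% Expected count under independence with only the total n fixed
\[
E_{ij} \;=\; n\,\hat\pi_{i+}\,\hat\pi_{+j}
       \;=\; \frac{n_{i+}\,n_{+j}}{n}
       \;=\; n_{i+}\cdot\frac{n_{+j}}{n}
\]
```

The last expression is the row total times the pooled column proportion, which is exactly the expected count from the binomial setup, so the observed-minus-expected sum is identical.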
In the fictitious example I just gave you, in one case we randomized the treatment, and in the other case we just went out and sampled people observationally. Those lead to very different interpretations. I think most people would agree that the randomization would average over confounding effects from unobserved variables, whereas in the observational case that wouldn't necessarily be true.