Okay, so now is a good time to talk about confidence intervals and their connection to hypothesis testing, since we've just covered hypothesis testing and two-sided tests. Consider testing H0: mu = mu0 versus Ha: mu != mu0, like we just discussed in the previous slide with respect to the Respiratory Disturbance Index. Now take the set of all values of mu0 for which you fail to reject H0, and think about this set. These are, at some level, the values of mu that are supportable as null hypotheses; they're the reasonable values of mu. So it isn't that big of a stretch to guess that this set forms a confidence interval for mu. What's interesting is that it forms exactly a 100 * (1 - alpha)% confidence interval for mu. So if you have a 5% type I error rate for the set of tests, then you get a 95% confidence interval, which is nice. And the same works in reverse, which is probably the more useful direction for us: if the 95% interval contains mu0, then we fail to reject H0. Which makes sense, right? The value mu0 was supported as a potential value of mu when we created the confidence interval, so it would make sense that we'd elect to fail to reject H0 rather than conclude that mu was different from mu0.

Let's briefly go through the argument. We fail to reject H0 for the two-sided test of H0: mu = mu0 versus Ha: mu != mu0 if our test statistic satisfies

|Xbar - mu0| / (s / sqrt(n)) <= t(1 - alpha/2, n - 1),

the t quantile at 1 - alpha/2 with n - 1 degrees of freedom. (Remember, we reject when the statistic is bigger than that particular t quantile.) Since s / sqrt(n) is positive, we can multiply both sides by it and preserve the inequality, which gives

|Xbar - mu0| <= t(1 - alpha/2, n - 1) * s / sqrt(n).

And that's exactly equivalent to the statement that mu0 lies between Xbar - t(1 - alpha/2, n - 1) * s / sqrt(n) and Xbar + t(1 - alpha/2, n - 1) * s / sqrt(n), which is exactly the same as saying that mu0 lies inside the confidence interval. So this shows that if mu0 lies inside the confidence interval, then we would have failed to reject H0, and you can reverse the argument to get the other direction. That proves the statements from the previous slide, and it shows the inherent duality between confidence intervals and two-sided hypothesis tests.

This duality has several uses. First, if you create, say, a 95% confidence interval, it conveys more information than the result of a hypothesis test alone: you can read off the result of the hypothesis test, but the interval also gives you a sense of which values of mu are well supported.
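To make the duality concrete, here's a small sketch in R. The data are simulated stand-ins for the Respiratory Disturbance Index example; the sample size, mean, and standard deviation here are made up purely for illustration. It scans a grid of candidate null values mu0, keeps the ones the two-sided t test fails to reject, and checks that this set matches the t confidence interval.

  set.seed(1)
  x <- rnorm(16, mean = 32, sd = 10)   # hypothetical RDI-like sample
  n <- length(x); xbar <- mean(x); s <- sd(x); alpha <- 0.05

  # Candidate null values, and whether the two-sided t test fails to reject each
  mu0  <- seq(xbar - 4 * s / sqrt(n), xbar + 4 * s / sqrt(n), length.out = 10000)
  fail <- abs(xbar - mu0) / (s / sqrt(n)) <= qt(1 - alpha / 2, n - 1)
  range(mu0[fail])                     # endpoints of the non-rejected set

  # They agree with the 95% t confidence interval
  xbar + c(-1, 1) * qt(1 - alpha / 2, n - 1) * s / sqrt(n)
  t.test(x, conf.level = 1 - alpha)$conf.int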
The interval also helps combat things like the difference between scientific significance and statistical significance. Suppose we have a giant sample size in our Respiratory Disturbance Index example and our Xbar is 30.01. That isn't very different from 30, and it may not be scientifically meaningful even if it is statistically significant. The confidence interval would show us what range of values of mu is estimated, and that our interval is in fact quite close to 30, even if 30 isn't in it; plus, we can still mechanically perform the hypothesis test. That's why, I think, people generally have a preference for reporting a confidence interval, when you can, rather than simply the result of a hypothesis test.

Okay, let's introduce the concept of P-values. When we had a sample of size 100 and were doing the z test, we rejected the one-sided hypothesis test when alpha was 0.05. Would we have rejected if alpha were 0.01? How about 0.001? At some point, of course, we're going to get to where the z quantile is larger than our observed test statistic, and that will correspond to a specific alpha level. That value, the smallest alpha for which you still reject the null hypothesis, is called the attained significance level. It is numerically equivalent to, but philosophically a little different from, an entity called the P-value. The P-value, which again is the same number but conceptually a different thing in my opinion, is the probability, under the null hypothesis, of obtaining evidence as or more extreme than what was observed by chance alone, where chance is governed by the null probability distribution.

P-values were invented by the great statistician Fisher. The attained significance level has an easy logic to it: why not just report the smallest significance level for which you would still reject? If I give someone that number, then they'll know, whatever their alpha level is, whether or not they reject. If their alpha happens to be bigger than the attained significance level, they reject; if it's smaller, they fail to reject. So thinking about it at that level, the attained significance level is merely a convenient thing to report, because regardless of a person's alpha level, they can compare it to the P-value and tell whether or not they reject. The P-value, on the other hand, has a more interesting interpretation, because at some level people claim it's a measure of evidence. Here's the logic: if the P-value is small, then either the null hypothesis is true and we've observed something very unlikely given that it's true, or the null hypothesis is false. That's why Fisher introduced the P-value. He thought it was a convenient, calibrated entity, because it was a probability: it would tell you, in a sense, whether getting a test statistic as or more extreme than the one you observed was rare under the null hypothesis, and if it was rare, that casts some doubt on the veracity of the null hypothesis. This use of the P-value as a measure of evidence is, I think, a little more controversial of an entity.
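Here's a small R sketch of the attained-significance-level logic. The one-sided z statistic of 2 is an assumption for illustration only (this slide doesn't restate the value from the earlier sample-size-100 example); the point is just that the smallest alpha that still rejects equals the one-sided P-value.

  ts <- 2   # assumed z statistic, for illustration

  # Reject at level alpha when ts > qnorm(1 - alpha); try shrinking alphas
  sapply(c(0.05, 0.01, 0.001), function(alpha) ts > qnorm(1 - alpha))
  # TRUE FALSE FALSE: somewhere between 0.05 and 0.01 we stop rejecting

  # The smallest alpha that still rejects, the attained significance level,
  # is exactly the upper tail probability beyond the observed statistic
  pnorm(ts, lower.tail = FALSE)   # about 0.023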
The attained significance level, which again is the exact same number under a different interpretation, is maybe a less controversial entity. It's merely a mathematical answer to the question: what's the smallest alpha level for which I would have rejected the null hypothesis?

Okay, so let's calculate the P-value from our example, and let's do it for our t statistic. The sample size is 16 and our test statistic was 0.8. What's the probability of getting a t statistic as large as or larger than 0.8? In R, this is pt(0.8, 15, lower.tail = FALSE): pt stands for the t probability, 15 is the degrees of freedom, and lower.tail = FALSE just means we want the probability above 0.8, not below it. This works out to be about 22%, which of course is larger than, say, 5%. We knew that, because we failed to reject the null hypothesis: if the P-value is larger than alpha, you fail to reject; if it's smaller than alpha, you reject. So the probability of seeing evidence as or more extreme than what was actually obtained, calculated under the null hypothesis, is 22%.

Let's draw a picture. There's our t distribution, and right here is where our test statistic 0.8 falls; the probability of lying above it under the t distribution, the shaded area, is 22%. Obviously 0.8 is below the upper fifth percentile, which would be up here somewhere, so we know we would fail to reject; and we also know, because our P-value of 22% is larger than 5%, that we would fail to reject.

Okay, some notes. By reporting a P-value, the reader can perform the hypothesis test at whatever alpha level he or she chooses, because the P-value is mathematically equivalent to the attained significance level: if the P-value is less than alpha, you reject the null hypothesis; if the P-value is bigger than alpha, you fail to reject. For two-sided hypothesis tests, my recommended P-value calculation is just to double the smaller of the two one-sided P-values. That's an easy procedure, and it's generally right. And don't just report P-values: give confidence intervals too. That's a little hard when the problem is more than one-dimensional, but if it's a one-dimensional problem, then you have no excuse not to give a confidence interval rather than just a P-value.
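The calculation from this example, together with the doubling recommendation, as a quick R sketch:

  # One-sided P-value: probability of a t statistic at or above 0.8
  # with 16 - 1 = 15 degrees of freedom
  pt(0.8, 15, lower.tail = FALSE)       # about 0.22, larger than alpha = 0.05

  # The 5% rejection threshold that our statistic fell short of
  qt(0.95, 15)                          # about 1.75, well above 0.8

  # Two-sided P-value: double the smaller one-sided P-value
  2 * pt(0.8, 15, lower.tail = FALSE)   # about 0.44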
Okay, some final thoughts about the P-value. One of the problems with P-values is that they only consider significance, unlike confidence intervals, so with a P-value alone it's difficult to distinguish practical significance from statistical significance. By itself, it's a little too crude a summary of your data. It's a tremendously useful quantity, but it needs to be used with care. There's quite a bit of work on the philosophy of whether P-values measure evidence, and the argument against the P-value is that absolute measures of the rareness of an event are not necessarily good measures of evidence for or against a hypothesis. That's the intrinsic philosophical bind the P-value is in: it's a measure of evidence by virtue of being a measure of the rareness of the observed result under the null hypothesis, in a certain sense. And certainly P-values can become somewhat abusively used and are frequently misinterpreted. One of the main issues is that the actual interpretation of the P-value is hard: it's the probability of obtaining a test statistic as or more extreme in favor of the alternative than the one observed, where the calculation is done under the null hypothesis. People try to interpret P-values in all sorts of other ways, because that interpretation sounds a little complicated. But that is the actual interpretation. So the P-value is a confusing quantity. Get used to regurgitating the whole definition correctly, so that you don't take shortcuts and give incorrect definitions. Because, let me also say, people love to complain about P-values, so you should get your P-value interpretations correct.