Greetings. In the next two sets of lectures, we'll be talking about the topic of statistical hypothesis testing. To get this started, in this lecture set we're going to first explore the general concept of hypothesis testing, give an overview, and then look at procedures for comparing means between two populations in both the paired and unpaired study designs. So, first, we'll talk about hypothesis testing for population comparisons and give a conceptual overview before we start getting into the mechanics and the results of hypothesis tests for comparing means between two populations. Then we'll follow up in the next lecture set by looking at hypothesis tests for comparing proportions between two populations and incidence rates between two populations. Upon completion of this section, you will begin to understand a conceptual framework for the process of statistical hypothesis testing and how confidence intervals, which we've looked at in the previous lecture sections, and hypothesis testing are related. I'm going to use this preamble that I've used previously just to remind us that we're building on some of the same ideas we've seen before, but looking at them with a different approach. Again, frequently in public health, medicine, science, etc., researchers and practitioners are interested in comparing two or more outcomes between two or more populations using data collected on samples from these populations. It's not only important to estimate the magnitude of the difference in the outcome of interest between the two groups being compared, but also to recognize the uncertainty in that estimate when we extrapolate back to the full populations, for which we could not directly observe all the data. One approach to recognizing this uncertainty is to create confidence intervals, which we did in the previous section for population comparison measures; a complementary approach is called hypothesis testing.
So, the types of two-group comparisons we have parallel what we did for confidence intervals: when comparing means between two populations, we have hypothesis tests corresponding to the paired design and also the unpaired design. When we're comparing binary outcomes, incidence rates (which we can use for time-to-event data), or survival curves between populations, we have the unpaired approach. For the next two sets of lectures, we'll be looking at comparing two populations at a time, and then we'll follow up with a lecture set to show how these approaches can also be extended to allow for more-than-two-population comparisons in one test for unpaired studies: how to compare three or more populations with one test based on three or more samples. Just like with confidence intervals, the hypothesis tests are built on a common application of the central limit theorem, and it turns out that differences of two quantities whose distributions are approximately normal also have a normal distribution. As such, we can extend the basic principles of the central limit theorem to understand and quantify the sampling variability for our measures of association: things like mean differences between two independent populations, differences in proportions between two independent populations, and the natural logs of ratios, which, as we saw before, can be expressed as differences, the log of the numerator minus the log of the denominator. So, let's think about what we did with confidence intervals, especially when we were comparing populations through a measure like a mean difference, risk difference, or a ratio. The creation of a confidence interval for a measure of association uses the results of the central limit theorem, coupled with the properties of the normal distribution, to create an interval that likely includes the unknown truth for our population comparison measure.
So, what we did there was say: look, the central limit theorem tells us that if we repeated a study of the same size an infinite number of times, estimated our two-sample comparison measure each time, and created a histogram of our estimates of that comparison measure, whether it be a mean difference, a difference in proportions, or a ratio on the log scale, we'd get a curve that is approximately normal if we smoothed it over the histogram of those values, and it would be centered at the unknown truth. The central limit theorem says we don't actually have to repeat the study an infinite number of times to know this would happen. Furthermore, we learned how to estimate the variability in this curve using a single study result: we can estimate what we call the standard error of our estimates around the unknown truth. We can then exploit this whole logic to say: look, we don't know the truth, but we do know that if we were to do this study an infinite number of times, the distribution of all possible estimates of the unknown truth would behave in a normal fashion around that unknown truth. Our study is one of that infinite number of possible studies, so our estimate will be somewhere under this curve; it could be right next to the truth, it could be far out in a tail, etc. For most of the studies we could do, most of the two-sample comparisons based on samples we get by random or representative sampling from the respective populations, the estimates will fall within plus or minus two standard errors of that truth. So, if we start with our estimate and add and subtract plus or minus two standard errors (there are slight corrections for mean differences in smaller samples, but this is the general rule), we'll get an interval that includes the truth most of the time.
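The "repeat the study an infinite number of times" idea can be sketched in a short simulation. All the numbers here (population means, standard deviations, sample sizes, the number of repeated studies) are made-up assumptions for illustration, not figures from the lecture:

```python
import random
import statistics

random.seed(42)

# Hypothetical populations: true means 10 and 8, both with SD 4,
# so the true mean difference is 2 (all values are illustrative).
true_diff = 2.0
n = 100           # sample size per group
n_studies = 5000  # stand-in for the "infinite number" of repeated studies

diffs = []
for _ in range(n_studies):
    sample1 = [random.gauss(10, 4) for _ in range(n)]
    sample2 = [random.gauss(8, 4) for _ in range(n)]
    diffs.append(statistics.mean(sample1) - statistics.mean(sample2))

# Standard error of a mean difference between two independent samples:
# sqrt(sd1^2/n1 + sd2^2/n2); here both SDs are 4 by assumption.
se = (4**2 / n + 4**2 / n) ** 0.5

# Proportion of estimated differences within +/- 2 SEs of the truth
within = sum(abs(d - true_diff) <= 2 * se for d in diffs) / n_studies
print(f"SE = {se:.3f}; proportion within 2 SEs of truth: {within:.3f}")
```

Running this, the histogram of `diffs` is approximately normal, centered at the true difference of 2, and roughly 95 percent of the simulated estimates land within two standard errors of the truth, which is exactly the behavior the confidence interval logic relies on.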
So, what we're saying there is: with our data results, here's my best estimate from my data, and here's a range of possible values for the unknown truth once I add in uncertainty; we're using my data results to take me to the truth. But another approach to getting at the unknown truth from our data is to go in reverse: to start with some competing, exhaustive possibilities for the unknown truth about the population comparison measure, and use data from the samples we have to choose between these possibilities. In this process of hypothesis testing, there are usually two possibilities for the unknown truth; the first is very specific and the second is very broad. The first possible truth we usually deal with is that there is no difference in whatever we're measuring (means, proportions, etc.) between the two populations being compared. That's a very specific statement. The other possibility, which is mutually exclusive and covers all other possibilities, is that there is a difference in our measure between the two populations. These two competing truths are often called the null hypothesis and the alternative hypothesis. The null is the hypothesis that there's no difference in what we're measuring between the two populations; the alternative is that there is a difference in this measure between the two populations. We can express these null and alternative hypotheses in several equivalent ways for the types of data outcomes we have considered. The two possibilities could be phrased in terms of the null values we discussed in lecture set eight. For example, when comparing means between two populations, if the means are the same at the population level, mu one equals mu two, then their true difference in means is zero. That would be our null hypothesis regarding the difference in means.
If the means are not the same, mu one does not equal mu two, then the difference in means is not zero, and that's the exhaustive other piece of the two competing hypotheses. So again, in terms of the measures we work with: for means, as we've already shown, the null hypothesis is that the true means are equal, or that their difference is zero, versus the alternative that their difference is not zero. For binary measures, the null is that the true proportion with the outcome is the same in both groups we're comparing, versus the alternative that they're not the same. We can express this in terms of the risk difference (we'll get into this in more detail in the next lecture set): the null is that the difference is zero versus the alternative that the difference is not zero. We can also do it in terms of the relative risk, the odds ratio, et cetera. Then, for time-to-event measures, and measures where we don't have the exact times but are computing incidence over a time period, the null hypothesis is that the incidence rate of the outcome of interest is the same between the two groups being compared, versus the alternative that they're different, and we can express that in terms of the incidence rate ratio being one or not as well. So, how can the study data be used to choose between one of the two truths while accounting for the uncertainty in our study data? Well, again, this theoretical sampling distribution, the distribution of all possible estimates of the truth around the truth, will be utilized in this process, and we can again appeal to it via the results of the central limit theorem. So, let's just recap. The idea for confidence intervals is that the sample estimated difference or comparison measure will be close to the unknown truth; if we go from our estimate plus or minus two standard errors of our estimate, most of the time this will yield an interval that includes the truth.
So, most of the time, the sample estimated difference will be close, in terms of being within two standard errors, to the truth. Hypothesis testing doesn't start with our estimate and try to come up with an interval for the truth; it starts by postulating the truth and saying, well, let's assume that the truth is no difference in whatever we're comparing. We can generalize this to other measures, but this could be that the mean difference is zero, or the difference in proportions is zero, or the relative risk is one (which on the log scale is the zero we've been looking at), et cetera. It says: if that's the truth, then the estimate we get, however we characterize this distance, we would expect to be close to zero; we would expect it to be within plus or minus two standard errors of zero, again with big samples and using the result of the central limit theorem. So, what we're going to do with hypothesis testing is start with this assumed truth of zero and then measure how far our result is from zero in terms of standard errors. If our result is relatively close, within plus or minus two standard errors of zero, we're going to say our result is consistent with the null hypothesis being the truth, and we're going to stick with that. But if our result is far away from the assumed difference under the null (by "far" we'll define this more stringently, but generally speaking it means more than two standard errors away in either direction, anywhere beyond that two standard error bound), we're going to say it's inconsistent with the null: it's not likely to have occurred when the null is the truth, because our result is far from the null, and we're going to reject the null in favor of the broad alternative hypothesis that the difference is not zero.
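The "distance from zero in standard errors" idea can be made concrete with a small sketch for the unpaired mean-difference case. The two samples below are invented numbers for illustration, and the standard error formula is the usual large-sample one for two independent groups:

```python
import math
import statistics

# Illustrative unpaired samples (made-up values, not the lecture's data)
group1 = [5.1, 6.3, 4.8, 7.0, 5.9, 6.4, 5.5, 6.8, 5.2, 6.1]
group2 = [4.2, 5.0, 4.6, 5.3, 4.1, 4.9, 5.5, 4.4, 4.7, 5.1]

mean_diff = statistics.mean(group1) - statistics.mean(group2)

# Estimated standard error of the mean difference (unpaired design):
# sqrt(s1^2/n1 + s2^2/n2), using the sample variances
se = math.sqrt(statistics.variance(group1) / len(group1) +
               statistics.variance(group2) / len(group2))

# Distance of our estimate from the null value (0), in standard errors
distance = (mean_diff - 0) / se
print(f"mean difference = {mean_diff:.2f}, SE = {se:.2f}, "
      f"distance = {distance:.2f} SEs")

# The rough rule from the lecture: farther than 2 SEs from 0 is
# inconsistent with the null hypothesis
if abs(distance) > 2:
    print("More than 2 SEs from 0: inconsistent with the null")
else:
    print("Within 2 SEs of 0: consistent with the null")
```

With these made-up samples the estimated difference sits several standard errors above zero, so under the rule just described we would treat the result as inconsistent with the null.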
In the end, to make a decision about whether we're far away or not, we'll convert our distance from the null value, measured in standard errors, to a p-value, which is the proportion of results that are as far or farther than our result from that assumed true difference of zero. So, what the p-value measures, and we'll invoke this many times to get it straight, is the probability of getting a study result as extreme as or more extreme than ours, in other words, as far or farther from the null value (another way to say that is as unlikely or more unlikely than what we got), by chance alone, just by random sampling variability, if the null hypothesis is the underlying population truth. The idea is that if our p-value is relatively low, it's not that likely to get a result like ours if the underlying truth were that the difference of interest is zero, and we're going to reject the null in favor of the alternative. So, we're going to use the p-value to make a decision about the two competing hypotheses. Again, the p-value gives us a probability. We're going to have to decide whether our results are "likely" or "unlikely" to have occurred when the null is true, and we're going to have to make that decision based on the probability of getting something like our results, or something even more extreme, when the null is true. So, we need a rule for likely versus unlikely. The generally, almost universally utilized cutoff (even if you don't know what p-values are but have heard of them, you probably know this) for whether a result is unlikely or likely is 0.05. This can be altered, but 0.05 is what's generally used in science and research. This is called the rejection level, or the alpha level, and we would reject the null if our p-value comes in less than this cutoff of 0.05 and fail to reject the null if it comes in at greater than or equal to 0.05.
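The conversion from "distance in standard errors" to a p-value can be sketched with the large-sample normal approximation. The specific distance of 2.5 standard errors below is just an assumed example; the helper uses `math.erf`, which gives the standard normal CDF without any external libraries:

```python
import math

def two_sided_p_value(distance_in_ses):
    """Two-sided p-value: the probability, under the null, of a result
    as far or farther from the null value than observed (in either
    direction), using the large-sample normal approximation."""
    def standard_normal_cdf(x):
        return 0.5 * (1 + math.erf(x / math.sqrt(2)))
    return 2 * (1 - standard_normal_cdf(abs(distance_in_ses)))

alpha = 0.05  # the conventional rejection (alpha) level from the lecture

# Suppose a study's estimate sits 2.5 standard errors from the null value
p = two_sided_p_value(2.5)
decision = "reject the null" if p < alpha else "fail to reject the null"
print(f"p-value = {p:.4f} -> {decision}")
```

Note that a distance of exactly 2 standard errors gives a p-value of about 0.046, just under 0.05, which is why "more than about two standard errors away" and "p less than 0.05" are essentially the same rule.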
This cutoff certainly doesn't have to be 0.05, but as we'll see, 0.05 corresponds to a 95 percent confidence interval, and 0.05 corresponds to being two standard errors away from the truth, which is generally the rule we've used to determine close versus not. How is the p-value used to make a decision about the two competing hypotheses? Well, again, if p is less than the cutoff of 0.05, the decision is made to reject the null hypothesis in favor of the alternative hypothesis, and we would call the result statistically significant. We've used that term before, but with confidence intervals rather than hypothesis testing, and we'll see in a moment that they're related. If p is greater than or equal to 0.05, the decision is to fail to reject the null hypothesis. You might ask, well, why not just say we accept the null hypothesis? We're going to have to use somewhat ambiguous, neutral language when we fail to reject, when p is greater than or equal to 0.05, because depending on the characteristics of our study that may be an ambiguous conclusion, and in the next set of lectures we'll talk about why that is. So, what is the relationship between the 95 percent confidence interval, the appropriate null value for a comparison measure, and the resulting p-value? Generally speaking (we'll explore this in more detail; we'll bookend this lecture set with more detail on this after we've looked at some examples of hypothesis testing), if p is less than 0.05 and we reject the null, then the 95 percent confidence interval for the measure of interest, whether it be a mean difference, a difference in proportions, et cetera, will not include the null value. If p is greater than or equal to 0.05, then the 95 percent confidence interval for the measure of interest will include the null value.
Since we're using the same data to create confidence intervals and do hypothesis testing, we would expect, and we'll show, that they will always come to the same general conclusion about the null value, whether through the confidence interval or the hypothesis test. In summary, confidence intervals and hypothesis testing are two complementary ways of addressing uncertainty in sample-based comparisons when making statements about the unknown population comparisons. Both methods operate on the principle that for most random-sample-based studies, the sample results should be close to the truth, and if they're not, we will reject certain possibilities for the truth. In the next two sections of this lecture set, we'll look at specific examples of hypothesis tests with real data for comparing means between two populations, and then we'll bookend this lecture section with part one of a debriefing on p-values, which we'll pick up and attend to in the following lecture set.