[MUSIC] One of the three approaches to drawing inferences from your data was the path of action: the Neyman-Pearson approach. In the Neyman-Pearson approach, the main goal is to control the errors you make when you draw inferences from your data. When you say, "Yes, I accept the null hypothesis," or, "I will reject the null hypothesis," you control the number of times in the long run that you make a mistake. We talk about this in terms of the Type I and Type II errors that you can make. The goal of error control is to prevent making a fool out of yourself too often in the long run, and I think this is a very useful thing to try when you draw inferences from your data. Now let's take a look at an example. Let's say you're walking down the street and you come across this person. You have some suspicion that this person might be a vampire, but it could also be someone who's not getting enough sunlight. You have to make a decision to either stake or not stake this individual. And this person can be a true vampire, or just a random person on the street. So there's a possibility that you correctly identify this person as a vampire and stake them. But you can also make two types of errors. A Type I error is concluding, "This is a vampire," when it's actually not a vampire. In this case you would stake an innocent person. The second type of error, the Type II error, is saying, "This is probably not a vampire," when it actually is a vampire. In this case, you're probably going to end up dead, because this vampire is going to suck all the blood out of you. Let's take a look at some definitions related to this topic. First, we have to define the null hypothesis and the alternative hypothesis, which we'll often indicate with H0 and H1.
It's important to keep in mind that the null hypothesis doesn't necessarily need to be the prediction that the difference between certain conditions is exactly zero. Think about flipping a coin, in which case the null hypothesis is probably that there's a 50% probability of flipping heads or tails. So the null hypothesis can be anything. But very often, it's the idea that there's no difference between two experimental conditions. The alpha that we use is the probability of a significant result when the null hypothesis is true. This is the definition of the Type I error rate. You're free to set this alpha at any level you want. A very commonly used level in psychology is 5%, but we already saw that in some fields, like physics, it's common to set a much, much lower Type I error rate. You can also think of areas such as medicine where it might be useful to set a much lower error rate as well. The beta is the probability of a non-significant result, given that the alternative hypothesis is true. This is the Type II error rate. So this is the probability that you'll say, "There is nothing going on here," when there's actually a true effect to be observed. If we look at 1 - beta, this is the probability of a significant result when the alternative hypothesis is true. This is the statistical power. When you design a study, one minus the Type II error rate is the probability that you'll actually find the result if there is a true effect to be found. Now remember that Neyman thinks of these error rates as frequentist concepts, so they apply to the long run. In any single study there's either a true effect or not; you were right or you were wrong. But these error rates are about long-run frequencies. So in the long run, you can make a mistake 5% of the time, but in any single study, you've either made a mistake or not. It's important to think about this in the long run.
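The long-run nature of these error rates can be made concrete with a small simulation. This is my own sketch, not part of the lecture: it assumes a simple one-sample z-test on normally distributed data (known standard deviation of 1) and checks that, when the null hypothesis is true, roughly 5% of studies come out significant in the long run.

```python
import random
import statistics

# Simulate the long-run Type I error rate: draw many samples from a world
# where the null hypothesis is true (true mean = 0) and count how often a
# two-sided z-test on the sample mean is "significant" at alpha = 0.05.
random.seed(1)
ALPHA = 0.05
N = 30            # observations per simulated study
STUDIES = 20_000  # "the long run"

z_crit = statistics.NormalDist().inv_cdf(1 - ALPHA / 2)  # two-sided cutoff, about 1.96

false_positives = 0
for _ in range(STUDIES):
    sample = [random.gauss(0, 1) for _ in range(N)]  # H0 is true here
    z = statistics.fmean(sample) * N ** 0.5          # z = mean / (sigma / sqrt(n)), sigma = 1
    if abs(z) > z_crit:
        false_positives += 1

print(f"observed Type I error rate: {false_positives / STUDIES:.3f}")  # close to 0.05
```

In any single simulated study the conclusion is simply right or wrong; only across the 20,000 studies does the 5% error rate emerge.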
So if you do many, many studies, how often will you make an incorrect conclusion? Because we have these two situations, where either the null hypothesis is true or the alternative hypothesis is true, and you can either observe a significant effect or find a non-significant result, we can divide the options into these four cells. Here we can see that when the null hypothesis is true and you find a significant effect, you've made a false positive, a Type I error. You said, "There is something here," when there is actually nothing. When the null hypothesis is true, you can also correctly say that there is no true effect, which we call a true negative. Now, when the alternative hypothesis is true and you find a significant effect, that's a true positive. Something is going on, and you correctly identified it. You can also draw the conclusion that there is nothing going on, when there is actually a true effect. We call this a false negative. Remember that in any single case you're either sampling from a world where there is a true effect or from one where there isn't. So you're in one of two parallel worlds. Either there is no true effect, and then you can only make false positives or true negatives. Or you're doing research where there is actually a true effect, and then you can only reach a true positive or a false negative. Now let's take this understanding of Type I errors and Type II errors, and true positives and true negatives, to think about what's most likely to happen in your next study. I'll describe a typical situation which I think is fair and describes many of the studies that we do. Let's say that you don't really know whether there's a true effect or not. The null hypothesis might be true, the alternative hypothesis might be true, and both are equally likely. Let's give them both 50%; we just don't know what's going to happen. We also use a Type I error rate, the alpha, of 5%, which is very typical in most research.
And we'll aim for statistical power of 80%, which means that there is a Type II error rate of 20%. If there is a true effect, we will not observe it 20% of the time. That sounds like a lot, maybe, but this is actually a commonly recommended power to aim for. So in this specific situation, let's assume that this reflects what you want to do. You use this Type I error rate, and you exactly meet this statistical power: you have 80% power in your next study. You never really know, but let's assume that this is the case. Think for a moment about what's most likely to happen. Will you find a significant result or a non-significant result? Make a false positive or a false negative? Now let's do the math and see what really happens. As we said, there's a 50% probability that the null hypothesis is true and a 50% probability that the alternative hypothesis is true. Remember, we just didn't know what was going to happen. We have a 5% Type I error rate and a 20% Type II error rate. So we can use these numbers to calculate the probability of each of these four cells, and of course, they should add up to 100%. Let's first assume that the alternative hypothesis is true, so we're examining a true effect. The probability of finding a significant effect equals the statistical power that we have, which is 80%, and it's 50% probable that the alternative hypothesis is true. So we can multiply 80% by 50%, and we find that it's 40% likely that we correctly conclude that there's a true effect: a true positive. It's also possible that we find a non-significant effect. This will happen 20% of the time. Multiplying 20% by the 50% probability of the alternative hypothesis being true gives us a 10% probability of making a false negative. Now let's move over to the situation where the null hypothesis is true. Here we have a 5% error rate. We can multiply this 5% by the 50% probability that the null hypothesis is true, and 2.5% of the time we'll make a false positive.
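The multiplications just described can be written out in a few lines of Python. This is an illustrative sketch of my own (the helper name `outcome_probabilities` is made up, not from the lecture); it simply multiplies the prior probability of each hypothesis by the relevant error rate or its complement.

```python
# Hypothetical helper: probabilities of the four outcomes, given the prior
# probability that the alternative hypothesis is true, alpha, and power.
def outcome_probabilities(prior_h1, alpha, power):
    prior_h0 = 1 - prior_h1
    return {
        "true positive":  round(prior_h1 * power, 4),
        "false negative": round(prior_h1 * (1 - power), 4),
        "false positive": round(prior_h0 * alpha, 4),
        "true negative":  round(prior_h0 * (1 - alpha), 4),
    }

# The scenario from the lecture: 50/50 prior, alpha = 0.05, power = 0.80.
print(outcome_probabilities(prior_h1=0.5, alpha=0.05, power=0.80))
# {'true positive': 0.4, 'false negative': 0.1,
#  'false positive': 0.025, 'true negative': 0.475}
```

The same helper reproduces the alternative scenarios the lecture turns to next: raising power to 99% gives a true positive probability of 0.495, lowering alpha to 1% gives a false positive probability of 0.005, and a 90% prior gives a true positive probability of 0.72.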
We'll say that there is an effect when there's actually no effect to be found. And now we come to the most likely outcome in this very typical situation: a true negative. There's a 95% probability that we'll conclude that nothing is going on, multiplied by the 50% probability that the null hypothesis is true, which gives us 47.5%. So that's the highest number of these four possible outcomes. And this might be surprising, but if you do a new study, you don't know whether the null is true or the alternative is true, and you use the recommended alpha level and the recommended statistical power, then the most likely outcome of your next study is a true negative. Nothing is going on, and you correctly conclude that nothing is going on. So you might be surprised that the most likely outcome is a true negative, because very often you're interested in finding a significant effect and finding support for the alternative hypothesis: saying, "There is a true effect," and correctly concluding that there is a true effect. So how can you improve these probabilities from the previous slide? Well, there are different options. Let's take a look at what works best. You might say, "The problem here is probably that I only had 80% power. When there is a true effect, I only had an 80% probability of finding the effect when it's there, and I need to increase my statistical power." In this slide you can see the red numbers, where we have increased the power to 99%, which means that almost all of the time, you correctly conclude that there is an effect when there is a true effect to be observed. Our true positive rate has indeed increased, but not by much. You can see that it's now 49.5% likely that you'll observe a true positive. It is now the most likely outcome, but it's not ridiculously high. Another idea you might have is to say, "Well, probably the false positive rate is too high. Let's reduce the false positive rate." Set it at 1%, so the Type I error rate is now 1%.
And you can see where the changes occur. Now, of course, the difference is only in the situation where the null hypothesis is actually true, because you can only make a Type I error if the null hypothesis is true. So we see that there's a reduction in the probability of a false positive, from 2.5% to only 0.5%. But still, true negatives are most likely to happen. So this all doesn't really help. So what's the most important thing to do if you want to improve the probability of finding a true positive? What you should do is examine hypotheses that are likely to be true. The biggest problem here is the 50/50 prior probability of the null and the alternative hypothesis being true. If it's about 90% probable that we are right, that our hypothesis is 90% likely to be true when we start data collection, you can see that the probability of a true positive increases sharply. Now, I always use these calculations whenever a master's student enters my office and says, "I have a crazy idea. I don't know if it's true or not, but this is my passion, this is something that I really find interesting, and I want to examine it." I always say, "That's perfectly fine." But we don't know if the hypothesis is true or not. It's 50/50; it could go either way. So realize that the most likely outcome of your research is a true negative. If you're interested in finding support for the alternative hypothesis, the best thing you can do is examine hypotheses that are most likely to be true. Now let's use a slightly different way to think about these Type I and Type II errors and true positives and true negatives. This is an illustration of the null hypothesis and the alternative hypothesis in terms of normal distributions. You can visit this website and play around with this visualization if you want to. Let's look at all the different components here, because there's quite a lot of information.
The white bell-shaped curve is the situation when the null hypothesis is true, when there is no true effect to be observed. You can see that it's centered at zero. So this reflects the situation where the null hypothesis is true: there is no true effect, but there is some random variation. Now, if we take a look at the light blue bell-shaped curve, this is the distribution when there is a true effect, and in this case there's a true effect of a specific size, namely 0.35. We'll talk about effect sizes later in the course, but for now, you can see that finding large effects is much more likely under the alternative hypothesis, while under the null hypothesis, you'll find effect sizes close to zero most of the time. There's a certain critical value that we often use. It's also illustrated here, and in this case it corresponds to a p-value of 0.05. So this critical value is determined by your Type I error rate. Now, given this cutoff across the two distributions, we can make a division between the four cells that we saw earlier, but now we're visualizing them in another way. If we look at the white bell-shaped curve, the white area reflects the true negatives: there is no true effect to be observed, and most of the time we will correctly conclude this. We can also make a false positive. This is still part of the distribution under the null, so the white curve slowly moves towards the red area that you can see, with the arrow for the false positives pointing to it. These are situations where we observe an extreme result when the null hypothesis is true. We're saying that there's something, but we're actually making a Type I error. Now let's switch our attention to the light blue area. You can see that this falls on the right side of the critical value. So in this case, we are observing p-values smaller than 0.05, and we are correctly saying that there is an effect. We're observing a significant result, and there is a true effect to be observed.
So the light blue area corresponds to the true positives that we'll observe. Sometimes we'll conclude that there is no effect when we're actually sampling from the reality where there is a true effect. This is illustrated by the dark blue area. These are false negatives: the values we observe are not extreme enough to lead to a p-value smaller than 0.05. Now, the important question when you're designing a study is: which error rates should you aim for? I've already mentioned that the alpha level and the beta that we often use are an alpha of 0.05 and a beta of 20%, or a statistical power of 80%. However, statisticians themselves say you should not use these benchmarks; you should think for yourself about what the best choice is. Neyman himself talks about it in this way: "Is it more serious to convict an innocent man, or to acquit a guilty?" If you're convicting an innocent man, this person has done nothing, but you're saying this person has done something, so this is a Type I error. If you're acquitting a guilty man, this person has done something wrong, but you're saying no crime has been committed, so this is a Type II error. Depending on your beliefs, one of these two types of errors might be more serious. In the case of medicine, we can see that you have to weigh these two types of errors very carefully. If you make a Type II error and say, "You are not sick. You don't need any treatment," when the person is actually sick, there might be very severe consequences and this person might die. So doctors are more willing to make the other error, saying, "You might have a specific sickness; we're not completely sure," and then follow-up research with more extensive tests will confirm this. So Neyman himself says that "determining how the balance must be struck should be left to the investigator". So it's up to you to decide what types of errors, or what levels of errors, you find acceptable.
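Where do numbers like 80% power come from in a picture like this? As a rough sketch of my own (assuming a simple one-sample z-test with known standard deviation, a simplification of what such visualizations typically show), power is just the area of the alternative distribution that falls beyond the critical value:

```python
from statistics import NormalDist

# Under H0 the test statistic follows N(0, 1); under H1 it is shifted by
# d * sqrt(n), where d is the standardized effect size. Power is the area
# of the shifted distribution beyond the two-sided critical value (the
# tiny area in the opposite tail is ignored here).
def z_test_power(d, n, alpha=0.05):
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    shift = d * n ** 0.5                          # shift of the H1 distribution
    return 1 - NormalDist(mu=shift).cdf(z_crit)

# With the effect size from the visualization (d = 0.35), about 64
# observations give roughly 80% power under these assumptions:
print(round(z_test_power(0.35, 64), 2))  # 0.8
```

Lowering alpha moves the critical value to the right, shrinking the light blue (true positive) area and growing the dark blue (false negative) area, which is exactly the trade-off Neyman describes.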
You don't really need to use these defaults, although they are very tempting. Cohen himself recommended this 80% power threshold, and when he did, he explicitly hoped that the recommendation would be ignored whenever you, as a researcher, can make a better informed judgement. At our department, for example, we typically use 90% power when we're designing new studies that we hope will lead to insights we can publish. So we have a 90% threshold because we try to spend tax money on research that will give us informative results. Even when we don't find a significant result, we want to be reasonably sure that there's really nothing there, that we didn't miss anything. In cases where you're examining very impactful things, where finding an effect might be a real breakthrough, missing out on a true effect is very costly, so you want to increase your power, or reduce your Type II error rate, as much as possible. Most of the scientific discoveries that lead to Nobel prizes are made by people between 35 and 40 years of age. So in my case, I'm of course keeping my Type II error rate as small as possible, because if I miss out on something at this moment, I might even be missing out on a Nobel prize-winning idea. So for me personally, I try to have high power and low Type II error rates; the Type I error rate is slightly less important. At the same time, we see that people are very, very hesitant to change Type I error rates. You could even increase it: you could set your Type I error rate to 10%, and that's perfectly possible if, in your specific situation, this is acceptable. But it's almost as if Moses came down from Mount Sinai with tablets containing ten commandments, and one of them was: you will always set your alpha to 0.05. People are very hesitant to change this, but if you really know what you're doing and you're making an informed judgement, this is perfectly possible.
So it's very useful to think about how you can control your error rates when you're designing a new study. You want to minimize one or both of these types of errors, so that in the long run, you won't make a fool out of yourself too often. [MUSIC]