Hi! In this module, I'm going to introduce the phenomenon of regression to the mean. You may have heard about regression to the mean in the context of parents and their offspring: parents who are extraordinary on some measure will, unfortunately for them, have children who, on average, are less extraordinary than they were. The children will be closer to the mean. For example, the children of parents with very high IQ will, on average, be closer to the mean IQ of their generation than their parents were. If average IQ isn't changing over time, then the offspring of high-IQ parents will, on average, have lower IQ than their parents did. This applies to almost anything you can measure when you look at the association from one generation to the next, whenever the first generation was selected from out at the tail of a distribution. If you think about high-income parents, for example, their children will, on average, have incomes closer to the mean of their own generation than their parents' incomes were to theirs.

Now, regression to the mean shows up in a lot of different places, sometimes in unexpected ways. So I want to sensitize you to it and talk about how we have to account for it when we design a research study. Let's look in detail at what it is.

Let's go back to what we covered about correlations in the previous module. Correlations are always between -1 and 1. When we talked about standardized scores, or z-scores, we learned that this means an observation one standard deviation from the mean on one variable is predicted, on average, to sit somewhere between -1 and +1 standard deviations from the mean on the other variable. In other words, a one standard deviation difference in one variable can't be associated, on average, with a difference of more than one standard deviation in the other. In the hypothetical example we constructed in the previous module, a one standard deviation change in X was associated with a 0.79 standard deviation change in Y, because the correlation was 0.79.

That may sound very technical, so let's walk through how it relates to the examples we just talked about, like the IQ of parents and children. When we go back from standardized scores to the original values, the implication is this: if two variables are correlated at less than one and we pick an observation that is extreme on one of them, then on average its value on the other variable will be closer to the mean of its distribution, in standard deviation terms. In our example, an observation that is one standard deviation away from the mean on X will, on average, be only 0.79 standard deviations away from the mean on Y.

This has some subtle implications for research design that you have to think about carefully. In particular, it matters for studies that use samples, or look at populations, that have been selected for extreme values on some distribution. That is more common than you might think.
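If it helps to see that relationship spelled out, here is a minimal sketch in Python. Everything is expressed in z-scores, and the 0.79 correlation is just the value from our hypothetical example.

```python
# A minimal sketch of what a correlation implies for standardized scores.
r = 0.79      # assumed correlation between X and Y, from the module's hypothetical example
z_x = 1.0     # an observation one standard deviation above the mean on X

# With z-scores, the best linear prediction of one variable from the other
# has a slope equal to the correlation, so the predicted position on Y is:
predicted_z_y = r * z_x
print(predicted_z_y)   # 0.79 -- closer to the mean than the observation was on X
```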
Imagine that in our made-up example, X and Y are test scores at two points in time. Each observation is one person: we measure their test score at one point in time, that's X, and then six months or a year later we test them again and record the score, that's Y. The correlation between the two is 0.79, so a one standard deviation difference in the first score is associated, on average, with a 0.79 standard deviation difference in the second.

Now imagine we want to test an intervention for poorly performing students. I'm going to walk through this example to show you some of the unusual ways regression to the mean can creep into our research. We'll assume that the hypothetical data we've been using represent the test scores of these students at time one and time two, and that, in general, test scores for the same person are correlated across time at the 0.79 level. Because we're interested in helping the students who are performing least well, we select the students whose scores the first time we administer the test are more than 0.5 standard deviations below the mean. We're picking the students in the lower tail of the distribution, because they're the ones we want to help. Their average score turns out to be about one standard deviation below the mean. Here I'm talking in terms of the z-scores, or standardized scores, from the last module: standard deviations above or below the mean. We take these students and conduct our intervention, perhaps a new curriculum or something else, and we call Y the score we measure at time two.

Why aren't the two scores perfectly correlated? Even though a person's underlying ability isn't changing, how well they do on a test bounces around: one day they may feel better, another day worse, and so on. So the scores are correlated, at our assumed level of 0.79, even if ability and everything else stays the same. Now we go back, after the intervention is complete, and measure the students we selected for being more than 0.5 standard deviations below the mean. Even if the intervention did nothing at all, the mean of their time-two scores, the Y in this analysis, will be about 0.79 standard deviations below the mean, which is 0.21 standard deviations higher than the first time we administered the test.

This is the important point about regression to the mean. If you run a before-and-after comparison on a population you selected because it had extreme values on some measure you care about, perhaps test scores, heart rate, or cholesterol levels, and you measure them again at some later point, then as long as the correlation between the measurements is not one, you're almost guaranteed that their average the second time will be closer to the mean, even if your intervention achieved nothing.
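Here is a small simulation, again with purely made-up numbers, that mimics this setup: standardized scores at two times correlated at 0.79, no real change in ability, no intervention, and a group selected for scoring more than 0.5 standard deviations below the mean at time one.

```python
import numpy as np

rng = np.random.default_rng(0)
r = 0.79            # assumed test-retest correlation
n = 100_000         # number of simulated students (purely hypothetical data)

# Standardized scores at two times, correlated at r, with no real change in
# ability and no intervention of any kind.
x = rng.standard_normal(n)                               # time-one z-scores
y = r * x + np.sqrt(1 - r**2) * rng.standard_normal(n)   # time-two z-scores

# Select the poorly performing students: more than 0.5 SD below the mean at time one.
selected = x < -0.5

print(x[selected].mean())   # around -1.1 here; roughly one SD below the mean at time one
print(y[selected].mean())   # roughly r times that, i.e. noticeably closer to the mean at time two
```

The exact numbers depend on the data, but the pattern is the one described above: the selected group's time-two average sits closer to the mean than its time-one average, with nothing having been done to the students at all.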
So here, as we just showed, even in the absence of an effective intervention, regression to the mean and the assumed correlation in test scores from one time to the next imply that people initially selected for being at the bottom of the distribution will, when measured again a few months or years later, be closer to the mean on average. In other words, their average score will be higher than it was the first time we measured them: it goes from about -1 standard deviations to about -0.79, an apparent improvement of 0.21, even if the intervention didn't work or we did nothing at all.

I'll come back to other ways this creeps in, but first let's talk about how to deal with it, especially in an intervention study or experiment where we're selecting people into the sample based on extreme values, in this case test scores. Even for a before-and-after comparison, we need a control-and-treatment design, so that we can compare the trajectories of the people who received the intervention and the people who did not. In the figure, the treatment group is identified with circles around their dots and the control group is left without circles: we've divided the original students into two groups at random. We give the teaching intervention to the treatment group, leave the control group alone, and then test everyone again six months or a year later. If the intervention has an effect, it will show up as a difference between the control and treatment groups: they will have differing trajectories. The control group should see some improvement in scores just because of regression to the mean; if the intervention works, the treatment group should see a larger improvement; if the intervention is ineffective, the change in scores in the treatment group will be the same as in the control group. So if the time-two scores differ between control and treatment, we have an effect.
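To see why the control group rescues us, here is a sketch that extends the earlier simulation: the selected students are randomized into treatment and control, and a made-up treatment effect of 0.3 standard deviations is added only for the treated group. Both groups drift back toward the mean, but the comparison between them recovers the effect.

```python
import numpy as np

rng = np.random.default_rng(1)
r = 0.79            # assumed test-retest correlation
n = 100_000         # simulated students (hypothetical data)
effect = 0.3        # made-up "true" intervention effect, in standard-deviation units

x = rng.standard_normal(n)                               # time-one z-scores
y = r * x + np.sqrt(1 - r**2) * rng.standard_normal(n)   # time-two z-scores, before any intervention

# Select the low scorers, then randomize them into treatment and control groups.
selected = np.where(x < -0.5)[0]
treated = rng.random(selected.size) < 0.5
y[selected[treated]] += effect                           # only the treated students get the boost

control_change = y[selected[~treated]].mean() - x[selected[~treated]].mean()
treated_change = y[selected[treated]].mean() - x[selected[treated]].mean()

print(control_change)                   # positive even with no treatment: regression to the mean
print(treated_change)                   # larger, by roughly the true effect
print(treated_change - control_change)  # about 0.3: the control-treatment comparison isolates the effect
```

If the intervention were ineffective, that last difference would be close to zero, even though both groups' scores improved.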
So, where does this come in? There are a lot of settings where a simple before-and-after comparison will suggest that an intervention or a policy change had an effect, especially when the study population was selected based on extreme values, and the apparent effect will be due to regression to the mean. Let me give you some common examples. One is medical treatments. Think about somebody's medical condition, perhaps how well they feel or whether they're experiencing pain. If you measure how much pain people report at different points in time, the measurements will not be perfectly correlated; the reported pain will vary from one time to another, with some correlation less than one.

So if you think you have a new way of helping people with pain, and you select a group of people who report being in great pain at a particular point in time, then on average, when you look at them six or twelve months later, that group, selected for its extreme values, will probably be better off than it was at the first point in time, simply because of regression to the mean. There may be new people experiencing more pain, but the original group, selected for being in the tail of the distribution, will probably be better off than it was. So anything we did in the meantime will appear to have had an effect. That's why medical studies that test interventions can't just be before-and-after comparisons: if we're testing a treatment for pain, we have to take the people who are suffering at time one, divide them into two groups at random, leave one group alone, and administer the treatment to the other.

I'm singling out medical treatments because they're a rich source of examples of this problem. Most of the time, when we want to find a treatment for somebody, it's because they're at the extreme of some distribution on a measure we care about: their blood pressure is very high, or their cholesterol is very high. These are all variables that are not perfectly correlated from one measurement to the next; the correlations are well below one in absolute value, much closer to zero. So if you're trying to test an intervention for high blood pressure and you start by selecting people with very high blood pressure at one point in time, then even if you do nothing at all, their average blood pressure six months later will, on average, be closer to the mean. It will look like whatever you did worked, unless you have adopted a control-and-treatment design.
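To put this back into original units rather than z-scores, here is a back-of-the-envelope sketch; the blood-pressure mean, standard deviation, and correlation are all assumed numbers chosen only for illustration.

```python
# Back-of-the-envelope illustration in original units; every number here is hypothetical.
mean_bp = 120.0   # assumed population mean systolic blood pressure, mmHg
sd_bp = 15.0      # assumed standard deviation, mmHg
r = 0.6           # assumed correlation between two measurements taken months apart

group_time1 = 160.0                          # average for a group selected for very high readings
z_time1 = (group_time1 - mean_bp) / sd_bp    # how extreme the group is, in SD units (about 2.7)

# Expected group average at the second measurement, with no treatment at all:
# the z-score shrinks toward zero by a factor of r.
expected_time2 = mean_bp + r * z_time1 * sd_bp
print(expected_time2)   # 144.0 mmHg -- an apparent 16-point "improvement" from doing nothing
```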
Another way regression to the mean creeps in is in assessments of policy changes. Policy changes, like medical treatments, are not distributed at random. They tend to happen when people feel that some social phenomenon is out of control or at an extreme: people introduce new laws to prevent crime when crime rates are high, because if crime rates are low or just typical, why do anything? So extremes in the prevalence of a phenomenon that concerns people tend to trigger interventions via policy. Now think of the crime rate as a variable that, measured at multiple points in time, drifts around. It is correlated across time, but not at one or -1; probably at some much lower level. If the crime rate at some point in time is an extreme value within its overall distribution, and that is what led people to panic and pass new laws, then on average the crime rate measured at some later point will probably be closer to its long-term mean even if the laws did nothing, and the intervention may appear to have had an effect. It's much harder to run control-and-treatment designs for policy changes, so we often end up simply with arguments about whether a policy induced some change or whether it was just regression to the mean.

Now, in some cases, economists and others do look for natural experiments, as we discussed in previous lectures: situations where policies were changed at different times in different places, and so forth. These can sometimes give insight into cause and effect and get us away from the problems with regression to the mean.

The other example I'd like to draw to your attention is educational interventions, as in the example I just gave you. We're all interested in finding ways to make things better for our least well-performing students. But if we conduct studies where we select students because of their poorer performance, then on average they're going to score better the next time we measure them anyway. We have to do something about that, most likely with a control-and-treatment design, which is a little easier to arrange in an educational setting. The flip side is an intervention meant to help gifted kids: we pick students at the top of the distribution and plan some kind of enrichment to push them even further. I hate to break the news, but on average it's always going to look like the enrichment made them do less well, because on average their scores will probably be closer to the mean the next time you measure them. That isn't necessarily a reason to be disappointed in the intervention; it just means there's a need for a more carefully designed study, again with a control and a treatment group.

Overall, I hope I've convinced you that regression to the mean, even though it sounds somewhat paradoxical and is tied to the structure of correlations, is an important phenomenon to think about when you're designing a study or interpreting results presented to you by other researchers. So keep regression to the mean in mind, especially when you're trying to assess the effects of interventions or other changes.