Hi there, in the previous video, we spent some time looking at the possible causal connections between trust and other social, political and economic variables. In this video, we will see how social scientists try to establish and verify such links. This is a strict scientific process, employing statistical techniques known as correlation and regression analysis. Now, if you're already familiar with these, you can always skip this lecture. On the other hand, if you want to refresh your memory, you're welcome to stay on board. And for those of you who don't know anything about these techniques, don't worry. You don't have to actually do anything yourself; I'm just going to give you a passive knowledge of what's going on. Not quite as painful as it sounds.

Well, let's start then with a list of countries for which we have measurements, and we'll call these measurements x and y. These could be anything; it doesn't matter at all. The first question one wants to answer is: is there any connection between the two? Okay, how do we go about it? Well, the first thing we want to do is to plot them on a graph, one on the horizontal axis and the other on the vertical axis. We'll do it twice with different examples. The top graph shows no relation whatsoever between the two, but in the one below, you can already see a pattern emerging. But how close is that relationship? Do we have a simple way of expressing the degree of closeness? The answer is that we do, and that measure is called correlation. What correlation does is attach a statistical measure, written as a small r, to the relationship between the two. And r can vary from nought, or zero, where there is no relationship at all, to r equals 1 for complete unity between the two measures. Our top graph will be close to 0; the bottom one close to 0.7. Now, if both variables increase or decrease together, we talk about a positive correlation and give it a plus sign. On the other hand, if an increase in one is accompanied by a decline in the other, we talk about a negative correlation and give it a minus sign. In our bottom graph, then, we have a positive correlation.

Now, so far we've made no judgements on what causes what. We can take the analysis further when we do make a hypothesis: that one of the variables, x, is actually the cause of the other, y. And when we do this, we automatically make the implicit hypothesis that there might not be any relationship at all between the two. This is called the null hypothesis. So, let's get back to our graph. We assume that changes in x cause changes in y, and we have the causal variable on the horizontal axis. The next challenge now is to draw a line through the graph that provides the best fit for all the observations. Now, I've taken our graph and tried to draw a line, or in fact two lines, that seem best to fit the data. One is in red, the other is in blue, and they both seem to do the job quite well. But that's not really scientific. What I need is a formula that will allow me to draw what really is the best-fit line: a line that minimizes the distance between each of the data points and the hypothetical line, or relationship, I'm trying to establish. Such a formula exists, but I don't need you to know it. Well, not now anyway, and nowadays a computer does it for you. This line is called the least squares regression line. Now, the regression line is usually expressed as a formula, which stipulates how high or low on the y axis the line begins at the zero value of x, and the direction and gradient of the line.
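To make the idea of r a little more concrete, here is a minimal sketch of how the correlation coefficient might be computed in Python. The ten x and y values are made up purely for illustration; they are not the lecture's actual country data.

```python
# A minimal sketch: computing the correlation coefficient r
# for two made-up sets of measurements, x and y.
import numpy as np

# Hypothetical values for ten countries (illustrative only).
x = np.array([2.1, 3.4, 1.8, 4.0, 2.9, 3.7, 1.5, 4.4, 2.3, 3.1])
y = np.array([1.9, 3.0, 2.2, 3.8, 2.6, 3.5, 1.7, 4.1, 2.0, 2.8])

# np.corrcoef returns a 2x2 correlation matrix; r is the off-diagonal entry.
r = np.corrcoef(x, y)[0, 1]
print(f"r = {r:.2f}")
# Close to +1: strong positive correlation (the two rise and fall together).
# Close to -1: strong negative correlation (one rises as the other falls).
# Close to 0: no linear relationship at all.
```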
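And since, as noted above, a computer draws the least squares regression line for you, here is an equally minimal sketch of fitting that line with a standard routine, again on made-up data.

```python
# A minimal sketch: fitting the least squares regression line y = a + b*x,
# where a is the intercept (the value of y when x is zero) and b is the gradient.
import numpy as np

# The same kind of made-up data as before (illustrative only).
x = np.array([2.1, 3.4, 1.8, 4.0, 2.9, 3.7, 1.5, 4.4, 2.3, 3.1])
y = np.array([1.9, 3.0, 2.2, 3.8, 2.6, 3.5, 1.7, 4.1, 2.0, 2.8])

# np.polyfit with degree 1 finds the gradient and intercept that minimize
# the squared vertical distances between the data points and the line.
b, a = np.polyfit(x, y, deg=1)
print(f"y = {a:.2f} + {b:.2f} * x")

# The fitted line can then be used to predict y for a new value of x.
print(f"predicted y at x = 5.0: {a + b * 5.0:.2f}")
```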
Note that I've extended the axes, since the relationship expressed in the regression line should hold for the missing data as well. So, if I add the results of ten more countries, they should show the same pattern, and it should also predict the pattern for any other matching set of data juxtaposing x and y. For example, this could be the data for the same countries, but for 2013 instead of 2012. But how sure are we of this relationship? How confident are we that this is not a chance result? Well, basically, our confidence depends on the closeness of the relationship, the correlation, and on the number and range of the observations. There are statistical tables for doing this, but nowadays they're embedded in computer programs, and such tests can confirm the degree of confidence you can have, statistically, in the result. Confident, for example, that the data for the following year for the same countries would show the same relationship, or confident that the same relationship would appear with a different group of countries. So, alongside the r value and the regression line formula, social scientists should also tell you the confidence level. And depending on that confidence level, you can go ahead on the basis of the supposition that x causes y. The confidence level is usually 99%, but sometimes 95% is okay. And this is all you need to know for now.

But note: first, this is a purely statistical relationship. Second, we still need to check that the initial data is accurate. Third, we need to check whether the hypothesis is plausible. Fourth, we need to ask ourselves whether there might not be a reverse causation. And last but not least, we need to check whether the author gives us the confidence level or error margin in the results. There should always be one, but often there isn't.

Now, despite all of this, there are still disagreements among social scientists. Why should that occur? Well, sometimes the data is incomplete. How many countries out of the total are in the comparison? Is there a bias in the ones that are missing? Often the truth is uncertain. Time after time, variables are entered into the calculation ignoring the fact that they have error margins of their own. And again, the historical periods chosen for comparison might be different, and therefore the quality of the data might have changed in the interval, as it often does. Another reason is that the data is chosen as a proxy for reality: an artificial construction, labelled as representative of something else. But does it really cover the issue as claimed?

So, let's sum up now. In this video, we've examined the way in which social scientists try to establish statistical relationships between sets of variables. We've dealt with correlation, regression and confidence levels. Now, that's not bad, but we need to internalize these concepts, because they're absolutely necessary whenever anybody tells you that more of x leads to more of y. And believe me, they're telling you this all the time. Next week, we'll look at the various ways in which society might be fragmented and what the implications might be for levels of trust and for governments.
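As a final illustration of the confidence levels discussed above, here is a minimal sketch of how a computer program reports them, once more on made-up data. It uses scipy's linregress, which returns a p-value for the null hypothesis that there is no relationship between x and y; a p-value below 0.01 corresponds to the 99% confidence level, and below 0.05 to the 95% level.

```python
# A minimal sketch: testing the null hypothesis of no relationship
# between x and y, and reading off the confidence level.
import numpy as np
from scipy import stats

# Made-up values for ten hypothetical countries (illustrative only).
x = np.array([2.1, 3.4, 1.8, 4.0, 2.9, 3.7, 1.5, 4.4, 2.3, 3.1])
y = np.array([1.9, 3.0, 2.2, 3.8, 2.6, 3.5, 1.7, 4.1, 2.0, 2.8])

# linregress reports r, the regression line, and a p-value for the
# null hypothesis that the slope is zero (no relationship at all).
result = stats.linregress(x, y)
print(f"r = {result.rvalue:.2f}, p-value = {result.pvalue:.4f}")

if result.pvalue < 0.01:
    print("Reject the null hypothesis at the 99% confidence level.")
elif result.pvalue < 0.05:
    print("Reject the null hypothesis at the 95% confidence level.")
else:
    print("Cannot reject the null hypothesis; the result may be chance.")
```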