So the first question we're going to raise now is: how do we understand the effectiveness of averaging a crowd? Let's take the Galton experiment, and let's say that the true weight of the ox is x. This can be any positive number, and it follows a certain distribution; we'll soon be talking about the expectation of error relative to this distribution of x. Now, everybody has some error term, which could depend on the actual weight x and also differs from one person to another. We call y_i the estimate of the i-th participant in the experiment: y_i = x + epsilon_i(x). So this is an additive error model of the estimate provided by each person, and let's say there are n of these people, so i goes from 1 to n, say n = 787. Now we're going to assume that these error terms are both unbiased and independent. Unbiased means that the expectation of the error term epsilon_i(x) for each user i, taken over the distribution of x, is zero. Sometimes you may overestimate, sometimes you may underestimate, but you would not systematically over- or underestimate, and this holds for all users. That's unbiased. Independent means that this error term only depends on i; it does not depend on any other user j. In reality, neither the unbiased nor the independent assumption is true in general. We'll often see systematic bias: certain people tend to overestimate while others tend to underestimate. And sometimes these errors are also dependent; in fact, in lecture seven we'll look at an example of information cascades, where the dependence of estimates destroys the wisdom of crowds. Now, for Amazon reviews, are they unbiased or independent? Essentially not unbiased [inaudible], although you may be able to use a user's ID and her rating history to do some normalization. And they are sort of independent: most people enter a review based on their own opinion, so even if a reviewer can see the existing ratings and reviews, she may not be reacting in response to them.
But sometimes they do. Sometimes a review says, "the previous reviews are clearly biased; I'm here to correct that." So there is also some element of dependence on Amazon. Alright, so now we are going to look at the so-called wisdom of crowds, exemplified by Galton's experiment, by comparing two terms: one is the average of the errors made by each individual participant, the other is the error of the averaged estimate, and the hope is that the error of the average estimate is much smaller than the average of the errors. Let's first look at the average of errors. This is easy: the error term here is epsilon_i of participant i, and we're going to use the L2 norm, or mean squared error, just like last lecture for Netflix, except over there we used the root mean square. Sometimes we take the square root, but the idea is the same; I'm going to use the squared error as the metric to quantify error terms. So we'll look at epsilon_i squared, then take the expectation of this term over the distribution of x, then sum over all the participants from 1 to n, and take the average by dividing by n: (1/n) sum_i E[epsilon_i(x)^2]. This is what we call the average error, or the averaged squared error to be more precise. Just remember this expression. On the other hand, we also want to look at the error of the average. First of all, what is the average? The average is the sum of the y_i divided by n, and the error of the average is that minus x: (1/n) sum_i y_i - x. This could be positive or it could be negative, but it is the same as (1/n)(sum_i y_i - n x), obviously. So we can also write this as (1/n) sum_i (y_i - x), by bringing x inside the summation, since there are effectively n copies of it. And this is just (1/n) times the summation of the errors, because each term y_i - x inside the summation is just epsilon_i by definition. So now, this error of the average is what we want to look at.
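As a concrete sketch, here is a small simulation of these two quantities under the additive error model. The numbers are illustrative only (the true weight, the error spread, and the random seed are made up; only n = 787 comes from Galton's experiment), and a Gaussian error distribution is assumed purely for illustration:

```python
import random

random.seed(0)

n = 787       # number of participants, as in Galton's experiment
x = 1198.0    # true weight of the ox (illustrative value, not from the lecture)

# Additive error model: y_i = x + eps_i, with unbiased, independent errors.
# Gaussian errors with standard deviation 50 are an assumption for illustration.
eps = [random.gauss(0.0, 50.0) for _ in range(n)]
y = [x + e for e in eps]

# Average of the squared errors: (1/n) * sum_i eps_i^2
avg_of_sq_errors = sum(e**2 for e in eps) / n

# Squared error of the average: ((1/n) * sum_i y_i - x)^2
sq_error_of_avg = (sum(y) / n - x) ** 2

print(avg_of_sq_errors, sq_error_of_avg)
```

For a crowd this large, the squared error of the average comes out far smaller than the average of the squared errors, which is exactly the comparison the lecture sets up.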
And we're going to look at this error of the average through the expectation, over the distribution of x, of (1/n) sum_i epsilon_i(x), this whole thing squared. This equals 1/n^2, which we can take out of the expectation, times the expectation of the square of sum_i epsilon_i(x). Now you may say: isn't this expression the same as the last one? It looks like a summation of expectations of epsilon squared, but not quite. The averaged error takes the sum of the expectations of the squared epsilons, whereas this one takes the expectation of the square of the sum. So this one is talking about the square of a summation, not the summation of squares, and the two are clearly different. For example, take the two-user case: (epsilon_1 + epsilon_2)^2, which is what the error of the average is about, equals epsilon_1^2 + epsilon_2^2 + 2 epsilon_1 epsilon_2, and the first two terms are what the averaged error is about. The difference is the collection of these cross terms. But if the errors are independent, then the expectation of epsilon_1 times epsilon_2, taken over the distribution of x, is zero, because these two error terms are not correlated with each other. So the cross terms are cancelled, and in that case the sum of squares and the square of the sum are the same under the expectation. Therefore this expression is the same as the previous one, and the only difference is the multiplicative factor in front: 1/n in one case, 1/n^2 in the other. And so we have the desired relationship: the error of the average (average first, then look at the error) is only 1/n of the averaged error (compute the error of each individual first, then take the average). These two differ by a multiplicative factor of n.
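This factor-of-n relationship can be checked numerically with a Monte Carlo sketch, approximating both expectations by averaging over many repeated draws (the Gaussian error distribution, crowd size, and trial count are assumptions for illustration):

```python
import random

random.seed(1)

n = 10          # crowd size (illustrative)
trials = 20000  # Monte Carlo repetitions to approximate the expectations
sigma = 1.0     # standard deviation of each error term (illustrative)

avg_err = 0.0   # running estimate of E[(1/n) * sum_i eps_i^2]  (averaged error)
err_avg = 0.0   # running estimate of E[((1/n) * sum_i eps_i)^2] (error of average)
for _ in range(trials):
    eps = [random.gauss(0.0, sigma) for _ in range(n)]
    avg_err += sum(e**2 for e in eps) / n
    err_avg += (sum(eps) / n) ** 2
avg_err /= trials
err_avg /= trials

# Independence kills the cross terms, so err_avg should be about avg_err / n.
print(avg_err, err_avg, avg_err / err_avg)
```

The printed ratio comes out close to n = 10, matching the derivation: the square of the sum loses its cross terms under the expectation, leaving only the 1/n versus 1/n^2 prefactors.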
And this multiplicative factor of n is codified as an example of the wisdom of crowds in Galton's experiment. Simple averaging can work given unbiased and independent estimates: there are no bias terms, and the cross terms cancel each other under the expectation over the distribution of x, so we get a factor-of-n enhancement. If there are five people, it's five times better; if there are 1000 people, it's 1000 times better in terms of the mean squared error. Now, what if the estimates are completely dependent? Then we don't need these calculations at all: whether you average first and then look at the error of that average, or you just look at the average of the individual errors, doesn't matter. They are the same if they're completely dependent. If they're somewhat dependent, then you would have to look at the expectation of these error terms' correlations in more detail, but you'll be somewhere between a factor of 1 and a factor of n (or 1/n, depending on how you look at it). This sounds very promising and encouraging, with a simple derivation that seems able to identify the root cause of the wisdom of crowds. Let's look at the positive side before we look at some cautionary remarks. The positive side says that as long as the estimates are independent of each other, you have the wisdom of crowds. This particular type of wisdom has nothing to do with identifying who's the expert. It just says that you might be wrong in all kinds of directions, but as long as you are all wrong in different directions, you get the power of this factor-of-n enhancement. In hindsight it's hardly a surprise: this is really the law of large numbers, plus the convexity of the quadratic function, at play here. Now, you may raise a first objection: what if I actually know who the experts are? Maybe she's a farmer. Well, in the advanced material we'll look at how we can use boosting methods to extract the more important opinions of the experts.
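The completely dependent case can also be sketched numerically: if every participant makes the exact same error, the averaged error and the error of the average coincide, and averaging buys nothing (a hypothetical setup, with a Gaussian shared error assumed for illustration):

```python
import random

random.seed(2)

n = 10          # crowd size (illustrative)
trials = 20000  # Monte Carlo repetitions

avg_err = 0.0   # estimate of E[(1/n) * sum_i eps_i^2]
err_avg = 0.0   # estimate of E[((1/n) * sum_i eps_i)^2]
for _ in range(trials):
    shared = random.gauss(0.0, 1.0)  # one shared error term: complete dependence
    eps = [shared] * n               # every participant makes the same mistake
    avg_err += sum(e**2 for e in eps) / n
    err_avg += (sum(eps) / n) ** 2
avg_err /= trials
err_avg /= trials

print(avg_err, err_avg)  # the two quantities coincide: no factor-of-n gain
```

Partially correlated errors would land between these two extremes, between a factor of 1 and a factor of n, as the lecture notes.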
That will be a very different philosophy. The philosophy in what we just talked about only depends on the fact that you can all be wrong, as long as you're wrong in different ways, independent of each other. Now, a second objection: what about the scale? This factor-of-n effect holds whether n is two or n is 1,000,000. To most people, the wisdom of crowds means that there should be some threshold value n star above which you see the effect and below which you don't. But this one applies just as well to the two-people case, so that's a mystery that remains to be resolved. The third objection, or rather addition to this discussion, is that this factor of n is only one view on the wisdom of crowds, which we call the multiplexing gain. There's another view, of diverse thinking, that says: if there's some bad event that you don't want to happen, then by putting n of these entities together you get a 1 - (1 - p)^n, where n is the size of the crowd. This is called the diversity gain, while the factor-of-n effect is the multiplexing gain. We will encounter this diversity gain in later chapters. In fact, we encounter both kinds of gains in both technological and social networks, from wireless networks like WiFi and LTE all the way to Galton's example. But this is our first encounter with one side of this coin.
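Under one common reading of that formula (an assumption here, since the lecture gives only the expression): if each entity independently gets it right with probability p, then with n entities the chance that at least one gets it right is 1 - (1 - p)^n, which grows quickly with n. A minimal sketch:

```python
def diversity_gain(p: float, n: int) -> float:
    """Probability that at least one of n independent entities succeeds,
    when each succeeds with probability p (interpretation assumed)."""
    return 1.0 - (1.0 - p) ** n

# Illustrative values of p and n; with p = 0.5, the probability of at
# least one success rises from 0.5 toward 1 as the crowd grows.
for n in (1, 2, 5, 10):
    print(n, diversity_gain(0.5, n))
```

Contrast this with the multiplexing gain above: there, the crowd shrinks the error by a factor of n; here, the crowd shrinks the probability that everyone fails.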