0:14

This number x can be any positive number, and it follows a certain distribution.

So, we'll soon be talking about the expectation of the error relative to this distribution of x. But everybody gets some error term, which could depend on the actual weight x and also differs from one person to another.

We call this the estimate y_i of the i-th participant in the experiment, as a function of x: y_i(x) = x + ε_i(x). So this is an additive error model.

0:51

This models the estimate provided by each person, and let's say there are n of these people. So i goes from 1, 2, and so on, up to n, say n = 787.
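The additive error model can be sketched in a few lines of Python. The true weight, the error spread, and the uniform error distribution below are illustrative assumptions, not Galton's data:

```python
import random

random.seed(0)

# A minimal sketch of the additive error model y_i(x) = x + eps_i(x).
# The true weight, the spread, and the error distribution are made up
# for illustration only.
def estimate(x, spread=0.1):
    """One participant's estimate: the true value plus a zero-mean error."""
    eps = random.uniform(-spread * x, spread * x)  # the error can depend on x
    return x + eps

x_true = 1200.0                                    # hypothetical true weight
n = 787                                            # number of participants
estimates = [estimate(x_true) for _ in range(n)]
print(len(estimates))                              # 787
```

Each call draws a fresh, zero-mean error, so the estimates scatter around x_true without any systematic offset.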

Now we're going to assume that these error terms are both unbiased and independent.

Unbiased means that the expectation of this error term for each user i, as a function of x, taken over the distribution of x, is zero: E[ε_i(x)] = 0. So sometimes you may overestimate and sometimes you underestimate, but you would not systematically over- or underestimate, and this goes for all users.

1:35

That's unbiased. Independent means that the error term of user i depends only on i; it does not depend on any other user j.

In reality, neither the unbiased nor the independent assumption is true in general. We'll often see systematic bias: certain people tend to overestimate while others tend to underestimate. And sometimes these errors are also dependent. In fact, in lecture seven we'll look at an example of an information cascade, where the dependence of estimates destroys the wisdom of crowds.

Now, for Amazon reviews: are they unbiased and independent? Essentially not unbiased.

[inaudible]. Although you may be able to use a user's ID and her rating history to do some normalization.

And it's sort of independent: most people enter a review based on their own opinion. So even if a reviewer can see the existing ratings and reviews, she may not be reacting in response to them. But sometimes they do. Sometimes a review says that the previous reviews are clearly biased, "and I'm here to correct that." So there is also some element of dependence on Amazon.

Alright, so now we are going to look at the so-called wisdom of crowds, exemplified by Galton's experiment, by comparing two terms: one is the average of the errors made by each individual participant; the other is the error of the averaged estimate. The hope is that the error of the average estimate is much smaller than the average of the errors.

Let's first look at the average of the errors; this is easy. The error term here is ε_i for participant i, and we're going to use the l2 norm, or mean squared error, just like last lecture for Netflix, except over there we used the root mean square. Sometimes we take the square root, but the idea is the same. I'm going to use the squared error as the metric to quantify the error terms. So we'll look at ε_i(x) squared, take the expectation of this term over the distribution of x, sum over all the participants from 1 to n, and take the average by dividing by n: (1/n) Σ_i E[ε_i(x)²]. This is what we call the averaged error.

4:19

Okay, or the averaged squared error, to be more precise. So just remember this expression, (1/n) Σ_i E[ε_i(x)²]: this is the averaged squared error.
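This averaged squared error can be estimated by sampling. A sketch, where the per-participant error distributions are made up for illustration:

```python
import random

random.seed(1)

# Sketch: estimate the averaged squared error (1/n) * sum_i E[eps_i^2]
# by Monte Carlo. The per-participant error distributions are assumed:
# user i draws a zero-mean error uniform on (-s_i, s_i).
n = 5
spreads = [1.0 + 0.5 * i for i in range(n)]   # hypothetical spreads s_i
trials = 100_000

def mean_sq_error(s):
    """Monte Carlo estimate of E[eps^2] for eps ~ Uniform(-s, s)."""
    return sum(random.uniform(-s, s) ** 2 for _ in range(trials)) / trials

avg_sq_error = sum(mean_sq_error(s) for s in spreads) / n
exact = sum(s * s / 3 for s in spreads) / n   # E[eps^2] = s^2 / 3 here
print(avg_sq_error, exact)
```

The sampled value lands close to the closed-form answer, which confirms the expression is just an average of per-user mean squared errors.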

On the other hand, we also want to look at the error of the average. First of all, what is the average? The average is the sum of the y_i divided by n. And the error of the average is that minus x: (1/n) Σ_i y_i − x. This could be positive or it could be negative, but it is the same as (1/n)(Σ_i y_i − n·x), obviously. So we can also write it as (1/n) Σ_i (y_i − x), by bringing x inside the summation; since there are effectively n of them, we're okay. And this is just (1/n) times the summation of the errors, because each term y_i − x inside the summation is just ε_i by definition.

So now, this error of the average is what we want to look at. And we're going to look at the expectation, over the distribution of x, of (1/n) Σ_i ε_i(x), this whole thing squared. This equals 1/n², which we can take outside the expectation, times the expectation of (Σ_i ε_i(x))², the square of the whole sum.

6:05

Now you might say: look, isn't this expression the same as the last one? It looks like a summation of expectations of ε squared, but not quite. The first expression takes the sum of the expectations of the squared ε_i, whereas this one takes the expectation of the square of the sum. So this one is about the square of a summation, not the summation of squares, and the two are clearly different. For example, take the two-user case: (ε_1 + ε_2)², which is what the error of the average is about, equals ε_1² + ε_2² + 2·ε_1·ε_2, and the first part, ε_1² + ε_2², is what the averaged error is about. The difference is the collection of these cross terms. But if the errors are independent, then the expectation of ε_1·ε_2 over the distribution of x is going to be zero, because the two error terms are not correlated with each other. So these cross terms are cancelled, and in that case the sum of squares and the square of the sum are the same.
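The cross-term cancellation can be checked numerically. A sketch of the two-user case, with illustrative independent Gaussian errors:

```python
import random

random.seed(2)

# Two-user check: for independent zero-mean errors,
# E[(e1 + e2)^2] = E[e1^2] + E[e2^2], because the cross term
# 2 * E[e1 * e2] averages out to zero. Distributions are illustrative.
trials = 200_000
cross = sq1 = sq2 = sq_sum = 0.0
for _ in range(trials):
    e1 = random.gauss(0, 1)        # user 1's error
    e2 = random.gauss(0, 2)        # user 2's error, drawn independently
    cross += e1 * e2
    sq1 += e1 * e1
    sq2 += e2 * e2
    sq_sum += (e1 + e2) ** 2

print(abs(cross / trials) < 0.05)               # the cross term is near zero
print(abs(sq_sum - sq1 - sq2) / trials < 0.1)   # square of sum ≈ sum of squares
```

With dependent errors (say e2 = e1), the cross term would not vanish and the two quantities would differ.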

7:30

And therefore the first expression is the same as the second, and the only difference is the multiplicative factor in front: in one case it's 1/n, in the other it's 1/n². So we have the desired relationship: the error of the average (average first, then look at the error) is only 1/n of the averaged error (compute each individual's error first, then take the average). These two differ by a multiplicative factor of n. And this multiplicative factor of n is codified as an example of the wisdom of crowds in Galton's experiment.
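The factor-of-n relationship can be verified with a quick simulation; the Gaussian error model and the numbers here are assumptions for illustration:

```python
import random

random.seed(3)

# Compare the averaged error with the error of the average, under
# unbiased, independent errors (illustrative Gaussian model).
# Prediction from the derivation: error of average ≈ (1/n) * averaged error.
n = 10
sigma = 5.0
trials = 50_000

avg_err = 0.0   # (1/n) * sum_i eps_i^2, averaged over trials
err_avg = 0.0   # ((1/n) * sum_i eps_i)^2, averaged over trials
for _ in range(trials):
    eps = [random.gauss(0, sigma) for _ in range(n)]
    avg_err += sum(e * e for e in eps) / n
    err_avg += (sum(eps) / n) ** 2

avg_err /= trials   # close to sigma^2
err_avg /= trials   # close to sigma^2 / n

print(avg_err / err_avg)   # the ratio is close to n = 10
```

Averaging before computing the error shrinks the mean squared error by roughly the number of participants, exactly the enhancement the derivation predicts.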

So simple averaging can work if we have unbiased and independent estimates: there are no bias terms, the cross terms cancel each other under the expectation over the distribution of x, and we get a factor-of-n enhancement. If there are five people, it's five times better. If there are 1,000 people, it's 1,000 times better in terms of the mean squared error.

9:21

If they're completely dependent, averaging gains you nothing. If they're somewhat dependent, then you would have to look at the expectation of these error terms' correlations in more detail, but you'll end up somewhere between a factor of 1 and a factor of n (or 1/n, depending on how you look at it).
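The in-between case can be sketched with a shared error component; the mixing model and the correlation value below are assumptions for illustration, not part of the lecture's derivation:

```python
import random

random.seed(4)

# Partially dependent errors: each user's error mixes a shared component c
# with a private one, eps_i = a*c + b*p_i (an illustrative model).
# The MSE gain from averaging then falls strictly between 1 and n.
n = 10
rho = 0.5                          # hypothetical share of common error
a, b = rho ** 0.5, (1 - rho) ** 0.5
trials = 50_000

avg_err = err_avg = 0.0
for _ in range(trials):
    c = random.gauss(0, 1)                              # shared error
    eps = [a * c + b * random.gauss(0, 1) for _ in range(n)]
    avg_err += sum(e * e for e in eps) / n
    err_avg += (sum(eps) / n) ** 2

gain = avg_err / err_avg   # 1 would mean fully dependent, n fully independent
print(1 < gain < n)
```

In this model the gain works out to 1 / (rho + (1 − rho)/n): the shared component never averages out, so the crowd only helps with the private part of the error.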

This sounds very promising and encouraging: a simple derivation that seems to be able to identify the root cause of the wisdom of crowds.

Well, let's look at the positive side before we look at some cautionary remarks. The positive side says that as long as the estimates are independent of each other, then you have the wisdom of crowds. This particular type of wisdom has nothing to do with identifying who the expert is. It just says that you might be wrong in all kinds of directions, but as long as you are wrong in different directions, we will have the power of this factor-of-n enhancement. In hindsight it's hardly a surprise: this is really the law of large numbers, plus the convexity of the quadratic function, at play here.

Now, you may object. First: what if I actually know who the experts are here? Maybe she's a farmer.

10:45

Well, in the advanced material, we'll look at how we can use boosting methods to extract the more important opinions from the experts. That will be a very different philosophy. The philosophy in what we just talked about depends only on the fact that you can all be wrong, as long as you're wrong in different ways, independently of each other.

Now, the second objection is: what about the scale? This factor-of-n effect holds even when n is two or when n is 1,000,000. To most people, the wisdom of crowds means that there should be some threshold value n* above which you see the effect and below which you don't. But this result applies just as well to the two-person case, so that's a mystery that remains to be resolved.

The third objection, or rather an addition to this discussion, is that this factor of n is only one view of the wisdom of crowds; we call it the multiplexing gain. There's another view of diversity, which says that if there's some bad event that you didn't want to happen, then by putting n of these entities together you get 1 − (1 − p)^n, where p is the relevant per-entity probability and n is the size of the crowd.

In fact, this is called the diversity gain, while the factor of n is called the multiplexing gain. We will encounter this diversity gain in later chapters. In fact, we encounter both kinds of gains in [INAUDIBLE], in both technological and social networks: from wireless networks, like WiFi and LTE, all the way to Galton's example. But this is our first encounter with one side of this coin.
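The two gains can be contrasted with a short numeric sketch; the per-entity success probability and the per-person error here are made-up values, and reading 1 − (1 − p)^n as "at least one of n succeeds" is an interpretive assumption:

```python
# Contrasting the two gains with illustrative numbers.
n = 5
p = 0.9        # hypothetical per-entity success probability

# Diversity gain: probability that at least one of n independent
# entities succeeds is 1 - (1 - p)^n.
diversity = 1 - (1 - p) ** n
print(round(diversity, 5))     # 0.99999

# Multiplexing gain: averaging n unbiased, independent estimates
# shrinks the mean squared error by a factor of n.
mse_single = 4.0               # hypothetical per-person MSE
mse_average = mse_single / n
print(mse_average)             # 0.8
```

Diversity makes failure exponentially unlikely in n; multiplexing makes the error linearly smaller in n. They are two different payoffs from the same crowd.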