So now we actually use the similarity values, but before we do that, we have to
calculate them. And when we compute the similarities, we compute them on the
errors that we obtained from the baseline prediction.

We basically subtract to figure out how far off the baseline prediction was,
just for the training set again, because we can only use the training set.
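As a rough illustration of that subtraction, here is a minimal Python sketch.
The ratings, the baseline predictions, and names such as `train_ratings` and
`baseline` are made up for illustration. Only the rule itself (error = actual
rating minus baseline prediction, on training entries only) comes from the
lecture.

```python
# Hypothetical training ratings and baseline predictions, keyed by (user, movie).
# These numbers are placeholders, not the lecture's table.
train_ratings = {("A", 1): 5, ("B", 1): 4, ("B", 2): 3}
baseline = {("A", 1): 4.5, ("B", 1): 4.3, ("B", 2): 2.8, ("C", 1): 3.9}


def training_errors(ratings, predictions):
    """Error = actual rating - baseline prediction, for training entries only."""
    return {key: ratings[key] - predictions[key] for key in ratings}


errors = training_errors(train_ratings, baseline)
for key, err in sorted(errors.items()):
    # A negative error means the baseline predicted too high.
    print(key, round(err, 2))
```

The entry for ("C", 1) has a prediction but no training rating, so it is
skipped, which mirrors keeping the test set out of the calculation.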

And we augment the baseline predictor with the similarity values that we
obtained on the errors. There are a couple of reasons why we do that. The first
is that we really need to center these values at zero.

And the reason I say zero is that we're already subtracting out any bias by the
fact that we're working with errors.

So the errors are going to include zero, and in a sense they'll be centered
around zero. And in order to do correlation, you need things to be centered
about zero, so that you can have either positive or negative values. If we used
the movie ratings themselves, which run from 1 to 5, we could never have
negative values. We'd only have positives, and that doesn't give us the sign
difference that we need. So we subtract, and we end up with values around zero,
some positive and some negative. The second reason is that we're really trying
to correct for the errors here, right?
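To make the first reason concrete, here is a small Python sketch. The ratings
are invented, and the flat baseline of 3.0 is only a stand-in for the real
baseline predictor. It shows that the cosine of raw 1-to-5 ratings stays
positive even when users disagree, while the cosine of the errors can go
negative.

```python
import math


def cosine(x, y):
    """Cosine similarity of two equal-length lists of numbers."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


# Made-up ratings from three users who disagree about two movies.
movie_x = [5, 1, 4]
movie_y = [1, 5, 2]

# Stand-in for the baseline: a flat 3.0 for every entry (an assumption).
flat_baseline = 3.0
errors_x = [r - flat_baseline for r in movie_x]
errors_y = [r - flat_baseline for r in movie_y]

raw_similarity = cosine(movie_x, movie_y)      # positive despite the disagreement
error_similarity = cosine(errors_x, errors_y)  # negative once centered
print(round(raw_similarity, 2), round(error_similarity, 2))
```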

So this is augmenting the baseline predictor with this similarity. The
neighborhood method is what it's really called.

And so we want to correct for those errors, so we really do want to compute the
similarities on the errors themselves, because those are what we want to make
go away.

And I mean, we're not just going to add the errors back and get zero error on
everything. We're going to do it in a way that makes sense and is not reverse
engineering. We'll see how we do that in a minute, but first let's try to
calculate some of the values.

So here I'm showing the table of error values.

And again, I got this by subtracting the predictions from the actual values. So
I took the actual values and I subtracted the predictions. If a prediction was
higher than the actual value, the error is going to be negative, which means we
should have made the prediction lower than we had it. And if a prediction was
less than the actual value, the error will be positive, which means that the
prediction should have been higher. And so we can use the positive and negative
values accordingly.

But now, for movies one and two, let's just apply that cosine similarity. To do
that, we have to figure out which users have rated both movies, because in the
equation we can't have users that have only rated one. And clearly we can't use
those that have rated neither of the movies. So for instance, we can't use A
here, because A has not rated movie two. We can use B; we're good, because B has
rated both movies. We can't use C because, well, this is part of the test set
again, so we can't use it. We can't use D, because this is part of the test set.
We can't use E, because we don't know its value. But we can use F, because we
know both of these.
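The filtering step just described can be sketched in Python as follows. The
`status` table and the function name `common_users` are hypothetical, and the
exact cell in which each user is missing or held out is a guess that matches
the narration, not the real table.

```python
# Status of each user's entry for movie 1 and movie 2.
# "train" = known training rating, "test" = held out, "missing" = never rated.
# The per-cell layout is a guess that matches the narration, not the real table.
status = {
    "A": ("train", "missing"),
    "B": ("train", "train"),
    "C": ("test", "train"),
    "D": ("train", "test"),
    "E": ("missing", "train"),
    "F": ("train", "train"),
}


def common_users(status_by_user):
    """Users whose entries for both movies are known training ratings."""
    return [user for user, cells in status_by_user.items()
            if all(cell == "train" for cell in cells)]


print(common_users(status))  # ['B', 'F']
```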

So we have to use only B and F for this. For movies one and two, then, we're
just going to apply that equation that we had before.

We multiply B's error on movie one by B's error on movie two, so we do -0.30
times 0.17. And then, remember, we add the product of the terms, so we do this
times this plus this times this: we add F's pair, which is -0.05 times -0.58.
And then, remember, we have to divide by the lengths. So we divide by the
square root of the movie one errors squared, which is -0.30 squared plus -0.05
squared, and we multiply that by the square root of the movie two errors
squared, which is 0.17 squared plus -0.58 squared. Then, if we work this out,
we get -0.0220 over 0.3041 times 0.6044. All right, so the whole top comes to
-0.0220, the first square root comes to 0.3041, and the second comes to 0.6044.
And then that is equal to about -0.11, okay.

So the cosine, or the similarity between movies one and two, is -0.11.
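Here is a minimal Python sketch of that calculation, pairing each user's error
on movie one with the same user's error on movie two. That pairing is the one
consistent with the intermediate values -0.0220, 0.3041, and 0.6044. The
dictionary layout and function name are assumptions. With the two-decimal
errors quoted in the lecture, the sketch lands near -0.12 rather than exactly
-0.11, presumably because the table entries are rounded.

```python
import math

# Errors quoted in the lecture for the users who rated both movies.
errors_movie_1 = {"B": -0.30, "F": -0.05}
errors_movie_2 = {"B": 0.17, "F": -0.58}


def cosine_similarity(errors_i, errors_j):
    """Cosine similarity over the users present in both error dictionaries."""
    shared = sorted(set(errors_i) & set(errors_j))
    dot = sum(errors_i[u] * errors_j[u] for u in shared)
    norm_i = math.sqrt(sum(errors_i[u] ** 2 for u in shared))
    norm_j = math.sqrt(sum(errors_j[u] ** 2 for u in shared))
    return dot / (norm_i * norm_j)


similarity_12 = cosine_similarity(errors_movie_1, errors_movie_2)
print(round(similarity_12, 4))
```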

Remember, we said that we have to see whether it's closer to -1, 0, or +1. If
it's closer to -1 or +1, then it's useful, and otherwise it's not. And you see,
this is really kind of close to 0; it's somewhere around here. So that's not
very useful at all. And so these movies, we would say, are really not very
correlated.

Now let's try movies three and five as another example. So we'll go through
again and see. For three and five, A now works, so we'll put a check mark next
to it: we can use A because A has rated movie three and has rated movie five.
We can use B again, because B has rated movie three and movie five. We can't
use C because this is part of the test set. Sorry, it's not part of the test
set; this is a value that we don't know, so we can't use it. We can use user D
right here, because we know both of these values. We cannot use E, because we
don't know this; it's part of the test set. And we cannot use F, because this
is part of the test set. But we do have three values now, one, two, three, on
each of these sides, so we have to do a little more.

There are a few more terms here. So now we do this the same as last time,
except that we have three terms that we're summing instead of just two. So we
do this times this, plus this times this, plus this times this. So we have -1
times -0.43, plus 0.25 times -0.10, plus 0.25, again, times -0.10. They
actually turn out to be the same for both of those. Then we divide by the
square root of this squared plus this squared plus this squared, times the
square root of this squared plus this squared plus this squared. So we do the
square root of 1 squared plus 0.25 squared plus 0.25 squared, times the square
root of 0.43 squared plus 0.1 squared plus 0.1 squared. Notice that I'm
omitting the negative signs, and that's because when you square something it
becomes positive again, so I don't need to write them. And if you do this whole
thing out, and I'll leave it to you to actually run through the calculation,
you get 0.79 right here.
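For anyone who wants to check the arithmetic, the calculation can be written
out as below. The symbol $d_{35}$ for the similarity between movies three and
five is an assumed notation, not something the lecture uses here.

```latex
\[
d_{35}
= \frac{(-1)(-0.43) + (0.25)(-0.10) + (0.25)(-0.10)}
       {\sqrt{1^2 + 0.25^2 + 0.25^2}\,\sqrt{0.43^2 + 0.10^2 + 0.10^2}}
= \frac{0.38}{\sqrt{1.125}\,\sqrt{0.2049}}
\approx \frac{0.38}{1.0607 \times 0.4527}
\approx 0.79
\]
```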

And so that's closer to +1, right? So it's a positive correlation, and it's
kind of close to +1, so we would say that these movies are positively
correlated. So now we can come up with a full table of similarity values, and
here I'm tabulating this. This is the similarity between one and two, you can
see. There's also the similarity between three and five, the 0.79 we just
found, and so on.

And now a couple of things to note. First is that this table is symmetric,
right? We've said before that things sometimes aren't symmetric, but here they
are, so the similarity from one to two is the same as the similarity from two
to one. That's why 1 and 2 and 2 and 1 are the same, just like 2 and 4, for
instance, and 4 and 2 are the same. So you can flip it over the diagonal, and
if you mirror-image it, it will be the same.
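The symmetry follows directly from the cosine formula. In the notation below,
which is assumed here rather than taken from the lecture, $e_{ui}$ is user
$u$'s error on movie $i$, and $U_{ij}$ is the set of users with training
ratings for both movies $i$ and $j$.

```latex
\[
d_{ij}
= \frac{\sum_{u \in U_{ij}} e_{ui}\, e_{uj}}
       {\sqrt{\sum_{u \in U_{ij}} e_{ui}^2}\,\sqrt{\sum_{u \in U_{ij}} e_{uj}^2}}
= \frac{\sum_{u \in U_{ji}} e_{uj}\, e_{ui}}
       {\sqrt{\sum_{u \in U_{ji}} e_{uj}^2}\,\sqrt{\sum_{u \in U_{ji}} e_{ui}^2}}
= d_{ji},
\qquad U_{ij} = U_{ji}.
\]
```

Swapping $i$ and $j$ only reorders the factors, so in practice only half of the
table has to be computed.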

Now, in the next segment, we're going to choose one neighbor for each movie,
right? The neighbors here are movies, and for each movie we're going to choose
one neighbor. We could choose more; we could choose two, three, or four. But
we've already made the math complicated enough, and that would get even more
complicated. So we'll just stick to choosing one neighbor for each movie, which
will simplify things a lot when we actually go to work it out.

And so, basically going down the columns, for movie one we want to find the
neighbor with the highest similarity. And that would be three here. I've shaded
the backgrounds green for the entries with the highest similarities. So one
would choose three as its neighbor.

And again, we're looking at the magnitude, so we want the magnitude to be the
highest. So even though this is negative, very negative, it's still strongly
negatively correlated, which is a useful thing. Now for two, we are going to
choose one: we choose between 0.11, 0.74, 1, and 0.88 in magnitude, and
therefore we're going to choose this one, four. Even though, again, it's
negative, it's still the highest in magnitude.
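Choosing by magnitude can be sketched in a couple of lines of Python. The row
for movie two below uses the values as I read them from the lecture (0.11,
0.74, 1, and 0.88 in magnitude). The sign of the 0.74 entry and its pairing
with movie three are assumptions, and `best_neighbor` is a hypothetical helper
name.

```python
# Similarities of movie 2 to the other movies, using magnitudes quoted in the
# lecture. The sign of the 0.74 entry and its pairing with movie 3 are assumed.
similarities_movie_2 = {1: -0.11, 3: 0.74, 4: -1.0, 5: 0.88}


def best_neighbor(similarities):
    """Return the movie whose similarity has the largest absolute value."""
    return max(similarities, key=lambda movie: abs(similarities[movie]))


print(best_neighbor(similarities_movie_2))  # 4
```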

For three, three would choose one, because it's got -0.82, which is higher than
any of the other values in magnitude. Four would choose two, just like two
chose four. Those are actually perfectly negatively correlated with one
another, four and two.

And the reason that they're perfectly negatively correlated is that there's
only one value that we're multiplying here. And so that's not really that
great. I mean, normally you need more data before you say that two things are
perfectly negatively correlated. But here that's just the way it turns out.
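A quick Python check shows why a single shared user always produces +1 or -1:
with one term, the numerator is a product of two numbers and the denominator is
the product of their absolute values. The error values below are invented for
illustration.

```python
import math


def cosine(x, y):
    """Cosine similarity of two equal-length lists of numbers."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


# One shared user, hypothetical errors with opposite signs, then equal signs.
opposite_signs = cosine([0.4], [-0.7])
same_signs = cosine([0.4], [0.7])
print(round(opposite_signs, 6), round(same_signs, 6))
```

Whatever the two magnitudes are, only the signs survive, which is why one data
point is weak evidence of a perfect correlation.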

Now five will actually choose two to be its neighbor, right? This 0.88 is
higher than the other values. And notice two did not choose five; it chose
four. But five chose two. So they don't have to choose each other: even though
the table is symmetric, they don't necessarily choose each other. And we could
define other rules. For instance, we could say, all right, I'm not going to use
similarity at all unless the similarity value is higher, in magnitude, than,
like, 0.9. In that case, we would only use a neighbor for one pair here;
otherwise we wouldn't use similarity at all. Some people do that, because
sometimes it makes sense to say, well, unless the similarity is high enough,
I'm not going to use it.

But here we're just going to choose the most similar neighbor and use that to
do our calculation.
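The thresholded alternative mentioned above could be sketched as follows. This
is only an illustration: the function name, the default cutoff of 0.9, and the
choice to compare magnitudes are assumptions, and the rows contain only the
similarity values quoted in the lecture, so they are partial.

```python
def neighbor_above_threshold(similarities, threshold=0.9):
    """Best neighbor by magnitude, or None if it does not clear the threshold."""
    movie = max(similarities, key=lambda m: abs(similarities[m]))
    return movie if abs(similarities[movie]) > threshold else None


# Partial rows built from values quoted in the lecture; unquoted entries omitted.
row_movie_2 = {1: -0.11, 4: -1.0, 5: 0.88}
row_movie_5 = {2: 0.88, 3: 0.79}

print(neighbor_above_threshold(row_movie_2))  # 4
print(neighbor_above_threshold(row_movie_5))  # None
```

With the cutoff in place, movie five falls back to the baseline prediction
alone, whereas the rule used in the lecture would still pick movie two.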