0:00

Welcome to Lecture four of Networks: Friends, Money, and Bytes.

And today's topic is Netflix. Let me first warn you though, that

throughout the course of twenty lectures, if you plot the length of the video lecture against the lecture number, it goes up and down. The average might be 75 minutes across all the modules, okay? So, they go up and down if I interpolate. Now, this Lecture four is actually going to be, by far, the longest lecture. You will see that we have quite a few modules, and each module is kind of long, because we have to introduce both the concept of collaborative filtering, okay? Individualized recommendation by scaling out the system, the network, as well as the mathematical language of optimization theory. Again, if you plot the level of mathematical difficulty involved, or the amount of symbolic manipulation involved, against these twenty lectures, they go up and down quite a bit, actually, you know? I'll say that this lecture is perhaps, by far, the most mathematical one. If you can pass this lecture, you probably can face any of the remaining mathematics throughout the rest of the course, okay?

So, to some degree, we are climbing up a hill today, and this hill is both conceptually involved and would mathematically require some hard work, as climbing up a real hill would demand. But, if you can climb over this hill, then the rest of the course should be actually much easier, especially some of the lectures coming up.

But, having said that, let's first review what we had in the past three lectures.

We saw three beautiful equations, and they are very useful equations too.

The first one was in Lecture one: DPC, Distributed Power Control, used in 3G CDMA data networks. And it says that, for each transmitter-receiver logical pair i, the transmit power P at the next discrete time slot, t + one, should be whatever it is right now, P sub i of t, times a gain parameter, and that parameter is the ratio between the target SIR, gamma, and the currently achieved SIR on this link. Now, the target gamma does not change over time as far as this algorithm is concerned; it is a constant. And the achieved SIR varies over time as different people change their transmit power levels. And we saw that this ratio is essentially an implicit feedback signal telling you whether you should increase or decrease your power.
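As a concrete illustration, here is a minimal sketch of this update rule in Python. The channel gains, target SIRs, and noise levels are made-up example numbers, not values from the lecture:

```python
import numpy as np

# A made-up 2-link system: G[i][j] is the channel gain from transmitter j
# to receiver i; gamma holds the target SIRs; noise the receiver noise.
G = np.array([[1.0, 0.1],
              [0.2, 1.0]])
gamma = np.array([2.0, 2.5])
noise = np.array([0.1, 0.1])

def sir(p):
    """Achieved SIR on each link for power vector p."""
    interference = G @ p - np.diag(G) * p + noise  # sum_{j != i} G_ij p_j + n_i
    return np.diag(G) * p / interference

p = np.array([1.0, 1.0])          # initial transmit powers
for _ in range(50):               # DPC: p_i(t+1) = p_i(t) * gamma_i / SIR_i(t)
    p = p * gamma / sir(p)

print(p)        # converged powers
print(sir(p))   # close to gamma, the target SIRs
```

After enough iterations, each link's achieved SIR settles at its target, exactly the fixed point the lecture describes.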

And the second equation, beautiful and useful, we saw in Lecture two, and it can be summarized as follows. It is the payoff function for the i-th bidder in an auction: U sub i is a function of all the bidders' bids, the vector b, just like the SIR is a function of all the transmit powers. And it is the difference between your private, independent valuation, v, and the price you have to pay, which depends on all the bids. And this function, the price, which is not the power now, the price as a function of all the bids: the shape of this function is determined by the designer of the auction. And for different kinds of function p of b, you will induce different kinds of behavior by the bidders. They will bid differently: b_i star, the optimal bid according to some metric, for example, whatever maximizes the utility. Now, of course, this is assuming you actually win the allocation. If you do not get the item, then the utility is zero. This definition of the payoff function, which also highlights the fact that you can design the auction and thereby induce different bidding behavior, is the second equation.
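To make the payoff function concrete, here is a small sketch assuming the second-price rule for p of b; the bids and valuations are made-up numbers:

```python
# Payoff U_i(b) = v_i - p(b) for the winner, 0 for everyone else, with the
# second-price rule: the winner pays the second-highest bid.
def second_price_payoffs(bids, valuations):
    winner = max(range(len(bids)), key=lambda i: bids[i])
    price = sorted(bids, reverse=True)[1]          # second-highest bid
    return [valuations[i] - price if i == winner else 0.0
            for i in range(len(bids))]

# With truthful bidding (b_i = v_i), bidder 0 wins and pays 7.0:
print(second_price_payoffs([10.0, 7.0, 5.0], [10.0, 7.0, 5.0]))  # [3.0, 0.0, 0.0]
```

Swapping in a different pricing function p of b changes the payoffs and, as the lecture says, the bidding behavior it induces.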

The third one, which we touched upon last lecture, can be written as the following: simply, the vector pi star transpose equals pi star transpose times G. G is the Google matrix that we defined last time, and it is constructed in such a way that the eigenvector corresponding to the largest eigenvalue, which is one, can be uniquely defined and efficiently computed, okay?

So, this defines the Google PageRank score, which further leads to the rank ordering of the web pages. Now, if you count how many times DPC is used in the mobile world, if you count how many times auctions are used in many contexts including the online space, and if you count how many times the Google PageRank is used every time you search on Google, you can tell these three are powerful equations. They are simple, but they are very, very widely used. So, now, we're going to continue this

track of four lectures. We've talked about power control,

distributed protocol. We talked about the second-price auction, we talked about PageRank. And now, collaborative filtering for recommendation, like on Netflix. And these four lectures also introduce the languages of optimization, then game, then graph, and now learning theory.

Now, the backdrop is Netflix. To those of you who are in the US, you

know Netflix very well. It started its DVD rental business a while back, actually, in 1997. So, the story says that the founder of

Netflix was tired of paying fees to the DVD rental stores.

If you remember those days, there were a lot of those on the street corners.

And if you don't return the DVD in time, or the VCD in time, or the tape in time,

then you pay extra fees. He says, well, that's not very appealing

to me. So instead, he says, you can keep the DVDs for as long as you want, okay? As long as you keep paying the monthly fee, then that's actually a great thing for the company.

You don't return, I don't send you new ones, okay?

So, you can keep paying the monthly fee without getting any new DVD.

This idea is going to show up again in a completely different context of sliding

window control in TCP, congestion control. That would be in Lecture fourteen.

But it is effectively the same concept. You don't return, I don't give you more.

And by 2008, in the US, there were about nine million users of Netflix DVD rental.

And then, around that year, Netflix started trying out something quite new: using the internet to deliver video. The internet as the medium, instead of the postal service, to deliver video. So, the content is stored in some video

servers, and it is sent through the internet, and maybe wireless networks, to your internet-connected devices, which could be TVs or set-top boxes, but also could be game consoles, smartphones, tablets, PCs, and so on. And by spring 2011, it had 23 million customers.

8:22

And, in summer of 2011, it was counted that about one in every four, a quarter of

the bits going through the internet that month was Netflix traffic.

That was huge, okay? Now, of course, video files are very big, and therefore, if you count by volume, you're going to get a big percentage. But still, one in four, that's a lot.

And then, in September 2011, you may remember that Netflix tried to split into two different businesses, separating DVD rental and online streaming, and then they later put them back together, but the billing remained separated. Now, later we will look at the technology and networks involved in multimedia streaming over the internet, in Lecture, I think, seventeen. We will look at Netflix, Amazon Prime, Google, HBO Go, IPTV, and so on, and look at the differences across the models.

Today, in this Lecture four, the focus is, however, on the social network dimension of recommendation systems, okay? You and I, the customers on Netflix, also

form a network. These 23 million, an even bigger number now, form a social network. They don't directly interact, but the way that they watch and rate movies will be used and leveraged by Netflix to make better individualized recommendations. Think about it this way: you're on an open

online platform, okay? Whatever that platform might be.

And there are, maybe, 100,000 of you. By scaling out, scaling up the social network, we actually can then scale down to individualized recommendations, because we get to see other like-minded people, similar to you: how they behave, what they like and don't like. And we have much more data.

So, even if you don't provide a huge amount of activity on the platform, we get

to see others. And this is the basic idea of

10:39

Collaborative Filtering, CF. Now, there are many kinds of recommendation systems. For example, on Amazon, you may notice

that each time you refresh the homepage after you log in, they will recommend

different kinds of products to you. That's largely based on what's called content-based filtering. Basically, if you purchased certain kinds of products before, like certain kinds of electronics, then they will recommend similar electronics to you in the future. If you buy a book written by Stephen Hawking, then they will try to recommend new books by Stephen Hawking. That is called content-based filtering, okay?

Look at what you bought, what you liked, and then recommend accordingly.

YouTube, as we will see in a few lectures' time, makes recommendations largely driven by so-called co-visitation counts. Briefly speaking, it means that they look at how many users watch two videos back to back, okay? And then we assign a score, the co-visitation count, between these two videos, say A and B. And this gives you a way to start quantitatively deciding what videos you should recommend. We'll come back to this later.
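As a rough sketch of the idea, the following snippet counts back-to-back co-visitations over some hypothetical watch sessions; the session data are made up, and this is an illustration of the concept, not YouTube's actual implementation:

```python
from collections import Counter

# Hypothetical watch sessions, one list of video IDs per user; a pair
# is co-visited when the two videos are watched back to back.
sessions = [["A", "B", "C"], ["A", "B"], ["B", "C"]]

def covisitation_counts(sessions):
    counts = Counter()
    for videos in sessions:
        for a, b in zip(videos, videos[1:]):   # consecutive pairs
            counts[frozenset((a, b))] += 1     # order-insensitive pair key
    return counts

counts = covisitation_counts(sessions)
print(counts[frozenset(("A", "B"))])  # 2: watched back to back by two users
```

The pair with the highest count is then a natural candidate to recommend after either of its videos.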

On Pandora, if you have used it, it is mostly expert-driven recommendation of mixes of music. But, as a consumer, you get to thumb up or thumb down, just like in some discussion forums you get to vote up or down.

Now, Netflix does not want to use expert-driven movie reviews, even though there's no shortage of professional movie reviewers out there.

It instead uses collaborative filtering. The idea is that we not only look at what you liked, but we also look at what people similar to you liked and watched. The question is, how do we define similarity between people, or, flipped the other way around, between movies?
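One common way to quantify such similarity, shown here purely as an illustration rather than as Netflix's actual method, is the cosine similarity between two users' rating vectors over the movies both have rated; the ratings below are made up:

```python
import math

# Made-up rating dictionaries: movie ID -> stars.
alice = {"movie1": 5, "movie2": 3, "movie3": 4}
bob   = {"movie1": 4, "movie2": 2, "movie4": 5}

def cosine_similarity(u, v):
    """Cosine similarity over the movies both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot  = sum(u[m] * v[m] for m in common)
    norm = (math.sqrt(sum(u[m] ** 2 for m in common)) *
            math.sqrt(sum(v[m] ** 2 for m in common)))
    return dot / norm

print(cosine_similarity(alice, bob))  # close to 1: very similar tastes
```

The same formula applied to two movies' rating columns, instead of two users' rating rows, gives movie-to-movie similarity, the flipped view just mentioned.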

Before going further, let's first more rigorously define this recommendation

system. A recommendation system is a very helpful feature, okay? For stickiness of the consumers, for inventory control, and so on and so forth. Now, in the case of Netflix, you can think

of this as a, say, a black box. There are inputs, outputs.

What are the inputs? First of all, the user ID, okay?

That's your login. We're going to index users by u. Second, the movie ID; we're going to index each movie by i.

And then, there will be, of course, a rating, which is really a number drawn from one, two, three, four, five stars, and we call this r sub ui, okay? R for rating, of user u for movie i. And the time that this review was entered is t sub ui. Of course, there's also the review text, okay? And we're not going to talk much about text reviews in today's lecture. So, we've got u and i, and rui, and tui.

How many rui's and corresponding tui's are we talking about here?

Actually, a lot. Think of millions of users and tens of

thousands of movie titles. Of course, each user only watches a few percent of the movies out there, not any substantial fraction of all of them. And, among all the movies that you actually watched, you likely will only rate a few of them. Some people, like me, actually rate none of them; I just don't like rating movies. But a lot of people do rate, though they don't necessarily rate every single movie that they watched. Despite that, we're talking about on the order of billions of rui's for Netflix. Now, it's a much more challenging problem, actually, when you have only very few ratings in your database, the cold-start problem. But Netflix doesn't have that problem now.

Alright, those are the inputs. What does the recommendation system do?

What is the output there? The output, primarily, of course, is the predicted rating; let's call it r hat ui, okay? Now, in the case of the Netflix Prize, they actually know the true rui. They just don't tell you, the competitor in the Prize competition. In that case, you actually know the ground truth, and then you can compare your prediction r hat ui and the actual rui. But, of course, in the real operation of Netflix, they don't know the ground truth; that's why they want to recommend you to watch their movies, okay? So, you are going to just have to believe that if it works well on the test set, where you know the ground truth, then it's going to work well in the other parts of the system too, okay?

So, this r hat, however, could be a fractional number, a real number, for example, 4.2 stars. You cannot rate 4.2 stars, but the prediction might say 4.2. So, you can interpret that as: maybe an 80 percent chance this user u will rate this movie i with five stars, and maybe a 20 percent chance he'll rate it as four stars. Maybe that's an interpretation of the number 4.2. This r hat could also go above five or under one, but then we have to clip it, so that if it is above five, we call it five, and if it's under one, we just call it one.
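That clipping step can be written in one line; this is just a sketch of the idea:

```python
# Clip a fractional prediction r_hat into the valid 1-to-5 star range.
def clip(r_hat, lo=1.0, hi=5.0):
    return max(lo, min(hi, r_hat))

print(clip(4.2))  # 4.2, already in range
print(clip(5.7))  # 5.0
print(clip(0.3))  # 1.0
```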

All right. How do we compare two recommendation systems? What metric are we going to use to do the comparison, to say, look, this recommendation system is better than the other one? What do you mean by better? Well, there are three levels of understanding here, going from most relevant but least tractable, to most tractable but, perhaps, least relevant among the three. The first level is, of course, that eventually, what I care about, as Netflix, is customer satisfaction. If I recommend you to watch some movie, do you like that recommendation in the end? Okay. But that is just very hard to gather data about. The second level is, all right, at least I want to know the prediction effectiveness, okay? If I recommend some movie, do you even watch it or not? Let alone whether you like it or not? Again, that's hard to collect.

So, we're going to look at a proxy, just like in Google PageRank. Eventually, what really matters is, does the searcher actually find the resulting search listing useful or not? But that ultimate test is very hard to quantify and collect data about. So, they use PageRank to say, well, this is the importance of the nodes. Here, we are going to just say, let's quantify the notion of prediction error by looking at r hat ui and rui, okay? Again, how do you know rui?

In real life, you don't. But, when you compare different recommendation systems, you can hold out some known data as ground truth and then use that as a test data set, okay? Well, of course, I would look at the difference between the two. Maybe one prediction overshoots, maybe another undershoots. So, it can't just be the plain difference. It could be the absolute value, okay? It could also be the L2 norm, the square of the difference, okay? So, whether I overshoot by one star or undershoot by one star, after squaring, they become all positive numbers, meaning a positive error term, okay? So, I'm going to square them. And, how many do I have? Let's say I've got C of those ui pairs.

19:29

So, I'll have to sum up, across those ui pairs, all C of them, this squared difference. But, since you squared the terms, you actually changed the scale; to bring the scale back down, people often put a square root. And that is the so-called Root Mean Squared Error, RMSE. This is effectively the so-called L2 norm of a vector, the vector being the error vector, the difference between r hat and r across all these ui pairs.

You're going to take a look at each entry of this vector, square it, then average it, and then take the square root. Of course, when you minimize an error, whether you minimize it with the square root or without the square root, the solution won't change, okay? So, mean squared error or root mean squared error, either one can be used equivalently as the objective function that you want to minimize. All right.
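Putting the pieces together, here is a minimal sketch of computing RMSE over a set of ui test pairs; the predictions and ground-truth ratings are made-up examples:

```python
import math

# Made-up ground-truth ratings r_ui and predictions r_hat_ui, keyed by
# (user, movie) pairs.
actual    = {(1, "A"): 4.0, (1, "B"): 3.0, (2, "A"): 5.0}
predicted = {(1, "A"): 3.5, (1, "B"): 3.0, (2, "A"): 4.0}

def rmse(predicted, actual):
    """Root mean squared error over the C test pairs in `actual`."""
    c = len(actual)
    squared = sum((predicted[ui] - actual[ui]) ** 2 for ui in actual)
    return math.sqrt(squared / c)

print(rmse(predicted, actual))  # sqrt((0.25 + 0 + 1) / 3), about 0.6455
```

Dropping the final square root gives the MSE, which, as just noted, has the same minimizer.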

So, now, we're looking at this RMSE, or MSE, as the quantitative metric describing how wrong your recommendations are across this set of test points, and we'll use that as a proxy for those eventual goals. So, our job is now to say: give me the inputs, okay? u and i, and rui and tui. I'm going to find a way to make recommendations and give you an output, r hat, okay? I will train this black box by looking at known r's and the resulting r hats. How do I train it? By minimizing the RMSE. Then, I say, all right, good, I'm done with training this black box. Now, throw at me some questions where I don't know the ground truth r. Maybe you, as the examiner, know the ground truth r. Well, good, then keep it to yourself and I'm going to make a guess, r hat. Then, you check against your ground truth and see how good I am. That is the spirit of the Netflix Prize. It was announced in October 2006. And the challenge is to say, can you make a really good recommendation system? I'll give you data to train your recommender. Now, can you make it work really well, better than what I currently have, with "I" meaning Netflix?