Welcome to Lecture three of Networks, Friends, Money, and Bytes. In today's lecture, we will continue to talk about Google. But this time, how Google rank webpages. In the last lecture, we talked about how Google auctions the advertising slots, usually on the, the right panel of the search results. And today, we'll talk about how Google ranks web pages displayed in the main panel of the Search Result page. And we will see that, the algorithm that Google uses to rank these web pages, can be viewed as the solution to a gigantic system of linear equations. So, while in this lecture, we'll start using some Linear Algebra and matrix notation to accelerate to the presentation, you should know that at the heart of all these notation, it is just solving linear equations. Now, the idea that links can be embedded in text dates back to at least the middle of last century. But then in the 1990s, with the introduction of the Web in '89, the browser in 1990, and the web portal in 1994, this vision of linking text grew to a gigantic scale. Now, we can view these web pages, as a network. They form a network of information represented as web pages. So, each node is a web page. And a link is what's called a hyperlink. It connects the text in one web page to that of another web page and you go from this page to the other by clicking on that link. As we all know, that webpage A points to webpage B doesn't mean that B points back to A so we have to provide a sense of direction to this link. We call this a directional link. And the resulting graph is called a directed graph. How big is this graph? It is very big. Nobody knows for sure but it is estimated that N, the number of webpages out there, is somewhere between 40 to 60 billion. At the same time, this graph is also very sparse meaning that there aren't that many links compared to the number of no's out there. Okay. Many of these webpages have very few links pointing in or links pointing out. So, we're talking about a huge and a sparse graph. It's a very special kind of graph. And the idea is that we will be able to leverage the pattern of connectivity in this graph to calculate what we call the importance score of each node. And in this case, each node is a web page, and importance score will be used to rank that page and what we want to do is to say that let's turn a seemingly circular logic that says, important nodes pointing to, say you, you being a web page, means that you are also important. This argument almost sounds like a cyclic argument. But, in fact, we can use it to derive a recursive definition to define, define an equilibrium. Now, this equilibrium is not the equilibrium of a convergence of an algorithm, like in power control, or the equilibrium of strategic thinking as in gang theory. This is a fixed point equation equilibrium as we will see in the rest of this lecture, alright? So, coming back to this question. How can we quantify the relative importance of different nodes in a graph? Or, more specifically in this case, quantify the importance of each web page. We will see later in Lecture eight, quite a few different ways to quantify the notion of importance of nodes in a graph. In today's lecture, we'll look at a specific one used to, by Google. Now, there are quite a few intuitions you can think about this. One is to say, well, you want to quantify node importance, right? So, maybe, I'll just count how many links does this node have. Well, since this is a directograph, where each directed link, I'll just count how many links point into this node. In this case, there are three links pointing into this node. So, we call this the in-degree. And the out-degree is one because there's just one node, one link pointing out of this node. And the total degree of this node is four. Okay. So, this is a simple intuition as we will see that it's sort of getting on the right track, but it lacks many elements for it to be a useful metric. Now, we will also see in the discussion of important scores of nodes, a recurring theme in this course. And that recurring theme says, that we never will talk about a network whether it's a network of people, a network of webpages, a network of wireless devices. There actually are two different elements in it. One concerns the topology, that is the pattern of connectivity, and that's usually represented visually by graphs and algebraically, by matrices. We will see within today's lecture, three matrices to describe an increasing order of complexity, the graph that we are dealing with. But that's only half of the story. The other half is the functionality that you would like to study that's running on top of this network. That's what you do on the graph. In today's case, what you do is, can be viewed as a special case of navigation or search. Later in Lecture nine, we will talk six degrees of separation. That's also a kind of navigation and search. In Lecture thirteen, we'll talk about routing on the internet. That's another special case here. So, in order to fully understand a network, we need to know not just properties of the topology, but also viewed models of functionalities. And, in fact, in Modules three and four of today's lecture, we will see some very crude models of the functionality of search and navigation that is implicitly assumed behind Google's ranking algorithm. Alright, now, back to our intuition. That intuition says, let's count the number of links. We call that Try number zero, that's the very basic intuition. And we will soon see that is hardly a very useful one, when it comes to ranking web pages. Here's a little smarter idea that says, what we can do is to say, each node has a certain score. Give it a number, give it a positive real number. We're going to denote that number as pi sub node index. In this case, say, node one. So, it's got a number denoted as pi sub one. And another node, say node two, also has a number, pi sub two, okay? And here's another one, node three with a number pi sub three. And our job for the rest of this lecture is to compute these pis, what we called importance scores, we can collect them into a vector, pi one, pi 2... Up to pi N. This is a very long vector and by convention, it's a column vector and we call this vector pi, okay? In the textbook or in written media printed media, you can't use bold face. It's hard to draw bold face on the screen so we're just going to use this arrow to represent that vector. So, pi vector, that's what we want to compute. So, Try one says, that look, you got these pis, okay. Maybe you can just add up the pis that points to you. For example, if node one and node two both point to node three, then I can say that node 3's importance is the sum of pi one and pi two, okay. Simple enough. This is slightly better than just counting degrees cuz I now allow each node to have different important score, okay? And, of course, in general, first node and second node, they may have many other connections coming in and going out. So, this is a reasonable choice, okay? However, what it ignores is that some node, like this node one here and two, they are pointing to many other nodes, not just pi of node number three. So, think of a professor who writes reference letters for her students. And if this professor keeps writing that this is the very best student I have ever had in my 80 years of teaching at this university, and so on. If every student should rise in that way, and should rise like, say 100 letters every semester, then soon people will realize that, hey, this professor's reference letters should be discounted. And that's the idea here, is that you've got to normalize by how you spread your important score. If you point to too many web pages, then each of those web pages that you point to should not receive all your importance. And that is the idea that maybe I should look at the number of outgoing neighbors that I have. So, if node one with importance score pi one points to three other web pages, then O sub one = three, how degree is three. And I should divide node was important square pi one by O one, okay? So, I should be assigning to pi one over three to each of the three outgoing neighbors. And therefore, if you point to more nodes, each of them will uniformly receive pi one over, say, five, instead of pi one over three. And if you only point to one node, then that node receives all your important score. So, that's try number two now, okay? The ideas that we get to normalize by the strata of importance. It turns out that by this point, we're sort of close to what Google does. In the next of three modules, we will have to quantify this intuition and then, provide two more enhancements about what are getting there. The idea is that, look at all the importance scores, add them up across the links pointing to you, but make sure you normalize by the outgoing number of links. Now, you may wonder, if I have a node here, number two with important square pi two, I can write pi two as some combination of the important scores normalized from the incoming neighbors. And each of these, say, this involves pi one, can also be written as the summation of normalized important scores from its incoming neighbors. So, the question is, on the left-hand side, I will have pi one, pi two, and all the pis. On the right-hand side, I also have all these pis, divided by their outgoing number of neighbors. So, I've got pis both on the left-hand side and the right-hand side of this, this linear equation system. Can they actually make sense collectively? Is there is a set of consistent scores pi? Is there a vector pi that, can satisfy all these linear equations simultaneously? If so, then we have found a particular meaningful and self-consistent way to assign importance scores. If not, then we're in trouble. So, the question now is how can we modify this basic intuition in our try number two now, in order to guarantee both the existence of such a set of consistent scores and an efficient way to compute these scores? But before we go into the general space, let's look at a simple, very small example. There are only four nodes and six directed links. Nodes one and two points to three, three points to four, and then four points to back to one, two, and three, okay? Now, what is the intuition of the importance score vector pi, which should be a vector of length four here? If we assume that such a pi consistent set of scores actually exist. Here's one particular view. We can think of nodes three and four as collectively acting like a supernode, okay? Let's call this supernode three+4. And here's node one. Here's node two. One points to supernode three + five, so does two, and a supernode points back, okay? So now, intuitively, you look at the supernode and say, this supernode got two links pointing inwards. But each of these nodes, one and two, only has one length pointing inward. So, intuitively, you'll think that nodes one and two should each, has a smaller important score than the supernode three + four. And furthermore, by symmetry of this graph here, nodes one and two should have the same importance score. And then, the supernode three + four. Need to distribute its score in some way between nodes three and four. So, at least we know intuitively that pi one should equal to pi two, and each of them should be smaller than pi3 + four, the supernode score, okay? So, intuitively that's what we should. Anticipate. Now, we can write down the math a little further and calculate in this small example very easily the result. Now, first of all you'll also notice that, in this node of three pointing to node four, it says, that basically all the important score of pi three that we get goes into node four. And node four has only got this one incoming link. So, pi four should equal to pi three, or pi three divided by one, cause there's only one outgoing link from node three. And plus what? There's no other terms. There's only one link pointing inwards to node four. So, we can just say pi four equals pi three, and pi one equals pi two. Now, once we know that, the job is very simple now. Let's, say, pi one equals pi two, this number is x. Pi three equals pi four, this number is y, okay? Now, we know that there is one link from node four pointing to node one, right? But this node four is actually pointing outwards to three other nodes. So, what we can write down now is that x = y / three, okay? This is x, x, y, y. So, just by looking at the this arrow, we can write down x = y / three. At the same time, we also know that x + x + y + y. So, the summation of all the important score need to be normalized. Let's say, normalized to one. Why do we need to normalize it? Because it doesn't matter what is the actual absolute magnitude. All that matters is the relative importance of one node relative to the other nodes. So now, we got two equations. One says, x = y / three. The other says, by normalization 2x + 2y should equal to, say, a constant one. Now, we got two variables in two linear equations. In this case, we got a solution. The solution is x = one / eight and, and y = three / eight, which satisfies our intuitive understanding in the last slide. Now, therefore, what we're going to output is that we will say, nodes three and four pi for position number one. Three, and four, and nodes one and two tied for position number, I guess three, okay? Cuz positions one and two are pi between nodes three and four. So, you can't display three and then four, and then one and then two, for example, on the Search Result page by Google. Now, is this the right answer or not? And that's actually a tricky, almost philosophical question. What is a right answer? In some sense, the right answer depends on how does each user find that Google search results useful or not as useful. Now, because Google is such a dominant force in the search space today, it's actually hard to imagine what other search orders might look like so it's not easy to actually falsify whatever we might conjecture about that. And it is just very hard to measure or quantify subjective view on the usefulness of the search results, okay? So, what we are doing here is really a proxy of the real problem, which is make each individual users find the search results as useful as possible. And furthermore, as you can see, that we carried out a whole bunch of computation. Well, in this case, it's not that much, two variables and two linear equations. But the actual numbers of the important scores are not displayed in the search result page. All you see is the rank order. You do not see the scale behind the order. So, order is displayed. Scale, the numerical scale is hidden from users' view, okay? So, in that sense, scale is really a means to the ends. So, we don't know, for example, that this node is three times more important than that node. Now, these are some of the interesting caveats as we try to understand the true meaning of usefulness for search results. But for the rest of this lecture, we will just focus on computing this scale now, which translates as a special case into the order of presenting these results on the Search Result page. So now, we have seen a very, very small example. The question is, can this computation be generalized to an arbitrary structure of the graph? And furthermore, can we efficiently compute when the graph becomes very big?