0:19

Recall that the distance between two nodes is the length of the shortest

Â path between them.

Â So for example, in this network, nodes 34 and 2 have a distance

Â 2 because there are multiple ways of getting from node 34 to 2 in two steps.

Â For example, we can take the path 34, 31, and 2.

Â We can take the path 34, 14, 2.

Â Or we can take the path 34, 20, and 2.

Â Notice that nodes 31, 14, and 20 are in the shortest paths between node 34 and 2.

Â And so, this is the kind of question we're looking at with betweeness centrality.

Â What are the nodes that show up in the shortest paths between

Â two different nodes?

Â 1:05

The way we are going to measure centrality,

Â the betweenness centrality of a node v is by taking nodes s,

Â t and finding all the shortest paths between nodes s and t.

Â That, we're going to call sigma s, t.

Â Sigma s, t is going to be the number of shortest paths between nodes s, t.

Â And then, we're going to look at how many of those shortest paths actually

Â contain node v in the middle?

Â That's this value here.

Â So, sigma s, t is the number of shortest paths between nodes s and t.

Â And sigma s, t(v) is going to be the number of shortest paths between s and

Â t that contain node v.

Â And the betweenness centrality of node v is going to be the sum of these

Â ratios overall possible s and t's.

Â Actually, we're going to find that there are different ways in which we can pick

Â the specific s and t's that we use to compute the centrality node v.

Â But we'll talk about that next.

Â Basic idea here is that a node v has high betweenness centrality

Â if it shows up in many of the shortest paths of nodes s and t.

Â One of the questions, one of the options that we have when we do this is whether or

Â not we actually include node v as a possible node s or t.

Â 2:23

Let's say for example that we exclude node v from being a possible node s or t.

Â So then we'll have the following.

Â If we measure the betweenness centrality of node B in this simple network,

Â we'll find that is the sum of three different things.

Â First, we're going to have the number of shortest paths between nodes A and D.

Â And there is only one shortest path between node A and D, which is the path A,

Â B, and D.

Â And so because B is involved in this path, then that is going to have value 1 over 1.

Â 2:57

Next, we look at the two nodes A, C.

Â And we find that the shortest path between notes A and C is the path A, B, C.

Â And of course, that involves node B, so that contributes the value of 1 over 1.

Â Finally, if we look at the pair of nodes C and D, we find that its shortest

Â path is just simply going from node C to D directly, since they're connected.

Â And that does not involve node B, so that contributes 0.

Â And so betweenness centrality of node B when we exclude B from playing

Â the role of s and t, we find that it has betweenness centrality of 2.

Â Now if we actually include node v as one of the endpoints here,

Â then we find that there are many more options to look at, right.

Â So first, we have to look at the pair of nodes A and B,

Â which of course involves node B, so that contributes 1.

Â We look at A, C, which has the shortest path A, B, C.

Â And that involves B, therefore that contributes 1.

Â A, D also passes through node B, so that contributes 1.

Â B, C, well, this one involves the node B itself, so of course node B is

Â involved in the shortest path from node B to C, and so that contributes 1.

Â B, D same story, it's one of the endpoints, so it contributes 1.

Â And then finally, C, D.

Â C, D, again, they're connected to each other.

Â So to get from one to the other, they don't have to pass through node B and so

Â that contributes 0.

Â And so when we include node B in the computation,

Â we find that it has a higher betweenness centrality, in this case 5.

Â We'll find that when we use network x to compute this,

Â we'll have the option of either including or

Â excluding the node as one of the endpoints in the pair of nodes.

Â Now, here we have another concern, which is that sometimes some pairs of nodes

Â are actually not connected to each other, they cannot reach each other.

Â So what happens to the computation when we have this case?

Â Again, this tends to happen more often in graphs that are directed.

Â And so let's look at this example.

Â Note here that node D has no nodes that are actually pointing to it.

Â So actually, no node can actually reach node D.

Â And so how do we compute this given that no node can actually reach D?

Â If we were simply to apply the definition as we stated it and

Â we actually include pairs for example A, D, then sigma A,

Â D would be 0 because they're no shortest paths between A and D.

Â And we have that sigma A, D in the definition a=is in the denominator.

Â And so this would make this undefined.

Â And so, we have to fix that in some way.

Â 5:48

And what we do is that we simply only consider the nodes that actually have at

Â least one shortest path between them when we're considering nodes s and t.

Â So, in this case, what is there betweenness centrality of node B?

Â Let's say we're not including node B in the computation as one of the endpoints.

Â Well, then we have to see which ones are the nodes that actually have at

Â least one shortest path between them.

Â And we'll use those in the computation.

Â So A, C has a shortest path, thus A,B, C.

Â And B involved in that, so that it contributes 1.

Â C, A, C can reach A in just one step by connecting directly to it.

Â That does not involve node B, so that contributes 0 to the computation.

Â D, C, again, they're connected without including B, so

Â that's contributes 0 to the computation.

Â And then D to A, that has the shortest path D, C,

Â A, which does not involve node B.

Â And therefore, it contributes 0 to the computation.

Â Notice that you can also go from D to A passing through B.

Â You can go D, B, C, A.

Â But that is a longer path, so it's not the shortest path.

Â And so B is not involved in the shortest path between D and A.

Â And so in this case, node B has a centrality of 1.

Â 7:17

Let's look at the same question for node C.

Â And so again, we have to look at all the pairs of nodes that actually have

Â a shortest path between them.

Â And we're not including C as one of the endpoints.

Â And so the first one is A, B, which there is a direct connection between them,

Â and does not involve C, so that contributes 0.

Â B to A, the shortest path from B to A is B, C, A.

Â And that involves node C so it contributes 1 to the centrality of C.

Â There's D, B, they're directly connected, so that contributes 0.

Â And then there is D, A.

Â And again the path from D to A, the shortest path passes through node C, so

Â that contributes 1 to the computation.

Â And overall we've find that node C has a betweenness centrality of 2.

Â 8:09

So, so far we haven't talked about normalizing the betweenness

Â centrality in any way.

Â And the problem with this is that nodes that are in graphs that have

Â a larger number of nodes will tend to have higher centrality than nodes

Â of graphs that are smaller in terms of the number of nodes.

Â That's simply because in large graphs, there are more nodes, s and t,

Â to choose from to compute the centrality of the nodes.

Â And so for example, if we look at these friendship network in the 34

Â person karate club, the nodes there are going to have lower centrality

Â than the nodes in this larger network of 2200 people.

Â And so, sometimes if we want to compare betweenness centrality across networks,

Â it's useful to normalize.

Â 9:16

And in directed graphs, you have twice the number of pairs because for

Â any pair s, t, you could have a path from s to t, but

Â also a potentially different path from t to s.

Â So you would divide the betweenness centrality of node v by (N-1)(N-2).

Â And in network x, you can use the function betweenness

Â centrality to find the centrality of every node in the network.

Â And you have the various options that we've discussed here.

Â So for example, you can choose to normalize or not.

Â And you can also choose the question of the endpoints,

Â whether you use the node that you're computing the centrality of as one of

Â the endpoints in the computation of its centrality.

Â So you can choose to do this in any way you want here.

Â So for example, if we look at the karate club and look at betweenness centrality,

Â compute the betweenness centrality of all the nodes and then find the five largest,

Â the five nodes with the largest betweenness centrality,

Â we find that these are the nodes 1, 34, 33, 3, and 32.

Â Now one of the issues with betweenness centrality is that it can be

Â very computationally expensive.

Â Depending on the specific algorithm you're using,

Â this computation can take up to order number of nodes cubed time.

Â And just for us to get some idea about this,

Â look at the network of friendship, marital tie, and

Â family tie among these 2,200 people.

Â This is a relatively small network, yet when you look at the number pairs of nodes

Â that it can have, you have a new order of 4.8 million pairs of nodes.

Â And so even small networks have lots and lots and lots of pairs of nodes.

Â And so computing the betweenness centrality of these networks becomes

Â pretty expensive.

Â So one of the things that you can do is rather than the computing betweenness

Â centrality based on all the nodes s and t, all the possible nodes s,

Â t in the network, you can approximate it by just looking at a sample of nodes,

Â instead of looking at all the nodes.

Â And in network x, you can do this by using the parameter k

Â that says how many nodes you should use to compute the betweenness centrality.

Â And so here, I'm computing the betweenness centrality of the nodes in

Â the karate club network using only 10 nodes rather than 34 nodes.

Â And so this gives you an approximation for

Â what the betweenness centrality of the nodes actually is.

Â And if I look at the five nodes with the largest

Â approximated betweenness centrality,

Â we find that these are nodes 1, 34, 32, 3, and 2.

Â So we get almost exactly the same list as we did when we didn't approximate,

Â when we find the actual betweenness centrality, except that we now get

Â 2 as one of the top five and we lose 33 as one of the top five.

Â So it gives us something that is close to the actual.

Â But of course, there can be some differences since now

Â you're only using 10 rather than 34 nodes to compute the centrality

Â 12:28

The other thing that sometimes is useful is that sometimes you rather compute

Â the betweenness centrality based on two subgroups in the network,

Â not necessarily looking at all potential pairs of nodes.

Â But you maybe really care about two groups communicating with each other.

Â So you want to find what are the most important nodes in this network that tend

Â to show up in the shortest paths between a group of source nodes and

Â a group of target nodes?

Â And so to do this in network x, you can use the function betweenness centrality

Â subset, in which you pass the graph and then you pass the set of source nodes and

Â the set of target nodes.

Â And you can choose to normalize or not.

Â In this case, I'm normalizing.

Â And I'm just sort of here selecting two groups of nodes, pretty arbitrarily,

Â just to kind of give you an example here.

Â So we're going to see, based on these source nodes and target nodes,

Â what are the most important nodes?

Â And again, here what the meaning of these source nodes and

Â target nodes is that when we select the nodes s, t to compute the centrality of

Â all the nodes, we're always going to choose s from the set of source nodes,

Â and t from the set of target nodes, rather than selecting all possible pairs.

Â And so when we find the top nodes here with the highest betweenness centrality

Â in this setup, with these source nodes and these target nodes,

Â we find that nodes 1, 34, 3, 33, and 9 are the most important nodes.

Â Now notice that these tend to be the nodes that also have highest centrality when you

Â don't restrict to source and subset of source and target nodes.

Â But there are some changes.

Â So for example, we had not seen that node number 9 was important before.

Â Now we see that it's important in connecting this particular sets of nodes.

Â 14:15

The other thing you can do is you can define the betweenness centrality of

Â an edge, rather than the betweenness centrality of a node,

Â in much the same way that you defined betweenness centrality for a node.

Â So if you're defining the betweenness centrality of an edge,

Â you're going to again look at pairs of nodes as t.

Â And you're going to take the ratio of the number of shortest paths in going from

Â s to t that involve the edge e divided by all shortest paths between nodes s and t.

Â So it is the exact same definition.

Â But now rather than asking is this particular

Â node showing up in the shortest path between s and t,

Â we are asking is this particular edge showing up in the shortest path?

Â And in network x, you can use the function edge betweenness centrality to find

Â the betweenness centrality of all the edges in the network.

Â 15:30

In the same way that you could define a specific set of source nodes and

Â a specific set of target nodes, you can do the same thing when you compute

Â the edge betweenness centrality rather than node betweenness centrality.

Â And for this, you can use the function edge betweenness centrality subset.

Â And you pass again the graph and the source nodes and the target nodes.

Â And if we find here the top five edges with the highest betweenness

Â centrality for this particular choice of source and target nodes,

Â we find that these are the the most important ones.

Â And notice that most of them tend to be edges that go from inside the target or

Â inside the source set to the outside.

Â And that make sense because these are the ones that actually end

Â up showing up in the shortest paths between the source and the targets.

Â 16:37

So in summary, we say that betweenness centrality makes the assumption that

Â important nodes tend to connect the other nodes.

Â And this is the formula that we use to compute it.

Â In general, it's the sum of the fraction of the number

Â of shortest paths that involve a particular node v divided by all

Â the possible shortest paths between the nodes s and t.

Â We talked about normalizing this, especially if we're comparing betweenness

Â centrality among different networks of different sizes.

Â So we divide by the number of pair of nodes.

Â We also talked about approximating this because sometimes we're unable

Â compute it exactly because it can be computationally expensive.

Â So we can approximate it by selecting a subset of nodes rather than all the nodes.

Â And we showed you how to do this in network x.

Â We also talked about choosing specific sets of target nodes and

Â specific sets of source nodes rather than using all possible pairs.

Â That's if you have a particular sets of nodes that you care about and

Â that you want to know, who are the important nodes that

Â are connecting nodes in this two specific sets.

Â And then we talked about how we can generalize this a bit more and

Â talked about the betweenness centrality of not only the nodes but also the edges.

Â Much in the same way that we define it for nodes, we can also define for edge.

Â This is all for this video.

Â Thank you for watching and I see you next time.

Â