0:08

Hello. Today we're going to talk about the concept of distance in social networks.

Â So the idea here is that sometimes we'd like

Â to know how far nodes are away from each other.

Â So for example in this network that you see here how far is node A from node H?

Â Or we'd like to know, for example,

Â are some nodes far away from

Â each other and other nodes close to each other in general in this network?

Â And if so, which nodes are closest and

Â which nodes are the furthest away from each other in the network?

Â To answer all these questions,

Â we need to develop a concept of distance between nodes.

Â And that's what we're going to do in this video today.

Â So to answer this question,

Â the first concept we need is the concept of a path.

Â A path is simply a sequence of nodes that are connected by an edge.

Â So for example, we can find paths that go from the node G to the node C.

Â Here's the path G-F-C.

Â And you can find different paths so for example here's the path G-F-E-C.

Â So how far is node A from node H?

Â Well, we will look at different paths that go from A to H.

Â So for example, we can find a path A-B-C-E-H.

Â And notice that this path it takes four hops to get

Â from A to H. And there are other paths so for example there is

Â the path A-B-C-F-E-H which takes five hops.

Â And so for a path,

Â we're going to define the path length to be the number

Â of steps that it contains from the beginning to the end.

Â So in this case path one has length four and path two has length five.

Â And to define the distance between two nodes,

Â we're going to define it to be the length of

Â the shortest possible path between the two nodes.

Â So going back to the question of what is the distance between node A to node H,

Â the answer is four, because the shortest path between

Â A and H has four hops or has length 4.

Â In network X, you can use the function shortest_path

Â to find a distance from any node to any other node.

Â So here in this example finding the distance between

Â node A and H in the graph G which is the graph that you see here.

Â And so here you get the shortest path between A

Â and H. If you're interested in just the length

Â of this path then you can use the function shortest_path_length,

Â and this gives you the length of this path which is four.

Â Okay, so sometimes what we would like to do

Â when we have real social networks is to find a distance

Â from a single node to all the other nodes in the network to figure out how

Â far away are other nodes from this specific route to node, in this case A.

Â Let's say we're interested in figuring out what distance

Â from node A to all the other nodes in the network is.

Â Well, this is something that you can easily

Â do manually in the network that's very small like this,

Â but this could become very tedious to do

Â manually if you have a very large social network.

Â And so what you can do is you can program a computer to do this.

Â And so what we're going to do is we're going to talk about one of

Â the efficient ways that we

Â have to compute the distances from a given node to all the other nodes.

Â And this one is called breadth-first search and what it basically does is that you

Â start at a node and you start kind of discovering different nodes or different layers,

Â and at each given layer,

Â you discover the set of nodes that are one

Â distance away from the nodes that were in the previous layer.

Â So, I'm going to walk you through an example of how breadth-first search works.

Â So here we have the network and we're interested in figuring out

Â the distance from node A to all the other nodes in the network.

Â So what we're going to do is we're going to start at A and we're going to start

Â discovering new nodes as we kind of walk through this network.

Â And we're going to be writing down all the nodes that we discover.

Â So we start at A and we sort of process the node A by looking at who is connected to A.

Â In this case, K and B are connected to A and

Â so those are going to be a distance one away because

Â they're the shortest path from each one of those nodes to A it's just one hop, right?

Â A path of length one.

Â Okay, so now we're going to process each one of the newly discovered nodes and

Â ask which nodes are connected to

Â this newly discovered node that we haven't discovered yet?

Â And those nodes are going to be assigned to the next layer.

Â So let's say we process node B.

Â Node B is connected to K, A and C.

Â But we've already discovered nodes A and K,

Â so the only node that we discover here is node C. Now we're going to process node K,

Â and node K is connected to node A and B,

Â but we've already discovered both of those.

Â So the only newly discovered node is node C and it's a distance two away from A.

Â Now we process node C which is connected to B, F, and E.

Â And here we've already discovered B so the only two nodes that we discover are

Â F and E and those are a distance three away from A.

Â Okay, now we're going to process node E.

Â Okay, node E has five connections and out of those five,

Â C and F we already discovered.

Â So the only new ones are the other three which are D,

Â I and H. So we assign those to the next layer.

Â Now we process node F which is connected to three nodes G,

Â C and E. But the only one we haven't discovered yet

Â out of all those is G so I want this to get assigned to the next layer.

Â And all of those nodes are a distance four away from A.

Â Okay, now we have to process each one of those newly discovered nodes.

Â And by now you can see that we're already almost done here.

Â So let's process node D which is only

Â connected to E. But we've already discovered E so D does not discover any new nodes.

Â Now let's go with I. I is connected to E and J.

Â And we haven't discovered J yet,

Â so this one it's assigned to the next layer.

Â Next we process H which is only connected to E but we already discovered E. And finally,

Â we process G which is connected to F which you've already discovered.

Â So, J is a distance five away. All right?

Â We have to process J, but J is only connected to

Â I which we already discovered and now we're done.

Â We've processed all the nodes.

Â There are no new nodes to discover.

Â And so here you can see how we efficiently

Â figure out the distance between A and all the other nodes in the network,

Â and this is something that you can

Â program using a computer to do this in an efficient way.

Â You can use network X to run the breadth-first search

Â algorithm by using the function bfs_tree.

Â And what it does is it gives you the tree that we've built.

Â And this graph that we built here is called a tree,

Â and is the tree of the nodes that you discovered.

Â And so with this function,

Â you're able to obtain this tree.

Â So if you look at the edges of

Â the graph t here and you get that these are the edges of that tree.

Â So there is A-K and A-B and B-C and so on.

Â Now if you're interested in

Â not necessarily the tree in the order in which these nodes were discovered,

Â but just simply the actual distances between A and all the other nodes,

Â then you can use shortest_path_length and you give it the graph,

Â which in this case is G, and the root node,

Â which in this case is A.

Â And you get a dictionary of all the distances from the node to all the other nodes.

Â So, A is a distance zero from itself,

Â a distance one from B,

Â a distance two from C and so on.

Â Okay, so we've defined distance between any two nodes in a network,

Â but if we go back to the original questions we had, in the beginning,

Â we were interested in, we're characterizing the distances between

Â all pairs of nodes in the graph.

Â Our nodes in general close to each other,

Â far away from each other.

Â If some are close and some are far,

Â then how can we figure out, which are close and which are far and so on?

Â And so, now, we're going to try to answer these questions.

Â So, the first thing you can do, is you can simply take

Â the average distance between every pair of nodes in the network.

Â In network X, you can do that by using the function average shortest path length.

Â In G, and in this case,

Â this graph has an average path length of 2.53.

Â So, that means, that on average any pair of nodes in

Â this graph are a distance 2.53 from each other.

Â So, that tells you what sort of,

Â on average what happens.

Â Now, what is the maximum possible distance between any two nodes,

Â that sort of like, what are the two nodes that are furthest away from each other?

Â How long is that? How far away from each other are they?

Â This is called the diameter and is

Â simply the maximum distance between any two pair of nodes.

Â And in network X, you can use the function diameter to get it,

Â and in this case the diameter of the graph is

Â five and you can see that by looking at the distance from K to J,

Â which has a length of five.

Â The other thing that is useful to define is the eccentricity of a node.

Â The eccentricity of a node is the largest distance

Â between the node and all the other nodes in the network.

Â So, you take a node, measure the distance from the node to all the other nodes,

Â and figure out which one of those instances is the largest one of all.

Â In network X, you can use the function eccentricity to get all those distances,

Â and here, you can see for example,

Â that A has an eccentricity of five, as we had seen.

Â It has a distance five to some node,

Â which in this case that's J.

Â So, that's pretty large.

Â It's actually as big as it could get, right?

Â Because the diameter, the largest possible distance between two nodes was five.

Â But, if you think of a node like for example,

Â node E here, which looks like it's closer to all the other nodes,

Â then you see that it has a eccentricity of three,

Â which means that no node in this graph is a distance

Â larger than three from node E.

Â And so, now, that you have this eccentricities,

Â the radius of the graph is the minimum eccentricity in the network.

Â It basically asks which node or not which node,

Â but what is the maximum distance that a node has from all the other nodes,

Â that's eccentricity and the radius takes the smallest one of those.

Â And in network X, you can use the function radius to get the radius of

Â the network and as you could see from the dictionary of eccentricities,

Â the radius of this network is three.

Â The next question we had was,

Â now that we know how to calculate distances and now that we have

Â a sense for what is a large distance in the network, like, the diameter,

Â what is the short distance, like the radius,

Â we can try to identify which nodes are sort of far away

Â from all the other nodes and which nodes are close to all the other nodes.

Â So, for the first one, we have the periphery,

Â which is the set of nodes in a graph that have

Â an eccentricity equals to the diameter of the network,

Â which is sort of the largest eccentricity that you could have,

Â since the diameter is the largest distance between any two nodes in the network.

Â So, if you apply this definition to this graph,

Â you find that the periphery of this graph are the nodes A, K, and J.

Â You can get these set of nodes using networks,

Â network X function periphery.

Â And as you would sort of imagine these nodes tend to be

Â sort of on the outskirts of the network far away from all the other nodes.

Â The opposite concept is the concept of nodes that are central.

Â So, the center of the graph is a set of nodes that have eccentricity equal to the radius.

Â And when you check which nodes are in the center in

Â this graph using the center function network X, you find that C, F,

Â and E are nodes that are in the center of the graph,

Â and this is sort of like,

Â it make sense there in the towards the center of the graph,

Â they tend to be close to most of the nodes.

Â Okay, so with these tools, now,

Â you're able to take a look at a network,

Â a real social network,

Â and start to ask questions about how far are these nodes from each other?

Â Which nodes are central?

Â Which nodes are not and so on?

Â So, let's run an example in this using the Karate Club Network,

Â which we had seen in a previous video.

Â So, this is a network of friendship in a Karate Club.

Â And as you may remember, the story here,

Â is that node one is the instructor of the Karate Club,

Â and this node 34 is an assistant,

Â and they have some type of dispute,

Â they're not friends with each other,

Â and so the club actually splits into two groups,

Â and sort of, this is the separation of the two groups.

Â So one, this set of students on the left go with one of

Â the instructors or with the assistant and the other ones go with the original instructor.

Â So, if we take this network and apply

Â the definitions about distances that we just covered,

Â we can discover how far nodes are from each other and who's central and who's not.

Â So, let's begin by loading up this network.

Â This network is so famous that actually on network X you can

Â simply load it by using the function karate club graph.

Â So, that one returns this particular graph.

Â Now, I'm converting the nodes labels to be integers,

Â so that they match the figure I have here.

Â So, that's what I'm doing with that command there,

Â and then I could ask different questions about the network.

Â So, in this case,

Â the average shortest path between the nodes is about 2.41.

Â The radius of the network is three,

Â and the diameter is five.

Â So, meaning there's a pair of nodes here,

Â there are distance far away

Â from each other and that's the largest that a distance can be.

Â And then, we're going to ask who's at the center of this network?

Â So, here the nodes are in the center,

Â and here, I'm highlighting them in red.

Â So, as you can see the instructor is in the center and

Â all the other nodes are in the center also connected to the instructor,

Â and they also tend to have high degrees,

Â so they are easily connected to many other high degree nodes,

Â and they just have small distances from them to all the other nodes in the network.

Â Now, when you look at the periphery,

Â these are the peripheral nodes,

Â and I'm highlighting them here and in blue and as you can see,

Â they're kind of on the outside,

Â they tend to have small number of connections,

Â and none of them are actually connected to the instructor.

Â Now, you might look at this and say,

Â okay, this make sense.

Â But, for example, this node 34,

Â was the assistant here,

Â he seems pretty central.

Â He's connected to a bunch of nodes,

Â it seems like he could be close to all the other nodes in the graph as well.

Â Why is 34 not showing up in the center?

Â Well, it turns out that if you look carefully,

Â node 34 has a distance four to node 17, right?

Â To get from 34 to 17, you have to go 34, 32, 16, and 17, and so,

Â it couldn't be in the center because the radius of the graph is

Â three and this one has a node that is the distance four away from it.

Â Now, it turns out that actually if this node 17 was just a bit closer,

Â for example, if this node 17 was a distance three away from 34,

Â then 34 would actually be in the center,

Â because 34 is a distance at most three to every other node in the network.

Â And so, this shows that this definition of center

Â is pretty sensitive to just one node that happens to be far away, right?

Â So, if you make a very small change to a graph that

Â makes a particular node far further away from the other,

Â you can sort of make it or break it for the node to be in the center or not.

Â And in the future, we'll look at other measures of

Â centrality for nodes that are less sensitive to small changes like that.

Â So, to summarize, we've looked at a set of measures.

Â The first one is the distance between two nodes and that was defined to be the length of

Â the shortest path between the two nodes and the eccentricity of a node,

Â which is defined to be the largest distance between

Â the node and all the other nodes in the network.

Â Both of those definitions take place at the either node level

Â or at the pair of nodes level.

Â We've also looked at definitions that sort of take place at the whole graph level,

Â they try to characterize the distances in the whole network

Â and they were the average distance between any two pair of nodes.

Â The diameter which is the largest distance between any two pair of

Â nodes and then the radius which was the smallest eccentricity in the graph.

Â And then, we use those definitions to identify the periphery and the center of the graph,

Â which is the set of nodes,

Â the periphery is the sets of nodes with eccentricity equal to

Â the diameter and the center is a set of nodes with eccentricity equal to the radius.

Â And that's all for this video and I hope to see you in the next one.

Â