0:00

So in this lecture we're going to drill down into our first specific, search

strategy for graphs and also explore some applications. Namely, breadth first

search. So let me remind you the intuition and applications of breath first search.

The plan is to systematically explore the nodes of this graph beginning

with the given starting vertex in layers. So let's think about the following example

graph. Where S is the starting point for our breadth first search. So to start

vertex S will constitute the first layer. So we'll call that L zero. And then

the neighbors of S are going to be the first layer. And so those are the vertices

that we explore just after S. So those are L one. Now the second layer is going

to be the vertices that are neighboring vertices of L one but are not

themselves in L one or for that matter L zero. So that's going to be

C and D. That's going to be the second layer. Now you'll notice for example S is

itself a neighbor of these nodes in layer one, but we've already counted that in a

previous layer so we don't count S toward L two. And then finally the

neighbors of L two, which are not already put in some layer is E. That will

be layer three. Again notice C and D are neighbors of each other but, they've

already been classified in layer two. So, that's where they belong, not in layer

three. So that's the high level picture of breadth first search you should have. We'll

talk about how to actually precisely implement it on the next slide.

Again just a couple other things that you can do with breadth first search which we'll

explore in this video is computing shortest paths. So it turns out shortest path

distances correspond precisely to these layers. So, for example if you had that as

S, you had that as the Kevin Bacon node in the movie graph, then Jon Hamm would pop

up in the second layer from the breadth first search from Kevin Bacon. I'm also

going to show you how to compute the connected components of an undirected

graph. That is to compute its pieces. We'll do that in linear time. And for this

entire sequence of videos on graph primitives, we will be satisfied with

nothing less than the holy grail of linear time. And again, remember in a graph you

have two different size parameters, the number of edges M and the number of nodes

N. For these videos I'm not going to assume any relationship between M and N.

Either one could be bigger. So linear time's gonna mean O of M plus N. So let's

talk about how you'd actually implement breadth first search in linear time. So

the sub routine is given as input both the graph G. I'm gonna explain this as if

it's undirected, but this entire procedure will work in exactly the same way for a

directed graph. Again, obviously in an undirected graph you can traverse an edge

in either direction. For a directed graph, you have to be careful only to traverse

arcs in the intended direction from the tail to the head, that is traverse them

forward. So as we discussed when we talked about just generic strategies for graph

search, we don't want to explore anything twice, that would certainly be

inefficient. So we're going to keep a boolean at each node, marking whether

we've already explored it or not. And by default, I'm just, we're just going to

assume that nodes are unexplored. They're only explored if we explicitly mark them

as such. So we're going to initialize the search with the starting vertex S. So we

mark S as explored and then we're gonna put that in what I was previously calling

conquered territory the nodes we have already started to explore. So to get

linear time we are gonna have to manage those in a slightly non naive but, but

pretty straightforward way namely via a queue, which is a first in first out data

structure that I assume you have seen. If you have never seen a queue before, please

look it up in a programming textbook or on the web. Basically a queue is just

something where you can add stuff to the back in constant time and you can take

stuff from the front in constant time. You can implement these, for example, using a

doubly linked list. Now recall that in the general systematic approach to

graph search, the trick was to, in each iteration of some while loop, to add one

new vertex to the conquered territory. To identify one unexplored node that is now

going to be explored. So that while loop's gonna translate into one in which we just check

if the queue is non-empty. So we're assuming that the queue data structure

supports that query in constant time which is easy to implement. And if the queue is

not empty we remove a node from it. And because it's a queue, removing nodes from

the front is what you can do in constant time. So call the node that you get out of

the queue V. So, now we're going to look at V's neighbors, vertices with which it

shares edges, and we're gonna see if any of them have not already been explored.

So, if W's something we haven't seen before, that's unexplored, that means it's

in the unconquered territory, which is great. So, we have a new victim. We can

mark W as explored. We can put it in our queue and we've advanced the frontier and

now we have one more explored node than we did previously. And again, a queue by

construction, it supports adding constant time additions at the end of the queue, so

it's where we put W. So, let's see how this code actually executes in this same graph

that we were looking at in the previous slide. And what I'm gonna do is I'm gonna

number the nodes in the order in which they are explored. So, obviously

the first node to get explored is S. That's where the queue starts. So now, when we

follow the code, what happens? Well in the first iteration of the while loop we ask

is the queue empty? No it's not, because S is in it. So we remove in this case the only

node of the queue. It's S. And then we iterate over the edges incident to S. Now there are two

of them. There's the edge between S and A and there's the edge between S and B. And

again this is still a little under specified. In the sense that the algorithm

doesn't tell us which of those two edges we should look at. Turns out it doesn't

matter. Each of those is a valid execution of breadth first search. But

for concreteness, let's suppose that of the two possible edges, we look at the

edge S comma A. So, then we ask, has A already been explored? No, it hasn't. This

is the first time we've seen it, so we say, oh goody. This is sort of new grist

for the mill. So, we can add A to the queue at the end and we mark W as, sorry

mark A as explored. So, A is gonna be the second vertex that we mark. So, after

marking A as explored and adding it to the queue, so now we go back to the for loop,

and so now we move on to the second edge. It's into S, that's the edge between S and

B. So, we ask, have we already explored B? Nope, this is the first time we've seen

it. So, now we have the same thing with B. So, B gets marked as explored and gets

added to the queue at the end. So the queue at this juncture has first a record

for A, cause that was the first one we put in it after we took S out. And then B

follows A in the queue. Again, depending on the execution this could go either way. But

for concreteness, I've done it so that A got added before B. So at this point, this

is what the queue looks like. So now we go back up to the while loop, we say is

the queue empty? Certainly not. There's actually two elements. Now we remove the

first node from queue, in this case, that's the node A that was the one we put in

before the node B. And so now we say, well, let's look at all the edges incident

to A. And in this case A has two two incident edges. It has one that it

shares with S and it has one that it shares with C. And so, if we look at the

edge between A and S, then we'd be asking an if statement. Has S already been

explored? Yes it has, that's where we started. So, there's no reason to do

anything with S. That's already been taken out of the queue. So, in this for

loop for A, there's two iterations. One involves the edge with S, and that one we

completely ignore. But then there's the other edge that A shares with C, and C we

haven't seen yet. So, at that part of the for loop, we say ahah. C is a new thing,

new node we can mark as explored and put in the queue. So, that's gonna be our

number four. So now how has the queue changed. Well, we got rid of A. And

so now B is in the front and we added C at the end. And so now the same thing

happens. We go back to the while loop, the queue is not empty, we take off the first

vertex, in this case that's gonna be B. B has three incident edges, it has one

incident S but that's irrelevant, we've already seen S. It has one incident to C,

that's also irrelevant, that's also irrelevant, because we've already seen C.

True, we just saw it very recently, but we've already seen it. But the edge

between B and D is new, and so that means we can take the node D, mark it as explored

and add it to the queue. So D is going to be the fifth one that we see. And now the

queue has the element C followed by D. So now we go back to the while loop and we take C off

of the queue. It again has four now edges. The one with A is irrelevant, we've already

seen A. The one with B is irrelevant, we've already seen B. The one with D is

irrelevant, we've already seen D. But we haven't seen E yet. So, when we get to the

part of the for loop, or the edge between C and E, we say, aha, E is new. So E will

be the sixth and final vertex to be marked as explored. And that will get added at

the end of the queue. So then in the final two iterations of the while loop

the D is going to be removed, we'll iterate through its three edges, none of

those will be relevant because we've seen the three endpoints. And then we'll go

back to the while loop and we'll get rid of the E. E is irrelevant cause it has two

edges we've already seen the other endpoints. Now we go back to the while loop.

The queue is empty. And we stop. That is breadth-first search. And to see how this

simulates the notion of the layers that we were discussing in the previous slide

notice that the nodes are numbered according to the layer that they're in, so

S was layer zero. And then the two nodes that S caused to get added to the queue, the A

and the B, are number two and three, and the edges of layer three are precisely the

ones, sorry the edges of layer two are precisely the ones that got added to the

queue, while we were processing the nodes from layer one. That is, C and D are

precisely the nodes that got added to the queue while we were processing A and B.

So, this is level zero, level one, and level two. E is the only node that got

added to the queue while we were processing level, layer two. The vertices

C and D. So E will be the third layer. So, in that sense, by using a first in first

out data structure, this queue, we do wind up kinda processing the nodes according to

the layers that we discussed earlier. So, the claim that breadth first search is a

good way to explore a graph, in the sense that it meets the two high level goals

that we delineated in the previous video. First of all it finds everything findable,

and obviously nothing else, and second of all, it does it without redundancy. It

does it without exploring anything twice, which is the key to its linear time

implementation. So a little bit more formally, claim number one. At the end of

the algorithm, the vertices that we've explored are precisely the ones such that

there was a path from S to that vertex. Again this claim is equally valid, whether

you're running BFS in an undirected graph or a directed graph. Of course in an

undirected graph, meaning an undirected path from S to V, whereas a directed graph

in a directed path from S to V. That means a path where every arc in the path gets

traversed in the forward direction. So, why is this true? Well, this is true, we

basically proved this more generally for any graph search strategy of a certain

form of which breadth first search is one. If it's hard for you to see the right way

to interpret breadth-first search as a special case of our generic search

algorithm, you can also just look at our proof for the generic search algorithm and

copy it down for breadth-first search. So it's clear that you're only gonna,

again, the forward direction of this claim is clear. If you actually

find something, if something's marked as explored, it's only because you found a

sequence of edges that led you there. So the only way you mark something as

explored is if there's a path from S to V. Conversely, to prove that anything with an

S to V, for with a path from V will be found, you can proceed by contradiction:

you can look at the part of the path from S to V that, that BFS does successfully

explore, and then you gotta ask, why didn't it go one more hop? It never

would've terminated before reaching all the way to V. So, you can also just copy

that same proof that we had for the generic search strategy in the previous

video. Okay? So, again, the upshot. Breadth first search finds everything

you'd wanna find. Okay? So, it only traverses paths, so you're not gonna find

anything where there isn't a path to it. But it never misses out. Okay? Anything

where there's a path, BFS, guaranteed to find it. No problem. Claim number two is

that the running time is exactly what we want and I am gonna state it in a form

that will be useful later when we talk about connected components. So the running

time of the main while loop, ignoring any kind of pre processing or initialization is

proportional to what I am gonna call NS and MS which is the number of nodes that

can be reached from S and number of edges that can be reached from S. And the reason

for this claim it just becomes clear if you inspect the code which we'll do in a

second. So let's return to the code and just tally up all the work that gets done.

So I'm gonna ignore this initialization. I'm just gonna focus on the main while

loop. So we can summarize the total work done in this while loop as follows. First

we just think about the vertices so in this search we're only gonna deal, ever

deal with the vertices that are findable from S, so that's NS. And what do we do

for the given node, well we insert it into the queue and we delete it from the

queue. Alright? So we're never gonna deal with a single node more than once. So

that's constant time overhead per vertex that we ever see, so that's the proportion

of the NS part. Now, a given edge, we might look at it twice. So, for an edge V

W, we might consider it once when we first look at the vertex V, and we might

consider it again when we look at the vertex W. Each time we look at an edge we

do constant work. So that means we're only gonna do constant work per edge. Okay. So

we look at each vertex at most once. We look at each edge findable from S at most

twice. We do constant time, constant work when we look at something. So the overall

running time is going to be proportional to the number of vertices findable from S

plus the number of edges findable from S. So, that's really cool. We have a linear

time of implementation of a really nice graph search strategy. Moreover we just need

very basic data structures, a queue, to make it run fast with small constants. But

it gets even better. We can use breadth first search as a work horse for some

interesting applications. So, that's what we'll talk about in the rest of this video.