0:03

So we're going to complete our study of MST algorithms by considering some context, both as an unsolved problem in theoretical computer science and as a practical problem. The unsolved problem in theoretical computer science that has defied researchers for many decades is: is it possible to find a linear-time MST algorithm? Is there an algorithm that only examines each edge at most once on average? Now, this doesn't have that much practical consequence, since the versions of Prim's algorithm that we've talked about can get the running time down quite low for the huge sparse graphs that we encounter in practice. But still, like union-find, it's a tantalizing problem. Unlike union-find, it hasn't been resolved yet. For union-find, remember, we couldn't get a linear algorithm, but at least Tarjan proved that no such algorithm exists. For MST, we're not even there yet, and a lot of people have worked on it. So let's go with the simple model where what you get to do is compare edges, that is, compare weights on edges. In 1975, Yao proved that there exists an algorithm whose worst-case running time is E log log V. In 1976, Cheriton and Tarjan came up with another such algorithm. And then Fredman and Tarjan found the E log* V algorithm for MSTs in '84.

Â 2:40

Then in 2002 there was an optimal algorithm, well, it's better not to talk about the theory of that. [laugh]. And still, the open question is: is there an algorithm whose worst-case running time is guaranteed to be proportional to E? Or could someone prove that no such algorithm exists? It's one of the most tantalizing open questions in computer science. As we get into graph algorithms in more detail, we'll see some other examples of problems for which we know pretty good algorithms but would like to know whether there are better algorithms or not. And MST is a fine example of that. (The orange means Princeton.) There is a randomized linear-time algorithm that was discovered in 1995, but that's not the same as solving it worst case in linear time. So that's one context: MST is an important problem that's still being studied by theoretical computer scientists, and we still don't know the best algorithm.

Â Here's another one, So-called Euclidean MST.

Â And this one is what's appropriate in some practical situations.

Â So now you have points in the plane and the graph is an implicit dense graph.

Â That is, we take as an edge in the graph, the distance between this point and every

Â other point. So if there's n points there's n squared

Â edges because every point's connected to every other point.

Â And what we want is, in that graph, we want to find the subset of edges that

Â connects all the points, that's minimal. That's actually, in a lot of practical

Â problems, that's what you want. So as it stands, the algorithms that we've

Â talked about are not useful for this beacause they're going to take quadratic

Â time, because e's quadratic. That's how many edges there are in the

Â graph. So you know, you could just go ahead and

Â build the graph with the N squared over two distances and run Prim's algorithm.

Â But that's not very satisfying for a huge number of points.
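As a concrete baseline, here's a sketch (in Python, not part of the lecture) of that quadratic approach: Prim's algorithm run directly on the implicit complete graph. It avoids materializing the N squared edges explicitly, but each step still scans every point, so it takes time proportional to N squared.

```python
import math

def euclidean_mst(points):
    """Prim's algorithm on the implicit complete graph of the points.

    Runs in O(N^2) time without building the N^2 edges explicitly;
    the Delaunay-based approach improves this to O(N log N).
    Returns the MST as a list of (i, j) index pairs.
    """
    n = len(points)
    in_tree = [False] * n
    dist = [math.inf] * n    # dist[v] = distance from point v to the tree
    edge_to = [None] * n     # tree vertex that dist[v] connects v to
    dist[0] = 0.0
    mst = []
    for _ in range(n):
        # pick the non-tree point closest to the tree (linear scan)
        v = min((i for i in range(n) if not in_tree[i]), key=lambda i: dist[i])
        in_tree[v] = True
        if edge_to[v] is not None:
            mst.append((edge_to[v], v))
        # relax distances of the remaining points through v
        for w in range(n):
            if not in_tree[w]:
                d = math.dist(points[v], points[w])
                if d < dist[w]:
                    dist[w] = d
                    edge_to[w] = v
    return mst
```

For collinear points like `[(0, 0), (1, 0), (2, 0), (10, 0)]`, the tree is the chain of consecutive neighbors, with total weight 1 + 1 + 8.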

Â It's actually not Too difficult to exploit the geometry of

Â the situation. And get it done in time proportional to N

Â log N. What is typically done is to build a sub

Â graph, where each point is connected to a certain number of points that are close to

Â it. There's a particular graph called the

Â Voronoi diagram, or the Delaunay triangulation, that does that.

Â And it's been proved, number one that, that graph has linear number of edges not,

Â quadratic, and it's also the MST is a sub graph of the d-linear triangulation.

Â So you can get it done in linear arrhythmic time for Euclidean MST.

Â Separate problem related but still a very interesting in many practical

Â applications. Here's another application in se, several

Â scientific studies, there's the idea of clustering.

Â And so what we wanna do is, we have a set of objects and they're related by a

Â distance function that specifies how close they are and what we wanna do is divide

Â the objects into a given number k of coherent groups so that objects in

Â different clusters are far apart. So wanna see how things cluster together.

Â And here's a really old example of a application of this where there was an

Â outbook, outbreak of cholera deaths in. London in the 1850s and, if you plot where

Â all of the deaths happened, scientific study could find that they were clustered

Â around certain places. And, actually they were able to identify

Â well pumps that were leading to the, the cholera just by doing clustering study.

Â And, that is a very special case. There are many, many other applications

Â where clustering is an important process, an important thing to be able to compute.

Â So like mobile networks for web search there's an idea of the distance between

Â doc, documents and you wanna categorize them in clusters.

Â There's the, all the objects that have been discovered in the sky, you wanna

Â cluster them together in a reasonable way. And all kinds of. Of processing having to

Â do with huge data bases, trying to get information that seems close together to

Â be close together into a relatively small number of clusters.

Â So there's, a, a approach called single link clustering.

Â Where you talk about the single length, the distance between two clusters equaling

Â the distance between the two closest objects, one in each cluster.

Â And so, so-called single length clustering is given at integer K.

Â Find the K clustering that maximizes the distance between the two closest clusters.

Â So that's a well defined computational problem.

Â And there's a very well-known algorithm, in the science literature for this

Â problem, signal, signal-link clustering. Form of e-clusters, find the closest pair

Â of objects such that each object's in a different cluster, and merge the two

Â clusters. And repeat until they're exactly

Â k-clusters. You'll find this algorithm in the

Â scientific literature. What's amazing is that, this is

Â Crussical's algorithm, just stop when you found the k-connected components, so that,

Â the, Or another thing you could do is just run Prim's algorithm and then after you've

Â run Prim's algorithm get rid of the largest edges until you're left with

Â k-clusters. So out of all the efficient Algorithms

Â that we've talked about are gonna apply for single-link clustering.

Â And actually scientists who also know some computer science now are able to handle

Â huge problems that would not be possible without efficient algorithms.
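To make the Kruskal connection concrete, here's a sketch (in Python, not from the lecture) of single-link clustering as Kruskal's algorithm stopped early: process edges in order of increasing weight, merging clusters with union-find, and stop when exactly k clusters remain. The spacing it returns is the weight of the first edge that would merge two of the final clusters, i.e., the distance between the two closest clusters.

```python
def single_link_clustering(n, edges, k):
    """Kruskal's algorithm stopped at k connected components.

    n: number of objects, labeled 0..n-1
    edges: list of (weight, u, v) pairwise distances
    k: desired number of clusters
    Returns (labels, spacing): labels[i] identifies i's cluster,
    and spacing is the distance between the two closest clusters
    (None if the edges never connect the final clusters).
    """
    parent = list(range(n))

    def find(x):  # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    clusters = n
    spacing = None
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru == rv:
            continue          # u and v are already in the same cluster
        if clusters == k:
            spacing = w       # first edge between two of the k final clusters
            break
        parent[ru] = rv       # merge the two clusters
        clusters -= 1
    labels = [find(i) for i in range(n)]
    return labels, spacing
```

For example, with four objects where 0-1 and 2-3 are at distance 1 and all cross pairs are at distance 10, asking for k = 2 groups {0, 1} and {2, 3} with spacing 10.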

Â This is just one, one example where a, a cancer study where experiments are

Â connecting genes with the way they're expressed in different parts of the body,

Â and trying to cluster together tumors in similar tissues.

Â And again, such experimental results can amount, result in huge amounts of data,

Â and MST algorithms are playing a role in scientific research of this type.

Â That's our context for minimal spanning trees.
