0:01

Now, we'll take a look at how the sorting algorithms that we talked about or

expressed in the systems that we use everyday. Now, the key point is that

sorting algorithms rhythms are essential in a very broad variety of applications

and, and all of us use sorting algorithms pretty much every day. Many obvious out

applications like or, organizing your music library or displaying your search

results or listening feeds in your in your web browsers. There's some other

applications that are not so obvious where we use sorting as a to make a problem easy

once you know that they're sorted. And so, for example, finding the median and if

it's already sorted, it's much easy to find the median. And now, the statistical

problems are like that or finding duplicates. Probably finding duplicates by

itself is not quite obvious what to do but the easy way to solve it is to just go

ahead and sort. And then there are plenty of applications that we'll see later in

this course like data compression or computer graphics like finding the convex

hull, applications in science such as computational biology or, or in systems

development. We're having a efficient sort as absolutely crucial. So, because there's

all these applications most programming systems have a fast sort as an important

part of their infrastructure and Java is no exemption. So, Java has a method called

arrays.sort and it's intended to be a general purpose sorting method for use by

Java programmers. And now, that you have some understanding of the classic methods

you can have a better idea of why arrays.sort is the way that it is. So it

has the infrastructure that allows us to be used for all types of data types and

all types of ordering so it's got a method that implements comparable then its got

methods easy compare order. So that you can use the natural order or you can

provide a compare order and provide your own order for any type of data. It uses

actually both quicksort and mergesort. It uses two quick sort for primitive types of

data and a two mergesort for objects. Why two different well it's just the

designer's assessment of the idea that if a programmer is using object maybe spaces,

not a, a critically important consideration. And so, the extra space

used by merge sort maybe is not a problem. And if the program is using primitive

types, maybe performance is the most important thing. And so, then we'll use

the quicksort. To invoke arrays that sort, you have to import the name space from

java.util.Arrays and then all you need to do is invoke Arrays.sort. So I just

answered this question, why do we use different algorithms for the two types?

And this is, is maybe arguable. Now are referred to this idea of a good system

sort, there was a good system sort that a lot of people used for many years. And in

1991, there were some scientists that, that Bell Labs that were using qsort for a

scientific problem and they were used to taking just a few minutes and then they

realized that it was taking hours of CPU time. And the fact was that all the qsort

implementations at that time in Unix had this flaw well, there are two flaws and

one of them is a little complicated about the way they are raised order and the

other one was for a raise that had lots of equal keys and this is Wilks and Becker

problem and have lot of equal keys, it was quadratic time. So, the system designer,

Jon Bentley was one of the designers to take a look at these problems and that

lead ultimately to the development of the 3-way quick sort that were used today. He

worked with Doug McIlroy and they wrote a, a, a paper that outline this problem and

talk about some of these things and they had a three-way partitioning method that

was somewhat like the Dijkstra method that we showed but a bit more complicated.

Anoth er thing they did was rather than shuffling the array. They [cough] used

what's called a method for choosing partitioning element called Tukey's

ninther. Tukey is a statistician and he had this particular method for order

statistics that has some interesting properties and use that for the

partitioning element. This paper was very influential and, and that basic method is

widely used. And Tukey's ninther is just pick nine items out of the array and take

the median of the mediums and that's the ninther. So very inexpensive and they had

macros to do this so and use not too much cost to find a partitioning element that's

much closer to the middle than, and if you use a, a random one. So the, the reason

they used that is they thought they got them closer to the middle and they also

don't like the, some system designers don't like the idea of using random

choices in a system method because of way that it changes the state of the system.

So they felt that they got better partitioning than a random shuffling and

it was also less costly and then generating random numbers including this

change of state problem. But there's a problem so you would think that the system

sort would be completely solid with all this resource with all these research and

all of the development that's going into it. In fact there's a file out there in

your book site and get it that will actually break the Java system sort. There

was a killer input that will make the thing run in quadratic time. Actually it

usually crashes because it's recursive and it crashes the system stack. And it can

cause all sorts of problems. There's a killer input for the system sort and, and

it can be disastrous in various systems and the reason is, they didn't do the

random shuffling. Mcilroy, himself, actually found this problem that you could

while the sort is running figuring out an inp ut that would make it crash. And so,

you can just run that program and if the sort doesn't use randomness then it's

vulnerable to this attack. So which algorithm should we use to sort that's,

that's really a key question. We've looked at lot of sorting algorithms and actually,

there's hundreds of sorting algorithms out there and we have chosen the most

important and the most interesting for you but you could literally spend a year

reading all the papers on sorting and then you still continue to be invented new

algorithms are developed and that are found to have good characteristics all the

time. And really, the key idea is really important to think about cuz it applies to

all sorts of algorithmic problems. On our way, we've been talking about rules of the

game. What is it that we care about in a sort? It's a little bit more complicated

than just put stuff in order. There's a lot of attributes, the different

applications have. Like stability, that's a fairly sophisticated attribute that you

really have to think about, you maybe not be aware of. Maybe your computer is

parallel and the sort has to be parallel and we found that equal keys make a huge

difference. And I didn't really think about that at the beginning but it can

make a huge difference. Maybe the way your computer's memory is organized make a

difference. Maybe the keys are small and the items are large or maybe the keys are

really huge. Do we need guaranteed performance? Are we happy with random

performance? Do we know, is the array randomly ordered? You can think of a

matrix shown in the right here where we list out all the possible attributes and

then there's algorithms that worked well for different combinations of attributes.

But the thing is, there is way more possible combinations of attributes than

there are algorithms. So there is going to be situations that are going to require an

understanding of what it takes to engineer a, a sort method that's appropriate for

your application. There can't be a system sort out there that's going to cover all

possible combinations of attributes. Now, usually it's going to be good enough but

it's definitely worth while to understand what's going on with different sorting

algorithms in order to even find improved performance over the system sort. So,

here's the summary of some of the things that we've talked about. We've talked

about six different sorting algorithms. Then, we've talked about a bunch of

attributes. For example, the top line, selection sort is in place it always takes

about N^2 / two comparisons. And one of the things to remark about it is that it

only uses N exchanges and so forth. Insertion sort best case linear,

quadratic, and place unstable. And it'll work well if the file is small or

partially ordered. Shellsort, we don't know it's a running time, it's not stable

but it's a fast method for intermediate size files and not much code. Mergesort

and log N guarantee unstable but not in place, need that auxiliary array.

Quicksort not stable. Quadratic time worst case but that's unlikely to occur if you

do the random shuffling. And its the fastest and most useful in practice

particularly if you make improvements to deal with duplicate keys. And it's

interesting to note we've looked at important and classic algorithms that are

widely deployed but we don't have a, a useful, practical algorithms that are

widely used that's got all of these characteristics that's in place and stable

worst case N log N. There's versions of merge sort that come close but they are

too complex for practitioners to have adopted them. Those are some of the

situations that we encounter when developing a system sort. And, quicksort

certainly plays a role in most system sorts.