In this sequence of videos, we'll discuss our last, but not least, data structure, namely the Balanced Binary Search Tree. Like our discussion of other data structures, we'll begin with the what. That is, we'll take the client's perspective and ask what operations are supported by this data structure: what can you actually use it for? Then we'll move on to the how and the why. We'll peer under the hood of the data structure, look at how it's actually implemented, and then use our understanding of the implementation to understand why the operations have the running times that they do. So what is a Balanced Binary Search Tree good for?

Well, I recommend thinking about it as a dynamic version of a sorted array. That is, if you have data stored in a Balanced Binary Search Tree, you can do pretty much anything with the data that you could if it were just a static sorted array. But in addition, the data structure can accommodate insertions and deletions; you can accommodate a dynamic set of data that changes over time. So, to motivate the operations that a Balanced Binary Search Tree supports, let's just start with a sorted array and look at some of the things you can easily do with data that happens to be stored in such a way. Let's think about an array that holds numerical data, although, as we've said, in data structures there is usually other associated data that you actually care about, and the numbers are just unique identifiers for the records. So these might be employee ID numbers, social security numbers, packet IDs in a networking context, etcetera. So what are some things that are easy to do given that your data is stored as a sorted array? Quite a few things, actually. First of all, you can search. Recall that searching in a sorted array is generally done using binary search; this is how we used to look up phone numbers when we had physical phone books. You'd start in the middle of the phone book; if the name you were looking for was less than the midpoint, you'd recurse on the left-hand side, otherwise you'd recurse on the right-hand side. As we discussed back in the Master Method lectures long ago, this runs in logarithmic time. Roughly speaking, every time you recurse, you've thrown out half of the array, so you're guaranteed to terminate within a logarithmic number of iterations. So binary search has logarithmic search time.
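To make this concrete, here is a minimal sketch of binary search on a sorted Python list; the array contents are made up for illustration (chosen to match the 17, 23, 30 example used later in the lecture):

```python
def binary_search(a, target):
    """Return the index of target in sorted list a, or -1 if absent."""
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        mid = (lo + hi) // 2        # look at the middle element
        if a[mid] == target:
            return mid
        elif a[mid] < target:
            lo = mid + 1            # recurse on the right half
        else:
            hi = mid - 1            # recurse on the left half
    return -1                       # unsuccessful search

a = [3, 6, 10, 11, 17, 23, 30, 36]  # illustrative sorted array
print(binary_search(a, 23))  # 5 (0-indexed position of 23)
print(binary_search(a, 21))  # -1 (21 is not in the array)
```

Each iteration discards half of the remaining range, so the loop runs at most a logarithmic number of times, just as described above.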

Something else we discussed in previous lectures is the selection problem. Previously, we discussed this in the much harder context of unsorted arrays. Remember, in the selection problem, in addition to the array you're given an order statistic. So if your target order statistic is seventeen, that means you're looking for the seventeenth-smallest number stored in the array. In previous lectures, we worked very hard to get a linear-time algorithm for this problem in unsorted arrays. Now, in a sorted array, suppose you want to know the seventeenth-smallest element. Pretty easy problem: just return whatever element happens to be in the seventeenth position of the array; since the array is sorted, that's where it is, so no problem. In an already sorted array, you can solve the selection problem in constant time. Of course, two special cases of the selection problem are finding the minimum element of the array, which is just the selection problem with i = 1, and the maximum element, which is just i = n. These correspond to returning the element in the first position and the last position of the array, respectively. Well, let's do some more brainstorming. What other operations could we implement on a sorted array? Well, here are a couple more.

There are operations called the Predecessor and Successor operations, and the way these work is that you start with one element. So say you start with a pointer to the 23, and you want to know where in this array the next-smallest element is; that's the predecessor query. The successor operation returns the next-largest element in the array. So the predecessor of the 23 is the 17, and the successor of the 23 would be the 30. Again, in a sorted array, these are trivial, right? You just know that the predecessor is one position back in the array, and the successor is one position forward. So given a pointer to the 23, you can return the 17 or the 30 in constant time. What else? Well, how about

the rank operation? We haven't discussed this operation in the past. What rank asks for is how many keys stored in the data structure are less than or equal to a given key. So, for example, the rank of 23 would be equal to 6, because 6 of the 8 elements in the array are less than or equal to 23. And if you think about it, implementing the rank operation is really no harder than implementing search. All you do is search for the given key, and wherever the search terminates in the array, you just look at that position and, boom, that's the rank of the element. So, for example, if you do a binary search for 23, then when it terminates, you discover that it's there in position number six, so you know the rank is six. If you do an unsuccessful search, say you search for 21, then you get stuck in between the 17 and the 23, and at that point you can conclude that the rank of 21 in this array is five. Let me just wrap up the list with the final

operation, which is trivial to implement in a sorted array. Namely, you can output, or print, say, the stored keys in sorted order, let's say from smallest to largest. Naturally, all you do here is a single scan from left to right through the array, outputting whatever element you see next. The time required is constant per element, or linear overall. So that's a quite impressive list of supported operations.
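The whole list of sorted-array operations above can be sketched in a few lines of Python; the standard-library bisect module does the binary search for us, and the array values are made up, chosen so that the rank of 23 is 6 as in the example above:

```python
import bisect

a = [3, 6, 10, 11, 17, 23, 30, 36]  # illustrative sorted array

def select(a, i):
    """i-th order statistic: i = 1 is the minimum, i = len(a) the maximum."""
    return a[i - 1]                 # constant time in a sorted array

def rank(a, key):
    """Number of keys <= key, via binary search."""
    return bisect.bisect_right(a, key)

def predecessor(a, i):
    """Next-smallest element, given the index i of an element."""
    return a[i - 1] if i > 0 else None

def successor(a, i):
    """Next-largest element, given the index i of an element."""
    return a[i + 1] if i + 1 < len(a) else None

print(select(a, 1), select(a, len(a)))     # minimum 3, maximum 36
print(rank(a, 23), rank(a, 21))            # 6 and 5
i = a.index(23)
print(predecessor(a, i), successor(a, i))  # 17 and 30
print(*a)  # output in sorted order: one linear left-to-right scan
```

Note that select, predecessor, and successor are single array accesses (constant time), rank is a binary search (logarithmic), and the sorted output is one linear scan, matching the running times claimed above.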

Could you really be so greedy as to want still more from our data structure? Well, yeah, certainly. We definitely want more than just what we have on this slide, the reason being that these are operations on a static data set, one which is not changing over time. But the world in general is dynamic. For example, if you are running a company and keeping track of the employees, sometimes you get new employees, and sometimes employees leave. What we want is a data structure that supports not only these kinds of operations but also insertions and deletions. Now, of course, it's not that it's impossible to implement insert or

delete in a sorted array; it's just that they're going to run way too slowly. In general, you have to copy over a linear amount of stuff on an insertion or deletion if you want to maintain the sorted array property. This linear-time performance for insertion and deletion is unacceptable unless you barely ever do those operations. So, the raison d'être of the Balanced Binary Search Tree is to implement this exact same set of operations, just as rich as that supported by a sorted array, but, in addition, insertions and deletions. Now, a few of these operations won't be quite as fast; we'll have to give up a little bit, and instead of constant time, they'll run in logarithmic time. We'll still get logarithmic time for all of these operations and linear time for outputting the elements in sorted order, plus we'll be able to insert and delete in logarithmic time. So let me just spell that out in a little more detail. So, a Balanced Binary

Search Tree will act like a sorted array, plus it will have fast, meaning logarithmic-time, inserts and deletes. So let's go ahead and spell out all of those operations. Search is going to run in O(log n) time, just like before. Select runs in constant time in a sorted array, and here it's going to take logarithmic time, so we'll give up a little bit on the selection problem, but we'll still be able to do it quite quickly. Even for the special cases of finding the minimum or the maximum in our data structure, we're going to need logarithmic time in general. Same thing for finding predecessors and successors: they're no longer constant time; they go up to logarithmic. Rank took logarithmic time even in the sorted-array version, and it will remain logarithmic here. As we'll see, we lose essentially nothing over the sorted array if we want

to output the key values in sorted order, say from smallest to largest. And, crucially, we have two more fast operations compared to the sorted-array data structure. We can insert stuff: if you hire a new employee, you can insert them into your data structure. If an employee decides to leave, you can remove them from the data structure. You do not have to spend linear time like you did for a sorted array; you only have to spend logarithmic time, where, as always, n is the number of keys being stored in the data structure. So the key takeaway here is

that, if you have data with keys drawn from a totally ordered set, like, say, numeric keys, then a Balanced Binary Search Tree supports a very rich collection of operations. So if you anticipate doing a lot of different processing using the ordering information of these keys, then you really might want to consider a Balanced Binary Search Tree to maintain them. One thing to keep in mind, though, is that we have seen a couple of other data structures which don't do quite as much as balanced binary search trees, but what they do, they do better. We just discussed the sorted array on the last slide. So, if you have a static data set and you don't need inserts and deletes, then by all means don't bother with a Balanced Binary Search Tree; just use a sorted array, because it will do everything super fast. But we also saw dynamic data structures which don't do as much, but what they do, they do very well.

So, we saw a heap. What the heap is good for is that it's just as dynamic as a search tree: it allows insertions and deletions, both in logarithmic time. And in addition, it keeps track of the minimum element or the maximum element. Remember, in a heap, you can choose whether you want to keep track of the minimum or keep track of the maximum, but unlike a search tree, a heap does not simultaneously keep track of both the minimum and the maximum. So suppose you just need those three operations: insertions, deletions, and remembering the smallest. This would be the case, for example, in a priority queue or scheduling application, as discussed in the heap videos. Then a Balanced Binary Search Tree is overkill; you might want to consider a heap

instead. In fact, the benefits of a heap don't show up in the big-O notation here; both have logarithmic operation times, but the constant factors, in both space and time, are going to be better with a heap than with a Balanced Binary Search Tree.
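For instance, Python's standard-library heapq module offers exactly this leaner interface: logarithmic insertion and extraction, with only the minimum tracked (you'd negate the keys to track the maximum instead); the keys below are made up for illustration:

```python
import heapq

h = []                      # an empty (min-)heap
for key in [23, 6, 30, 3, 17]:
    heapq.heappush(h, key)  # insert: O(log n) per key

print(h[0])                 # peek at the minimum: 3, in O(1)
print(heapq.heappop(h))     # extract-min: 3, in O(log n)
print(heapq.heappop(h))     # next smallest: 6
```

Notice there is no rank, predecessor, or successor here; that narrower contract is what buys the heap its smaller constant factors.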

The other dynamic data structure that we discussed is the hash table. What hash tables are really, really good at is handling insertions and searches, that is, lookups. Sometimes, depending on the implementation, they also handle deletions really well. So, if you don't actually need to remember things like minima, maxima, or ordering information on the keys, and you just have to remember what's there and what's not, then the data structure of choice is definitely the hash table, not the Balanced Binary Search Tree. Again, the Balanced Binary Search Tree would be fine, and it would give you logarithmic lookup time, but it's kind of overkill for the problem if all you need is fast lookups. A hash table, recall, will give you constant-time lookups, so that will be a noticeable win over the Balanced Binary Search Tree. But if you want a very rich set of operations for processing your data, then the Balanced Binary Search Tree could be the optimal data structure for your needs.
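As a preview of the "how," here is a deliberately simplified binary search tree sketch in Python showing search, insert, and sorted output. This version does no balancing at all, so its height (and hence its running times) can degenerate to linear; the balancing machinery that guarantees logarithmic height is exactly what the upcoming videos cover.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None   # subtree holding smaller keys
        self.right = None  # subtree holding larger keys

def insert(root, key):
    """Insert key, returning the (possibly new) root of this subtree."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root

def search(root, key):
    """Return True if key is stored in the tree."""
    while root is not None:
        if key == root.key:
            return True
        root = root.left if key < root.key else root.right
    return False

def in_order(root):
    """Yield the stored keys in sorted order (the linear-time output scan)."""
    if root is not None:
        yield from in_order(root.left)
        yield root.key
        yield from in_order(root.right)

root = None
for key in [23, 6, 30, 3, 17, 36]:   # illustrative keys
    root = insert(root, key)

print(search(root, 17), search(root, 21))  # True False
print(list(in_order(root)))                # [3, 6, 17, 23, 30, 36]
```

Search and insert each follow one root-to-leaf path, so their cost is proportional to the tree's height, which is why keeping that height logarithmic is the whole game.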