Welcome back everyone.

Our video today is called Distance in the Plane and I'm going to talk about three things.

The first thing I'm going to do is,

I'm either going to tell you for the first time or remind you about

the distance formula that computes

the distance between two points in the Cartesian plane.

We're going to both remind ourselves why the formula

is what it is and work a few examples.

Then we'll move straight on to something that is almost certainly not familiar to you,

a couple of new data science concepts.

The first is the idea that as long as you have

a notion of distance between points in the plane,

you can talk about nearest neighbors.

That turns out to be very, very important for machine learning,

as we shall see in later data science courses.

And then we can also talk about clustering.

That's also very, very important for machine learning in later data science courses.

In fact, just to use words that you'll see later,

nearest neighbors is one of the main methods in

supervised learning and clustering is one of the main methods in unsupervised learning.

But that's really fancy. Let's start really simple.

Before we can draw the Cartesian plane,

let me remind you of something that you probably saw in high school,

maybe in middle school, depends where you took it,

which is the idea of the Pythagorean Theorem.

So suppose I draw a right triangle,

so that means a triangle where one of the angles is a right angle.

Well, that's as close to a right angle as I can do without having a straight edge.

And my little [inaudible] here, so there's that.

Suppose this side length here is x,

this side length here is y,

and this side length here is z,

so this is the hypotenuse.

What the Pythagorean Theorem tells us,

Pythagorean Theorem, going back to Pythagoras,

it's fun to look him up on Wikipedia and find out about his amazing life,

tells us that z squared,

the square of the length of the hypotenuse, is equal to x^2 + y^2.

And that's just one of the things that's true; you can look up proofs of it.

Which is the same thing as writing that z is the square root of x^2 + y^2.

That's really what makes the distance formula tick.

Let's see what I mean by the distance formula.

We're gonna start abstract,

and then I want to give you some examples.

So let me give you two points in the plane.

There's the point A, equal to (a, b),

and here's the point C, is equal to (c,d),

and then we draw the line segment between them like that.

And we want to ask the question, how far apart are A and C?

What the distance formula says is that the distance, Dist,

from A to C is the square root of the square of the difference in the x-values,

(c-a)^2, plus the square of the difference in the y-values, (d-b)^2. That is, Dist(A, C) = sqrt((c-a)^2 + (d-b)^2).
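
Here is a minimal sketch of that formula in Python. The function name `distance` and the sample points are my own choices for illustration, not something from the lecture.

```python
import math

def distance(p, q):
    """Distance between p = (a, b) and q = (c, d) in the plane."""
    a, b = p
    c, d = q
    # Square the difference in x-values and in y-values, add, take the root.
    return math.sqrt((c - a) ** 2 + (d - b) ** 2)

# Illustrative points (not from the lecture): (0, 0) and (6, 8).
print(distance((0, 0), (6, 8)))  # 10.0
```

The function takes each point as a pair, so it mirrors the way the points are written as (a, b) and (c, d).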

Why on earth would that be true?

Well, let's draw a right triangle.

Let's take this here and let's draw a dotted line,

let's draw a dotted line there.

And if you paid attention during the points in the plane lecture,

you can convince yourself that the coordinates of this point here, let's see,

it has the same y-coordinate as

A and it has the same x-coordinate as C. So the point there

is in fact (c,b).

Therefore, the length of this is c minus a,

and the length of this is d minus b.

So we have a right triangle with side lengths c minus a and d minus b.

Therefore, the length of this hypotenuse,

which, of course,

is the distance between A and C,

is given by this formula down here.
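
The right-triangle argument can be sketched in Python as follows. The helper name `triangle_for` and the sample points are hypothetical, chosen only for illustration. I take absolute values of the side lengths so the sketch also works when c is less than a or d is less than b; the squares come out the same either way.

```python
import math

def triangle_for(p, q):
    """Build the right triangle whose hypotenuse joins p = (a, b) and q = (c, d)."""
    a, b = p
    c, d = q
    corner = (c, b)          # same y-coordinate as p, same x-coordinate as q
    horizontal = abs(c - a)  # length of the side from p to the corner
    vertical = abs(d - b)    # length of the side from the corner to q
    # Pythagorean Theorem: hypotenuse squared = horizontal squared + vertical squared
    hypotenuse = math.sqrt(horizontal ** 2 + vertical ** 2)
    return corner, horizontal, vertical, hypotenuse

# Illustrative points (not from the lecture).
corner, h, v, hyp = triangle_for((2, 1), (7, 13))
print(corner, h, v, hyp)  # (7, 1) 5 12 13.0
```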

So, if you didn't understand that,

that's OK. You can also just memorize this formula.

But often, people like to know why a formula is true so

they can derive it later in a pinch or really just understand it.

OK, let's work some examples.

I'm gonna give you a whole bunch of points in

the plane and we'll compute the distances between a bunch of them.

So, let's start with the point A is (1,1),

and let's take the point B way up here, not really to scale.

It's (5,4).

And let's start by computing the distance between A and B.

Let's do that over here in our scratch paper.

The distance between A and B,

just by our distance formula,

is gonna be the square root of the difference in the x values squared plus the difference in the y values squared,

so (5-1)^2 + (4-1)^2 under the square root.

Now we have to do a little bit of arithmetic.

So that's the square root of 4^2 + 3^2,

which is equal to the square root of 16 + 9,

which is the square root of 25, which magically works out to be a whole number, it's 5.
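
The same arithmetic, step by step, as a small Python sketch. The variable names are my own.

```python
import math

A = (1, 1)
B = (5, 4)

dx = B[0] - A[0]  # difference in the x-values: 5 - 1 = 4
dy = B[1] - A[1]  # difference in the y-values: 4 - 1 = 3

dist_AB = math.sqrt(dx ** 2 + dy ** 2)  # sqrt(16 + 9) = sqrt(25)
print(dx, dy, dist_AB)  # 4 3 5.0
```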

Electronic hands up if you think I rigged that to make that a nice whole

number just to get us off to a gentle start.

So that means that the length of this line between A and B,

let's draw it in, the length of that line is five.

It's five units apart that way,

which is interesting, right?

Because it's not true that you need to go 5 units in

the x direction to get from A to B or five units in the y direction,

but you do need to go five units to get from A to B.

They're fairly far away. All right.

Let's also draw the origin.

This is the point big O, this is (0,0).

And let's compute the distance between A and the origin.

It's equal to the square root,

of the difference in the x-values squared plus the difference in the y-values squared,

so (1-0)^2 + (1-0)^2.

Let me stop for a second, by the way, and point out that (1-0)^2 is the same thing as (0-1)^2.

That is, it doesn't matter whether you do the x-value of

A minus the x-value of O or the x-value of O minus the x-value of A,

which makes sense because the distance from A to O should be the

same as the distance from O to A; distance should be symmetric.

So if we work this out,

this is just the square root of two.

In other words, for the fans of the Pythagorean Theorem,

that length there is the square root of two; there's a right triangle there with two sides of length one.
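
A quick Python check of the symmetry and of the square root of two, as a sketch. The function name `distance` is my own.

```python
import math

def distance(p, q):
    """Distance between two points in the plane."""
    return math.sqrt((q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)

A = (1, 1)
O = (0, 0)

d_AO = distance(A, O)  # uses (0-1)^2 + (0-1)^2
d_OA = distance(O, A)  # uses (1-0)^2 + (1-0)^2

print(d_AO == d_OA)    # True: the order of the points does not matter
print(round(d_AO, 4))  # 1.4142, the square root of two
```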

OK, let's do one more point.

Let's look at B equals (1,3/2),

and the length of that line from A.

Now, here you don't really need a fancy formula,

you notice the only difference between them is in the y-values.

It's pretty clear the distance between A and B is just 3/2-1, just a half.

So I'll let you work that out with the distance formula,

but you can just write down the distance between,

I'm sorry that should not have been B,

so let's call that D. That's the sort of thing you

edit out but we're gonna keep that here just to give it a sense of reality.

The distance between A and D is one half.
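
If you want to see the distance formula agree with that, here is a small sketch in Python, with variable names of my own choosing.

```python
import math

A = (1, 1)
D = (1, 1.5)  # the point (1, 3/2)

# The x-values are equal, so only the y-values contribute.
dist_AD = math.sqrt((D[0] - A[0]) ** 2 + (D[1] - A[1]) ** 2)
print(dist_AD)  # 0.5
```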

Okay. That's cool.

So, by the way,

the square root of two is approximately 1.4.

All right, so we have these three distances here.

Here's the key concept.

Let's consider the set S,

which consists of the origin O,

B and D. Notice I just computed the distances from A to these three points O,

B and D. The distance from A to O is 1.4,

approximately; distance from A to D is one half,

which is equal to 0.5;

and the distance from A to B is five.

So here's a statement

whose meaning should be pretty clear.

The nearest neighbor of A

in the set S is D because it's the nearest point.

The second nearest neighbor,

second NN of A in S is O, the origin.

And the third nearest neighbor and the farthest-away point is B.

That's something we often use in data science.

You have these three points, O, D and B,

and you want to say, if A had to be most like one of them,

which one would it be?

In this case, we see what it is.
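
The ranking of neighbors can be sketched in Python by sorting the points of S by their distance from A. This is only an illustration of the idea with names I picked (`distance`, `ranked`), not a full nearest-neighbor method from a machine learning library.

```python
import math

def distance(p, q):
    """Distance between two points in the plane."""
    return math.sqrt((q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)

A = (1, 1)
S = {"O": (0, 0), "B": (5, 4), "D": (1, 1.5)}

# Sort the names of the points in S from nearest to farthest from A.
ranked = sorted(S, key=lambda name: distance(A, S[name]))
print(ranked)  # ['D', 'O', 'B']

nearest = ranked[0]
print(nearest)  # D
```

Sorting the whole set gives the nearest, second nearest, and third nearest neighbor in one go.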

Okay. One last little use of

distance formulas that we use in data science is the idea of clustering.

You'll see later, in many,

many courses why this is important.

Let's suppose we have a configuration of points in the plane.

So here are lots and lots of points that look like this,

and let's say here's another bunch of points that look like

that and say another clump over here.

Visually, if we look at these points, there might be many,

many many of these points, visually, intuitively,

we say there are three clusters, three clumps.

We didn't define what a clump is or a cluster,

but somehow it looks like I've got three of them.

Right? Over here there's cluster one,

cluster two, cluster three.

So if these points were sort of different people

measured by some blood measurement or something like that,

we would say there's three stereotypical groups,

group one, group two and group three.

Distance is a good way of expressing membership in a cluster.

Essentially what you might say is if A and B are in cluster

one and C is in cluster two and D is in cluster three,

so we'll put those points down,

say that's A, that's B,

there's the point C, there the point D,

then the distance between A and B,

whatever it is, is much, much less,

remember that symbol, which we've seen before,

doesn't really have a formal meaning but we know what it means in the picture, much,

much less than the distance between A and C,

and that's also much,

much less than the distance between A and D.
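
Here is a small Python sketch of that comparison. The coordinates are hypothetical, made up to imitate the picture, and since "much less" has no formal meaning, the factor of 5 is an arbitrary assumption of mine. The sketch only checks that the distance within the cluster is small compared with each of the two distances to other clusters.

```python
import math

def distance(p, q):
    """Distance between two points in the plane."""
    return math.sqrt((q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)

# Hypothetical coordinates: A and B in cluster one, C in cluster two, D in cluster three.
A = (1, 1)
B = (1.5, 1.2)
C = (8, 1)
D = (4, 9)

d_AB = distance(A, B)  # within cluster one
d_AC = distance(A, C)  # cluster one to cluster two
d_AD = distance(A, D)  # cluster one to cluster three

FACTOR = 5  # arbitrary stand-in for "much, much less"
print(d_AB * FACTOR < d_AC)  # True
print(d_AB * FACTOR < d_AD)  # True
```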

So having this distance formula, this distance metric,

often allows you to break points up into stereotypical clusters or clumps, and somehow,

whatever these are measuring, A and B are much,

much more similar to each other than A is to C or A is to D. Okay. That concludes our video.