The T here, that's just a transpose.

So that's just a transpose of a vector.

And then here we have the difference again.

So in matrix and vector world, if you have a vector

transpose times another vector like this,

what you're essentially doing is summing these squared differences.

So you could roughly think of this as squaring the differences and

then adding them up.

But there's one additional complication is this part here of this S inverse.

So S, that's just the covariance of X.

So remember X is a bunch of random variables.

So X is a random vector, and those will have some covariants.

So of course,

each of the Xs has a variance but they also might covariate together.

So S is just your standard kind of covariance matrix, and we're inverting it.

So that's what S inverse is.

So you could roughly think of that as just a scaling kind of thing,

where we're scaling by the unit of measure.

And the reason we want to do that is because we have different kinds of

variables here.

So one unit in difference in age will mean something a lot

different than a one unit difference in diabetes status,

where diabetes is a 01 variable.

So for example, if you compare somebody age 67 or

somebody age 68, their age differs by one year.

If you compare somebody who has diabetes versus who doesn't,

that variable differs by one.

But we don't necessarily think those are the same thing.

And differing by one unit of age probably, if it's years, it's probably

not very significant, whereas differing on diabetes status probably is.

This S inverse is going to scale things.

You could roughly think of this as if you had one variable,

what we would do is we would take the difference, let's say, it's age.

We would take the difference in age, square it, divide by the variance, and

then take the square root.

So what we're really doing then is just scaling.

We're scaling each variable, essentially sort of by its variance.

And that's basically,

then, putting all of these different variables on the same kind of scale.

So that hopefully, once we rescale, a difference in one on one

variable means something similar to a difference in one on another variable.

So that's the main idea here.

So that's what the formula is.

You could roughly think of it as a sum of squared

differences that are scaled, and then we take the square root.

But the details maybe aren't quite as important as just the big picture is