>> We're now going to review multivariate distributions.

We'll talk about multivariate CDFs, multivariate PDFs, conditional

distributions, and so on. Much of this material is a little tedious,

a little dull, but we thought it was worthwhile collecting it all and having it

in one place for your review if it crops up later in the course.

Okay, so let's get started. Let X be a vector of random variables, X1 up to Xn. We say the joint CDF of X is given to us by the following. So, the joint CDF, F X of little x, is equal to the probability that X1 is less than or equal to little x1, X2 is less than or equal to little x2, up to Xn being less than or equal to little xn. And from this joint CDF, we can actually

calculate the marginal CDF, so for example, the marginal CDF of Xi is given

to us by just plugging infinity into all of the components in the joint CDF except

for the ith component which is little xi. Okay, so we can go from the joint CDF to

the marginal CDF. It is also straightforward to generalize

the previous definition to joint marginal distributions.

So for example, if I want the joint CDF of just Xi and Xj, I can also recover that from the joint CDF of X1 up to Xn by placing infinity in all of the arguments, except for the ith argument, where I have little xi, and the jth argument, where I have little xj.
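To make this concrete, here's a small Python sketch. The joint CDF below, for three independent Exp(1) random variables, is an assumed toy example (it isn't from the lecture), but the infinity-plugging works the same way for any joint CDF.

```python
import math

# Assumed toy joint CDF: three independent Exp(1) variables, so
# F(x1, x2, x3) = (1 - e^-x1)(1 - e^-x2)(1 - e^-x3).
def joint_cdf(x1, x2, x3):
    return math.prod(1.0 - math.exp(-x) for x in (x1, x2, x3))

BIG = 1e9  # stands in for +infinity

# Marginal CDF of X2: plug "infinity" into every other argument.
def marginal_cdf_x2(x2):
    return joint_cdf(BIG, x2, BIG)

# Joint marginal CDF of (X1, X3): "infinity" in the second slot only.
def joint_marginal_x1_x3(x1, x3):
    return joint_cdf(x1, BIG, x3)

print(marginal_cdf_x2(1.0))  # matches 1 - e^-1, the Exp(1) marginal CDF at 1
```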

We also say that X has a joint PDF, or probability density function, f subscript X, if we can write the joint CDF as an integral like this.

So this is just the way we capture our joint CDF: by integrating the density function over the appropriate limits.
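As a sketch of going from the joint PDF back to the joint CDF, here's a numerical integration in Python. The density f(u, v) = 4uv on the unit square is an assumed example, chosen because its CDF has the closed form F(x, y) = x squared times y squared, which makes the result easy to check.

```python
# Recover the joint CDF from a joint PDF by numerical integration.
# Assumed density (not from the lecture): f(u, v) = 4*u*v on [0,1]^2,
# whose exact CDF is F(x, y) = x^2 * y^2.
def pdf(u, v):
    return 4.0 * u * v

def cdf_by_integration(x, y, n=400):
    # Midpoint rule over the rectangle [0, x] x [0, y].
    du, dv = x / n, y / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += pdf((i + 0.5) * du, (j + 0.5) * dv)
    return total * du * dv

print(cdf_by_integration(0.5, 0.8))  # close to the exact 0.25 * 0.64 = 0.16
```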

Okay. We can also talk about conditional CDFs.

So what we're going to do is partition our vector X1 up to Xn into two components. The first component is boldface X1, which contains X1 up to Xk. And the second component is boldface X2, which contains Xk plus 1 up to Xn. And then we can talk about the conditional

CDF of X2 given X1, and in fact, it's defined as follows.

So the conditional CDF of X2 given X1 is equal to the probability that the random vector X2 is less than or equal to little x2, conditional on X1 being equal to little x1. If X has a PDF, f of x, then the conditional PDF of X2 given X1 is given to us by this quantity here.

So it's the joint PDF divided by the marginal PDF of X1, which we can also write like this. Okay.
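Here's a minimal sketch of that ratio using a small discrete joint distribution (assumed numbers, not from the lecture); the same joint-over-marginal formula applies to densities.

```python
# Conditional distribution of X2 given X1 as joint / marginal,
# for an assumed two-by-two discrete joint pmf.
joint = {  # P(X1 = i, X2 = j)
    (0, 0): 0.10, (0, 1): 0.30,
    (1, 0): 0.20, (1, 1): 0.40,
}

def marginal_x1(i):
    # Marginal pmf of X1: sum the joint pmf over all values of X2.
    return sum(p for (a, b), p in joint.items() if a == i)

def conditional_x2_given_x1(j, i):
    # P(X2 = j | X1 = i) = P(X1 = i, X2 = j) / P(X1 = i).
    return joint[(i, j)] / marginal_x1(i)

print(conditional_x2_given_x1(1, 0))  # 0.30 / 0.40, i.e. 0.75
```

Note that for a fixed value of X1 the conditional probabilities sum to 1, as any conditional distribution must.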

And the conditional CDF, F of x2 given x1, can be determined by integrating the conditional PDF. So this is our conditional PDF, and we can integrate it with respect to u k plus 1 up to u n, and that will give us our conditional CDF. Okay, independence.

We say the collection X is independent if the joint CDF can be factored into the product of marginal CDFs. So in particular, the joint CDF here on the left-hand side is equal to the product of the marginal CDFs over here on the right-hand side. Similarly, this implies that if X has a PDF, f X, then we can also factorize the joint PDF into the product of the marginal PDFs over here. We can also see from (1), and (1) is here

on the previous slide. So we can use this, okay, to see that if X1 and X2 are independent, then the conditional PDF of X2 given X1, well, by (1), that's equal to this ratio here: the joint PDF of X divided by the marginal PDF of X1. And by independence here, we can replace the joint PDF by the product. Then these two cancel, and we're left with the marginal PDF of X2. So what we're saying here is that if X1 and X2 are independent, then the conditional PDF of X2 given X1 is simply the marginal PDF, f of x2. In other words, having information about X1 tells you nothing about X2 when X1 and X2 are independent.

Okay. Some implications of independence.

Well, and I expect we're all familiar with this, but let's go through it anyway. Let X and Y be independent random

variables. Then for any events A and B, the

probability that X is in A and Y is in B, well, that factorizes into the product of the probability of X being in A times the probability of Y being in B.

More generally, for any functions, f and g, independence of X and Y implies the

expected value of f of X times g of Y is equal to the expected value of f of X

times the expected value of g of Y. And in fact, 2 follows from 3, okay?

So the implication goes that way and it's easy to see this, because we can write

this probability of X being in A and Y being in B as the expected value of the indicator function of X being in A times the indicator function of Y being in B.

Just to remind ourselves what this indicator function is: well, it takes on two possible values. It takes on the value 1 if X is in A, and it takes on the value 0

otherwise. So therefore, the product of these two

indicator functions is 1 or 0 and will only be 1 if X is in A and Y is in B.

Okay? That occurs with the probability that X is in A and Y is in B. So this statement here is correct.

Okay, so we've got this first line. And now we can use the independence of X and Y in condition (3) to break this expectation down into the product of these two separate expectations. Okay.

But of course, this expectation is the probability that X is in A and this

expectation is the probability that Y is in B.

So indeed we do see that we can go from three to two.
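Both properties can be checked exactly by enumeration for a small discrete example. The distributions, events, and functions f and g below are all assumed toy choices.

```python
from itertools import product

# Exact check of properties (2) and (3) for two independent discrete
# random variables with assumed toy distributions.
px = {1: 0.2, 2: 0.5, 3: 0.3}   # distribution of X
py = {0: 0.6, 1: 0.4}           # distribution of Y

A = {2, 3}                      # event for X
B = {1}                         # event for Y
f = lambda x: x * x             # an arbitrary function of X
g = lambda y: y + 1.0           # an arbitrary function of Y

# Under independence, the joint pmf is the product of the marginals.
joint = {(x, y): px[x] * py[y] for x, y in product(px, py)}

# Property (2): P(X in A, Y in B) = P(X in A) * P(Y in B).
p_joint = sum(p for (x, y), p in joint.items() if x in A and y in B)
p_prod = sum(px[x] for x in A) * sum(py[y] for y in B)
print(abs(p_joint - p_prod) < 1e-12)   # True

# Property (3): E[f(X) g(Y)] = E[f(X)] * E[g(Y)].
e_fg = sum(f(x) * g(y) * p for (x, y), p in joint.items())
e_f = sum(f(x) * p for x, p in px.items())
e_g = sum(g(y) * p for y, p in py.items())
print(abs(e_fg - e_f * e_g) < 1e-12)   # True
```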

Okay. More generally, if X1 up to Xn are

independent random variables, then we can write the expected value of f1 of X1, f2

of X2, and so on up to fn of Xn. That factorizes into a product of n

separate expectations. The expected value of f1 of X1 times the

expected value of f2 of X2 and so on. Random variables can also be conditionally

independent. For example, we say that X and Y are

conditionally independent given Z, if the expected value of f of X times g of Y

given Z is equal to the expected value of f of X given Z times the expected value of

g of Y given Z and I should mention this is for all functions f and g.

Okay, and in fact, this idea of conditional independence we're going to see later in the course, because it's used in the, well, the now infamous Gaussian

copula model for pricing CDOs. So just to give you a brief idea of how it

might be used in a bond context or a CDO context, let Di be the event that the ith

bond in a portfolio defaults, okay? So we'll assume that there is a portfolio

of n bonds. Okay.

It's not reasonable to assume that the Di's are independent.

You might ask, why is that? Well, if you think about it, there will be

all sorts of macroeconomic factors or industry specific factors, which will

cause defaults to actually be dependent. So for example, maybe some industry

crashes that might cause not just one firm to default but multiple firms in that

industry to default. And so, it doesn't make sense to assume

that these events, these Di's are independent.

But we might be able to say that they're conditionally independent given some other random variable Z. Z, for example, might reflect some industry factor, some factor that governs how well a particular industry is doing. In that case, if we assume that the default events are conditionally independent given Z, then we can write

the probability of D1 up to Dn given Z as being the product of these factors here,

probability of D1 given Z up to probability of Dn given Z.

And it's actually often easy to compute these quantities.

So we'll actually be using this kind of idea later in the course, as I said, when

we discuss the Gaussian copula model for pricing CDOs.

We'll also see it in a couple of other applications as well.

Okay, so very briefly, I also want to mention the mean vector and covariance matrix of a vector of random variables X. I hope we're all familiar with this already, but let's go through it anyway. So the mean vector of X is simply the vector of expected values, expected value of X1 up to expected value of Xn, and the covariance matrix of X is, well, this matrix of covariances.

Okay, so the formula is the expected value of (X minus expected value of X) times (X minus expected value of X) transpose. And just to be clear, this is an n by 1

vector, and this is a 1 by n vector, so the product is n by n.

And we get an n by n covariance matrix, with the i, jth element of sigma being the

covariance of Xi and Xj. The covariance matrix is symmetric that of

course is because the covariance of Xi, Xj is equal to the covariance of Xj and Xi.

And this diagonal element satisfies sigma i greater or equal to 0, and of course,

the diagonal elements are just the variances.

So this is equal to the variance of Xi and variances are always nonnegative.

It is also positive semi-definite; this is an important, well-known property of a covariance matrix. In particular, it means that x transpose sigma x is greater than or equal to 0 for all vectors x in Rn.
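These properties are easy to verify on a toy example. The 2-dimensional discrete distribution below is assumed; we build sigma entry by entry from the covariance formula and then check symmetry and the quadratic-form property on a few vectors.

```python
# Covariance matrix of an assumed 2-dimensional discrete random vector:
# four equally likely outcomes (x1, x2).
outcomes = [((1.0, 2.0), 0.25), ((2.0, 1.0), 0.25),
            ((3.0, 4.0), 0.25), ((4.0, 3.0), 0.25)]

# Mean vector: component-wise expected values.
mu = [sum(p * x[i] for x, p in outcomes) for i in range(2)]

def cov(i, j):
    # Cov(Xi, Xj) = E[(Xi - mu_i)(Xj - mu_j)].
    return sum(p * (x[i] - mu[i]) * (x[j] - mu[j]) for x, p in outcomes)

Sigma = [[cov(i, j) for j in range(2)] for i in range(2)]
print(Sigma)

# Symmetry: Cov(Xi, Xj) = Cov(Xj, Xi).
print(Sigma[0][1] == Sigma[1][0])  # True

# x^T Sigma x >= 0 for a few test vectors (positive semi-definiteness).
for v in [(1.0, 0.0), (1.0, -1.0), (0.3, 0.7)]:
    q = sum(v[i] * Sigma[i][j] * v[j] for i in range(2) for j in range(2))
    print(q >= 0.0)  # True each time
```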

The correlation matrix, rho X, is similar to the covariance matrix, except it has as its i, jth element the correlation of Xi with Xj. It is itself symmetric, positive semi-definite, and has 1's along the diagonal.

And just to remind ourselves, the correlation of Xi and Xj, is equal to the

covariance of Xi and Xj, divided by, well, the square root of the variance of Xi

times the variance of Xj. Okay.
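As a quick sketch (using an assumed covariance matrix), dividing each entry by the square roots of the corresponding variances gives the correlation matrix, with 1's on the diagonal:

```python
import math

# Assumed covariance matrix; the diagonal entries are the variances.
Sigma = [[1.25, 0.75],
         [0.75, 1.25]]

# rho_ij = Cov(Xi, Xj) / sqrt(Var(Xi) * Var(Xj)).
rho = [[Sigma[i][j] / math.sqrt(Sigma[i][i] * Sigma[j][j])
        for j in range(2)] for i in range(2)]
print(rho)  # diagonal entries are exactly 1.0
```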

For any k by n matrix A and k by 1 vector little a, we can take the linear transformation AX plus little a, and we can compute the mean of this vector. So the mean is A times expected value of X plus little a, and the covariance matrix of this new vector of random variables is A times the covariance matrix of X times A transpose. And of course, (5) actually implies this

result, which you're probably familiar with, that is the variance of aX plus bY

equals a squared variance of X plus b squared variance of Y plus 2ab the

covariance of X, Y. Note that if X and Y are independent, then

the covariance of X, Y equals 0, but the converse is not true in general.
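The scalar identity can be verified exactly on a small assumed example: lhs computes the variance of aX plus bY directly, and rhs uses the formula.

```python
# Check Var(aX + bY) = a^2 Var(X) + b^2 Var(Y) + 2ab Cov(X, Y),
# the special case of Cov(AX + a) = A Sigma A^T with A = [a, b].
# The joint distribution and coefficients below are assumed.
outcomes = [((1.0, 2.0), 0.5), ((3.0, 0.0), 0.5)]  # P(X = x, Y = y)
a, b = 2.0, -1.0

def mean(h):
    # Expected value of h(X, Y) under the assumed joint distribution.
    return sum(p * h(x, y) for (x, y), p in outcomes)

var_x = mean(lambda x, y: x * x) - mean(lambda x, y: x) ** 2
var_y = mean(lambda x, y: y * y) - mean(lambda x, y: y) ** 2
cov_xy = mean(lambda x, y: x * y) - mean(lambda x, y: x) * mean(lambda x, y: y)

# Left-hand side: the variance of aX + bY computed directly.
lhs = mean(lambda x, y: (a * x + b * y) ** 2) - mean(lambda x, y: a * x + b * y) ** 2
# Right-hand side: the formula.
rhs = a * a * var_x + b * b * var_y + 2.0 * a * b * cov_xy
print(abs(lhs - rhs) < 1e-12)  # True
```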

And some people tend to forget this, but it is not in general true that if the

covariance of two random variables equals zero, then those two random variables are

independent. That is not true.
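The standard counterexample makes this concrete: take X uniform on {-1, 0, 1} and Y = X squared (an assumed textbook example, not from the lecture). The covariance is zero, yet Y is a function of X, so they are certainly not independent.

```python
# Cov(X, Y) = 0 does NOT imply independence.
# X is uniform on {-1, 0, 1} and Y = X^2.
support = [-1.0, 0.0, 1.0]
p = 1.0 / 3.0

e_x = sum(p * x for x in support)              # E[X] = 0 by symmetry
e_y = sum(p * x * x for x in support)          # E[Y] = E[X^2] = 2/3
e_xy = sum(p * x * (x * x) for x in support)   # E[XY] = E[X^3] = 0
cov = e_xy - e_x * e_y
print(abs(cov) < 1e-12)  # True: the covariance is zero

# But X = 0 forces Y = 0, so P(X = 0, Y = 0) = 1/3, while
# P(X = 0) * P(Y = 0) = 1/3 * 1/3. The joint does not factorize.
p_joint = p
p_product = p * p
print(p_joint == p_product)  # False: not independent
```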