Hi, my name is Brian Caffo. I'm in the department of biostatistics at

the Johns Hopkins Bloomberg School of Public Health and this is mathematical

biostatistics boot camp, lecture four. Today we're going to talk about random

vectors and independence. Independence is a key ingredient in

simplifying statistical models. Independence is a useful assumption.

And we frequently use it in statistics to get a handle on complex phenomena.

In addition, we'll find that independent and identically distributed random

variables are going to be our canonical model for what we might think of as a

random sample. So let's just briefly talk about what

we're going to cover today. Random vectors, which are simple

collections of random variables. Then we'll talk about independence.

And you probably have a rough idea of what is meant by independence to begin with.

But we're going to talk about the mathematical formalism a little bit.

We'll talk about correlation. And then go over various mathematical

properties of the correlation and covariance operators.

And then we'll use our facts about independence, and variance, and

correlation to talk about properties of the sample mean.

And then we'll cover the sample variance, and end with some discussion.

This lecture is actually one of the hardest things in all of statistics.

I think if you can kind of understand this lecture, you've understood what the goal

of probability modeling and this kind of population modeling really is.

You might want to consider listening to it over and over again.

These are incredibly difficult concepts until you internalize them.

And then once you internalize them they seem simple.

So what I'm hoping to do in this lecture is to help you internalize them.

Okay. So a random vector is nothing other than an

ordinary vector with random variables as its entries.

So if X and Y are random variables, then simply the ordered collection X comma

Y is a random vector. So just like individual random variables

have densities, mass functions, or distributions that govern their

probabilistic behavior. Random vectors have joint densities and

joint mass functions and joint distribution functions that govern their

probabilistic behavior. Let's just talk about densities and

mass functions to begin with. A joint density f of x, y first of all

has to satisfy that it is non-negative everywhere.

It's a two-dimensional random vector, so f is a surface over the

two-dimensional plane, and since f is greater than or equal to zero, its height

everywhere is at or above the horizontal plane. And it has to integrate to one when

you integrate over the whole xy-plane; so the height in the z direction has to be at least

zero and the integral over the xy-plane has to be one.

So it's a direct extension of the ordinary one-dimensional probability density

function, and I think from this definition you should probably be able to guess what

the definition of a joint density is for n random variables.
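To make these two properties concrete, here is a minimal numerical sketch using the hypothetical joint density f(x, y) = 4xy on the unit square; this density is my own illustrative choice, not one from the lecture, picked only because it is easy to integrate. The code checks that the density is non-negative and that its double integral over the plane is one.

```python
# A hedged sketch: f(x, y) = 4xy on the unit square is a hypothetical
# joint density, used only to illustrate the two defining properties.

def f(x, y):
    # Non-negative on the unit square, zero elsewhere.
    return 4.0 * x * y if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 else 0.0

# Midpoint-rule approximation of the double integral over the unit square.
n = 400
h = 1.0 / n
total = 0.0
for i in range(n):
    for j in range(n):
        x, y = (i + 0.5) * h, (j + 0.5) * h
        total += f(x, y) * h * h

print(round(total, 4))  # close to 1.0, as a joint density requires
```

The same midpoint-rule idea extends to computing probabilities as volumes: integrating f over any region of the xy-plane gives the probability that the random vector falls in that region.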

And then for discrete random variables, let's say f now is a joint probability

mass function. Then the joint probability mass function

f maps possible values of X and Y to probabilities.

So, lowercase x and lowercase y are the possible combined values of X and Y.

To satisfy the definition of being a joint probability

mass function, f has to be bigger than zero for all possible combinations of x

and y, and then the sum over all possible combinations has to equal one.

By the way, the joint density function works exactly like a univariate density

function, in that volumes under it, that is,

integrals under it, correspond to probabilities, and the total volume is

one. In the same way, with the joint mass function, sums over collections of

possible values of x and y yield the probability of that collection.

So for this class, a general discussion of random vectors is probably too much, so

we're only going to focus on one specific kind of joint density, a particularly

manageable type. And that's when the random variables x and

y are independent. And what we'll see is that, for say a joint

density or joint mass function, if the random variables x and y are independent,

then the joint density just factors into the product of the two individual

densities, f of x and g of y.

Basically, this is what independence does for us mathematically: it turns

complicated multivariate structures into products.

We're going to use this fact a lot. And we'll explain some of the intuition

behind this. Thinking back to our early definitions of

probability, we were discussing the sample space and events. Two events A and B are

independent if the probability of their intersection is equal to the product of

their probabilities: the probability of A intersect B is the probability of A times the

probability of B.

Incidentally, if this is true, then A is independent of B complement, B is

independent of A complement, and A complement is independent of B complement.

And the mathematical definition of independence is equivalent to our

intuition of what it means to be independent:

A is unrelated to B. That's what the mathematical definition

implies, and we'll get a better sense of that.

For two random variables X and Y, we define independence as follows:

for any two sets A and B, the probability that X lies in A and Y lies in B

is the product of the probability that X lies in A, regardless of what Y is doing,

and the probability that Y lies in B, regardless of what X is doing.

And so that's just a direct extension of the definition of

independence above, which I think everyone is probably at least a little familiar

with. We automatically think of independence all

the time already. So if you were to ask nearly anyone who has a basic amount of

mathematical training, what's the probability of getting two consecutive

heads on two consecutive coin flips? They would probably say, okay, well, the

probability of getting a head on the first one is a half, and the probability of

getting a head on the second one is a half, so the answer is a quarter, right?

Well that's just an exact execution of the independence rule.

Let A be the event that you get a head on flip one and

B the event that you get a head on flip two. Basically, what you are saying is you

want the probability of the intersection, a head on flips one and two, and the

probability of that intersection is exactly the product of the probabilities, since we

have independence: the probability of A times the probability of B, so 0.5 times 0.5, which

is 0.25, or a quarter. So we use independence all the time and,

you know, the main consequence of independence is that probabilities of

independent things multiply to obtain the probability of both occurring.
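That two-flip reasoning can be sketched directly. The code below is just an illustration of the multiplication rule for a fair coin: it enumerates the four outcomes of two independent flips and assigns each the product of the per-flip probabilities.

```python
from itertools import product

p_head = 0.5  # fair coin

# Under independence, each outcome's probability is the product of the
# per-flip probabilities.
outcomes = list(product("HT", repeat=2))
prob = {o: (p_head if o[0] == "H" else 1 - p_head)
          * (p_head if o[1] == "H" else 1 - p_head)
        for o in outcomes}

print(prob[("H", "H")])    # 0.25: P(A) times P(B) for two heads
print(sum(prob.values()))  # 1.0: probabilities over all outcomes sum to one
```

Note that the multiplication is only licensed by the independence of the two flips; for dependent events, as in the case discussed next, this product is not the probability of the intersection.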

But this creates a problem in that people have gone on to extend this rule,

multiplying probabilities regardless of whether the events are independent,

and this can lead to tragic consequences. Here's a striking example.

In Science, Volume 309, there is a report of a physician who gave expert testimony in a

criminal trial on sudden infant death syndrome,

SIDS, which is this tragic phenomenon where a baby dies, for example, in the

middle of the night and no one exactly knows why.

So, a woman was on trial because she had two consecutive children who died of SIDS.

And there was a court case that then considered whether or not this was too

unlikely to happen by chance, and that it wasn't really SIDS, it was something

malicious on the part of the mother. So, the person who was testifying did the

following calculation. He said, well, the probability of

SIDS is one out of 8,543. I'm not 100 percent clear where they got

that number, but let's assume for the sake of argument that it's correct.

Then the person giving the testimony said, well, the probability that you have

two SIDS deaths would be the product of that number with itself, or the square of that

number: one over 8,543 squared.

Based on this evidence the mother was convicted of murder.

So, what was this physician's mistake in this case?

For the purpose of this class, there is actually quite a bit of discussion you

could have over ethics, probability, evidence, and culpability based on this

case. There's quite a collection of complicated

issues that intersect when you're discussing a case like this.

For example, where and how did this probability of a SIDS death come from?

What's the evidence for it? How do you balance medical

evidence when convicting a person, or not convicting a person, in a trial?

For the purpose of this class, let's just simplify the discussion down to: is this

calculation of simply multiplying the number by itself warranted, given that

the number itself is correct? Well, if A1 is the event that the first

child died of SIDS and A2 is the event that the second child did, then the

inherent assumption being made is that A1

is independent of A2, so that you can multiply the probability of A1 times the

probability of A2. But this logic fails immediately.

There's no reason to believe that the event of the second SIDS death is independent of

the event of the first. In this case, and in many cases in biology, biological

processes that have a genetic or familial component tend to be dependent

within families. So you couldn't multiply the marginal

probabilities to obtain the probability of the intersection. And there are other problems

with this estimate; I outline an example of one here.

The prevalence was obtained from an unpublished report on single cases, and

quite a bit of the discussion surrounding this case revolved around these and other

issues. But, the point I'm trying to make for the

purposes of this class is, you can't just go around multiplying probabilities

willy-nilly. The random variables or events that you're

discussing have to actually be independent.

Okay, so we'll use the following fact extensively in this class and we'll use it

as a basic simplifying principle. If we have a collection of random variables

X1 up to Xn that are independent, then the joint distribution of X1 to Xn, or the

joint density function, is the product of the individual densities or mass

functions. So, in other words, the density f of X1

up to Xn is the product of the individual densities.

And here I have fi of Xi, indicating that every Xi could potentially have a

different density. The most common model that we'll be

dealing with is the instance where X1, X2, all the way up to Xn are from the same

distribution. In this particular case, we would say

that the Xi's are independent and identically distributed.

Independent meaning that X1 is independent of X2, and so on,

and identically distributed meaning that f1 is equal to f2, all the way up to

fn. IID samples are very important in the

subject of statistics, and the reason for that is that IID random variables are a

basic kind of default model for random samples.

If you have a collection of things that we believe are, in essence,

exchangeable, then we treat them as if they are IID.

And many of the important theorems of statistics are founded on the assumption

that variables are IID. So, to give you an example of IID random

samples, imagine simply rolling a die.

Each roll of a die is a draw from the uniform distribution on the numbers of one

to six. So when we

model a process as if it's IID, we're saying it's like we're rolling a die

for each variable that we're modelling, drawing from some population-level distribution.
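The die-rolling picture is easy to simulate. This is a sketch of mine, with an arbitrary seed and sample size, showing that IID draws from the uniform distribution on one through six give empirical frequencies near one sixth for each face.

```python
import random
from collections import Counter

random.seed(4)  # fixed seed so the sketch is reproducible
n = 60_000
# Each roll is an independent draw from the uniform distribution on {1,...,6}.
rolls = [random.randint(1, 6) for _ in range(n)]

freqs = {face: count / n for face, count in Counter(rolls).items()}
for face in range(1, 7):
    print(face, round(freqs[face], 3))  # each frequency lands near 1/6
```

This is exactly the sense in which an IID model treats each observation as a fresh, identical draw from one population distribution.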

I just want to comment on the broader discussion of probability modelling: this

is never actually the case, right? It's probably a very good model for

rolling a die, but we use IID to model things where surely the variables

themselves are not IID. We can rarely guarantee that our sample is

actually a random draw from some population distribution f over and over

again. The point is that it's a statistical model

used to simplify calculations and simplify our discussion.

But whenever we use this statistical model we have to be cognizant of the fact that

it is a model, and it's an enormously simplifying assumption.

Let's just go to a very important example of flipping a coin.

So imagine we have a biased coin; remember, for a biased coin we could

say the probability of a head, or success probability, is p, and we flip it n times.

What is the joint density of the collection of possible outcomes?

Recall, each coin flip here is a Bernoulli random variable, with success probability

p. And recall we wrote out the density in the

form p to the x times one minus p to the one minus x, and notice that's a very convenient

form, right? So if you plug in x equals one, you get the

probability p of a head, or of a one. And if you plug in x equals zero, we get

one minus p for the probability of a tail or a zero.

So this density is a nice way to represent it and you'll see why we present it

specifically this way on the next line. So the joint density, or the joint mass

function, f of X1 to Xn, if they're independent coin flips, is

simply the product of the individual densities and you'll see, from this

formula, we get p raised to the sum of the xi, times one minus p raised to n minus the sum of the

xi. So if the x's are all 0s and 1s, this works

out to be p to the number of heads times one minus p to the number of tails.

And that's basically why we write out the density this way: if we have a

bunch of independent coin flips, then it's convenient that the mass functions

multiply and we wind up with this nice form for the joint mass function.
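As a quick check of that factorization, here is a sketch of mine (the value p = 0.3 and the flip sequence are arbitrary illustrative choices) comparing the product of the individual Bernoulli masses to the shortcut p to the number of heads times one minus p to the number of tails.

```python
def bernoulli_mass(x, p):
    # Mass function of a single flip: p for x = 1, and 1 - p for x = 0.
    return p ** x * (1 - p) ** (1 - x)

def joint_mass(xs, p):
    # Joint mass of independent flips: the product of the individual masses.
    prod = 1.0
    for x in xs:
        prod *= bernoulli_mass(x, p)
    return prod

p = 0.3                     # hypothetical success probability
xs = [1, 0, 1, 1]           # three heads, one tail
shortcut = p ** sum(xs) * (1 - p) ** (len(xs) - sum(xs))

print(joint_mass(xs, p))                          # p cubed times (1 - p)
print(abs(joint_mass(xs, p) - shortcut) < 1e-12)  # True: the forms agree
```

The shortcut only depends on the number of heads and tails, not their order, which is exactly the point made with the one, zero, one, one example below.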

So, for example, if I have a biased coin with success

probability p, and I had four coin flips, and I wanted to know the

probability of getting a one, then a zero, then a one, and then another one.

So for one, zero, one, one, you would simply plug into this formula, and notice

that with one, zero, one, one we got three heads and one tail.

Notice the order doesn't matter. We would get p to the three times one minus p

to the one as the probability of that occurrence.

And notice it's the same probability regardless of what order

the 1s and 0s occurred in. So this formula makes it easy to calculate

the joint probability for a collection of 1s and 0s from a potentially biased coin

flip. I just want to mention again that this model

is tremendously important. So imagine, for example, we want to model

the prevalence of hypertension in a population.

One way we might go about doing that is to say that our sample is IID, and again

that's often a big assumption: that individuals are IID draws, individuals are coin flips,

and what we would like to know is their success probability of having

hypertension. And so that success probability is the

prevalence of hypertension in the population, and we would use this joint

mass function to model that process for our collection of data. And that's the idea

behind where we're going with this. But notice there's a lot of assumptions

that go into that, right? I just want to emphasize this fact quite a

bit. We're assuming that we're randomly drawing

people from the population that we're interested in, or not even that we're

randomly drawing them, but that we can model the collection of people,

their hypertension status, as if they were a bunch of independent coin flips, with the

prevalence being the success probability. That's ultimately what our model is

stating. So it's important to always keep that in

mind. So let's stop here and we'll next talk

about some of the mathematical properties associated with random variables,

covariances, and correlation, and their consequences when variables are