0:09

However, we need something that's a bit easier to work with for numeric outcomes

of experiments, where here I'm using the word experiment in a broad sense.

So densities and mass functions for random variables are the best starting point for

this, and this is all we will need.

Probably the most famous example of a density is the so called bell curve.

So in this class you'll actually learn what it means to say that

the data follow a bell curve.

0:34

You'll actually learn what the bell curve represents.

Among other things, you'll know that when you talk about probabilities associated

with the bell curve or

the normal distribution, you're talking about population quantities.

You're not talking about statement about what occurs in the data.

How this is going to work and

where we're going with this is that we're going to collect data that's going to be

used to estimate properties of the population, okay.

And that's where we'd like to head.

But first before we start working with the data, we need to develop our intuition for

how the population quantities work.

1:09

A random variable is the numeric outcome of an experiment.

So the random variables that we will study come in two kinds, discrete or continuous.

Discrete are ones that you can count, like number of web hits, or the number,

the different outcomes that a die can take.

Or even things that aren't even numeric, like hair color.

And we can assign the numeric value, one for blonde, two for brown, three for

black, and so on.

1:35

Continuous random variables can take any value in a continuum.

The way we work with discrete random variables is,

we're going to assign a probability to every value that they can take.

The way that we're going to work with continuous random variables is we're

going to assign probabilities to ranges that they can take.

1:58

So the biggest ones for building up our intuition in this class will be

the flip of a coin, and we'll either say heads or tails, or 0 or 1.

This is discrete random variable, because it could only take two levels.

The outcome of the roll of a die is another discrete random variable,

because it could only take one of six possible values.

These are kind of silly random variables because they're,

the probability mechanics is so conceptually simple.

Some more complex random variables are given below.

For example, the amount of web, the website traffic on a given day,

the number of web hits, would be a count random variable.

We'll likely treat that as discrete.

However we don't really have an upper bound on it.

So it's an interesting kind of discrete random variable.

We'll use the Poisson distribution likely to model that.

3:00

The hypertension status of a subject randomly drawn from a population could be

another random variable.

Here we might give a person a one if they have hypertension or

were diagnosed with hypertension, and a zero otherwise.

So, this random variable would likely be modeled as discrete, or

would be modeled as discrete.

3:18

The number of people who click on an ad.

Again, this is another discrete random variable, but a, but an unbounded one.

But still, we would still assign a problem, probability, for zero clicks,

one clicks, two clicks, three clicks and so on.

3:38

When we talked about discrete random variables,

we said that the way we're going to work with probability for

them is to assign a probability to every value that they can take.

So why don't we just call that a function?

We'll call it the probability mass function.

And this is simply the function that takes any value that a discrete random variable

can take, and assigns the probability that it takes that specific value.

So a PMF for a die roll would assign one sixth for the value one,

one-sixth for the value two, one-sixth for the value three, and so on.

But you can come up with rules that a PMF must satisfy in order to then satisfy

the basic rules of probability that we outlined at the beginning of the class.

First, it must always be larger than zero because we, larger than or

equal to zero, because we've already seen that a probability has to

be a number between zero and one, inclusive.

But then also the sum of the possible values that the random variable can

take has to add up to one, just like if I add up the probability a die takes

the value one, plus the probability that it takes the value two,

plus the probability it takes the value three, plus the value it takes four, five,

six, that has to add up to one, otherwise.

The probability of something happening, right,

that the die takes one of the possible values, would not add up to one,

which would violate one of our basic tenets of probability.

So all a PMF does, has to satisfy, is these two rules.

We won't worry too much about these rules.

Instead, we will work with probability mass functions that

are particularly useful, like the binomial one, the canonical one for

flipping a coin, and the Poisson one, the canonical one for modelling counts.

5:30

Here we're using the notation where an upper case letter represents a potentially

unrealized value of the random variable.

So it makes sense to talk about the probability that x equals 0 and

the probability that capital X equals 1.

Where as a lower case x is just a placeholder that we're going to

plug a specific number into.

So in the PMF down here, we have p x equals one half to the x,

one half to the one minus x.

So if we plug in x equal to 0 we get one half, and

if we plug in x equal to 1 we get one half.

And this merely says is that the probability that the random

variable takes the value 0 is one half and

the probability that the random variable takes the value 1 is also one-half.

This is for a fair coin.

But what if we add an unfair coin?

Let's let theta be the probability of a head and 1 minus theta be

the probability of a tail, where theta's some number between 0 and 1.

Then we could write our probability math function like this.

P of x is theta to the x, 1 minus theta to the 1 minus x.

Now, notice if I plug in a lower case x of 1 we get theta, and if I plog in a,

plug in a lower case x of 0 I get 1 minus theta.

Thus for this population distribution the probability a random variable takes

the random 0 is 1 minus theta.

And the probability that it takes the value of 1 is theta.

This is incredibly useful, for example, for modeling the prevalence of something.

For example, if we wanted to model the prevalence of hypertension,

we might assume that the population or the sample that we're getting

is not unlike flips of biased coin with the success probability theta.