0:02

Just like biology has a central dogma, statistics, and in particular statistical inference, has a central dogma as well. The central dogma is the central idea that explains what you're trying to do in the field. The central dogma of statistics has to do with a specific problem.

Suppose you have a huge population, like the one in the top left-hand corner, and you want to know something about that population. In this idealized example, we might want to know how many pink and how many gray symbols there are.

In general, the problem is that measuring the whole population might be really expensive, or very hard to do for a number of different reasons. So what we want to do is take advantage of probability to say something about the population without measuring the whole population.

So we use probability to take a small sample from that population. You may have heard of a randomized sample; there are a number of different ways to use probability to get this sample, but the idea is that the sample should somehow represent the larger population.
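A minimal sketch of the sampling step, assuming a simple random sample (every set of n symbols is equally likely to be drawn) from an idealized population of pink and gray symbols. The population sizes (6,000 pink, 4,000 gray) and the helper names are hypothetical, chosen only for illustration; simple random sampling is just one of the probability-based designs mentioned above.

```python
import random

def make_population(n_pink, n_gray):
    """Build an idealized population of pink and gray symbols (sizes are hypothetical)."""
    return ["pink"] * n_pink + ["gray"] * n_gray

def simple_random_sample(population, n, seed=0):
    """Draw n symbols without replacement, so every set of n symbols is equally likely."""
    rng = random.Random(seed)  # seeded so the example is reproducible
    return rng.sample(population, n)

# Hypothetical population: 6,000 pink and 4,000 gray symbols.
population = make_population(n_pink=6_000, n_gray=4_000)

# Measure only three symbols, as in the example.
sample = simple_random_sample(population, n=3, seed=1)
print(sample)
```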

Once we've taken that sample, we can make measurements on the smaller number of objects we've collected. There are only three symbols in the lower right-hand corner, so it might be relatively cheap, or relatively easy, to take measurements on them.

We see that there are two pink symbols and one gray symbol, and then we use statistical inference to make a guess about what the population looks like. We might say that, on average, there are going to be more pink symbols than gray symbols in the whole population, because that's what happened in our sample. If we did the probability sampling right, that best guess might be pretty good.
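The inference step in the pink-and-gray example can be sketched as a plug-in estimate: the share of each color in the sample is used as the best guess for its share in the population. This is a minimal illustration with a hypothetical helper name, not a complete inference procedure.

```python
from collections import Counter

def estimate_shares(sample):
    """Plug-in estimate: each color's share in the sample is the guess for the population."""
    counts = Counter(sample)
    n = len(sample)
    return {color: count / n for color, count in counts.items()}

observed = ["pink", "pink", "gray"]  # the three symbols measured in the example
best_guess = estimate_shares(observed)
print(best_guess)  # pink share 2/3, gray share 1/3
```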

Another important component of the central dogma of statistics is that this best guess isn't quite enough. We took a sample; we didn't actually measure everything in the population, only a subset. So it turns out that our best guess is potentially variable. It could be off in one direction, and we might actually have more gray symbols in the population. Or it could be off in the other direction, and there might be even more pink symbols in the population than we guessed.

2:04

So the question is, how do we quantify that variability? We took this sample; how do we figure out what's actually going on in the population? That's the central idea behind statistical inference.
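One common way to quantify that variability, sketched here under simplifying assumptions, is to look at how much the sample proportion moves around over many repeated samples and to compare that spread with the textbook standard error of a proportion, sqrt(p(1 - p) / n). The sketch assumes each draw is independently pink with a hypothetical probability of 0.6 (a very large population, so drawing without replacement barely matters); the function names are illustrative.

```python
import math
import random
import statistics

def sample_proportion(p_pink, n, rng):
    """Share of pink symbols in one sample of size n; each draw is pink with prob p_pink."""
    return sum(rng.random() < p_pink for _ in range(n)) / n

def simulate_estimates(p_pink, n, reps, seed=0):
    """Repeat the sampling many times to see how much the best guess moves around."""
    rng = random.Random(seed)
    return [sample_proportion(p_pink, n, rng) for _ in range(reps)]

def standard_error(p_hat, n):
    """Standard error of a sample proportion: sqrt(p(1 - p) / n)."""
    return math.sqrt(p_hat * (1 - p_hat) / n)

p_true = 0.6  # hypothetical population share of pink symbols
for n in (3, 30, 300):
    estimates = simulate_estimates(p_true, n, reps=5_000)
    spread = statistics.pstdev(estimates)
    print(f"n={n:>3}: simulated spread {spread:.3f}, formula {standard_error(p_true, n):.3f}")

# The observed sample from the example: 2 pink out of 3.
print(f"observed: p_hat=0.67, standard error {standard_error(2 / 3, 3):.2f}")
```

For the three-symbol sample, the formula gives a standard error of about 0.27 around an estimated pink share of 2/3, so the best guess alone says little about the population; larger samples shrink the spread roughly in proportion to 1/sqrt(n).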

Knowing the population is also really important; it is perhaps one of the most fundamental ideas in statistics, and it's central to the central dogma.

In the same example, suppose we have a population that consists of pink and gray symbols, and we take a sample from that population. Then it turns out that, between the time we took that sample and the time we actually want to do the inference, the population changes: all of a sudden, some purple symbols have been introduced.

2:42

Now, if we want to do that same inference, we end up in trouble, because the sample no longer represents the population. This is actually a very common problem: knowing what the population is remains one of the most underappreciated issues in statistical inference.
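A small simulation, with hypothetical population sizes, shows why a changed population breaks the inference: a sample drawn before the purple symbols appeared cannot estimate their share, and its pink and gray shares are off as well. Unlike sampling variability, this is not fixed by taking a bigger sample from the old population.

```python
import random
from collections import Counter

def shares(symbols):
    """Fraction of each color in a list of symbols (0 for colors that never appear)."""
    counts = Counter(symbols)
    return {color: counts[color] / len(symbols) for color in ("pink", "gray", "purple")}

rng = random.Random(42)

# Hypothetical population at sampling time: only pink and gray symbols.
old_population = ["pink"] * 6_000 + ["gray"] * 4_000
sample = rng.sample(old_population, 500)

# Hypothetical population at inference time: purple symbols have appeared.
new_population = ["pink"] * 4_000 + ["gray"] * 3_000 + ["purple"] * 3_000

estimate = shares(sample)
truth_now = shares(new_population)
for color in ("pink", "gray", "purple"):
    print(f"{color:>6}: sample says {estimate[color]:.2f}, population now {truth_now[color]:.2f}")
```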

Here's an example of that. You may have heard about Google Flu Trends, which tries to use search terms to predict flu activity in the United States and in other places. It got a lot of press because it's a cool and very efficient way of trying to predict the flu: you just need the search terms, and you can create the prediction. But it turns out that Google Flu Trends, despite being very good when it was first released, ended up being pretty bad at predicting when flu outbreaks would occur. The reason was that the population changed: the way people searched for symptoms of the flu changed over time. That was one of the major reasons the prediction algorithm they originally developed no longer worked.

So the central idea of statistical inference, and the central dogma of statistics, is this: we have a population, we use probability to take a smaller sample from that population, and then we use statistical inference to say something about the population, and in particular about the variability of our estimate.
