Now imagine we can do that over, and over, and over, and over again.
Every time I'm going to draw 30 different ones.
Now, we saw with combinations just how many you can come up with choosing 6 from 47.
Now imagine choosing 30 from 10,000.
That would be an enormous number
of different combinations of thirty people you could draw.
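To put a rough size on those counts, here is a minimal sketch using Python's standard `math.comb`. The variable names are just illustrative, and it assumes the order in which people are drawn does not matter.

```python
import math

# Number of ways to choose 6 items from 47 (order does not matter).
lottery = math.comb(47, 6)

# Number of ways to choose 30 people from a population of 10,000.
samples = math.comb(10_000, 30)

print(f"6 from 47:     {lottery:,}")
print(f"30 from 10000: a number with {len(str(samples))} digits")
```

The second count is far too large to enumerate, which is why in practice we only ever see one sample, or simulate a few thousand of them.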
Take 30 people, get their values, calculate a mean.
Now have a look at this.
There's all our 10,000.
In the histogram there, we see the values were between 30 and 40, and we see how many
individuals fell into the different little categories from 30 to 31, 31 to 32, etc.
So a completely random, different spread that could have any value.
But now, what is going to happen if we start plotting all those means?
So I take my 30 individuals, I calculate the mean, I've got a mean on the table.
Send them back, take another 30 at random.
Now you all know by now, there are going to be billions, and billions,
and billions of combinations from those 10,000.
And I could have at random drawn any 50 of them and
that's what happens in our research.
We take 50 people at random.
And that was just a mean for them, or
the difference between the means of two groups or three groups or whatever.
We just could have done so many, 30 different, or
50 different or 100 different.
There would be so many different means we could work out.
But imagine we did that for those 10,000, drawing 30,
drawing 30 and doing the means.
And we start placing them on the table, we start stacking them.
Different means will occur more commonly than others.
And here's the beauty.
No matter how skewed that original data was,
no matter how skewed an individual set of 30 was, if we could
do all the means of all those countless,
countless, countless, countless samples of 30, this is what it's going to look like.
The distribution of those means is going to be
beautifully symmetrically bell-shaped.
And that's why we can use statistics.
And that's why we can use the p-value, the geometrical area under a curve.
Because if we stack up all the means of all the possible samples,
it's going to look like this and that is the central limit theorem.
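The stacking of means described here can be tried out with a small simulation. This is only a sketch under assumptions: the skewed population (a scaled beta distribution between 30 and 40), the seed, and the 5,000 repeated draws are all made up for illustration, since we cannot draw every possible sample of 30.

```python
import random
import statistics

def skewness(values):
    """Population skewness: 0 for a symmetric shape, positive for a right tail."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical skewed population: 10,000 values between 30 and 40.
population = [30 + 10 * rng.betavariate(1, 4) for _ in range(10_000)]

# Draw 30 at random, take the mean, send them back, and repeat many times.
sample_means = [
    statistics.fmean(rng.sample(population, 30)) for _ in range(5_000)
]

print("population mean:      ", round(statistics.fmean(population), 2))
print("mean of sample means: ", round(statistics.fmean(sample_means), 2))
print("population skewness:  ", round(skewness(population), 2))
print("sample-mean skewness: ", round(skewness(sample_means), 2))
```

In a run like this, the skewness of the sample means should come out much closer to zero than the skewness of the population, and the means should pile up around the population mean.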
We can now say that the 30 we found in a study, or, if we read a study in the literature,
the 30 individuals in that study or the 60 individuals in
that study of a difference between two groups, is just one of
the countless possible others, and that's what the statistics does.
It takes those values, uses some of those values, and then draws the curve for you.
The mathematics would take some representation, draw that curve for you.
And your study, or the study that you're reading about,
falls somewhere on a much larger set of possible values.
And if the value that you found in your study
should not have occurred very commonly,
if in actual fact the probability of it was less than 0.05, we report
that as significant, because it would have been rare to find that specific result.
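As a rough illustration of that area-under-the-curve idea, here is a sketch that uses a simple normal approximation, which is not necessarily the test a given study would use. Every number in it (the hypothesised mean, the standard deviation, the observed mean) is hypothetical.

```python
import math
from statistics import NormalDist

# Hypothetical values, chosen only for illustration.
null_mean = 35.0       # mean we would expect if nothing special is going on
population_sd = 3.0    # assumed spread of individual values
n = 30                 # sample size
observed_mean = 36.2   # mean found in our one study

# The bell curve of sample means: centred on null_mean, with spread
# shrinking as the sample size grows (the standard error).
standard_error = population_sd / math.sqrt(n)
sampling_distribution = NormalDist(mu=null_mean, sigma=standard_error)

# Two-sided p-value: area under the curve at least as far from the
# centre as the observed mean, counting both tails.
distance = abs(observed_mean - null_mean)
p_value = 2 * (1 - sampling_distribution.cdf(null_mean + distance))

verdict = "significant" if p_value < 0.05 else "not significant"
print(f"p = {p_value:.4f} ({verdict} at the 0.05 level)")
```

With these made-up numbers the observed mean sits a little over two standard errors from the centre, so the tail area falls below 0.05.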
And there you have it, the central limit theorem.