So now, we've concluded week four.

Let's just consider some of the key takeaways from the last few sessions.

So we've started to delve into the world of

statistical inference whereby there is some population of interest,

and we'd like to know something about it.

However, due to the likely size of the population,

it won't be feasible to consider every member of that population.

Hence, we will take a sample and calculate various statistics of interest from

that sample and use these values to infer

the corresponding characteristics in that wider population.

So how do we take a sample from a population?

Well, our key goal is to try and obtain a representative sample.

As we've seen from historic examples,

easier said than done.

But we do have a variety of sampling techniques available

to us with their inevitable merits and limitations.

So at the headline level,

we might wish to distinguish between non-random sampling,

which are non-probability based sampling techniques,

and their random or probabilistic counterparts.

So they will have various pros and cons in terms of time,

money and effort to conduct these different types of

sampling and also the level to which we can apply statistical inference procedures.

So we gave a few examples: convenience, judgment,

quota and snowball sampling on the non-random front, and simple random,

systematic, stratified and cluster sampling on the random front.
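
To make the four random schemes a little more concrete, here is a minimal sketch in Python. The population of 100 member IDs and the four groups are made up purely for illustration, and the same groups are reused as both strata and clusters only to keep the sketch short.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 100 member IDs, each assigned to one of 4 groups.
population = list(range(100))
group_of = {member: member % 4 for member in population}

# Simple random sampling: every subset of size 10 is equally likely.
srs = random.sample(population, 10)

# Systematic sampling: random starting point, then every k-th member.
k = len(population) // 10
start = random.randrange(k)
systematic = population[start::k]

# Stratified sampling: a simple random sample of 3 from within each group.
strata = {}
for member in population:
    strata.setdefault(group_of[member], []).append(member)
stratified = [m for members in strata.values() for m in random.sample(members, 3)]

# Cluster sampling: randomly choose 2 whole groups and take everyone in them.
chosen_clusters = random.sample(sorted(strata), 2)
cluster = [m for c in chosen_clusters for m in strata[c]]

print(srs, systematic, stratified, cluster, sep="\n")
```

In each case chance decides who is selected, which is what allows statistical inference procedures to be applied afterwards.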

We then moved into some, perhaps,

slightly more abstract territory whereby

we introduced the concept of a sampling distribution.

So to appreciate that as we obtain different samples,

we'll get different members of the population within those samples.

And hence, for any statistics we calculate, such as the sample mean x-bar,

which has been our main focus as far as this course is concerned (though of course,

we could relate this to things like

sample variances or sample standard deviations as well),

inevitably, the values of

these descriptive statistics will vary from one sample to another.

So a sampling distribution is simply the probability distribution of some statistic,

recognizing the variation which typically occurs within these statistics.
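
As a rough illustration of that definition, the sketch below builds a sampling distribution of x-bar from scratch by listing every possible sample. The five-value population and the sample size of 2 are assumptions chosen only to keep the enumeration small. They are not the example used in the session.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

population = [1, 2, 3, 4, 5]  # hypothetical tiny population
n = 2                         # hypothetical sample size

# Every possible sample of size n drawn without replacement, all equally likely.
samples = list(combinations(population, n))

# The sample mean x-bar of each possible sample, kept exact with fractions.
means = [Fraction(sum(s), n) for s in samples]

# Sampling distribution: each distinct value of x-bar with its probability.
counts = Counter(means)
sampling_distribution = {
    m: Fraction(c, len(samples)) for m, c in sorted(counts.items())
}

mu = Fraction(sum(population), len(population))  # population mean
expected_xbar = sum(m * p for m, p in sampling_distribution.items())

for m, p in sampling_distribution.items():
    print(f"x-bar = {float(m):.1f}  probability = {p}")
print("population mean:", mu, " expectation of x-bar:", expected_xbar)
```

Individual samples give different values of x-bar, but weighting each value by its probability recovers the population mean.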

So of primary interest to us, as we said,

was one particular parameter,

the mean of a population, mu, i.e.,

that main measure of central tendency,

that measure of location.

And we used that simple descriptive statistic x-bar,

the sample mean, as our estimator of mu.

So we looked at a very simple example of deriving a sampling distribution from scratch,

and then we considered the more generic case whereby there was

an assumption of a normally distributed population,

and we considered that theoretical sampling distribution of x-bar there.
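
For a normally distributed population, the standard result is that x-bar is itself normally distributed, with mean mu and standard deviation sigma divided by the square root of n. A quick simulation is one way to check this. The values of mu, sigma, n and the number of repeated samples below are assumptions picked for illustration only.

```python
import random
from math import sqrt
from statistics import fmean, stdev

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population parameters and sample size.
mu, sigma, n = 100.0, 15.0, 25
num_samples = 20_000  # how many repeated samples to draw

# Draw many samples of size n from N(mu, sigma^2) and record each sample mean.
xbars = [
    fmean(random.gauss(mu, sigma) for _ in range(n))
    for _ in range(num_samples)
]

mean_of_xbars = fmean(xbars)       # should be close to mu
sd_of_xbars = stdev(xbars)         # should be close to sigma / sqrt(n)
theoretical_se = sigma / sqrt(n)   # standard error of x-bar

print(f"mean of x-bars: {mean_of_xbars:.3f}  (mu = {mu})")
print(f"sd of x-bars:   {sd_of_xbars:.3f}  (sigma/sqrt(n) = {theoretical_se})")
```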

So having derived the sampling distribution,

I invited you to consider any observed value of x-bar to

be viewed as a random drawing from said sampling distribution.

We know that on average,

the expectation of x-bar is equal to mu,

which simply means that in the long run, i.e.,

on average, the point estimate given by

the sample mean is correct in estimating the population mean, but only on average.

There's inevitable uncertainty in any point

estimates we derive due to the potential that,

by chance, our random sample does happen to

deviate a bit from the characteristics of the population.

So we rounded off week four with

a brief discussion of how we could convert a point estimate,

the sample mean x-bar, into an interval estimate, i.e., a confidence interval.

We briefly examined the margin of error associated with

a confidence interval for a mean and we considered

the three factors which affect the width of the interval,

some of which are under our control, others not. Firstly,

the level of confidence and hence

the z-value in the formula we saw, which is within our control.

Other things equal, we prefer to be more confident than less, though a higher confidence level gives a wider interval.

Secondly, the sample size, which is also within our control.

And other things equal, the larger the sample size,

the more precise any estimate is likely to be and hence,

the narrower, the less wide the confidence interval would be.

And thirdly, the level of variation which exists in the population or, when

unknown, proxied by the variation within our sample. That is not within our control,

but other things equal, the more heterogeneous the population,

i.e., the greater the variation which exists in the population,

the more uncertainty there is in any estimation,

and hence, the wider the confidence intervals tend to be.
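
The effect of these three factors on the width can be sketched numerically. The code below assumes the usual z-based interval for a mean, x-bar plus or minus z times sigma over the square root of n, with sigma treated as known. The function names and all the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def confidence_interval(xbar, sigma, n, confidence=0.95):
    """z-based confidence interval for a mean, assuming sigma is known."""
    # z-value cutting off (1 - confidence) / 2 in each tail of N(0, 1).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)  # margin of error
    return xbar - margin, xbar + margin


def width(interval):
    lower, upper = interval
    return upper - lower


# Baseline with made-up values, then change one factor at a time.
base = confidence_interval(xbar=50, sigma=10, n=100)
more_confident = confidence_interval(50, 10, 100, confidence=0.99)
larger_sample = confidence_interval(50, 10, 400)
more_variation = confidence_interval(50, 20, 100)

for label, ci in [
    ("baseline (95%, n=100, sigma=10)", base),
    ("99% confidence", more_confident),
    ("n = 400", larger_sample),
    ("sigma = 20", more_variation),
]:
    print(f"{label}: width = {width(ci):.2f}")
```

Raising the confidence level or the variation widens the interval, while quadrupling the sample size halves it, because the margin of error depends on the square root of n.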

So armed now with these concepts of sampling distributions and confidence intervals,

we now look ahead to week five where we consider

another major branch of statistical inference, namely hypothesis testing.