0:00

[MUSIC]

So we are now at the end of week 3, in fact, the halfway point of the course.

So I'd just like to, perhaps,

summarize some of the key takeaways we've seen from this third week.

So now we've moved beyond the more theoretical

probability considerations to some simple descriptive statistics.

Indeed, the first few sections have been dealing with data reduction and

summarisation.

Namely, if we observe and

collect some data on particular variables, first of all, we need to be

conscious about the levels of measurement of the variables being considered.

Are these categorical variables, are they measurable variables?

Because that in itself is going to affect how we may statistically analyze them.

So the data reduction, data summarisation really took two forms.

One was a sort of visual summary,

whereby we were trying to display the sample distribution.

And again, whether these were categorical or

measurable variables would affect how we may wish to summarize these.

Maybe simple tables, maybe bar charts for categorical variables, or

histograms are a very common kind of graphical display for

those univariate measurable variables.

And we also started to consider how we might extend from the univariate

to the bivariate setting, the prefix bi- meaning two variables.

And we saw an example of a simple scatter plot and discussed correlation,

and how this did not necessarily imply causality.

But nonetheless, a scatter diagram is a useful display for

showing two measurable variables.

1:39

So having done these data visualizations, that is, summarising the data graphically,

we also needed to do this numerically as well.

And we saw a combination of measures of central tendency, mean, median, and mode.

And of those three, I think it's fair to say the mean is the most widely used, but

we noted it was susceptible to being influenced by those pesky outliers.

So perhaps second in the order of importance might be the median, and

finally, the mode as well.

But with all three, we're trying to come up with representative summaries,

i.e. what was the average or central value of our data set?
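
As a small illustration of the three measures of central tendency, here is a minimal sketch using Python's standard `statistics` module. The data values are hypothetical, made up purely for illustration, and the second list simply adds one extreme value to show how an outlier affects the mean far more than the median or mode.

```python
from statistics import mean, median, mode

# Hypothetical data set (made up for illustration, not from the lecture)
data = [2, 3, 3, 4, 5, 6, 7]

# The same data with one extreme value appended
with_outlier = data + [100]

def summarise(values):
    """Return the three measures of central tendency for a list of numbers."""
    return {
        "mean": mean(values),
        "median": median(values),
        "mode": mode(values),
    }

summary = summarise(data)
summary_outlier = summarise(with_outlier)

print(summary)
print(summary_outlier)
```

In this sketch the single outlier moves the mean from roughly 4.3 to 16.25, while the median only shifts from 4 to 4.5 and the mode stays at 3.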

This then extended into looking at various measures of spread and dispersion.

Recognizing that things such as the mean and median,

they provide a great summary in terms of the typical location of the data set.

But they neglect to say how dispersed the values of a variable are.

And we saw, with that simple example of two hypothetical share returns,

the red stock and the black stock,

we needed to go beyond those measures of central tendency to give some sense of

the variability which exists across these variables.

So things such as the sample variance and

the sample standard deviation were offered as useful measures.
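
To make the red stock and black stock idea concrete, here is a rough sketch with two sets of returns that share the same mean but differ in spread. The return figures are assumptions invented for this example, not the values used in the lecture. `statistics.variance` and `statistics.stdev` compute the sample versions, dividing by n - 1.

```python
from statistics import mean, variance, stdev

# Hypothetical percentage returns (invented for illustration)
red = [1.0, 2.0, 3.0, 4.0, 5.0]
black = [-7.0, -2.0, 3.0, 8.0, 13.0]

# Both stocks have the same mean return...
red_mean, black_mean = mean(red), mean(black)

# ...but very different sample variances and standard deviations
red_var, black_var = variance(red), variance(black)
red_sd, black_sd = stdev(red), stdev(black)

print(f"red:   mean={red_mean}, variance={red_var}, sd={red_sd:.3f}")
print(f"black: mean={black_mean}, variance={black_var}, sd={black_sd:.3f}")
```

The means alone would not distinguish these two series, whereas the variance and standard deviation show that the black series is far more dispersed.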

3:02

Now, that was all looking at sample data sets and

working with some sample statistics related to them.

We then sort of revisited part of our week 2 material, whereby we introduced

a further theoretical probability distribution, ie the normal distribution.

And said how useful it is for modeling many real-world phenomena.

And going forward in your statistical studies,

you'll see the normal distribution making frequent appearances

as distributional assumptions in a wide variety of models.

And to come later on in this course,

using something called the central limit theorem, so more on that next week.

But we introduced the normal distribution as a two-parameter family,

distinguishing the different members of the normal family by the mean, mu,

and the variance, sigma squared, of these normal distributions.
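
The two-parameter idea can be sketched with `statistics.NormalDist` from Python's standard library. Note that `NormalDist` is parameterised by the standard deviation sigma rather than the variance sigma squared, and that the parameter values and the helper `prob_within` below are assumptions chosen for illustration.

```python
from statistics import NormalDist

# Two hypothetical members of the normal family (parameter values made up).
# NormalDist takes the standard deviation, i.e. the square root of the variance.
standard = NormalDist(mu=0, sigma=1)    # variance 1
shifted = NormalDist(mu=50, sigma=10)   # variance 100

def prob_within(dist, k):
    """Probability of lying within k standard deviations of the mean."""
    upper = dist.mean + k * dist.stdev
    lower = dist.mean - k * dist.stdev
    return dist.cdf(upper) - dist.cdf(lower)

p_standard = prob_within(standard, 2)
p_shifted = prob_within(shifted, 2)

print(round(p_standard, 4), round(p_shifted, 4))
```

Although the two distributions differ in location and spread, the probability of lying within two standard deviations of the mean is the same for both, at about 0.95.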

We then went back slightly theoretically and

continued some of our week 2 material about how we could work out the variance,

based on a theoretical population distribution.

So this extended our definition of the sample variance,

which we viewed as an average of the squared deviations about the sample mean.

And we saw the equivalent in terms of an expected value.

And we looked at the score of a fair die to work out the variance,

based on that simple probability distribution.
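
The fair die calculation can be reproduced as a short sketch. It uses exact fractions so the results are not blurred by floating-point rounding. The variable names are just illustrative choices.

```python
from fractions import Fraction

# Score of a fair six-sided die: each outcome 1..6 has probability 1/6
outcomes = range(1, 7)
p = Fraction(1, 6)

# Expected value: sum of x * P(X = x)
expected = sum(x * p for x in outcomes)

# Variance: expected squared deviation about the mean
die_variance = sum((x - expected) ** 2 * p for x in outcomes)

print(expected)       # 7/2
print(die_variance)   # 35/12
```

So the expected score is 3.5 and the variance is 35/12, roughly 2.92.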

We then rounded off with a look at how we may

wish to benefit from standardizing our variables.

And noting that when we consider data on a standardized basis,

it's very easy to judge whether or not we have any extreme observations.

Remembering, on a z-score, values lying beyond plus or minus 2, or

indeed, beyond plus or minus 3, we might look at defining as outliers, or

extreme outliers, respectively.
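
A minimal sketch of this standardization rule follows. The data are hypothetical, the helper `classify` is a name invented for this example, and the sketch assumes the sample mean and sample standard deviation are used to compute each z-score.

```python
from statistics import mean, stdev

# Hypothetical observations (made up for illustration); the last one is unusual
data = [48, 50, 51, 49, 52, 50, 47, 53, 50, 70]

x_bar = mean(data)
s = stdev(data)  # sample standard deviation (divides by n - 1)

# Standardize: subtract the mean, then divide by the standard deviation
z_scores = [(x - x_bar) / s for x in data]

def classify(z):
    """Label a z-score using the plus or minus 2 and plus or minus 3 rule of thumb."""
    if abs(z) > 3:
        return "extreme outlier"
    if abs(z) > 2:
        return "outlier"
    return "typical"

labels = [classify(z) for z in z_scores]

for x, z, label in zip(data, z_scores, labels):
    print(f"{x:>3}  z = {z:+.2f}  {label}")
```

Here only the final observation is flagged, since its z-score lies between 2 and 3.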

So looking ahead to week 4,

we are now in a position to start to conduct some formal statistical inference.

And that's going to occupy us for the next couple of weeks of the course.

We begin with issues of sampling from a population, and

how that could take place, as well as matters of point and interval estimation.

That will be week 4, and

then we're going to have a little look at hypothesis testing in week 5.

So bear in mind, there's a cumulative nature to probability and

statistics, and some of the themes we've seen already will be revisited and

seen again in our later work.

So join me for week 4, when we start our look at statistical inference.

[MUSIC]