0:02

In this section of our lecture,

we present some important descriptive numbers

that measure the variability of the observations around the mean.

Usually to indicate this variability,

we speak about the spread of the observations around the mean.

In particular, in this lecture,

we'll see the range,

the variance and the standard deviation.

We need to keep in mind that variation exists in every context.

There is always a mean value, and the observations lie around that mean,

at distances that are described by the spread.

For example, in finance the variation from the mean represents the risk.

A student studies on average eight hours per day.

This is a mean value.

The student may study some days 10 hours,

some other days six hours, and so forth.

It is interesting to notice that while two datasets could have the same mean,

one set of observations can have a higher degree of dispersion around its mean

compared with the other set.

This means that given two sets of observations,

the individual observations in one set

may deviate more from the mean than the observations in the other set.

For example, in sample A,

we have 1, 2,

1 and 40, and in sample B we have 9,

10, 11 and 14.

If we calculate the mean,

this is for both sets equal to 11.

However, we can notice that the data in the first sample

are further from the mean value than the data in the second sample.

This is why, once we have the mean,

it is also important for us to know a measure of the spread around the mean.
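The comparison between sample A and sample B can be checked with a small Python sketch. The variable names are illustrative, and the absolute distance from the mean is used here only as a quick way to look at the spread, not as one of the measures defined in this lecture.

```python
from statistics import mean

# The two samples from the example above
sample_a = [1, 2, 1, 40]
sample_b = [9, 10, 11, 14]

mean_a = mean(sample_a)
mean_b = mean(sample_b)

# Absolute distance of each observation from its own sample mean
dist_a = [abs(x - mean_a) for x in sample_a]
dist_b = [abs(x - mean_b) for x in sample_b]

print("means:", mean_a, mean_b)
print("distances in A:", dist_a)
print("distances in B:", dist_b)
```

Both means are the same, but the distances in sample A are much larger than those in sample B.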

2:49

The range is the difference between the largest observation and the smallest observation.

The greater the spread of the data from the center of the distribution,

the larger the range.

Since the range takes into account only the largest and the smallest observations,

it might give a distorted picture of the data.

In particular, this is likely to happen if there is an unusual extreme observation.

This is why although the range measures the total spread of the data,

it might not be a satisfactory measure of the variability.
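As a minimal sketch, the range can be computed in Python as follows. The function name `data_range` is hypothetical, and the two lists are sample A and sample B from earlier in the lecture.

```python
def data_range(values):
    """Largest observation minus smallest observation."""
    return max(values) - min(values)

sample_a = [1, 2, 1, 40]    # contains one extreme observation (40)
sample_b = [9, 10, 11, 14]

print(data_range(sample_a))
print(data_range(sample_b))
```

The single value 40 in sample A is enough to make its range much larger than the range of sample B, which illustrates how one extreme observation can dominate this measure.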

3:38

The unusual extreme observations we can have in our data are called outliers.

Basically, the outliers are either very high or very low observations.

The influence of the outliers may distort our final understanding of the data.

One way usually used to avoid

this drawback is to set the data in ascending or descending order.

Afterwards, we discard some of the highest and some of the lowest numbers.

Finally, we find the range of those remaining.
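The procedure just described could be sketched like this. The lecture does not say how many observations to discard, so the parameter `k` (the number dropped at each end) and the function name `trimmed_range` are assumptions made only for illustration.

```python
def trimmed_range(values, k=1):
    """Range of the data after dropping the k lowest and k highest values.

    k is a hypothetical parameter: the lecture only says to discard
    "some" of the highest and lowest numbers.
    """
    ordered = sorted(values)            # set the data in ascending order
    if len(ordered) <= 2 * k:
        raise ValueError("not enough observations left after trimming")
    kept = ordered[k:len(ordered) - k]  # discard both extremes
    return kept[-1] - kept[0]           # range of the remaining values

sample_a = [1, 2, 1, 40]
print(trimmed_range(sample_a, k=1))
```

With one value dropped at each end, the extreme observation 40 no longer affects the result.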

4:49

Although the range measures the spread of the data,

we need a measure that averages

the distance between each of the data values and the mean.

For this purpose, we use the variance and the standard deviation.

Notice that for all datasets,

if we sum up all the differences between each of the data values and the mean,

the sum will always be equal to zero.

This is understandable if we consider that the mean is the center of the data.

5:33

If the data value is below the mean value,

the difference between the data value and the mean would be negative and vice versa.

This is why we square these differences.

Then each observation, both above and below the mean,

would be part of the sum of the squared terms.
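A short sketch, using sample B from earlier in the lecture, shows why the differences are squared. The variable names are illustrative.

```python
sample_b = [9, 10, 11, 14]
x_bar = sum(sample_b) / len(sample_b)   # the mean, 11

# Signed differences from the mean: negative below it, positive above it
deviations = [x - x_bar for x in sample_b]

# Squaring removes the sign, so the terms no longer cancel out
squared = [d ** 2 for d in deviations]

print(deviations, sum(deviations))
print(squared, sum(squared))
```

The signed differences cancel each other out, while the squared differences add up to a positive number that reflects the spread.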

6:02

The variance represents the average of the squared terms.

The population variance is indicated by the Greek letter sigma squared.

The variance is the sum of the squared differences between

each observation and the population mean, divided by

the population size N.

6:32

The sample variance is indicated by the capital letter S squared.

The sample variance is the sum of

the squared differences between each observation and the sample mean,

divided by the sample size minus one.
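Written out as formulas, the two definitions look as follows. The symbols are assumptions, since the transcript does not show the slides: the observations are written as x_i, the population mean as mu, the sample mean as x-bar, the population size as N and the sample size as n.

```latex
\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2
\qquad
S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2
```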

6:54

Notice that for sample data,

the variance is found by dividing the numerator by n minus one and not by n. Why?

Because mathematical statisticians have shown that if the population variance is unknown,

the sample variance is a better estimator of the population variance

if the denominator is given by n minus one.
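One way to see this is to take a tiny population and list every possible sample. The sketch below makes several assumptions: the population is a made-up set of three values, the samples have size two, and they are drawn with replacement. Exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3]   # hypothetical population, not from the lecture
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N   # population variance

def sum_sq_dev(sample):
    """Sum of squared differences between each value and the sample mean."""
    x_bar = Fraction(sum(sample), len(sample))
    return sum((x - x_bar) ** 2 for x in sample)

n = 2
samples = list(product(population, repeat=n))   # all 9 samples of size 2

# Average of the sample variances over every possible sample
avg_div_n_minus_1 = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)
avg_div_n = sum(sum_sq_dev(s) / n for s in samples) / len(samples)

print("population variance:", sigma2)
print("average when dividing by n - 1:", avg_div_n_minus_1)
print("average when dividing by n:", avg_div_n)
```

In this small case, dividing by n minus one gives sample variances that match the population variance on average, while dividing by n gives values that are too small on average.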

7:42

Now, we need to define the other important numerical value.

That is the standard deviation.

The standard deviation is the square root of the variance.

This is why it restores the data to their original measurement unit.

The standard deviation measures the average spread around the mean.

The population standard deviation is the positive square root of the population variance.

Then it is the square root of

the sum of the squared differences between each observation and the population mean,

all divided by the population size N.

The sample standard deviation is the positive square root of the sample variance.

Then it is the square root of the sum of

the squared differences between each observation and the sample mean,

all divided by the sample size minus one.
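The four definitions can be put side by side in a minimal Python sketch. The function names are hypothetical, and the example data is sample B from earlier in the lecture.

```python
import math

def population_variance(values):
    """Sum of squared differences from the mean, divided by N."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n

def sample_variance(values):
    """Sum of squared differences from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are needed")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

def population_std(values):
    """Positive square root of the population variance."""
    return math.sqrt(population_variance(values))

def sample_std(values):
    """Positive square root of the sample variance."""
    return math.sqrt(sample_variance(values))

sample_b = [9, 10, 11, 14]
print(population_variance(sample_b), population_std(sample_b))
print(sample_variance(sample_b), sample_std(sample_b))
```

The only difference between the population and the sample versions is the denominator: N in the first case, n minus one in the second.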

8:59

For example, calculate the standard deviation of the following data:

6, 8, 10, 12, 14, 9, 11, 7, 13, 11.

We need to follow three steps.

Step one, we need to calculate the sample mean.

We sum up 6 plus 8,

plus 10, plus 12,

plus 14, plus 9, plus 11,

plus 7, plus 13, plus 11,

and we divide the sum by the number of observations, which is 10.

This will be equal to 10.1.

Step two, we find the difference between each of the data values and the mean, which is 10.1.

Then we have 6 minus 10.1,

plus 8 minus 10.1,

plus 10 minus 10.1,

plus 12 minus 10.1,

plus 14 minus 10.1,

plus 9 minus 10.1,

plus 11 minus 10.1,

plus 7 minus 10.1,

plus 13 minus 10.1,

plus 11 minus 10.1.

The sum of these differences is equal to zero.
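Steps one and two can be reproduced with a few lines of Python. This is only a sketch with illustrative variable names. Because 10.1 cannot be stored exactly as a floating-point number, the sum of the differences may come out as a tiny number very close to zero rather than exactly zero, so the check uses a tolerance.

```python
import math

data = [6, 8, 10, 12, 14, 9, 11, 7, 13, 11]

# Step one: the sample mean
x_bar = sum(data) / len(data)

# Step two: the difference between each value and the mean
deviations = [x - x_bar for x in data]
total = sum(deviations)

print("mean:", x_bar)
print("sum of differences is zero (up to rounding):",
      math.isclose(total, 0.0, abs_tol=1e-9))
```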

Step three, we need to square each difference and then we

have 6 minus 10.1 to the power of 2,

plus 8 minus 10.1 to the power of 2,

plus 10 minus 10.1 to the power of 2,

plus 12 minus 10.1 to the power of 2,

plus 14 minus 10.1 to the power of 2,

plus 9 minus 10.1 to the power of 2,

plus 11 minus 10.1 to the power of 2,

plus 7 minus 10.1 to the power of 2,

plus 13 minus 10.1 to the power of 2,

plus 11 minus 10.1 to the power of 2,

which is equal to 60.9.

12:00

Then we can calculate the variance.

We do it by dividing the sum of the squared differences by the number of observations minus one,

which is 10 minus 1, that is, 9,

and then we have 60.9 over 9, which is approximately 6.77.

Finally, we can calculate

the standard deviation by taking the square root of the variance.

The square root of 6.77 is approximately 2.6.
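The whole calculation can be checked with a short Python sketch. The variable names are illustrative. The last two lines compare the hand calculation with `statistics.variance` and `statistics.stdev` from the Python standard library, which also divide by n minus one.

```python
import math
import statistics

data = [6, 8, 10, 12, 14, 9, 11, 7, 13, 11]
n = len(data)

x_bar = sum(data) / n                                # sample mean, 10.1
sum_squares = sum((x - x_bar) ** 2 for x in data)    # about 60.9
s2 = sum_squares / (n - 1)                           # sample variance, about 6.77
s = math.sqrt(s2)                                    # standard deviation, about 2.6

print(round(x_bar, 2), round(sum_squares, 2), round(s2, 2), round(s, 2))

# Cross-check against the standard library (both use n - 1)
print(math.isclose(s2, statistics.variance(data)))
print(math.isclose(s, statistics.stdev(data)))
```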