Hello and welcome to week three of this course,

where we're going to start delving into the world of statistics.

Now, before we get into any kinds of actual statistical analytical techniques,

I'd like us to spend a little bit of time considering

the different types of variables which exist

because different statistical techniques are appropriate and

applicable depending on the nature of the variables we possess.

Now, before we get into the technicalities,

let's consider an analogy.

Imagine you were perhaps a builder or

a decorator and you enter a room and you want to change something about the room.

For example, we have a white wall behind us and imagine we wanted to change the color.

Let's suppose we wanted to paint it red,

blue, green or any color of your choice.

Now, of course, to do this, we're going to need

a particular tool for the job namely a paintbrush,

and if we have the paint then we can paint the wall accordingly.

Of course, maybe we want to do something else,

maybe we'd like to hang a picture on the wall.

So for that we may need a drill or a hammer,

or maybe a nail or a screw and if we can combine

those tools we'll be able to fix a picture to the wall.

So, of course the builder or the decorator will have different tools

in his or her toolbox to achieve different objectives.

And as far as statistical analysis is concerned,

we have a selection of tools in our statistical toolbox.

However, each is appropriate depending on the nature of the variables we possess.

So in this section, we are going to consider the different levels of measurement.

Now, we call this a MOOC,

a Massive Open Online Course, suggesting that there is a massive, i.e. a very large,

number of students out there watching this video and progressing through the course.

Now, each and every one of you will have lots of different characteristics

and these, to give a few examples, represent different levels of measurement.

For example, I'm going to anticipate here we have students studying throughout

the world and hence there'll be many different nationalities of students out there.

Some of you, of course, may have dual nationality or multiple nationalities.

So, nationality is a variable which will vary across different people.

For example, I'm a British citizen; that's what it states

in my passport and that is my nationality.

Now, clearly, different people will possess

different nationalities but this is an example of

a categorical variable because we're not really measuring this in numerical terms,

we're giving a name,

a word to your level of nationality.

For example, some may be British, French,

Russian, Japanese, Chinese, etcetera.

So we have different levels of

nationalities and this is our first example of a categorical variable.

However, there isn't any natural ordering to these nationalities.

True, we may wish to arrange them, perhaps

alphabetically, just for convenience, but that's not to say

a nationality beginning with a letter earlier in the alphabet is

better or worse than a nationality later in the alphabet.

So, though it's categorical there is

no natural ordering and hence we refer to this as a nominal categorical variable.

In contrast, there are other kinds of

categorical variables which do possess some sort of natural ordering.

For example, consider athletes in the Olympic 100 meter final.

First, though, we can already consider an example

of a nominal variable with respect to these athletes.

Next time you watch the Olympic Games,

if you look at the number,

the competitor number which is affixed on a sticker to their vest,

although that's a number, it is again an example of

a nominal categorical variable because the number is

typically just used for identification purposes.

So although it takes a numerical value,

we're not saying someone with a competitor number of 10 is better or worse than

someone with a competitor number of 20 just because 10 is smaller than 20.

It's simply used for identification purposes,

just like your passport number or maybe your driving license number.

So, again that's an example of a nominal variable used for identification purposes.

But now let's suppose these athletes perform the race.

Now, clearly they will finish the race in a particular order.

There'll be the winner coming in first place and getting the gold medal, then the runner-up,

the silver medalist in second place,

the bronze medalist in third place, and everyone else thereafter.

So, if we wish to consider the actual finishing position in that 100 meters race,

first, second, third, fourth, etcetera,

down that order, this is an example yet again of a categorical variable but

now one with an ordinal nature because there is

a natural ordering to those finishing positions.

The person finishing first gets the glory,

gets the gold medal and we can see that he or she

performed better than the person coming in second, third, fourth, etcetera.

So that's an example of an ordinal categorical variable.
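
To make the nominal versus ordinal distinction concrete, here is a minimal Python sketch. The data values are hypothetical and purely for illustration; the point is only that counting categories makes sense for a nominal variable, whereas sorting and picking a middle-ranked value make sense for an ordinal one.

```python
from collections import Counter
from statistics import median_low

# Hypothetical data, invented for illustration (not from the lecture).
nationalities = ["British", "French", "Japanese", "French", "Chinese"]
finishing_positions = [3, 1, 2, 5, 4]  # 1 = first place

# Nominal: counting categories and finding the most common one is meaningful,
# but there is no sense in which one category ranks above another.
nationality_counts = Counter(nationalities)
most_common_nationality = nationality_counts.most_common(1)[0][0]

# Ordinal: the order carries information, so sorting the values and
# taking a middle-ranked value (the median) are both meaningful.
ranked = sorted(finishing_positions)
middle_position = median_low(finishing_positions)

print(most_common_nationality)  # French
print(ranked)                   # [1, 2, 3, 4, 5]
print(middle_position)          # 3
```

Note that sorting the nationalities would run without error, but the result would just be alphabetical order, which says nothing about one nationality being better or worse than another.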

Now, many variables will tend to be measurable in

nature such that we can measure these on

some kind of scale but even these measurable variables can be divided up into two types.

Those measured on an interval scale and those measured on a ratio scale.

So just as categorical variables are split up between the nominal, where there's

no natural ordering, and the ordinal, where there is a natural ordering,

measurable variables can also be

subdivided into these interval and ratio scales.

So let's consider some examples of each.

Well, let's return to that Olympic 100 meter final sprint.

So, imagine you watch this race and suppose you

wanted to come up with some sort of performance measure for each of those athletes.

Maybe you want to score them on a scale from 0 to 10.

Now, we're not necessarily talking in terms

of where they finished, whether first, second or third;

we said that was ordinal in nature. But of

course it might be that someone comes in third but did very

well given that perhaps they are generally a lesser-quality athlete and tend to finish last.

So you may wish to rate the different athletes based on

their performance and where they came in the race relative to their natural abilities.

So, if we were to rate these athletes on a nought to 10 scale,

that would then be an example of an interval level of measurement.

Why? Well, because that nought to 10 scale is an arbitrary choice.

We could easily perform a linear transformation of this,

a positive linear transformation of this,

such that those nought to 10 ratings could easily be rescaled to

be on a nought to 100 scale or maybe on a 100 to 1000 scale.

So in this case, there is no fixed zero point

and this is an example of an interval level of measurement.
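
The rescaling just described can be sketched in a few lines of Python. The function name `rescale` and the three ratings are hypothetical, chosen only for illustration. The sketch shows that a positive linear transformation keeps the ordering and the ratios of differences, but not the ratios of the raw values once the zero point moves.

```python
def rescale(rating, slope, intercept):
    """Apply a positive linear transformation: slope * rating + intercept."""
    if slope <= 0:
        raise ValueError("slope must be positive to preserve the ordering")
    return slope * rating + intercept

# Hypothetical ratings on the nought to 10 scale (illustration only).
ratings = [2, 4, 8]

to_100 = [rescale(r, 10, 0) for r in ratings]     # nought to 100 scale
to_1000 = [rescale(r, 90, 100) for r in ratings]  # 100 to 1000 scale

print(to_100)   # [20, 40, 80]
print(to_1000)  # [280, 460, 820]

# The ratio of two differences is the same on every scale:
# (8 - 4) / (4 - 2) = 2 and (820 - 460) / (460 - 280) = 2.
diff_ratio_original = (ratings[2] - ratings[1]) / (ratings[1] - ratings[0])
diff_ratio_1000 = (to_1000[2] - to_1000[1]) / (to_1000[1] - to_1000[0])

# The ratio of two raw values changes when the zero point shifts:
# 4 / 2 = 2, but 460 / 280 is roughly 1.64.
value_ratio_original = ratings[1] / ratings[0]
value_ratio_1000 = to_1000[1] / to_1000[0]
```

So a rating of 4 is not "twice as good" as a rating of 2 in any sense that survives the change of scale, which is what the absence of a fixed zero point means in practice.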

Another example might be temperature.

Now depending on where you are in the world,

different countries may use different units of measurement for temperature.

Some countries may have a preference for degrees Celsius,

degrees centigrade, others perhaps for degrees Fahrenheit.

Of course, whatever the choice of units of measurement,

how hot or how cold it is on a given day is still

the same temperature regardless of whether you use

degrees Celsius or degrees Fahrenheit.

However, the actual numerical value we

assign in degrees Celsius or degrees Fahrenheit would vary.

For example, 28 degrees Celsius equates to about 82 degrees Fahrenheit.

They are recording the same temperature but clearly we're getting

a different numerical value depending on

the temperature scale we choose to use.

So, if we just stick to degrees Celsius for a moment,

imagine today was 10 degrees Celsius and tomorrow was

20 degrees Celsius and the day after that was 30 degrees Celsius.

So, there was a 10 degree differential between today and tomorrow,

tomorrow and the next day,

so that's a fixed differential across those three days.

However, that's not to say that tomorrow at 20 degrees is

twice as hot as today at 10 degrees, because if we had

a freezing day at zero degrees Celsius, that's not saying there's an absence of

temperature, because clearly we can have negative degrees

Celsius as well, indicating very cold conditions.

So, there's not really a fixed zero point.

So zero degrees Celsius would not mean there is no temperature,

it's a somewhat arbitrary choice of point to indicate what

we might call a cold situation.

Of course in physics,

if you're familiar with that

and are dealing with kelvins, then zero kelvin,

absolute zero, really is as cold as we can get in

the universe, and that would be an example of a ratio level of measurement.
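
The temperature argument can be checked with a short Python sketch. The function names are hypothetical, but the conversion formulas are the standard ones. It shows that the equal 10 degree gaps stay equal in Fahrenheit, while the "twice as hot" ratio does not survive a change of scale.

```python
def celsius_to_fahrenheit(celsius):
    """Standard conversion: F = C * 9/5 + 32."""
    return celsius * 9 / 5 + 32

def celsius_to_kelvin(celsius):
    """Standard conversion: K = C + 273.15."""
    return celsius + 273.15

# The three days from the example, in degrees Celsius.
days_celsius = [10, 20, 30]
days_fahrenheit = [celsius_to_fahrenheit(c) for c in days_celsius]
print(days_fahrenheit)  # [50.0, 68.0, 86.0]

# Equal gaps in Celsius remain equal gaps in Fahrenheit (18 degrees each).
gaps_fahrenheit = [
    days_fahrenheit[1] - days_fahrenheit[0],
    days_fahrenheit[2] - days_fahrenheit[1],
]

# But the ratio of tomorrow to today depends entirely on the scale chosen.
ratio_celsius = days_celsius[1] / days_celsius[0]           # 2.0
ratio_fahrenheit = days_fahrenheit[1] / days_fahrenheit[0]  # 1.36
ratio_kelvin = celsius_to_kelvin(20) / celsius_to_kelvin(10)  # roughly 1.035
```

Only the kelvin ratio, measured from a true zero point, has a physical interpretation; the Celsius and Fahrenheit ratios are artifacts of where each scale happens to place its zero.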

So, let's perhaps consider a few more examples of this ratio level.

Height. Now, there are many of you out there

watching this and I'm sure there will be different heights among you.

There'll be some short people,

some taller people, and people somewhere in between.

Now, of course, we could measure height in different ways in metric units like in meters

or centimeters or maybe in imperial units such as feet and inches.

But suppose we had two individuals,

one was very short at 100 centimeters

and another one was a comparative giant at 200 centimeters,

or in meters terms,

one meter versus two meters.

Now, that two meter person, one could say, is twice as tall as that one meter individual.

So there, with the height in, say, centimeters, a value of

twice the magnitude does reflect that someone is twice as tall as someone else.

Contrast that with 20 degrees Celsius versus 10 degrees Celsius.

Of course, someone with a height of zero centimeters doesn't really exist;

they would have no height whatsoever.
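
As a quick check of the ratio property, here is a small Python sketch using the two heights from the example. The variable names are hypothetical; the only outside fact used is that one inch is defined as exactly 2.54 centimeters.

```python
CM_PER_INCH = 2.54  # one inch is defined as exactly 2.54 centimeters

# The two individuals from the example, in centimeters.
short_cm = 100.0
tall_cm = 200.0

# The same heights expressed in meters and in inches.
short_m, tall_m = short_cm / 100, tall_cm / 100
short_in, tall_in = short_cm / CM_PER_INCH, tall_cm / CM_PER_INCH

# Changing units only multiplies by a constant and leaves zero at zero,
# so the ratio "twice as tall" is the same whichever unit is used.
ratio_cm = tall_cm / short_cm
ratio_m = tall_m / short_m
ratio_in = tall_in / short_in

print(ratio_cm, ratio_m)  # 2.0 2.0
```

This is the key difference from the temperature example: converting Celsius to Fahrenheit adds a constant as well as multiplying, which moves the zero point and breaks the ratio.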

If we go back to the Olympic 100 meter sprint race,

an example of a measurable, ratio level variable there

would be the time it takes to complete that race.

So, if it was an athlete like Usain Bolt,

he would clearly run this at

lightning speed and we could measure the time it takes from

the start to crossing the finishing line and that would be measured in seconds,

maybe to two or three decimal places and the precision we give this time

is simply limited by the precision of the measuring device available.

And of course, given we have those units of measurement of

seconds, if someone were to run this 100 meter race in zero seconds,

that would be a sort of instantaneous run of the race, and it would indicate an absence of the

characteristic, i.e. that it took no time at all to run the race.

So, this session is just to make you acquainted with

the different kinds of variables which exist and do

remember that builder or decorator analogy

whereby there are different tools in the toolbox to achieve different tasks,

painting the wall or maybe hanging a picture on the wall.

Similarly, as statisticians, we have

an arsenal of different statistical techniques in our toolbox.

However, each is designed for different levels of measurement.

So don't think necessarily you can blindly apply

every technique to every kind of dataset you come across.

You need to be very conscious and aware of the levels of measurement of

the variables in your dataset, because

that determines how you can go on to analyze them.