[MUSIC]

Hi guys.

Welcome to the sixth part of Statistics Applied to Biodiversity of the course Biological Diversity: Theories, Measures and Data Sampling Techniques.

Today we will talk about correlation.

When we determine a correlation, we indicate a relationship between two random variables, such that each value of the first variable corresponds, with a certain regularity, to a value of the second.

Correlation cannot be considered a cause-effect relationship, because it is simply the tendency of one factor to vary as a function of another.

Sometimes the variation of a variable depends on the variation of the other.

For example, the relationship between the height and the diameter of a tree.

But in some cases there may be other relations or

intermediate factors between the two variables.

In this case we call them spurious correlations.

It is therefore necessary to pay attention to the problems arising from drawing conclusions from correlations.

The correlation is direct, or positive, when, as one factor changes, the other also varies in the same direction; while it is indirect, or negative, when, as one factor changes, the other varies in the opposite direction.

The degree of correlation between the two variables is expressed by means of

correlation indices.

These take values between -1, when the variables are inversely related, and +1, when there is a positive correlation.

That is when the change of a variable corresponds to a rigidly

dependent variation of the other.

A correlation index of zero indicates an absence of correlation.

Two independent variables have a correlation index equal to zero.

But a correlation index of zero does not necessarily imply that the two variables are independent.

To calculate the correlation coefficient, called r, between two factors or variables, for example diameters, beak length and weight, we need to calculate their means, the sums of the squared deviations, and the products of the paired values of the two variables.

In this case we can calculate the Pearson correlation coefficient.

You will see the formula for calculation in this picture.
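As a minimal sketch, the Pearson coefficient can be computed by hand from the means and the deviations, exactly as described above; the tree heights and diameters below are invented for illustration:

```python
import math

def pearson_r(x, y):
    """Pearson correlation: the sum of cross-products of the deviations,
    divided by the root of the product of the summed squared deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # deviations of each observation from its mean
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den

# hypothetical tree heights (m) and diameters (cm)
heights = [5.0, 7.5, 9.0, 11.0, 14.0]
diameters = [10.0, 14.0, 19.0, 22.0, 30.0]
r = pearson_r(heights, diameters)  # close to +1: strongly positive
```

A perfectly proportional pair of variables gives r exactly equal to 1, the "rigidly dependent" case mentioned above.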

The statistical significance of r can be calculated, in the parametric test, directly from the tables of the probability of the t distribution, simply dividing r by its S.E., that is, the standard error of r.

So you can see that t, in this case, is just r, the coefficient of correlation, divided by the standard error of r.

The value of t may be compared with those in the table of critical t statistics, at n minus 2 degrees of freedom, that is, the number of paired units minus two, since the variables in the correlation are two.

If the computed value exceeds the critical one at the defined level of probability, the correlation is statistically significant.
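A sketch of this significance test, assuming the usual formula SE(r) = sqrt((1 - r^2) / (n - 2)) for the standard error of r; the r and n values are hypothetical:

```python
import math

def t_for_r(r, n):
    """t statistic for a correlation coefficient r computed on n pairs:
    t = r / SE(r), with SE(r) = sqrt((1 - r^2) / (n - 2))."""
    se = math.sqrt((1 - r**2) / (n - 2))
    return r / se

# hypothetical example: r = 0.8 measured on 20 pairs of observations
t = t_for_r(0.8, 20)  # about 5.66
# compare with the tabled critical t at n - 2 = 18 degrees of freedom;
# for a two-tailed test at probability 0.05 that value is about 2.10,
# so this correlation would be statistically significant
```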

The r squared is often called the coefficient of determination. When r is plus or minus one, the correlation, positive or negative respectively, is defined as perfect.

Finally, we must remember that to calculate the correlation and the related coefficient we assume that both sets of data are normally distributed.

This is why we calculate the Pearson correlation coefficient.

Otherwise, it is necessary to proceed to a normalization, using a log transformation or the square root.
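For example, a right-skewed set of counts, common in abundance data, can be transformed before computing the coefficient; the counts here are invented for illustration:

```python
import math

# hypothetical right-skewed abundance counts
counts = [1, 2, 2, 3, 5, 8, 13, 40, 120]

# log transformation; log(x + 1) is a common convention that copes with zeros
log_counts = [math.log(x + 1) for x in counts]

# square-root transformation, often used for count data
sqrt_counts = [math.sqrt(x) for x in counts]
```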

When distributions are bi- or multi-modal, correlation cannot be applied.

So a preliminary histogram of the distribution of the data will clarify whether the observations are more or less symmetrically distributed around one or more values with the largest number of observations.

When we need to test the significance of a trend in time series, when the risk of autocorrelation is high and the data are not normally distributed, it is better to use the Spearman correlation coefficient.

This employs the ranks of the two variables and the squared differences between them, once they are paired.

The formula to calculate the Spearman correlation coefficient is shown in this picture.
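A sketch of the rank-based computation, using the common formula rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d are the differences between paired ranks; note that this simple formula strictly assumes untied ranks, the tie-averaging in the helper is only a convenience:

```python
def ranks(values):
    """Rank values from 1, averaging the ranks of ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    rk = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # group tied values and assign them the mean of their ranks
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            rk[order[k]] = mean_rank
        i = j + 1
    return rk

def spearman_rho(x, y):
    """Spearman correlation from the squared differences of paired ranks:
    rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n**2 - 1))
```

Any monotonically increasing pair of series gives rho = 1, even when the relationship is not linear, which is why the rank-based coefficient is robust to non-normal data.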

So today we saw how correlation works.

But we also need to understand how to use correlation, for instance to calculate regression lines in regression analysis.

And we will see this during the next lecture.

See you.