[MUSIC]

Hi guys.

Welcome to the 24th lecture of the course, Biological Diversity, Theories, Measures,

and Data Sampling Techniques.

Today we will talk about statistics applied to biodiversity;

this is the first part.

During this course, we have talked a lot about indices and measures of biodiversity, and

we have mentioned some statistical concepts and

tests that can be applied to the study of biological diversity.

Today, I will show you some of them, and how they can be used.

So let's start from some basic concepts.

What are mean, median, and mode in statistics?

When we describe quantitative data, it is often necessary to perform some

fundamental statistical analyses: to summarize a set of measurements,

we need to analyze its central tendency and its distribution.

The arithmetic mean is the type of mean most commonly used for

the analysis of central tendency, and

it is what the word mean usually refers to in statistics.

It is used as a summary of a data set or of a measurable phenomenon,

for example, the height of the trees.

This mean is calculated simply by adding the available values xi

and dividing the sum by their total number n.

The formula is simply the sum from i = 1 to n of xi, divided by n.
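
As a minimal sketch of the formula just described, here it is in Python. The tree heights are made-up values for illustration, not data from the lecture, and the function name arithmetic_mean is our own choice.

```python
from statistics import mean


def arithmetic_mean(values):
    """Sum of the values x_i divided by their total number n."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    return sum(values) / n


# Hypothetical tree heights in metres (illustrative only).
heights = [12.5, 9.0, 10.0, 12.5, 11.0]

print(arithmetic_mean(heights))
# Cross-check against the standard library implementation.
print(mean(heights))
```

The hand-written function and statistics.mean should agree on any non-empty list of numbers.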

When it is necessary to calculate the mean from data grouped

in a frequency table, where f is the frequency of each class,

the mean may be calculated with the formula: the sum of f times x, divided by n,

where x represents the value of each frequency class and n is the total number of observations, for

example, 5 times a height of 12.5, 3 times a height of 9, etc.
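
The frequency-table version of the mean can be sketched in the same way. The two classes below reuse the lecture's example (a height of 12.5 observed 5 times, a height of 9 observed 3 times). The function name grouped_mean and the (value, frequency) layout are assumptions made for this sketch.

```python
def grouped_mean(freq_table):
    """Mean from (value, frequency) pairs: sum(f * x) / n, with n = sum(f)."""
    n = sum(f for _, f in freq_table)
    if n == 0:
        raise ValueError("the table contains no observations")
    return sum(f * x for x, f in freq_table) / n


# (height, frequency) pairs taken from the spoken example.
table = [(12.5, 5), (9.0, 3)]

print(grouped_mean(table))

# The same result is obtained by expanding the table into raw observations.
expanded = [x for x, f in table for _ in range(f)]
print(sum(expanded) / len(expanded))
```

Here n is the total number of observations, that is, the sum of the frequencies, not the number of classes.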

The median is the value assumed by the statistical units that are located

in the middle of the distribution.

The median, in fact, is the central value of a series.

So it can be calculated by arranging the data in ascending order and

identifying the value above and below which there is an equal number of data.

When the number of data is even, the median is given

by the average of the two values that form the central pair.
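
The procedure for the median (sort, then take the central value or the average of the central pair) can be sketched as follows. The name median_value and the sample numbers are illustrative assumptions.

```python
from statistics import median


def median_value(values):
    """Central value of the sorted data; average of the central pair if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one value")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_sample = [9.0, 12.5, 10.0]          # hypothetical heights, odd count
even_sample = [9.0, 12.5, 10.0, 11.0]   # hypothetical heights, even count

print(median_value(odd_sample), median_value(even_sample))
# Cross-check against the standard library.
print(median(odd_sample), median(even_sample))
```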

The mode, instead, is the value with the maximum frequency, and

is often represented with the symbol v0.

In other words, it is the value that appears most frequently.

A distribution is unimodal if it admits only one modal value.

It is bimodal if it admits two,

that is, if there are two values

that both appear with the maximum frequency in the data distribution.
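
A small sketch of how the modal values could be found, and how a data set could be labelled unimodal or bimodal. The function names and the example counts are assumptions made for illustration.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the maximum frequency, sorted."""
    counts = Counter(values)
    if not counts:
        raise ValueError("need at least one value")
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


def modality(values):
    """Label the data by the number of modal values it admits."""
    k = len(modes(values))
    return {1: "unimodal", 2: "bimodal"}.get(k, "multimodal")


one_peak = [1, 2, 2, 3]       # 2 appears most often
two_peaks = [1, 1, 2, 3, 3]   # 1 and 3 tie for the maximum frequency

print(modes(one_peak), modality(one_peak))
print(modes(two_peaks), modality(two_peaks))
```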

Graphs are useful to determine the modal class,

identifying the interval of maximum height, which is the maximum point of the curve.

The class with the highest mean density,

which corresponds to the highest bar of the histogram, is the modal class.
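
For grouped data, the modal class can be picked out by comparing frequency densities (frequency divided by class width), which is what the bar heights of a histogram show. This is a sketch under that assumption; the class limits and counts are hypothetical.

```python
def frequency_density(cls):
    """Frequency per unit of class width for a (lower, upper, frequency) class."""
    lower, upper, freq = cls
    if upper <= lower:
        raise ValueError("class width must be positive")
    return freq / (upper - lower)


def modal_class(classes):
    """Return the class whose histogram bar is the highest."""
    return max(classes, key=frequency_density)


# Hypothetical height classes in metres: (lower limit, upper limit, frequency).
classes = [(0, 5, 4), (5, 10, 9), (10, 20, 12)]

print(modal_class(classes))
```

Note that the widest class has the largest raw count here, but it is not the modal class, because its density is lower.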

In the particular case of the normal distribution, also called Gaussian

distribution, the mode coincides with the mean and the median.

The normal distribution curve is used to analyze the distribution of data,

and to describe it we can use the probability density function.

It represents the density of a continuous random variable and describes the relative

probability that the variable falls within a range of values.

The normal distribution is considered the prime example of a continuous probability

distribution because of its role in the central limit theorem.

The normal or Gaussian distribution is a continuous probability distribution

which is often used as a first approximation to describe real-valued random

variables that tend to concentrate around a single mean value.

The graph of the probability density function associated with a normal

distribution is symmetric and has a bell shape, known as the normal curve or bell curve.
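
To make the bell curve concrete, here is a minimal sketch of the normal probability density function, written with a mean mu and a standard deviation sigma. The parameter values used in the example are hypothetical.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


# Hypothetical distribution of tree heights: mean 10 m, standard deviation 2 m.
mu, sigma = 10.0, 2.0

# Equal distances on either side of the mean give the same density (symmetry).
print(normal_pdf(8.0, mu, sigma), normal_pdf(12.0, mu, sigma))
# The density is highest at the mean.
print(normal_pdf(mu, mu, sigma))
```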

The normal distribution depends on two parameters,

the mean and the variance, and is indicated in the following way.

The normal curve is symmetrical, and its axis of symmetry passes exactly through

the central point of the curve, which

corresponds to the mean, the median, and the mode of the data.

On each side of the curve, there is an inflection point.

The distance of the point of inflection from the central axis,

the mean, can be used as a unit of standard deviation or distance.
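
The position of the inflection points, one standard deviation away from the mean, can be checked numerically. This sketch uses a finite-difference estimate of the second derivative, with hypothetical parameters: the curvature should be negative just inside one standard deviation and positive just outside it.

```python
from statistics import NormalDist


def second_derivative(f, x, h=1e-3):
    """Central finite-difference estimate of the second derivative of f at x."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)


# Hypothetical normal distribution: mean 10, standard deviation 2.
dist = NormalDist(mu=10.0, sigma=2.0)

# Curvature slightly inside and slightly outside mean + 1 standard deviation.
inside = second_derivative(dist.pdf, dist.mean + 0.9 * dist.stdev)
outside = second_derivative(dist.pdf, dist.mean + 1.1 * dist.stdev)

print(inside < 0, outside > 0)
```

The sign change between the two points is what marks the inflection.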

As can be seen by observing the figure,

each segment represents one unit of standard deviation.

And if we consider that the total area below the curve represents

100% of the data, the areas delimited by the segments of ± 1, ± 2, and ± 3 SD, or

standard deviations, correspond respectively to about 68%,

95.45%, and 99.73%.
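
The share of the area within a given number of standard deviations can be computed from the cumulative distribution function. This is a small sketch using the standard normal distribution from Python's statistics module; the function name share_within is our own. The results should be roughly 68%, 95%, and 99.7%.

```python
from statistics import NormalDist


def share_within(k):
    """Fraction of the area under the normal curve within +/- k standard deviations."""
    standard = NormalDist()  # mean 0, standard deviation 1
    return standard.cdf(k) - standard.cdf(-k)


for k in (1, 2, 3):
    print(k, round(100 * share_within(k), 2))
```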

Since 95% of all observations will be within the range bounded by ± 1.96 SD,

and 99% of all observations will be within the range bounded

by ± 2.58 SD, the probability that any observation

extracted at random from a sample falls outside the two intervals

delimited by these ranges is respectively p = 0.05 or p = 0.01.
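
The link between these two ranges and the p values can be sketched as the area left in both tails of the standard normal curve. The function name outside_probability is an assumption for this example.

```python
from statistics import NormalDist


def outside_probability(z):
    """Probability that an observation falls outside +/- z standard deviations."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


print(round(outside_probability(1.96), 3))
print(round(outside_probability(2.58), 3))
```

The two printed values should be close to 0.05 and 0.01 respectively.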

These values are the basis for the evaluation of statistical significance,

and they are what is usually reported in any scientific paper.

This defines the probability distribution of z.

It is usual, in fact,

to determine the statistical significance of a comparison test between the means or

the medians of two samples by defining a null hypothesis, H0.

This simply states that the samples belong to the same population, and

that their differences are due to chance rather than to a systematic effect of sampling.

The alternative hypothesis, H1, instead represents the possibility

that the samples belong to different populations.

If other variables, such as, for example, the differences between male and

female individuals, are introduced, providing two directions in which to

test the null hypothesis, the test is called a two-tailed test.

If instead you just want to see whether the null hypothesis is confirmed, or if the test is

directed toward a single alternative, H1, it is called a one-tailed test.

In each statistical test, the level of significance is measured by p. In a one-tailed test,

p = 0.05 corresponds to the value z = 1.65.

That is less stringent than a two-tailed test, for

which z is 1.96. Stricter levels are p = 0.01 and

p = 0.001; together, these levels represent 5%, 1%, and

0.1% of the values under the normal curve.

So there are three levels of significance: significant, very significant,

and highly significant.
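
The critical z values for each significance level can be derived from the inverse of the cumulative distribution function. This is a sketch under the assumption of a standard normal test statistic; the function name critical_z and its tails parameter are our own.

```python
from statistics import NormalDist


def critical_z(alpha, tails=2):
    """Critical z for significance level alpha in a one- or two-tailed test."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    # A two-tailed test splits alpha equally between the two tails.
    return NormalDist().inv_cdf(1 - alpha / tails)


for alpha in (0.05, 0.01, 0.001):
    print(alpha, round(critical_z(alpha, tails=1), 3), round(critical_z(alpha, tails=2), 3))
```

For every level, the one-tailed threshold is lower than the two-tailed one, which is why the one-tailed test is the less stringent of the two.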

In a statistical test, a value of z higher than 1.65 in one tail or

1.96 in two tails is statistically significant, and

allows rejecting the null hypothesis that states, for example,

that the two samples come from the same population.

We can say that the difference of the mean values between the samples is

statistically significant where p is less than 0.05, very significant where

p is less than 0.01, or highly significant where p is less than 0.001.

And therefore, it is possible to reject the null hypothesis and

to accept the alternative hypothesis H1.
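
Putting the pieces together, a comparison of two sample means could be sketched as a simple two-sample z test. This is an illustration under stated assumptions, not the lecture's own procedure: the data are toy values, all function names are ours, and a z test is only appropriate for large samples (with samples this small, a t test would normally be preferred).

```python
import math
from statistics import NormalDist, mean, variance


def two_sample_z(sample_a, sample_b):
    """z statistic for the difference between two sample means."""
    se = math.sqrt(variance(sample_a) / len(sample_a)
                   + variance(sample_b) / len(sample_b))
    return (mean(sample_a) - mean(sample_b)) / se


def two_tailed_p(z):
    """Two-tailed p value for a standard normal test statistic."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def significance_label(p):
    """Map a p value onto the three levels of significance."""
    if p < 0.001:
        return "highly significant"
    if p < 0.01:
        return "very significant"
    if p < 0.05:
        return "significant"
    return "not significant"


# Toy tree heights (metres) from two hypothetical sites.
site_a = [10.0, 12.0, 11.0, 13.0, 14.0]
site_b = [15.0, 17.0, 16.0, 18.0, 19.0]

z = two_sample_z(site_a, site_b)
p = two_tailed_p(z)
print(z, p, significance_label(p))
```

If the label is anything other than "not significant", the null hypothesis that the two samples come from the same population would be rejected at the corresponding level.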

Since the one-tailed test is less stringent, unless there are specific reasons,

it is recommended to always use a two-tailed test, remembering that significance in

the two-tailed test automatically means significance in the one-tailed test in the observed direction.
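
The relationship between the two kinds of test can be seen by comparing their p values for the same z. In this sketch, which assumes a standard normal statistic and a one-tailed test in the direction of the observed (positive) difference, the one-tailed p is half the two-tailed p.

```python
from statistics import NormalDist


def one_tailed_p(z):
    """Upper-tail p value: probability of a value at least as large as z."""
    return 1 - NormalDist().cdf(z)


def two_tailed_p(z):
    """Two-tailed p value: probability of a value at least as extreme as z."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


# z = 1.8 passes the one-tailed threshold of 0.05 but not the two-tailed one.
print(one_tailed_p(1.8) < 0.05, two_tailed_p(1.8) < 0.05)
# z = 2.0 passes both.
print(one_tailed_p(2.0) < 0.05, two_tailed_p(2.0) < 0.05)
```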

So that's all for the first part of statistics applied to the study of

biological diversity, and see you in the second part.