The residual analysis is a vital part of any statistical method.

In the previous two videos, you learned when and how to perform an ANOVA analysis.

The p-value we use in the main analysis is only valid if the assumptions are satisfied.

In this video, you will learn how to validate these assumptions using a residual analysis.

Remember that we were wondering whether the moisture content in coffee beans differs between the four machines they can be produced on.

Moisture was our numerical Y variable, and machine was our categorical X variable.

We performed an ANOVA, and these were the results we obtained.

Our ANOVA gave us a p-value below 0.05, which shows a statistically significant difference between the average moisture percentages of the machines.

This difference in the means can also be seen in the individual value plot, as the line connecting the means is not horizontal.

Let's take a look at the R-squared.

It shows that the influence factor machine explains 26% of the variation in the moisture percentage.
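To make the p-value and R-squared feel less like a black box, here is a minimal sketch of the one-way ANOVA arithmetic in plain Python (standard library only). It computes the F statistic and R-squared = SS_between / SS_total from data grouped by machine. The function name `one_way_anova` and the sample numbers are hypothetical and made up for illustration; they are not the course data. Turning F into a p-value needs the F distribution with (k − 1, N − k) degrees of freedom; if SciPy is available, `scipy.stats.f_oneway` returns both F and the p-value.

```python
from statistics import fmean


def one_way_anova(groups):
    """Return (F statistic, R-squared) for a one-way ANOVA.

    groups maps each level of the categorical X (for example a machine)
    to the list of numerical Y observations (for example moisture %).
    """
    all_values = [y for ys in groups.values() for y in ys]
    grand_mean = fmean(all_values)
    # Between-group sum of squares: variation explained by the factor
    ss_between = sum(len(ys) * (fmean(ys) - grand_mean) ** 2
                     for ys in groups.values())
    # Within-group sum of squares: the sum of the squared residuals
    ss_within = sum((y - fmean(ys)) ** 2
                    for ys in groups.values() for y in ys)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    # R-squared: share of the total variation explained by the factor
    r_squared = ss_between / (ss_between + ss_within)
    return f_stat, r_squared


# Hypothetical moisture percentages, made up for illustration only
data = {
    "Machine 1": [11.2, 11.8, 12.1, 11.5],
    "Machine 2": [12.0, 12.6, 12.3, 12.9],
    "Machine 3": [11.0, 11.4, 11.9, 11.1],
    "Machine 4": [12.2, 11.7, 12.5, 12.0],
}
f_stat, r_squared = one_way_anova(data)
print(f"F = {f_stat:.2f}, R-squared = {r_squared:.1%}")
```

The same within-group sum of squares reappears in the residual analysis: it is exactly the sum of the squared residuals.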

However, before we can fully trust these conclusions, we have to validate the assumptions underlying the ANOVA.

These checks are called the residual analysis, and this is the third and final step of your ANOVA.

As you probably remember, ANOVA consists of three steps in total.

To validate the assumptions, we will check whether the residuals are normally distributed and whether any outliers or other irregularities are present.

But what is a residual?

Let's take a look at the data to answer this question.

Every dot in the graph is one measurement.

We also know the value that we would expect from a measurement for Machine 1.

That is the estimated mean.

So there is a difference between the measurement and our expectation.

This difference is not explained by our influence factor machine.

It is leftover variation, and this difference is called the residual.

The residuals are calculated by subtracting the expected value from each observation.

In the case of ANOVA, this expected value is the mean output of the relevant machine.

This is our data in time order.

Our categorical variable has four different groups, and the red lines are the group means.

The residuals then look like this, with the mean of the residuals equal to zero by construction.
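The residual calculation is simple enough to write out. The sketch below assumes the stacked layout (one response list and one parallel factor list, in time order) and subtracts each observation's own group mean. The name `anova_residuals` and the example values are hypothetical, for illustration only.

```python
from statistics import fmean


def anova_residuals(response, factor):
    """Residual of each observation: observed value minus its group mean.

    response and factor are parallel lists in time order, i.e. the
    stacked layout (one Moisture column, one Machine column).
    """
    # Gather observations per group to estimate each group mean
    by_group = {}
    for y, g in zip(response, factor):
        by_group.setdefault(g, []).append(y)
    group_means = {g: fmean(ys) for g, ys in by_group.items()}
    # Expected value under ANOVA = the mean of the observation's own group
    return [y - group_means[g] for y, g in zip(response, factor)]


moisture = [11.2, 12.0, 11.8, 12.6, 11.5, 12.3]  # hypothetical values
machine = ["M1", "M2", "M1", "M2", "M1", "M2"]
res = anova_residuals(moisture, machine)
print([round(r, 2) for r in res])
# Zero by construction, up to floating-point rounding
assert abs(fmean(res)) < 1e-9
```

Because each group's residuals are deviations from that group's own mean, they sum to zero within every group, and therefore also overall.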

Okay, let's go back to our moisture example and perform a residual analysis with Minitab.

Now pause the video and load your data into Minitab before continuing.

Once you have loaded your data into Minitab, this is what your data file should look like.

You have Machine 1 in the first column, followed by Machine 2, Machine 3, and Machine 4.

Note that I have already stacked my data into two columns, Moisture and Machine.
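Stacking turns one column per machine into two columns: every value goes into a single response column, and its column name goes into a factor column. The sketch below shows that transformation in plain Python; `stack_columns` is a hypothetical helper, and representing empty cells as `None` is an assumption. Inside Minitab you would use its own stacking command instead.

```python
def stack_columns(columns):
    """Stack unstacked columns into parallel (response, factor) lists.

    columns maps a column name (e.g. "Machine 1") to its values.
    Empty cells, represented here as None (an assumption), are skipped.
    """
    response, factor = [], []
    for name, values in columns.items():
        for v in values:
            if v is None:
                continue
            response.append(v)
            factor.append(name)
    return response, factor


# Hypothetical unstacked layout: one column per machine
unstacked = {
    "Machine 1": [11.2, 11.8, 12.1],
    "Machine 2": [12.0, 12.6, None],
}
moisture, machine = stack_columns(unstacked)
print(moisture)  # [11.2, 11.8, 12.1, 12.0, 12.6]
print(machine)   # ['Machine 1', 'Machine 1', 'Machine 1', 'Machine 2', 'Machine 2']
```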

Okay, let's look at our residual analysis.

We can find it in the ANOVA menu, under Stat > ANOVA > One-Way.

Maybe you still have the dialog filled in; otherwise, fill in your response, which is Moisture, and your factor, which is Machine.

The residual analysis can be found under Graphs, and about halfway down, the dialog asks you for residual plots.

If you select Four in one, you get all plots at once.

You can also uncheck the interval plot, because we don't need it.

Well, that's it.

Click OK, then OK again, and this is your four-in-one plot.

Let's study the four-in-one plot.

Remember that we needed to check two things in the residual analysis.

Let's start with the normality assumption.

This can be checked in the normal probability plot.

Are your residuals normally distributed?

Yes, they are: the points lie close to the straight line.
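The normal probability plot compares the sorted residuals with the values a normal distribution would predict; if the points lie roughly on a straight line, normality is plausible. The sketch below computes those plot coordinates plus a simple straightness measure: the correlation between normal scores and sorted residuals. This is similar in spirit to a correlation-based normality test but gives no p-value. The plotting positions (i − 0.375)/(n + 0.25) (Blom's formula) and the function names are assumptions; Minitab may use a different formula.

```python
from statistics import NormalDist, fmean


def normal_probability_points(residuals):
    """Return (normal score, sorted residual) pairs for a probability plot."""
    n = len(residuals)
    ordered = sorted(residuals)
    # Blom plotting positions (an assumption; Minitab may use another formula)
    probs = [(i - 0.375) / (n + 0.25) for i in range(1, n + 1)]
    scores = [NormalDist().inv_cdf(p) for p in probs]
    return list(zip(scores, ordered))


def straightness(points):
    """Pearson correlation between normal scores and sorted residuals.

    Values close to 1 mean the points lie close to a straight line.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


pts = normal_probability_points([0.3, -0.2, 0.0, -0.3, 0.2, 0.1, -0.1])
print(f"straightness: {straightness(pts):.3f}")
```

The plot itself remains the primary check; a single number can hide a curved tail or a lone outlier that is obvious by eye.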

Now, let's have a look at the second assumption: that there are no outliers or irregularities in the residuals.

To check this assumption, we look at the four-in-one plot again.

But now, we look at the versus order plot, which shows the residuals in the order in which the data were collected.

We see that there are no outliers or strange patterns present.

This means that this assumption is also satisfied and that the original analysis is valid.
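The versus order plot is checked by eye. As a rough numerical complement, this sketch scales each residual by the root mean squared error sqrt(SS_error / (N − k)) and flags those beyond a threshold. The function name and the threshold of 3 are assumptions (a common rule of thumb, not a Minitab setting), and this scaling is simpler than Minitab's own standardized residuals, which also adjust for leverage. It does not detect trends, shifts, or cycles; for those, the plot remains the check.

```python
from math import sqrt


def flag_outliers(residuals, num_groups, threshold=3.0):
    """Return time-order indices whose scaled residual exceeds threshold.

    Residuals are divided by sqrt(SS_error / (N - k)), where k is the
    number of groups. threshold=3.0 is an assumed rule of thumb.
    """
    n = len(residuals)
    s = sqrt(sum(r * r for r in residuals) / (n - num_groups))
    return [i for i, r in enumerate(residuals) if abs(r / s) > threshold]


# Hypothetical residuals in time order, with one suspicious value at the end
residuals = [0.1, -0.1] * 10 + [5.0]
print(flag_outliers(residuals, num_groups=2))
```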

Let's have a look at another example, and assume that these are our residuals.

We see in the probability plot that the residuals are not normally distributed.

And in the versus order plot, we see outliers in the residuals.

This means that if these were your residuals, the assumptions of the ANOVA would be violated.

This implies that the conclusions from step two would not be valid, or at least not very precise.

If this is the case, you can perform a Kruskal-Wallis test instead, a nonparametric alternative that does not require normally distributed residuals.
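For intuition about what the Kruskal-Wallis test does: it replaces every observation by its rank in the pooled sample and compares the rank sums between groups, so a single extreme value has limited influence. Below is a minimal sketch of the H statistic with the usual tie correction; the helper names are hypothetical. The p-value comes from a chi-square distribution with k − 1 degrees of freedom (an approximation that works best when groups are not too small); if SciPy is available, `scipy.stats.kruskal` returns both H and the p-value.

```python
from collections import Counter


def average_ranks(values):
    """Rank values 1..N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H (tie-corrected) for a dict of group -> values."""
    labels, values = [], []
    for g, ys in groups.items():
        for y in ys:
            labels.append(g)
            values.append(y)
    n = len(values)
    rank_sums = {}
    for g, r in zip(labels, average_ranks(values)):
        rank_sums[g] = rank_sums.get(g, 0.0) + r
    h = 12 / (n * (n + 1)) * sum(
        rank_sums[g] ** 2 / len(groups[g]) for g in groups) - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n)
    ties = sum(t ** 3 - t for t in Counter(values).values())
    return h / (1 - ties / (n ** 3 - n))


print(kruskal_wallis_h({"Machine 1": [1, 2, 3], "Machine 2": [4, 5, 6]}))
```

If every value is identical, the tie correction divides by zero; a real analysis would report that the test is undefined in that case.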

In summary, in this series of videos I have explained that ANOVA is a technique to test whether a categorical influence factor X has a significant effect on a numerical Y.

After organizing your data in the first step, you run the analysis in the second step and interpret the p-value for significance and the R-squared for importance.

In the third step, you validate your conclusions by checking whether the residuals are normally distributed and whether they are free of outliers or other strange patterns.