So, let's apply it. Once again, the algorithm itself is not complicated, but it is not available in Rattle, so I have to take you through R; it's fairly straightforward to use, and let me show you how it works. We're going to use a dataset from the rsample package, in which we are trying to predict whether an employee will leave the company or not. That's the objective. You've got all kinds of variables there: first of all, the attrition level, yes or no, leaving the job or not; the job level; business travel, whether they do or not; the department they work in; education; marital status; and so on. Many organizations are interested in predicting attrition levels so that they can at least plan how many people they need to hire, because you can't just wait until people have left to start hiring, especially in large organizations. You need to be able to predict the total attrition level well in advance. Sometimes it may be high because people are retiring, or maybe you can't hold onto your salespeople and they're all leaving. So, here's what we're going to do. Let me very quickly show you. The first part is just a nice way to load your data: pacman loads the required packages, and one of those packages, rsample, also loads the data. There are 31 variables; we will use only nine of them, one target variable, which is attrition, and eight input variables. The variables I'm going to use are BusinessTravel, Department, Education, EducationField, MaritalStatus, TrainingTimesLastYear, and StockOptionLevel. We're going to take that subset of the data and set a random seed, as usual, to fix the random numbers. Then there is a little trick we're going to use: we convert some of the values into categorical levels, that is, we choose the variables that are numerical and convert them into categories, and that is done automatically here.
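The loading step just described can be sketched in R roughly as follows. This is a sketch, not the lecture's exact script: the column names assume the attrition data shipped with the rsample package, and since only seven of the eight input variables are named in the lecture, those seven are used here.

```r
# Sketch of the data-loading step; assumes rsample's attrition data.
if (!require("pacman")) install.packages("pacman")
pacman::p_load(rsample)               # loading rsample also makes the data available

data(attrition, package = "rsample")  # in newer versions the data lives in modeldata

# Target variable plus the input variables named in the lecture
vars <- c("Attrition", "BusinessTravel", "Department", "Education",
          "EducationField", "MaritalStatus", "TrainingTimesLastYear",
          "StockOptionLevel")
df <- attrition[, vars]

set.seed(1234)                        # any fixed seed; the lecture does not state the value

# The "little trick": convert the numeric inputs into categorical levels
num_cols <- sapply(df, is.numeric)
df[num_cols] <- lapply(df[num_cols], factor)
str(df)                               # every column is now a factor
```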
Then we will do what you have now become very good at: drawing a sample and using it to partition the data, separating the dependent variable from the independent variables, and then we will do the prediction. First, we will predict based on only four of these variables, and you will see the accuracy is about 68.59 percent on the training set and about 68.2 percent on the test set. Remember, we have split the data. Now, there is one little twist I have to tell you about. How do we predict? Well, you set a probability cut-off level, and we will look at this properly only in session 8. How do you know somebody will leave? What Naive Bayes gives you is a probability of attrition, and it is your choice at what probability you predict that a person will actually leave. For example, you could say 50 percent: if the model says this person has a 50 percent chance of leaving, we predict this person will leave. In this particular case, with so few people actually leaving, we kept the cut-off level at 20 percent so that some people are predicted to leave; otherwise, the model would predict everybody as not leaving. In session 8, we will see how to fine-tune your model by changing these probabilities. So, once again, there is something called a probability cut-off, which you use to divide the data into a group predicted to leave and a group predicted to stay. The cut-off value in this case is kept at 20 percent, because there are very few data points on attrition. In general, you'll have to play around with it to decide what the right value should be, and you'll have to wait two sessions to learn that part. Next, as before, I'm going to add more input variables and look at the accuracy improvement: it goes to about 72 percent on the training set and 78 percent on the test set.
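The split, fit, and cut-off sequence above can be sketched like this. Two assumptions to flag: the lecture does not say which naive Bayes implementation it uses, so `e1071::naiveBayes` stands in here, and the four starter variables shown are an illustrative choice, not necessarily the lecture's four.

```r
# Sketch of partitioning, fitting, and applying a probability cut-off.
pacman::p_load(rsample, e1071)
data(attrition, package = "rsample")

set.seed(1234)
split <- initial_split(attrition, prop = 0.7)   # 70/30 train-test partition
train <- training(split)
test  <- testing(split)

# First model with four inputs (illustrative choice of four)
fit <- naiveBayes(Attrition ~ BusinessTravel + Department +
                    EducationField + MaritalStatus, data = train)

# Naive Bayes gives a probability of attrition for each person...
p_yes <- predict(fit, test, type = "raw")[, "Yes"]

# ...and we choose the cut-off: 0.2 here, because "Yes" is rare.
# With the default 0.5, almost everybody would be predicted "No".
pred <- factor(ifelse(p_yes > 0.2, "Yes", "No"),
               levels = levels(test$Attrition))

mean(pred == test$Attrition)                    # test-set accuracy
```

The key design point is that `predict(..., type = "raw")` returns the posterior probabilities rather than hard class labels, which is what lets you move the cut-off away from 0.5 when one class is rare.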