So, doing this type of lookup is difficult: too much data, too many features. Here is where naive Bayes works. It works when there is too much data or when the number of features is too large. The idea, as you will see, uses an assumption called the independence assumption. Instead of looking at all features together, we take them apart, one feature at a time. I'll explain this slowly once, and then we'll apply it using R.
In the table below, each variable has a different number of levels. The table summarizes, for each level, how many did not buy and how many did buy the offering, right, and the total. For example, of the five who own houses, four bought and one didn't buy, okay? Out of the retirees, level one, you can see four bought and two didn't buy. If you add up across the levels of any one variable, you would see that 6 people have not bought and 14 people have bought, out of 20, right?
What's a level? A level depends on the kind of classification: it is one category of a variable. So, job has four levels. Marital status has three levels. Education has three levels, okay? Default on a loan has only two levels, yes or no. For each level we know the counts: among those who have not defaulted on a loan, 14 bought and 4 did not buy, whereas among those who defaulted, two did not buy and zero bought. So, this is a partial summary, by feature and by level, of whether a customer has bought the product or not.
From this, we come up with prior probabilities, right? The probability of buy is 70%. That's easy, because 14 out of 20 bought. And the probability of not buy is 30%, because 6 out of 20 did not buy. Okay, the next one is more interesting: the probability that the job is retired given buy is no. Out of the six people who did not buy, you know that two were retired, two had unknown jobs, and two had blue-collar jobs.
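The counting behind these first numbers can be sketched in a few lines. This is only an illustration in Python (the lecture itself works in R); the variable names are made up here, and the fourth job level, student, is entered with a count of zero among the non-buyers.

```python
from fractions import Fraction

# Class counts from the lecture's summary table (20 customers in total).
n_buy, n_no_buy = 14, 6
n_total = n_buy + n_no_buy

# Prior probability of each class: class count divided by the total.
prior = {
    "yes": Fraction(n_buy, n_total),
    "no": Fraction(n_no_buy, n_total),
}

# Job counts among the 6 customers who did not buy.
job_counts_no = {"retired": 2, "unknown": 2, "blue-collar": 2, "student": 0}

# P(job = level | buy = no): each count divided by the size of the class.
job_given_no = {job: Fraction(c, n_no_buy) for job, c in job_counts_no.items()}

for label, p in prior.items():
    print(f"P(buy = {label}) = {float(p):.2f}")
for job, p in job_given_no.items():
    print(f"P(job = {job} | buy = no) = {float(p):.2f}")
```

Using `Fraction` keeps the ratios exact, so it is easy to check that the conditional probabilities for one class add up to one.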
So, the probability that the person is retired given he didn't buy is 33%. The probability that his job is unknown given he didn't buy is 33%, and so forth. The probability that he's a student given he didn't buy is zero. I can do the same for those who bought the product. Out of the 14 who bought the product, 4 were retired, for 2 the job status was unknown, 4 were blue-collar, right, and 4 were students. So, given that they bought the product, and that's what that horizontal line means, given, there's roughly a 28.6% chance this person is retired, and a 14% chance that the person's job is unknown. There is a 28.6% chance the job is blue-collar and a 28.6% chance that this person is a student, right? So, given buy is no or yes, I can construct the remaining conditional probabilities, and I leave you to look at them. Basically, this is marital status given no. This is marital status given yes, they bought. This is education status given they haven't bought, and this is education status given they bought, and so forth.
Now, how do you apply the Bayes method? Say I give you a person whose marital status is single and education is tertiary. Remember, this is the person for whom I have to make a prediction. The job is blue-collar, with no default and no house. I want to know whether this person will buy given this data. That's what the first line says, okay? What Bayes' rule says is that this probability is given by the ratio of two quantities. The first quantity is the probability that the person buys and matches all these data points: the probability that buy is yes and marital status is single and education is tertiary and job is blue-collar and no default and no house. You divide it by the probability of the data, right? The probability of the data is the probability that marital status is single, education is tertiary, job is blue-collar, no default, no house, right?
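Written out, the ratio just described looks as follows. The symbol $D$ is shorthand introduced here for the customer's five feature values; it is not notation from the lecture.

```latex
P(\text{buy} = \text{yes} \mid D)
  = \frac{P(\text{buy} = \text{yes},\; D)}{P(D)},
\qquad
D = \{\text{single},\ \text{tertiary},\ \text{blue-collar},\ \text{no default},\ \text{no house}\}
```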
So, basically you're asking: what's the probability that buy is yes and the person matches everything, divided by the probability that a person matches everything? This ratio narrows down to the part of the data that matches your person. And in that little slice of the data, you're asking what fraction bought, right? If you get another data point and that narrows down to a different slice, we search there and see how many people bought. That's the idea of Bayes' rule, okay?
There's a pain, though, because computing these two probabilities is time-consuming when there are a lot of features. So we play a trick, and that trick is assuming independence, right? We really need the joint probability. But we pretend for a minute that to find this joint probability we can decompose it into pieces and multiply them together. That's the independence assumption: within each class, the features are treated as independent of one another, right? So, we want the probability that buy is yes and marital status is single and education is tertiary and job is blue-collar and no default and no house. We decompose it into this formula, which holds under the independence assumption: the probability they buy, times the probability they are single given they buy, times the probability the education is tertiary given they buy, times the probability the job is blue-collar given they buy, times the probability they have not defaulted on a loan given they buy, times the probability they don't own a house given they buy, right?
Now, each of these little components we have already computed, if you go back, right? For example, the probability that buy equals yes is 70%, you know that. Given buy equals yes, you can go check, 50% of the people are single. You can go back and check that, given buy equals yes, the chance that education is tertiary is about 28.6%, right? Finally, given buy is yes, the probability that they don't own a house is 71%, right?
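As a rough check of this decomposition, the factors can be multiplied directly. This is a sketch in Python rather than the lecture's R, and it rests on assumptions: the percentages are rewritten as fractions of the 14 buyers (4 of 14 with tertiary education, 4 of 14 blue-collar, all 14 with no default, 10 of 14 with no house), and the 50% figure for single is taken as quoted.

```python
import math

# Factors for the class buy = yes; the dictionary keys are labels only.
factors_yes = {
    "P(buy = yes)": 14 / 20,
    "P(single | yes)": 0.50,          # quoted in the lecture as 50%
    "P(tertiary | yes)": 4 / 14,      # assumed: 4 of the 14 buyers
    "P(blue-collar | yes)": 4 / 14,   # 4 of the 14 buyers
    "P(no default | yes)": 14 / 14,   # every buyer had no default
    "P(no house | yes)": 10 / 14,     # 14 buyers minus the 4 who own houses
}

# Under the independence assumption the joint probability is the product.
joint_yes = math.prod(factors_yes.values())
print(f"P(buy = yes, data) is approximately {joint_yes:.4f}")
```

The product comes out at about 2%, which is small simply because six numbers no larger than one are being multiplied together.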
So, this joint probability is approximated by the product of all these probabilities, and it turns out to be 0.0204. Forget all the decimals, if you want, but it's about 2%, okay? You say, it's very small. But remember, we are not done yet, right? You have to divide it by the probability of the data, so I still need to know the probability of that joint event, okay? To do that, we play the trick again. Okay. The big difference between this one and the previous one is that there buy is yes, and here buy is no. So, again, using the same trick, we can compute the probability that buy is no and marital status is single, etc., etc., etc. And using the same rule, we get this probability equal to 0, and the reason you get 0 is that the third factor is 0. There is nobody in this data set who had a tertiary education and did not buy. And so that probability is 0. So, basically, the probability of buy equal to no and the data is 0, okay?
So, now you have two events. We know the probability of buy equal to yes and the data, and the probability of buy equal to no and the data. When you add them up, you get the probability of the data, right? Because you are splitting the data into two parts. This part is the probability of buy equal to yes and the data. And this part is the probability of buy equal to no and the data. The sum of these two probabilities gives you the probability of the data. That's all I'm saying, right? And remember, this probability was 2%, and this probability was zero, so the sum is 2%.
Now, put everything together. The probability you will buy given the data is the ratio of these two quantities: the probability of buy given the data is the probability of buy and the data divided by the probability of the data, right? And therefore, you have approximately calculated it, and this ratio is one. I mean, you could see it was going to be one.
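Putting the pieces together can be sketched as below, again in Python for illustration. The fractions for the yes class are assumptions read off the counts (for example 4 of 14 buyers with tertiary education), and for the no class only the zero factor is written out, since it forces the whole product to zero whatever the other factors are.

```python
# Joint probability for buy = yes under the independence assumption:
# prior * single * tertiary * blue-collar * no default * no house.
joint_yes = (14 / 20) * 0.50 * (4 / 14) * (4 / 14) * (14 / 14) * (10 / 14)

# For buy = no, nobody with tertiary education failed to buy, so
# P(tertiary | no) = 0/6 and the product is zero; other factors are omitted.
joint_no = (6 / 20) * (0 / 6)

# The probability of the data is the sum over the two classes.
evidence = joint_yes + joint_no

# Bayes' rule: divide each joint probability by the probability of the data.
posterior_yes = joint_yes / evidence
posterior_no = joint_no / evidence

prediction = "yes" if posterior_yes > posterior_no else "no"
print(f"P(yes | data) = {posterior_yes:.2f}, P(no | data) = {posterior_no:.2f}")
print("prediction:", prediction)
```

Because one class contributes nothing to the sum, the numerator and denominator are the same number and the posterior for yes is exactly one.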
So, again, the idea is that we have narrowed down and said this probability is 2%, but the probability of the data is also 2%. And therefore, the ratio given this data is one. And so this method will predict that a single person with tertiary education, a blue-collar job, no default, and no house will accept your offer.
I will leave this one for you to work out, but let's say I had another buyer, and this person is single and has secondary education only. We don't know the rest of the data. How will we compute? Well, your data says single and secondary education. That is all we have. So, I want to know the probability he'll buy. We use the rule: probability of buy, times probability of single given buy, times probability of secondary given buy. That's your numerator. The denominator is the same thing plus the matching product for buy is no, single, and secondary, right, the same trick. And if you do the calculation, you end up with 87.1%. I leave you to work out the details.
Now, how do I know it is good enough? Actually, you go back to your data. If you go back to the data, right, there are six customers of this type: out of those 20, 6 were single with secondary education. Out of these six customers, five in your data bought. So, if you had applied Bayes' rule exactly, you would have got 5 divided by 6, 83.3%. So, what I'm trying to say is that the naive Bayes method, remember, is an approximation, but it is a reasonable approximation, and it's a quick way of predicting which class something belongs to.
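For the single, secondary-education customer, the calculation left as an exercise has the shape below. The abbreviations are introduced here for compactness (for instance $P(\text{sec} \mid \text{yes})$ for the probability of secondary education given buy is yes), and no values are filled in beyond those quoted in the lecture.

```latex
P(\text{yes} \mid \text{single}, \text{sec})
  \approx
  \frac{P(\text{yes})\, P(\text{single} \mid \text{yes})\, P(\text{sec} \mid \text{yes})}
       {P(\text{yes})\, P(\text{single} \mid \text{yes})\, P(\text{sec} \mid \text{yes})
        + P(\text{no})\, P(\text{single} \mid \text{no})\, P(\text{sec} \mid \text{no})}
```

The exact answer from the data, by comparison, is a plain count: $5/6 \approx 0.833$.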