So in the previous video, we discussed logloss and accuracy. In this video we'll discuss Area Under Curve (AUC) and quadratic weighted Kappa. Let's start with AUC. Although the loss function of AUC has zero gradients almost everywhere, exactly like accuracy loss, there exists an algorithm to optimize AUC with gradient-based methods, and some models implement this algorithm, so we can use it by setting the right parameters. I will give you an idea of this method without going into much detail, as there is more than one way to implement it.

Recall that, originally, a classification task is usually solved at the level of objects. We want to assign 0 to red objects and 1 to the green ones. But we do it independently for each object, so our loss is pointwise: we compute it for each object individually, and sum or average the losses over all objects to get the total loss. Now, recall that AUC is the probability that a pair of objects is ordered in the right way. So ideally, we want the predictions ŷ for the green objects to be larger than for the red ones. Thus, instead of working with single objects, we should work with pairs of objects, and instead of a pointwise loss, we should use a pairwise loss. A pairwise loss takes the predictions and labels for a pair of objects and computes their loss. Ideally, the loss would be zero when the ordering is correct, and greater than zero when the ordering is incorrect. But in practice, different loss functions can be used. For example, we can use logloss. We may think of the target for this pairwise loss as always being one: the prediction for the green object minus the prediction for the red one should be one. That is why there is only one term in the logloss objective instead of two. The prob function in the formula (a sigmoid, for example) is needed to make sure the difference between the predictions still lies in the (0, 1) range, and I use it here just for the sake of simplicity.

Well, basically, XGBoost and LightGBM have the pairwise loss we've discussed implemented (a small sketch is given below). It is also straightforward to implement in any neural net library, and, for sure, you can find implementations on GitHub. I should say that in practice most people still use logloss as the optimization loss, without any further postprocessing. I have personally observed XGBoost trained with logloss give an AUC score comparable to the one trained with a pairwise loss.

All right. Now let's move to the last topic to discuss: the quadratic weighted Kappa metric. There are two methods. One is very common and very easy; the second is not that common and will require you to implement a custom loss function for either XGBoost or a neural net. But we've already implemented it for XGBoost, so you will be able to find the implementation in the reading materials.

Let's start with the simple one. Recall that we're solving an ordered classification problem, and our labels can be thought of as integer ratings, say from one to five. The task is classification, as we cannot output, for example, 4.5 as an answer. But anyway, we can treat it as a regression problem and then somehow post-process the predictions to convert them into integer ratings. And actually, quadratic weights make Kappa somewhat similar to regression with MSE loss, if we allow our predictions to take values between the labels, that is, if we relax the predictions. But in fact, it is different from MSE: if relaxed, Kappa would be one minus MSE divided by something that actually depends on the predictions. And it looks like everyone's logic is: well, there is MSE in the numerator, we can optimize it, and let's not care about the denominator.
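To make the pairwise idea concrete, here is a minimal NumPy sketch of the pairwise logloss described above; the sigmoid plays the role of the prob function, and the function name and toy data are just illustrative, not something from the lecture. As a rough pointer only: XGBoost has a built-in pairwise ranking objective, rank:pairwise (typically with all training objects placed into a single group), and LightGBM has lambdarank, although the exact losses those libraries implement differ in the details.

```python
import numpy as np

def pairwise_logloss(y_true, y_pred):
    """Pairwise logloss over all (green, red) pairs.

    For every pair the target is 1: the prediction for the green
    (positive) object should exceed the prediction for the red
    (negative) one, so the per-pair loss is -log(sigmoid(diff)).
    """
    pos = y_pred[y_true == 1]           # predictions for green objects
    neg = y_pred[y_true == 0]           # predictions for red objects
    diff = pos[:, None] - neg[None, :]  # green minus red, for every pair
    # -log(sigmoid(diff)) rewritten as log(1 + exp(-diff));
    # there is a single term because the pair's target is always 1
    return np.mean(np.log1p(np.exp(-diff)))

# toy check on random labels and scores
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
scores = rng.normal(size=200)
print(pairwise_logloss(y, scores))
```

Because this loss is differentiable with respect to the predictions, gradient-based models such as boosted trees or neural nets can optimize it directly, which is what makes the AUC optimization described above possible.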
Well, of course, this is not the correct way to do it, but it turns out to be useful in practice. But anyway, MSE gives us float values instead of integers, so now we need to somehow convert them into integers. The straightforward way would be to round all the predictions. But we can think of rounding as applying thresholds: for example, if the value is greater than 3.5 and less than 4.5, then output 4. And then we can ask ourselves: why do we use exactly those thresholds? Let's tune them. Again, this is straightforward and can easily be done with grid search (a small sketch is given at the end). So to summarize, we need to fit MSE loss to our data and then find appropriate thresholds.

Finally, there is a paper which suggests a way to relax the classification problem into a regression one while dealing with the hard-to-handle denominator we mentioned. I will not get into the details here, but it's a clearly written and easy-to-understand paper, so I really encourage you to read it. Moreover, you can find the loss implementation in the reading materials, and you can just use it if you don't want to read the paper.

And with that, we've finished this lesson. We've discussed that the evaluation or target metric is how all submissions are scored. We've discussed the difference between the target metric and the optimization loss: the optimization loss is what our model optimizes, and it is not always the same as the target metric we want to optimize. Sometimes we can only set our model to optimize something completely different from the target metric, but then we usually try to post-process the predictions to make them better fit the target metric. We've discussed the intuition behind different metrics for regression and classification tasks, and saw how to efficiently optimize different metrics. I hope you've enjoyed this lesson, and see you later.
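To close, here is a minimal sketch of the threshold-tuning step referenced above. It assumes integer ratings from one to five, uses scikit-learn's cohen_kappa_score with quadratic weights, and tunes the cut points with a simple greedy coordinate-wise grid search; the helper names, starting thresholds, and grid are illustrative choices, not a prescribed recipe.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

def apply_thresholds(preds, thresholds):
    """Turn continuous predictions into integer ratings 1..5
    using the given cut points (expects 4 sorted thresholds)."""
    return np.digitize(preds, thresholds) + 1

def tune_thresholds(preds, y_true, grid=np.arange(1.0, 5.0, 0.05)):
    """Greedy coordinate-wise grid search: adjust one cut point at a time,
    keeping the others fixed, to maximize quadratic weighted Kappa."""
    thresholds = np.array([1.5, 2.5, 3.5, 4.5])  # start from plain rounding
    for i in range(len(thresholds)):
        best_t, best_score = thresholds[i], -np.inf
        for t in grid:
            cand = thresholds.copy()
            cand[i] = t
            cand.sort()  # cut points must stay ordered
            score = cohen_kappa_score(
                y_true, apply_thresholds(preds, cand), weights="quadratic")
            if score > best_score:
                best_t, best_score = t, score
        thresholds[i] = best_t
        thresholds.sort()
    return thresholds

# toy usage: true ratings plus noisy "regression" predictions
rng = np.random.default_rng(0)
y_true = rng.integers(1, 6, size=500)
preds = y_true + rng.normal(scale=0.7, size=500)
cuts = tune_thresholds(preds, y_true)
print(cuts, cohen_kappa_score(y_true, apply_thresholds(preds, cuts),
                              weights="quadratic"))
```

The design is deliberately simple: start from the plain rounding thresholds and then adjust each cut point on a grid, which is one easy way to implement the "fit MSE, then tune the thresholds" recipe from the lecture.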