[MUSIC] All right, so a recap of where we are. We talked about supervised learning, about deriving rules and arranging them into trees. We talked about how those trees are susceptible to overfitting, and about how one way of combating overfitting is to use ensembles of weak learners, combining their results to form a strong learner. We gave one example of this in the random forest. We also mentioned that one disadvantage of these ensembles is that they become pretty difficult to interpret, right? Because it's a big vote among a bunch of smaller models, it's not quite clear where the decisions are coming from. Random forests had one way of attacking this: as you build a random forest, it produces an importance measure for the variables. All right? So if particular attributes were more influential in the decisions, you would get a measure of this as one of the outputs. And then we talked about k-nearest neighbors, which was a change in that it used numeric attributes. Continuing in that direction, we're gonna talk about gradient descent and some applications of it.

Okay, so you remember the three components of a machine learning solution are the representation of the model, the evaluation of that model (how you tell how well it's doing), and then how you optimize it. Gradient descent is an optimization method that's applicable in a lot of different contexts, which we'll touch on at the end. In a nutshell, what you're doing here is expressing your learning problem in terms of some kind of cost function that you want to minimize. Starting at some initial point, you're going to roll the ball downhill until it settles in a trough, and this process describes how to walk downhill. It's not really a roll, right? You're going to take steps, and that turns out to be one of the difficulties with gradient descent (there's a minimal sketch of this stepping loop at the end of this section). Okay. In some situations you can, by design, offer a guarantee that the minimum you find will be the global minimum; other times you can't. So getting trapped in a local minimum on more complex problems, which we won't go into in too much detail, is another potential weakness of this method.

Okay. So the problem we're gonna be thinking about here is no longer classification, although we'll mention how you can use this for classification at the end; we're gonna think about just regression. Okay, and regression can still be used as a prediction task. We think about sort of fitting a line to data, but the point is that you might use that line in order to make predictions of the outcome value, the response variable, given the inputs down the road, okay? So the regression line is a predictor (the second sketch below fits such a line with gradient descent). Rather than a discrete class variable, we'll be talking about a response variable that's continuous. In this case, we'll [LAUGH] take the very simple case of just a single input variable, but throughout this we're going to be thinking about an entire vector of input variables; you can actually think of it as a row in a database. Okay.
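To make the "walking downhill" idea concrete, here is a minimal sketch of the stepping loop in Python. Nothing here comes from the lecture itself: the function name `gradient_descent`, the learning rate of 0.1, and the example cost f(x) = (x - 3)^2 are all illustrative assumptions.

```python
def gradient_descent(grad, x0, learning_rate=0.1, n_steps=100):
    """Walk downhill: repeatedly step against the gradient of the cost."""
    x = float(x0)
    for _ in range(n_steps):
        x = x - learning_rate * grad(x)  # each step moves a bit toward lower cost
    return x

# Illustrative cost: f(x) = (x - 3)^2, whose gradient is 2(x - 3).
# This bowl has a single trough, so any starting point rolls to the
# global minimum at x = 3.
minimum = gradient_descent(lambda x: 2 * (x - 3), x0=10.0)
print(minimum)  # ~3.0
```

The learning rate is the step size mentioned above: too large and the steps overshoot the trough, too small and it takes a very long time to settle, which is exactly the difficulty with taking steps rather than rolling.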
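And here, in the same hedged spirit, is a second sketch applying that loop to the regression problem: fitting a line to data with a single input variable by descending the mean-squared-error cost. The synthetic data, the learning rate, and the step count are all made up for illustration.

```python
import numpy as np

# Made-up data: one input variable x and a continuous response y.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 * x + 1.0 + rng.normal(0, 1, size=50)  # "true" line plus noise

# Model: y_hat = w*x + b.  Cost: mean squared error.
w, b = 0.0, 0.0
learning_rate = 0.01
for _ in range(2000):
    y_hat = w * x + b
    grad_w = 2 * np.mean((y_hat - y) * x)  # dMSE/dw
    grad_b = 2 * np.mean(y_hat - y)        # dMSE/db
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(w, b)         # should land near the true slope 2.0 and intercept 1.0
print(w * 7.0 + b)  # the fitted line used as a predictor for input 7.0
```

The last line is the point of the lecture's framing: once the line is fit, it's a predictor of the continuous response for new inputs. With a whole vector of input variables, w becomes a vector and the same update applies coordinate by coordinate. [MUSIC]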