Now let's talk about the bias-variance tradeoff and its relationship to complexity and error, as well as overfitting versus underfitting. The learning goals for this section: first, a quick recap of model complexity versus model error. We'll define the terms bias and variance and discuss their relationship to model complexity. We'll then discuss different sources of model error and how those relate to bias and variance. And finally, we'll discuss the bias-variance tradeoff, how it relates to complexity, and how we can find the optimal balance between bias, variance, and complexity versus error. Recall from last time that when building a model, we want training and test errors to be small, right at the middle of this curve. We discussed how, if we're too far out to the right, we probably have too complex a model: our training error is very small, because on our training set we can keep reducing error, but on our test set the error is very large. On the other hand, if we're all the way out to the left, both our training error and our test error will ultimately be large. So we want to ask, on our test set, can we continue to decrease the error by making the model more complex? Besides choosing a more or less complex model, there's also the idea of regularization, which I'll just mention now; we'll have a whole video on it after the coming video on bias and variance. Coming back to the polynomial function we saw earlier, this problem can be framed as a choice between different model complexities. To recap, our goal here is to fit a polynomial model to our sample points, these blue dots. Something we have here that we generally don't have is the true function, which is shown here in blue. The model can involve polynomials of different degrees: 1st order, 4th order, 15th order.
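The lecture's actual dataset and plots aren't reproduced in the transcript, so here is a minimal sketch of the setup, assuming a sine curve as a hypothetical "true" function and Gaussian noise on the samples (both are illustrative assumptions, not the lecture's data):

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

# Hypothetical stand-in for the lecture's example: a smooth "true" function
# plus Gaussian noise (the actual dataset is not shown in the transcript).
def true_fn(x):
    return np.sin(2 * np.pi * x)

x_train = rng.uniform(0, 1, 20)
y_train = true_fn(x_train) + rng.normal(0, 0.2, size=x_train.shape)

# Fit polynomials of increasing degree. Because the models are nested,
# training error can only go down as the degree goes up.
errs = {}
for degree in (1, 4, 15):
    p = Polynomial.fit(x_train, y_train, deg=degree)
    errs[degree] = np.mean((p(x_train) - y_train) ** 2)
    print(f"degree {degree:2d}: training MSE = {errs[degree]:.4f}")
```

Note how the training MSE keeps shrinking with degree even though the degree-15 curve is wandering further from the underlying sine function.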
And the error function will continuously decrease as we increase the model order if we're just looking at how exactly we can fit each one of our sample points, even though the fit is actually deviating from the underlying function. So another way to find the best model is to ask how well it will actually generalize. Out to the left, we're probably doing poorly at both training and predicting because our model is too simple. The middle is the just-right area, because we're getting low error both in training and in predicting on our holdout set. And finally, out to the right, we may do well on training because we fit exactly to each of the data points, but when we get a new data point, which will probably fall somewhere along this blue line, we'll do a poor job of predicting its value. I want you to keep this in mind as we define the terms bias and variance on the next slide. Here we see a graph showing bias versus variance, with low and high bias and low and high variance. Starting with bias: bias is the tendency to miss. If we look at the bottom left corner, we're in the high-bias, low-variance case: we are consistent, but we are consistently missing our target. That's the tendency to miss. Variance is the tendency to be inconsistent, and that's the top right: low bias but high variance. Low bias means you don't have the tendency to miss, but you'll be fairly inconsistent; at times we'll hit our target, but we're kind of all over the place. Ideally, we want to be in the top left outcome, where we have highly consistent predictions that are also close to perfect on average. And we want to think about this as our model's expected out-of-sample behavior over many training-set samples.
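As a sketch of that generalization check, using the same assumed sine-plus-noise setup as before (not the lecture's actual data), we can score each candidate degree on a separate holdout sample rather than on the training points:

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(1)

# Illustrative assumption: sine "true" function with Gaussian noise.
def true_fn(x):
    return np.sin(2 * np.pi * x)

def noisy_sample(n):
    x = rng.uniform(0, 1, n)
    return x, true_fn(x) + rng.normal(0, 0.2, size=n)

x_tr, y_tr = noisy_sample(30)   # training set
x_ho, y_ho = noisy_sample(30)   # holdout set, never used for fitting

# Score each degree on the holdout set, not the training set.
val_err = {}
for degree in range(1, 16):
    p = Polynomial.fit(x_tr, y_tr, deg=degree)
    val_err[degree] = np.mean((p(x_ho) - y_ho) ** 2)

best = min(val_err, key=val_err.get)
print("degree with lowest holdout MSE:", best)
```

Unlike training error, holdout error stops improving once the model starts fitting noise, so the minimizer lands at an intermediate degree rather than at 15.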
So something like cross-validation, with its holdout sets, would reveal our tendency toward high or low variance as well as high or low bias. That's going to allow us to understand whether we have high bias, high variance, both high, or both low. When we create a model, there are three essential ways our models can produce errors in their predictions. First, the model can just be wrong. This generally refers to models that are not doing well at identifying the relationship between our feature and outcome variables. This will generally be a biased model, where our predictions are fairly consistent, but because the model choice doesn't properly capture the relationship between the X and Y variables, we consistently get the wrong prediction. Second, our model can be unstable, and this generally relates to models with high variance. Think of models that try too hard to perfectly identify the relationship between X and Y, so that they actually incorporate random noise on top of the actual underlying function. In that effort to perfectly define the relationship, they may become unstable in their predictions. And third, there's just unavoidable randomness. All our models depend on real-world data, where there will always be a level of randomness within each data point we collect that cannot be perfectly predicted. The idea is to find a model that captures the actual relationship while avoiding accidentally incorporating what is just random noise. Now let's dive deeper into each of these causes of error. First, we'll discuss high bias: the tendency for a model to miss the true values when trying to predict. This happens due to either missing information or overly simplistic models. Think of it as being biased toward the simplicity of our model, or biased by the misrepresentation of our data given that missing information.
And this happens when we miss the real patterns altogether. We can associate high bias with the concept of underfitting, or there not being enough complexity in our model. Next, high variance: the tendency for our predictions to fluctuate dramatically. This is generally characterized by very high sensitivity of our output to small changes in our input. If you think about our overfit models that were too complex, notice how a slight change in our X values can end up with a drastically different outcome, given the high fluctuation of, say, a degree-15 polynomial model. Variance will thus be associated with overfitting to our training data, while bias is associated with underfitting, our bias toward an overly simple model. And at the end of the day, we must keep in mind that we will never be able to perfectly model the vast majority of real-world data. We thus must be comfortable with some measure of error in even the best possible models. Now let's talk about the tradeoff between bias and variance. In summary, the bias-variance tradeoff starts with the idea that model adjustments that decrease bias will often increase variance, and vice versa. We are either biased toward a very simple model, or we have high variance, high sensitivity, due to an overly complex model. This bias-variance tradeoff is therefore analogous to the complexity tradeoff. Again, as we see here in the graph, high bias corresponds to too simple a model, and high variance corresponds to too complex a model. Finding the best model means choosing the right level of complexity, as we discussed with cross-validation: we look at a holdout set in order to see whether we are underfitting, overfitting, or got it just right.
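That sensitivity can be demonstrated directly: nudge the training targets by a tiny amount, refit, and compare how much the predicted curve moves for a simple model versus a complex one. This uses the same illustrative sine-plus-noise setup as above, not the lecture's data:

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(3)

def true_fn(x):
    return np.sin(2 * np.pi * x)

x = rng.uniform(0, 1, 20)
y = true_fn(x) + rng.normal(0, 0.2, 20)
y_nudged = y + rng.normal(0, 0.01, 20)  # a tiny perturbation of the targets

grid = np.linspace(0, 1, 200)
shift = {}
for degree in (1, 15):
    before = Polynomial.fit(x, y, deg=degree)(grid)
    after = Polynomial.fit(x, y_nudged, deg=degree)(grid)
    # Largest change in the fitted curve caused by the tiny nudge.
    shift[degree] = float(np.max(np.abs(after - before)))
    print(f"degree {degree:2d}: max prediction change = {shift[degree]:.4f}")
```

The degree-1 line barely moves, while the degree-15 curve amplifies the perturbation, which is exactly the instability that high variance describes.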
So you want a model that's elaborate enough not to underfit the underlying relationship, but not so exceedingly elaborate that it ends up overfitting. We search for a model that can describe the feature-target relationship, but not one so elaborate that it fits to spurious patterns, patterns that shouldn't be captured, as we saw with the degree-15 polynomial model. Bringing this back to our polynomial example: the higher the degree of a polynomial regression, the more complex that model is. A higher polynomial degree means lower bias and higher variance. At lower degrees, we saw visual signs of bias: predictions were too rigid to capture the curved pattern in the data. At higher degrees, we were able to see visual signs of variance: predictions were fluctuating wildly because of the model's high sensitivity. And again, the goal is to find the right degree, such that the model has sufficient complexity to describe the underlying pattern without overfitting. Looking back at our polynomials of degree 1, 4, and 15, and thinking about the bias-variance tradeoff: all the way to the left we have high bias, a bias toward a very simple model, but low variance. In the middle we have the just-right model, where it's not too rigid but also doesn't overfit to spurious correlations. And all the way to the right we have very low bias, since it fits the data exactly rather than imposing a rigid model, but very high variance, in that it overfits and can jump back and forth as you move along the x-axis. So let's recap what we learned in this section. We started with a reminder of complexity versus error in order to dive into bias and variance. We then discussed the bias and variance of a model, with bias representing a model that is rigid and unable to properly model the relationship between X and Y.
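A standard way to automate that search for the right degree is k-fold cross-validation: average the holdout error over several train/holdout splits and compare degrees. Here is a minimal hand-rolled version on the same assumed sine-plus-noise data (in practice a library routine such as scikit-learn's `cross_val_score` would be used instead):

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(4)

# Illustrative assumption: sine "true" function with Gaussian noise.
x = rng.uniform(0, 1, 60)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, 60)

def kfold_mse(degree, k=5):
    """Average holdout MSE of a degree-`degree` polynomial over k folds."""
    idx = np.arange(x.size)
    errs = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)  # all points not in this fold
        p = Polynomial.fit(x[train], y[train], deg=degree)
        errs.append(np.mean((p(x[fold]) - y[fold]) ** 2))
    return float(np.mean(errs))

cv_mse = {d: kfold_mse(d) for d in (1, 4, 15)}
for d, e in cv_mse.items():
    print(f"degree {d:2d}: 5-fold CV MSE = {e:.3f}")
```

The underfit degree-1 model scores clearly worse than the middle-complexity model, which is the signal we use to pick the just-right degree.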
Whereas variance represents models that tend to have high sensitivity to minor changes in our input variables. We discussed different sources of model error: how a simple model can be too rigid, without enough complexity to describe the underlying relationship, and how high variance, fitting too well and fitting to the random noise of our data, is the other side of that spectrum. And with that, we also highlighted that for all models, we'll have to accept some level of error due to this random noise. Finally, we discussed the bias-variance tradeoff and how it relates to this relationship between complexity and error. In the next section, we'll start providing some techniques to ensure that if we have too complex a model, we can bring down that complexity, using something called regularization. I look forward to seeing you there.