
By now, you've seen a

couple different learning algorithms, linear

regression and logistic regression.

They work well for many problems,

but when you apply them

to certain machine learning applications, they

can run into a problem called

overfitting that can cause them to perform very poorly.

What I'd like to do in

this video is explain to

you what this overfitting

problem is, and in the

next few videos after this,

we'll talk about a technique called

regularization, that will allow

us to ameliorate or to

reduce this overfitting problem and

get these learning algorithms to maybe work much better.

So what is overfitting?

Let's keep using our running

example of predicting housing

prices with linear regression

where we want to predict the

price as a function of the size of the house.

One thing we could do is

fit a linear function to

this data, and if we

do that, maybe we get

that sort of straight line fit to the data.

But this isn't a very good model.

Looking at the data, it seems

pretty clear that as the

size of the house increases, the

housing price plateaus, or kind

of flattens out as we move to the right, and so

this algorithm does not

fit the training data well. We

call this problem underfitting, and

another term for this is

that this algorithm has high bias.

Both of these roughly

mean that it's just not even fitting the training data very well.

The term is kind of

a historical or technical one,

but the idea is that

if we're fitting a straight line to

the data, then it's as

if the algorithm has a

very strong preconception, or a

very strong bias, that housing

prices are going to vary

linearly with their size, despite the data to the contrary.

Despite the evidence to the

contrary, its preconception, or

its bias, still causes

it to fit a straight line,

and this ends up being a poor fit to the data.

Now, in the middle, we could

fit a quadratic function instead, and,

with this data set, if we fit the

quadratic function, maybe we get

that kind of curve

and that works pretty well.

And, at the other extreme, would be if we were to fit, say, a fourth-order polynomial to the data.

So, here we have five parameters,

theta zero through theta four,

and, with that, we can actually fit a curve

that passes through all five of our training examples.

You might get a curve that looks like this.

On one hand, this curve passes through the training data perfectly, but it's a very wiggly curve that goes up and down all over the place, and we don't actually think it's a good model for predicting housing prices. This problem is called overfitting, and another term for it is that the algorithm has high variance.

The term high variance is another

historical or technical one.

But, the intuition is that,

if we're fitting such a high

order polynomial, then, the

hypothesis can fit, you know,

it's almost as if it can

fit almost any function and

this space of possible hypotheses

is just too large, it's too variable.

And we don't have enough data

to constrain it to give

us a good hypothesis so that's called overfitting.

And in the middle, there isn't really

a name, but I'm just going to write, you know, "just right,"

where a second-degree polynomial, a quadratic function,

seems to be just right for fitting this data.
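Here's a minimal sketch in Python of those three fits, using made-up house sizes and prices just for illustration; np.polyfit, which does a least-squares polynomial fit, stands in for our learning algorithm here.

```python
import numpy as np

# Hypothetical training set: house sizes (in 1000s of sq ft) and prices (in $1000s).
# These five points are invented just to illustrate the three fits.
sizes = np.array([1.0, 1.5, 2.0, 2.5, 3.0])
prices = np.array([200.0, 280.0, 330.0, 355.0, 360.0])  # prices flatten out on the right

for degree in (1, 2, 4):
    # np.polyfit returns the polynomial coefficients, from theta_degree down to theta_0.
    theta = np.polyfit(sizes, prices, degree)
    predictions = np.polyval(theta, sizes)
    train_cost = np.mean((predictions - prices) ** 2) / 2  # squared-error cost on the training set
    print(f"degree {degree}: training cost = {train_cost:.4f}")

# Typical behavior: the straight line (degree 1) has a noticeably larger training cost (underfitting),
# the quadratic does well, and the degree-4 polynomial drives the training cost to essentially zero,
# since a five-parameter curve can pass through all five points exactly (overfitting).
```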

To recap a bit, the

problem of overfitting comes

about when we have

too many features; then the

learned hypothesis may fit the training set very well.

So, your cost function

may actually be very close

to zero, or maybe

even exactly zero, but you

may then end up with a

curve like this that, you

know, tries too hard to

fit the training set, so that it

even fails to generalize to

new examples and fails to

predict prices on new examples

as well. And here the

term "generalize" refers to

how well a hypothesis applies even to new examples,

that is, to data, to

houses that it has not seen in the training set.
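To see that failure to generalize numerically, here's a small sketch, again with invented housing numbers: we fit on a handful of training houses and then evaluate the cost on a few held-out houses the fit has never seen.

```python
import numpy as np

# Invented housing data, split into a small training set and a held-out set.
train_x = np.array([1.0, 1.4, 1.8, 2.2, 2.6, 3.0])
train_y = np.array([210.0, 265.0, 310.0, 340.0, 355.0, 362.0])
test_x = np.array([1.2, 2.0, 2.8])
test_y = np.array([240.0, 325.0, 358.0])

def cost(theta, x, y):
    """Average squared-error cost of the polynomial with coefficients theta on (x, y)."""
    return np.mean((np.polyval(theta, x) - y) ** 2) / 2

for degree in (2, 5):
    theta = np.polyfit(train_x, train_y, degree)
    print(f"degree {degree}: train cost = {cost(theta, train_x, train_y):.2f}, "
          f"held-out cost = {cost(theta, test_x, test_y):.2f}")

# The high-degree fit typically has near-zero training cost but a much larger
# held-out cost: it fits the training set very well yet generalizes poorly.
```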

On this slide, we looked at

overfitting for the case of linear regression.

A similar thing can apply to logistic regression as well.

Here is a logistic regression

example with two features x1 and x2.

One thing we could do is

fit logistic regression with

just a simple hypothesis like this,

where, as usual, g is my sigmoid function.

And if you do that, you end up

with a hypothesis, trying to

use, maybe, just a straight

line to separate the positive and the negative examples.

And this hypothesis doesn't look like a very good fit to the data.

So, once again, this

is an example of underfitting

or of the hypothesis having high bias.

In contrast, if you were

to add to your features

these quadratic terms, then,

you could get a decision

boundary that might look more like this.

And, you know, that's a pretty good fit to the data.

Probably, about as

good as we could get, on this training set.

And, finally, at the other

extreme, if you were to

fit a very high-order polynomial, if

you were to generate lots of

high-order polynomial terms as features,

then logistic regression may contort

itself, may try really

hard to find a

decision boundary that fits

your training data or go

to great lengths to contort itself,

to fit every single training example well.

And, you know, if the

features x1 and

x2 are for predicting, say,

whether breast tumors

are malignant or benign,

this really doesn't

look like a very good hypothesis for making predictions.

And so, once again, this is

an instance of overfitting

and of a hypothesis having

high variance,

and being unlikely to generalize well to new examples.
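Here's a rough sketch of that effect using scikit-learn and a made-up two-feature dataset; expanding x1 and x2 into higher and higher order polynomial terms lets logistic regression fit the training set more and more tightly.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Made-up binary classification data with two features x1, x2.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0).astype(int)  # roughly circular class boundary
y[rng.choice(60, size=6, replace=False)] ^= 1        # flip a few labels as noise

for degree in (1, 2, 8):
    # degree 1: straight-line boundary (can underfit);
    # degree 2: quadratic terms, roughly the "just right" boundary for this data;
    # degree 8: many high-order terms, free to contort around individual examples.
    model = make_pipeline(PolynomialFeatures(degree), LogisticRegression(C=1e4, max_iter=10000))
    model.fit(X, y)
    print(f"degree {degree}: training accuracy = {model.score(X, y):.2f}")

# A large C weakens scikit-learn's built-in regularization, so the effect of the
# extra polynomial terms on the training fit is easier to see.
```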

Later, in this course, when we

talk about debugging and diagnosing

things that can go wrong with

learning algorithms, we'll give you

specific tools to recognize

when overfitting and, also,

when underfitting may be occurring.

But, for now, let's talk about

the problem of, if we

think overfitting is occurring,

what can we do to address it?

In the previous examples, we had

one- or two-dimensional data, so

we could just plot the hypothesis and see what was going

on and select the appropriate degree polynomial.

So, earlier for the housing

prices example, we could just

plot the hypothesis and, you

know, maybe see that it

was fitting the sort of

very wiggly function that goes all over the place to predict housing prices.

And we could then use figures

like these to select an appropriate degree polynomial.

So plotting the hypothesis could

be one way to try to

decide what degree polynomial to use.
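For the low-dimensional case, a plotting sketch like this one (same invented housing numbers as in the earlier sketches, drawn with matplotlib) is enough to eyeball whether a given degree gives a smooth curve or a wiggly one that goes all over the place.

```python
import numpy as np
import matplotlib.pyplot as plt

# Invented house sizes and prices, as before.
sizes = np.array([1.0, 1.5, 2.0, 2.5, 3.0])
prices = np.array([200.0, 280.0, 330.0, 355.0, 360.0])
grid = np.linspace(0.9, 3.1, 200)  # dense grid of sizes for drawing each hypothesis

plt.scatter(sizes, prices, color="black", label="training examples")
for degree in (1, 2, 4):
    theta = np.polyfit(sizes, prices, degree)
    plt.plot(grid, np.polyval(theta, grid), label=f"degree {degree}")
plt.xlabel("size (1000s of sq ft)")
plt.ylabel("price ($1000s)")
plt.legend()
plt.show()  # the degree-4 curve is visibly more wiggly between the training points
```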

But that doesn't always work.

And, in fact, more often we

may have learning problems where we just have a lot of features.

And then it is not

just a matter of selecting what degree polynomial to use.

And, in fact, when we

have so many features, it also

becomes much harder to plot

the data and it becomes

much harder to visualize it,

to decide what features to keep or not.

So concretely, if we're trying

to predict housing prices, sometimes we can just have a lot of different features.

And all of these features seem, you know, maybe they seem kind of useful.

But, if we have a

lot of features and very little

training data, then

overfitting can become a problem.

In order to address

overfitting, there are two

main options for things that we can do.

The first option is to try

to reduce the number of features.

Concretely, one thing we

could do is manually look through

the list of features, and, use

that to try to decide which

are the more important features, and, therefore,

which are the features we should

keep, and, which are the features we should throw out.

Later in this course, we'll also

talk about model selection algorithms,

which are algorithms for automatically

deciding which features

to keep and, which features to throw out.

This idea of reducing the

number of features can work

well and can reduce overfitting.

And, when we talk about model

selection, we'll go into this in much greater depth.
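As one sketch of this first option, here's what an automatic feature selection step might look like with scikit-learn; the feature names and data are hypothetical, and this simple univariate scoring is just a stand-in for the model selection algorithms we'll cover later.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression

# Hypothetical housing features; only a few of the columns actually matter.
feature_names = ["size", "num_bedrooms", "age", "lot_size", "distance_to_school", "noise_level"]
rng = np.random.default_rng(1)
X = rng.normal(size=(40, len(feature_names)))
y = 300 + 80 * X[:, 0] + 25 * X[:, 1] - 15 * X[:, 2] + rng.normal(scale=10, size=40)

# Keep the 3 features that score highest under a simple univariate regression test.
selector = SelectKBest(score_func=f_regression, k=3).fit(X, y)
kept = [name for name, keep in zip(feature_names, selector.get_support()) if keep]
print("features kept:", kept)

# Throwing the other columns away reduces the risk of overfitting, but it also discards
# whatever information those features carried about house prices.
```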

But, the disadvantage is that, by

throwing away some of the

features, you are also throwing

away some of the information you have about the problem.

For example, maybe, all of

those features are actually useful

for predicting the price of a

house, so, maybe, we don't actually

want to throw away some of

our information, or throw some of our features away.