So to recap, we said that our next objective is to
define the types of tasks 'T,' performance measure 'P,' and experience 'E.'
Let's start with different types of tasks that we encounter in machine learning.
Remember the diagram that we used to visualize
interactions between an agent and its environment.
There may be many types of environments, for example,
fully observable or partially observable,
stochastic or deterministic, and so on.
The agent can also interact with the environment in different modes.
For example, the agent can interact with
the physical environment in real-time via sensors.
This type of learning is called online learning.
A different setting is created when the agent
doesn't have real-time access to the environment,
but only has access to a snapshot of it stored as data on a computer disk.
This is called offline or batch learning.
The diagram suggests that there are two general types of machine learning tasks.
The first type is formed by what we call Perception Tasks.
For these tasks, the agent should learn from
its environments to perform one specific predefined action.
For example, you might have a classification agent that
classifies all images into hot dogs and not hot dogs.
Because the action of the agent in this case is
a fixed function of the environment as perceived and learned by the agent,
we call these tasks Perception Tasks.
Another class is formed by tasks where an agent should
pick an optimal action from a set of all possible actions.
Let's call such tasks action tasks.
Similar to Perception Tasks,
they involve a subtask of learning from the environment,
but the final goal is to find an optimal action among many.
Therefore, Action Tasks are generally more complex than Perception Tasks.
Now, let's talk about the Performance Measure 'P.'
Normally, the choice of the right performance measure is specific to the task 'T.'
For example, let's consider a task of binary classification,
such as the above example of a hot dog versus not hot dog classifier.
One measure of the model accuracy can be the Error Rate,
defined as the ratio of incorrectly classified examples to the total number of examples.
The Accuracy Rate, in this case,
would simply be one minus the error rate.
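As a minimal sketch of these two definitions, here is the Error Rate and the Accuracy Rate computed for a toy binary classifier; the data and the encoding (1 = "hot dog", 0 = "not hot dog") are made up for illustration.

```python
# Error Rate: ratio of incorrectly classified examples to total examples.
# Accuracy Rate: one minus the Error Rate.
def error_rate(y_true, y_pred):
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)

y_true = [1, 0, 1, 1, 0]   # true labels (1 = hot dog, 0 = not hot dog)
y_pred = [1, 0, 0, 1, 1]   # labels predicted by some model

err = error_rate(y_true, y_pred)
acc = 1.0 - err            # Accuracy Rate = 1 - Error Rate
print(err, acc)            # 0.4 0.6
```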
The Error Rate can also be viewed as an estimate of the expected 0-1 loss,
which assigns an error of one to each misclassified example
and zero to each correctly classified example.
However, such a performance measure is inconvenient in practice because it
can change discontinuously when the parameters of the model are changed continuously.
In other words, such a performance metric would
be a non-differentiable function of the model parameters,
so that no gradient-based optimization methods could be applied in this setting.
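This discontinuity can be seen in a small sketch: for a hypothetical one-parameter threshold classifier (predict class 1 when x >= w), the 0-1 loss is a step function of the parameter w. All data here are made up.

```python
# The 0-1 loss of a toy threshold classifier is piecewise constant in w:
# it is flat almost everywhere and jumps by a whole unit whenever w
# crosses a data point, so its gradient is zero or undefined everywhere.
def zero_one_loss(w, xs, ys):
    preds = [1 if x >= w else 0 for x in xs]
    return sum(p != y for p, y in zip(preds, ys))

xs = [0.2, 0.4, 0.6, 0.8]   # attributes
ys = [0, 0, 1, 1]           # true labels

for w in [0.3, 0.45, 0.5, 0.55, 0.7]:
    print(w, zero_one_loss(w, xs, ys))
# The printed loss stays constant over whole ranges of w and then jumps,
# which is why gradient-based optimization cannot be used on it directly.
```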
A smooth and differentiable alternative to
the Error Rate as a performance measure is to consider
instead the probability, or log-probability,
of the observed data under the assumptions of the model.
This leads to a differentiable objective function for
the process of tuning the model parameters to the data,
which can be done efficiently using gradient-based optimization software.
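As one concrete sketch of such a smooth objective, here is the negative log-probability of the observed labels under a simple logistic model with a single parameter w; the model and data are assumptions for illustration, not the lecture's specific setup.

```python
import math

# Negative log-probability (negative log-likelihood) of the data under a
# one-parameter logistic model p(y=1 | x) = sigmoid(w * x). Unlike the
# 0-1 loss, this objective varies smoothly with w, so gradients exist.
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def neg_log_likelihood(w, xs, ys):
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x)   # model probability of label 1 for this point
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total

xs = [-2.0, -1.0, 1.0, 2.0]   # made-up attributes
ys = [0, 0, 1, 1]             # made-up labels

# The objective changes smoothly as w changes, so gradient-based
# optimizers can follow it downhill:
for w in [0.0, 0.5, 1.0]:
    print(w, neg_log_likelihood(w, xs, ys))
```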
If we deal with regression problems,
one common performance metric is the mean squared loss,
defined by this function.
Here, Y's are the true observed outputs and
'Y' hats are model estimates for these outputs.
The sum runs over all observations,
so that the Mean Squared Error is zero only when
all data points are matched by the model exactly, without any errors.
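A minimal sketch of the Mean Squared Error, assuming the formula on the slide is the usual average of squared differences between true outputs Y and model estimates Y hat:

```python
# Mean Squared Error: (1/n) * sum over all observations of (y - y_hat)^2.
def mean_squared_error(y_true, y_pred):
    n = len(y_true)
    return sum((y - yh) ** 2 for y, yh in zip(y_true, y_pred)) / n

# Zero only when every prediction matches its target exactly:
print(mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))   # 0.0
print(mean_squared_error([1.0, 2.0, 3.0], [1.5, 2.0, 2.0]))   # (0.25 + 0 + 1) / 3
```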
An alternative performance metric for
regression is suggested by the so-called L1 loss, shown here.
All it does is replace the squared errors by their absolute values.
Depending on the specific details of the regression problem,
either the former or the latter choice
may be the more appropriate one in a given case.
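The L1 alternative can be sketched the same way; the outlier in the made-up data below illustrates one common reason for preferring it, namely that absolute values penalize large errors less severely than squares do.

```python
# L1 loss (mean absolute error): squared residuals are replaced by
# their absolute values, making the metric less sensitive to outliers
# than the mean squared error.
def mean_absolute_error(y_true, y_pred):
    n = len(y_true)
    return sum(abs(y - yh) for y, yh in zip(y_true, y_pred)) / n

y_true = [1.0, 2.0, 3.0, 100.0]   # last observation is an outlier
y_pred = [1.0, 2.0, 3.0, 10.0]

print(mean_absolute_error(y_true, y_pred))   # 90 / 4 = 22.5
# The squared version of the same residual would contribute 8100 / 4 = 2025,
# letting the single outlier dominate the metric.
```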
Finally, let's take a more careful look at the part
of the definition that deals with the experience 'E.'
Generally, we can distinguish three major types of learning from experience.
The first type is called Supervised learning.
The reason it's called Supervised learning is because this setting
assumes that to train the machine learning algorithm a human
teacher gives it a set of examples of
what the teacher expects to get from the algorithm after the training is done.
The set consists of pairs of attributes and correct labels,
denoted by X's and C's respectively.
This means that for every data point with attributes X,
a machine learning agent can compare its own model-based label C hat for
this data point with the true label C. The other type is called Unsupervised learning.
The reason it's called Unsupervised is that here,
only examples with attributes X are given,
but there is no teacher to provide class labels.
In this case, the agent has to come up with its own labels C hat without
any reliance on teacher-provided true labels C. And finally, the
third type of learning from experience is called Reinforcement learning.
This case is in the sense intermediate between the two previous cases.
There is a teacher, but this teacher only gives
a partial feedback on the performance of the machine learning agent.
The feedback comes through rewards received by the agent when performing its action.
Actions that maximize a certain objective known to the agent receive
higher rewards, while other agent actions get smaller or even negative rewards,
that is, they are penalized.
The classification of machine learning algorithms into supervised, unsupervised,
and reinforcement learning is probably
the most fundamental ontology of machine learning.
The vast majority of practical problems that are
amenable to machine learning techniques fall into one of these categories.
In the next video, we'll drill down into
more specific machine learning tasks within these categories.