Welcome back! We hope you enjoyed learning where machine learning fits in the larger business landscape. This lesson will cover the fundamentals of machine learning to put us all on the same page. By the end of this lesson, you'll be able to differentiate between regression and classification use cases and quantify model success. So what is machine learning? I'm going to offer two definitions, one that's more general and the other that's a little more technical. The general definition is as follows: Machine learning refers to a broad array of techniques that learn patterns in data without being explicitly programmed. Take the example of churn analysis that we discussed in the last lesson. This is where we want to send email offers to customers who might leave our platform. A machine learning algorithm would use past user data to find patterns between variables, such as the number of purchases in the past month and whether a customer has churned. A more technical definition is as follows: Machine learning is a function that maps features to an output. In other words, with a number of different input features such as customer purchase history, machine learning maps those inputs to the output. In our case, the probability that a customer will churn. Now that we know that machine learning refers to tools for finding patterns in data, let's talk about a few different types of machine learning. The two types of machine learning that cover the vast majority of the use cases I see are supervised and unsupervised machine learning. There's also reinforcement and semi-supervised learning, but we won't cover those in this course. Supervised machine learning is a type of machine learning where you have labeled data points and your task is to predict that label. Our churn analysis example is a supervised learning problem, because we know the output we're trying to predict: whether a customer churns or not.
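That "function mapping features to an output" definition can be made concrete with a small sketch. The function below is entirely hypothetical, with hand-picked weights chosen just for illustration; the whole point of supervised learning is that a real model would learn these weights from labeled historical data rather than having them hard-coded.

```python
def churn_probability(purchases_last_month: int, months_since_signup: int) -> float:
    """Map input features to an output: the probability a customer churns.

    The weights here are made up for illustration only. A trained model
    would learn them from past customer data instead of hard-coding them.
    """
    score = 0.8 - 0.1 * purchases_last_month + 0.02 * months_since_signup
    return min(max(score, 0.0), 1.0)  # clamp to a valid probability in [0, 1]

# A customer with few recent purchases looks riskier than an active one.
print(round(churn_probability(purchases_last_month=1, months_since_signup=6), 2))   # higher risk
print(round(churn_probability(purchases_last_month=8, months_since_signup=6), 2))   # lower risk
```

The signature and feature names are assumptions for this sketch; the takeaway is only the shape of the mapping: features in, prediction out.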
Supervised learning can be further broken down into two groups: classification and regression. For classification tasks, the goal is to predict a discrete set of categories. In churn analysis, we're predicting whether a customer churns or does not churn, so we're sorting our customers into two categories. This is an example of binary classification, where we have two categories we're predicting. There's also multi-class classification, where more than two categories are used. The other type of supervised learning problem is called regression. In regression tasks, we predict a continuous value. Think of the question of financial forecasting, where you want to predict the revenue of the company in the next quarter. This is an unbounded number, not a distinct set of categories. With regression, we're predicting a continuous range of values, even values we haven't seen before. Now that we know the two types of supervised learning, regression and classification, let's talk about unsupervised machine learning. In unsupervised machine learning, we don't have a label to predict. Rather, we're learning the natural structure of the data. One example of this is clustering. Going back to the customer churn example, we might want to take a look at the natural clusters that form within our customers, revealing different customer segments that we could better serve. Let's see how machine learning can be applied to our Fire Calls dataset. In the remainder of this module, we'll predict the response times to calls using very simple features such as the call type and location of the call. This is a supervised machine learning problem. It's also a regression problem, because we are predicting a continuous variable: response time delay. But before we start building our model, let's talk about how we evaluate success in a regression setting.
In this case, we train a model, predict response times for all of our data, and look at the difference between the predicted value on the one hand and the true value on the other. This is known as our error, where the better our model performs, the lower that error will be. But how do we quantify this error? In the regression setting, we typically use a metric known as RMSE, or Root Mean Squared Error. Let's dive into the details. You won't need to recall the specifics of these formulas unless you end up developing these models in your career. Just remember, the lower the RMSE, the better. We start off by taking the difference between the true value and the predicted value for a given data point, then we square that difference. You might be wondering why we square the difference. Imagine I over-predict one call by 10 minutes and under-predict another by 10 minutes; we wouldn't want these two errors to cancel out, so we square the error. Next, we compute the squared error for all n of our data points and sum that up. This is known as the sum of squared errors. But wait, this number will increase without bound as our dataset grows. So we normalize it by the number of data points n. This metric is known as MSE, or Mean Squared Error. Lastly, we take the square root of MSE to get RMSE, or Root Mean Squared Error. We do this to get our error measurements in the units of our original problem: minutes as opposed to minutes squared. So let's bring this back to our Fire Calls dataset. What would an RMSE of 10 minutes mean? Loosely speaking, it means our predictions are about 10 minutes off in one direction or another from the true value. As you can see, the value of the RMSE is dependent on the scale of your data. Our RMSE would be a lot larger if we used seconds as our unit instead of minutes. Hey Conor, I think we forgot to mention one thing.
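The steps just described, square each error, sum, divide by n, then take the square root, can be written out in a few lines of plain Python. This is a minimal sketch of the metric itself, not the course's evaluation code:

```python
import math

def rmse(y_true, y_pred):
    """Root Mean Squared Error.

    Square each error so over- and under-predictions don't cancel,
    sum them (sum of squared errors), divide by n (MSE), then take
    the square root to get back to the original units (RMSE).
    """
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    mse = sum(squared_errors) / len(squared_errors)
    return math.sqrt(mse)

# One call over-predicted by 10 minutes, another under-predicted by 10:
# the two errors don't cancel, and RMSE comes out to 10 minutes.
print(rmse([20, 30], [30, 20]))  # 10.0
```

Note how the example confirms the point about squaring: with raw (unsquared) errors, +10 and -10 would average to zero and make the model look perfect.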
Before you build your machine learning model, you always want to have a baseline model to compare to, so we can evaluate our RMSE against another value. This should be the simplest model you can build, for example, always predicting the average value. If your machine learning model can't beat this baseline model, then something went wrong in your model-building process. We're going to show you how we would build a linear regression model in the next few lessons.
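Here's a small sketch of that baseline check. The delay values and model predictions below are made up for illustration, not taken from the Fire Calls dataset; the point is only the comparison: a useful model's RMSE should come in below the RMSE of always predicting the average.

```python
import math

def rmse(y_true, y_pred):
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical response-time delays in minutes (illustrative, not real data).
delays = [4.0, 6.0, 9.0, 11.0]

# Baseline: always predict the average of the observed labels.
mean_delay = sum(delays) / len(delays)
baseline_preds = [mean_delay] * len(delays)

# Hypothetical predictions from a trained model.
model_preds = [4.5, 6.5, 8.0, 10.5]

baseline_rmse = rmse(delays, baseline_preds)
model_rmse = rmse(delays, model_preds)
print(f"baseline RMSE: {baseline_rmse:.2f} min")
print(f"model RMSE:    {model_rmse:.2f} min")

# Sanity check: if this fails, something went wrong in model building.
assert model_rmse < baseline_rmse
```

If the model's RMSE were higher than the baseline's, that would be a strong signal to revisit the features, the data, or the training process before trusting the model.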