Welcome to Fundamentals of Quantitative Modeling: Regression models. In this module, we're going to talk about regression models. We'll define what a regression model is, and we'll discuss the questions that a regression model is able to answer for us. We're going to talk about correlation and linear association, because these regression models are examples of linear models. We're going to discuss the mechanics of fitting a line to data. We are going to spend some time interpreting the output from these regression models. And we're going to talk about prediction, and in particular prediction intervals, from a regression model. We will briefly see the topic known as multiple regression, which allows us to create potentially more complicated and realistic models of a business process. We'll end by talking about logistic regression, which is an appropriate form of regression to use when the outcome variable is a categorical variable, typically a zero-one outcome, a binary random variable.
So what is a regression model? Well, a simple regression model uses a single predictor variable, which we often give the letter X, to estimate the mean, or the average, of an outcome variable Y as some function of X. The example that I have here plots the price against the weight of a set of diamonds. Each point in the plot represents a diamond. The X coordinate is the weight of the diamond and the Y coordinate is the price of the diamond. So there's a single predictor, and that's why we say a simple regression. What a regression model will do is say, at any given value of X, what do you expect Y, the price, to be? So that's the idea of a regression model. It's a model for the mean of Y as a function of X, and very frequently we will use a linear model to capture the relationship. Continuing with the diamonds example, the predictor variable is the diamond's weight and the outcome variable is the price of the diamond.
Now, we can see just by looking at the plot that heavier stones, bigger diamonds, tend to cost more money; we would often term that positive association. But we can go beyond that simple statement by using a regression model, which will formalize the idea of the association and more precisely define what value we expect the price to be for any given weight of a diamond. So we're going to formalize how the expected price varies with weight. And as I just said, one of our most frequently used ways of capturing that relationship is with a straight line, and we'll then call it a linear regression. The formula that you can see at the bottom of this slide is how I would write a regression model. It says the expected value, that's the average, of Y, and then the vertical line there means "given"; we articulate that as given. So: the expected value of Y given X. The expected price of a diamond given its weight is then equal to some function of X, and the most straightforward function that we might choose to use is a linear function. We write the linear function in this instance as b naught plus b1 times X. Sometimes you will have seen the equation of a straight line written as Y equals mX plus b. This is still a straight line, but we typically have a slightly different notation in regression models, and there's a reason for that. The reason is that there's a form of regression called multiple regression, which has many Xs in it, and then we can use a notation that incorporates b naught, b1, b2, b3, etc. So we subscript the coefficients; b naught is still the intercept and b1 is still the slope. So a regression model is relating the average of Y to a particular value of X, and it's not at all uncommon to assert that that association is at least approximately linear, and in that case we're doing a linear regression. On this slide, I have overlaid the straight line model that is calculated from the underlying data.
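Written out, the model just described takes roughly the following form. This is a sketch in standard notation: the first line is the simple linear regression as read out above, while the symbols X_1, ..., X_k and the count k in the second line are assumptions introduced here to illustrate the subscripted multiple-regression notation.

```latex
% Simple linear regression: the mean of Y, given X, is a straight line in X.
% b_0 is the intercept and b_1 is the slope.
E(Y \mid X) = b_0 + b_1 X

% Multiple regression (k predictors; k and X_1..X_k are illustrative names):
% the same subscripted coefficients extend naturally to many Xs.
E(Y \mid X_1, \dots, X_k) = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_k X_k
```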
I haven't told you how this line is calculated yet; I will in a few minutes. But there is the regression line, and the slope and intercept in this particular instance are presented in the formula below. The expected value of the price of a diamond, given its weight, is equal to -260, that is the intercept, plus 3721 times the weight, where weight is measured in carats. So that's what a linear regression is going to do for you: it's going to put a line through the data, basically. And once you've got a line going through the data, there are a number of useful things that you're going to be able to do with it. So there's a quantitative model that has been derived from underlying data. We let the data talk to us, in the sense that the data chose the best fitting line.
Now there's a very commonly used number to describe the strength of what we term linear association. So essentially, how close are the points to a line? The way that we capture that is through a concept called correlation. Correlation is a measure of the strength of a linear association, and correlation is typically given a letter; we call it r, the sample correlation. And it's a fact that the correlation will always lie between -1 and +1. If you have a negative value of the correlation, then you have negative association; that would be a line from the top left to the bottom right. If you have positive correlation, you've got positive association; that would be a line from the bottom left to the top right. But if you have zero correlation, what that means is that there's no linear association. It doesn't actually mean there's no association between the two variables, just that there's no linear association between the two variables. Now how would you calculate the correlation in practice? The answer is with a computer program or a spreadsheet. So we won't worry about the details of the actual calculation; it will happen. And I have calculated the correlation for the diamonds data set and it turns out to be 0.989.
There's the correlation, which is an incredibly strong correlation as far as correlations go, and it's just asserting the fact that the points really do lie very close to a straight line. So a linear model is quite reasonable in this particular instance.
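To make the two ideas above concrete, here is a minimal Python sketch. It uses the intercept and slope quoted in the lecture (-260 and 3721) to compute an expected price, and it computes a sample correlation r from the usual textbook formula. The function names and the small data sets are hypothetical and chosen only for illustration; they are not the lecture's diamonds data, so the r computed here is not the 0.989 quoted above.

```python
import math

# Fitted line quoted in the lecture: E(price | weight) = -260 + 3721 * weight,
# with weight measured in carats.
B0 = -260.0   # intercept
B1 = 3721.0   # slope: change in expected price per one-carat change in weight


def expected_price(weight_carats):
    """Expected price of a diamond of the given weight under the fitted line."""
    return B0 + B1 * weight_carats


def sample_correlation(xs, ys):
    """Sample correlation r between two equal-length sequences of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical illustrative data, NOT the lecture's diamonds data set.
weights = [0.15, 0.20, 0.25, 0.30, 0.35]
prices = [310, 480, 640, 870, 1050]
r_diamonds_like = sample_correlation(weights, prices)

# Zero correlation does not mean no association: here y is exactly x squared,
# a perfect (but non-linear) relationship, yet the linear correlation is zero.
xs_curve = [-2, -1, 0, 1, 2]
ys_curve = [x ** 2 for x in xs_curve]
r_curve = sample_correlation(xs_curve, ys_curve)

print(expected_price(0.25))
print(r_diamonds_like)
print(r_curve)
```

The hand-written formula is used only to show what the calculation involves; in practice, as the lecture says, a spreadsheet or statistics package would do this for you.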