find the best fitting line, in this instance, to the data.

And I've written down the formula for the best fitting line.

And that best fitting line is the blue line that you can see

superimposed on the graphic.

And around the blue line I've plotted a gray band.

And that gray band is termed a prediction interval.

And this is the key difference between a probabilistic and a deterministic model.

By using this probabilistic model,

we're going to get measures of uncertainty about the outputs.

And you can use the gray band there to create a prediction interval for

what we term a new observation.

So suppose you came to me with a diamond

that had come out of the same population that this regression was run against.

Let's say it weighs 0.25 of a carat.

Then I can use this graph to predict the price of that diamond and

furthermore, I can use the gray band around the line to give

a prediction interval that captures the range of uncertainty.

And clearly you want to be able to do that, because when you look at the points,

they don't lie exactly on the straight line.

They're pretty close, but they're not exactly on it, so

there's some noise in the system, and we're able to measure that noise, and

incorporate it in our prediction interval and forecast.
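
To make the idea concrete, here is a minimal sketch of how a point prediction and a prediction interval for a new observation could be computed. It is an illustration under stated assumptions, not the calculation behind the graphic: the weights and prices are made-up numbers, the function names `fit_line` and `prediction_interval` are hypothetical, and the interval uses a normal quantile as a simplification where a careful analysis would use a t quantile with n - 2 degrees of freedom.

```python
import math
from statistics import NormalDist, mean

# Made-up (weight in carats, price) pairs for illustration only;
# these are NOT the data from the lecture.
weights = [0.12, 0.15, 0.17, 0.18, 0.20, 0.23, 0.25, 0.27, 0.30, 0.33]
prices = [220, 340, 350, 470, 500, 600, 640, 780, 860, 1010]


def fit_line(x, y):
    """Return the least-squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def prediction_interval(x, y, x_new, level=0.95):
    """Return (prediction, lower, upper) for a new observation at x_new.

    Assumption: a normal quantile stands in for the t quantile, so the
    interval is somewhat too narrow for small samples.
    """
    n = len(x)
    b0, b1 = fit_line(x, y)
    x_bar = mean(x)
    sxx = sum((xi - x_bar) ** 2 for xi in x)

    # Residual standard deviation: the "noise in the system".
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))

    # Standard error for a NEW observation: the leading 1 is the noise of
    # the new point itself; the other terms reflect uncertainty in the line.
    se = s * math.sqrt(1 + 1 / n + (x_new - x_bar) ** 2 / sxx)

    z = NormalDist().inv_cdf(0.5 + level / 2)
    y_hat = b0 + b1 * x_new
    return y_hat, y_hat - z * se, y_hat + z * se


# A diamond weighing 0.25 of a carat, as in the example above.
y_hat, low, high = prediction_interval(weights, prices, 0.25)
print(f"predicted price: {y_hat:.0f}, interval: ({low:.0f}, {high:.0f})")
```

The interval is wider than the uncertainty in the line alone because a new diamond carries its own noise; that is the role of the leading 1 inside the square root.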

So that's what a regression model does for you.

And as I said before, this is certainly one of the techniques that is most

frequently used in business analytics.

So to summarize, regression models use data, and

they use that data to estimate the relationship between the mean, or

average value, of an outcome, let's call that Y, and a predictor variable, X.

So going back to the diamonds example, what our regression model is

going to do is give us the expected price of a diamond for any given weight.
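
In symbols, that summary can be sketched as follows. This assumes a simple straight-line model; the symbols \beta_0, \beta_1 and \varepsilon are notation introduced here for illustration and are not taken from the formula shown on the slide.

```latex
\begin{align*}
Y &= \beta_0 + \beta_1 X + \varepsilon, \qquad E[\varepsilon \mid X] = 0 \\
E[Y \mid X = x] &= \beta_0 + \beta_1 x
\end{align*}
```

Here Y is the price and X is the weight. The second line is the expected price for a given weight, and the noise term \varepsilon is what makes the model probabilistic; a deterministic model would leave it out.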