And as I just said, one of our most frequently used ways of capturing that

relationship is with a straight line, and we'll then call it a linear regression.

The formula that you can see at the bottom of this slide is how I

would write a regression model.

And it says that the expected value, that is, the average, of Y, and

then the vertical line there means "given"; we articulate that as "given".

The expected value of Y given X.

The expected price of a diamond given its weight

is then equal to some function of X and

the most straightforward function that we might choose to use is a linear function.

And we write the linear function in this instance as b naught plus b1 times X.

Sometimes you will have seen the equation of a straight line written as Y equals

MX plus b.

This is still a straight line but we have a slightly different notation

typically in the regression models and there's a reason for that.

And the reason is that there's a form of regression called multiple

regression, which has many Xs in it, and

then we can use a notation that incorporates B0, B1, B2, B3, etc.

So we subscript the coefficients,

B naught is still the intercept and B1 is still the slope.
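
Written out in symbols, the two forms just described look like this. The first line is the simple model from the slide; the second is a sketch of the multiple regression form, where the names X_1 through X_k for the extra predictors are an assumed notation, not something taken from the slide.

```latex
E(Y \mid X) = b_0 + b_1 X

E(Y \mid X_1, \dots, X_k) = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_k X_k
```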

So a regression model is relating the average of Y to a particular value of

x and it's not at all uncommon to assert that that association

is at least approximately linear and in that case we're doing a linear regression.

On this slide, I have overlaid

the straight line model that is calculated from the underlying data.

I haven't told you how this line is calculated yet; I will

in a few minutes. But there is the regression line, and the slope and

intercept in this particular instance, are presented in the formula below.

The expected value of the price of a diamond, given its weight,

is equal to -260, that is the intercept, plus 3721 times the weight,

where weight is measured in carats.
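
To see the formula at work, here is a minimal sketch in Python that plugs a weight into the fitted line. The intercept and slope are the ones quoted above; the function name and the 0.25 carat example weight are assumptions chosen only for illustration.

```python
# Coefficients quoted in the lecture for the diamonds data.
INTERCEPT = -260.0  # b0, the intercept
SLOPE = 3721.0      # b1, the change in expected price per carat


def expected_price(weight_carats: float) -> float:
    """Return E(price | weight) = b0 + b1 * weight for the fitted line."""
    return INTERCEPT + SLOPE * weight_carats


# Hypothetical weight, used only to illustrate the arithmetic.
print(expected_price(0.25))  # -260 + 3721 * 0.25 = 670.25
```

Note that the intercept is the value of the line at a weight of zero, so it does not have to be a sensible price on its own.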

So that's what a linear regression is going to do for you,

it's going to put a line through the data basically.

And once you've got a line going through the data there are a number of

useful things that you're going to be able to do with that.

So there's a quantitative model that has been derived from underlying data.

So we let the data talk to us in the sense that the data chose the best fitting line.
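
As a rough illustration of what it can mean for the data to choose the line, here is a sketch that assumes "best fitting" means least squares, that is, the line that makes the sum of squared vertical distances from the points as small as possible. That criterion is an assumption at this point in the lecture, the function name is hypothetical, and the four data points are made up; they are not the diamonds data.

```python
def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sum of squares around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx               # slope
    b0 = mean_y - b1 * mean_x    # intercept: the line passes through the means
    return b0, b1


# Made-up points that lie near, but not exactly on, a straight line.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]
b0, b1 = fit_line(xs, ys)
print(round(b0, 3), round(b1, 3))  # 0.0 1.9
```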

Now there's a very commonly used number

to describe the strength of what we term linear association.

So essentially, how close are the points to a line?

The way that we capture that is through a concept called correlation.

So correlation is a measure of the strength of a linear association and

correlation is typically given a letter; we call r the sample correlation.

And it's a fact that the correlation will always lie between -1 and +1.

If you have a negative value for the correlation, then you have negative

association; that would be a line from the top left to the bottom right. If we

had positive correlation,

you've got positive association;

that would be a line from the bottom left to the top right.

But if you have zero correlation,

what that means is that there's no linear association.

It doesn't actually mean there's no association between the two variables, just

that there's no linear association between the two variables.
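
The three cases just described can be checked with a small sketch. It assumes the standard definition of the sample correlation, computed by a hypothetical helper called pearson_r, and it uses made-up points: a rising line, a falling line, and a curve where Y is X squared. In the last case Y is completely determined by X, yet the correlation is zero.

```python
import math


def pearson_r(xs, ys):
    """Return the sample correlation r between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up data, for illustration only.
xs = [-2, -1, 0, 1, 2]
rising = [1, 3, 5, 7, 9]        # bottom left to top right
falling = [9, 7, 5, 3, 1]       # top left to bottom right
curved = [x ** 2 for x in xs]   # Y depends on X, but not linearly

print(pearson_r(xs, rising))   # 1.0
print(pearson_r(xs, falling))  # -1.0
print(pearson_r(xs, curved))   # 0.0
```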

Now how would you calculate the correlation in practice?

The answer is with a computer program or a spreadsheet.

So we won't worry about the details of the actual calculation; it will happen.

And I have calculated the correlation for

the diamonds data set, and it turns out to be 0.989.

There's the correlation,

which is an incredibly strong correlation as far as correlations go, and it's just

asserting the fact that the points really do lie very close to a straight line.

So a linear model is quite reasonable in this particular instance.