In this lecture we're going to talk about partitioning variability. So, assume my X matrix includes an intercept: its first column is the vector of all ones, which I'll call J, and it includes other columns as well. Let me define two projection matrices (primes denoting transposes): H_J = J(J'J)^(-1)J' and H_X = X(X'X)^(-1)X'. Now, since my X contains an intercept, J can be written as a linear combination of the columns of X: if I take gamma to be the vector (1, 0, ..., 0)' and multiply X times gamma, that grabs the first column, which is the intercept. We know from the previous lecture that (I - H_X) times any linear combination of the columns of X, say X gamma, has to equal 0, and in this particular case I've set up gamma so that X gamma = J. So (I - H_X)J = 0; in other words, J = H_X J. So squirrel that little pearl of wisdom away, because we're going to use it later.

Now let me define the total variation, the total sum of squares, as SS_tot = ||y - ybar J||^2, where ybar is the sample mean of y (a scalar, so I should really write it in front of J). Since ybar J = H_J y, we can write this quantity as ||(I - H_J)y||^2, and because (I - H_J) is symmetric and idempotent, that works out to y'(I - H_J)y. Then let me write the residual sum of squares as the squared norm of the residuals: SS_res = ||e||^2 = ||y - yhat||^2, where yhat is the vector of fitted values from the full model with X in it. By the same argument, that is y'(I - H_X)y. So SS_tot is the numerator of the variability estimate we would get if we only included an intercept.
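These definitions are easy to check numerically. Here's a minimal sketch in NumPy (the language choice and the simulated design matrix are my own, not part of the lecture) verifying that H_X J = J when X contains an intercept, and that y'(I - H_J)y matches the usual sum of squared deviations from the mean.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
# Design matrix whose first column is the intercept J = (1, ..., 1)'
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
J = X[:, [0]]
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

# Projection (hat) matrices onto the column spaces of J and of X
H_J = J @ np.linalg.inv(J.T @ J) @ J.T
H_X = X @ np.linalg.inv(X.T @ X) @ X.T

# Since J is a column of X, projecting J onto the column space of X returns J
assert np.allclose(H_X @ J, J)

# SS_tot two ways: ||y - ybar*J||^2 and the quadratic form y'(I - H_J)y
ss_tot_norm = np.sum((y - y.mean()) ** 2)
ss_tot_quad = y @ (np.eye(n) - H_J) @ y
assert np.allclose(ss_tot_norm, ss_tot_quad)
```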
And SS_res is the numerator of the variability estimate that we would get if we included an intercept plus all the other regressors. So of course the residual sum of squares is going to be smaller than the total sum of squares, because those extra columns are going to explain some amount of the variation in y. So let me make a third definition, the regression sum of squares: the squared distance between the fitted values if I only include an intercept and the fitted values if I include the intercept plus all these other regressors, SS_reg = ||H_J y - H_X y||^2.

Let me work with this term a little bit. I can write it as y'(H_J - H_X)(H_J - H_X)y, and I don't have to worry about transposes on those terms because both H_J and H_X are symmetric. Then let me expand this out: y'(H_J H_J - H_J H_X - H_X H_J + H_X H_X)y. Now, H_J is idempotent, so H_J H_J = H_J, and H_X is idempotent too, so H_X H_X = H_X. And remember that fact from up above: J = H_X J. If I multiply both sides on the right by (J'J)^(-1)J', I haven't done anything, and what's implied is that H_J = H_X H_J; then by taking the transpose, because both matrices are symmetric, I also see that H_J = H_J H_X. So each of the two cross terms is just H_J, and the expansion collapses: y'(H_J - H_J - H_J + H_X)y = y'(H_X - H_J)y. Now, I'll get to my point.
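The collapse of that expansion rests on the identities H_J = H_X H_J = H_J H_X, which follow from J = H_X J. Here's a small NumPy check (again a sketch of my own, with a simulated X) that the hat matrices are idempotent, that the cross-product identities hold, and that the squared-norm form of SS_reg equals the single quadratic form it collapses to.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
J = X[:, [0]]
y = rng.normal(size=n)

H_J = J @ np.linalg.inv(J.T @ J) @ J.T
H_X = X @ np.linalg.inv(X.T @ X) @ X.T

# Both hat matrices are symmetric and idempotent
assert np.allclose(H_J @ H_J, H_J) and np.allclose(H_X @ H_X, H_X)

# Because J lies in the column space of X: H_J = H_X H_J = H_J H_X
assert np.allclose(H_X @ H_J, H_J) and np.allclose(H_J @ H_X, H_J)

# SS_reg = ||H_J y - H_X y||^2 collapses to y'(H_X - H_J)y
ss_reg_norm = np.sum((H_J @ y - H_X @ y) ** 2)
ss_reg_quad = y @ (H_X - H_J) @ y
assert np.allclose(ss_reg_norm, ss_reg_quad)
```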
Let me take my total sum of squares, ||y - ybar J||^2, which we wrote out before as y'(I - H_J)y. Now let me subtract and then add H_X inside, and organize it this way; hopefully you'll see what I'm doing: y'(I - H_X + H_X - H_J)y. I haven't done anything going from one line to the next other than subtracting and adding H_X. Splitting it up, I get y'(I - H_X)y + y'(H_X - H_J)y. And the first term right here is the residual sum of squares, and the second term right here is the regression sum of squares. Now, the small distinction is the order of subtraction: up above I wrote SS_reg with H_J y - H_X y inside the norm, and down here H_X - H_J appears. I would like you to prove for homework that the order of subtraction inside that norm doesn't matter, that the two are equal.

Okay, so here comes the broader point. The total variation, SS_tot, decomposes into the residual variation plus the regression variation: SS_tot = SS_res + SS_reg. So my total variability in my response gets decomposed into the variability explained by my regression model and the remaining variability left unexplained by my regression model. All of these are non-negative because they're all sums of squares, and so a very common thing to do is take SS_reg, the amount of variability explained by my regression model, and divide it by the total variation. And what is that going to give us? The proportion of the total variability explained by the linear association with the added regressors. I guess if I want this to be a percentage, I'll multiply it times 100. So that quantity, usually expressed not as a percentage but as a proportion, is called R squared.
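The decomposition and the R squared it defines can be verified directly. Here's a sketch in NumPy (my own simulated data, with coefficients chosen arbitrarily for illustration) that fits the full model by least squares and checks SS_tot = SS_res + SS_reg numerically.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 60
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([0.5, 1.0, -1.0, 2.0]) + rng.normal(size=n)

# Least-squares fit of the full model
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat

ss_tot = np.sum((y - y.mean()) ** 2)      # y'(I - H_J)y
ss_res = np.sum((y - y_hat) ** 2)         # y'(I - H_X)y
ss_reg = np.sum((y_hat - y.mean()) ** 2)  # y'(H_X - H_J)y

# The decomposition SS_tot = SS_res + SS_reg
assert np.allclose(ss_tot, ss_res + ss_reg)

# R squared: proportion of total variability explained by the regressors
r_squared = ss_reg / ss_tot
assert 0.0 <= r_squared <= 1.0
```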
So R squared is interpreted as the proportion of your total variability explained by the linear association with your added regressors. And we see that it was a pretty easy proof that the total variability decomposes into the residual variability plus the regression variability, and it all involved that little trick up top that said that J = H_X J. Okay, so in case you were wondering how these things work out, and why everything adds up when you look at your regression output, this is why. Okay, so thank you for listening, and we'll talk a lot more about partitioning variability and how that relates to things like F tests later on in the course.
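As a final connection to standard regression output: when the model includes an intercept, the R squared from the sums-of-squares decomposition equals the squared sample correlation between y and the fitted values, which is one way software reports it. Here's a quick NumPy check of that equivalence (my own simulated single-regressor example, not from the lecture).

```python
import numpy as np

rng = np.random.default_rng(3)
n = 80
X = np.column_stack([np.ones(n), rng.normal(size=(n, 1))])  # intercept + one regressor
y = 2.0 + 3.0 * X[:, 1] + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat

# R^2 from the variability decomposition: SS_reg / SS_tot
r2_from_sums = np.sum((y_hat - y.mean()) ** 2) / np.sum((y - y.mean()) ** 2)

# With an intercept in the model, R^2 equals the squared sample
# correlation between y and the fitted values
r2_from_corr = np.corrcoef(y, y_hat)[0, 1] ** 2
assert np.allclose(r2_from_sums, r2_from_corr)
```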