In this lecture we're going to talk about residuals, which we usually label e. The residuals are defined as y minus y hat, where y hat is the hat matrix times y, and the hat matrix is defined as H(x) = x (x transpose x) inverse x transpose. The hat matrix is a very useful matrix; we'll talk about it a lot more as the course progresses. We can factor the y out and write e = (I minus H(x)) times y.

Let's talk a little bit about the residuals. First, remember our running picture: y is an element of R3, some point in three-dimensional space; x is a full-column-rank 3 by 2 matrix; and upper-case gamma is the set of elements x beta such that beta is in R2. So we can draw upper-case gamma as a two-dimensional plane cutting through our three-dimensional space. Then y hat is the projection of y onto that plane, and the residual e is simply the difference between y and y hat. If we slide e around, we can see that e should be orthogonal to every point in upper-case gamma; in other words, e should be orthogonal to any linear combination of the columns of x.

We can see that very clearly with a little algebra. Take e transpose times some point that's a linear combination of the columns of x, say x times lower-case gamma, where lower-case gamma is any element of R2. That is y transpose times (I minus H(x)) transpose times x gamma, and (I minus H(x)) transpose is just I minus H(x), because the hat matrix is symmetric. Now let me note over here that H(x) times x is x (x transpose x) inverse x transpose, then times x again; and (x transpose x) inverse times x transpose x is just I, so H(x) times x is just x. So our product is equal to y transpose times (x minus H(x) x) times gamma, but H(x) x is just x, so we get x minus x. That's zero, reminding us that e is orthogonal to any element of the space upper-case gamma, which is to say, any linear combination of the columns of x.
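The picture and the algebra above can be checked numerically. Here's a minimal sketch in NumPy, using the lecture's 3-by-2 setup; the particular values of x, y, and gamma are made up for illustration.

```python
import numpy as np

# Made-up data matching the lecture's setup: a full-column-rank 3x2
# design matrix x and a point y in R^3.
rng = np.random.default_rng(0)
x = rng.standard_normal((3, 2))
y = rng.standard_normal(3)

# Hat matrix: H(x) = x (x'x)^{-1} x'
H = x @ np.linalg.inv(x.T @ x) @ x.T

# Fitted values and residuals: e = y - y_hat = (I - H(x)) y
y_hat = H @ y
e = y - y_hat
assert np.allclose(e, (np.eye(3) - H) @ y)

# H(x) x = x, so e is orthogonal to x gamma for any gamma in R^2
assert np.allclose(H @ x, x)
gamma = rng.standard_normal(2)
print(e @ (x @ gamma))  # essentially zero, up to floating-point error
```

The `assert` lines correspond exactly to the two facts derived above: e = (I minus H(x)) y, and H(x) x = x, which is what makes the final inner product vanish.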
Another interesting fact about e: if x contains an intercept, then the vector of ones, Jn, is in the column space of x, and therefore it is a linear combination of the columns of x. Therefore e transpose Jn equals 0. Or in other words, if we have an intercept among our regressors, both the sum and the mean of our residuals will be zero.

Finally, let's look at the sum of squared residuals, which is e transpose e. That is y transpose times (I minus H(x)) transpose times (I minus H(x)) times y. Notice that if you take H(x) and multiply it by H(x), that's equal to x (x transpose x) inverse x transpose x (x transpose x) inverse x transpose; the x transpose x in the middle cancels with one of the inverses, and we get x (x transpose x) inverse x transpose, which is H(x) again. So H(x) is idempotent. Now, if we multiply out (I minus H(x)) times (I minus H(x)): the I times the I gives us I, I times minus H(x) gives one minus H(x), minus H(x) times I gives another minus H(x), and then H(x) times H(x) gives us one more, positive H(x). So we have minus H(x), minus H(x), plus H(x), which we can write as just minus H(x), and the whole product is I minus H(x). So we see that this matrix, I minus H(x), I minus the hat matrix, is also idempotent.

At any rate, the sum of the squared residuals works out to be a quadratic form with the ys on the outside and I minus H(x) in the interior: e transpose e equals y transpose (I minus H(x)) y. We'll use this quantity to define things like estimates of the residual variance. For the time being, I would just like you to know that the residuals are by definition y minus y hat, that they take the form (I minus H(x)) times y, and that they are orthogonal to every linear combination of the columns of x. We'll talk a lot more about residuals and all the different variations of the residuals later on in the class.
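Both facts from this part of the lecture, the zero-sum property under an intercept and the quadratic-form expression for the sum of squared residuals, can be sketched numerically. The design matrix and data below are arbitrary illustrations, not from the lecture.

```python
import numpy as np

# Arbitrary data: a design matrix with an intercept column plus one
# regressor, and a response y.
rng = np.random.default_rng(1)
n = 10
x = np.column_stack([np.ones(n), rng.standard_normal(n)])
y = rng.standard_normal(n)

H = x @ np.linalg.inv(x.T @ x) @ x.T
M = np.eye(n) - H          # I - H(x)
e = M @ y                  # residuals

# With an intercept, Jn (the vector of ones) is in the column space
# of x, so the residuals sum (and hence average) to zero.
print(e.sum())             # essentially zero

# I - H(x) is symmetric and idempotent: (I - H)(I - H) = I - H,
# so the sum of squared residuals collapses to y' (I - H) y.
assert np.allclose(M @ M, M)
assert np.isclose(e @ e, y @ M @ y)
```

Dropping the column of ones from x breaks the first property (the residual sum is generally nonzero) while the idempotency and the quadratic form still hold, which is a quick way to see that the zero-sum fact really does come from the intercept.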