Now let's consider full least squares. So in this setting, our outcome y is an n by 1 vector, y1 up to yn, and our design matrix X we can write in two ways. First, we could write it as a matrix of its elements, x11 up to x1p in the first row, all the way down to xn1 up to xnp in the last row. Or we could write it as a collection of column vectors, x1 to xp. We're going to assume X is of full column rank, and that p is less than or equal to n. And we'd like to explain the variation in y with linear combinations of the columns of X. We're going to do that by finding the beta, the weights on the columns of X, that best describes y, in the sense of minimizing the sum of squared distances between X beta and y.

Okay, so let's figure this out. If we expand this criterion out, we get y transpose y, minus 2 y transpose X beta, plus beta transpose X transpose X beta. Let's call that asterisk. If we take the derivative of asterisk with respect to beta, we get negative 2 X transpose y plus 2 X transpose X beta. And if we set that equal to 0, we get the equations X transpose X beta equals X transpose y. These are called the normal equations.

Now, if X is of full column rank, then X transpose X is p by p and of full rank, and hence invertible. If you haven't seen this before, I just found a nice resource on this: go to Khan Academy, and in the section on linear algebra there's a very simple proof of why X transpose X inherits the same rank as X. It's a very easy thing.

Okay, so since we can invert it, we get that beta hat, the estimate of beta, works out to be X transpose X inverse X transpose y. We can then take the second derivative of asterisk: the constant part doesn't involve beta, and the derivative we found is linear in beta, so the second derivative is 2 X transpose X, which is a positive definite matrix.
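The derivation described above can be written out compactly as follows:

```latex
\min_{\beta \in \mathbb{R}^p} \;\|y - X\beta\|^2
  = y^\top y - 2\, y^\top X \beta + \beta^\top X^\top X \beta \quad (*)
```

```latex
\frac{\partial (*)}{\partial \beta} = -2 X^\top y + 2 X^\top X \beta = 0
  \;\Longrightarrow\; X^\top X \beta = X^\top y
  \quad \text{(the normal equations)}
```

```latex
\hat{\beta} = (X^\top X)^{-1} X^\top y,
\qquad
\frac{\partial^2 (*)}{\partial \beta \,\partial \beta^\top} = 2 X^\top X \succ 0 .
```

Since X has full column rank, X transpose X is positive definite, so the stationary point is indeed the minimizer.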
So that means that beta hat here is the minimum of the least squares criterion. So that is it. If you want to find the best linear combination of the columns of X to explain y, in the sense of minimizing the norm between our predicted value X beta and the response y, we get the solution X transpose X inverse X transpose y.
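As a sanity check on the closed form, here is a minimal numerical sketch in Python with NumPy. The data here are random illustration values (not from the lecture); we solve the normal equations directly and compare against NumPy's built-in least squares solver.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3                      # n observations, p predictors, with p < n
X = rng.standard_normal((n, p))   # design matrix; random Gaussian columns
                                  # are full column rank with probability 1
y = rng.standard_normal(n)        # response vector

# Solve the normal equations: X'X beta = X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check against NumPy's least squares routine
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_hat, beta_lstsq))  # True
```

In practice one calls `np.linalg.lstsq` (or a QR-based solver) rather than forming X transpose X explicitly, since the normal equations square the condition number of X; but for a well-conditioned X the two agree.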