So the multivariate normal distribution just isn't rich enough for the collection of distributions that we need, even if we assume that the underlying outcome variables are normally distributed. Let me give you a simple example of what I mean. Take a vector (x1, x2), where both entries are scalars, that is multivariate normal with mean (mu1, mu2) and variance matrix Sigma. Define A to be the 3 by 2 matrix with rows (1, 0), (1, 0), (0, 1). Then A times (x1, x2) works out to be (x1, x1, x2). This can't be multivariate normal, because the first two entries are the same variable repeated twice. Another way to see that it can't be normal is that the variance matrix, A Sigma A transpose, is not full rank, so it's not invertible. You couldn't even write out the normal density, which, remember, requires the inverse of the variance matrix in the exponent. So what's going on here? The real problem is that the matrix I'm multiplying my multivariate normal vector by is not of full row rank. When you multiply by a matrix that's not of full row rank, you don't wind up with a multivariate normal random variable; you wind up with what we're calling a singular normal distribution. The reason for the name is that the variance matrix is singular, that is, non-invertible. This is an important distribution for us, and I'll give you an example of when it matters. The standard assumption that we're going to make in regression is that y is normally distributed with mean X beta and variance sigma squared I. Take the residuals, which are (I - H(X)) y, where H(X) = X (X'X)^{-1} X' is the hat matrix. Look at the matrix I - H(X): it's actually not full rank, and the reason I know that is that it's symmetric and idempotent, and for symmetric idempotent matrices, the trace equals the rank.
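As a quick numerical sketch of this example (in numpy, with a made-up Sigma, since the lecture doesn't specify one), you can check that A Sigma A transpose really is singular:

```python
import numpy as np

# Hypothetical covariance for (x1, x2); any valid 2x2 covariance works here.
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

# A maps (x1, x2) to (x1, x1, x2): 3 rows but only rank 2,
# so A is not of full row rank.
A = np.array([[1.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0]])

V = A @ Sigma @ A.T  # variance matrix of A x

print(np.linalg.matrix_rank(V))  # 2, not 3 -> V is singular
print(np.linalg.det(V))          # 0 (first two rows of V are identical)
```

Because the determinant is zero, V has no inverse, so the usual multivariate normal density can't even be written down for A x.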
So, if I take the trace of I - H(X), that's the trace of I minus the trace of X (X'X)^{-1} X'. The trace of I is n. For the second term, I can use the cyclic property of the trace to move the leading X to the end, and I get the trace of (X'X)^{-1} X'X, which is the trace of a p by p identity matrix, and that's p. So I - H(X) is n by n, but it's of rank n - p. So we wind up with the same problem: a variance/covariance matrix that's not of full rank. Another way to say this: even if my y is multivariate normally distributed, my residuals are actually not multivariate normally distributed, even though they're a linear transformation of my vector y. Some of you might find this surprising, because you may already have done a lot of regression and have had some practice checking the normality of your data by checking the apparent normality of your residuals. And it's not a bad practice, because when n is much larger than p, your residuals should be approximately normally distributed. But as a matter of theoretical fact, your residuals are guaranteed not to be multivariate normally distributed. Another way to see that the residuals can't be normal is to consider the case where we include an intercept. If we include an intercept, then the sum of the residuals is zero. However, for a normal vector, the sum of its entries has to be a normal scalar, and a normal scalar can't take a particular value with probability 1. So the residuals can't possibly be normal if they have that kind of linear redundancy built in. And in fact, there are p linear redundancies built into the residuals, so there are many different ways you could create a linear combination of the residuals that is a constant.
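The trace calculation and the zero-sum property can both be checked numerically. This is a sketch with a hypothetical design matrix X (an intercept column plus random columns), not anything from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3

# Hypothetical design matrix: an intercept column plus p - 1 random columns.
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])

H = X @ np.linalg.inv(X.T @ X) @ X.T  # hat matrix H(X)
M = np.eye(n) - H                     # I - H(X)

print(round(np.trace(M)))             # n - p = 47
print(np.linalg.matrix_rank(M))       # also 47: symmetric idempotent, trace = rank

# With an intercept in X, the residuals sum to zero for any y:
y = rng.normal(size=n)
e = M @ y
print(abs(e.sum()) < 1e-8)            # True
```

The zero sum follows because H maps the column of ones to itself, so the ones vector is orthogonal to the residuals.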
So you can see that such a vector can't possibly be multivariate normal. The singular normal distribution was given its name because of situations like this. We needed something that includes the multivariate normal as a special case, but also encompasses all these other settings that we need. So we define a singular normal as any linear transformation of a multivariate standard normal. In particular, that means any linear transformation of a non-standard multivariate normal is also singular normal, because a multivariate normal is itself a simple transformation of a standard normal. So take our regression case. We know that y is equal to X beta plus sigma times z, where z is a vector of iid standard normals. So the residuals e equal (I - H(X)) times (X beta + sigma z), which satisfies the definition of being singular normal. And the singular normal distribution carries over a lot of the properties of the normal distribution that we would like. First, any linear transformation of a singular normal is singular normal, full row rank or not. Second, absence of covariance implies independence, just like in the multivariate normal. And all of the marginals, conditionals, and subsets of random variables from a singular normal distribution are also singular normal, okay? So it carries over a lot of the properties you'd like from the normal distribution, but takes away the requirement that linear transformations be of full row rank in order to stay in the family. It takes that restriction away at the expense of allowing random variables with linear redundancies and non-invertible covariance matrices.
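The claim that the residuals are a linear transformation of a standard normal vector can be verified directly: since (I - H(X)) X beta = 0, the residuals collapse to sigma (I - H(X)) z. A small numerical sketch, with hypothetical beta and sigma chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 20, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([1.0, 2.0])   # hypothetical coefficients
sigma = 0.5                   # hypothetical error standard deviation

z = rng.normal(size=n)        # iid standard normals
y = X @ beta + sigma * z      # the regression model y = X beta + sigma z

H = X @ np.linalg.inv(X.T @ X) @ X.T
e = (np.eye(n) - H) @ y       # residuals

# (I - H) annihilates X beta, so e = sigma * (I - H) z:
# a linear transformation of a standard normal vector,
# i.e. singular normal by definition.
print(np.allclose(e, sigma * (np.eye(n) - H) @ z))  # True
```

So the residuals fit the definition exactly, even though their covariance matrix sigma squared (I - H(X)) is not invertible.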
Okay, so the singular normal distribution is an important distribution, and we'll use it fairly frequently.