I will now try to give you the intuition behind principal component analysis. Let's look at another example. Say this is my data set, and it is a two-dimensional data set: we have one dimension this way and another one this way. I want to reduce the dimensionality of my data set, so I want to go down to a single dimension. What I can do is try to find a direction in this data set that preserves most of the variance. When I say variance, I mean the classical definition of variance: the expected value of the squared deviation from the mean. So the variance of x is the expected value of (x minus the mean of x), squared. What's the purpose of this? You can think about it this way. Let's say that I come up with this direction, and I think this direction preserves most of the variance. That means this direction, with zero, zero here, preserves the distances between my points: between these two, between these two, these two, these two, and so on. If I take the distances between my points along this direction, they will be bigger than if, for example, I took this other direction instead and measured the distances along it. So basically, you want to find the direction that preserves most of the variance, that is, most of the distance between the individual points, because when we're doing classification we are interested in how far apart the points are. Once you have found this direction, in my case this one, the line that preserves most of the variance, then what you can do is project your points onto this line and use this line as your new coordinate system. If I project my points onto this line, then this line becomes my 1D attribute. If I have zero, zero here, then I can re-index my points based on how far away from zero they are along the line. So this point here, for example, will be something like -3, then -2, -1.9, minus, I don't know, 0.5, and so on.
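To make this concrete, here is a minimal sketch of the step just described in NumPy: finding the direction of greatest variance and projecting 2D points onto it to get a single new 1D attribute. The toy data and all variable names are my own, not from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy 2D data: 100 points scattered mostly along one diagonal direction.
X = rng.normal(size=(100, 2)) @ np.array([[3.0, 1.0], [1.0, 0.5]])
Xc = X - X.mean(axis=0)  # center the data so the line passes through (0, 0)

# The direction of greatest variance is the eigenvector of the
# covariance matrix with the largest eigenvalue.
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
direction = eigvecs[:, np.argmax(eigvals)]

# Project every point onto that direction: each point becomes its
# signed distance from zero along the line -- the new 1D attribute.
new_feature = Xc @ direction
print(new_feature.shape)  # (100,) -- the data is now one-dimensional
```

Note that projecting onto any other direction would give a 1D feature with smaller variance; that is exactly why this particular line is the one worth keeping.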
So, I will project my original data onto this line, and this gives me my new dimension. Effectively this operation has transformed my data from 2D to 1D, and you can use the same approach on a 3D data set when you're working in three-dimensional space. You will find the first direction that preserves most of the variance, in this case something like this. Then the next thing you do is find a line perpendicular to this one that preserves most of the remaining variance. So this will probably be a line like this one, perpendicular to the first. Okay, I didn't draw that very perpendicular; maybe something like this. Then you keep going until you reach the number of dimensions. So you'll find the line which is perpendicular to the second line and preserves most of the variance that's still left unexplained in the data; maybe it goes like this. You'll end up with three new lines, and again, you don't have to use all of them. You can say, "Well, I will project all my data points onto the blue line," in which case you will have reduced your data to one dimension, or you can say, "Actually, I will use both the first line and the second line," project onto those coordinates, and then you have reduced your dimensionality to two. So, that's the idea of principal component analysis: it's dimensionality reduction using the directions of greatest variance. You find the first line that explains most of the variance, then you keep looking for lines perpendicular to the previous ones that explain what's left of the variance, and you keep going until you reach the number of dimensions in the data set. Then you select a number of lines, ideally fewer than the total dimensionality of your original data, you project the data, and you end up with a new data set with new features that preserves most of the information in the original data set with lower dimensionality.
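The 3D procedure above can be sketched with scikit-learn's PCA, which computes all of these mutually perpendicular directions at once, ordered by how much variance each one explains. Using this particular library here is my assumption; the toy data is also invented for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
# Toy 3D data: most of the variance lives in the first two axes.
X = rng.normal(size=(200, 3)) * np.array([5.0, 2.0, 0.3])

pca = PCA(n_components=3)
pca.fit(X)

# Three perpendicular directions, sorted so the first explains the
# most variance, the second the most of what's left, and so on.
print(pca.explained_variance_ratio_)

# Keep only the first two "lines" and project the data onto them:
X2 = PCA(n_components=2).fit_transform(X)
print(X2.shape)  # (200, 2) -- 3D reduced to 2D
```

Looking at `explained_variance_ratio_` is how you decide, in practice, how many of the lines are worth keeping: if the first two already account for nearly all the variance, dropping the third loses very little information.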
I would like to give you an example of how this technique works, and the important thing to mention is that these lines, these directions that preserve the variance, are exactly what the principal component analysis technique gives you. So, we will do this now: I have prepared a notebook in Watson Studio to show you how we can take an arbitrary data set, find its principal components, and reduce its dimensionality.
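As a stand-in for that notebook, here is a possible version of the demo run on the built-in Iris data set. The choice of data set is my assumption, since the lecture only says "an arbitrary data set"; the PCA calls are standard scikit-learn.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data  # 150 samples, 4 numeric features
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)  # project onto the top 2 directions

print(X_reduced.shape)  # (150, 2) -- 4D reduced to 2D
# Fraction of the original variance those two directions preserve:
print(pca.explained_variance_ratio_.sum())
```

For Iris, the first two principal components preserve well over 90% of the variance, so the 2D projection keeps almost all of the structure of the original 4D data.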