Now, we discussed how we moved from two dimensions up to three dimensions. And we saw that when we made that move, many more values ended up more than one unit away from the mean of each of our covariates. Again, working in the range from negative one to one, with each covariate standardized to a mean of zero and a standard deviation of one, we saw that 21% of values fell outside that unit distance in two dimensions, and that just adding one more covariate with the same range pushed that to 48% lying outside. Now, we want to see if we can generalize from there up to higher dimensions. Obviously, we won't be able to plot in higher dimensions, but we can build an intuition: if a point is within one unit of that mean, it lies inside the ball, the sphere, whatever you want to call it. If, still using a similar range for each covariate, a point falls outside of that ball, then it's beyond that one standard deviation, and we'd call it a bit of an outlier, even though every covariate has the same scale. So in order to do that, we start with a random sample: calling np.random.sample just pulls random points from the uniform distribution on zero to one. We're saying that we want the size to be five rows and two columns, so we're going to have two-dimensional points. We then get the norm, which is just the Euclidean distance from the origin (0, 0): square each coordinate, sum them, and take the square root. And we call .sum along axis 1 because we're passing in an array, and we want that sum for each one of our individual points.
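The sampling and norm steps described here can be sketched in NumPy like this (variable names are my own; the lesson's notebook may differ slightly):

```python
import numpy as np

# Draw 5 two-dimensional points uniformly from [0, 1)
# (np.random.sample draws from the uniform distribution on [0, 1)).
sample = np.random.sample(size=(5, 2))

# Euclidean distance of each point from the origin (0, 0):
# square each coordinate, sum along axis 1 (one sum per row, i.e.
# per point), then take the square root.
norm = np.sqrt((sample ** 2).sum(axis=1))
```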
So we're getting that Euclidean distance for each one of our points. And then we use that norm to determine whether each point is within one unit of that mean or more than one unit away. No matter the dimensional space, that's going to be the way we determine whether or not a point is within the ball, within that sphere, or not. So we define in_ball: using the norm we just defined, is the value within the ball or not? And it will return either a True or False value. So just to see an example of this, we use that sample data. We say for x, y in zip, zipping together the norm values, so we can see the norm output for each one of the sample data points, and then whether or not each point is in the ball. And we should see anything above one being outside the ball. So we run this, and first we print out our sample data, randomly generated with all the points between zero and one. And we see that all of these were actually within the circle, here working in two-dimensional space. That was a bit lucky; you see, if I run this again, two of them happen to be outside the circle. Now, how would we generalize this beyond two dimensions? We saw how we could do three dimensions; now we're going to do it for any number of dimensions. The way we do that is to create a function called what_percent_of_the_n_cube_is_in_the_n_ball, asking what percent of the n-dimensional cube is in the n-dimensional ball. We pass in the number of dimensions, and we can also pass in a sample size; here we're going to use 10,000. So we generate 10,000 random points, again values between zero and one, using a shape of 10,000 rows, each with the number of dimensions you pass into this function.
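A minimal sketch of the in_ball check and the zip loop just described (function names follow the narration; the notebook's exact code may differ):

```python
import numpy as np

def norm(points):
    # Euclidean distance of each point (row) from the origin.
    return np.sqrt((points ** 2).sum(axis=1))

def in_ball(points):
    # True when a point lies within one unit of the origin.
    return norm(points) <= 1

sample = np.random.sample(size=(5, 2))
print(sample)
for x, y in zip(norm(sample), in_ball(sample)):
    print(f"norm = {x:.3f}, in ball: {y}")
```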
So originally, we just did two dimensions, as we saw in our samples here; now we're going to move that up to three, four, five dimensions. And you can picture each one of these rows containing our first covariate, then our second covariate, and so on; when we add more dimensions, all we're doing is adding on more features. So what we're going to do is call in_ball on these 10,000 different values, and then call .mean. If you think about it, in_ball will output either True or False for each of these 10,000 values, and True or False can be used as 1 and 0, with True being 1 and False being 0. So if we take the average, we see what percentage actually falls within the ball; that's how this .mean works for us. And then we say for iteration in range(100), so that we get 100 different draws of these 10,000 points, to ensure that we converge on something close to the true value as we generalize to these higher dimensions. So we end up with 100 different estimates of the fraction that lies within the ball versus outside it, and then we take the mean of those values. And that gives us the percentage of the n-cube that's in the n-ball. We're then going to loop over dimensions ranging from 2 up to, but not including, 15, so up to 14; those are the different dimensions that we're going to test. And then for our data, we map each of these dimensions into our what_percent_of_the_n_cube_is_in_the_n_ball function, and that outputs, for each value in the range, what percentage actually lies within the ball.
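Putting those pieces together, the Monte Carlo estimate described above might look like this (a sketch; names follow the narration):

```python
import numpy as np

def in_ball(points):
    # True when a point lies within one unit of the origin.
    return np.sqrt((points ** 2).sum(axis=1)) <= 1

def what_percent_of_the_n_cube_is_in_the_n_ball(n_dims, sample_size=10_000):
    # For each of 100 iterations, draw `sample_size` points in the unit
    # cube and take the mean of the True/False in-ball flags (True = 1,
    # False = 0); then average those 100 per-iteration estimates.
    return np.mean([
        in_ball(np.random.sample(size=(sample_size, n_dims))).mean()
        for iteration in range(100)
    ])

dims = range(2, 15)  # dimensions 2 through 14
data = list(map(what_percent_of_the_n_cube_is_in_the_n_ball, dims))
```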
You see here that we also include 2 and 3, so we'll be able to check, compared to what we saw before, whether we get close approximations of the actual values given by the formulas for a sphere versus a cube and a circle versus a square. So we say for dim, percent, pairing each dimension with its result, starting with 2 and the output for 2 in our data; we zip those two together and get the dimension as well as the percent within the ball. And we see that 78.5% fall within the ball at first, which makes sense given that we saw before that 21% was outside of the ball. Same with 52% being in the ball for three dimensions; we saw 48% outside above. And we see how that drops off quite dramatically as we keep increasing the number of dimensions: as we add on features, all with similar ranges and similar standard deviations, more and more of our values tend to be outliers. And we can plot this, finally getting a simple plot, calling plt.plot. We set our x label, our y label, and our title, and all we're plotting is our dimensions against the data, the percentage of points that fall within the ball. And we can see how it steeply drops off as we add on more and more dimensions. So just to double-check our understanding, since this is dropping off quite dramatically, we're also going to measure the distance from the center of our cube to its nearest point. Here we're going to generate just 1,000 points rather than 10,000, and see which of those 1,000 is closest to the center. And I'll give you a little bit of a spoiler: we will see that that closest point gets farther and farther away as we increase the number of dimensions. So this is just a bit more evidence for that same point.
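As a quick sanity check against those formulas, the exact fraction can be computed in closed form: the volume of the unit n-ball is pi^(n/2) / gamma(n/2 + 1), and the cube from 0 to 1 in each dimension is one orthant of the cube from -1 to 1, whose volume is 2^n. This check is my own addition, not part of the lesson's notebook:

```python
from math import pi, gamma

def exact_fraction(n_dims):
    # Volume of the unit n-ball divided by 2**n, the volume of the
    # cube [-1, 1]^n; by symmetry this is also the fraction of the
    # cube [0, 1]^n that lies inside the unit ball.
    return pi ** (n_dims / 2) / gamma(n_dims / 2 + 1) / 2 ** n_dims

print(round(exact_fraction(2), 3))  # -> 0.785, the ~78.5% seen for two dimensions
print(round(exact_fraction(3), 3))  # -> 0.524, the ~52% seen for three dimensions
```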
So we're going to pass in the dimension and our sample size, here setting the default equal to 1,000. We again call a random sample, but this time we subtract 0.5, so rather than 0 to 1 the values run from negative 0.5 up to 0.5, and the cube is centered at the origin. And then we return the minimum of the norms of those points; again, the norm is the distance from zero in any direction. Then, in order to estimate the closest point for a given dimension, we use that get_min_distance function we just defined, which gives us the minimum distance using the norm of each of those points. We do that 100 times over, in the same fashion as before, to ensure that we have a large enough sample. And then we return not just the average of that data, but also the minimum of those minimums, as well as the maximum of those minimums, so that we get a bit of a range of how far the nearest point sits from the origin. We're going to calculate this for dimensions ranging from two to 100, mapping those dims into that estimate_closest function we just defined above. And we can print this out; this will take just a second to run, and afterwards we'll also plot it using the same functionality we just discussed. So we see here, for dimension six, the average value was 0.22, the minimum of those minimum values was 0.1, and the maximum of those minimums, over those 100 iterations, was 0.3. So we're going to plot those dimensions against the first column of the min distance data, the mean. And then with plt.fill_between we plot both our min and max, so we have the line of average values and a filled band between the min and max, letting you see a bit more clearly what the range was as we increase the number of dimensions.
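The get_min_distance and estimate_closest steps might be sketched as follows (names follow the narration; I use a coarser dimension grid than the lesson's 2 through 99 just to keep the sketch quick):

```python
import numpy as np

def get_min_distance(n_dims, sample_size=1_000):
    # Points uniform on [-0.5, 0.5) in each dimension, so the cube is
    # centered at the origin; return the distance of the closest point.
    points = np.random.sample(size=(sample_size, n_dims)) - 0.5
    return np.sqrt((points ** 2).sum(axis=1)).min()

def estimate_closest(n_dims):
    # Repeat 100 times; report mean, min, and max of the minimums.
    data = [get_min_distance(n_dims) for _ in range(100)]
    return np.mean(data), np.min(data), np.max(data)

dims = range(2, 100, 10)
min_distance_data = np.array(list(map(estimate_closest, dims)))
```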
So the min distance data, if you recall, outputs three different values per dimension: index zero is the mean, index one, the first in Python's zero-based indexing, is the minimum, and index two is the max. We're setting alpha equal to 0.5 because fill_between shades the region between the two values, and we also want to see the line in between. So you run this, and we can see, as we increase the number of dimensions, how far that minimum point gets from the origin, as well as the range we're able to show using that fill_between. So that closes out this video, which gave us an opportunity to look at how we can expand up into higher dimensions. With all this in mind, in the next video we will begin to show you the effects of working with high-dimensional data when you're actually trying to use the different classification algorithms that we introduced in the last course. All right, I'll see you there.
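The plotting step with fill_between could be sketched like this. It's self-contained, so it recomputes the data on a coarser grid than the lesson's 2 through 99; the headless Agg backend and the output filename are my choices, not the lesson's:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the figure saves without a display
import matplotlib.pyplot as plt
import numpy as np

def estimate_closest(n_dims, sample_size=1_000):
    # Mean, min, and max over 100 trials of the nearest point's distance
    # from the origin, for points uniform on [-0.5, 0.5) per dimension.
    mins = [
        np.sqrt(((np.random.sample((sample_size, n_dims)) - 0.5) ** 2)
                .sum(axis=1)).min()
        for _ in range(100)
    ]
    return np.mean(mins), np.min(mins), np.max(mins)

dims = range(2, 100, 10)
min_distance_data = np.array(list(map(estimate_closest, dims)))

# Column 0 is the mean, column 1 the min, column 2 the max.
plt.plot(dims, min_distance_data[:, 0])
plt.fill_between(dims, min_distance_data[:, 1], min_distance_data[:, 2],
                 alpha=0.5)  # translucent band so the mean line stays visible
plt.xlabel("Number of dimensions")
plt.ylabel("Distance from origin to nearest point")
plt.title("Nearest of 1,000 random points vs. dimension")
plt.savefig("min_distance.png")
```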