[MUSIC] Hello again and welcome back. In this lesson, I'm going to go over some of the options we have in ArcGIS for grouping our data into bins, or different colors. So far, we've mostly used the default methods for this, and occasionally we've changed the number of classes we have for our data. But let's take a look at all the other things we can do. Specifically, we're going to cover the different methods we can use for dividing data into groups, as well as different ways to set those groups manually. So, using the document we used in the last lecture, we have a number of different groups, which were automatically assigned when we changed the number of classes here. I'll call it binning sometimes. We talk about classes, we talk about groups. But basically, what we're talking about is taking the discrete observations, the discrete data points that are each feature's value in the household income field, and assigning them to a group here so that they can take a color to show on the map. I like to think of that as throwing them in a bin. Not the bin, for those of you who use bin to mean garbage, but just tossing them in a bin. You're sorting them, right, by the different values in the group and saying: okay, the value 8,000 belongs in the bin that has values between 4,600 and 46,000, but the value 100,000 belongs in this other bin over here that has values between 98,000 and 140,000. Now, you'll notice that these bins, or these classes, aren't of equal size here. They're sized differently based upon the algorithm we used to choose what the bins are and how many classes we have. So we're going to take a look over here at a sort of new section of Symbology, and I'm going to click Classify to modify our classification options. There are a whole lot of new options in here. But what I want to show you first is that we can manually change the break values. There's a classification method here, and I'll go over that in just a moment.
But we can manually change the break values, which are the lines between the different classes. I can just take this one and slide it over here, and the break value changes over here as well. It adjusts. So I can manually classify. And once I do, it changes the classification method to Manual for me, and I can no longer set the number of classes once I've manually set these. So, we can move it down here, and this will let us see all the values at the top a little more. And the map's output substantially changes here, because all of a sudden our groups are way lower and there's a much larger segment, a much larger class, for the top value here. If we go back to Classify, I can take a look again at these break values. And maybe these values don't mean a whole lot to me as raw values, so it's a little hard to set them. So let's break it out by percentages instead. I can click this button, and maybe I want 15, 30. But since I already have this broken out down here this way, it won't let me put in 30, because that value's already above here. So I'm going to bump this one up to 60, which is what I ultimately want it to be. I'll put this one at 45, and this one at 30. So we have groups of 15% all the way up to 60, and then the last one is the top 40% there. And I can click OK, click Apply again, and update the symbology again. Now, if I want to adjust this even more, if I want to add some classes back in, I'm going to have to switch the method here. So I'll change it to Equal Interval for a moment, and I'll change the number of classes to ten. And now I can adjust these values again, so I can make it manual now with the number of classes I actually need. So maybe I'll set this to 10,000. And once again, it locks down my classes. But I now have enough groups that I can adjust manually again, just like we did before. So I can go 10,000, 20,000, and let's see the percentages again. So that's 4, 8, 12, 16, 20, 24,
28 and 32, and then let's go 36, and then just have that top group again. What you set those manual classes to is up to you, but I just wanted to demonstrate what a manual classification workflow looks like if I want to set my break values. And once I come back out of here, just to point out, it shows all those groups here, and then it gives me the color ramp as well. So we can see that they did get assigned values from our color ramp. If we go back to Classify, there are a number of other features in here still. First, let's take a look at the classification methods. So we've done Manual, and then we had Natural Breaks as the default, but I'll get to that in a moment. What I want to show you first is Equal Interval. Equal interval is exactly what it sounds like. It has an equal spacing between each class break, so each class has the same range, or the same number of potential values in it, building upon the previous ones. So the first one has 29,146, and then the next one is another 29,146 above that one. So each one has the same number of possible values, even though they don't overlap and they cover different values. Quantile is a little different, though. What quantile does is look at the values you actually have, and it sets the class sizes so that you have the same number of actual observed values in each group. So, we read this histogram here as representing the number of observations that have a particular value, where the higher parts of the histogram say we have a lot of records that have values in this range, and very few records with values in this range. Quantile squashes these classes together here because there are a lot of values. So to get the same number of values in each class, we have a narrow range down here and a large range up here. Again, it's not the same number of potential values, like equal interval. It's the same number of actual observed records in each bin.
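To make the difference between equal interval and quantile concrete, here is a minimal Python sketch. It is an illustration only, not how ArcGIS computes its breaks: the function names and the small `incomes` list (in thousands) are made up for this example, and real software may handle ties and rounding differently.

```python
def equal_interval_breaks(values, n_classes):
    """Upper break of each class when every class spans the same range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes
    breaks = [lo + width * i for i in range(1, n_classes)]
    breaks.append(hi)  # pin the last break to the maximum to avoid float drift
    return breaks


def quantile_breaks(values, n_classes):
    """Upper break of each class when every class holds about the same number of records."""
    ordered = sorted(values)
    n = len(ordered)
    breaks = []
    for i in range(1, n_classes + 1):
        last = (i * n) // n_classes - 1  # index of the last record in class i
        breaks.append(ordered[last])
    return breaks


def count_per_class(values, breaks):
    """Count how many records fall at or below each break (first match wins)."""
    counts = [0] * len(breaks)
    for v in values:
        for i, b in enumerate(breaks):
            if v <= b:
                counts[i] += 1
                break
    return counts


# Hypothetical household incomes in thousands, skewed toward low values.
incomes = [10, 12, 15, 18, 20, 22, 25, 30, 60, 100]

ei = equal_interval_breaks(incomes, 5)
qt = quantile_breaks(incomes, 5)
print("equal interval:", ei, count_per_class(incomes, ei))
print("quantile:      ", qt, count_per_class(incomes, qt))
```

With this skewed sample, the equal interval classes are all 18 wide but hold 7, 1, 1, 0, and 1 records, while the quantile classes hold 2 records each and get narrow where the data is dense and wide where it is sparse.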
So let's take a look at what that looks like if we apply the symbology there. The effect of it is that we should have the same number of polygons on the map in each color. We don't have the same area, though, because the polygons are going to differ in area. But we have the same number of features on the map that take each color. Okay. We won't look at all of these, but the other one I really want to point out is the Natural Breaks method, because it's ArcGIS's default, and there's a good reason it is. If you remember, in the first class we talked about how you can effectively distort information with your symbology. You can create representations of your data that can cause people to come to incorrect conclusions. And you can do that without malice. You can just symbolize your data in a way that seems to point out one thing versus another. Well, natural breaks tries to help you avoid that as the default. You don't want to use it all the time, but as a default it's a good choice, because what it does is try to minimize the variation among members of the same class or bin, and maximize the variation between different bins. Those of you from math-oriented disciplines might know the measure it uses as goodness of variance fit. For the rest of you, that didn't really mean anything to me before I started GIS, either. But as an algorithm that displays our data, it's a great choice, because it keeps together values that are substantially similar and then actually shows a difference in color when values are substantially different. We create a map that actually kind of fits our subtle expectations as humans: that by being a different color, two data points are somewhat different, while two polygons that share the same color are about the same. We are separating them by as much of a difference as we can, based upon the number of classes we have. And ten is probably too many classes for natural breaks. It's going to work a little better if I reduce it to five to seven or so.
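The idea of minimizing variation within classes and maximizing it between classes can be sketched in a few lines of Python. This is a brute-force illustration for tiny lists only, not ArcGIS's implementation: the function names and the `incomes` sample are assumptions made for the example. It scores each candidate split with the goodness of variance fit, which is 1 minus the ratio of the squared deviations from the class means to the squared deviations from the overall mean.

```python
from itertools import combinations


def sum_sq_dev(values):
    """Sum of squared deviations of the values from their own mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def split_at(ordered, cuts):
    """Split a sorted list into consecutive classes at the given cut indexes."""
    edges = [0, *cuts, len(ordered)]
    return [ordered[a:b] for a, b in zip(edges, edges[1:])]


def goodness_of_variance_fit(classes):
    """GVF = 1 - (within-class squared deviation / total squared deviation)."""
    everything = [v for c in classes for v in c]
    total = sum_sq_dev(everything)  # assumes the values are not all identical
    within = sum(sum_sq_dev(c) for c in classes)
    return 1 - within / total


def brute_force_natural_breaks(values, n_classes):
    """Try every way to cut the sorted values and keep the split with the best GVF."""
    ordered = sorted(values)
    candidates = combinations(range(1, len(ordered)), n_classes - 1)
    best = max(
        candidates,
        key=lambda cuts: goodness_of_variance_fit(split_at(ordered, cuts)),
    )
    return split_at(ordered, best)


# Hypothetical household incomes in thousands, skewed toward low values.
incomes = [10, 12, 15, 18, 20, 22, 25, 30, 60, 100]

natural = brute_force_natural_breaks(incomes, 3)
equal_count = split_at(sorted(incomes), (3, 6))  # a quantile-like split for comparison

print(natural, goodness_of_variance_fit(natural))
print(equal_count, goodness_of_variance_fit(equal_count))
```

For this sample, the best three-class split keeps the eight low values together and gives 60 and 100 a class each, which scores a higher goodness of variance fit than the equal-count split. Trying every combination only works for very small inputs, so production tools use a far more efficient optimization.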
So if I do that and click Apply, this is going to allow it to actually do what I was just talking about: minimize the variation within a class and maximize the variation between classes. Okay, the last things I want to point out here are in the Data Exclusion section. Occasionally, you'll want to exclude some pieces of information from your map, and you can do that under data exclusion. Maybe you have some null values that show up as negative 900, or negative 9,999, or something, and you just want to remove those. Well, you can run a data exclusion query here, and this will remove from your symbology, from your classification, the features that meet a particular attribute condition. So maybe we want to remove census tracts that have a certain percentage, or a certain amount, of their area as water. So let's select something relatively large here. We'll say water area greater than that, just arbitrarily for this. We'll click OK. I'll click OK again and click Apply. And it excluded a lot of my data. It's sort of like a definition query, but you can do it just from the symbology pane and then still access that data in your attribute table. So I'll go back to Classify, remove that exclusion, and click OK. And then, I just want to point out sampling. In order to determine what the classes are, it goes through the records one by one, figures out what the values are, and then decides what the classes are based upon the minimum, the maximum, and the spread of the values in there. Well, when you are working with very large data sets, it's only going to look at the first n records, where n is this maximum sample size. And once it hits that maximum sample size, it's not going to look at any more records, because it's just trying to save you time. It doesn't want to look through a multi-million-record data set and make you sit there and wait. The first 10,000 records might give you most of your symbology.
But occasionally that creates a problem, because you don't have the whole spread of your information in those records. So what you need to do is come in here, click Sampling, and set the maximum sample size higher so that it looks through far more records. That way, it will look through the rest of the data set, or whatever portion of the data set you specify there, and get you the values that let you correctly classify your data. Okay, that's it for this lecture. In this lecture, I showed you how we can classify or bin our data in a more advanced way using the classification pane here. I showed you how to manually set your classification and how to use the sliders here to adjust it. Then we looked at different classification methods, including equal interval, quantile, and natural breaks. And then I showed you how you can exclude certain data from your symbology, and how to make sure that ArcGIS looks at enough records to correctly classify your data. In the next lecture, we'll look at a similar sort of manual tweaking we can do for raster data sets. See you there.