I'll go back to Rattle now and show you how to read the data, and I'll also show you what we're looking at in the data as we move along. The first thing we're going to do is some univariate plots, and then we're going to look at bivariate plots, which are new, and we will see how to make them. So let's start with the univariate plots. As before, we have to start RStudio, and once RStudio is started, we first load the library, which is just Rattle. Then we start Rattle. Once Rattle is started, we can focus just on Rattle. Here is your Rattle window. The data we're going to read is in the library. So you see the Library option there, and you select it. If you pull down the drop-down list and keep scrolling, you will see the Boston, Massachusetts housing values in suburbs of Boston dataset, and we click Execute. As we saw in the last class, you will see each variable listed along with its role, for example whether it's the target variable or not. Now I'm going to look at a few visualizations. Once I have the data, I want to look at age, tax, lstat (the fraction of lower-status population in that area), and the median value. So first of all, let's ignore all the other variables. We set all of these to Ignore, and we're left with age, tax, and lstat, and that's it. Now, one other thing I want to point out before I go on: notice that at the top it says 70/15/15. I'm lazy, so I'm going to leave it that way; we will see what it means later. I'm also going to keep the seed at 42, so if you build the same thing you'll get the same results. Later on we'll talk about the seed and what these partitions mean. Now click Execute. Once you execute, Rattle has recorded the fact that you chose these four variables (age, tax, lstat, and the median value), and now you can go to Explore and choose Distributions. So which ones do you want? I want age, tax, lstat, and median value.
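To make the 70/15/15 setting and the fixed seed a little more concrete, here is a minimal sketch of what a reproducible random split into training, validation, and test rows looks like. This is not Rattle's code: it assumes a plain Python shuffle with `random.Random(42)`, and the function name `partition_indices` is hypothetical. Because Python's random number generator differs from R's, the exact rows chosen will not match Rattle's partition; only the idea (same seed, same split every run) carries over. The Boston data has 506 rows, which is used in the usage line.

```python
import random

def partition_indices(n, fractions=(0.70, 0.15, 0.15), seed=42):
    """Hypothetical helper: shuffle row indices 0..n-1 with a fixed seed and
    cut them into train/validation/test groups in the given proportions."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    rng = random.Random(seed)      # fixed seed -> identical shuffle on every run
    idx = list(range(n))
    rng.shuffle(idx)
    n_train = round(n * fractions[0])
    n_valid = round(n * fractions[1])
    train = idx[:n_train]
    valid = idx[n_train:n_train + n_valid]
    test = idx[n_train + n_valid:]  # whatever is left over goes to test
    return train, valid, test

train, valid, test = partition_indices(506)
print(len(train), len(valid), len(test))
```

Running it twice with the same seed gives exactly the same three groups; changing the seed gives a different, equally valid split.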
But actually, I want a histogram, so let me just click on the histograms. There you go. One other thing I want to tell you: if you want the bars in your histogram, you'll have to go to Settings, uncheck Advanced Graphics, and click Execute. Go back to RStudio, and you've got a plot of each of these variables. So let's go back and look at the distributions. This is something I would like you to remember as we go forward: we look at these plots and ask, "Do these plots make sense?" For example, does it make sense that so many houses seem to be very old? Does it make sense that there seems to be a bump in tax? What is that value, that big column towards the very end? Does it make sense to have a bunch of houses that seem to be very expensive? If you knew the neighborhood, or knew a realtor who knew this neighborhood, they would tell you in a minute whether this makes sense or not. I was trying to think of an analogy for this. Let's say you're cooking, and you have a knife that is very sharp, but the vegetable you're cooking is rotten. Whatever you make out of it is no good. In a model, the tool is regression, but the vegetable is your data. If your data is rotten, your outcome is going to be rotten. So the lesson is that the data is often more important than the tool itself. In my experience, 99 percent of the time, if the data is not good, the outcome is not going to be good. Now I want to show you a new type of plot. To do that, let me go back to Rattle. I have to apologize upfront to those of my viewers who are using a Mac: making the interactive plot work will require extra effort on your part. We have given instructions; please read them.
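A histogram just counts how many values fall into each of a set of equal-width bins, and a tall bar at one end (like the big column in the tax or median-value plots) is exactly the kind of thing worth questioning. The sketch below assumes plain Python with no plotting library; `histogram_counts` and `print_histogram` are hypothetical helper names, and the sample list is made-up toy data, not values from the Boston dataset.

```python
def histogram_counts(values, bins=10):
    """Count how many values fall into each of `bins` equal-width bins
    spanning min(values)..max(values)."""
    lo, hi = min(values), max(values)
    counts = [0] * bins
    if lo == hi:                    # all values identical: put them in one bin
        counts[0] = len(values)
        return counts
    width = (hi - lo) / bins
    for v in values:
        i = int((v - lo) / width)
        counts[min(i, bins - 1)] += 1  # the maximum value belongs in the last bin
    return counts

def print_histogram(values, bins=10):
    """Print a sideways text histogram, one row of '#' per bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins if hi > lo else 1.0
    for k, c in enumerate(histogram_counts(values, bins)):
        left = lo + k * width
        print(f"{left:8.2f} | {'#' * c}")

# Toy data (made up): mostly small values plus a pile-up at the very top.
toy = [1, 2, 3, 4, 5, 50, 50, 50, 50]
print_histogram(toy, bins=7)
```

With the toy list, the last bin holds four identical maximum values; in real data, a spike like that at the upper edge is a cue to ask whether the values were capped or recorded in an unusual way before trusting any model built on them.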
I'm going to use the interactive plot not because it is essential, but because it looks nice. When you click Interactive, you use GGobi. GGobi is a tool for doing interactive graphics on the screen, and you click Execute. It does a wonderful job. What it lets you do is select what you want to plot. Say I want to plot median value against tax; it just changes the plot. If you plot the median value against the fraction of lower-status population, you can clearly see a trend: the greater the fraction of lower-status population, the lower the median values. But if you look at the age of the house, it's not that clear. There is a trend, but it's not clear. So what this tool allows you to do is plot two variables at a time. So far we haven't looked at two at a time, but now we will, and that's what this tool is really good at. It's a nice interactive tool, and what we just made is called a scatter plot. Now, remember one thing: you may find that this is a curvilinear relationship. So this is the relationship between prime and median values. There are ways of converting a curvilinear relationship into a linear one by transforming your variables, and that should not be very difficult. So in many cases, even though the fit is not linear, there are tricks to get around it.
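The two ideas at the end, reading a trend off a scatter plot and straightening a curvilinear relationship by transforming a variable, can be sketched numerically. The code below assumes plain Python; the data are synthetic and noise-free (y = 40 − 8·ln x), chosen only so the effect is easy to see, and are not the Boston values. It uses the Pearson correlation to measure the linear trend and the fact that, for a least-squares straight line with an intercept, R² equals the squared correlation. Taking the logarithm is just one common transformation; which one fits best depends on the data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def r_squared_linear(x, y):
    """R^2 of a least-squares line y = a + b*x (equals r^2 for one predictor)."""
    return pearson_r(x, y) ** 2

# Synthetic, noise-free curve: y falls as x rises, but not in a straight line.
x = [float(v) for v in range(2, 38)]
y = [40.0 - 8.0 * math.log(v) for v in x]

r_raw = pearson_r(x, y)
r2_raw = r_squared_linear(x, y)                       # straight line on raw x
r2_log = r_squared_linear([math.log(v) for v in x], y)  # straight line on log(x)
print(f"r = {r_raw:.3f}, R^2 raw x = {r2_raw:.3f}, R^2 log x = {r2_log:.3f}")
```

The correlation is negative (y goes down as x goes up), a straight line on the raw x leaves some of the curve unexplained, and after replacing x with log(x) the same straight-line fit captures the relationship essentially perfectly.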