Hi, and welcome to the first video in our sequence on factors. In this module, we're going to talk about factors, and we're going to use a package called forcats, which is just an anagram of factors. If you look at your online materials, you will see that we are going to be covering Chapter 15 of the R for Data Science book. There is a factors cheat sheet available, as well as an overview and a vignette of the package, and there are some other examples in Wrangling Categorical Data in R.
Now, what are factors? If you remember from your basic stats class, you talked about variables being either categorical or quantitative. Factors are for categorical variables, which put units into different groups or categories based on their values. A factor is different from a plain character variable in that a factor has a predefined set of levels, or possible values. A character variable has no such set. Sometimes there is also an ordering for the levels of a factor, because you want them to appear in a certain order.
Now we're going to take a look at some data on average highs for the DC climate as our example. First, of course, we'll load the tidyverse with library, and we'll load the forcats package as well. If you look at this DC climate dataset, you'll see that I have inadvertently misspelled the name of the month for August, and there is no entry for July. So we would like a few things to happen. One is that we would like to be able to determine that one month is missing and that one is misspelled. We would also like to plot the months in the correct order, that is, in the order that the months come in the year.
But you will notice what happens if I plot this dataset without making the months a factor. When I just read it in, month is a character variable. What I want is a factor variable, but right now it's character, so the plot is going to put the months in a different order. It won't necessarily order them the way I would like.
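To make the DC climate problem concrete, here is a minimal sketch in base R. The data frame is a made-up stand-in, not the dataset from the video: the temperatures are invented, and the name dc_climate, the column names, and the spelling "Agust" are assumptions chosen only for illustration. It shows that a character column sorts alphabetically, and that declaring the valid levels up front both fixes the order and exposes the misspelled and missing months.

```r
# Hypothetical stand-in for the DC climate data: the numbers are invented,
# "Agust" is a deliberate misspelling, and July is left out on purpose.
dc_climate <- data.frame(
  month = c("January", "February", "March", "April", "May", "June",
            "Agust", "September", "October", "November", "December"),
  avg_high = c(43, 47, 56, 67, 75, 84, 87, 80, 68, 58, 47),
  stringsAsFactors = FALSE
)

# A character column has no built-in order, so sorting it is alphabetical.
alphabetical <- sort(dc_climate$month)

# month.name is a base R constant holding the twelve month names in
# calendar order. Using it as the level set fixes the order, and any value
# that is not a valid level (the misspelling) becomes NA.
month_f <- factor(dc_climate$month, levels = month.name)

# Values in the data that are not valid levels.
misspelled <- dc_climate$month[is.na(month_f)]

# Valid levels that never appear in the data.
missing_months <- setdiff(month.name, dc_climate$month)

print(alphabetical)
print(levels(month_f))
print(misspelled)
print(missing_months)
```

Note that August shows up in missing_months as well as July, because the misspelled entry does not count as August.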
I would like it to start with January, then February, and so forth. Factors can help us with all of these issues. But you do have to be careful with factors. Let me show you some examples before we get into exactly how they work.
Suppose I take this character vector as my x and make a factor out of it: I just do factor(x) and call that xf. Now that I've made it a factor, compare the two. If I look at the numeric version of the variable that I started with, as.numeric(x), I get the numbers out like I would expect. If I do as.numeric(xf), I get something strange. Why would that be? Well, that's because factors are represented inside R as integers that give the position of each value in the list of levels. When I make a factor of this, it says this is the fourth one, this is the third one, this is the first one, this is the second one, and you can see that it has put the levels in alphabetical order.
Let's look at what happens if I do as.numeric of a character. For the first one, it tells me it's an NA, because it doesn't know how to make that numeric. But if I make a factor of it first, then it's the number one.
Here's another thing that can happen. Let's look at making factors of these two things. I forgot my last print; there we go. If I just look at that one, it tells me the levels, and if I do the other one, it tells me those entries and those levels. But if I put them together in a vector, because they had different levels, it has made them into their numeric values. You see, each level is represented internally as an integer. If a value is the first level, it is just stored as a one, and the list of levels is kept somewhere else.
This is the end of the first video. Come back and we'll talk about how to create factors.
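The pitfalls from this video can be reproduced with a short base R sketch. The vectors here are hypothetical, not the ones shown on screen; x was chosen so that the integer codes come out as fourth, third, first, second, as described above. The last part rests on an assumption worth checking against your own installation: as far as I know, c() on factors returned the bare integer codes in R versions before 4.1.0 and returns a factor from 4.1.0 on, so the sketch converts to character first, which should behave the same either way.

```r
# Hypothetical character vector of numbers (not the one from the video).
x <- c("5", "30", "10", "20")
xf <- factor(x)

# Levels are sorted as text, so "5" comes last.
print(levels(xf))      # "10" "20" "30" "5"

# Converting the character vector parses the text as numbers.
print(as.numeric(x))   # 5 30 10 20

# Converting the factor returns the integer codes, not the numbers.
print(as.numeric(xf))  # 4 3 1 2

# To get the original numbers back from a factor, go through the labels.
recovered <- as.numeric(as.character(xf))

# Text that is not a number cannot be parsed and becomes NA (with a
# warning, silenced here), but as a factor it still has an integer code.
word_as_number <- suppressWarnings(as.numeric("apple"))
word_code <- as.numeric(factor("apple"))

# Combining two factors that have different levels: converting to
# character first avoids depending on how c() treats factors.
f1 <- factor(c("a", "b"))
f2 <- factor(c("c", "d"))
combined <- factor(c(as.character(f1), as.character(f2)))

print(recovered)
print(word_as_number)
print(word_code)
print(levels(combined))
```

The forcats package also provides fct_c() for combining factors, which is an alternative to the character round trip used here.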