Have you ever been looking at your dataset, wanting to do a statistical analysis, but had absolutely no clue where to start? In this video, I will explain what you should do with your dataset first, before you can do any statistical analysis. The key is to organize your dataset before analyzing it. Okay, the basic way to organize your data is in a structure based on units and variables. A unit represents an individual case, the thing that is being measured, whereas the variables are the properties of the units that we have measured. This might still sound a little abstract, so let's look at an example. Suppose you work at a bank, and you're in charge of the department that handles cross-border money transfers. These transactions can be made by anybody and go, for example, from the Netherlands to the United States, or maybe to China. As you can imagine, sometimes something goes wrong. Maybe the wrong currency was used, the wrong amount was transferred, or the transfer was in the wrong language. These mistakes affect your client, and often your client will submit a reclaim. As department head, you will have to initiate a so-called investigation to resolve such a reclaim. You assign each reclaim a unique number for administrative purposes. This is the first piece of your dataset, and it should look like this. Of course, for the client it is important that the reclaim is handled quickly, and therefore a crucial indicator of service quality is the total handling time of a reclaim. In a Lean Six Sigma project, such a characteristic is often called the CTQ, or Critical to Quality characteristic. For the first reclaim, you can see that it took the team 30 days to solve. We can also look at other aspects of this reclaim, such as the type of reclaim, in this case an AN type, or how often we had to talk to the foreign counterparty.
In this case, it took three iterations with the foreign bank before the reclaim was solved. As department head, you collect this data on multiple reclaims, and here we have five of them. Now let's move away from our bank example and just consider the data. As I mentioned at the start of this video, a dataset consists of units and variables. The unit is the thing on which you have performed your measurements; in this example, that is a reclaim, and you collected data on five reclaims, or five units. Units are always stored in the rows of your dataset. A unit is also called an experimental or observational unit, or sometimes a case; this just depends on the book you're reading. A dataset also consists of variables, and a variable is always stored in a column. A variable represents a property of the units that you have measured. In our example, we have measured four different variables for each unit. Remember to always identify your units and your variables, and remember that it is common practice to store your data this way. So always put the units in the rows and the variables in the columns, because most statistical software packages require you to do it this way. Organizing your data first makes analyzing it a lot easier. Now, let's take a closer look at the variable total time. The first reclaim has a total time equal to 30. But do you know what this 30 means? Is it 30 hours? 30 days? Or 30 weeks? This is where you need to know the unit of measurement, also called the measurement unit. In our example, the unit of measurement is days, so the first reclaim took 30 days to finish. Let's summarize. First, we saw that a dataset consists of units and variables. Then we learned to organize your data by making sure that the rows represent the units and the columns represent the variables. You also learned that units are referred to as observational units, experimental units, or cases.
Finally, you learned that a measurement unit is required to understand a data value.
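To make the rows-and-columns layout concrete, here is a minimal Python sketch using only the standard library. It stores the reclaims as a list of records, one per unit, and writes them out as a table with one column per variable. The column names (`reclaim_id`, `total_time_days`, `reclaim_type`, `iterations`), the helper functions, and the reclaim number 1 are hypothetical. Only the values 30 days, type AN, and three iterations come from the example. The other four reclaims are left out because their values are not given here. Putting the measurement unit in the column name is just one simple convention for keeping the unit attached to the data.

```python
import csv
import io

# Hypothetical column names for the four variables in the reclaim example.
# The "_days" suffix records the measurement unit of the total handling time.
COLUMNS = ["reclaim_id", "total_time_days", "reclaim_type", "iterations"]

# One dictionary per unit (reclaim). The reclaim number 1 is a hypothetical
# placeholder; 30 days, type "AN" and 3 iterations come from the example.
reclaims = [
    {"reclaim_id": 1, "total_time_days": 30, "reclaim_type": "AN", "iterations": 3},
]


def check_structure(rows, columns):
    """Raise ValueError if any unit is missing a variable or has an extra one."""
    for i, row in enumerate(rows):
        if set(row) != set(columns):
            raise ValueError(f"row {i} does not match columns {columns}")


def to_csv(rows, columns):
    """Return a CSV table: a header of variables, then one row per unit."""
    check_structure(rows, columns)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(to_csv(reclaims, COLUMNS), end="")
```

Adding another reclaim means appending one more dictionary to `reclaims`, which becomes one more row. Measuring a new property means adding one more name to `COLUMNS`, which becomes one more column. The structure check enforces that every unit has a value slot for every variable, which is the same rectangular shape most statistical tools expect.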