Remember when you were standing next to a highway, wondering what the average speed on this highway is? Using a speed camera, you measure the speed of the vehicles passing by. You cannot measure all of them, because it is impossible to measure the speed of two cars that pass by at the same time, like this one and this one. This makes you wonder which vehicles you should select to get a good estimate of the average speed. This selection process is called sampling. In this video I will show you why we sample in statistics, and how to obtain a representative sample. So let's start with the first question: why sample? First, because it is often difficult to obtain information on the whole population. If you stand next to a highway with just one speed camera, it is impossible to measure the speed of every single vehicle passing by. Of course, sometimes sampling is not necessary: if you have automated speed cameras hanging above every single lane, you don't need to sample. Also, a sample can give you very detailed information about the selected part of the population, whereas data sources that cover the whole population often contain only vague information. In these cases, sampling can also be a good idea. So let's have a look: how should we select a sample? Let's go back to our highway. Should I focus on all the trucks, should I focus on all the cars in the left lane, or should I measure many cars during traffic jams? Are these good sampling strategies? Remember, we wish to know the average speed on this specific highway. You may already have a feeling that these three strategies will not give us a very accurate estimate of the overall average speed. So how do we collect the sample? What makes a sample a good sample? Well, a sample has to be representative. This means that the selected vehicles are a good representation of all the vehicles passing by.
In other words, the vehicles in your sample are a good selection of all the vehicles on the road that day. Technically speaking, a representative sample is selected by a mechanism that is independent of the question you want to answer. Let's go back to our earlier sampling ideas and see why they do not result in a representative sample for our question: what is the average speed on the highway? Measuring only trucks is not a good idea, because trucks usually drive slower than the other vehicles. Why is measuring all the cars in the left lane not a good idea? Because those vehicles tend to drive faster than the vehicles in the other lanes. And if you measure during traffic jams, the cars drive slower than normal, so you will get an average speed that is lower than the average over the whole day. So none of these sampling strategies is independent of the question asked. So what should we do? The key is to sample randomly. But what does randomly mean? Technically speaking, it means that each item, here each vehicle, has an equal chance of being selected. Say you know that 5000 vehicles will pass by today; then on each draw, every vehicle should have the same probability of 1/5000, or 0.02%, of being selected. A random sample has some advantages. First, it ensures that the selected vehicles are spread over the different types of vehicles: you will get cars, trucks, and motorcycles. Because there are more cars than trucks on the road, a randomly selected vehicle is more likely to be a car than a truck. This is exactly what you want: just like the population of all vehicles on the road, the sample will contain more cars than trucks.
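To see what "each vehicle has an equal chance" means in practice, here is a minimal Python sketch. It assumes a hypothetical population of 5000 vehicles made up of 4000 cars, 800 trucks and 200 motorcycles (made-up counts for illustration only, not real traffic data) and uses the standard library's `random.sample`, which draws without replacement and gives every vehicle the same chance of being picked. Repeating the draw many times shows that, on average, the sample contains the vehicle types in the same proportions as the population.

```python
import random
from collections import Counter

# Hypothetical population: these counts are illustrative assumptions, not measured data.
POPULATION = ["car"] * 4000 + ["truck"] * 800 + ["motorcycle"] * 200


def sample_type_shares(population, sample_size, repetitions, seed=1):
    """Draw many simple random samples and return the average share of each vehicle type."""
    rng = random.Random(seed)
    totals = Counter()
    for _ in range(repetitions):
        # random.sample draws without replacement; every vehicle is equally likely to be picked
        totals.update(rng.sample(population, sample_size))
    n_drawn = sample_size * repetitions
    return {kind: count / n_drawn for kind, count in totals.items()}


population_shares = {kind: n / len(POPULATION) for kind, n in Counter(POPULATION).items()}
sample_shares = sample_type_shares(POPULATION, sample_size=100, repetitions=2000)

# Probability that one particular vehicle is chosen on a single draw: 1/5000 = 0.0002, i.e. 0.02%
print("single-draw probability:", 1 / len(POPULATION))
for kind in ("car", "truck", "motorcycle"):
    print(f"{kind:>10}: population {population_shares[kind]:.3f}, average sample {sample_shares[kind]:.3f}")
```

Any single sample of 100 will deviate a little from these proportions; it is the selection mechanism, not one particular sample, that is unbiased.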
Another advantage of random sampling is that it spreads the selected vehicles over the different times of day. For example, you will have cars at night, when people usually drive faster, during the daytime, when people drive normally, and during traffic jams, when people have to drive slower. But let me give you a warning: never, never pick the vehicles yourself and call that a random selection. Why not? Because as humans we are not good at making random selections. Unconsciously, you will select the vehicles that you prefer, that you've seen before, or that you recognize. So how do you select a random sample? Always ask a computer to do it for you. Let me show you how to select a random sample using Minitab; you can also make a random selection with other statistical software or online tools. We said that 5000 vehicles will pass by the road today, and you wish to select 100 of them randomly. First, we make a column with the population of vehicles, and next a column with the sampled vehicles. So the first thing to do is to create our population. For that, we go to Calc > Make Patterned Data > Simple Set of Numbers. We store the population in the column Population vehicles. The first vehicle is numbered 1, the last vehicle is numbered 5000, and we go in steps of 1. Minitab now fills the Population vehicles column with the numbers 1 up to 5000. To select a random sample of 100 vehicles, we go back to Calc, choose Random Data, because we want a random selection, and then Sample From Columns. How many rows to sample? We want a sample of size 100. We sample from the column Population vehicles and store the result in the column Sample vehicles.
Minitab has now randomly selected 100 of the 5000 vehicles, and the first one it selected is number 492. Of course, if you ran this yourself, or if I ran it again, the numbers would be different, because the selection is random. To summarize, we have seen that sampling is sometimes necessary, and that when you sample, it is important to select a representative sample. One way to obtain a representative sample is to select the items randomly. However, never make this selection by hand; always use a computer to make the random selection.
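For readers without Minitab, the two steps of the Minitab walkthrough can be sketched in Python with only the standard library. This is an illustrative equivalent, not Minitab's own procedure: `range(1, 5001)` plays the role of the numbered Population vehicles column, and `random.sample` draws 100 distinct vehicle numbers without replacement, like the Sample vehicles column. The variable names are chosen here to mirror the column names.

```python
import random

POPULATION_SIZE = 5000  # vehicles expected to pass by today (from the example)
SAMPLE_SIZE = 100       # number of vehicles we want to measure

# Step 1: the "Population vehicles" column, numbered 1, 2, ..., 5000 in steps of 1.
population_vehicles = list(range(1, POPULATION_SIZE + 1))

# Step 2: the "Sample vehicles" column: 100 distinct vehicle numbers drawn at random,
# without replacement, so every vehicle has the same chance of being selected.
sample_vehicles = random.sample(population_vehicles, SAMPLE_SIZE)

print("first selected vehicle:", sample_vehicles[0])
print("selected vehicles, sorted:", sorted(sample_vehicles))
```

Just as in Minitab, each run gives different vehicle numbers. If you need to document or reproduce exactly which vehicles were selected, you can call `random.seed(...)` with a fixed value before drawing.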