In this video, you will learn to calculate probabilities and percentages using a tool called the empirical CDF. In your Lean Six Sigma project, you will use this tool in the analyze phase. The empirical CDF is used to determine what percentage of the cases meets your service level agreement or your specifications.

Okay, let's have a look at an example to explain this. Imagine you run a project in a call center and you want to improve the total handling time. Your selected CTQ, or your Y variable, is the total handling time, or THT. This example was also discussed in the video on probability plots. For the project you wonder: how often do my employees handle a call within 240 seconds, which is your target? What would you do to answer a question like this?

Let's have a look at the data. So how do we compute how many calls were handled within 240 seconds? We can start by sorting the data from small to large, and then we can count how many calls were handled within 240 seconds. That comes to 34 calls, which equals 60%. However, a percentage based on counting is only precise if you have a large data set. As explained in the video on numerical and categorical data, we are basically considering binary data: is a value below or above the 240 seconds? And for precise estimation with categorical data, we need larger data sets. The rule of thumb was at least 300 observations in your sample. So we will need to do something different to answer this question. I will show you how to use the empirical CDF to compute an accurate percentage.

The histogram of our data already showed us that the total handling time follows a skewed distribution. And with the probability plots we discovered that THT is lognormally distributed. Now pause the movie and load the THT data into Minitab before you continue. I have copied and pasted the total handling time data into Minitab, into the first column here. Let's make an empirical CDF. You can find it under Graph, and you can find it here: the Empirical CDF.
Then we have a single variable. Okay. And of course we want to make the graph for the total handling time. That's it, okay. Now Minitab has made for us an empirical CDF of the total handling time with respect to the normal distribution. However, we know that total handling time is lognormally distributed, so let's make it for a lognormal distribution. You can do this by editing your last dialog, then going to Distribution, and now choosing the Lognormal distribution. Okay.

Let's first take a look at the empirical CDF with a normal distribution. It looks like this. You have the THT on the horizontal axis and the percentage of calls on the vertical axis. Furthermore, there is a blue and a red curve. The blue curve represents the data, the sample. And the red curve represents the probability distribution, which is a model for the population. However, you see that the lines do not perfectly overlap. Let's check the lognormal empirical CDF. It looks like this. You see that the curves overlap almost perfectly.

Now let's use the curve in Minitab to calculate the probabilities that we were interested in. This is the empirical CDF of the total handling time based on a lognormal distribution. Now click on the graph, click your right mouse button, and go to Crosshairs. This gives you a cross to navigate your graph. We were wondering how many calls we handle within our target of 240 seconds. Well, you can go to 240 seconds and then go to the red line, which is the prediction model. In the top left you get the exact coordinates: within 240 seconds, 54% of the calls are handled. Okay, well, you can also wonder, say, how many calls do we handle within 600 seconds? Well, you go to 600, you go to the red line, and you see that 600 is about 93%. Alternatively, we can turn it around and ask: what is the service level that I can promise my client in 95% of the calls?
Well, for that you go to 95%, you go to the line, and you say: in 95% of the calls I can promise you we will be done within 683 seconds. If you want this a little more precise, you can click your right mouse button again, go to Add, and go to Percentile Lines. You can say: show me the percentile line at the Y value of 95%, which was what we were interested in. And before that we said: I want to know what percentage of my calls is handled within 240 seconds. Okay, and now Minitab computes this exactly for you based on the red curve. So within 240 seconds we have 53.8% of the calls, and in 95% of the cases we can promise to be done in 661 seconds.

In the previous video I showed you how to use the probability plot to find the distribution of your CTQ. When you know this distribution, you can use the empirical CDF to calculate percentages. You can use the crosshair function or the percentile lines to find these percentages. The empirical CDF is a useful tool in the analyze phase of your Lean Six Sigma project.
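
For readers who want to reproduce the idea outside Minitab, here is a minimal Python sketch of the two calculations discussed above: the percentage obtained by counting (the blue curve) and the percentage and percentile read from a fitted lognormal model (the red curve). It makes several assumptions. The THT sample from the video is not reproduced here, so the data are simulated with arbitrary parameters and an arbitrary sample size. The function names are hypothetical. The lognormal is fitted simply from the mean and sample standard deviation of the log values, which may differ slightly from the estimates Minitab uses. The numbers it prints will therefore not match the ones in the video.

```python
import math
import random
from statistics import NormalDist, fmean, stdev

# ASSUMPTION: simulated stand-in data, not the THT sample from the video.
# The parameters (5.4, 0.6) and the sample size (60) are arbitrary.
random.seed(1)
tht = [random.lognormvariate(5.4, 0.6) for _ in range(60)]


def empirical_fraction(data, limit):
    """Fraction of observations at or below `limit`, obtained by counting."""
    return sum(x <= limit for x in data) / len(data)


def fit_lognormal(data):
    """Fit a lognormal by modelling log(data) as a normal distribution."""
    logs = [math.log(x) for x in data]
    return NormalDist(fmean(logs), stdev(logs))


def fitted_fraction(log_model, limit):
    """P(X <= limit) under the fitted lognormal model."""
    return log_model.cdf(math.log(limit))


def fitted_percentile(log_model, p):
    """Value below which a fraction `p` of the cases falls under the model."""
    return math.exp(log_model.inv_cdf(p))


model = fit_lognormal(tht)
print(f"Counted share within 240 s: {empirical_fraction(tht, 240):.1%}")
print(f"Fitted share within 240 s:  {fitted_fraction(model, 240):.1%}")
print(f"Fitted 95th percentile:     {fitted_percentile(model, 0.95):.0f} s")
```

The `fitted_fraction` call plays the role of reading the red curve at 240 seconds with the crosshairs, and `fitted_percentile` plays the role of a percentile line at 95%.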