There are many times when we are interested in making inferences about a population proportion. For example, what proportion of your customers would recommend your business to their friends? What proportion of voters will vote for candidate A? What proportion of American Airlines flights leave on time? When you ask questions like this, there are only two possible outcomes. For instance, someone either recommends the business or not, they vote for the candidate or not, or the flight leaves on time or not. We can use sample studies to make inferences about the population proportion, just the same way we did for the mean. The idea is the same; only the math is different. So let's explore this now. This jar of marbles can represent the two possibilities when we look at a proportion. For instance, the red marbles could be all the voters who will vote for candidate A, and the blue marbles are those who won't. If this jar were the population of all voters, how would I know before election day what percentage will vote for candidate A? This is exactly what political polling is all about. We take a sample and then use that information to make an inference about the population at large. A randomly selected sample is used to draw conclusions about the population, and that is part of the statistical inference process. But because samples are drawn randomly, all forms of statistical inference are subject to a degree of uncertainty. And this is where our confidence interval becomes important. To develop the confidence interval, we need to state the level of confidence, which is 1 - alpha, and then find the sample proportion, denoted by p-hat. Once you have these two, the confidence interval for the population proportion is calculated as the sample proportion plus or minus z of alpha over 2 times the standard error. As was the case for inferring about the mean, we are adding and subtracting the margin of error from our sample proportion.
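Written out in symbols, the operation described here is the usual large-sample interval for a proportion. The symbol n for the sample size is an assumption added for this sketch; the transcript does not name it at this point.

```latex
\hat{p} \;\pm\; z_{\alpha/2}\,\sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}}
```

The square-root term is the standard error of the sample proportion, and the whole quantity after the plus-or-minus sign is the margin of error.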
Here, z of alpha over 2 is the number of standard errors we are away from the center of the sampling distribution. Each day, Gallup asks a randomly selected sample of Americans many different questions and tracks their answers. One question asks their opinion on whether local job market conditions have improved. From March 31st to April 2nd of 2016, Gallup interviewed a random sample of 1,526 adults living in all 50 states as well as the District of Columbia. There were 779 adults who thought market conditions were good. What is the 95% confidence interval for the proportion of all adults who think the market condition is good? First, there are 1,526 people in our sample. Then, 51% is the sample proportion who believe the job market is good, that is, 779 out of 1,526. The confidence level is 95%, which means there is a 5% chance that our sample is not a good sample, but a 95% chance that the interval contains the true population proportion. The margin of error is the value of z of alpha over 2 times the standard error; in this case, it is 0.025, or 2.5%. So based on these calculations, you get a confidence interval for the population proportion that runs from 48.5% to 53.5%. It is really important to understand what 95% confidence means. It goes back to the central limit theorem, which shows that if we keep taking samples from our population, each sample will have an average, or proportion, slightly different from another sample. That is one reason why you get different results from various political polls. But if you plot these sample means, or sample proportions, you get a normal distribution centered around, in the Gallup case, 0.51. And the spread is based on the standard error, so we go out in each direction up to plus or minus three standard errors to get this spread. So at a 95% confidence level we are roughly at two standard errors.
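The Gallup calculation from this passage can be reproduced with a short Python sketch. It assumes the normal-approximation interval (sample proportion plus or minus z of alpha over 2 times the standard error); the function name `proportion_ci` and its parameters are made up for illustration and are not from the lecture.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation confidence interval for a population proportion.

    Hypothetical helper for illustration only.
    Returns (p_hat, margin_of_error, lower_bound, upper_bound).
    """
    p_hat = successes / n                      # sample proportion
    alpha = 1 - confidence                     # e.g. 0.05 for 95% confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)    # critical value z_{alpha/2}
    std_error = sqrt(p_hat * (1 - p_hat) / n)  # standard error of p_hat
    margin = z * std_error                     # margin of error
    return p_hat, margin, p_hat - margin, p_hat + margin


# Gallup example: 779 of 1,526 adults said job market conditions were good.
p_hat, margin, low, high = proportion_ci(779, 1526, confidence=0.95)
print(f"p-hat = {p_hat:.3f}, margin of error = {margin:.3f}")
print(f"95% interval: {low:.3f} to {high:.3f}")
```

With these inputs the margin of error comes out at roughly 0.025, in line with the 2.5% quoted in the lecture. The endpoints can differ from the lecture's in the third decimal place because the lecture rounds p-hat to 0.51 before adding and subtracting.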
This means the confidence interval for the population proportion lies within these two boundaries, which, based on our calculation, were 0.485 and 0.535. Now if you take another sample, you will have a different result. But a 95% confidence level implies that if you take a total of 100 samples and build an interval from each one, about 95 of those intervals will contain the true population proportion. However, five samples or so will result in proportions in the tails, and that is why we are sometimes wrong. In this case, there is roughly a 5% chance of getting a non-representative sample, which will mislead us. Here you see results for 95% confidence intervals when we took a total of 100 samples. For this simulation, I assumed that the actual population proportion was 0.2 and that it was known; it is shown as the solid black line. Then I took 100 samples of 100 observations each. In this particular simulation, we ended up with 6 samples, shown in red, for which the 95% confidence interval built from the sample information would have given us the wrong impression of the population proportion. All the green bars represent the 95% confidence intervals for the other 94 samples, each of which would have included the actual population proportion. You can watch my illustration video later on, where you can see me develop this graph. Z of alpha over 2 is the multiplier that represents the level of confidence desired, also known as the critical value. It is how many standard errors we are away from the mean of the sampling distribution. This multiplier gets larger as we increase the confidence level, thus resulting in a wider confidence interval. Going back to the Gallup example, we found the 95% confidence interval, and now I want you to find the 99% confidence interval. As before, the sample proportion p-hat is 51%.
Now, the confidence level is 99%, which means there is a 1% chance that our sample is not a good representative sample, but a 99% chance that the interval contains the true population proportion. The margin of error, which is the value of z of alpha over 2 times the standard error, will be 0.033, or 3.3%. So based on these calculations, we get a confidence interval for the population proportion that runs from 47.7% to 54.3%. This is a wider interval than what we got when we used a 95% confidence level, which gave us 48.5% to 53.5%. Let's look at what changing the confidence level does to the width of the confidence interval. First, we do the experiment using 100 samples of 100 observations each, using a confidence level of 90% and then 99%. As you can see here, when we switch to a 99% confidence level, the interval gets wider. Look at the green bars for 90% versus 99%. So this wider interval is also more likely to contain the true population parameter. Again, there is only one red bar for the 99% confidence interval, which is the only sample that would have missed the true population proportion. However, while we are becoming more accurate, we are doing so by becoming less precise. In other words, the margin of error is larger for the 99% confidence interval than for the 90% one. So there is a trade-off here. To get more precise, we can increase the sample size. Looking at the 99% confidence level, comparing samples of 100 observations each with samples of 1,000 observations each illustrates this quite well. As you see in the bottom graph, with a sample size of 1,000, the interval is much narrower than it is in the top graph, where the sample size is 100. So the increased sample size has given us more precision. Once again, we can determine the right sample size for the level of accuracy and precision we desire.
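The simulations described in the lecture (100 samples drawn from a population whose true proportion is 0.2) can be approximated with the sketch below. This is not the lecturer's own code: the function name `simulate_intervals`, the fixed seed, and the use of the normal-approximation interval are assumptions made for illustration, so the exact number of missed intervals will not match the graphs in the video.

```python
import random
from math import sqrt
from statistics import NormalDist


def simulate_intervals(true_p, n, confidence, num_samples=100, seed=0):
    """Draw num_samples random samples of size n and build one interval each.

    Hypothetical helper for illustration only.
    Returns (coverage, average_width), where coverage is the fraction of
    intervals that contain true_p.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs give the same draws
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # critical value
    hits = 0
    total_width = 0.0
    for _ in range(num_samples):
        # Each observation is a "success" with probability true_p.
        successes = sum(rng.random() < true_p for _ in range(n))
        p_hat = successes / n
        margin = z * sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= true_p <= p_hat + margin:
            hits += 1  # this interval would be a "green bar"
        total_width += 2 * margin
    return hits / num_samples, total_width / num_samples


# Compare confidence levels at n = 100, then sample sizes at 99% confidence.
for confidence, n in [(0.90, 100), (0.99, 100), (0.99, 1000)]:
    coverage, width = simulate_intervals(0.2, n, confidence)
    print(f"confidence={confidence:.2f}, n={n}: "
          f"coverage={coverage:.2f}, average width={width:.3f}")
```

Because the same seed is reused, the 90% and 99% runs at n = 100 are built from identical samples, so the only thing that changes between them is the multiplier. The average width grows when the confidence level rises and shrinks when the sample size rises, which is the accuracy versus precision trade-off discussed above.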
Now that you understand how to develop a confidence interval for the mean and for the population proportion, we can turn our attention to how to improve our interval estimate. As you have seen, that is closely related to sample size. In the next lesson, we will explore this topic in detail.