Hi, my name is Brian Caffo. I'm in the Department of Biostatistics at The Johns Hopkins Bloomberg School of Public Health, and this is Mathematical Biostatistics Boot Camp, lecture eight, on asymptotics. In this lecture, we're going to take a trip to Asymptopia. Asymptopia is a land where we have an infinite amount of data, so it should be fun. We're going to talk about limits, but limits of random variables, and there are intricacies that you have to account for when you consider random variables instead of standard mathematical limits. It's a quite difficult subject, but we're going to show that we have basically two tools, the Law of Large Numbers and the Central Limit Theorem, that are going to be our primary methods for studying limits of random variables.

So let me just review numerical limits first. I'm not going to go into too much detail; I'm going to treat it heuristically, just to illustrate, in case your asymptotics are a little bit rusty. Suppose I had a sequence where A1 was .9, A2 was .99, and A3 was .999, and I hope you can see the pattern at this point. Clearly this sequence, in some sense of the word, converges to one: the terms .9, .99, .999, and so on get closer and closer to one as the index of the sequence gets larger and larger. Well, we can formalize this. The formal definition of a limit is that for any fixed distance, we can find a point in the sequence such that the sequence is closer to the limit than that distance from that point on. Take, for example, this sequence. The distance between An and one is ten to the minus n, where n is the position in the sequence. So if we pick any distance, say epsilon, we can find an n so that ten to the minus n is smaller than epsilon. And because ten to the minus n just keeps getting smaller as n gets larger, the sequence stays within epsilon of one from that point onward. So that satisfies our definition of a limit, and clearly this sequence converges to one. It's kind of an interesting fact that an infinite sequence of .9s and one are the same number; if you ever take a class on real analysis, they'll discuss that sort of thing at length. But anyway, I hope you get the basic sense of a limit: as n goes to infinity, the sequence looks more and more like its limit, so the distance to the limit just gets smaller and smaller and never gets bigger again.

The problem is that this definition only works for a sequence of numbers. Now we want to talk about, say, limits of averages of coin flips, and that gets much harder. So take, for example, an average comprised of n observations. Let's call it Xn bar: instead of writing X bar like we typically do for our average, we're going to attach a subscript n to show that it's the average of the first n of a collection of IID observations. So, for example, Xn bar could be the average of the results of n coin flips, which is the sample proportion of heads. Well, there is a limit theorem for averages, and we say that Xn bar converges in probability to a limit. We relate this back to the ordinary definition of a limit by saying that the probability that Xn bar is closer to the limit than any specific distance converges to one. So in this case, the probability that Xn bar minus the limit is less than any quantity epsilon that you fix, in absolute value, is just a number; call it pn.
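In symbols, the two objects we just described look like this. This is only the discussion above written down, using mu for the limit of the averages; nothing new is being asserted.

```latex
\[
a_n \to a \quad\text{means: for any fixed } \epsilon > 0 \text{ there is an } N
\text{ such that } |a_n - a| < \epsilon \text{ for all } n \ge N .
\]
\[
p_n \;=\; P\bigl(\,|\bar X_n - \mu| < \epsilon\,\bigr), \qquad n = 1, 2, 3, \ldots
\]
```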
And the definition of convergence in probability is that that sequence of numbers, pn, converges to one. So we've converted the problem of what it means for a random variable to converge back to the definition of convergence of a sequence of numbers: convergence in probability means that the collection of probabilities pn converges to one in the standard sense of the definition of convergence. So now we have a way of talking about how random variables converge.

Now, establishing that a sequence of random variables actually converges is hard, as you can imagine; if you look back at the previous definition, it's not the easiest thing in the world to work with directly. So we have something that makes it a lot easier for us, and that's called the Law of Large Numbers. If you've heard people talk about the law of averages, typically they don't know exactly what they're talking about, but probably they are referring to the Law of Large Numbers. It basically says that if X1 through Xn are IID from a population with mean mu and variance sigma squared, then the sample average of those IID observations converges in probability to mu. So in this case we are assuming that the random variables have a variance; it's interesting to note that there are distributions with no variance, where you try to calculate the variance and you get infinity or something like that. Here we assume the variance exists and is finite. That result is called the Law of Large Numbers. Probabilists make a lot of distinctions among various kinds of laws of large numbers, and they've worked very hard to get minimal assumptions under which a law of large numbers holds. In fact, we're using a very lazy version of the law of large numbers; they would probably be upset with us for teaching this one, but that's okay, we don't care. The basic idea I want you to get is that averages of IID observations converge to mu, the population mean of the distribution from which the observations were drawn. This is a good thing, right? It basically says that if we go to the trouble of collecting an infinite amount of data, then we get the number we want to estimate, mu, exactly. Which is good, because collecting an infinite amount of data takes a lot of time; actually, an infinite amount of time.

If you're willing to assume that the Xs all have a finite variance, it's pretty easy to prove the law of large numbers using Chebyshev's inequality, which, if you remember, itself had a pretty simple little proof. So it's amazing that this rather deep idea has such a short argument behind it. Remember that Chebyshev's inequality states that the probability that a random variable is k or more standard deviations from its mean is less than or equal to one over k squared. Therefore the probability that Xn bar minus mu, in absolute value, is bigger than or equal to k standard deviations of Xn bar is less than or equal to one over k squared. Now, pick any distance epsilon. Remember, to establish convergence of a limit of numbers we have to pick an epsilon, and to establish convergence in probability we have to show that the probability that Xn bar minus mu is bigger than epsilon goes to zero, or that the probability that it is less than epsilon goes to one; those two statements are equivalent.
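Here is the statement we are proving and the inequality we are about to manipulate, written out as a sketch in symbols (the arrow notation assumes amsmath); the standard deviation of Xn bar, sigma over root n, was computed in an earlier lecture.

```latex
\[
X_1, \ldots, X_n \ \text{IID with } E[X_i] = \mu,\ \operatorname{Var}(X_i) = \sigma^2 < \infty
\quad\Longrightarrow\quad
\bar X_n \xrightarrow{P} \mu .
\]
\[
\text{Chebyshev: }\quad
P\Bigl(\,|\bar X_n - \mu| \ge k \,\mathrm{SD}(\bar X_n)\Bigr) \le \frac{1}{k^2},
\qquad \mathrm{SD}(\bar X_n) = \frac{\sigma}{\sqrt{n}} .
\]
```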
And so, in that inequality, let's let k be epsilon divided by the standard deviation of Xn bar. k is not a random variable, right? Epsilon is a number that we pick, and while Xn bar is a random variable, its standard deviation is just sigma over square root n. So k is just a number; there's nothing random in our definition of k. If you plug that back in for k, you get that the probability that Xn bar minus mu is bigger than epsilon in absolute value is less than or equal to the standard deviation of Xn bar squared, which is the variance of Xn bar, divided by epsilon squared. And from a previous lecture we already calculated the variance of Xn bar: it's sigma squared over n. So this probability is less than or equal to sigma squared over n epsilon squared. Now, as n goes to infinity, sigma squared isn't changing and epsilon squared isn't changing, and the n is in the denominator, so this whole thing goes to zero. So the probability that the random average Xn bar is more than epsilon away from the mean goes to zero as n goes to infinity, or, stated the other way, the probability that Xn bar is within epsilon of the mean goes to one as n goes to infinity. Either of those statements equivalently says that Xn bar converges in probability to mu. I think it's kind of staggering that basically two lines are all you need to establish this fairly deep result.

Now on the next page I have a simple example where I simulated random normals with a mean of zero and show the cumulative sample mean. So I took one random normal; then I generated a second random normal and averaged it with the first; then I generated a third and averaged it in with the remainder; and so on. The iteration number along the bottom is the number of observations that goes into that mean, and the vertical axis shows the value of the average. You can see that at first there's quite a bit of variability; remember, the variance of the average is sigma squared over n, so the variability is going to zero, and the dashed line is the asymptote, the true mean of zero. As we include more observations in the average, it converges toward that line; you can already see it converging a bit by 100 iterations. That's simply the law of large numbers. I'll sketch some code for that kind of simulation at the end of this discussion.

So let's cover some useful facts about the law of large numbers. One interesting fact is that functions of convergent random sequences converge to the function evaluated at the limit; this includes sums, products, and differences. So, for example, Xn bar squared converges to mu squared, because Xn bar converges to mu and the square is just a function of Xn bar. Something different is that the average of the squared observations converges to a different quantity. Let's go through this a little carefully, because it's kind of a subtle point. Xn bar squared converges to mu squared, but if we sum up the Xi squared, the individual observations squared, and divide by n, that no longer converges to mu squared. Why not? Well, it's the difference between the square of the average and the average of the squares, and here we have the average of the squares. Each Xi squared is itself a random variable, so we could just call it Yi instead of Xi squared, and then the average of these Yis is going to converge to the population mean of the Yis.
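Here is a minimal sketch, in Python rather than whatever was used for the original figure, of the kind of simulation just described, plus a quick numerical check of the square-of-the-average versus average-of-the-squares point. The choice of standard normals (so mu is zero and sigma squared is one), the sample sizes, and the seed are all arbitrary choices of mine.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed, just for reproducibility

# Cumulative sample means of normals with mean 0 (standard normals here):
# the k-th entry is the average of the first k draws.
n = 100
x = rng.standard_normal(n)
cumulative_means = np.cumsum(x) / np.arange(1, n + 1)
print("average of the first 10 draws:", cumulative_means[9])
print("average of all 100 draws:     ", cumulative_means[-1])

# Square of the average versus average of the squares, on a larger sample.
# The first settles near mu^2 = 0; the second settles near a different
# number, which the next part of the lecture works out (it is 1 here).
big = rng.standard_normal(100_000)
print("square of the average:  ", np.mean(big) ** 2)
print("average of the squares: ", np.mean(big ** 2))
```

What that different limit for the average of the squares actually is gets worked out next.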
Well, we can calculate what that population mean of the Yis is. We know the expected value of Xi squared, because we can use the shortcut formula for the variance, which, recall, was the expected value of Xi squared minus the expected value of Xi, quantity squared. We can just rearrange that formula to solve for the expected value of Xi squared and show that it's equal to sigma squared plus mu squared. So the average of the squared observations converges to sigma squared plus mu squared, whereas the square of the average converges to mu squared. It's kind of an interesting little point, but just remember that those two things are different. And by the way, this little fact that we just showed can be used to prove that the sample variance converges to sigma squared, and we'll do that on the next slide.

So let's actually go through this proof that the sample variance converges to sigma squared. I think you'll see in the process of the proof that it doesn't matter whether we divide by n or n minus one; it converges to sigma squared either way. Here we have the definition of the sample variance: the summation of Xi minus Xn bar, quantity squared, all divided by n minus one; we're using the unbiased version of the sample variance. Well, recall there was a shortcut formula for the sample variance, where the numerator worked out to be summation Xi squared minus n times Xn bar squared. So we're going to use that formula, and then we get summation Xi squared over n minus one, minus n times Xn bar squared over n minus one. Let's just rearrange terms and multiply and divide by some n's, because that n minus one is a little annoying. So we have n over n minus one, times summation Xi squared over n, minus n over n minus one times Xn bar squared. Let's look at each of these pieces in turn, and remember the rule from the previous slide, which I didn't prove and you just have to take as true: if you multiply convergent sequences, they converge to the product of the limits; if you add or subtract convergent sequences, they converge to the sum or difference of the limits; and so on. So let's look at each of these terms one at a time. n over n minus one clearly converges to one. If you don't believe me, plug in n over n minus one for a very big value of n on your computer and you'll see that it gets closer and closer to one. Probably the easiest way to see it is that it equals one over, one minus one over n, and that one over n clearly goes to zero. On the previous slide we just talked about how summation Xi squared over n converges to sigma squared plus mu squared, so that factor converges to sigma squared plus mu squared. Then we have minus n over n minus one, which again converges to one, times Xn bar squared, which converges to mu squared; we talked about that on the previous slide as well. So we wind up with sigma squared plus mu squared minus mu squared, which is just sigma squared. That proves that the sample variance converges to sigma squared, and then of course the biased sample variance, where we divide by n instead of n minus one, also converges to sigma squared. And then we can take the square root of the sample variance to get the sample standard deviation and see that it converges to sigma as well, just by the rule from the previous page that functions, in this case the square root function, of convergent random variables converge to the function of the limit.
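To collect the algebra of that argument in one place, here it is written compactly, writing Sn squared for the unbiased sample variance; this is just the derivation above, nothing new.

```latex
\[
S_n^2
= \frac{\sum_{i=1}^n \bigl(X_i - \bar X_n\bigr)^2}{n-1}
= \frac{\sum_{i=1}^n X_i^2 \;-\; n \bar X_n^2}{n-1}
= \underbrace{\frac{n}{n-1}}_{\to\, 1}\,
  \underbrace{\frac{\sum_{i=1}^n X_i^2}{n}}_{\to\, \sigma^2 + \mu^2}
  \;-\;
  \underbrace{\frac{n}{n-1}}_{\to\, 1}\,
  \underbrace{\bar X_n^{\,2}}_{\to\, \mu^2}
\;\xrightarrow{P}\; \sigma^2 + \mu^2 - \mu^2 = \sigma^2 ,
\]
\[
\text{and therefore } S_n = \sqrt{S_n^2} \xrightarrow{P} \sigma .
\]
```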
So what we've found is that, with our law of large numbers and the couple of rules we just stipulated, the sample mean of IID random variables converges to the population mean that it's trying to estimate, the sample variance converges to the population variance that it's trying to estimate, and the sample standard deviation converges to the population standard deviation that it's trying to estimate. In all of these cases you see the pattern that the sample quantity converges to the population quantity that it's trying to estimate. That's basically saying that if you go to the trouble of collecting an infinite amount of data, then you actually get the value you want to estimate; you don't get it with noise, you get the actual value. We give this property a name: we say that an estimator is consistent if it converges to what you want to estimate. The Law of Large Numbers is basically saying that the sample mean is consistent, and we now know that the sample variance and the sample standard deviation are consistent too, and it doesn't matter whether you divide by n or n minus one, they're all consistent. Also remember that the sample mean and the sample variance are unbiased as well; the sample standard deviation, by the way, is not unbiased, unlike the sample mean and the sample variance.

Consistency, by the way, is a very weak property. Saying that an estimator is consistent is not even really a necessary property. It seems like it should be necessary, but if something converges to mu plus epsilon, where epsilon is a minuscule number of no practical importance, then that estimator is not consistent, and yet it could still be perfectly useful. So it's fair to say that consistency is a sort of weakly necessary but definitely not sufficient property for an estimator to be useful. We have also seen that being unbiased is neither necessary nor sufficient for an estimator to be useful; for example, we've talked about the bias-variance trade-off, where estimators can be slightly biased and you can want that, because you improve on the variance. So what we're winding up with is a collection of properties that describe estimators, and you really need to think about that collection of properties as a whole to evaluate an estimator. These various mathematical concepts are useful, but they never, in isolation, tell the full story about the utility of an estimator. They might be useful for eliminating really dumb things: if something is not consistent, in the sense that it doesn't converge anywhere near what you're trying to estimate, probably that's not something you want to use. But apart from those kinds of stark circumstances, you need to take these properties as a collection to decide which estimators are the right ones to use.

So let me give an example of an estimator that's consistent but not very good. Take the data and only use the first half of the collected observations. So instead of Xn bar we have Xn over two bar. That estimator is, of course, consistent as n goes to infinity; it just has fewer observations than if you took all of them, but the number of observations still goes to infinity, it's just n over two. So that estimator is consistent, but there's an obviously better estimator right in front of you, namely the estimator that uses all of the data.
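As a minimal sketch of that last point (my own illustration, assuming standard normal data; the sample sizes and seed are arbitrary), here is a simulation comparing the half-data mean to the full-data mean. Both are centered on the true mean, but the full-data mean has roughly half the variance.

```python
import numpy as np

rng = np.random.default_rng(7)  # arbitrary seed

n_obs = 1000      # observations per simulated data set
n_sims = 10_000   # number of simulated data sets

# Standard normal data: true mean 0, true variance 1.
data = rng.standard_normal((n_sims, n_obs))

half_data_means = data[:, : n_obs // 2].mean(axis=1)  # uses only the first half
full_data_means = data.mean(axis=1)                   # uses all of the data

# Both estimators center on the true mean (both are consistent), but the
# full-data mean is less variable: roughly sigma^2 / n versus 2 * sigma^2 / n.
print("mean of half-data estimates:    ", half_data_means.mean())
print("mean of full-data estimates:    ", full_data_means.mean())
print("variance of half-data estimates:", half_data_means.var())
print("variance of full-data estimates:", full_data_means.var())
```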
So that's a particular example of an estimator that is consistent, but where there's an obviously better estimator that comes to mind. Here you have to actually account for the fact that the estimate using all of the data has a lower variance than the estimate using half of the data. So that's enough discussion of limits and the law of large numbers. Next we're going to go on to the central limit theorem, a very important theorem in statistics.