So, this problem is particularly useful for motivating so-called Bayesian analysis. We've spent a lot of time in this class talking about frequentist analysis in the form of confidence intervals. And we've spent a fair amount of time talking about the likelihood, probably more time than most introductory statistics courses devote to it. So, we need to give at least some time to Bayesian statistics. Here's how Bayesian statistics works. Bayesians have to posit a prior on the parameter of interest. The prior is a density or mass function. It's a probability distribution on the parameter where the probabilities, at least in the classical Bayesian sense, represent our beliefs about that parameter. Then, the likelihood is the component of the Bayesian equation that depends on the data, the objective part. And the posterior we obtain as the likelihood times the prior. This is exactly like when we were thinking about diagnostic tests. We had, say, some prior belief that a person had the condition that the test was trying to diagnose. We had the data, which is the result of the test. And the posterior odds of the person having the disease, or whatever the test is testing, wound up being the likelihood ratio times the prior odds. It's the same sort of relationship here: posterior equals likelihood times prior. Now, I have to put a proportional-to sign here because it's not exactly equal; we're off by a constant of proportionality. But it's easiest to think of it this way: we take our likelihood, we multiply it by our prior, and we get our posterior. Bayesian statistics is a very neat and conceptually clean way to think about statistics. The rub is really in specifying the prior. That's where we get into trouble in Bayesian statistics. And we'll talk maybe a little bit about that.
But mostly, in this class, we're just going to talk about the mechanics of how you go about performing a Bayesian inference. Then you can take later classes to delve into the specifics of all the different ways in which Bayesians can think about doing analysis. So, let's talk about how we can specify a prior for our binomial proportion. Remember, our binomial data is discrete. It can take only integer values between zero and n, but the proportion that we're trying to estimate is a number that we're going to treat as continuous. So, if we're going to specify a probability distribution on that parameter, it's going to have to be a continuous distribution. We need a continuous distribution that's bounded from below by zero and bounded from above by one. And ideally, it would be a nice distribution that's easy to work with. Well, there is one such distribution. It's called the beta distribution. The beta distribution winds up being kind of a default prior for binomial proportions. The beta density depends on two parameters, alpha and beta. Don't confuse the alpha here with the alpha earlier on in the lecture that was related to the coverage rate of the confidence interval. And the beta density looks like this. It involves the so-called gamma function: gamma of alpha plus beta divided by gamma of alpha times gamma of beta, and then p raised to the alpha minus one times one minus p raised to the beta minus one. Here, p is allowed to range between zero and one. This constant term out front, gamma of alpha plus beta divided by gamma of alpha times gamma of beta, is simply the constant you have to multiply by so that p to the alpha minus one times one minus p to the beta minus one integrates to one.
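The density just described can be checked numerically. What follows is a minimal sketch in Python, not anything from the slides: the function names `beta_pdf` and `integrate_midpoint`, the midpoint rule, and the test values alpha = 2 and beta = 5 are all illustrative assumptions.

```python
import math

def beta_pdf(p, alpha, beta):
    """Beta(alpha, beta) density at p, for 0 < p < 1."""
    # Constant out front: Gamma(alpha + beta) / (Gamma(alpha) * Gamma(beta)),
    # computed on the log scale so large parameters do not overflow.
    log_const = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    return math.exp(log_const
                    + (alpha - 1) * math.log(p)
                    + (beta - 1) * math.log(1 - p))

def integrate_midpoint(f, lo=0.0, hi=1.0, steps=100_000):
    """Midpoint rule; it never evaluates f at the end points 0 or 1."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width

# Illustrative parameters (an assumption, not values from the lecture).
area = integrate_midpoint(lambda p: beta_pdf(p, 2.0, 5.0))
print(area)  # should be very close to one
```

The midpoint rule is used only because it avoids the end points, where the density can be infinite for parameters below one; it is a rough check, not a proof that the density integrates to one.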
So, you had some problems very early on in the class where, if you had a kernel of a density, in this case p to the alpha minus one times one minus p to the beta minus one, that had a finite integral, what you had to do was divide that function by its integral over the whole range of values, and you get a proper density. That's exactly what people did to get the beta density. So, here is this density. It does integrate to one, though maybe it's a little bit beyond the scope of this class to verify that. So, let's talk about some of the properties of the beta density. The mean of the beta density is alpha over alpha plus beta. Remember, alpha and beta are positive, so alpha over alpha plus beta has to be a number between zero and one. So, we're good: the mean of the density lies in the range of values for which the density is greater than zero. The variance of the density works out to be alpha times beta, divided by the product of alpha plus beta squared and alpha plus beta plus one. And we've seen special cases of the beta density before. Take the special case when alpha equals beta equals one. Then p to the alpha minus one times one minus p to the beta minus one all just goes away, and this density is just a constant between zero and one. We may not know what gamma of alpha plus beta over gamma of alpha times gamma of beta is, but we don't need to, because we know that the density is constant between zero and one. It has to be exactly the uniform density. So, the uniform density is a special case of the beta density. Here, on the next slide, I plug in a bunch of different values of alpha and beta and show you the shape of the beta density. If I plug in alpha equal to beta equal to 0.5, then I get something that looks like a U shape. It heads off to infinity as p heads towards zero and towards one.
If alpha equals 0.5 and beta equals one, it looks like this shape right here. Then, as beta gets larger and larger, the rate at which it drops down to zero as p approaches one gets faster. And, of course, it just reverses itself if beta is 0.5 and alpha is one, or alpha is two. Again, here's the uniform distribution when alpha and beta are both one. If you plug in an alpha of one and a beta of two, you just get a line pointing downward. If you plug in an alpha of two and a beta of one, you get a line pointing upward. Probably the most typical-looking cases of the beta are when alpha and beta are both greater than one, and then you get a hump-shaped density. If they're equal, it's centered right at 0.5, and as alpha and beta get bigger and bigger it gets more peaked around 0.5. But by allowing alpha to be bigger than beta, or beta to be bigger than alpha, you can get a distribution with its mass pushed up towards one or towards zero, respectively. So, you can get quite a few shapes from the beta density by playing around with alpha and beta. If you're a Bayesian, what you need to do is pick values of alpha and beta so that the shape of the density represents your beliefs about the parameter p. Once you do that, you can start doing Bayesian analysis. So, here on the next slide, we need to choose values of alpha and beta so that the beta prior is indicative of our degree of belief regarding p in the absence of data. Then we're going to use the rule that the posterior is the likelihood times the prior. And again, because we're talking about constants of proportionality, we'll throw out anything that doesn't depend on p. So, in this case, the posterior is proportional to the likelihood, which is p to the x times one minus p to the n minus x, times the prior. Here, when I say proportional to, I mean proportional to as a function of the parameter p. So, p to the x times one minus p to the n minus x, that's the likelihood.
We throw out the binomial coefficient, n choose x, because that doesn't depend on p. Then we have the prior, p to the alpha minus one times one minus p to the beta minus one, and we throw out the ratio of gamma functions because that doesn't depend on p either. Now, we multiply those together and we get p to the x plus alpha minus one times one minus p to the n minus x plus beta minus one, and the posterior is a density that has this form. Now, we only know it's proportional to that. But look, this density that we see here is exactly a beta density, right? It's p raised to some power minus one times one minus p raised to another power minus one. In fact, the alpha is now x, the number of successes, plus the prior alpha, and the beta is n minus x, the number of failures, plus the prior beta. So, we could even tell you what ratio of gamma functions you would need to normalize the posterior and make it a proper density. But we don't need to do any calculations or integrals to do that. We can do it just by looking at it and saying, well, if I take a binomial likelihood and multiply it by a beta prior, then the resulting posterior has exactly the form of the kernel of a beta density, so I know it's a beta density. So, if the posterior is a beta density with parameters alpha tilde equal to x plus alpha and beta tilde equal to n minus x plus beta, we know lots of its properties. As an example, we know what the posterior mean is. So, what do I mean by posterior mean? The posterior is the distribution of the parameter given the data, right? The likelihood is the probability of the data given the parameter. The prior is the distribution of the parameter disregarding the data. So, the posterior winds up being the distribution of the parameter given the data. So, we can calculate, as an example, the expected value of the parameter p given the data.
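Written out in symbols, the update just described looks like the following. This is only a restatement of the spoken derivation in LaTeX; the notation \tilde{\alpha}, \tilde{\beta} for alpha tilde and beta tilde follows the lecture's naming, and the layout is an assumption.

```latex
\begin{align*}
\underbrace{f(p \mid x)}_{\text{posterior}}
  &\propto \underbrace{p^{x}(1-p)^{n-x}}_{\text{likelihood}}
     \times \underbrace{p^{\alpha-1}(1-p)^{\beta-1}}_{\text{prior}} \\
  &= p^{x+\alpha-1}(1-p)^{n-x+\beta-1},
  \qquad 0 < p < 1,
\end{align*}
so that
\[
p \mid x \sim \mathrm{Beta}(\tilde{\alpha}, \tilde{\beta}),
\qquad \tilde{\alpha} = x + \alpha,
\qquad \tilde{\beta} = n - x + \beta,
\]
with normalizing constant
\[
\frac{\Gamma(\tilde{\alpha} + \tilde{\beta})}{\Gamma(\tilde{\alpha})\,\Gamma(\tilde{\beta})}
= \frac{\Gamma(n + \alpha + \beta)}{\Gamma(x + \alpha)\,\Gamma(n - x + \beta)}.
\]
```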
And because p, in the posterior, has a beta density, its expected value given the data is just the expected value of a beta distribution, which we learned earlier is the alpha parameter divided by alpha plus beta. So, in this case, it's alpha tilde divided by alpha tilde plus beta tilde. Well, let's just plug in alpha tilde equal to x plus alpha and beta tilde equal to n minus x plus beta. Here, I do some manipulations and show that it works out to be x over n, times n divided by n plus alpha plus beta, plus alpha over alpha plus beta, times alpha plus beta divided by n plus alpha plus beta. Which is a mouthful, but let me go through each term. X over n is the sample proportion. It's the MLE, it's p hat. Let's take the second term, n over n plus alpha plus beta. That's a number that has to be between zero and one, because n is positive and alpha and beta are positive, so we have n divided by something that's bigger than n. So we have this number that's between zero and one; let's call it pi. Then, alpha over alpha plus beta is the prior mean. And this last term, alpha plus beta over n plus alpha plus beta, you can check yourself, is one minus pi. So, this equation works out to be an average of the MLE and the prior mean. Now, it's not an average in the sense that it's 0.5 on both things, right? It's a weighted average. Pi can be anything between zero and one, and one minus pi is the complement. But that's still an average. So, let me try to state that in English. The posterior mean is a weighted average of the MLE and the prior mean, a very specific kind of average that weights the MLE differently than the prior mean. So, let's look at these weights. Let's suppose n is really big. Then, what happens to n over n plus alpha plus beta?
Well, this term pi gets very close to one, and hence one minus pi gets very close to zero. So, when n is very big, this mixture weights the MLE a lot more than it weights the prior mean. In other words, as you collect more data, your prior matters less and the data matters more. What happens, on the other hand, as alpha and beta get very big and n remains constant? As alpha and beta get really big, alpha plus beta over n plus alpha plus beta, this one minus pi part, goes to one. So, one minus pi gets close to one, and pi gets very small. What does alpha and beta getting big mean in terms of our prior? Well, if you remember back from a couple of slides ago, as alpha and beta got bigger and bigger, the shape of the beta density got more concentrated around the mean. What that says is that our prior belief puts a lot more confidence in a specific value of p. And so, what that implies is that if we are incredibly certain in our prior, then that swamps the data, right? Our MLE has very little weight and our prior has a lot of weight. And this actually explains a lot of politics for you, for example. Your opinion is a mixture of the data and your prior beliefs. If you're immovable off your prior beliefs, then it doesn't matter how much data you collect, right? On the other hand, of course, if your alpha and beta are quite low, then the MLE dominates the posterior mean. So, let me just rehash this because it's an important point. The posterior mean is a mixture of the MLE, p hat, and the prior mean. Pi goes to one as n gets large. For large n, the data swamps the prior and the MLE dominates. For small n, the prior mean dominates. So, when you have very little information, you rely on your prior knowledge.
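The weighting argument above can be tried out with a few numbers. This is a minimal sketch in Python, assuming the lecture's running example of 13 successes in 20 trials; the function name `posterior_mean_weights` and the larger sample sizes (130 of 200, 1300 of 2000) are made up for illustration.

```python
def posterior_mean_weights(x, n, alpha, beta):
    """Return (pi, weighted average, direct posterior mean) for a Beta(alpha, beta) prior."""
    p_hat = x / n                               # MLE, the sample proportion
    prior_mean = alpha / (alpha + beta)
    pi = n / (n + alpha + beta)                 # weight placed on the MLE
    weighted = pi * p_hat + (1 - pi) * prior_mean
    direct = (x + alpha) / (n + alpha + beta)   # mean of Beta(x + alpha, n - x + beta)
    return pi, weighted, direct

# Same sample proportion (0.65) with growing n and a Beta(2, 2) prior:
# pi moves toward one and the posterior mean moves toward the MLE.
for x, n in [(13, 20), (130, 200), (1300, 2000)]:
    pi, weighted, direct = posterior_mean_weights(x, n, 2, 2)
    print(n, round(pi, 3), round(weighted, 4), round(direct, 4))

# Fixed data with a very confident prior (alpha = beta = 100):
# pi is small, so the prior mean of 0.5 dominates.
print(posterior_mean_weights(13, 20, 100, 100))
```

The last two values in each row agree, which is the point of the decomposition: the weighted average and the beta mean computed directly are the same number.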
The idea behind Bayesian statistics is that it should sort of mirror how science ideally works. As data becomes increasingly available, prior beliefs should matter less and less. And then, again, consider a prior that is degenerate at a value: as alpha and beta go to infinity, you wind up with a prior that puts 100% of its mass on a specific value of p. Then no amount of data can overcome that prior. So, let's also look at the posterior variance. The posterior variance takes a nifty form as well. So, let's look at the variance of p given the data. P in the absence of the data was beta with parameters alpha and beta. P given the data, via the Bayesian calculation, was also beta, with parameters alpha tilde and beta tilde. So, we can calculate the variance directly: it's the variance of a beta with alpha tilde and beta tilde plugged in for alpha and beta. Here, you see I plug in x plus alpha for alpha tilde and n minus x plus beta for beta tilde, and you get this form. So, let me let p tilde equal x plus alpha over n plus alpha plus beta, and n tilde equal n plus alpha plus beta. Then the variance of p given x works out to be p tilde times one minus p tilde, divided by n tilde plus one. Which is interesting, because it's not quite, but very similar to, the binomial variance of the sample proportion, that being p times one minus p over n. And so, the estimated version would be p hat times one minus p hat over n. So, it's an awful lot like that. It takes this very convenient form. And, in fact, let's go back to an earlier point. If alpha and beta were both two, then the posterior mean works out to be p tilde equal to x plus two divided by n plus four. And the posterior variance works out to be p tilde times one minus p tilde divided by n tilde plus one, where n tilde is n plus four. So, this is exactly the adjusted sample proportion that we used in the Agresti-Coull interval, and the posterior variance is almost the same as the variance used there, with the exception of this plus one.
So, what's a plus one among friends? We'll just say it's roughly the same variance as in the Agresti-Coull interval. So, this is one way to motivate the Agresti-Coull interval. It is centered at the posterior mean, and its standard error is not exactly, but almost, the posterior standard deviation. So, you could view it as a normal approximation to a posterior interval. That's one way to motivate the Agresti-Coull interval: just set alpha and beta equal to two in a Bayesian analysis, and you get something that's very, very similar. So, let's go back to our previous example and just do some of the Bayesian calculations. Let's say x = 13 and n = 20. Now, let's consider a uniform prior, alpha equal to beta equal to one. In that case, the prior is just one, a constant between zero and one. What's interesting about the uniform prior in this case is that the posterior is equal to the likelihood, up to normalization, right? Because you have posterior equals likelihood times prior, and in this case the prior is just the constant one. So, the posterior equals the likelihood. Now, you can't always get away with doing this. This is particular to the fact that the parameter we're interested in is bounded between zero and one. For example, if your parameter could be anything between minus infinity and plus infinity, you can't put a prior of one on that and have a finite integral. Now, people have looked into that, actually, and they've said, well, maybe you can do it anyway. That's for later classes. For this class, it's nice to note that if we set alpha equal to beta equal to one, we get a proper density, exactly the uniform density, and our posterior is exactly proportional to the likelihood, which is interesting. If instead we were to set alpha equal to beta equal to two, and remember this prior just looks like a hump right at 0.5, then the posterior is proportional to p to the x plus one times one minus p to the n minus x plus one.
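For the example with x = 13 and n = 20, the posterior parameters, mean, and variance under a few priors can be tabulated directly. The following is a minimal Python sketch; the helper names (`beta_posterior`, `beta_mean`, `beta_variance`) and the dictionary labels are hypothetical, while the three priors and the formulas are the ones stated in the lecture.

```python
def beta_posterior(x, n, alpha, beta):
    """Posterior Beta parameters after x successes in n trials."""
    return x + alpha, n - x + beta

def beta_mean(a, b):
    return a / (a + b)

def beta_variance(a, b):
    return a * b / ((a + b) ** 2 * (a + b + 1))

x, n = 13, 20
priors = {
    "Jeffreys (0.5, 0.5)": (0.5, 0.5),
    "uniform (1, 1)": (1.0, 1.0),
    "hump (2, 2)": (2.0, 2.0),
}
for label, (alpha, beta) in priors.items():
    a_post, b_post = beta_posterior(x, n, alpha, beta)
    p_tilde = beta_mean(a_post, b_post)
    n_tilde = n + alpha + beta
    # The lecture's shortcut form for the posterior variance.
    shortcut = p_tilde * (1 - p_tilde) / (n_tilde + 1)
    print(label, (a_post, b_post), round(p_tilde, 4),
          round(beta_variance(a_post, b_post), 5), round(shortcut, 5))
```

With alpha and beta both two, the posterior mean is (13 + 2) / (20 + 4) = 0.625, which is the center of the Agresti-Coull interval as described above.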
And so, the very classical way to do Bayesian analysis is to say that the prior is governed by expert knowledge, and the likelihood is the objective part that's governed by the data. Of course, to say that it's the objective part is a little bit misleading, because someone had to subjectively elect to model the data as if it's binomial. So, there is, of course, a subjective part to the likelihood itself. But let's put that aside. We have the supposedly objective part in the likelihood, we have the subjective part in the prior, and then the posterior is how you update your subjective prior beliefs with your objective data. That's the classical kind of Bayesian inference. But in many, many cases, people don't want statistics that depend on expert opinions to start with. So, this idea of a subjective prior is just not palatable to some people's idea of science. So then, Bayesians went back and thought hard about it and said, well, maybe we can come up with priors that are sort of go-to priors for us, things that we can just use where we don't have to think about how to specify the prior, so-called objective priors. And because of that, the collection of Bayesian techniques ballooned into a variety of different ways of thinking about how to be a Bayesian. The only thing they have in common is that they utilize the Bayesian machinery, that the posterior is proportional to the likelihood times the prior. But then they have lots of different ways of thinking about it. One way is the so-called Jeffreys prior, where people said, well, maybe we can pick a prior that has certain specific mathematical properties. For this particular problem, the Jeffreys prior sets alpha equal to beta equal to 0.5. The uniform prior is another nice one that's somewhat objective, because we could say, well, why don't we put a constant prior?
That way the likelihood is the posterior, and that seems pretty objective to me. There are problems with doing that. The point is that uniformity on one scale is not uniformity on another scale. So, the fact that the prior is uniform for p means that it's not uniform for p squared, for example. If you calculate the distribution of p squared, it's no longer uniform. So, a uniform distribution doesn't adequately represent absence of belief. The problem is that there's no probability density that measures absence of belief about a parameter. If you've written down a density, you've specified belief. You've completely characterized its probabilistic behavior. So anyway, these are very technical problems with Bayesian analysis, and they all kind of revolve around how in the world we set this prior. But in this case, I think people would say the uniform prior seems pretty reasonable, and the Jeffreys prior seems pretty reasonable. And putting on a prior that's humped at 0.5, shrinking everything towards 0.5, also seems pretty reasonable. All those things don't seem so bad. And the benefit is that, no matter what you choose, as long as you give them the likelihood, someone else could pick a different prior than you. So, the idea that you could just pass around the likelihood and everyone could pick their own prior is also quite a palatable way to do Bayesian inference. So, I'm going to go through some pictures just to show you, and I fudged a little bit. I'll tell you how I fudged on the pictures. Here, I normalized everything so that its peak is at one. But in this first one, the problem is that the prior heads off to infinity near zero and near one. So, if I were to normalize it that way, I can't divide by infinity, so I fudged a little bit. So, the true U-shaped curve looks different from the U-shaped curve that I'm plotting here.
In order to get it on the same plot, I fudged a little bit. If you try to do this yourself, you'll see how I fudged. But okay. So, the U-shaped curve isn't on the right scale, but I put it on the same plot as the posterior and the likelihood, both of which I normalized so that the peak was at one. The blue is the prior, in this case the Jeffreys prior with alpha equal to beta equal to 0.5. The green is the likelihood and the red is the posterior. So, you see what happens when you multiply the green times the blue and then renormalize. You get a red curve that looks an awful lot like the likelihood. So, in this case, the Jeffreys prior doesn't move us off our likelihood very much. And the posterior inference, which is entirely based on this red curve, is pretty much identical to inference based on the likelihood. Then, of course, on the next slide, if the prior is completely flat, the posterior and the likelihood are identical. So, you can't see the green curve in this case; it's exactly underneath the red curve. Now, let's look at alpha equal to beta equal to two. Then my prior is this hump shape at 0.5. You can see that my likelihood is the green shape and my posterior is the red shape. And you can see it's shifted ever so slightly towards 0.5. So, the red shape is the mathematical compromise between the knowledge codified by my blue prior and the objective part codified by the likelihood. Again, I should put objective in quotes. Now, let's make it more extreme to show you what's happening. Let's put alpha = 2 and beta = 10. Then the blue curve gets shifted a lot towards zero. As beta gets much bigger than alpha, the prior becomes more pushed up towards zero. As alpha becomes much bigger than beta, it becomes pushed up towards one. And if alpha and beta are equal and they get larger and larger, it gets more peaked around 0.5. So anyway, now we're all pushed up towards zero.
And you can see, here the blue curve is the prior, pushed up toward zero because beta is much larger than alpha. It has a finite maximum because both of them are bigger than one. Then we have the green likelihood, which has been constant through every one of these pictures. And then we have the red posterior, which is the compromise between the evidence represented by our data and the assumed likelihood, and our blue prior, which represents our prior knowledge. So, the red curve is the appropriate mathematical compromise between these two opposing positions. In this case, let's say you had a prior belief that the prevalence of hypertension was very low; you thought it was on the order of 0.1. Your data says, no, no, no, it's very high, it's on the order of 0.65, right? And so, your posterior is the compromise that says, well, your data has moved me very far away from my prior towards the MLE of 0.65, and that's how the mathematics works out. And as n goes to infinity, this green curve, the likelihood, will get more and more peaked around whatever the true value is, and it'll just grab this red curve and pull it increasingly towards it. So, what happens in politics, for example? Well, people's blue curve is very spiked, right? They're dead set in their opinions and no amount of data is going to move them off of it. So here is an example where I have alpha = 100 and beta = 100. What happens then? Alpha and beta are equal, so the beta distribution is centered at exactly 0.5. And as alpha and beta go to infinity, the variance of the beta distribution gets really small, so we're quite sure, according to our prior, that p is exactly 0.5. Then we collect our data, and it says, ehh, I don't think so. P is not 0.5. It's more likely somewhere above 0.6, right? And then, what happens to our posterior? Our posterior says, well, I don't know. You were very sure.
I'm going to kind of ignore the data because of how sure you were. So, this is, of course, the problem with extremely informative priors, right? No reasonable amount of data is going to knock you off them. Here, the red curve almost overlaps with the blue curve. So, the red curve in the previous examples is the posterior. The posterior is the distribution of the parameter given the data. In Bayesian statistics, that's everything. If you give someone the posterior, that's it. You've given them everything; that's the summary of evidence as far as the Bayesian is concerned. But it's a curve, and it's hard to work with. You can only look at it in graphs. And if you have multiple dimensions, it gets even worse. So, we want to summarize it. Well, one way to summarize that curve is by its mean, the posterior mean. Another way to summarize it is by its variance, the posterior variance. But we might want something analogous to a confidence interval, and a confidence interval is a frequentist construct. It talks about supposed fictitious repetitions of experiments, and that's not really within the Bayesian ideology. So, we need something that's analogous to a confidence interval. For the likelihood, we had something that was analogous to a confidence interval, and we called it a likelihood interval. The Bayesians created something and called it a credible interval. The Bayesian credible interval is just an analog of a confidence interval. A 95% credible interval, a to b, just satisfies that the probability that the parameter lies in that interval, given the data, is 95%. Really simple. If you believe in Bayesian inference, higher values of the posterior represent better supported values of the parameter. So, just like with the likelihood, you're better off chopping off the posterior with a horizontal line and figuring out exactly what values of a and b that entails to force the probability to be 95%.
And that's called the highest posterior density interval, or HPD interval. I have a picture here where I do that. If you imagine this horizontal line, the red area would vary as we moved it up and down. As we moved it down, the red area would get bigger and bigger. As we moved it up, the red area would get smaller and smaller. So, you want to keep moving that horizontal line up and down until the red area is exactly 95%, right? This is a density, so that means the area under the curve is exactly 0.95. Once you hit that perfect point where it's exactly 0.95, you see where the line intersects the curve, and then drop down to the horizontal axis, and those two points are your a and b. So, the probability that p lies between a and b is, of course, just the integral between those points, which is exactly the red area. So, you wind up with a credible interval. In this case, it works out to be 0.44 to 0.84, which should be no surprise. In R, you can do that with the binom package. In this case, it's binom.bayes with 13 and 20, 13 successes and 20 trials. You have to set type equal to highest, and that gives you the 95% credible interval. It uses the Jeffreys prior. As I said earlier, Bayesian credible intervals, even though they are constructed using Bayesian thinking, tend to perform very well if you turn around and evaluate them by frequentist performance. Just like our Agresti-Coull interval, which wasn't exactly a Bayesian interval but was close enough among friends. That actually has much better performance than the Wald interval constructed directly from the CLT. The other thing I want to mention before I go through the final bit of this lecture is that another way to create a credible interval would be to pick a to be the 2.5th percentile of the posterior distribution and b to be the 97.5th percentile of the posterior distribution. That would give you exactly a 95% interval, for example.
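Both constructions, the horizontal-line (HPD) interval and the percentile interval, can be approximated on a grid. The sketch below is in Python rather than R and is not what binom.bayes does internally; the grid approach, the function names, and the cell count are all assumptions made for illustration. It uses the Jeffreys posterior for 13 successes in 20 trials, Beta(13.5, 7.5), and it assumes a posterior with a single interior hump.

```python
import math

def beta_pdf(p, a, b):
    """Beta(a, b) density at p, for 0 < p < 1."""
    log_const = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_const + (a - 1) * math.log(p) + (b - 1) * math.log(1 - p))

def credible_intervals(a, b, level=0.95, cells=20_000):
    """Grid approximation to the HPD and percentile intervals of Beta(a, b).

    Assumes a single interior hump (a > 1 and b > 1).
    """
    width = 1.0 / cells
    mids = [(i + 0.5) * width for i in range(cells)]
    mass = [beta_pdf(p, a, b) * width for p in mids]
    total = sum(mass)
    mass = [m / total for m in mass]   # remove the small discretization error

    # HPD: lowering the horizontal line is the same as adding grid cells
    # from tallest to shortest until the covered probability reaches the level.
    order = sorted(range(cells), key=lambda i: mass[i], reverse=True)
    covered, kept = 0.0, []
    for i in order:
        if covered >= level:
            break
        covered += mass[i]
        kept.append(i)
    hpd = (min(kept) * width, (max(kept) + 1) * width)

    # Percentile interval: cut (1 - level) / 2 of the probability from each tail.
    tail = (1 - level) / 2
    running, lower, upper = 0.0, None, None
    for i, m in enumerate(mass):
        running += m
        if lower is None and running >= tail:
            lower = mids[i]
        if upper is None and running >= 1 - tail:
            upper = mids[i]
    return hpd, (lower, upper)

hpd, percentile = credible_intervals(13.5, 7.5)
print("HPD:", [round(v, 3) for v in hpd])
print("percentile:", [round(v, 3) for v in percentile])
```

The HPD interval should come out roughly where the lecture reports it, near 0.44 to 0.84, and it should be no wider than the percentile interval, since among intervals holding 95% of the posterior the HPD interval is the shortest.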
But with the percentile interval, the posterior height at the lower point and the posterior height at the upper point would be different. So that is potentially a problem. On the other hand, if you do the HPD interval, you've got to vary this line. You have to solve a root-finding problem to obtain the end points. So, it's a little bit annoying. Finding the so-called percentile interval, the 2.5th percentile and the 97.5th percentile, as an example, to get a 95% credible interval, is very easy. So, another way to construct a Bayesian credible interval is just to take the lower and upper percentiles and run with it that way. I think you're better off doing the HPD interval if you can. So, I want to end with one nice aspect of the Bayesian credible interval, if you're hardcore about these things. Let me just say for a minute what I mean by being hardcore. Probably many of you have taken an introductory statistics class. And probably many of you have seen the baffling interpretation associated with frequentist confidence intervals presented as a test question. That is just kind of hardball frequentism. And it's accurate, you know; I don't want to criticize it. So, here's an example. We have a Wald interval; it works out to be 0.44 to 0.86. Let's assume that the 95% coverage of the Wald interval is good enough. The CLT has kicked in, in this case, and we're fine. We're not worried about the mathematical performance of the confidence interval. We're just interested in the strict interpretation, assuming that the coverage is correct. Then the fuzzy interpretation is that we're 95% confident that p lies between 0.44 and 0.86. But that's not the actual interpretation. The actual interpretation is that the interval 0.44 to 0.86 was constructed such that, in repeated independent experiments, 95% of the intervals obtained would contain p. That's the actual confidence interval interpretation.
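The Wald interval quoted above, and the repeated-experiments reading of it, can be reproduced with a few lines. This is a minimal Python sketch with hypothetical function names; the choice of p = 0.65 as the "true" value in the coverage calculation is an assumption made purely for illustration, and the coverage is computed exactly from the binomial distribution rather than by simulation.

```python
import math

def wald_interval(x, n, z=1.96):
    """Wald interval: p_hat plus or minus z standard errors."""
    p_hat = x / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

def wald_coverage(p, n, z=1.96):
    """Exact probability, over X ~ Binomial(n, p), that the Wald interval contains p."""
    total = 0.0
    for x in range(n + 1):
        lo, hi = wald_interval(x, n, z)
        if lo <= p <= hi:
            total += math.comb(n, x) * p ** x * (1 - p) ** (n - x)
    return total

lo, hi = wald_interval(13, 20)
print(round(lo, 2), round(hi, 2))   # 0.44 0.86

# Long-run fraction of intervals that would contain p if p were 0.65
# (an assumed value) and the experiment with n = 20 were repeated forever.
print(wald_coverage(0.65, 20))
```

The coverage number is the quantity the strict interpretation refers to. For n = 20 it need not be exactly 0.95, which is why the lecture sets the performance question aside and assumes the coverage is good enough before discussing interpretation.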
It's this idea of performance. Frequentist refers to frequency, i.e., the definition of probability being entirely entwined with fictitious repetitions of experiments, or lifetime batting averages for success probabilities, and that sort of thing. That's what the frequency interpretation is. And almost no one interprets a frequentist confidence interval this way, because it's such a mouthful. Everyone kind of thinks, well, my interval, 0.44 to 0.86, is an interval that accounts for uncertainty at a control rate of about 95%, where that control rate has a contextual meaning with respect to frequentist statistics. I understand that, but I don't spit it out every time I interpret the confidence interval. Every now and then, a confidence interval makes its way into the news, and news people never interpret it right, because it's hard to interpret. So, let's go on to the next one, the likelihood interval. The 1/8th likelihood interval was 0.42 to 0.84. The fuzzy interpretation for the likelihood interval is that the interval 0.42 to 0.84 represents plausible values of p, with plausible defined by the eightfold likelihood ratio associated with the end points relative to the MLE. So, yeah, that's okay. The fuzzy interpretation is no worse than the frequentist fuzzy interpretation. But let's go through the actual interpretation: the interval 0.42 to 0.84 represents plausible values for p in the sense that, for each point in the interval, there is no other point that is more than eight times better supported given the data. Again, yikes. This is a mouthful, and anyone who constructs a likelihood interval is not going to interpret it that way.
Someone using a likelihood interval is going to say, you know, it's an interval, it accounts for uncertainty, it's based on the likelihood, the calibration is based on sort of eightfold likelihood ratios, and I understand what it means, but I don't spit it out every time I use the interval. The nice thing about the Bayesian interval is that you can spit out the actual interpretation every single time you use it, because the interpretation's very easy. So, the Jeffreys 95% credible interval was 0.44 to 0.84. The actual interpretation is: the probability that p lies between 0.44 and 0.84 is 95%, full stop. So, that's super easy. Now, there's a lot loaded in this word probability here, because it's the Bayesian version of the word probability, which maybe not everyone would like to agree with, and some people would like something that's more objective, or something like that. But nonetheless, if you're willing to buy into the Bayesian way of thinking, the simple interpretation of the credible interval is quite nice. And this interpretation is, you know, if you see a confidence interval in the news or if you present a confidence interval to people who have just a little bit of statistics, this is how they want to interpret confidence intervals. And you can't make this statement for a frequentist interval, that the probability that p is between 0.44 and 0.84 is 95%. Because in a frequentist way of thinking, p is just a fixed parameter. It's either in the specific interval or not. It's not random, so the probability is either zero or one. I mean, if you're being a hardball frequentist, you can't make this statement. I've had, you know, people who have frequentist leanings say, yeah, you can just go ahead and make this statement, because frequentist intervals are kind of approximate Bayesian intervals of sorts, and they're fine with that. So, I think in all of the cases the kind of fuzzy interpretation is fine. All the intervals, you know, they roughly agree, for one thing.
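To illustrate the "probability that p lies in the interval" statement, here is a minimal sketch that approximates a Beta posterior on a fine grid and reads off an equal-tail interval, an HPD interval, and the posterior probability of a given range. The counts 13 out of 20 are hypothetical (an assumption); with a Jeffreys Beta(1/2, 1/2) prior they give a Beta(13.5, 7.5) posterior. The grid approach stands in for a proper quantile routine and is only an approximation.

```python
import math


def beta_grid(a, b, m=20000):
    """Midpoints of m cells on (0, 1) and the Beta(a, b) mass in each cell."""
    log_c = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    h = 1.0 / m
    ps = [(i + 0.5) * h for i in range(m)]
    mass = [
        math.exp(log_c + (a - 1) * math.log(p) + (b - 1) * math.log(1 - p)) * h
        for p in ps
    ]
    total = sum(mass)  # renormalize to absorb numerical integration error
    return ps, [w / total for w in mass]


def equal_tail(ps, mass, level=0.95):
    """Percentile interval: cut (1 - level) / 2 of the mass from each tail."""
    tail = (1 - level) / 2
    cum, lower, upper = 0.0, None, None
    for p, w in zip(ps, mass):
        cum += w
        if lower is None and cum >= tail:
            lower = p
        if upper is None and cum >= 1 - tail:
            upper = p
    return lower, upper


def hpd(ps, mass, level=0.95):
    """HPD interval for a unimodal posterior: keep the highest-density cells."""
    order = sorted(range(len(ps)), key=lambda i: mass[i], reverse=True)
    cum, kept = 0.0, []
    for i in order:
        kept.append(ps[i])
        cum += mass[i]
        if cum >= level:
            break
    return min(kept), max(kept)


def prob_between(ps, mass, lower, upper):
    """Posterior probability that p lies in [lower, upper]."""
    return sum(w for p, w in zip(ps, mass) if lower <= p <= upper)


# Hypothetical data (assumption): x = 13 successes in n = 20 trials,
# Jeffreys prior Beta(0.5, 0.5), so the posterior is Beta(13.5, 7.5).
ps, mass = beta_grid(13 + 0.5, 7 + 0.5)
et = equal_tail(ps, mass)
hp = hpd(ps, mass)
prob = prob_between(ps, mass, 0.44, 0.84)
print(f"equal-tail: ({et[0]:.3f}, {et[1]:.3f})")
print(f"HPD:        ({hp[0]:.3f}, {hp[1]:.3f})")
print(f"P(0.44 <= p <= 0.84 | data) = {prob:.3f}")
```

The last printed quantity is a direct posterior probability about p, which is the statement a frequentist interval cannot make. The HPD interval is never wider than the equal-tail interval at the same level, which is one reason to prefer it when the extra computation is affordable.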
And then, beyond roughly agreeing with each other, all of them measure some amount of uncertainty associated with our point estimate, and they're each calibrated in some way: the likelihood interval calibrated relative to likelihood ratios, the frequentist interval calibrated relative to the frequentist coverage rate, and the Bayesian interval calibrated relative to posterior coverage. So, that gives you a sense of Bayesian intervals, likelihood intervals, and frequentist intervals for the exact same problem. This is an interesting area. Bayesian statistics is quite popular and it's a good thing to know about. Even if you are a frequentist, you know, use of the Bayesian manipulations frequently turns out to be a very good way to create, say, confidence intervals or hypothesis tests. It's just a very useful way to think about statistics. Well, that's the end of today's lecture and I hope you enjoyed it. And, if you're interested in reading about Bayesian statistics, there's tons of good stuff on the web and it's a very interesting area and I hope you do so. I look forward to seeing you next time for our very last lecture.