Throughout this week, we've been looking at developing some simple probability distributions, culminating in the last section in our first family of distributions, namely the Bernoulli distribution. Now, I've called this section the distribution zoo. I'd just like you to appreciate that there are hundreds, possibly thousands, of different probability distributions out there, each appropriate for modeling different aspects of the real world. Clearly, we could never hope to cover all of them in a short section such as this, but I'd like to draw your attention to a few very important and commonly used ones, and also to emphasize that there are often relationships, links, connections across these distributions. So let's start with the Bernoulli distribution from before. That meant we had a random experiment which resulted either in a success or a failure. Let's consider extending that Bernoulli distribution to the binomial distribution. A Bernoulli example might be tossing, let's say, a fair coin once and observing whether we got heads or tails. Well, let's suppose we wish to consider tossing it not just once but n times. So maybe n is equal to 10: we toss this coin 10 times and we wish to observe the number of heads, the number of successes, which occur. So whereas the Bernoulli distribution had a single parameter, the probability of success pi, the binomial distribution has two parameters associated with it: n, the number of these Bernoulli trials which we're going to conduct, and also the probability of success pi. Now, to apply a binomial distribution, we require a few conditions to be satisfied, four specifically. Firstly, each individual trial results in a dichotomy of outcomes: success or failure. So multiple iterations of a Bernoulli experiment would satisfy that requirement. Secondly, we require a constant probability of success.
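As a small aside, the binomial probabilities described here can be computed directly from n and pi. Here is a minimal sketch in Python for the coin example with n = 10 (the function name binomial_pmf is mine, not from the lecture):

```python
import math

def binomial_pmf(k, n, pi):
    # P(X = k) for X ~ Binomial(n, pi): choose(n, k) * pi^k * (1 - pi)^(n - k)
    return math.comb(n, k) * pi**k * (1 - pi)**(n - k)

# Fair coin (pi = 0.5) tossed n = 10 times: probability of exactly 5 heads
p5 = binomial_pmf(5, 10, 0.5)
print(round(p5, 4))  # 252/1024, approximately 0.2461
```

Note that the probabilities over k = 0, 1, ..., n sum to one, as any probability distribution must.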
Well, if I'm tossing the same coin, be it a fair coin or a biased coin, it's fair to assume that there's a constant probability of heads on each toss of that coin. Thirdly, we would require a fixed number of trials, i.e., how many times we wish to toss the coin. We may choose to set that, let's say, to be 10. And fourthly, we would require the outcomes of each individual Bernoulli trial to be so-called independent events, i.e., I toss a coin and let's say I get heads, but the fact that I get heads should have no bearing on what I might get on the second toss of that coin. So provided these conditions are satisfied, we can model the number of successes, the number of heads, as following a so-called binomial distribution. A third distribution is the Poisson distribution. Now, we call this the distribution zoo, suggesting some animal theme. Well, French scholars among you will know that poisson is the French word for fish. This does not mean that this is a distribution purely concerned with counting fish; rather, it was named after Siméon Poisson, just as the Bernoulli distribution was named after Jacob Bernoulli. I am hoping for an Abdi distribution to be developed one day, but that's still a work in progress. So when might we wish to use this Poisson distribution? It is very useful if we're interested in the number of occurrences of some phenomenon per unit of time, or maybe per unit of distance. An example might be the number of telephone calls received at a telephone exchange or a call center per minute or per hour, say. Or perhaps the number of arrivals at a check-in desk at an airport. There, a Poisson distribution would be the appropriate one to use. Now, please see the accompanying materials online for a few more details about the use of the Poisson distribution, among others. But appreciate, for the purposes of now, that this is a separate distribution from the binomial.
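For the call-center example, the Poisson probabilities can be sketched in the same spirit. The rate of 3 calls per minute below is a hypothetical value for illustration, not one given in the lecture:

```python
import math

def poisson_pmf(k, lam):
    # P(X = k) for X ~ Poisson(lam): lam^k * exp(-lam) / k!
    return lam**k * math.exp(-lam) / math.factorial(k)

# Hypothetical call center receiving on average 3 calls per minute
print(round(poisson_pmf(0, 3.0), 4))  # chance of a silent minute: exp(-3), about 0.0498
print(round(poisson_pmf(3, 3.0), 4))  # chance of exactly 3 calls: about 0.2240
```

Unlike the binomial, the Poisson has no upper bound on the count: k can be any non-negative integer, with the probabilities over all k summing to one.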
Now, I mentioned that there were some links, some connections, some relationships across distributions, and indeed there are. A very famous example, a bit of a fun one to consider, is Bortkiewicz's horses. This dates back to a study in 1898 and, if my German pronunciation is any good, we can call it Das Gesetz der kleinen Zahlen, the law of small numbers. So this is a very famous example, which looks at some real-world data on the number of fatalities of soldiers in the Prussian Army across 14 army corps from 1875 to 1894. Here's a situation whereby we can use one distribution to approximate another, and see how one of these formal families of distributions can provide a very good representation of some real-world phenomenon. So, we spoke about successes and failures, and a success is not necessarily a good thing. Here, I'm going to call a success a soldier being killed by a horse kick. Clearly this is not a particularly nice outcome, but we said a success doesn't necessarily have to be a nice thing. So the number of deaths by horse kick would typically be modeled as following a binomial distribution. Let's say there are about 50,000 soldiers in one army corps. Now, if you look at Bortkiewicz's data set, the number of fatalities by horse kick is very small relative to the size of those army corps. In any given year, in any given corps, you might get one, two, three, maybe four fatalities, which is very sad for the particular soldiers concerned, but nonetheless represents a very small proportion of that population size of around about 50,000. So we could model the number of deaths by horse kick as following a binomial distribution, where the number of trials n is, let's say, about 50,000, the size of each army corps, and pi, the probability of success, corresponds to the probability of being killed by a horse kick. Now, we discussed some sessions ago how we come up with these probabilities.
It could be a subjective estimate, it could be a relative frequency approach, i.e., through experimentation, or maybe a theoretical derivation. Well, here we could just look at the evidence and see what proportion of troops were killed over that 1875 to 1894 period. That's a very small proportion, but let's take this as representing our probability of success. So we have a binomial distribution, and we could use this to work out the probabilities of particular events, i.e., what is the chance that one soldier will be killed by a horse kick, let's say, in the next year? Well, I mentioned the Poisson distribution, and indeed this is a lovely example to show how one probability distribution can provide a good approximation to another under certain limiting conditions. Of course, more generally, you'll become more familiar with these things over time, with practice and with exposure to multiple different probability distributions. But for now, trust me that we can use the Poisson to approximate the binomial if we have a very large number of trials n and a very small probability of success pi. And I think, as far as Bortkiewicz's horses are concerned, we seem to satisfy those requirements: 50,000 is a very large number of trials, and the chance of death by horse kick was very small indeed. So with this, we can actually use the Poisson distribution to approximate the binomial. Now, without going into too much technical detail at this stage, just note that, whereas the Bernoulli distribution has the single parameter pi and the binomial distribution has the parameters n and pi, the Poisson distribution has a so-called rate parameter, which reflects the expected value of a Poisson random variable, and we call this thing lambda.
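To see how good this approximation is numerically, we can compare the two distributions side by side. The sketch below uses n = 50,000 as in the lecture, and a hypothetical pi chosen (my assumption, for illustration) so that lambda = n * pi equals 0.7 deaths per corps per year:

```python
import math

def binomial_pmf(k, n, pi):
    # Exact binomial probability; math.comb handles the very large n without overflow
    return math.comb(n, k) * pi**k * (1 - pi)**(n - k)

def poisson_pmf(k, lam):
    return lam**k * math.exp(-lam) / math.factorial(k)

# Illustrative values: n = 50,000 soldiers per corps; pi is a hypothetical
# probability chosen so that the Poisson rate lam = n * pi comes out at 0.7
n, pi = 50_000, 0.7 / 50_000
lam = n * pi

for k in range(5):
    print(k, round(binomial_pmf(k, n, pi), 5), round(poisson_pmf(k, lam), 5))
```

With n this large and pi this small, the binomial and Poisson probabilities agree to several decimal places for every count k, which is exactly the limiting behavior the lecture describes.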
Well, just for this quick illustrative example, what we'll do is set the lambda of the Poisson distribution, the specific member of this distributional family, equal to the product of our binomial parameters n and pi. If we do this, could we use the Poisson distribution to provide a reasonable approximation to the distribution of deaths by horse kick that we actually observed in Bortkiewicz's data set? Well, on-screen you will see a chart which shows the true proportions of deaths by horse kick across those army corps in those various years, alongside the corresponding Poisson-approximated probabilities. Just by eyeballing this graph, and ignoring the technical details of how these values are calculated, one sees that these Poisson probabilities provide a very good approximation to what we actually observe in the real world. So, just to round off: in week one, we introduced the concept of a model, and we gave a definition that a model was a deliberate simplification of reality. Well, out of the distribution zoo, the numerous probability distributions which exist out there, when we wish to model some real-world phenomenon, what we would ideally like to do is take a well-known probability distribution, one which adequately captures the stylized facts of the real world, and use it. Recognizing, as we did in week one, that a model does not equal reality, but a good model is a simplified version of reality such that it is approximately equal to reality, and the difference between the model and reality is very small. So Bortkiewicz's horses is a lovely, if somewhat trivial, example of how one probability distribution can approximate another, and indeed this approximating distribution provides a very good model to represent what we observe in the real world.
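The on-screen comparison can be reproduced numerically. The frequency table below uses the widely quoted figures from Bortkiewicz's 1898 study (14 corps over 20 years gives 280 corps-years), not values read off the lecture's chart, so treat it as an illustrative reconstruction:

```python
import math

# Widely quoted horse-kick counts from Bortkiewicz (1898):
# deaths per corps per year -> number of corps-years with that count
observed = {0: 144, 1: 91, 2: 32, 3: 11, 4: 2}
total = sum(observed.values())                          # 280 corps-years
lam = sum(k * f for k, f in observed.items()) / total   # sample mean, 0.7 deaths per corps-year

def poisson_pmf(k, lam):
    return lam**k * math.exp(-lam) / math.factorial(k)

# Observed relative frequency vs fitted Poisson probability, side by side
for k, f in observed.items():
    print(k, round(f / total, 3), round(poisson_pmf(k, lam), 3))
```

The observed proportions (about 0.514, 0.325, 0.114, 0.039, 0.007 for k = 0 to 4) sit very close to the fitted Poisson probabilities, which is the agreement the lecture's graph displays.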
So, do appreciate that there are hundreds of different probability distributions out there, each designed to model different types of real-world phenomena. Depending on what kind of situation you wish to model, one would wish to choose from the distribution zoo, the library of distributions if you will, one which adequately captures the stylized facts of what we observe in the real world.