Hi. In this lecture we're going to talk about a really simple model of aggregation. Here's the thing I want to model: I want to model a situation where I've got a group of people -- it could be 100, it could be 1000 -- and each one is independently going to make a decision to do something. It could be, y'know, to go to the gym. It could be to go to the beach. It could be to go to the grocery store. What I want to understand is, if we've got a whole bunch of people each making these independent decisions, what's the number of people that shows up?

Now, to characterize that I'm going to use an idea called a probability distribution. To make this simple, let's suppose there's a small group of people, like my family, which has four people in it. And I want to know: what's the distribution of the number of people who go for a walk on a given Saturday? Well, the numbers could be -- there could be 0 people that go, there could be 1, there could be 2, there could be 3, or it could be that all 4 of us decide to go for the walk. The dog would prefer if all four of us went, but, y'know, there's going to be some number that goes. So I could keep track of data. I could, y'know, chart this on my wall somewhere. And you can ask, what's the likelihood that nobody went for a walk? Maybe that's 10%. What's the likelihood that 1 person went for a walk? Well, that might be 15%. What about 2 people? That might be 40%. And what about 3 people? That might also be 15%. And then, what's the likelihood that all 4 of us went for a walk? That might be, let's say, 20%.

Now, the thing to know about a probability distribution is that each one of these probabilities is less than one. And if we sum them up: 10 plus 15 is 25, plus 40 is 65, plus 15 is 80, plus 20 is 100. So we get a total of 100%.
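If you want to play with this, the family-walk distribution is easy to write down as data and sanity-check. A minimal sketch in Python -- the numbers are just the made-up percentages from above:

```python
# made-up likelihoods that 0..4 family members go for the Saturday walk
walk_dist = {0: 0.10, 1: 0.15, 2: 0.40, 3: 0.15, 4: 0.20}

# each probability sits between 0 and 1 ...
assert all(0 <= p <= 1 for p in walk_dist.values())

# ... and together they cover every possibility, totalling 100%
print(round(sum(walk_dist.values()), 10))  # 1.0
```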
So what a probability distribution tells us is what are the different things that could happen -- 0, 1, 2, 3, and 4 -- and then it tells us the likelihood of each of those things.

OK, so here's sorta the huge result that we're going to leverage to understand how things add up. There's a theorem called the Central Limit Theorem. What the Central Limit Theorem tells us is what happens if I add up a whole bunch of individual, independent events. So what does 'independent' mean? It means my decision to go to the beach is independent of your decision to go to the beach, which is independent of your cousin Mary's decision to go to the beach. By independent, I mean not influenced. I don't care whether you're going to the beach or not. I'm going to make my decision on my own, completely independent of what you decide to do ... or your cousin Mary. So what the Central Limit Theorem tells us is that if a whole bunch of people make a whole bunch of independent decisions, the distribution that we get has this nice bell-shaped curve. And this bell-shaped curve means that the most likely outcome is the one right in the middle. So there's a lot of structure to what happens. And that means we can predict a lot of things. We can tell a lot about what's going on in the world. And that's what we're going to learn about in this lecture. It's going to be a lot of fun.

To get an understanding of where these distributions come from, let's start really simple. Suppose I flip a coin twice, and I want to know: what are the odds of getting a head? What's the probability distribution over heads? Well, what could I get? I could get tails-tails, and that would be 0 heads. I could get tails-heads or heads-tails -- both of those would be 1 head. Or I could get heads-heads, and that would be 2 heads. So what's the probability of each of these? The probability of getting tails-tails is just 1/4. The probability of getting 1 head is 1/2. And the probability of getting 2 heads is 1/4.
So I'm going to get a probability distribution: if I draw it out like this -- 0, 1, 2 -- there's a 1/4 chance of that, a 1/2 chance of that, and a 1/4 chance of that. You notice it sorta looks like a little bell curve.

OK, let's suppose I flip it 4 times. Well, it gets harder. I could think, OK, what are the odds of getting no heads? I could get tails-tails-tails-tails. Well, how do I figure out the probability of that? It's 1/2 times 1/2 times 1/2 times 1/2 -- four one-halves -- that's 1 over 2 times 2 times 2 times 2, so that's 1/16. What are the odds of getting one head? Well, I could get the head first and then 3 tails, I could get it second, I could get it third, or it could come last. So there's four places it could show up, which means there's a 4/16 chance. I could do the math again for the odds of getting two heads, and I'd actually get 6/16. And 3 heads, well, that's really the same as getting 1 head, because tails and heads are interchangeable, so I'd get 4/16 again. If I drew this distribution out, I'd get a peak at 2 heads. I'm going to get a nice bell curve: very little chance of getting no heads, not that much chance of getting 4 heads, but the most likely thing is getting 2 heads.

So I can count all this stuff out, and it's fun. But here's the problem: often we have more than 2 or 4, we have n, and n can be a huge number. If we're talking about New York City, that can be 10 million people. If we're talking about Ann Arbor, where I live, that's still like a hundred thousand people. So I don't want to be sitting there writing tails, tails, tails, tails, tails a hundred thousand times. I want a model that will help me explain it.
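The hand-counting for 2 and 4 flips can be checked by brute force: enumerate every equally likely heads/tails sequence and tally the heads. A small sketch:

```python
from fractions import Fraction
from itertools import product

def heads_distribution(n_flips):
    """Enumerate all 2^n equally likely flip sequences and tally heads."""
    counts = {}
    for seq in product("HT", repeat=n_flips):
        k = seq.count("H")
        counts[k] = counts.get(k, 0) + 1
    total = 2 ** n_flips
    return {k: Fraction(c, total) for k, c in sorted(counts.items())}

print(heads_distribution(2)[1])  # 1/2 -- one head is the most likely
print(heads_distribution(4)[2])  # 3/8 -- that's the 6/16 from the lecture
```

For n in the hundreds of thousands this enumeration explodes (there are 2^n sequences), which is exactly why the lecture reaches for a model instead.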
So what we can do is this: if you have n things, the mean -- the expected number -- should be n over 2, right, half of n. But what we'd like is to understand what that distribution looks like. Well, what we know from statistics is that the distribution is actually gonna be a nice bell curve: the mean, right in the middle of the thing, is gonna be n/2, and it's gonna flow out nice and symmetrically on each side. Now, there's a fancy equation, a formula, that tells you exactly what this curve looks like. We're not going to get into that, but if you take a statistics class -- which I'd encourage you to do, it's a lot of fun -- you can learn exactly what this formula is and how it works. We just wanna use it as a model for understanding how things aggregate. So we're gonna take some leaps ahead in statistics.

Here's the trick, though: we gotta be a little bit careful. Flipping a coin is always equally likely -- it's either a head or a tail, each one is 50/50. But if I'm worried about people going to the beach, or people going to the supermarket, or people showing up for their flight, that's not a 50/50 proposition, right? Maybe 90% of people show up for their flight, and maybe only 10% or 15% of people go out to the beach. So I'd like to change that 1/2 into something else. Well, I can introduce something called the binomial distribution, where instead of having 1/2, there's some probability p of doing the thing. So let's suppose going to the beach happens 15% of the time. Well then, if I had 1000 people and p = 15%, then p times N is 150, so I expect to have 150 show up.
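That p times N claim is easy to watch in action: give each of 1000 hypothetical people an independent 15% chance of going, and the turnout lands near 150. A sketch (the seed and population size are arbitrary choices):

```python
import random

random.seed(7)
N, p = 1000, 0.15
print(p * N)   # 150.0 -- the expected turnout

# one simulated Saturday: each person decides independently
turnout = sum(random.random() < p for _ in range(N))
print(turnout)  # some number near 150
```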
So that makes sense, but then I can ask: well, what's the distribution now? I mean, 150 is the average, but I could have 200, I could have 74. Well, again, what the Central Limit Theorem tells us is that we're gonna get a nice bell curve, with the mean at p times N. Now, this holds provided N is big enough, but if you've got a pretty large N, you're gonna get this nice bell curve, and the mean is gonna be right at p times N.

OK, there's more to it. Here's where it gets a little bit complicated, but also interesting. There's something called the standard deviation -- this is this thing called sigma. Now, when I draw the normal curve, there's gonna be a mean, that's this point right here at the center. And then there's gonna be a standard deviation, which basically tells us how far spread out that curve is -- how far spread out the different outcomes are. It turns out there's this nice structure to any normal distribution. If you tell me the mean, and then you tell me the standard deviation, it's always gonna be the case that 68% of all outcomes will be between -1 and +1 standard deviation from the mean. So if it's got a big standard deviation, that means that range could be really wide. If it's got a small standard deviation, that means the range will be really tight. But if you tell me the mean and tell me the standard deviation, it's always gonna be the case that 68% of the time I'm between -1 and +1 standard deviation. Now, since that kind of statement is true for one standard deviation, there are versions for 2, for 3, and for 4 as well: there's gonna be a 95% chance I'm within 2 standard deviations.

So wait, why do we care about this stuff? Here's why. Now I've got this model that says, if I add up a bunch of independent events, here's what the mean is.
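Those 68% and 95% figures aren't special to any one problem; they fall out of the normal curve itself. Python's standard library can confirm them, and any choice of mean and sigma gives the same shares:

```python
from statistics import NormalDist

d = NormalDist(mu=0, sigma=1)   # the particular mean/sigma doesn't matter

def share_within(k_sigmas):
    """Probability mass between -k and +k standard deviations."""
    return d.cdf(k_sigmas) - d.cdf(-k_sigmas)

print(round(share_within(1), 2))  # 0.68
print(round(share_within(2), 2))  # 0.95
```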
In a second, I'm going to show you the formula for the standard deviation, so then you'll know what sigma is. Well, if you know the mean and you know sigma, then I can give you a range: I can tell you that 95% of the time you're gonna be between -2 sigma and +2 sigma of the mean. So if I said the mean number of people that showed up is a hundred, and the standard deviation is only 2, well then you'd know that 95% of the time you're gonna be between 96 and 104. So you'd know, OK, I should prepare for pretty much exactly 100 people. If I told you the standard deviation was 15, then you'd know it could be anywhere between 70 and 130. So that's what we want to use this model to explain: how wide a range of outcomes we're likely to see in any particular setting.

So let's go back to our simple binomial distribution where the probability was 1/2. The mean, remember, is just N over 2. The standard deviation is the square root of N, divided by 2 -- you can do a little bit of math and show that. So let's suppose I have N = 100. That tells me the mean is gonna be 50: if I flip a coin a hundred times, guess what, the average is 50 heads, no surprise. But the standard deviation is the square root of N, over 2. What's the square root of 100? That's 10. So this is 10 over 2, which gives 5. So what that tells me is, if I draw this binomial distribution out, I've got a mean of 50 and a standard deviation of 5, and that means between 45 and 55 lie 68% of all outcomes. So, if you want, you can do this at home -- it'll take a while -- flip a coin a hundred times and count how many heads you get.
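If flipping a coin a hundred times by hand sounds slow, the experiment takes a few lines to simulate. A sketch (the seed and trial count are arbitrary; the share comes out slightly above 68% because head counts only come in whole numbers):

```python
import random

random.seed(42)

# the home experiment: 100 fair flips, count heads, repeated 5000 times
head_counts = [sum(random.random() < 0.5 for _ in range(100))
               for _ in range(5000)]

# how often does the count land within one standard deviation of 50?
share = sum(45 <= h <= 55 for h in head_counts) / len(head_counts)
print(round(share, 2))   # roughly 0.7, near the 68% rule
```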
Flip it again, count how many heads again. Do that a whole bunch of times, and you'll find that about 68% of the time you get between 45 heads and 55 heads. So what this model gives us is a sense of how strange the outcomes we get will be. We know that most of the time, 68% of the time, we'll be between 45 and 55. Our mean is 50, and 1 standard deviation out is 45 and 55, so 2 standard deviations out is 40 and 60. What that tells us is that 95% of the time you're gonna be between 40 and 60 heads. And about 99.7% of the time you're gonna be between 35 and 65. So basically, you're almost never gonna throw fewer than 35 heads, and almost never more than 65 heads. And this is the sort of power the Central Limit Theorem has: it gives us a sense of not only the average, but also what the spread will be.

OK, remember, this is the simple case -- the p = 1/2 case. What we'd like is the more general case, where the probability of something happening can be anything -- this is the p times N thing. It turns out we're OK here, because the standard deviation is just p times (1 - p) times N, then take the square root of the whole thing. So in the case where p = 1/2, we have the square root of 1/2 times 1/2 times N. But notice I've got a 1/2 squared inside, so I can just pull that outside, and it's just 1/2 times the square root of N -- so that's where that square root of N over 2 came from. So now, for the binomial distribution, I've got this clean formula as well. And we can use that to model and understand stuff that's a little bit more interesting than just flipping a coin.

Let's do a real example -- let's have some fun. Most of us have probably been bumped off a plane before. You show up at the airport and, like, too many people showed up for the plane. And you think, why did they do this? The reason they sometimes have to bump people is that they oversell. And the reason they oversell tickets is that not everybody shows up. So if you're running an
airline and you've got 400 seats, and you know people show up, y'know, 90% of the time, you want to sell more than those 400 seats, right, so that your plane is pretty much full. So let's do an example. Let's make it simple and suppose our plane has 380 seats -- say we've got a Boeing 747 with 380 seats. And let's suppose that 90% of the time people show up. We run an airline, we've gathered lots of data, so we pretty much know that 90% of the time people show up, and that it's independent: one person's decision to show up doesn't have anything to do with anybody else's. Now, that might not be true, right? Because if it's snowy and I'm late, you're likely to be late too. But let's just suppose these things are independent. And let's suppose that we sell 400 tickets. Now we're trying to understand what that means: what's the likelihood that, if we sell 400 tickets, more than 380 people show up? Here's where the model can help us. It'll tell us what the mean is, and it'll also tell us what the standard deviation is. The mean: if I sell 400 tickets, and on average 90% of people show up, that means on average 360 people will show up. That's less than 380 seats, so it should be fine. But what I care about is whether more than 380 people show up, 'cause they're gonna be like, I paid for this to go to Florida, I want to go to Florida, I don't want to be bumped. If more than 380 show up, guess what, they're gonna be mad. So the 360 doesn't tell us enough -- we want to know something about the distribution. OK, well look, we've got a formula, remember. N was 400 and p was .9, so p times N is 360: that's our mean. Now, the standard deviation we can solve for pretty easily.
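Both numbers -- the mean and the standard deviation -- take only a couple of lines to compute, and the standard library's normal distribution can put an actual figure on the overbooking risk. A sketch under the lecture's assumptions (400 tickets sold, independent 90% show-up rate, 380 seats):

```python
from math import comb
from statistics import NormalDist

n_tickets, p_show, seats = 400, 0.9, 380

mean = p_show * n_tickets                        # 360 expected passengers
sd = (p_show * (1 - p_show) * n_tickets) ** 0.5  # sqrt(.9 * .1 * 400) = 6
print(round(mean), round(sd, 3))   # 360 6.0

# normal-model estimate of the chance more than 380 show up
risk_normal = 1 - NormalDist(mean, sd).cdf(seats)

# exact binomial probability, summing the tail directly
risk_exact = sum(comb(n_tickets, k) * p_show**k * (1 - p_show)**(n_tickets - k)
                 for k in range(seats + 1, n_tickets + 1))

print(risk_normal, risk_exact)   # both a small fraction of a percent
```

So under these assumptions it's overwhelmingly safe to sell 400 tickets for 380 seats, which matches the three-sigma argument that follows.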
That's just the square root of p, which is .9, times 1 - p, which is .1, times N, which is 400. So we multiply it out: .1 times 400 is 40, times .9 is 36. That gives the square root of 36, which is 6. So 6 is our standard deviation. Now I've got a bell curve with a mean of 360 and a standard deviation of 6.

Well, that's useful -- let's go back and look. Our mean is 360 and our standard deviation is 6, so 68% of the time we're gonna be between 354 and 366. That's great. It means that 95% of the time we'll be between 348 and 372 -- also great. And it means about 99.7% of the time we'll be between 342 and 378. Well, how many seats do we have? We have 380 seats. So this means that more than 99.7% of the time, we won't overbook.

So here's the Central Limit Theorem -- let's say it formally. The Central Limit Theorem is the following. We've got a whole bunch of random variables. Those could be decisions to show up to a flight or not, so in most of our cases the random variables are just 1s and 0s. Or they could be, y'know, the weight of your bag -- each person's bag weight is an independent random variable.
As long as those things are independent -- meaning each person's decision doesn't depend on somebody else's, or how much stuff I jam in my bag doesn't affect how much stuff you jam in your bag -- and as long as those things have finite variance. What does that mean? It means they're bounded, so we can't have super huge values -- my bag couldn't weigh billions and billions of pounds. So long as the range of values each one can take is bounded in some way, or doesn't with high probability take huge, huge values, then when you add those things up, when you sum them, you're gonna get a normal distribution. Which means a bell curve, which means we can predict stuff. We can use that model to make sense of how the world works.

Now, let's step back for just a second and think about why this is so cool. Suppose it weren't true -- here's a little thought experiment. Suppose it were the case that when I added up a bunch of independent events, most of the time I'd get something nice, but then there was some spiky probability of some huge event over here. What would this mean?
Well, this would mean that sometimes you'd go to the grocery store and there'd be, like, 1000 people there. Or sometimes you'd be like, I'm just gonna run to the bathroom, and there'd be 300 people in line. A lot of the predictability of the world, a lot of the predictability of these daily comings and goings, stems from the fact that this can't happen and that we get these nice bell curves. Because if individual people, individual firms, individual groups of people make decisions that don't depend on what other people decide -- independent decisions -- then what you're gonna get is nice, regular stuff, according to a bell curve. Yeah, sure, there'll be traffic jams; sure, there'll be a lot of people at the mall. There will be days where you've got a lot going on, and there'll be days where nothing much is going on. But most of the time you're gonna get things in that little region, which is gonna be predictable and understandable.

Now, is everything normally distributed? No, it's not. What about stock returns?
If you look at stock returns, you'll actually see that there are far too many days where really nothing happens, far too many days where there are huge gains, and far too many days where there are huge losses. And what's going on there is that the actions are no longer independent. For example, if prices are going up, a lot of people may buy, and that's going to cause prices to go up even further. And if prices start to fall, people may sell, and that can cause prices to fall even further. So when events fail to satisfy the independence assumption, we can get more big events than we'd expect and more small events than we'd expect.

So let's wrap this up -- what have we got? We used the Central Limit Theorem as a model to explain how, if we add up a bunch of independent events, what we get is a nice normal distribution. We can understand the mean, we can understand the standard deviation, and we can use those to predict how likely things are to occur. We also learned that it's that independence that gives us the normality. Without independence, we could get really big events, really small events -- all sorts of strange stuff happening.

So here's where we're gonna go next: there's a brief lecture on something called Six Sigma that pushes this idea, the predictability of the system, a little bit further. But after that, we're gonna start having systems where there are interdependent actions, and when we have those interdependent actions, we're no longer going to get these nice bell curves. We're going to get all sorts of really interesting, strange stuff. It's going to be a lot of fun. Alright, thank you.