Let's quantify the information entropy of a random phenomenon: how much information does a random phenomenon contain? In this example, the random phenomenon will be the weather in Gotham City. Suppose that in Gotham City there are four possible weather states: it can be sunny, rainy, snowy, or cloudy. Let's use capital S for sunny, R for rainy, W for snowy (since S is already taken by sunny), and C for cloudy. And let's suppose that all of these weather conditions are equally probable, meaning that the probability of sunny equals the probability of rainy, which equals the probability of snowy, which equals the probability of cloudy. From basic probability theory, we know that these probabilities sum to 1, because there are only four possible weather conditions. And because they are equal and sum to 1, each probability is 1/N, or one quarter.

One way to approach information theory is to ask: how many bits are needed to communicate the weather in Gotham City? Suppose there is a person in Gotham City who observes the weather and wants to communicate it to an outsider who does not know the weather there. The question we want to answer is how many bits are needed to communicate the weather in Gotham City, and the answer will yield the information entropy.

So let's tackle that. Suppose no bits have been communicated yet. Then, as an outsider, all we know is that the weather is going to be sunny, rainy, snowy, or cloudy: one of those four conditions. Now suppose one bit is sent. If that bit is 0, we know the weather is sunny or rainy; if that first bit is 1, we know it is neither sunny nor rainy, so it must be snowy or cloudy. Even after receiving that one bit, we still don't know the weather with certainty: if the bit is 0, the weather is either sunny or rainy, but we don't know which one. So we need to send another bit. If the second bit is 0, given that the first bit is 0, then the weather in Gotham City is sunny; and if the second bit is 1, the weather is rainy. Similarly, if the second bit is 0, given that the first bit is 1, the weather is snowy; and if the second bit is 1, the weather is cloudy. So if the two bits are 0, 1, we know with certainty that the weather in Gotham City is rainy. Those two bits carry the weather information of Gotham City: after receiving them, we know what the weather in Gotham City is.

So for one day, for one weather event, the information entropy is H = log₂(4), or log₂(N). This is the quantity constructed by Ralph Hartley in 1928.
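Here is a minimal Python sketch of this two-bit scheme; the dictionary, the bit assignments, and the variable names are just illustrative choices that follow the pairing described above, not anything fixed by the video.

```python
import math

# Hypothetical 2-bit code matching the scheme above:
# first bit 0 -> {sunny, rainy}, first bit 1 -> {snowy, cloudy};
# the second bit picks one state out of each pair.
code = {"S": "00", "R": "01", "W": "10", "C": "11"}

N = len(code)            # four equally probable outcomes
H = math.log2(N)         # Hartley entropy for one weather event
print(H)                 # 2.0 -> two bits identify the weather exactly

# Decoding the example from above: the bits "01" pin down rainy (R).
decode = {bits: state for state, bits in code.items()}
print(decode["01"])      # R
```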
Now what happens if there are multiple days, or multiple weather events? Let's assume that all of these weather events are independent of each other, so the weather on one day does not affect the weather on any other day.

For two days, for two independent weather events, there are 4 squared possible outcomes: 4 possible outcomes for the first day times 4 possible outcomes for the second day, which is 4 times 4. In this case, H becomes log₂(4²), or log₂(N²). We can extend that logic to m days: if there are m independent days, then there are 4 to the m, or N to the m-th power, possible outcomes, which gives an information entropy of H = log₂(Nᵐ). And because the exponent sits inside the logarithm, this is equal to H = m · log₂(N). This ties back to our previous video, where we wanted the information entropy to be proportional to the number of independent events; here, the information entropy is proportional to the number of independent weather events, which is m.

So, to recap Ralph Hartley's construction: H = m · log₂(N), where m is the number of independent events and N is the number of equally probable outcomes per event. This formula, constructed by Ralph Hartley in 1928, can be used when all the outcomes are equally probable; in our case, each probability is one quarter. In the next video, we'll look at a case where the outcomes are not equally probable, that is, when the probability distribution is no longer uniform.
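To sanity-check the m-day formula, here is a small Python sketch; the helper name hartley_entropy is a hypothetical choice for illustration.

```python
import math

def hartley_entropy(N, m=1):
    """Hartley (1928) entropy, in bits, of m independent events,
    each with N equally probable outcomes."""
    return m * math.log2(N)

print(hartley_entropy(4))         # 2.0 bits: one day of Gotham City weather
print(hartley_entropy(4, m=2))    # 4.0 bits: two independent days
print(math.log2(4 ** 2))          # 4.0: log2(N^m) agrees with m * log2(N)
```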