0:00

So, let's do a recap of information entropy.

But before doing so,

let's actually go ahead and compute the entropy for the weather in Colorado Springs.

So, let's go ahead and do this computation.

And this computation is equal to 0.5×1 + 0.25×2 + 0.125×3 + 0.125×3, and

that's simply by plugging in the probabilities for the weather, 1/2, 1/4, 1/8 and 1/8.

So, the first term gives you 1/2,

and the second also gives you 1/2.

So, these two terms together give you 1.

The third term is 0.375,

and the fourth is another 0.375, so

let's sum all of them.

That gives you 1 + 2×0.375, which is 1.75 bits.
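The arithmetic above can be checked with a short Python sketch. It assumes, as the terms in the sum imply, that the four Colorado Springs weather probabilities are 1/2, 1/4, 1/8 and 1/8; the helper name `entropy_bits` is hypothetical, chosen for illustration.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits: sum of p * log2(1/p) over all outcomes."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

# Assumed Colorado Springs probabilities (implied by the terms 0.5×1, 0.25×2, 0.125×3, 0.125×3).
colorado_springs = [0.5, 0.25, 0.125, 0.125]

# Per-outcome contributions p * log2(1/p), one per weather condition.
terms = [p * math.log2(1 / p) for p in colorado_springs]
print(terms)                           # [0.5, 0.5, 0.375, 0.375]
print(entropy_bits(colorado_springs))  # 1.75
```

Because every probability here is a power of two, each term comes out exact in floating point.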

Now, notice that this 1.75 bits is smaller than two bits of entropy.

Two bits of entropy was the case for the Gotham City (GC) weather information.

So, I wanted to compare the non-uniform case and the uniform case,

and in general, the entropy is maximized when the probabilities are all equal,

that is, when the outcomes are uniformly distributed.
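The same comparison can be made numerically. This Python sketch uses the uniform 1/4 probability for each of Gotham City's four weather conditions, and the same assumed Colorado Springs probabilities as in the computation; `entropy_bits` is again a hypothetical helper.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

gotham_city = [0.25, 0.25, 0.25, 0.25]        # uniform over four conditions
colorado_springs = [0.5, 0.25, 0.125, 0.125]  # assumed, biased toward one condition

h_gc = entropy_bits(gotham_city)
h_cs = entropy_bits(colorado_springs)
print(h_gc, h_cs)   # 2.0 1.75
print(h_gc > h_cs)  # True
```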

All right, as we can see here,

the Gotham City entropy is greater than the entropy for Colorado Springs.

So, let's go ahead and do that recap.


So, we're going to do

an Information Entropy Recap.

So, we started off by looking at the uniform case.

2:43

And, thanks to Ralph Hartley,

we established that the entropy is the log

of the number of possible outcomes.

And when we're expressing the information entropy in bits,

then this is log base two.
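As a minimal illustration of the uniform case, this Python sketch computes $\log_2 N$ for N equally likely outcomes; the helper name `hartley_bits` is hypothetical.

```python
import math

def hartley_bits(n_outcomes):
    """Entropy in bits when all n_outcomes are equally likely: log2(N)."""
    return math.log2(n_outcomes)

print(hartley_bits(4))  # 2.0, e.g. four equally likely weather conditions
print(hartley_bits(2))  # 1.0, e.g. a fair coin flip
```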

So, for m independent events,

3:20

and I emphasize the independent part,

that's important because they need to be independent events,

with no dependency across the events.

Then, the number of possible outcomes turns out to be $N^m$.

So, this is a sequence of m events,

and the number of possible outcomes is $N^m$.

So, the entropy in this case

becomes $\log_2 N^m$,

and this is equal to $m \log_2 N$.

That's because in a logarithm,

we can take the exponent and pull it out in front of the logarithm.
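One way to check this is to enumerate every sequence. The Python sketch below assumes a hypothetical alphabet of N = 4 symbols and m = 3 independent events; it counts the $N^m$ possible sequences and compares $\log_2 N^m$ with $m \log_2 N$.

```python
import itertools
import math

N = 4  # alphabet size (e.g. four weather conditions), chosen for illustration
m = 3  # number of independent events in the sequence, chosen for illustration

# Enumerate every possible sequence of m outcomes drawn from N symbols.
sequences = list(itertools.product(range(N), repeat=m))
print(len(sequences))  # 64, which equals N**m

# Entropy of the whole sequence vs. m times the per-event entropy.
h_sequence = math.log2(len(sequences))
h_per_event = math.log2(N)
print(h_sequence, m * h_per_event)  # 6.0 6.0
```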

Now, let's turn to the general distribution case.

4:36

Thanks to Claude Shannon,

we also have a mathematical formula for information entropy.

That formula for H grows linearly with the number of events, m,

and it has a sum over all the possible outcomes,

that is, over the alphabet, from i equal to 1 to N.

Inside the sum is $P_i \log_2 (1/P_i)$:

$$H = m \sum_{i=1}^{N} P_i \log_2 \frac{1}{P_i}$$

And because $1/P_i = P_i^{-1}$, the exponent of -1 can be pulled out of the logarithm, so this is equal to

$$H = -m \sum_{i=1}^{N} P_i \log_2 P_i$$
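Here is a minimal Python sketch of Shannon's formula, showing that the two forms agree. The function names and the example distribution (the assumed Colorado Springs probabilities, with m = 2 events) are illustrative; the entropy is in bits because the logs are base two.

```python
import math

def shannon_entropy_bits(probs, m=1):
    """H = m * sum_i P_i * log2(1/P_i) for m independent events."""
    return m * sum(p * math.log2(1 / p) for p in probs if p > 0)

def shannon_entropy_bits_neg(probs, m=1):
    """Equivalent form H = -m * sum_i P_i * log2(P_i)."""
    return -m * sum(p * math.log2(p) for p in probs if p > 0)

probs = [0.5, 0.25, 0.125, 0.125]
print(shannon_entropy_bits(probs, m=2))      # 3.5
print(shannon_entropy_bits_neg(probs, m=2))  # 3.5
```

Terms with $P_i = 0$ are skipped, following the usual convention that $0 \log 0 = 0$.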

So, we have a formula for information entropy in the general case,

and a simpler formula for the uniform distribution case.

In fact, the formula on the left-hand side is the general formula,

and it reduces to the formula on the right

in the special case where all the probabilities are equal.
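To see why the general formula reduces to the uniform one, substitute $P_i = 1/N$ for every $i$; all N terms of the sum are then identical:

```latex
H = -m \sum_{i=1}^{N} \frac{1}{N} \log_2 \frac{1}{N}
  = -m \cdot N \cdot \frac{1}{N} \log_2 \frac{1}{N}
  = -m \log_2 \frac{1}{N}
  = m \log_2 N
```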

Let's actually compare the two expressions.

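It turns out that the expression on the left is always less than or equal to the expression on the right,

6:58

$$-m \sum_{i=1}^{N} P_i \log_2 P_i \le m \log_2 N,$$

and the equality holds when the outcomes are uniformly distributed, that is, when all the $P_i$'s are equal.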

In other words, the information entropy is

maximized when the probability distribution is uniform.
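The inequality can also be spot-checked numerically. This Python sketch, a sanity check rather than a proof, draws random distributions over a hypothetical alphabet of N = 4 outcomes and confirms that none exceeds $\log_2 N$. It uses a single event (m = 1), since the factor m multiplies both sides equally; the helper names are illustrative.

```python
import math
import random

def entropy_bits(probs):
    """Shannon entropy in bits of a discrete distribution (one event)."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

def random_distribution(n, rng):
    """Draw a random probability vector of length n (weights normalized to sum to 1)."""
    weights = [rng.random() + 1e-12 for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]

rng = random.Random(0)  # fixed seed so the run is reproducible
N = 4
bound = math.log2(N)    # entropy of the uniform distribution over N outcomes

# No randomly drawn distribution should exceed the uniform-case entropy
# (a small tolerance absorbs floating-point rounding).
for _ in range(1000):
    probs = random_distribution(N, rng)
    assert entropy_bits(probs) <= bound + 1e-12

# The uniform distribution attains the bound.
print(entropy_bits([1 / N] * N) == bound)  # True
```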

So, given the alphabet size, N,

and the number of independent events, m,

the entropy is maximized

7:52

when the outcomes are uniformly distributed,

that is, when $P_i = P_j$ for all outcome indices i and j.

Now, let's try to think about this intuitively.

So, this is not surprising because the randomness or

unpredictability is the greatest when there's no bias in the probabilities.

Consider the weather example that we talked about previously.

If you were a betting man or woman,

then you would feel more confident guessing that

the weather in Colorado Springs is sunny because of that bias in the probabilities,

but you would have less confidence guessing the weather condition in

Gotham City because of the uniform distribution across the four weather conditions.