This is illustrated in the figure, in which the first codeword, tilde{X}(1), is sent through the channel to obtain the received sequence Y.
Now for w from 1 up to M, define the event E_w as the event that tilde{X}(w)
and Y are jointly delta typical, that is, the codeword
for message w and the received sequence Y are jointly delta typical.
Now observe that if E_1 occurs but E_w does not occur for any
w from 2 up to M, then no decoding error would occur,
because, according to our decoding rule, the received
sequence Y would be decoded to message 1.
Therefore, the probability of E_1 intersect E_2 complement, intersect E_3 complement,
intersect all the way to E_M complement, given that W is equal to 1, is upper
bounded by the probability that the error event does
not occur, given that W is equal to 1.
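For reference, here is the inclusion just described in symbols (writing Err for the error event; the displayed notation is my addition):

\[
\{E_1 \cap E_2^c \cap \cdots \cap E_M^c\} \subseteq \{\text{no decoding error}\},
\]
so that
\[
\Pr\{E_1 \cap E_2^c \cap \cdots \cap E_M^c \mid W = 1\} \le \Pr\{\mathrm{Err}^c \mid W = 1\}.
\]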
Now consider the probability of the error event, given that W is equal to 1.
This is equal to 1 minus the probability that the error
event does not occur, given that W is equal to 1.
Using the above inequality, we obtain the upper bound
1 minus the probability of E_1 intersect E_2 complement,
intersect E_3 complement, intersect all the way to E_M complement,
given that W is equal to 1.
This is simply equal to the probability of the complement of
E_1 intersect E_2 complement, intersect E_3 complement,
intersect all the way to E_M complement, given that W is equal to 1.
Now by means of de Morgan's Law, we can write this
event as E_1 complement union E_2, union E_3,
union all the way to E_M.
Then by the union bound, the probability of the error event, given that
W is equal to 1, is upper bounded by the probability of
E_1 complement, given W equals 1, plus summation w equals
2 up to M, the probability of E_w, given W is equal to 1.
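Putting the last few steps together, the chain of (in)equalities reads (a symbolic recap of the argument above):

\[
\begin{aligned}
\Pr\{\mathrm{Err} \mid W=1\}
&= 1 - \Pr\{\mathrm{Err}^c \mid W=1\} \\
&\le 1 - \Pr\{E_1 \cap E_2^c \cap \cdots \cap E_M^c \mid W=1\} \\
&= \Pr\{(E_1 \cap E_2^c \cap \cdots \cap E_M^c)^c \mid W=1\} \\
&= \Pr\{E_1^c \cup E_2 \cup \cdots \cup E_M \mid W=1\} \\
&\le \Pr\{E_1^c \mid W=1\} + \sum_{w=2}^{M} \Pr\{E_w \mid W=1\}.
\end{aligned}
\]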
By the strong joint AEP, the probability of E_1 complement
given W is equal to 1, that is, the probability that tilde{X}(1), the
first codeword, and the received sequence Y are not jointly delta typical
given that W is equal to 1, is less than some small quantity nu
that goes to 0 as delta goes to 0.
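In symbols (writing T^n_{[XY]delta} for the set of jointly delta-typical sequence pairs, a notation I am adding here):

\[
\Pr\{E_1^c \mid W=1\} = \Pr\{(\tilde{X}(1), Y) \notin T^n_{[XY]\delta} \mid W=1\} < \nu,
\qquad \nu \to 0 \text{ as } \delta \to 0.
\]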
This is illustrated in the figure.
Now conditioning on W equals 1, that is, given that the first codeword
has been sent, for w equals 2 up to M, tilde{X}(w)
and the received sequence Y are n i.i.d. copies
of the pair of generic random variables (X prime, Y prime),
where X prime has the same distribution as
X, and Y prime has the same distribution as Y.
This is because all the codewords are generated in exactly the same way,
and so they have the same distribution, namely p(x) to the power n.
Since the DMC is memoryless, the n pairs are i.i.d.; and since the codewords are generated independently of one another, X prime and Y prime are independent.
The idea here is that the codeword tilde{X}(1) is independent
of all of the codewords tilde{X}(2), up to tilde{X}(M).
Because the received sequence Y is obtained through the transmission of the codeword
tilde{X}(1), intuitively, tilde{X}(2) up to tilde{X}(M) are independent of Y.
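In symbols, for w from 2 up to M and any sequences x = (x_1, ..., x_n) and y = (y_1, ..., y_n), the independence and the i.i.d. structure say (a symbolic recap; the notation is mine):

\[
\Pr\{\tilde{X}(w) = \mathbf{x},\, Y = \mathbf{y} \mid W = 1\}
= \prod_{i=1}^{n} p(x_i) \prod_{i=1}^{n} p(y_i),
\]

where p(x) and p(y) are the marginal distributions of the generic random variables X and Y.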
Now for w equals 2 up to M, the probability of
E_w, given W equals 1, is equal to the probability that
the codeword tilde{X}(w), and the received sequence
Y, are jointly delta typical given that W is equal to 1.
By Lemma 7.17, this is upper bounded by 2 to the power minus n times
(I(X;Y) minus tau), where tau goes to 0 as delta goes to 0.
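In symbols:

\[
\Pr\{E_w \mid W=1\} = \Pr\{(\tilde{X}(w), Y) \in T^n_{[XY]\delta} \mid W=1\}
\le 2^{-n(I(X;Y) - \tau)}, \qquad \tau \to 0 \text{ as } \delta \to 0.
\]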
Note that our choice of M satisfies 1 over n times log M less than
the mutual information between X and Y, minus epsilon over 4.
And this implies that M, the total number of messages, is
less than 2 to the power n times (I(X;Y) minus epsilon over 4).
Therefore, the probability of the error event, which is equal
to the probability of the error event given that W is equal
to 1, is upper bounded by the probability of E_1
complement given W equals 1, which is less than nu,
plus the summation over w from 2 up to M of the probability of E_w given W equals 1.
Each term of this summation is at most 2 to the power minus n times
(I(X;Y) minus tau), and there are fewer than M terms, so the summation is
less than M times 2 to the power minus n times (I(X;Y) minus tau),
which, by our bound on M, is less than 2 to the power minus n times
(epsilon over 4 minus tau).
And so, we have nu plus 2 to the power minus n times (epsilon over 4 minus tau).
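Collecting the bounds, the whole estimate reads:

\[
\begin{aligned}
\Pr\{\mathrm{Err}\} = \Pr\{\mathrm{Err} \mid W=1\}
&\le \Pr\{E_1^c \mid W=1\} + \sum_{w=2}^{M} \Pr\{E_w \mid W=1\} \\
&< \nu + M \cdot 2^{-n(I(X;Y)-\tau)} \\
&< \nu + 2^{n(I(X;Y)-\epsilon/4)} \cdot 2^{-n(I(X;Y)-\tau)} \\
&= \nu + 2^{-n(\epsilon/4 - \tau)}.
\end{aligned}
\]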
Now let us take a close look at this term, 2
to the power minus n times (epsilon over 4 minus tau).
Recall that epsilon is fixed.
Since tau tends to 0, as delta tends to 0, we can choose delta to be
sufficiently small, so that epsilon over 4, minus tau, is greater than 0.
If so, 2 to the power minus n times (epsilon over 4
minus tau) would tend to 0, as n tends to infinity.
Now let nu be less than epsilon over 3, which holds for sufficiently small delta,
and let n be sufficiently large, so that 2 to the power minus
n times (epsilon over 4 minus tau) is less than epsilon over 6.
Then we obtain the probability of the error event, less than epsilon
over 3, plus epsilon over 6, which is equal to epsilon over 2.
Thus, we have shown, that with a suitable choice
of parameters, for the random coding scheme, the probability
of the error event, is less than epsilon over
2, for any arbitrarily small epsilon, when n is sufficiently large.
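That is, with these choices of delta and n:

\[
\Pr\{\mathrm{Err}\} < \nu + 2^{-n(\epsilon/4 - \tau)}
< \frac{\epsilon}{3} + \frac{\epsilon}{6} = \frac{\epsilon}{2}.
\]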
The idea of the analysis is the following.
First of all, let the block length n be large.
Then by the joint AEP, the probability that tilde{X}(1), that is the codeword
being sent, is jointly typical with the received sequence Y, would tend to 1.
At the same time, the probability that any
other codeword is jointly typical with the received sequence, Y,
is approximately equal to 2 to the power minus
n times the mutual information between X and Y.
If the size of the codebook, that is M, grows at
a rate strictly less than the mutual information between X and Y,
then the probability that some codeword tilde{X}(w), with w not equal to 1,
is jointly typical with the sequence Y can be made arbitrarily small.
Then the probability of the error event, that is W hat
not equal to W, can be made arbitrarily small.
We have already shown that for the random coding scheme,
the probability of the error event can be made arbitrarily small.
We now show the existence of a
deterministic coding scheme, that can do the same.
According to the random coding scheme, the probability of the error
event can be written as the summation over all codebooks C
of the probability of the codebook C being chosen, times
the probability of the error event for that codebook C.
In other words, the probability of the error event is
a weighted average of the probability of the error event
over all the codebooks.
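In symbols, with C ranging over all possible codebooks:

\[
\Pr\{\mathrm{Err}\} = \sum_{\mathcal{C}} \Pr\{\mathcal{C}\}\, \Pr\{\mathrm{Err} \mid \mathcal{C}\} < \frac{\epsilon}{2}.
\]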
Then there exists at least one codebook C
star, such that the probability of error P_e
is less than or equal to the probability
of error averaged over all the possible codebooks,
which we have shown is less than epsilon over 2.
And by construction, this codebook has rate
1 over n times log M greater than I(X;Y)
minus epsilon over 2.
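In symbols, since at least one term must not exceed the weighted average:

\[
\Pr\{\mathrm{Err} \mid \mathcal{C}^*\}
\le \sum_{\mathcal{C}} \Pr\{\mathcal{C}\}\, \Pr\{\mathrm{Err} \mid \mathcal{C}\} < \frac{\epsilon}{2},
\qquad
\frac{1}{n}\log M > I(X;Y) - \frac{\epsilon}{2}.
\]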
By letting p(x) be the input distribution, that achieves the channel capacity,
the rate of this code can be arbitrarily close to the channel capacity.
To complete the proof of the achievability of
the channel coding theorem, we are now going to
show that from this deterministic code, with P_e, the
average probability of error, less than epsilon over 2,
we can construct another deterministic code whose maximal conditional
probability of error, lambda_max, is less than epsilon.
Without loss of generality,
assume that the conditional probabilities of error are in ascending order.
That is lambda_1, less than or equal to lambda_2,
less than or equal to all the way to lambda_M.
From P_e less than epsilon over 2, we have 1 over M, times summation
w equals 1 up to M, lambda_w, less than epsilon over 2.
This implies that the summation of all the lambda_w's
is less than M over 2, times epsilon.
Since M is even, M over 2 is an integer.
Then the second half of the summation, namely
the summation from w equals M over 2 plus 1 up to M of lambda_w,
which is upper bounded by the summation from w equals 1 up to M of lambda_w,
is less than M over 2 times epsilon.
Thus we have shown that the second half of
the summation, is less than M over 2, times epsilon.
From this, we have that the second half of the summation,
divided by M over 2, is less than epsilon.
As there are exactly M over 2 terms in the second half of
the summation, this means that the average of all the lambda_w's
for w from M over 2 plus 1 to M, is less than epsilon.
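The chain of implications so far:

\[
\frac{1}{M}\sum_{w=1}^{M} \lambda_w < \frac{\epsilon}{2}
\;\Longrightarrow\;
\sum_{w=M/2+1}^{M} \lambda_w \le \sum_{w=1}^{M} \lambda_w < \frac{M}{2}\,\epsilon
\;\Longrightarrow\;
\frac{1}{M/2}\sum_{w=M/2+1}^{M} \lambda_w < \epsilon.
\]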
Since we assume that the lambda_w's are in ascending order,
the smallest value in the second half of the summation,
namely lambda_{M over 2 plus 1}, is less than epsilon.
And this implies that lambda_{M over 2}, which is less than or
equal to lambda_{M over 2 plus 1}, is also less than epsilon.
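In symbols, since the smallest term of the second half cannot exceed its average:

\[
\lambda_{M/2} \le \lambda_{M/2+1} \le \frac{1}{M/2}\sum_{w=M/2+1}^{M} \lambda_w < \epsilon.
\]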
The conclusion is that if the average probability of
error, P_e is less than epsilon over 2,
then lambda_1, lambda_2 up to lambda_{M over 2} are all less than epsilon.
Thus, by discarding the worst half of the codewords in the codebook C star, namely,
those codewords that correspond to lambda_{M over 2 plus 1}, up to lambda_M,
we obtain
a new codebook with lambda_max, less than epsilon.
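As a concrete illustration of this expurgation step, here is a small Python sketch; the function name expurgate and the sample numbers are my own illustrative choices, not from the lecture:

def expurgate(lambdas):
    """Given the list of conditional error probabilities lambda_w for a
    codebook of even size M, discard the worst half of the codewords and
    return the surviving message indices and their maximal lambda.

    If the average of lambdas is less than epsilon/2, the maximal
    conditional error probability of the survivors is less than epsilon.
    """
    M = len(lambdas)
    assert M % 2 == 0, "M is assumed to be even"
    # Sort message indices by conditional error probability (ascending),
    # which realizes the 'without loss of generality' ordering step.
    order = sorted(range(M), key=lambda w: lambdas[w])
    keep = order[:M // 2]          # the best half of the codewords
    lambda_max = max(lambdas[w] for w in keep)
    return keep, lambda_max

# Tiny usage example with made-up numbers: the average is 0.135,
# which is less than epsilon/2 for, say, epsilon = 0.28.
lams = [0.01, 0.30, 0.02, 0.05, 0.40, 0.03]
keep, lam_max = expurgate(lams)
print(keep, lam_max)   # the surviving half has lambda_max < epsilon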
After discarding the worst half of C star, the rate of the code
becomes 1 over n, times log M over 2, because the size of
the message set, instead of M, is now equal to M over 2.
Now this is equal to 1 over n, times log M, minus 1 over n,