My goal for the next few segments is to show you that if we use a secure PRG We'll
get a secure stream safer. The first thing we have to do is define is, what does it
mean for a stream safer to be secure? So whenever we define security we always
define it in terms of what can the attacker do? And what is the attacker
trying to do? In the context of stream ciphers remember these are only used
with a onetime key, and as a result the most the attacker is ever going to see is
just one cypher text encrypted using the key that we're using. And so we're gonna
limit the attackers? ability to basically obtain just one cypher text. And in
fact later on we're going to allow the attacker to do much, much, much more, but
for now we're just gonna give him one cypher text. And we wanted to find,
what does it mean for the cypher to be secure? So the first definition that
comes to mind is basically to say, well maybe we wanna require that the attacker
can't actually recover the secret key. Okay so given ciphertext you
shouldn't be able to recover the secret key. But that's a terrible definition
because think about the falling brilliant cipher but the way we encrypt the
message using a key K is basically we just output the message. Okay this
is a brilliant cipher yeah of course it doesn't do anything given a message just
output a message as the cipher text. This is not a particularly good encryption
scheme; however, given the cipher text, mainly the message the poor attacker
cannot recover the key because he doesn't know what the key is. And so as a result
this cipher which clearly is insecure, would be considered secure under this
requirement for security. So this definition will be no good. Okay so it's
recovering the secret key is the wrong way to define security. So the next thing we
can kinda attempt is basically just say, well maybe the attacker doesn't care about
the secret key, what he really cares about are the plain text. So maybe it should be
hard for the attacker to recover the entire plain text. But even that doesn't
work because let's think about the following encryption scheme. So suppose
what this encryption scheme does is it takes two messages, so I'm gonna use two
lines to denote concatenation of two messages M0 line, line M1 means
concatenate M0 and M1. And imagine what the scheme does is really it outputs
M0 in the clear and concatenate to that the encryption of M1. Perhaps even
using the One Time Pad okay? And so here the attacker is gonna be given one
cipher text. And his goal would be to recover the entire plain texts. But the
poor attacker can't do that because here maybe we've encrypted M1 using the One
Time Pad so the attacker can't actually recover M1 because we know the
One Time Pad is secure given just one cipher text. So this construction
would satisfy the definition but unfortunately clearly this is not a secure
encryption scheme because we just leaked half of the plain text. M0 is
completely available to the attacker so even though he can't recover all of the
plain text he might be able to recover most of the plain text, and that's clearly
unsecured. So of course we already know the solution to this and we talked about
Shanon definition of security perfect secrecy where Shannon's idea was that in
fact when the attacker intercepts a cipher text he should learn absolutely no
information about the plain texts. He shouldn't even learn one bit about the
plain text or even he shouldn't learn, he shouldn't even be able to predict a little
bit about a bid of the plain text. Absolutely no information about the plain text.
So let's recall very briefly Shannon's concept of perfect secrecy
basically we said that you know given a cipher We said the cipher has perfect
secrecy if given two messages of the same length it so happens that the distribution
of cipher texts. Yet if we pick a random key and we look into distribution of
cipher texts we encrypt M0 we get exactly the same distribution as when we
encrypt M1. The intuition here was that if the advisory observes the cipher texts
then he doesn't know whether it came from the distribution the result of encrypting
M0 or it came from the distribution as the result of encrypting M1. And as a
result he can't tell whether we encrypted M0 or M1. And that's true for all
messages of the same length and as a result a poor attacker doesn't really know
what message was encrypted. Of course we already said that this definition is too
strong in the sense that it requires really long keys if cipher has short
keys can't possibly satisfy this definition in a particular stream ciphers
can satisfy this definition. Okay so let's try to weaken the definition a little bit
and let's think to the previous segment, and we can say that instead of requiring
the two distributions to be absolutely identical what we can require is, is that
two distributions just be computationally indistinguishable. In other words a poor,
efficient attacker cannot distinguish the two distributions even though the
distributions might be very, very, very different. That just given a sample for
one distribution and a sample for another distribution the attacker can't tell which
distribution he was given a sample from. It turns out this definition is actually
almost right, but it's still a little too strong, that still cannot be satisfied, so
we have to add one more constraint, and that is that instead of saying that this
definition should have hold for all M0 and M1. It is to hold for only pairs M0 M1
that the attackers could actually exhibit. Okay so this actually
leads us to the definition of semantics security. And so, again this is semantics
security for one time key in other words when the attacker is only given one cipher
text. And so the way we define semantic security is by defining two experiments,
okay we'll define experiment 0 and experiment 1. And more generally we will
think of these as experiments parentheses B, where B can be zero or one. Okay so the
way the experiment is defined is as follows, we have an adversary that's
trying to break the system. An adversary A, that's kinda the analog of statistical
tests in the world of pseudo random generators. And then the challenger does
the following, so really we have two challengers, but the challengers are so
similar that we can just describe them as a single challenger that in one case takes
his inputs bits set to zero, and the other case takes his inputs bits set to
one. And let me show you what these challengers do. The first thing the
challenger is gonna do is it's gonna pick a random key and then the adversary
basically is going to output two messages M0 and M1. Okay so this is an explicit
pair of messages that the attacker wants to be challenged on and as usual we're not
trying to hide the length of the messages, we require that the messages be equal
length. And then the challenger basically will output either the encryption of M0
or the encryption of M1, okay so in experiment 0 the challenger will output
the encryption of M0. In experiment 1 the challenger will output the encryption
of M1. Okay so that the difference between the two experiments. And then the
adversary is trying to guess basically whether he was given the encryption of M0
or given the encryption of M1. Okay so here's a little bit of notation let's
define the events Wb to be the events that an experiment B the adversary output one.
Okay so that is just an event that basically in experiment zero W0 means that
the adversary output one and in experiment one W1 means the adversary output one. And
now we can define the advantage of this adversary, basically to say that this is
called the semantics security advantage of the adversary A against the scheme E,
to be the difference of the probability of these two events. In other words we are
looking at whether the adversary behaves differently when he was given the
encryption of M0 from when he was given the encryption of M1. And I wanna make
sure this is clear so I'm gonna say it one more time. So in experiment zero he was
given the encryption of M0 and in experiment one he was given the encryption
of M1. Now we're just interested in whether the adversary output 1 or not.
... In these experiments. If in both experiments the adversary output 1 with
the same probability that means the adversary wasn't able to distinguish the
two experiments. Experiments zero basically looks to the adversary the same
as experiment one because in both cases upload one with the same probability.
However if the adversary is able to output 1 in one experiment with significantly
different probability than in the other experiment, then the adversary was
actually able to distinguish the two experiments. Okay so... To say this more
formally, essentially the advantage again because it's the difference of two
probabilities the advantage is a number between zero and one. If the advantage is
close to zero that means that the adversary was not able to distinguish
experiment zero from experiment one. However if the advantage is close to one,
that means the adversary was very well able to distinguish experiment zero from
experiment one and that really means that he was able to distinguish an encryption
of M0 from an encryption of M1, okay?So that's out definition. Actually
that is just the definition of the advantage and the definition is just what
you would expect basically we'll say that a symmetric encryption scheme is
semantically secure if for all efficient adversaries here I'll put these in quotes
again, "For all efficient adversaries the advantage is negligible." In other words,
no efficient adversary can distinguish the encryption of M0 from the encryption
of M1 and basically this is what it says repeatedly that for these two
messages that the adversary was able to exhibit he wasn't able to distinguish
these two distributions. Now I want to show you this, this is actually a very
elegant definition. It might not seem so right away, but I want to show you some
implications of this definition and you'll see exactly why the definition is the way
it is. Okay so let's look at some examples. So the first example is suppose
we have a broken encryption scheme, in other words suppose we have an adversary A
that somehow given the cipher text he is always able to deduce the least
significant bit of the plain text. Okay so given the encryption of M0, this adversary
is able to deduce the least significant bit of M0. So that is a terrible
encryption scheme because it basically leaks the least significant bit of the
plain text just given the cipher text. So I want to show you that this scheme is
therefore semantically secure so that kind of shows that if a system is semantically
secure than there is no attacker of this type. Okay so let's see why is the system
not semantically secure well so what we are gonna do is we're gonna basically use
our adversary who is able to learn our most insignificant bits, we're going to
use him to break semantic security so we're going to use him to distinguish
experiment zero from experiment one Okay so here is what we are going to do. We are
algorithm B, we are going to be algorithm B and this algorithm B is going to use
algorithm A in its belly. Okay so the first thing that is going to happen is of
course the challenger is going to choose a random key. The first thing that is going
to happen is that we need to output two messages. The messages that we are going
to output basically are going to have differently significant bits. So one
message is going to end with zero and one message is going to end with one. Now what
is the challenger going to do? The challenger is going to give us the
encryption of either M0 or M1, depending on whether in experiment 0 or
in experiment 1. And then we just forward this cipher text to the adversary
okay so the adversary A. Now what is the property of adversary A? Given the cipher
text, adversary A can tell us what the least significant bits of the plain text is.
In other words the adversary is going to output the least significant bits of M0 or M1
but low and behold that's basically the bit B. And then we're just
going to output that as our guest so let?s call this thing B prime Okay so now this
describes the semantic security adversary. And now you tell me what is the semantic
security advantage of this adversary? Well so let's see. In experiment zero, what is
the probability that adversary B outputs one? Well in experiment zero it is always
given the encryption of M zero and therefore adversary A is always output the
least significant bit of M zero which always happens to be zero. In experiment
zero, B always outputs zero. So the probability that outputs one is zero.
However in experiment one, we're given the encryption of M1. So how likely is
adversary B to output one in experiment one well it always outputs one, again by
the properties of algorithm A and therefore the advantage basically is one.
So this is a huge advantage, it's as big as it's gonna get. Which means that this
adversary completely broke the system. Okay so we consider, so under semantic
security basically just deducing the least significant bits is enough to completely
break the system under a semantic security definition. Okay, now the interesting
thing here of course is that this is not just about the least significant bit, in
fact of the message for example the most significant bit, you know
bit number seven Maybe the XOR of all the bits in the message and so on
and so forth any kind of information, any bit about the plain text they can be
learned basically would mean that the system is not semantically secure. So
basically all the adversary would have to do would be to come up with two messages
M0 and M1 such that under one thing that they learned it's the value zero and then
the other thing, the value, is one. So for example if A was able to learn the XOR
bits of the message then M0 and M1 would just have different
XOR for all the bits of their messages and then this adversary A would
also be sufficient to break semantic security. Okay so, basically for cipher
is semantically secure then no bit of information is revealed to an
efficient adversary. Okay so this is exactly a concept of perfect secrecy only
applied just efficient adversaries rather than all adversaries. So the next
thing I wanna show you is that in fact the one time pad in fact is
semantically secure, they better be semantically secure because it's in fact,
it's more than that it's perfectly secure. So let's see why that's true. Well so
again it's one of these experiments, so suppose we have an adversary that claims
to break semantic security of the one time pad. The first the adversary is gonna do
is he's gonna output two messages M0 and M1 of the same length.
Now what does he get back as a challenge? As a challenge, he gets either the encryption
of M0 or the encryption of M1 under the one time pad.
And he's trying to distinguish between those two possible cipher texts that he gets, right?
In experiment zero, he gets the encryption of M0 and in experiment one, he gets the
encryption of M1. Well so let me ask you, so what is the advantage of adversary
A against the one time patent? So I remember that the property of the ones I
had is that, that K, XOR M0 is distributed identically to K, XOR M1.
In other words, these distributions are absolutely identical distribution,
distributions, identical. This is a property of XOR. If we XOR the random pad
K with anything, either M0 or M1, we get uniform distribution. So in both
cases, algorithm A is given as in input exactly the same distribution. Maybe the
uniform distribution on cipher texts. And therefore it's gonna behave exactly the
same in both cases because it was given the exact distribution as input. And as a
result, its advantage is zero, which means that the onetime pad is semantically
secure. Now the interesting thing here is not only is it semantically secure, it's
semantically secure for all adversaries. We don't even have to restrict the
adversaries to be efficient. No adversary, doesn't matter how smart you are, no
adversary will be able to distinguish K XOR M0 from K XOR M1 because the two
distributions are identical. Okay, so the one time pad is semantically secure. Okay,
so that completes our definition of semantic security so the next thing we're
going to do is prove to the secure PRG, in fact implies that the stream cipher is
semantically secure.