In this segment we will look at how to use block ciphers to encrypt multiple messages
using the same key. This comes up in practice for example in file systems where
the same key's used to encrypt multiple files. It comes up in networking protocols
where the same key is used to encrypt multiple packets. So let's see how to do
it. The first thing we need to do is to define what is mean for a cipher to be
secure when the same key is used to encrypt multiple messages. When we use the
key more than once the result of that is that the adversary gets to see many cyber
text encrypted using the same key. As a result, when we define security, we're
gonna allow the adversary to mount what's called a chosen plain text attack. In
other words, the adversary can obtain the encryption of arbitrary messages of his
choice. So, for example, if the adversary's interacting with Alice. The
adversary can ask Alice to encrypt arbitrary messages of the adversary's
choosing. And Alice will go ahead and encrypt those messages and give the
adversary the resulting cipher texts. You might wonder why would Alice ever do this.
How could this possibly happen in real life? But it turns out this is actually
very common in real life. And in fact, this modeling is quite a conservative
modeling of real life. For example, the adversary might send Alice an email. When
Alice receives the email, the writes it to her encrypted disk, thereby encrypting the
adversary's email using her secret key. If later the adversary steals this disc, then
he obtains the encryption of an email that he sent Alice under Alice's secret key. So
that's an example of a chosen plain text attack, where the adversary provided Alice
with a message and she encrypted that message using her own key. And then later
the attacker was able to obtain the resulting cipher text. So that's the
adversary's power. And then the adversary's goal is basically to break
semantic security. So let's define this more precisely. As usual, we're gonna
define semantic security under a chosen plain text attack using two experiments,
experiment zero and experiment one, that are modeled as a game between a challenger
and an adversary. When the game begins, the challenger is gonna choose a random
key K. And now the adversary basically gets to query the challenger. So the
adversary now begins by submitting a semantic security query, namely, he
submits two messages, M zero and M one. I added another index, but let me ignore
that extra index for a while. So the adversary submits two messages, M zero and
M one, that happen to be of the same length. And then the adversary receives
the encryption of one of those messages, either of M zero or of M one. In
experiment zero, he receives the encryption of M zero. In experiment one,
he receives the encryption of M one. So, so far this would look familiar this looks
exactly like a standard semantic security [inaudible]. However, plain text attack
the adversary can now repeat this query again. So now you can issue a query with
two other plain texts, again of the same length, and again you would receive the
encryption of one of them. In experiment zero you would receive the encryption of M
zero. In experiment one you would receive the encryption of M one. And the attacker
can continue issuing queries like this. In fact we'll say that he can issue up to Q
queries of this type. And then, remember, every time he issues a pair of messages.
That happen to be of the same length and every time he either gets the encryption
of the left side or the right side again in experiment zero he will always get the
encryption of the left message in experiment one he will always get the
encryption of the left message. And, then adversary's goal is, basically, to figure
out whether he's in experimental zero or in experiment one. In other words, whether
he was constantly receiving the encryption of the left message or the encryption of
the right message. So, in some sense, this is a standard semantic security game just
iterated over many queries that the attacker can issue to adaptively one after
the other. Now the chosen plain text attack is captured by the fact that if the
attacker wants the encryption of a particular message m. What he could do is,
for example, use query J for sum J, where in this query J he'll set both the zero
message and the one message to be the exactly same message M. In other words,
both the left message and the right message are the same, and both are set to
the message M. In this case, what he will receive, since both messages are the same,
he knows that he's gonna receive the encryption of this message M that he was
interested in. So this is exactly what we meant by a chosen [inaudible] attack.
Where the advisory can submit a message m and receive the encryption of that
particular message m of his choice. So some of his queries might be of this chose
plain text flavor where the message on the left is equal to the message on the right,
but some of the queries might be standard semantic security queries where the two
messages are distinct and that actually gives him information on whether he's in
experiment zero or in experiment one. Now by now you should be used to this
definition where we say that the system is semantically secure under a chosen plain
text attack. If, for all efficient adversaries, they cannot distinguish
experiment zero from experiment one. In other words, the probability that, at the
end, the output, B Prime, which we're gonna denote by the output of experiment
B. This output will be the same whether [inaudible] experiment zero or experiment
one. So the attacker couldn't distinguish between always receiving encryptions of
the left messages, versus always receiving encryptions of the right messages. So in
your mind, I'd like you to be thinking of an adversary that is able to mount a
chosen plaintext attack, namely, be given the encryption of arbitrary messages of
his choice, and his goal is to break semantic security for some other challenge
cipher texts. And as I said in this [inaudible] model of the real world the
attacker is able to fool Alice into encrypting for him messages of his choice
and then the attacker's goal is to somehow break some challenge cypher text. So I
claim that all the ciphers that we've seen up until now, namely deterministic counter
mode or the one time pad, are insecure under a chosen plain text attack. More
generally, suppose we have an encryption scheme that always outputs the same cipher
text for a particular message M. In other words, if I ask the encryption scheme to
encrypt the message M once. And then I ask the encryption scheme to encrypt the
message m again. If in both cases the encryption scheme outputs the same cypher
text, then that system cannot possibly be secure under a chosen plain text attack.
And both deterministic counter mode and the one time pad were of that flavor. They
always output the same cipher text, given the same message. And so let's see why
that cannot be chosen plain text secure. And the attack is fairly simple, what the
attacker is gonna do, is he's gonna output the same message twice. This just says.
That he really wants the encryption of M0. So here the attacker is given C0 which is
the encryption of M0. So this was his chosen plain text query where he actually
received the encryption of the message M0 of his choice. And now he's going to break
semantic security. So what he does is he outputs two messages, M0 and M1 of the
same length, and he's going to be given the encryption of MB. But low and behold,
we said that the encryption system. Always outputs the same cipher text when its
encrypting the message, M0. Therefore, if B is=to zero, we know that C, this
challenged cipher text, is simply=to CO, because it's the encryption of M0.
However, if B is=to one. Then we know that this challenge cypher text is the
encryption of M1 which is something other than C zero so all the attacker does is he
just checks his C is = to C0 the output's zero in other words he outputs one. So, in
this case, the attacker is able to perfectly guess this bit B, so he knows
exactly [inaudible] given the encryption of M0, or the encryption of M1. And as a
result, his advantage in winning this game is one. Meaning that the system cannot
possibly be CPA secure. One is not a negligible number. So this shows that the
deterministic encryption schemes cannot possibly be CPA-secure, but you might
wonder well, what does this mean in practice? Well in practice this means
again that every message is always encrypted to the same cipher text. What
this means is if you're encrypting files on disk, and you happen to be encrypting
two files that happen to be the same, they will result in the same cipher text and
then the attacker by looking at the encrypted disk, will learn that these two
files actually contain the same content. The attacker might not learn what the
content is, but he will learn that these two encrypted files are an encryption of
the same content and he shouldn't be able to learn that. Similarly, if you send two
encrypted packets on the network that happen to be the same, the attacker will
not learn the content of those packets, but he will learn that those two packets
actually contain the same information. Think for example of an encrypted voice
conversation. Every time there's quiet on the line, the system will be sending
encryptions of zero. But since encryption of zero are always mapped to the same
cypher text. An attacker looking at the network will be able to identify exactly
the points in the conversation where there's quiet because he will always see
those exact same cypher text every time. So these are examples where deterministic
encryption cannot possibly be secure. And as I say formerly we say that the
terministic encryption can not be semantically secure under a chosen plain
text attack. So what do we do, well the lesson here is if the secret keys gonna be
used to encrypt multiple messages, it had better be the case that given the same
plain text to encrypt twice. The encryption algorithm must produce
different cipher texts. And so there are two ways to do that. The first method is
what's called randomized encryption. Here, the encryption algorithm itself is going
to choose some random string during the encryption process and it is going to
encrypt the message M using that random string. So what this means is that a
particular message, M0 for example, isn't just going to be mapped to one cipher text
but it's going to be mapped to a whole ball of cipher texts. Whereon every
encryption, basically, we output one point in this ball. So every time we encrypt, the
encryption algorithm chooses a random string, and that random string leads to
one point in this ball. Of course, the decryption algorithm, when it takes any
point in this ball, will always map the result to M zero. Similarly cipher text M
one will be mapped to a ball, and every time we encrypt M one, we basically output
one point in this ball. And these balls have to be disjoint, so that the
encryption algorithm, when it obtains a point in the ball corresponding to M one,
will always output the message M one. In this way, since the encryption algorithm
uses randomness, if we encrypt the same message twice, with high probability we'll
get different cipher texts. Unfortunately this means that the cipher text
necessarily has to be longer than the plain text because somehow the randomness
that was used to generate the cipher text is now encoded somehow in the cipher text.
So the cipher text takes more space. And roughly speaking, the cipher text size is
going to be larger than the plain text. By basically the number of random bits that
were used during encryption. So if the plain texts are very big, if the plain
texts are gigabytes long, the number of random bits is going to be on the order of
128. So maybe this extra space doesn't really matter. But if the plain texts are
very short, maybe they themselves are 128 bits, then adding an extra 128 bits to
every cipher text is going to double the total cipher text size. And that could be
quite expensive. So as I say randomized encryption is a fine solution but in some
cases it actually introduces quite a bit of costs. So let's look at a simple example.
So imagine we have a pseudorandom function that takes inputs in a certain
space r which is gonna be called a nonce space. And outputs, outputs in the message
space. And, now, let's define the following randomize encryption scheme
where we want to encrypt the message m with the encryption of whatever it's gonna
do is first it's gonna generate a random r in this nonce space R. And then it's going
to open a cypher text that consist of two components, the first component is going
to be this value R and the second component is going to be an evaluation of
the pseudo-random function at the point R XOR with the message M. And my question to
you is, is this encryption system semantically secure under a chosen plain
text attack. So the correct answer is yes. But only if the nonce space R is large
enough so that little R never repeats with very, very high probability. And let's
quickly argue why that's true. So first of all, because F is a secure pseudo-random
function, we might as well replace it with a truly random function. In other words,
this is indistinguishable from the case where we encrypt the message M, using the
truly random function little F, evaluated to point R, and then XOR with M.
But since this little r never repeats every cypher text uses a different little r what
this means is that the values of F(r) are random uniform independent strings
every time. So every time we encrypt a message, we encrypt it essentially using a
new uniform random one time pad. And since XORing a uniform string with any string
simply generates a new uniform string, the resulting cipher text is distributed as
simply two random uniform strings. I'll call them r and r prime. And so both in
experiment zero and in experiment one, all the attacker gets to see are truly uniform
random strings r, r', and since in both experiments the attacker is seeing the same
distribution, he cannot distinguish the two distributions. And so since security
holds completely when we're using a truly random function it's also gonna hold when
we're using a pseudorandom function. Okay, so this is a nice example of how we use
the fact that the pseudo random function behaves like a random function to argue
security of this particular encryption scheme. Okay, so now we have a nice
example of randomized encryption. The other approach to building chosen plain
text secure encryption schemes is what's called a nonce based encryption. Now, in
a non-spaced encryption system, the encryption algorithm actually takes three
inputs rather than two. As usual it takes the key and the message. But it also takes
an additional input called a nonce. And similarly, the decryption algorithm also
takes the nonce as input, and then produces the resulting decrypted plain text. And
what is this nonce value n. This nonce is a public value. It does not need to be
hidden from the adversary but the only requirement is that the pair (k,n)
is only used to encrypt a single message. In other words, this pair (k,n)
must change from message to message. And there are two ways to change it. One way
to change it is by choosing a new random key for every message. And the other way
is to keep using the same key all the time but then we must choose a new nonce for
every message. And, and as I said, I wanna emphasize again, this nonce need not
be secret, and it need not be random. The only requirement is the nonce is unique.
And in fact, we're gonna use this term throughout the course. A nonce
for us, means a unique value that doesn't repeat. It does not have to be random. So
let's look at some examples of choosing an nonce, well the simplest option is
simply to make the nonce of the accounter so for example the networking
protocol you can imagine the nonce being a packet counter that's incremented
every time a packet is sent by a sender or received by the receiver this means that
the encrypter has to keep state from message to message mainly that he has to
keep this counter around and increment it after every message is transmitted.
Interestingly, if the decrypter actually has the same state then there is no need
to include the nuance in the cipher text since the nuance is implicit. Let's look
at an example. The https protocol is run over a reliable transport mechanism which
means that packets sent by the sender are assumed to be received in order at a
recipient. So if the sender sends packet #5 and then packet #6, the recipient
will receive packet #5 and then packet #6 in that order. This
means that if the sender maintains a packet counter, the recipient can also
maintain a packet counter and two counters basically increment in sync. In this case
there is no reason to include the nonce in the packets because the
nonce is implicit between the two sides. However, in other protocols, for
example, in IPsec, IPsec has a protocol designed to encrypt the IP layer. The IP
layer does not guarantee in order delivery. And so the sender might send
packet #5 and then packet #6, but those will be received in reverse order at
the recipient. In this case it's still fine to use a packet counter as a nonce
but now the nonce has to be included in the packet so that the recipient knows
which nonce to use to decrypt the received packet. So as I say, nonce based
encryption is a very efficient way to achieve CPA security. In particular if the
nonce is implicit, it doesn't even increase the cipher text length. Of course
another method to generate a unique nonce is simply to pick the nonce at random
assuming the nonce space is sufficiently large so that with high probability the
nonce will never repeat for the life of the key. Now in this case, nonce
based encryption simply reduces to randomized encryption. However, the
benefit here is that the sender does not need to maintain any state from message to
message. So this is very useful for example if encryption happens to take
place on multiple devices. For example, I might have both a laptop and a smart
phone. They might both use the same key. But in this case if I require state full
encryption, then my laptop and the smartphone would have to coordinate to
make sure that they never reuse the same nonces. Whereas if both of them simply take
nonces at random, they don't need to coordinate because it was very high
probability they'll simply never choose the same nonce. Again assuming the nonce
space is big enough. So there are some cases where stateless encryption is quite
important, in particular where the same key is used by multiple machines. So I
wanted to find, more precisely, what security means for nonce based
encryption. And in particular, I want to emphasize that the system must remain
secure when the nonce are chosen by the adversary. The reason it's important
to allow the adversary to choose the nonces is because the adversary can
choose which cipher text it wants to attack. So imagine the nonce happens to
be a counter and it so happens that when the couter hits the value fifteen, maybe
at that point it's easy for the adversary to break semantic security. So the
adversary will wait until the fifteenth packet is sent and only then he will ask
to break semantic security. So when we talk about nonce based encryption, we
generally allow the adversary to choose the nonce and the system should remain
secure even under those settings. So let's define the CPA game in this case and it's
actually very similar to the game before. Basically the attacker gets to submit
pairs of messages MI, MI0, and MI1. Obviously they both have to be of the same
length. And he gets to supply the nonce. And in response, the adversary is given
the encryption of either MI0, or MI1. But using the nonce that the adversary
chose. And of course, as usual, the adversary's goal is to tell whether he was
given the encryption of the left plain text or the right plain text. And as
before the adversary gets to iterate these queries and he can issue as, as many
queries as he wants, we usually let q denote the number of queries that the
adversary issues. Now the only restriction of course, which is crucial, is that
although the adversary gets to choose the nonces, he's restricted to choosing
distinct nonces. The reason we force him to choose distinct nonces is because
that's the requirement in practice. Even if the adversary fools Alice into
encrypting multiple messages for him, Alice will never use the same nonce
again. As a result, the adversary will never see messages encrypted using the
same nonce and therefore, even in the game, we require that all nonce be
distinct. And then as usual we say that the system is a nonce based encryption
system that's, semantically secure under a chosen plain text attack if the adversary
cannot distinguish experiment zero where he's given encryptions of the left
messages from experiment one where he's given encryptions of the right messages.
So let's look at an example of a nonce based encryption system. As before, we
have a secure PRF that takes inputs in the nonce space R and outputs strings in the
message space M. Now when a new key is chosen, we're going to reset our counter R
to be zero. And now we encrypt the particular message M, what we will do is
we will increment our counter R, and then encrypt the message M using the
pseudorandom function applied to this value R. And as before, the cipher text is
going to contain two components, our current value of the counter and then the
one time pad encryption of the message M. And so my question to you is whether this
is a secure, non-spaced encryption system. So the answer as before is yes, but only
if the nuance space is large enough. So as we increment the counter R, it will never
cycle back to zero so that the nuances will always, always be unique. We argue
security the same way as before. Because the PRF is secure, we know that this
encryption system is indistinguishable from using a truly random function. In
other words, if we apply a truly random function to the counter and XOR the
results with, the plain text M. But now since the nuance R never repeats, every
time we compute this F of R, we get a truly random uniform and independent
string so that we're actually encrypting every message using the one time pad. And
as a result, all the adversary gets to see in both experiments are basically just a
pair of random strings. So both the experiment zero and experiment one the
adversary get's to see exactly the same distribution, namely, the responses to all
this chosen plain text queries are just pairs of strings that are just uniformly
distributed and this is basically the same in experiment zero and experiment one and,
therefore, the attacker cannot distinguish the two experiments. And since he cannot
win the semantic security game with a truly random function he, also, cannot win
the semantics security game with the secure PRF, and, therefore, the scheme is
secure. So now we understand what it means for a symmetric system to be secure when
the keys used to encrypt multiple messages the requirement is that it be secure under
a chosen plan of attack. And we said that basically, the only way to be secure under
a chosen plain text attack is either to use randomized encryption, or to use, use
nonce spaced encryption where the nonce never repeats. And then in the
next two segments, we're gonna build two classic encryption systems that are secure
when the key is used multiple times.