In this segment, I want to give a few examples of stream ciphers that are used in practice.
I'm gonna start with two old examples that actually are not
supposed to be used in new systems. But nevertheless, they're still fairly
widely used, and so I just want to mention the names so that you're familiar with
these concepts. The first stream cipher I want to talk about is called RC4, designed
back in 1987. And I'm only gonna give you the high-level description of it, and then
we'll talk about some weaknesses of RC4 and leave it at that. So RC4 takes a
variable size seed, here I just gave as an example where it would take 128
bits as the seed size, which would then be used as the key for the stream cipher.
The first thing it does, is it expands the 128-bit secret key into 2,048 bits, which
are gonna be used as the internal state for the generator. And then, once it's done
this expansion, it basically executes a very simple loop, where every iteration of
this loop outputs one byte of output. So, essentially, you can run the generator for
as long as you want, and generate one byte at a time. Now RC4 is actually, as I said,
fairly popular. It's used in the HTTPS protocol quite commonly actually.
These days, for example, Google uses RC4 in its HTTPS. It's also used in WEP as we
discussed in the last segment, but of course in WEP, it's used incorrectly and
it's completely insecure the way it's used inside of WEP. So over the years,
some weaknesses have been found in RC4, and as a result, it's recommended that new projects
actually not use RC4 but rather use a more modern pseudo-random generator as we'll
discuss toward the end of the segment. So let me just mention two of the weaknesses.
So the first one is, it's kind of bizarre basically, if you look at the second byte
of the output of RC4. It turns out the second byte is slightly biased. If RC4 was
completely random, the probability that the second byte happens to be equal to zero
would be exactly one over 256. There are 256 possible bytes, the probability that
it's zero should be one over 256. It so happens that for RC4 the probability is
actually two over 256, which means that if you use the RC4 output to encrypt a
message the second byte is likely to not be encrypted at all. In other words it'll
be XOR-ed with zero with twice the probability that it's supposed to.
So two over 256, instead of one over 256. And by the way I should say that there's
nothing special about the second byte. It turns out the first and the third bytes
are also biased. And in fact it's now recommended that if you're gonna use RC4,
what you should do is ignore basically the first 256 bytes of the output and just
start using the output of the generator starting from byte 257. The first couple
of bytes turned out to be biased, so you just ignore them. The second attack that
was discovered is that in fact if you look at a very long output of RC4 it so happens
that you're more likely to get the sequence 00. In other words, you're more
likely to get sixteen bits, two bytes of zero, zero, than you should. Again, if RC4
was completely random the probability of seeing zero, zero would be exactly 1/256
squared. It turns out RC4 is a little biased and the bias is 1/256 cubed. It
turns out this bias actually starts after several gigabytes of data are produced by
RC4. But nevertheless, this is something that can be used to predict the generator
and definitely it can be used to distinguish the output of the generator
from a truly random sequence. Basically the fact that zero, zero appears more often
than it should is the distinguisher. And then in the last segment we talked about
related-key attacks that were used to attack WEP, that basically say that
if one uses keys that are closely related to one another then it's actually possible
to recover the root key. So these are the weaknesses that are known of RC4 and, as a
result, it's recommended that new systems actually not use RC4 and instead use a
modern pseudo-random generator. Okay, second example I want to give you is a
badly broken stream cipher that's used for encrypting DVD movies. When you buy a DVD
in the store, the actual movie is encrypted using a stream cipher called the
content scrambling system, CSS. CSS turns out to be a badly broken stream cipher,
and we can very easily break it, and I want to show you how the attack algorithm
works. We're doing it so you can see an example of an attack algorithm, but in
fact, there are many systems out there that basically use this attack to decrypt
encrypted DVDs. So the CSS stream cipher is based on something that hardware
designers like. It's designed to be a hardware stream cipher that is supposed to
be easy to implement in hardware, and is based on a mechanism called a linear
feedback shift register. So a linear feedback shift register is basically a register
that consists of cells where each cell contains one bit. Then basically
what happens is there are these taps into certain cells, not all the cells, certain
positions are called taps. And then these taps feed into an XOR and then at
every clock cycle, the shift register shifts to the left. The last bit falls off
and then the first bit becomes the result of this XOR. So you can see that
this is a very simple mechanism to implement, and in hardware takes very few
transistors. Just the shift right, the last bit falls off and the first bit just
becomes the XOR of the previous bits. So the seed for this LFSR
basically, is the initial state of the LFSR.
And it is the basis of a number of stream ciphers. So here are some examples. So, as
I said, DVD encryption uses two LFSRs. I'll show you how that works in just a
second. GSM encryption, these are algorithms called A51 and A52. And that
uses three LFSRs. Bluetooth encryption is an algorithm called, E zero. These are all
stream ciphers, and that uses four LFSRs. Turns out all of these are badly broken,
and actually really should not be trusted for encrypting traffic, but they're all
implemented in hardware, so it's a little difficult now to change what the hardware
does. But the simplest of these, CSS, actually has a cute attack on it, so let
me show you how the attack works. So, let's describe how CSS actually works. So,
the key for CSS is five bytes, namely 40 bits, five times eight is 40 bits. The
reason they had to limit themselves to only 40 bits is that DVD encryption was
designed at a time where U.S. Export regulations only allowed for export of
crpyto algorithms where the key was only 40 bits. So the designers of CSS were
already limited to very, very short keys. Just 40 bit keys. So, their design works
as follows. Basically, CSS uses two LFSR's. One is a 17-bit LFSR. In other words,
the register contains 17 bits. And the other one is a 25-bit LFSR,
it's a little bit longer, 25-bit LFSR. And the way these LFSRs are seeded
is as follows. So the key for the encryption, basically looks as follows.
You start off with a one, and you concatenate to it the first two bytes of
the key. And that's the initial state of the LFSR.
And then the second LFSR basically is intitialized the same way.
One concatenated the last three bytes of the key. And that's
loaded into the initial state of the LFSR. You can see that the first two bytes are
sixteen bits, plus leading one, that's seventeen bits overall, whereas the second
LFSR is 24 bits plus one which is 25 bits. And you notice we used all five bits of
the key. So then these LFSRs are basically run for eight cycles, so they generate
eight bits of output. And then they go through this adder that does basically
addition modulo 256. Yeah so this is an addition box, modulo 256. There's one more
technical thing that happens. In fact let's actually—also added is the carry from the
previous block. But that is not so important. That's a detail that's not so
relevant. OK, so every block, you notice we're doing addition modulo 256 and
we're ignoring the carry, but the carry is basically added as a zero or a one to the
addition of the next block. Okay? And then basically this output one byte per round.
Okay, and then this byte is then of course used, XOR-ed with the appropriate
byte of the movie that's being encrypted. Okay, so it's a very simple stream
cipher, it takes very little hardware to implement. It will run fast, even on very
cheap hardware and it will encrypt movies. So it turns out this is easy to break
in time roughly two to the seventeen. Now let me show you how.
So suppose you intercept the movies, so here we have an
encrypted movie that you want to decrypt. So let's say that this is all encrypted so
you have no idea what's inside of here. However, it so happens that just because
DVD encryption is using MPEG files, it so happens if you know of a prefix of the
plaintext, let's just say maybe this is twenty bytes. Well, we know if you
XOR these two things together, so in other words, you do the XOR here,
what you'll get is the initial segment of the PRG. So, you'll get the
first twenty bytes of the output of CSS, the output of this PRG. Okay, so now
here's what we're going to do. So we have the first twenty bytes of the output. Now
we do the following. We try all two to the seventeen possible values of the first
LFSR. Okay? So two to the seventeen possible values. So for each value, so for
each of these two to the seventeen initial values of the LFSR, we're gonna run the
LFSR for twenty bytes, okay? So we'll generate twenty bytes of outputs from this
first LFSR, assuming—for each one of the two to the seventeen possible settings.
Now, remember we have the full output of the CSS system. So what we can do is we
can take this output that we have. And subtract it from the twenty bites that we
got from the first LFSR, and if in fact our guess for the initial state of the first
LFSR is correct, what we should get is the first twenty-byte output of the
second LFSR. Right? Because that's by definition what the output of the CSS
system is. Now, it turns out that looking at a 20-byte sequence, it's very easy
to tell whether this 20-byte sequence came from a 25-bit LFSR or not. If it
didn't, then we know that our guess for the 17-bit LFSR was
incorrect and then we move on to the next guess for the 17-bit LFSR and
the next guess and so on and so forth. Until eventually we hit the right initial
state for the 17-bit LFSR, and then we'll actually get, we'll see that
the 20 bytes that we get as the candidate output for the 25-bit LFSR is
in fact a possible output for a 25-bit LFSR. And then, not only will we have
learned the correct initial state for the 17-bit LFSR, we will have also
learned the correct initial state of the 25-bit LFSR. And then we can predict the
remaining outputs of CSS, and of course, using that, we can then decrypt the rest of
the movie. We can actually recover the remaining plaintext. Okay. This is
things that we talked about before. So, I said this a little quick, but hopefully,
it was clear. We're also going to be doing a homework exercise on this type of stream
ciphers and you'll kind of get the point of how these attack algorithms
work. And I should mention that there are many open-source systems now that actually
use this method to decrypt CSS-encrypted data. Okay, so now that we've seen two
weak examples, let's move on to better examples, and in particular the better
pseudo-random generators come from what's called the eStream Project. This is a
project that concluded in 2008, and they qualify basically five different stream
ciphers, but here I want to present just one. So first of all the parameters for
these stream ciphers are a little different than what we're used to. So these
stream ciphers as normal they have a seed. But in addition they also have, what's
called a nonce and we'll see what a nonce is used for in just a minute. So
they take two inputs a seed and a nonce. We'll see what the nonce is used for in
just a second. And the of course they produce a very large output, so n here is
much, much, much bigger than s. Now, when I say nonce, what I mean is a value that's
never going to repeat as long as the key is fixed. And I'll explain that in more
detail in just a second. But for now, just think of it as a unique value that never
repeats as long as the key is the same. And so of course once you have this PRG,
you would encrypt, you get a stream cipher just as before, except now as you see, the
PRG takes as input both the key and the nonce. And the property of the nonce is
that the pair, k comma r, so the key comma the nonce, is never—never repeats. It's
never used more than once. So the bottom line is that you can reuse the key, reuse
the key, because the nonce makes the pair unique, because k and r are only
used once. I'll say they're unique. Okay so this nonce is kind of a cute trick that
saves us the trouble of moving to a new key every time. Okay, so the particular
example from the eStream that I want to show you is called Salsa twenty. It's a
stream cipher that's designed for both software implementations and hardware
implementations. It's kind of interesting. You realize that some stream ciphers are
designed for software, like RC4. Everything it does is designed to make
software implementation run fast, whereas other stream ciphers are designed for
hardware, like CSS, using an LFSR that's particularly designed to make hardware
implementations very cheap. It's also, the nice thing about that is that it's
designed so that it's both easy to implement it in hardware and its software
implementation is also very fast. So let me explain how Salsa works. Well, Salsa
takes either 128 or 256-bit keys. I'll only explain the 128-bit version of Salsa.
So this is the seed. And then it also takes a nonce, just as before, which
happens to be 64 bits. And then it'll generate a large output. Now, how does it
actually work? Well, the function itself is defined as follows. Basically, given
the key and the nonce, it will generate a very long, well, a long pseudorandom
sequence, as long as necessary. And it'll do it by using this function that I'll denote by
H. This function H takes three inputs. Basically the key. Well, the seed k,
the nonce r, and then a counter that increments from step to step. So it goes
from zero to one, two, three, four as long as we need it to be. Okay? So basically,
by evaluating this H on this k r, but using this incrementing counter, we can get a
sequence that's as long as we want. So all I have to do is describe how this function
H works. Now, let me do that here for you. The way it works is as follows. Well, we
start off by expanding the states into something quite large which is 64 bytes
long, and we do that as follows. Basically we stick a constant at the beginning, so
there's tao zero, these are four bytes, it's a four byte constant, so the spec for
Salsa basically gives you the value for tao zero. Then we put k in which is
sixteen bytes. Then we put another constant. Again, this is four bytes. And
as I said, the spec basically specifies what this fixed constant is. Then we put
the nonce, which is eight bytes. Then we put the index. This is the counter zero,
one, two, three, four, which is another eight bytes. Then we put another constant
tau two, which is another four bytes. Then we put the key again, this is another
sixteen bytes. And then finally we put the third constant, tau three, which is
another four bytes. Okay so as I said, if you sum these up, you see that you get 64
bytes. So basically we've expanded the key and the nonce and the counter into 64
bytes. Basically the key is repeated twice I guess. And then what we do is we apply a
function, I'll call this functional little h. Okay, so we apply this function, little h.
And this is a function that's one to one so it maps 64 bytes to 64 bytes. It's a
completely invertible function, okay? So this function h is, as I say, it's an
invertable function. So given the input you can get the output and given the
output you can go back to the input. And it's designed specifically so it's a- easy
to implement in hardware and b- on an x86, it's extremely easy to implement because
x86 has this SSE2 instruction set which supports all the operations you need to do
for this function. It's very, very fast. As a result, Salsa has a very fast stream
cipher. And then it does this basically again and again. So it applies this
function h again and it gets another 64 bytes. And so on and so forth, basically
it does this ten times. Okay so the whole thing here, say repeats ten times, so
basically apply h ten times. And then by itself, this is actually not quite random.
It's not gonna look random because like we said, H is completely invertable. So given
this final output, it's very easy to just invert h and then go back to the original
inputs and then test that the input has the right structure. So you do one more
thing, which is to basically XOR the inputs and the final outputs. Actually,
sorry. It's not an XOR. It's actually an addition. So you do an addition word by
word. So if there are 64 bytes, you do a word-by-word addition four bytes at a
time, and finally you get the 64-byte output, and that's it. That's the whole
pseudo-random generator. So that, that's the whole function, little h. And as I
explained, this whole construction here is the function big H. And then you evaluate
big H by incrementing the counter I from zero, one, two, three onwards. And that
will give you a pseudo-random sequence that's as long as you need it to be. And
basically, there are no signifigant attacks on this. This has security that's
very close to two to the 128. We'll see what that means more precisely later on.
It's a very fast stream cipher, both in hardware and in software. And, as far as
we can tell, it seems to be unpredictable as required for a stream cipher. So I
should say that the eStream project actually has five stream ciphers like
this. I only chose Salsa 'cause I think it's the most elegant. But I can give you
some performance numbers here. So you can see, these are performance numbers on a
2.2 gigahertz, you know, x86 type machine. And you can see that RC4 actually is the
slowest. Because essentially, well it doesn't really take advantage of the
hardware. It only does byte operations. And so there's a lot of wasted cycles that
aren't being used. But the E-Stream candidates, both Salsa and other
candidate called Sosemanuk. I should say these are eStream finalists. These are
actually stream ciphers that are approved by the eStream project. You can see that
they have achieved a significant rate. This is 643 megabytes per second on this
architecture, more than enough for a movie and these are actually quite impressive
rates. And so now you've seen examples of two old stream ciphers that shouldn't be
used, including attacks on those stream ciphers. You've seen what the modern stream ciphers
look like with this nonce. And you see the performance numbers for these
modern stream ciphers so if you happen to need a stream cipher you could use one of
the eStream finalists. In particular you could use something like Salsa.