0:00

So now that we understand what block ciphers are, let's look at a classic

example called the Data Encryption Standard. So, just a quick reminder. Block

ciphers basically map N bits of input to N bits of output. And we talked about two

canonical examples, triple DES and AES. In this segment, we're gonna talk about DES,

and we'll talk about triple DES, actually, in the next segment. And then I also

mentioned before that block ciphers are often built by iteration. In particular,

we're gonna look at block ciphers that are built by a form of iteration where a key K

is first expanded into a bunch of round keys. And then a round function is applied

to the input message, again and again and again. And essentially, after all these

round functions are applied, we obtain the resulting cipher text, okay? And again,

what we're gonna look at, how DES, the data encryption standard, uses this

format. I just wanna be clear that, in fact, to specify a block cipher of this

type, one needs to specify the key expansion mechanism, and one needs to

specify the round function. In the segment here, I'm gonna focus on the round

function, and I'm not gonna talk much about key expansion. But I just wanted to

mention that, in fact, key expansion is also a big part of describing how block

cipher works. Okay, so let's talk about the history of DES. Essentially, in the

early 1970s, IBM realized that their customers are demanding some form of

encryption. And so they formed a crypto group, and the head of that group, was

Horst Feistel, who, in the early 70s, designed a cipher called Lucifer. Now,

it's interesting. In fact Lucifer had a number of variations but one of the later

variations in fact had a key length that was 128 bits and a block length that's

also 128 bits. Okay, in 1973 the governor realized that it's buying many commercial

off-the-shelf computers and so it wanted its suppliers to actually have a good grip

to algorithm that they could use in products sold to the government. So in

1973 the National Bureau of Standards as it was called at the time put out a

request for proposals for a block cipher that is going to become a federal

standard. And in fact IBM submitted a variant of Lucifer. That variant actually

went through some modification during the standardization process and then finally

in 1976, the national bureau standard adopted DES as a federal standard. And, in

fact, for DES it's interesting that the key length was far reduced from Lucifer.

It's only 56 bits. And the block length was also reduced to 64 bits. And in

fact, these decisions, especially the decision to reduce the key length, is

going to be a key length yield of DES and was a source of many complaints over its

life. In particular, already back in 1997, DES was broken by exhaustive search

meaning that a machine was able to search through all two to the 56 possible keys to

recover a particular challenge key. And in fact we're going to talk about exhaustive

search quite a bit. It's quite an interesting question, and there are

various ways to defend against exhaustive search. And basically this 1997 experiment

kinda spelled the doom of DES. It meant that DES is itself, is no longer secure.

And as a result, the National Institute of Standards, as it became known, issued a

request for proposals. For our next generation's block cipher standard and in

2000 it standardized on a cipher called Rijndael. Which became the advanced encryption

standard, AES. And we'll talk about AES later on. But in this segment, I wanna

describe how DES works. Now, DES is a cipher, it's an amazingly successful

cipher. It's been used in the banking industry. In fact, there's a classic

network called the Electronic Clearinghouse, which banks use to clear

checks with one another. And DES is used for integrity in those transactions. It's

also used in commerce. In fact, it was very popular up until recently, as the

main encryption mechanism for the web. Of course, now, that's been replaced with AES

and other ciphers. Overall, it's a very successful cipher in terms of

deployment. DES also has a very rich history of attacks, which we'll talk about

in the next segment. Okay, so now, let's talking about the construction of DES. So,

the core idea behind DES is what's called a Feistel network, due to Horst Feistel.

And basically, it's a very clever idea for building the block cipher out of arbitrary

functions, F1 to FD. Okay so imagine we have these functions F1 to FD

that happens to match n bits to n bits. Now these are arbitrary functions,

they don't have to be invertible or anything. What we want to do is build an

invertible function out of those D functions and the way we'll do it is by

building a new function we'll call capital F that maps 2n bits to 2n bits.

And the construction is described right here. So here we have our inputs. You

notice, there are two blocks of n bits. In other words, the input is actually

2n bits. The R and L stand for right and left. Typically, people describe a

Feistel network from top to bottom. In which case, these n bits really would be

right and left. But here it is more convenient for me to describe it

horizontally. So if we follow the R inputs. You realize it basically gets

copied into the L output, without any change at all. Okay? However, the L

inputs, is changed somewhat. What happens is, basically, the R inputs is fit into

the function F1 and the result is then XORed with L0 and that becomes the new R1

input. Okay, so this is called one round of a Feistel network, and is done using

the function F1. Now we do this again with another round of the Feistel network

with the function F2, and we do it again and again and again, 'till we get to the

last round, and we do it again with the function FD. And finally, the output is

R<u>d L<u>d. Okay. So, if you like, we can write this in symbols. That basically, L<u>i is</u></u></u>

simply equal to R<u>i-1. And R<u>i, let's see, that's the more complicated</u></u>

one. R<u>i is equal, well, let's just follow the lines here. R<u>i is just equal to F<u>i</u></u></u>

applied to, R<u>i-1 XOR L<u>i. Okay? So this, and this is basically, i goes</u></u>

from, 1 to d. So this is the, equation that defines a Feistel network, mapping a

2n bit input to 2n bit outputs. So here we have the, again, I just copied the

picture of the Feistel network. And the amazing claim is that, in fact, it doesn't

matter which functions you give me. For any functions, F1 to FD that you give me,

the resulting Feistel network function is, in fact, invertible. And the way we're

going to prove that is basically we're going to construct an inverse, because not

only is it invertible, it's efficiently invertible. So let's see. So let's look at

one round of a Feistel network, so here, this is the inputs, R<u>i L<u>i, and this</u></u>

is the output R<u>i+1, L<u>i+1. And now what I'm going to ask you is to invert this.</u></u>

So let's see. So suppose now the input that we're given is R<u>i+1, L<u>i+1 and we want to</u></u>

compute R<u>i L<u>i. So we want to compute the round in the reverse direction. So let's</u></u>

see if we can do it. Well, let's look at R<u>i. So, R<u>i is very easy. Basically, R<u>i is</u></u></u>

just equal to L<u>i+1. So L<u>i+1 just becomes R<u>i. But now, let me ask you, to write the</u></u></u>

expression for L<u>i in terms of R<u>i+1, and L<u>i+1.</u></u></u>

7:13

So I hope everybody sees that, basically, L<u>i+1 is fed into the function F<u>i+1.</u></u>

The result is XORed with R<u>i+1, and that gives the L<u>i input.</u></u>

So this the inverse of one round of a Feistel network.

And if we draw this as a diagram, let's just write the picture for the inverse.

So here you notice the input is R<u>i+1, L<u>i+1 and the output is R<u>i L<u>i. Right?</u></u></u></u>

So we're computing, we're inverting, the rounds. So you notice that the inverse of

a Feistel round looks pretty much the same as the Feistel round in the

forward direction. It's literally, you know, just for a technical reason, it's

kinda the mirror image of one another. But it's basically, the same construct. And

when we put these inverted rounds back together. We essentially get the inverse

of the entire Feistel network. So you notice we start off with the round number D

with the inverse of round number D, then we do the inverse of round number D-1

and so on and so forth until we get to the inverse of the first round. And

we get our final outputs which is R<u>0, L<u>0, like this is the inputs and we manage to</u></u>

invert basically R<u>d, L<u>d and get R<u>0, L<u>0 and the interesting thing is that</u></u></u></u>

basically these inversion circuits look pretty much the same as the encryption

circuits and the only difference is that the functions are applied in reverse order.

Right we started with F<u>d and ended with F<u>1. Whereas when we were encrypting, we</u></u>

started with F<u>1 and ended with F<u>d. So, for hardware designers, this is very</u></u>

attractive, because, basically, if you wanna save hardware, you realize that your

encryption hardware is identical to your decryption hardware. So you only have to

implement one algorithm, and you get both algorithms the same way. The only

difference is that the functions are applied in reverse order. Okay. So this

Feistel mechanism is a general method for building invertible functions from

arbitrary functions F<u>1 to F<u>d and in fact, it's used in many different block ciphers.</u></u>

Although, interestingly, it's not actually used in AES. So, there are many other

block ciphers that use a Feistel network. Or, of course, they differ from

DES in the functions F<u>1 to F<u>d. But AES actually uses a completely different type</u></u>

of structure that's actually not a Feistel network. We'll see how AES

works in a couple of segments. So now that we know what Feistel networks are, let

me mention an important theorem about the theory of Feistel networks that shows

why they're a good idea. This theorem is due to Luby and Rackoff back in 1985, and it

says the following. Suppose I have a function that is a secure pseudorandom

function, okay. So it's indistinguishable from random and happens to act on N bits.

So it maps N bits to N bits and uses a key K. Then, it turns out, that if you use

this function in three rounds of a Feistel network. What you end up with is a secure

pseudo random permutation. In other words, what you end up with is an invertible

function that is indistinguishable from a truly random invertible function. And I

hope you remember that the true definition of a secure block cipher is that it needs

to be a secure pseudo random permutation. So what this theorem says, is that if you

start with a secure pseudo random function, you end up with a secure block

cipher. Basically, that's what this is. And let me explain in a little bit more

detail what's actually going on here. So, essentially, the PRF is used in every

round of the Feistel network. So, in other words, here, what's actually

computed is the PRF using one secret key, K0. Here, what's computed is the PRF

using a different secret key, of course applied to R1. And here we have yet

another secret key, K1 applied, K2 applied to R2. And you notice, this is why,

basically this Feistel construction, uses keys in K cubed. In other words, it

uses three independent keys. So it's very important that the keys are actually

independent. So really, we need three, independent keys. And then we end up with

a secure pseudorandom permutation. Okay, so that's the theory behind Feistel

networks. And now that we understand that, we can actually look at the specifics of DES.

So DES is basically a sixteen round Feistel network, okay. So there are

functions F1 to F16 that map 32 bits to 32 bits, and as a result, the DES itself

acts on 64 bit blocks, 2x32. Now the sixteen round functions in DES are

actually all derived from a single function F. Just used with different keys.

So in fact, these are the different round keys. So K<u>i is, a round key. And it's</u>

basically derived from the key K, derived from the 56 bit DES key K. Okay and I'll

describe what this function F is in just a minute. But basically that, you see that

by using sixteen different round keys, we get sixteen different round functions. And

that gives us the Feistel network. So just on a high level how the DES works

basically you have a 64 bit input. The first thing it does is, this initial

permutation that just permutes the 64 bits around. Namely it maps bit number one to

bit number six. Bit number two to bit number seventeen, and so on. This is not

for security reasons, this is just specified in the standard. Then we go into

the sixteen round Feistel network. That actually, you now know how it works.

Basically uses the function F1 to F16, as specified before. And then, basically we

have another permutation, it's called the final permutation. That's just the inverse

of the initial permutation. Again, it just permutes bits around, this is not

necessary for security reasons. And then we finally get, the final outputs. Okay.

Now, as we said, there's a key expansion step, which I'm not gonna describe. But

basically, this 56-bit DES key is expanded into these rounds keys. Where each round

key, is 48 bits. Okay, so we have sixteen 48 bit round keys, and they're basically

used in the sixteen rounds of DES. And then when you want to invert the cipher,

all you do is you use these, round keys, these sixteen round keys, in reverse

order. Okay, so now that we understand the DES structure, the only thing that's left

to do is specify the function, capital F. So let me explain how this function works.

So basically, it takes, as inputs, its, 32 bit value, let's call it X. But in

reality, you remember, this is R<u>0, R<u>1, R-2, R<u>3, so on and so</u></u></u>

forth. These are 32 bit values. And then it takes, also, a 48 bit round key. So

here we have our key K<u>i, which happens to be 48 bits. The first thing it does, is it</u>

goes through an expansion box. And this expansion box basically take thirty two

bits, and maps them into 48 bits. Now, all the expansion box does is just replicates

some bits, and move other bits around. So, for example, bit #1 of X is replicated

into positions 2 and 48 in the output. Bit #2 of X is positioned in as bit

#3 of the output. And so on and so forth, just by replicating some of the

bits of X, we expand the input into 48 bits. The next thing we do is we compute

an XOR with the round key. Sometimes people say that cryptographers

only compute XORs. This is an example of that, where, well, we just do XORs in this

function. And then comes the magic of DES, where, actually, these 48 bits are broken

into eight groups of six bits. Six, seven, eight. And so let me draw, and then what

happens is, so yes. Each one of these, each one of these wires is, six bits. And

then they, they go into what, what are called S boxes. And I'll explain S boxes

in just a minute. The S boxes are kind of the smarts of, DES. And then, the S

box is basically a map, six bits to four bits. So, the outputs of the S boxes are

these four bits. They're collected. This gives up 32 bits, right? Eight groups of

four bits, gives us 32 bits and then finally this is fed into yet another

permutation which just maps the bits around. So, for example bit number one

will go to bit number nine, bit number two will go to bit number fifteen and so on.

So it just permutes the 32 bits around and that's the final 32 bit output of this F function.

Okay? So by using different round keys, essentially, we get different

round functions, and that's how we form the sixteen round functions of DES. Now,

the only thing that, left to specify are these S boxes. So the S boxes, literally,

are just functions from six bits to four bits. They are just implemented as a look

up table. Right? So describing a function from six bits to four bits basically

amounts to writing the output of the function on all two to the six possible inputs.

Two to the six is 64. So we just have a table that literally contains 64 values.

Where each value is four bits. So here is an example, this happens to be the

fifth S box, and you see that this is a table that contains 64 values right?

It's four by sixteen. So, 64 values. For example, if you wanna look at, the output,

that corresponds to 0-1-1-0-1-1. Okay, then you look at these two bits. This is 01,

and these four bits are 1101, and you see that the output is 1001. The four bits

output 1001. Okay, so the S boxes are just implemented as these tables.

Now the question is, how are these S boxes chosen. How are these tables actually chosen by

the designers of this. So to give you some intuitions for that, lets start with a

very bad choice for S boxes. So imagine the S boxes were linear. What do I mean by

that? I mean that imagine that these six bit inputs literally were just

XORed with one another in different ways to produce the four bit outputs.

Okay, another way of writing that is that we can write the S box as a matrix vector product.

So here you have the matrix Ai. And the vector, the six bit vector X.

And you can see that, if we write this matrix vector product, basically, we take the

inner product of this vector with the input vector. Remember, these are all bits.

So the six bits vector inner product. Another six bit vector, and we do

that modulo two, you realize, basically, what we're computing is X2 XOR X3.

Right? Because only position two and position three have 1's in it.

And similarly the next, inner product will produce X1 XOR X4 XOR X5 and so on and

so forth. Okay. So you can literally see that if the S boxes are implemented this

way. Then, all they do, is just apply the matrix A to the input vector X. Which is

why we say, that in this case the S boxes are completely linear. Now, I claimed that

in fact that if the S boxes were linear, then DES would be totally insecure. The reason is,

if the S boxes are linear, then all that DES does is just compute XOR of various

things and permute and shuffle bits around. So it's just XORs and bit

permutations, which means that as a result, all of DES is just a linear

function. In other words, there will be a Matrix B. Of these dimensions. Basically,

it's a matrix B that has width 832. Basically what I will do is I will write

the 64 bit message plus the sixteen round keys as one long vector. Alright, so the

message is 64 bits and there are sixteen round keys. Each one is 48 and that, if

you do the math, it's basically 832. Okay? So I write these values, the keys and the

message, as one long vector and then there will be this matrix that essentially when

you compute these matrix vector products. Essentially you get the different bits of

the ciphertext. So there's 64 of these rows and as a result, you get 64 bits of

ciphertext. Okay, so this is what it means for DES to be linear. So if you

think a little bit about this, you realize that the S boxes are the only nonlinear

part of DES. So if the S boxes were linear, then the entire circuit is linear

and therefore can be expressed as this matrix. Now if that's the case then DES

would be terrible as a secure pseudorandom permutation. And let me give you a very

simple example. Basically if you did the XOR of three outputs of DES, well

let's think what that means. Basically we would be looking at B times, the matrix B

that defines DES, times, one vector XOR B times another vector,

XOR B times a third vector. We could take the B out of the parentheses so

we'd be basically doing B times this vector over here. But of course K XOR K XOR K,

this is just K. And so if you think about what that means, basically we

just got back DES of K at the point M1 XOR M2 XOR M3. But this means that now DES

has this horrible relation. That can be tested. Right? So, basically, if you

XOR the output of three values, M1, M2, M3, you'll get the value of

DES, at the point M1 XOR M2 XOR M3. Now this is not a relation that is going to hold

for a random function. A random function is not going to satisfy this equality.

And so you get a very easy test to tell you that DES is not a random function.

In fact, maybe you can take that as a small exercise. It's not even difficult to see,

that given enough input output pairs, you can literally recover the entire secret key.

Yeah. You just need 832 input output pairs, and you'll be able recover the

entire secret key. And so if the S boxes were linear, DES would be completely

insecure. It turns out, actually, even if the S boxes were close to being linear. In

other words, the S boxes were linear most of the time. So maybe for 60 out of the 64

inputs, the S boxes were linear. It turns out that would also be enough to break

DES, and we're gonna see why later on. In particular, if you choose the S boxes at

random, it turns out they'll tend to be somewhat close to linear functions. As a

result, you'll be able to totally break DES. You'll just be able to recover the

key, in basically, very little time. And so, the designers of DES actually

specified a number of rules they use for choosing the S boxes. And it's not

surprising, the first rule is that these functions are far away from being linear.

Okay. So, in other words, there is no function that agrees with a large fraction

of the outputs of the S box. And then there are all these other rules, for

example, there are exactly four to one maps, right? So, every output has exactly

four pre-images and so on and so forth. So we understand now why they chose the S

boxes they way they did and in fact its all done to defeat certain attacks on DES.

Okay. So that's the end of the description of DES, and in the next few

segments we are going to look at the security of DES.