In this lecture, we're going to study the important primitive of pseudorandom functions and their practical instantiation via block ciphers. Informally, a pseudorandom function is something that looks like a random function. So before we can talk more about pseudorandom functions, let's first make sure we understand something about random functions.

Let's define Func_n to be the set containing all functions mapping strings of length n to strings of length n. The first thing we can ask is: how big is this set? How big is Func_n? Well, we can specify any function by its inputs and their corresponding outputs. So here, for the case of n equals 3, I've drawn a table specifying, for each possible three-bit input, a corresponding three-bit output. So the string 000 maps onto the string 011, and so on. Now in fact, you can notice that the first column here is redundant. We don't need to store the list of inputs in lexicographic order; we can instead let that ordering be implicit in the table itself. So now we just have a single column, a large array containing a three-bit string in each entry. And just as before, we can say that the first input, i.e., 000, maps onto the first value in this array, i.e., 011, and so on, up until the last possible three-bit input, 111, which maps onto the last element of the array, in this case 000. So we have this array containing three-bit values, and this array is of size 2 to the 3, because there are 2 to the 3 possible three-bit inputs. So we have a total of eight entries in our array.

In general, we can represent a function from n-bit inputs to n-bit outputs by a similar array, where the array is now of length 2 to the n, with one entry corresponding to every possible n-bit input, and each entry of that array containing an n-bit value representing the corresponding output. So we can represent a function in Func_n using exactly n times 2 to the n bits; again, that's an array of length 2 to the n with an n-bit entry in each element of the array. (I'll show a toy code sketch of this representation below.) We also have a correspondence in the other direction: if I take any array of length 2 to the n containing an n-bit string in each entry, that defines a function in Func_n. This is in fact a one-to-one mapping between arrays of length 2 to the n containing strings of length n and functions from n-bit inputs to n-bit outputs. That tells us that the size of Func_n is exactly equal to the number of bit strings of length n times 2 to the n; that is, the size of Func_n is 2^(n·2^n). Note that this is huge: it's doubly exponential in our parameter n. But nevertheless it's a finite set. So there's nothing funny going on; we don't have to worry about continuous probabilities or continuous random variables or anything like that. We're still in the setting of a very large, but still finite, set.

Now, when I talk about a random function, what I really mean is a uniform function; I'm just allowing myself to be a bit informal. So what do we mean, more carefully speaking, by a random function or a uniform function? Very simply, this means that we choose a uniform function from the finite set Func_n. So when I talk about a random function from n-bit inputs to n-bit outputs, I mean we just pick a uniform member of the set Func_n. It's a little bit difficult to conceptualize what that means, so we can think about it in an alternative way.
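Before describing that alternative view, here is a toy Python sketch making the array representation and the counting argument concrete for n equals 3. It's my own illustration, not code from the lecture, and the particular output values in the table are arbitrary.

```python
import math

# Toy sketch: represent a function in Func_n, for n = 3, as an array whose
# i-th entry is the 3-bit output assigned to the i-th 3-bit input (inputs
# taken in lexicographic order). The output values below are arbitrary.
n = 3
table = ["011", "110", "000", "101", "010", "111", "001", "000"]
assert len(table) == 2 ** n and all(len(v) == n for v in table)

def f(x: str) -> str:
    """Evaluate the function by using the input x as an index into the array."""
    return table[int(x, 2)]

print(f("000"))   # first entry: "011"
print(f("111"))   # last entry:  "000"

# Counting: each of the 2^n entries can independently be any of the 2^n
# possible n-bit strings, so |Func_n| = (2^n) ** (2^n) = 2 ** (n * 2**n).
assert (2 ** n) ** (2 ** n) == 2 ** (n * 2 ** n)
print(math.log2((2 ** n) ** (2 ** n)))   # n * 2^n = 24 bits specify f when n = 3
```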
The alternative view is this: choosing a uniform function is exactly equivalent to filling out the function table we saw on the previous slide with a uniform value in every entry. That is, we can imagine starting with a blank table, and then, for each possible input x of length n, choosing the entry of the table corresponding to the value of the function on that input to be a uniform value in {0,1}^n. In other words, we imagine having a table of length 2 to the n, and we simply fill in every entry with a uniform n-bit string. In fact, rather than thinking of this as being done all at once, we can equivalently think of it as being done on the fly, as values are needed. That is, choosing a uniform f and then probing the value of that function on a bunch of different points x1, x2, x3 is equivalent to starting with an empty table and, when asked for the value of f of x1, simply filling in that position of the table with a uniform n-bit string, and so on for x2 and x3, just being careful that if I'm ever asked for the value of f of x1 again, I'm consistent and return the same value I returned before. So the point is that there are these two (or three, I guess) equivalent ways of thinking about what it means to choose a uniform function from n-bit inputs to n-bit outputs.

Now again, a pseudorandom function is, intuitively, a function that looks like a uniform function, or a random function. But, as in our discussion of pseudorandom generators, it doesn't make sense to talk about any fixed function being pseudorandom. What we're going to do instead is look at keyed functions, which allow us to define a distribution over functions from n-bit inputs to n-bit outputs. So let capital F be a function that takes two inputs and produces one output, and we'll always assume that this F is efficiently computable, that is, computable in time polynomial in the length of its inputs. We'll define the function F sub k to be the function mapping n-bit inputs to n-bit outputs given by F sub k of x equals F of k, x. So we're basically imagining fixing the first input of capital F to some fixed value k. This fixed first input is going to be called a key, and that's why I've denoted it by k. We're going to assume throughout, just for simplicity, that F is length-preserving. This means that F sub k of x is only defined if the length of k is equal to the length of x, in which case the length of the output is equal to the length of each of the inputs. Thinking a little bit ahead in terms of cryptography, this just means that for every value of the security parameter n, F is defined for keys of length n and inputs of length n, producing outputs of length n as well. Now, the important thing to note here is that choosing a uniform key of length n is equivalent to choosing the function F sub k mapping n-bit inputs to n-bit outputs. So by choosing a uniform key, we're choosing a function according to some distribution.

We can now define a little more carefully what we mean by a pseudorandom function. We'll say that this two-input function F is a pseudorandom function if F sub k, for a uniform choice of the key, is indistinguishable from a uniform function.
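Before stating the formal definition, here is a small Python sketch, my own illustration rather than anything from the lecture, of the on-the-fly sampling just described and of the keyed-function interface. The XOR-based F at the end is only a placeholder to show the shape of a keyed function; it is certainly not a pseudorandom function.

```python
import os

# The "on the fly" view of a uniform function from n-bit inputs to n-bit
# outputs: the table starts empty, each entry is filled with a fresh uniform
# value the first time that input is queried, and repeats are answered
# consistently.
class LazyUniformFunction:
    def __init__(self, n_bytes: int = 16):
        self.n_bytes = n_bytes          # work with bytes, so n = 8 * n_bytes bits
        self.table = {}                 # the (initially empty) function table

    def __call__(self, x: bytes) -> bytes:
        assert len(x) == self.n_bytes
        if x not in self.table:                        # first query on x:
            self.table[x] = os.urandom(self.n_bytes)   # choose a uniform output
        return self.table[x]                           # same answer on repeats

f = LazyUniformFunction()
x1 = bytes(16)                 # the all-zero 16-byte input
assert f(x1) == f(x1)          # querying the same point twice is consistent

# By contrast, a keyed function F_k just fixes the first input of a two-input
# function F; choosing a uniform key selects F_k from a much smaller family.
def F(k: bytes, x: bytes) -> bytes:             # placeholder F, NOT a PRF;
    return bytes(a ^ b for a, b in zip(k, x))   # it only shows the interface

k = os.urandom(16)
F_k = lambda x: F(k, x)        # the single-input function obtained by fixing k
```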
More formally, we require that for every polynomial-time distinguisher D, D cannot distinguish between the case when it's given what's called oracle access to F sub k and the case when it's given oracle access to a completely uniform function f. That is, on the left-hand side here we have the probability that D outputs 1 when it's given access to the function F sub k, where k is chosen uniformly. Giving D access to this function means that we allow D to provide inputs to the function and get back the corresponding outputs, but D can't look inside the function, as it were, and in particular can't see anything about the value k being used other than what it can potentially learn from the inputs and outputs that it sees. We compare that to the probability that D outputs 1 when it's given oracle access to a function f chosen uniformly from the set of all functions on n-bit inputs producing n-bit outputs. We'll define capital F to be a pseudorandom function if those probabilities are close for every polynomial-time D; that is, if the difference between the probabilities that D outputs 1 in those two cases is negligible.

It might be a little easier to understand the definition, at least on an intuitive level, by looking at a picture. Here we have a polynomial-time distinguisher D, and that distinguisher is going to try to tell the difference between two possibilities. In the first case, we imagine choosing a function f uniformly at random from the set of all functions on n-bit inputs having n-bit outputs. What it means for the attacker to interact with this function is that the attacker can specify inputs of its choice and get back the corresponding outputs. It can do this adaptively, that is, it can choose its next input x2 based on the output f of x1 that it receives, and it can do this as many times as it likes. One thing I'll note is that there's no real point for the attacker to ever repeat an input, because it knows it will get back the same output. The only randomness here is in the initial choice of f; once we choose an f and fix it inside the box, f is a deterministic function that always returns the same output when given a particular input. We compare that situation to one where we instead choose a uniform key k and put the function F sub k inside a box, and we allow the attacker to interact with that box, again by submitting inputs and getting back the corresponding outputs. As in the first case, it can do this as many times as it likes, adaptively choosing whatever inputs it wants to feed into the box. And we'll say that capital F is a pseudorandom function if these two worlds, world 0 and world 1, are indistinguishable from the point of view of the attacker. That is, the attacker can't tell whether it's interacting with a black box containing a function f chosen uniformly from the set of all functions in Func_n, or with a box containing F sub k for a uniformly chosen key k.

One thing I want to stress is that in the first case, in world 0, the function f is being chosen from a huge space of possibilities: remember, the size of Func_n is 2^(n·2^n), doubly exponential in n. Whereas in the second case, the number of possibilities for F sub k is at most 2 to the n, because k is an n-bit key, and so there are at most 2 to the n choices of k. So on the top we have a distribution over a doubly exponential-size set of functions, and on the bottom we have a distribution over an (at most) exponential-size set of functions.
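To make the picture concrete, here is a rough Python sketch of the experiment just described. It's my own illustration: the names run_experiment and oracle, and the use of byte strings, are not from the lecture, and of course the real definition quantifies over all efficient distinguishers rather than running a single one.

```python
import os
import random

# D gets oracle access either to a lazily sampled uniform function (world 0)
# or to F_k for a uniform hidden key k (world 1), and outputs a guess.
def run_experiment(F, D, n_bytes=16):
    world = random.randrange(2)                 # secretly pick world 0 or world 1
    if world == 0:
        table = {}                              # uniform function, sampled on the fly
        def oracle(x: bytes) -> bytes:
            if x not in table:
                table[x] = os.urandom(n_bytes)
            return table[x]
    else:
        k = os.urandom(n_bytes)                 # uniform key, never shown to D
        def oracle(x: bytes) -> bytes:
            return F(k, x)                      # D only sees inputs and outputs
    return D(oracle) == world                   # True if D guessed correctly

# F is a pseudorandom function if every efficient D guesses correctly with
# probability only negligibly better than 1/2. For example, a distinguisher
# that ignores its oracle and always guesses world 1 is right half the time:
always_one = lambda oracle: 1
```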
Nevertheless, capital F is pseudorandom if the attacker can't tell these two possibilities apart.

Now we can try to connect our notion of pseudorandom functions with the earlier notion of pseudorandom generators. And if you think about it, a pseudorandom function is actually much stronger than a pseudorandom generator; in particular, it immediately implies a construction of a pseudorandom generator. That is, if we have a pseudorandom function F, then we can very easily define the following pseudorandom generator G: on input a seed k, G just outputs F sub k of 00...00 concatenated with F sub k of 00...01. Intuitively, F sub k for a uniform k looks like a random function, and if that's the case, then applying a uniform function to the all-zero input and then to the input ending in a one is equivalent to choosing two uniform n-bit outputs. And so what we get is a pseudorandom generator G that maps an n-bit seed to a 2n-bit output. In fact, there's no reason to restrict ourselves to evaluating F only twice. We can define a pseudorandom generator G with larger expansion by simply applying F sub k to more distinct values. So here I'm defining G of k as F sub k applied to 0, concatenated with F sub k applied to 1, concatenated with F sub k applied to 2, and so on, where we imagine encoding those integers as n-bit strings. (I'll show a small code sketch of this construction shortly.) More generally, we can in fact view a pseudorandom function as a pseudorandom generator with exponentially long output that furthermore allows random access to individual chunks of that output. That is, we can view the function F sub k by looking at its entire function table, that is, the value of F sub k on each possible n-bit input, and concatenating those values together. Intuitively (and I'll warn you that this is not quite formally correct; it's just intuition), the fact that F sub k looks just like a uniform function means that the function table for F sub k looks like the function table for a uniform function. So just by writing out the values of the function on all possible inputs, we get some huge table of random-looking values, where moreover we can access any one of those values by simply evaluating F sub k on the corresponding input.

Next I want to talk about the notion of pseudorandom permutations. Let F be a length-preserving keyed function, as before. We'll say that F is furthermore a keyed permutation if the following two conditions hold. First, we require that the function F sub k, which remember is a function from n-bit inputs to n-bit outputs, be a bijection for every possible key k; that is, F sub k is one-to-one and onto for every possible choice of k, which essentially means that F sub k is a permutation. We'll also require that F sub k inverse be efficiently computable: if F sub k is a bijection, then it has a well-defined inverse, and we just add the additional requirement that we can compute F sub k inverse efficiently given the key k. We'll say then that F is a pseudorandom permutation if F sub k, for a uniform key k, is indistinguishable from a uniform permutation chosen from the set of all permutations on n-bit strings.

Block ciphers are very important cryptographic primitives, and they can be viewed as giving us practical instantiations of pseudorandom permutations. That is, if we want to use a pseudorandom permutation in some cryptographic construction, then the right thing to do is to instantiate that pseudorandom permutation with a block cipher.
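Before turning to block ciphers in more detail, here is the promised sketch of the PRG-from-PRF construction, G of k equals F sub k of 0 concatenated with F sub k of 1, and so on. As an assumption for concreteness, I instantiate the abstract PRF with HMAC-SHA256, which is commonly modeled as a pseudorandom function; the lecture itself leaves F abstract, and nothing here is prescribed by it.

```python
import os
import hmac
import hashlib

# PRG from a PRF, with n = 256 bits (32 bytes), so keys, inputs, and outputs
# of F all have the same length (a length-preserving keyed function).
N_BYTES = 32

def F(k: bytes, x: bytes) -> bytes:
    """A stand-in PRF: HMAC-SHA256 keyed with k, applied to x."""
    assert len(k) == N_BYTES and len(x) == N_BYTES
    return hmac.new(k, x, hashlib.sha256).digest()

def G(k: bytes, t: int) -> bytes:
    """Expand the n-bit seed k into t*n pseudorandom bits by evaluating F_k
    on the n-bit encodings of 0, 1, ..., t-1 and concatenating the results."""
    return b"".join(F(k, i.to_bytes(N_BYTES, "big")) for i in range(t))

def chunk(k: bytes, i: int) -> bytes:
    """Random access: the i-th n-bit chunk of G(k), without computing the rest."""
    return F(k, i.to_bytes(N_BYTES, "big"))

seed = os.urandom(N_BYTES)
out = G(seed, 4)                                   # 4 * 32 = 128 bytes of output
assert len(out) == 4 * N_BYTES
assert chunk(seed, 2) == out[2 * N_BYTES : 3 * N_BYTES]
```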
Now, for a block cipher, there are no asymptotics involved. A block cipher is typically defined for some fixed key length and some fixed block length (although there are constructions where these are allowed to vary slightly), and it produces output of that same fixed block length. So rather than having the function F defined for keys and inputs of all lengths, we'll only have it defined for keys of a certain length, here denoted by n, which we'll call the key length, and only for inputs of a certain length m, which we'll call the block length. And because we want this F to instantiate a pseudorandom permutation, we're going to require that F sub k be hard to distinguish from a uniform permutation f. But actually, for the case of block ciphers, we're going to be more stringent. It's not going to be sufficient for F sub k, for uniform k, to be indistinguishable from f for attackers running in polynomial time; we actually want it to be hard even for attackers running in time about 2 to the n. That is, the best attack for distinguishing F sub k from a uniform permutation should be an exhaustive key-search attack, where the attacker tries every key k to see whether it is a possibility, that is, whether it matches the inputs and outputs the attacker has seen. It's kind of amazing, actually, that we have constructions that appear to satisfy this notion. We can't prove that they satisfy the notion of being a pseudorandom permutation; however, after many, many years of study, no one has been able to show otherwise, and it's therefore reasonable to conjecture that they do indeed provide a secure instantiation of pseudorandom permutations.

Perhaps the best known of these is AES, the Advanced Encryption Standard. AES was chosen by NIST in 2000 following a public worldwide competition lasting over three years: teams of cryptographers were allowed to submit their proposals for block cipher constructions, and after three years of study and analysis, NIST selected AES as the block cipher to be standardized. AES has a block length of 128 bits, and it offers three different key lengths: 128, 192, or 256 bits. And again, as far as we know today, AES provides the security claimed on the previous slide. That is, AES with 128-bit keys seems to provide about 2 to the 128 security, that is, security against attackers running in time 2 to the 128, and that's really sufficient for any kind of attack you can imagine in practice. I'll mention that AES is, like I said, one of the best-known block ciphers out there, and there's really no reason to use anything else. You'll find, if you look, that there are other block ciphers available; however, I can really only recommend using AES as the default block cipher in any construction. Next time, we'll show how to use pseudorandom functions, i.e., block ciphers, to construct CPA-secure encryption schemes.
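As a small addendum, here is a minimal sketch of invoking AES on a single 128-bit block, illustrating the F sub k and F sub k inverse interface we'll rely on next time. The use of the third-party pycryptodome package (the Crypto.Cipher.AES module) is my own assumption; the lecture doesn't prescribe any particular library, and note that a single raw block-cipher call is a PRP evaluation, not an encryption scheme by itself.

```python
import os
from Crypto.Cipher import AES       # assumes the pycryptodome package is installed

key = os.urandom(16)                # 16, 24, or 32 bytes: AES-128, -192, or -256
x = b"a 16-byte block!"             # exactly one 128-bit block
assert len(x) == 16

y = AES.new(key, AES.MODE_ECB).encrypt(x)    # y = F_k(x): one block in, one block out
x2 = AES.new(key, AES.MODE_ECB).decrypt(y)   # F_k inverse, computable given the key
assert x2 == x
```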