In this segment, we're gonna study the security of the ElGamal public key encryption system. So let me remind you that when we first presented the Diffie-Hellman protocol, we said that the security is based on the assumption that says that given G, G to the A, G to the B, it's difficult to compute the Diffie-Hellman secret, G to the AB. This is basically what the attacker has to do. He sees Alice's contribution. He sees Bob's contribution and then his goal is to figure out what the Diffie-Hellman secret is. And we said that this problem is difficult. The statement that the problem is difficult is what's called the computational Diffie-Hellman assumption. So, let's take this assumption more precisely. So, as usual we're going to look at a finite cyclic group of order N, so G is some group, in your head you should be thinking ZP star, but there could be other groups, for example, like an ellipctic curve group. And then we say that the computational Diffie-Hellman assumption. I've often used the shorthand CDH for Computational Diffie Hellman. We'll say this assumption holds in G if the following condition holds, namely for all efficient algorithms. If we choose a random generator, and then we choose random exponents A, B in ZN. Then when we give the algorithm G, G to the A, and G to the B, the probability that the algorithm will produce the Diffie Hellman secret is negligible. If this is true for all efficient algorithms, then we say that the CDH assumption holds for G. CDH, as I said, stands for Computational Diffie Hellman. As it turns out, this assumption is not ideal for analyzing the security of the ElGamal system. And instead I'm gonna go ahead and make a slightly stronger assumption called a hash Diffie-Hellman assumption. Okay. So what is hash Diffie-Hellman assumption? Again, we are focusing on a particular group G and now we're also gonna introduce a hash function H that maps pairs of elements in G into the key space of some symmetric encryption system. And then we say that the hash Diffie-Hellman a ssumption, or HDH for short, holds for this pair, G comma H for the group in the hash function if the following is true. Basically, if I choose a random generator and then I choose random exponents A and B in ZN and then I also choose a random R and K, then the following distributions are computationally indistinguishable. That is, if I give you G, G to the A, G to the B, and then this hash value, this should look familiar to you. This is the hash value that's used in the ElGamal system to derive the symmetric encryption key. What we're saying is that this distribution is indistinguishable from a distribution where you're also given. G, G to the A, G to the B. But now, instead of giving the hash, you're given just really random key. So what this assumption says is that the symmetric key that was derived during encryption in the ElGamal system, essentially is indistinguishable from a truly random key that is derived independently from the rest of the parameters of the system. It's a truly independent random key, looks basically the same as the key that we used, to derive from the G to the A and the G to the B. Now I'd like to point out that the Hash Diffie-Hellman assumption is actually a stronger assumption than the Computational Diffie-Hellman assumption. The way to see that is using the contra positive, that is suppose the Computational Diffie-Hellman assumption happens to be easy in the group G. Then I claim that in fact the Hash Diffie-Hellman assumption is also easy in the group G. In fact, I'll say for G, H and this is true in fact for all hash functions where the image of the hash function. It contains at least two elements. In other words all I want to say is that the Hash Diffie-Hellman assumption and it's easy for all hash functions to go match everything to a single point. Which is true for almost all hash functions of interest to us. So actually, this is a really simple statement to prove. Basically, if the Computational Diffie-Hellman assumption is easy, what that says is that, given G to the A and G to the B, I can actually calculate G to the AB myself. Cuz I know the hash function H, I can calculate this complete value here. So if you give me a tuple that's sampled from the left or sampled from the right. I can very easily tell where it's from. If it's sampled from the left, then once I've calculated G to the AB myself, I can just test that the last element in the tuple is, in fact, the hash of G to the B and G to the AB. If it is, I know the sample is from the left. If it isn't, I know the sample is from the right. Okay, so this gives me pretty high advantage, in distinguishing these two distributions. So CDH is easy, it's very easy to see that these distributions are distinguishable. And this basically says that if the Hash Diffie-Hellman is in fact hard, then CDH must also be hard. Which then we say, that therefore the Hash Diffie-Hellman is a stronger assumption. There might be group where CDH is hard, but HDH is not hard. Okay. So we say HDH is a stronger assumption. If you found this discussion of weak assumption versus strong assumption confusing, it's not really that important, it's fine to ignore it. The important thing that I want to show you is in fact that the Hash Diffie-Hellman assumption is sufficient to prove the semantic security of the ElGamal system. Before we do that let me quickly ask you the following puzzle just to make sure the Hash Diffie-Hellman assumption is clear. So imagine our key space is {0, 1} to the 128. So 128 bit strings and our hash function will map pairs of group elements into this 128 byte strings. Suppose it so happens that we chose a hash function Such that it always outputs strings that begin with zero. In other words, of the 128 bit strings the most significant bit of the output is always zero. If we chose such a hash function, does the Hash Diffie-Hellman assumption hold for this pair, G comma H. So the answer is no it doesn't hold. And the reason is because it now very easy to distinguish the two distributions that we have here. The distribution on the left an the distribution on the right. And the way you would distinguish the two is basically if I choose a truly random element in K, in {0, 1} to the 128, then mostly it can very well be zero with probability one half. However, the tuple that's given to me is chosen from the left distribution, then the most significant bit of the hash will always be zero and therefore if I see zero, I'm gonna say the distribution is a distribution on the left. If I see one, I'm gonna say the distribution is a distribution on the right. That will give me advantage one half in distinguishing these two distributions. And so as a result, the Hash Diffie-Hillman assumption is false in this case. So clearly you could see that, even though CDH might be hard in a certain group G, if you choose a bad hash function, then HDH will not hold for the pair G comma H. Okay. But suppose it so happens that we choose a group G and a hash function H for which the Hash Diffie Hellman assumption holds. And in fact, if you choose the hash function to be just SHA-256, for all we know, the Hash Diffie Hellman assumption holds in the group ZP star, and holds in elliptic curve groups. My goal then is to show you that ElGamal is semantically secure under the Hash Diffie-Hellman assumption. So let me quickly remind you how theElGamal system works. While we're gonna choose a random generator G, we're gonna choose a random 'a' in ZN, the public key is gonna be G, and G to the A, the secret key is simply gonna be A. And now here I wrote shorthand for the ElGamal encryption. Basically, what to encrypt, what we do is we choose a random B. We hash G as being H to the B. Remember this H to the B is the value G to the AB. This is the Diffie Hellman secret. The hash function gave us a symmetric encryption key K. We encrypt the message with K, and we output G to the B and the symmetric cipher text. To decrypt we have to value U, and the Diffie Hellman secret, G to the A. To derive a symmetric key, we decrypt the cipher text. And then we output the plaintext message m. So let's see how to argue that, in fact, ElGamal is semantically secure under this Hash Diffie-Hillman assumption is fairly simple. So well we have to argue, remember we had, in semantic security, we have these two experiments. One experiment, the attacker is given the encryption of m0. In the other experiment, the attacker is given the encryption of m1. So I wrote the two experiments here. Here you notice that the attacker starts by sending off the public key to the adversary. The adversary then chooses two equal length messages m0 and m1. And then if he gets the ElGamal encryption of M0, and a kind of rough shorthand for what ElGamal ciphertext is, okay? Similarly, in encryption one, he simply gets the encryption of M1 instead of M0, and everything else is the same about these two experiments. Now, because of the Hash Diffie-Hellman assumption, we know that even though the attacker got to see G, G to the a and G to the b, we know that the output of the hash of G to the b, G to the ab is indistinguishable from random. Therefore, if we replace the actual hash function by a truly generated random key K that's independent of everything else, by the Hash Diffie-Hellman assumption, the attacker can't distinguish these two games. And again, it's a simple exercise for you to show that if he could distinguish these two games, then he would break the Hash Diffie-Hellman assumption. But then the same is true for experiment one. The attacker also could not distinguish the output of the hash function from a truly random key, that was used the generate the challenge cipher text. Okay. So now basically we look at these two experiments and we realize that, wait a minute, what's going on in these two experiments? Basically everything is the same except in one experiment, the attacker gets the encryption under a symmetric encryption system of m0. In the other one, he gets the encryption under a symmetric encryption system of m1 and the key is chosen at random independently over everything else. So by the fact that the symmetric encryption system is semantically secure, these two games are indistinguishable. If they were distinguishable, then the attacker could break the semantic security of this symmetric encryption system. So now we kinda prove this, you know, this chain of equivalences. And that proves that the original games, in fact, are indistinguishable, computationally indistinguishable. And therefore, the ElGamal system is semantically secure. Okay so it's like I said a very simple proof by pictures and you can make this into a rigorous proof without too much work. But semantic security isn't enough, what we really want is chosen ciphertext security, and the question is ElGamal chosen ciphertext secure? So it turns out to prove the chosen ciphertext security of ElGamal we actually need a stronger assumption, Hash Diffie-Hellman or Computational Diffie-Hellman are actually not enough to prove chosen ciphertext security of the system as far as we know. So to prove chosen-ciphertext security, I'm going to introduce a new assumption called Interactive Diffie Hellman assumption. And actually, technically we really don't like this assumption. And we're going to try to get rid of this, later on. But for now, we just want to analyze the security of the basic ElGamal system against chosen ciphertext attack. So to argue that it is chosen ciphertext secure, here is what the assumption says. Basically the challenger starts, you know, plays a game with the adversary, he generates a random G, generates two exponents, and then he says to the adversary as usual, G, G to the a and G to the b. Now the adversary's goal is basically to figure out the Diffie-Helmman's secrets, mainly g to the ab. He outputs the value of V and he wins the game if V happens to be equal to G to the AB. So, so far this looks identical to the Computational Diffie-Hellman assumption, there's no difference - this is the Computational Diffie-Hellman assumption except in Interactive Diffie-Hellman would give the attacker a little bit more power. So because the attacker has more power, it's harder to satisfy the assumption, which is why we say that this assumption is stronger than Computational Diffie-Hellman. Now what is this power to be given? We are given the ability to make queries. In particular, he gets to submit two elements from the group G, so U1, V1 disappear from the group G. And then he's told whether U1 to the A is equal to V1, so he gets one. If you wanted the A equals to V1 get zero, otherwise it's kind of a bizarre type of query. What, how does it be possibly help the attacker? But it turns out we need to be able to answer these types of queries to the adversary in order to be able to prove chosen ciphertext security. And in fact, he can do these type of queries again and again and again. So he can issue as many queries like these as he wants and then the assumption says that despite all these queries he still can't figure out the Diffie-Hellman secret, namely despite making all these queries, the probability that he outputs the Diffie-Hellman secret is negligible. Okay. So clearly if this assumption holds, that the Computational Diffie-Hellman assumption holds, because here, the adversary has more power, and as a result we say that this assumption is stronger that Computational Diffie-Hellman, the thing, we really don't like about this assumption, is the fact, that, it's interactive, and that the adversary is allowed to make queries to the challenger, and as I said, we're gonna trying to get rid of this interaction later on. But for now I'll tell you that under this Interactive Diffie-Hellman assumption and under the assumption that the symmetric encryption system provides authenticated encryption, and under the assumption that the hash function is kind of an ideal hash function, what we call the random oracle, then in fact the ElGamal system is chosen ciphertext secure and again this RO here denotes the fact that it's in the random oracle model. Which is not important, so much for our purposes. The main thing to remember that under kind of this weird, interactive assumption we can actually prove that ElGamal is a chosen ciphertext secure. And as far as we know this assumption actually holds for the group ZP star. So what we're gonna do next is basically answer this question of whether we can get rid of the interactive assumption. Can we prove security strictly based on CDH? And similarly can we proof security without relying on random oracles? Just you know without having to assume, that the hash function is ideal. Just you know, can we prove security using a concrete hash function. And we'll do that very briefly in the next segment.