0:00

In this video, we're going to look at the equilibria of infinitely repeated games.

So in order to talk about equilibria, we need to begin by asking what we mean by a pure strategy in an infinitely repeated game. You may like to pause the video at this point and see if you can answer that question before I go on and tell you the answer. If you need a hint, remember the rule of thumb we've used to define strategies as we've changed game representations: a pure strategy should be whatever you would need to tell another person so that they could play the game on your behalf and end up doing exactly what you would have done. So a pure strategy has always been the policy you would follow, making all of the decisions in the same way you would actually have made them.

Well, in an infinitely repeated game, your strategy is going to be a choice of action at every decision point you would have, which means an action in every stage game. And bear in mind that when you take those actions, you get to reason about everything you've seen before in the game: you can remember all of your own previous actions, and you can also remember all of the actions taken by the other players in previous stage games. So really, your pure strategy is a mapping from every possible history to an action choice. Clearly, there is an infinite number of histories, and hence an infinite number of such mappings. So, unlike the extensive-form and normal-form games we've looked at before, you're not even going to have a finite set of pure strategies in an infinitely repeated game.

Let me nevertheless give you some examples of famous pure strategies in infinitely repeated games, to give you the sense that we can say interesting things about pure strategies even though the set of possible pure strategies is infinite. So think about playing the repeated prisoner's dilemma: I'm going to play the prisoner's dilemma game infinitely.

One famous strategy is called tit-for-tat. It turns out that there were famous competitions where people submitted programs that played the prisoner's dilemma many times in a row, and then they looked at how these programs did against each other.

And tit-for-tat was a strategy that did famously well in those competitions. The way it works is this: it starts out cooperating, and if its opponent defected in the previous round, tit-for-tat defects in the next round and then goes back to cooperating. So if tit-for-tat itself defected but its opponent cooperated, it responds by cooperating again. Thus, if tit-for-tat plays itself, it will cooperate forever; but if it plays against somebody who defects, it will, intuitively, punish that defection by defecting itself in the next round. It's very forgiving, though: it goes back to cooperation after one round of punishment.

In contrast, the trigger strategy is a mean-spirited version of tit-for-tat. It also starts out cooperating, but if its opponent ever defects, that's it: it just defects forever. It pulls a trigger and says, "I'm never going to forgive you. You've wronged me once, and I'm going to wrong you back until the end of time." That's why we call it trigger.

You can see that in both of these cases I've been able to describe, in algorithmic terms, how these strategies would respond to anything in the infinite history they might see, without actually writing out the strategy in a formal language. So it is possible to think coherently about interesting strategies in infinitely repeated games.

Now, of course, what we would really like to do as game theorists is describe the Nash equilibria of infinitely repeated games. The approach we've taken in the past has been to first construct the induced normal form of a game: we figure out what a strategy is in the game, we show how to build the induced normal form, and then we just appeal to Nash's theorem and say that because we've got a normal-form game, we know there is a Nash equilibrium, and everything goes through the way it did before.

Well, that worked well for us in the past because we always ended up with finite induced normal forms. Unfortunately, because we now have an infinite number of pure strategies, we're going to get an infinity-by-infinity matrix, even in the two-player case. And so we no longer have something to which Nash's theorem applies, because we have an infinite-sized game, and Nash's theorem only works for finite games: games whose matrices are finite and contain a finite number of payoffs. That means that, based on what we know so far, we have no reason even to be sure that equilibria exist in these games.

On the other hand, because there is an infinite number of strategies, there might even be an infinite number of pure-strategy equilibria. We've seen in the past that it's possible to have an infinite number of mixed-strategy equilibria: for example, if in a normal-form game I have two pure strategies that I'm indifferent between, then any mixture between them can also be a best response for me, so in that sense I can have an infinite number of mixed-strategy equilibria. But here I can potentially have an infinite number of qualitatively different pure-strategy equilibria, and that seems like a problem. We're not necessarily going to be able to list off the equilibria of a repeated game.

Interestingly, there is still something we can coherently do to give a satisfying answer about the Nash equilibria of infinitely repeated games. In this video, I'm going to tell you what it is, and because it's so satisfying, I'm even going to prove the theorem to you, so you really will understand how this one works.
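As an aside, the two famous strategies described earlier can be written down concretely. Here is a minimal sketch (in Python; my illustration, not from the lecture) of tit-for-tat and the trigger strategy as mappings from observed histories to actions, plus a small driver that plays two strategies against each other for a finite prefix of rounds:

```python
# Actions in the prisoner's dilemma: "C" = cooperate, "D" = defect.
# A pure strategy is a function from the history seen so far to an action.
# Each history entry is a pair (own action, opponent's action).

def tit_for_tat(history):
    """Cooperate first; thereafter copy the opponent's previous action."""
    if not history:
        return "C"
    _, opponent_last = history[-1]
    return opponent_last

def grim_trigger(history):
    """Cooperate until the opponent defects once; then defect forever."""
    if any(opp == "D" for _, opp in history):
        return "D"
    return "C"

def play(strategy_1, strategy_2, rounds):
    """Play two strategies against each other for a finite number of rounds."""
    history_1, history_2 = [], []
    for _ in range(rounds):
        a1 = strategy_1(history_1)
        a2 = strategy_2(history_2)
        history_1.append((a1, a2))
        history_2.append((a2, a1))
    return [own for own, _ in history_1], [own for own, _ in history_2]

# Tit-for-tat against itself cooperates forever.
p1, p2 = play(tit_for_tat, tit_for_tat, 5)
print(p1, p2)  # ['C', 'C', 'C', 'C', 'C'] ['C', 'C', 'C', 'C', 'C']

# Against an unconditional defector, tit-for-tat punishes from round 2 on.
always_defect = lambda history: "D"
p1, p2 = play(tit_for_tat, always_defect, 4)
print(p1)  # ['C', 'D', 'D', 'D']
```

Note that each strategy only ever inspects the history it has observed, exactly matching the "mapping from histories to actions" definition above.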

And here is the idea of the theorem we will eventually prove: we can characterize which payoffs can be achieved in equilibrium, and we can give an example of equilibrium strategies that result in those payoffs, without enumerating every equilibrium that results in those payoffs. So we're going to characterize the equilibria of infinitely repeated games via their payoffs: I'll be able to tell you precisely which payoff vectors are achievable in equilibrium.

To do that, I need to give you a little bit of notation so we can talk about these payoff vectors. This is our slide of notation; once we get through it, we'll have all the building blocks we need to prove our theorem. We start with some n-player game — this is just our stage game, some normal-form game — and we also start with some payoff vector, by which I mean the utilities that the players get in the game. Now, in this video I'm going to talk about the average-reward case, where each of these numbers encodes the utility that I get on average by following my strategy in the infinitely repeated game. That says I care in just the same way about payoffs I get now and payoffs I get a million iterations into the future. We can do something similar in the discounted-reward case, and that will be the topic of a different video. For this video, we'll talk only about the average-reward case, and the reason is that the proof is a little bit easier to think about there, even though the discounted-reward case is maybe a bit more of a practical setting. We can get across the key ideas of both proofs with this one, and this one is a bit easier.

So let's do it. To begin, I need to remind you of the concept of the minmax value, which we introduced in the context of zero-sum games, but which turns out to be very important for repeated games. The slide gives the mathematical definition — v_i = min over s_{-i} of max over s_i of u_i(s_i, s_{-i}) — but let me start by saying in words what it means, because the nested min and max operators are a little hard to think about. Essentially, player i's minmax value is the amount of utility that i is able to obtain for himself if all of the other players, whom we call -i, are completely unconcerned about their own utility and instead are just trying to hurt i as much as possible. They all play the strategy, from their joint mixed-strategy space, such that if i responds by doing the best thing he can for himself, i's utility is as low as possible. So they're trying to minimize i's utility, given that i is maximizing in response to their minimization, and the number that comes out at the end is the amount of utility that i can guarantee himself against everybody trying to hurt him as much as possible. That's i's minmax value. Intuitively, if I want to punish you as much as possible in a game, and you know that I'm trying to punish you, the amount of utility that you can still get is your minmax value.
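To make the nested min and max concrete, here is a sketch (mine, not from the lecture) that computes a player's minmax value in a two-player game, restricting the punisher to pure strategies for simplicity. The definition in the lecture allows mixed punishment strategies, which can only make the value lower or equal; for the prisoner's dilemma below, the two coincide.

```python
def pure_minmax(payoffs, i):
    """Player i's minmax value when the punisher uses pure strategies.

    payoffs[a0][a1] = (u0, u1), indexed by each player's action."""
    n0, n1 = len(payoffs), len(payoffs[0])
    if i == 0:
        # Player 1 picks the punishing column; player 0 best-responds.
        return min(max(payoffs[a0][a1][0] for a0 in range(n0))
                   for a1 in range(n1))
    else:
        # Player 0 picks the punishing row; player 1 best-responds.
        return min(max(payoffs[a0][a1][1] for a1 in range(n1))
                   for a0 in range(n0))

# A standard prisoner's dilemma parameterization (an assumption on my part,
# since the lecture doesn't fix numbers): actions are (Cooperate, Defect).
pd = [[(3, 3), (0, 5)],
      [(5, 0), (1, 1)]]
print(pure_minmax(pd, 0), pure_minmax(pd, 1))  # 1 1
```

Here the punisher defects, the punished player's best response is to defect as well, and each player's minmax value is the mutual-defection payoff of 1.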

So I will say that a payoff profile — remember, a payoff profile is a payoff amount for everybody — is enforceable if everybody's payoff in that profile is at least their minmax value. If anybody is getting less than their minmax value in a given payoff profile, I'll say that that's an unenforceable payoff profile. And I will say that a payoff profile is feasible — I'll use this technical term, feasible — if the following condition is true. Intuitively, what I want to say about feasibility is that it's possible to actually achieve that payoff profile by combining together payoffs in the actual game that's being played.

Notice that enforceability doesn't actually require that the profile is achievable. I could say that the payoff profile of a million dollars for me and a million for you in the prisoner's dilemma is enforceable, because it's above both of our minmax values, but there's no way we can actually each get a million in the prisoner's dilemma, because the numbers don't go that high. Feasibility is going to capture what it's actually possible to do. The way we're going to say this — and I'm going to restrict myself to rational numbers — is that there must exist rational, non-negative values alpha_a such that, for every player i, we can express player i's element r_i of the payoff vector as a sum, over all action profiles a of the normal-form game, of alpha_a times i's utility for a, where the alpha_a sum to one across all action profiles. In symbols: r_i = sum over a of alpha_a times u_i(a), with the alpha_a summing to 1.

Let me explain what this means. I have some normal-form game, and what I want to say is that I have some weight on each cell of the game: these are the alpha_a's, and they all sum to one. I take my payoff from one cell weighted by the alpha for that cell, my payoff from the next cell weighted by its alpha, and so on, sum up all of those weighted utilities, and I get r_i. So what I want to say is that there exist alphas I could use to blend together the payoffs in the actual game in such a way that I get r_i. Furthermore, that has to be true simultaneously for everybody else, and in particular with the same alpha_a's: I can't come up with some alphas that get me my r_i and then use different alphas to get r_j for player j. I have to put the same weights on all of the cells so that the numbers come out right for everybody.

So let's think about the following game, which will give you an example of how feasibility works. Say I have a 2x2 game in which one cell gives payoffs (2, 0), another gives (0, 2), and the other two give (0, 0). In this game, I claim the payoff profile (1, 1) is feasible, and the reason is that I can put a weight of 50% on the (2, 0) cell, a weight of 50% on the (0, 2) cell, and weights of zero on the other cells. You can see that my payoff in this game is 50% times 2, plus 0 times 0, plus 0 times 0, plus 50% times 0, which is 1. The other player's payoff is 50% times 0, plus 0 times 0, plus 0 times 0, plus 50% times 2, which is also 1. So (1, 1) is a feasible payoff in this game. On the other hand, the payoff profile (2, 2) is not feasible in this game, and the reason is that there's no way I can put weights on these cells that sum to one and give both of us 2. For me to get a payoff of 2, the weight on the (2, 0) cell would have to be one — that's the only cell where I get a payoff at all — and for my opponent to get 2, the weight on the (0, 2) cell would have to be one, and then the weights would sum to more than one. The condition that the alphas sum to one would be violated, so we just can't do it: (2, 2) is not a feasible payoff.

Okay, now we're ready for the folk theorem. It's kind of funny that it's called the folk theorem; it's like a folk song for mathematicians. A folk song is a song that everybody knows, but nobody really knows who wrote it first. And a folk theorem is a theorem that people all knew and talked about before anyone got around to writing it down, and nobody is really sure where it came from. This is the folk theorem of game theory, and despite having uncertain origins, it's very important. The folk theorem basically tells us what the Nash equilibria of an infinitely repeated game are, in terms of these payoffs and the definitions I just gave you. There are different folk theorems for different settings; this is the folk theorem for infinitely repeated games with average rewards. The theorem has two parts, and the gap between them basically stems from the fact that I've restricted myself to rational alphas. It turns out we don't actually need that restriction, but the math gets more annoying if we have to deal with real values, so to keep things simple I'm doing this just for rational numbers.

15:48

So, the folk theorem says the following, for any n-player game which we repeat infinitely, and for any payoff vector r_1 to r_n of the kind we've been talking about. First, consider the case where r is the payoff in a Nash equilibrium of the infinitely repeated game with average rewards. What can we conclude about the payoff vector? We conclude that r must be enforceable — remember, enforceable means that every player's payoff is greater than or equal to that player's minmax value. So if r is the payoff in an equilibrium of the infinitely repeated game with average rewards, then r must be enforceable. That seems like kind of a weak thing to say.

The second part of the folk theorem is more surprising. It says, basically, that that's the only restriction we need: if r is enforceable, and furthermore if it's feasible — if it's possible to make it out of the payoffs of the game — then r is the payoff in some Nash equilibrium. Together, this says that feasibility and enforceability are basically the whole story of Nash equilibrium. Enforceability seems like a very small thing: it says you're getting no less than the payoff you'd get if everyone punished you as much as it's possible for them to punish you. And feasibility says it's possible to get these payoffs in the game at all. The folk theorem says that's basically it: as long as you meet those two conditions, you've got a Nash equilibrium.

That's basically all there is to say about Nash equilibrium here. You'll notice there's a little bit of wiggle room between the two parts of the theorem statement, which has to do with the fact that feasibility appears in the second part but not the first. So we can't conclude that every Nash equilibrium payoff is feasible. You might wonder why that is; it's just because some equilibrium payoffs aren't expressible with rational alphas. That's the part I'm leaving out of this proof, but it's a technical point. The broad thing you should remember about the folk theorem is that, basically, feasibility and enforceability together are equivalent to Nash equilibrium in repeated games. If you wanted to stop here, you now know what the folk theorem is, but I encourage you to keep watching the rest of the video, and you'll see how we actually prove it.
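The feasibility check from the earlier 2x2 example can be verified numerically. This is my own sketch: the exact matrix is a reconstruction from the spoken description (one cell pays (2, 0), one pays (0, 2), the rest pay (0, 0)), and the action names are made up for illustration.

```python
from fractions import Fraction

# Hypothetical action names: player 1 plays A or B, player 2 plays C or D.
game = {("A", "C"): (2, 0), ("A", "D"): (0, 0),
        ("B", "C"): (0, 0), ("B", "D"): (0, 2)}

def weighted_payoffs(game, alpha):
    """r_i = sum over a of alpha[a] * u_i(a), for each player i."""
    assert sum(alpha.values()) == 1 and all(w >= 0 for w in alpha.values())
    n_players = len(next(iter(game.values())))
    return tuple(sum(alpha[a] * game[a][i] for a in game)
                 for i in range(n_players))

# Weight 1/2 on the (2, 0) cell, 1/2 on the (0, 2) cell, zero elsewhere.
half = Fraction(1, 2)
alpha = {("A", "C"): half, ("A", "D"): 0, ("B", "C"): 0, ("B", "D"): half}
print(weighted_payoffs(game, alpha))  # (Fraction(1, 1), Fraction(1, 1))
```

So (1, 1) is feasible. And as argued above, (2, 2) is not: player 1 only earns anything in the ("A", "C") cell and player 2 only in the ("B", "D") cell, so each would need weight 1 on a different cell, which cannot sum to 1.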

Let's begin by proving the first part: why is it that if a payoff is achievable in a Nash equilibrium, we can conclude that it must be enforceable? I'm going to prove this by contradiction. Suppose the payoff vector r is not enforceable. Then there must be some player i whose payoff r_i is strictly less than his minmax value. Now consider a situation in which this player i changes his strategy: imagine that he instead deviates to playing b_i(s_-i(h)) at any history h of the repeated game. What is this thing? Here s_-i is the strategy profile of the other players — we can assume i knows what it is, because we're talking about a Nash equilibrium — and b_i is just i's best response. So at every history h, player i simply best-responds to what the other players are playing at that history. Well, by the definition of the minmax value, you have to get a payoff of at least your minmax value in every stage game if you follow this strategy. Remember, your minmax value is the payoff you get when everyone is trying to hurt you as much as possible and you best-respond to that. So intuitively, if I'm getting less than my minmax value, I can't be best-responding to everybody else: even if they were trying to hurt me as much as possible, which is the worst case for me, best-responding would bring me up to my minmax value. So I could change my strategy to best-responding, and that would have to improve things for me. Because that improves my utility, it's a better response than what I was doing before. So we've constructed a profitable deviation for player i, and because a profitable deviation exists for him, the strategy profile we started with couldn't have been a Nash equilibrium. That contradicts the assumption we made at the beginning, that r was an equilibrium payoff yet not enforceable. So we can conclude that if r is not enforceable, we can't have a Nash equilibrium, which proves what we wanted: being a Nash equilibrium payoff implies enforceability.

That's part one: the easy and less interesting direction. Let's now do the second and more interesting part, which says that all I need to assume is that the payoff profile is both feasible and enforceable, and that means I've found a Nash equilibrium. What's interesting about this part is that we're going to prove it by construction: I'm going to show you how, given any feasible and enforceable payoff profile, you can build a set of strategies, one per player, which are in equilibrium with respect to each other and which obtain exactly that average payoff for every player.

First of all, let's introduce some bookkeeping notation. Since r is feasible, and since we've assumed the alphas are rational, we can write each r_i as follows: we make up new variables beta_a and gamma, where we've replaced each alpha_a by beta_a divided by gamma. That's not hard to do: because the alphas are rational, each of them can be written as a fraction, and we can make gamma their common denominator and set the betas accordingly. So it's possible to come up with betas and a gamma such that r_i equals the sum over a of (beta_a / gamma) times u_i(a). That shouldn't be too surprising; it just follows from the rationality of the alphas and from feasibility.

Now I'm going to construct a strategy profile, and then on the next slide I'm going to argue that this strategy profile is in equilibrium — or rather, that we can turn it into an equilibrium. Let's make a strategy profile that cycles through all of the outcomes, using cycles of length gamma, where each cycle repeats each action profile a exactly beta_a times. Picture our game matrix, and remember that the betas and gamma are all integers. For example, let's say gamma is seven and one cell has beta equal to two, so its alpha_a is 2/7; other cells might have 0/7, and so on — I'll just write the betas from this point on.

Say the betas in the remaining cells are 0, 1, 2, 0, 2, 0. If that's what we had in this particular game, then the strategy is just going to cycle through the cells accordingly. Let's give the actions names: player 1's actions are A, B, C and player 2's actions are D, E, F. Then player 1's strategy would be to play A, A, B, B, B, C, C, then go back to the beginning and keep doing that forever. Player 2's strategy would be to play D, D, E, F, F, E, E, and then go back. If the two players played those two strategies together in a coordinated way forever, they would cycle through the game, hitting every action profile exactly as often as the betas say. Let's denote such a sequence of action profiles a^t.

Now let me define the real strategies for the players that I'm going to claim are in equilibrium. Define the strategy s_i of player i to be a trigger version of this cycling strategy. If nobody deviates, then s_i tells the player to play just what a^t would tell them to play: the players begin by cycling through these outcomes in just the way I told you. But if one player ever notices that another player didn't do what they were supposed to do according to a^t, then they hit the trigger, and from that point on they play the minmax strategy against the deviating player. That's just what the definition on the slide says: if there's ever a period in which somebody does the wrong thing, then everybody gangs up on that person forever, playing the strategy profile that causes that person to get their minmax value.

Let me just say that one more time to make sure we've all got it. a^t is a sequence constructed so as to hit every action profile exactly the number of times given by the integer betas, and we cycle through it forever. The strategy we're interested in is a trigger version of that: everyone tries to follow the cycle, but if somebody does the wrong thing, everybody punishes them forever, holding them to their minmax value.
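The cyclic schedule just described can be sketched in code. This is my own reconstruction of the lecture's example (gamma = 7, with the nonzero betas read off from the two action sequences above); the per-cycle average computation mirrors the feasibility expression r_i = sum over a of (beta_a / gamma) * u_i(a).

```python
from fractions import Fraction

# Nonzero betas from the example: each action profile and its count per cycle.
# Dict insertion order (Python 3.7+) fixes the order of play within a cycle.
beta = {("A", "D"): 2, ("B", "E"): 1, ("B", "F"): 2, ("C", "E"): 2}
gamma = sum(beta.values())  # cycle length: 7

def build_cycle(beta):
    """One cycle: each action profile a appears exactly beta[a] times."""
    cycle = []
    for profile, count in beta.items():
        cycle.extend([profile] * count)
    return cycle

cycle = build_cycle(beta)
print([a for a, _ in cycle])  # player 1: ['A', 'A', 'B', 'B', 'B', 'C', 'C']
print([b for _, b in cycle])  # player 2: ['D', 'D', 'E', 'F', 'F', 'E', 'E']

def cycle_average(u, cycle):
    """Exact average payoff per player over one cycle, for any integer-valued
    stage-game utility function u(a) = (u1, u2)."""
    totals = [sum(u(a)[i] for a in cycle) for i in range(2)]
    return tuple(Fraction(t, len(cycle)) for t in totals)
```

Because every cycle repeats the same multiset of action profiles, the average payoff after every gamma rounds is exactly the weighted blend of stage-game payoffs, which is the point of the construction.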

So, I claim that this is a Nash equilibrium of the infinitely repeated game with average rewards. First, notice that if everyone does play according to this strategy profile, then every player i gets an average payoff of exactly r_i, just as we wanted. You might be thrown off by the fact that sometimes a player isn't quite getting r_i because they're only partway through the cycle. But we're only interested in the limit, and if you look at averages over periods of length gamma — what happens after gamma rounds, after two gamma, after three gamma — you'll be able to convince yourself that the players really do get the right payoffs, because after every period of length gamma they have received exactly the payoff r_i, by construction. So indeed, this does lead to the payoffs we were trying to achieve.

What remains to show is that this is a Nash equilibrium — that nobody can gain by deviating — and indeed I claim that it is. To show that, imagine that everybody else is playing according to this strategy profile and some player j deviates at some point. If that happens, then for all time after that point, player j gets his minmax payoff, and that renders the deviation unprofitable. We've assumed that the payoff profile is enforceable, which means j was already getting at least his minmax value. And because the punishment goes on for all time — it happens forever afterwards — it ends up dominating the average: everything that happened in the finite stretch of time beforehand gets washed out of the average, and j's average payoff turns out to be his minmax payoff. By enforceability, that is less than or equal to r_j, so there's nothing he can gain by such a deviation. And that suffices to prove the folk theorem. So what we've now seen is that, essentially, feasibility and enforceability correspond exactly to the payoff profiles that are achievable in Nash equilibrium of an infinitely repeated game with average rewards.
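As a closing numerical illustration (my own, with arbitrary made-up numbers), here is how a finite deviation gets washed out under the average-reward criterion: a deviator who grabs a high payoff for a few rounds and is then minmaxed forever sees his long-run average converge to the minmax value.

```python
def running_average(rewards_prefix, punishment_payoff, horizon):
    """Average reward over `horizon` rounds: a finite deviation prefix,
    then the minmax punishment payoff in every round after it."""
    total = sum(rewards_prefix)
    total += punishment_payoff * (horizon - len(rewards_prefix))
    return total / horizon

# Suppose the deviator grabs a payoff of 5 for 10 rounds, then is held to a
# minmax value of 1 forever after (both numbers are assumptions).
prefix = [5] * 10
for horizon in (100, 10_000, 1_000_000):
    print(round(running_average(prefix, 1, horizon), 4))
# The averages approach the minmax value of 1: 1.4, then 1.004, then 1.0
```

This is exactly the argument in the proof: the finite prefix contributes nothing in the limit, so a deviator's average payoff equals his minmax value, which enforceability guarantees is no more than the equilibrium payoff.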