In this video, we're going to look at the equilibria of infinitely repeated games.

So in order to talk about equilibria, we need to begin by asking what we mean by a pure strategy in an infinitely repeated game. You may like

at this point to pause the video and see if you can answer that question before I

go on and tell you the answer. If you need a hint, you should remember that the rule

of thumb that we've used to define strategies as we've changed game

representations has been to say that a pure strategy should be whatever you need

to tell another person to play in the game on your behalf and to end up doing exactly

what you would have done. So a pure strategy has always kind of been the policy that you would follow, making all of the decisions in the same way as you would have actually done, so whatever you would need to communicate about that policy. Well, in an infinitely repeated game, your strategy is

going to be a choice of action at every decision point that you would reach, which means an action that you would take in every stage game. And bear in mind that when

you take those actions, you actually get to reason about everything that you've

seen before in the game which is to say, you can remember all of your own previous

actions and you can also remember all of the actions in previous stage games by

other players. So really, your pure-strategy space is a mapping from every history to an action choice that you would make. Clearly, there are infinitely many histories, and hence infinitely many such mappings. So, unlike the previous games we've looked at, extensive form games and

normal form games, you're not even going to have a finite set of pure strategies in

an infinitely-repeated game. Let me give you some examples of some famous pure

strategies in infinitely-repeated games, nevertheless, to give you the sense that

we can say interesting things about pure strategies, even though the set of possible pure strategies is infinite. So think about playing repeated prisoner's dilemma; I'm going to play the prisoner's dilemma game infinitely.
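To have something concrete to refer to, here is one standard choice of payoffs for the prisoner's dilemma stage game; the specific numbers are my assumption, since the video doesn't fix any:

```python
# One standard prisoner's dilemma payoff table (assumed numbers, not from the video).
# Each entry maps (my action, opponent action) -> (my payoff, opponent payoff),
# where "C" is cooperate and "D" is defect.
PD = {
    ("C", "C"): (3, 3),  # mutual cooperation
    ("C", "D"): (0, 5),  # I'm exploited
    ("D", "C"): (5, 0),  # I exploit
    ("D", "D"): (1, 1),  # mutual defection
}
```

With these numbers, defecting is better for me no matter what my opponent does, yet mutual defection leaves us both worse off than mutual cooperation, which is what makes the repeated version interesting.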

One famous strategy is called tit-for-tat. It turns out that there were famous competitions where people submitted programs that played the prisoner's dilemma repeatedly, and then they looked at how these programs did against each other. And tit-for-tat was a strategy that did famously well in those competitions. The

way it works is it starts out cooperating, and then if it observes that its opponent defected in the previous round, tit-for-tat defects in the next round, and then it goes back to cooperation. So, if it defected but its opponent cooperated, then it'll respond to that by cooperating. So if tit-for-tat plays itself, it'll cooperate forever, but if it plays

against somebody who defects it's going to intuitively kind of punish that defection

by defecting itself in the next round. But it's very forgiving, it's going to go back

to cooperation after one punishment. In contrast, the trigger strategy is kind of

really a mean-spirited version of tit-for-tat. So, it starts out by

cooperating, but if its opponent ever defects, that's it. It's just going to

defect forever. So it's going to pull a trigger and say, I'm never going to forgive you. You've wronged me once. I'm going to wrong you back until the end of time. That's why we call it trigger. So you can see that, in both of these cases,

I've been able to describe to you how these strategies would respond to anything from the infinite history that they would see, in algorithmic terms, without actually writing out the strategy in a formal language. So you can see that it is possible to think coherently about interesting strategies in infinitely repeated games. Now, of course, what we would really like to do as game

theorists is describe the Nash equilibria of infinitely repeated games. The kind of

approach that we've taken in the past has been to first of all show how we can make an induced normal form of the game. We figure out what a strategy is in the game,

we show how to make the induced normal form. And then, we just appeal to Nash's

theorem, and say, because we've got a normal form game, we know that there is a

Nash equilibrium, and so, everything kind of goes through the way it did before.
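The two strategies described earlier can be sketched as short programs. This is just a sketch under my own encoding assumptions, not something from the video: actions are the strings "C" and "D", and a strategy is a function from the two histories seen so far to an action.

```python
def tit_for_tat(my_history, opp_history):
    """Cooperate in the first round; after that, copy the opponent's last action."""
    return "C" if not opp_history else opp_history[-1]

def grim_trigger(my_history, opp_history):
    """Cooperate until the opponent defects even once, then defect forever."""
    return "D" if "D" in opp_history else "C"

def play(strategy_1, strategy_2, rounds):
    """Run two strategies against each other for a finite prefix of the infinite game."""
    history_1, history_2 = [], []
    for _ in range(rounds):
        a1 = strategy_1(history_1, history_2)
        a2 = strategy_2(history_2, history_1)
        history_1.append(a1)
        history_2.append(a2)
    return history_1, history_2
```

For example, tit-for-tat against itself cooperates in every round, while grim trigger against an unconditional defector cooperates once and then defects for the rest of time.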

Well, that worked well for us in the past because we always ended up with finite-sized induced normal forms. But unfortunately, because we now have an infinite number of pure strategies, we're going to get an infinity-by-infinity matrix, even

in the two player case. And so, we're not going to have something to which Nash's

theorem applies anymore, because now we have an infinite-sized game and Nash's

theorem only works for finite games, which means games whose matrices are finite and

contain a finite number of numbers. And that means we don't have any reason, based on what we know so far, even to be sure that equilibria exist in these games. On the other hand, because there are an infinite number of strategies,

there might even be an infinite number of pure-strategy equilibria. So we've seen

the fact that in the past it's possible to have an infinite number of mixed strategy equilibria. For example, if in a normal form game I have two pure strategies that I'm indifferent between, then any mixture between them can also be a best response for me. So there's a sense in which I can have an infinite number of mixed strategy equilibria. But here, I can have

an infinite number of qualitatively different pure-strategy equilibria

potentially. That seems like a problem. So we're not going to be able to somehow

list off the equilibria, necessarily, of a repeated game. Interestingly, there is

still something that we can coherently do to give a satisfying answer about the Nash

equilibria of infinitely repeated games. And in this video, I'm going to tell you

what it is, and actually, because it's so satisfying, I'm even going to prove this

theorem to you, so you really will understand how this one works.

And, here is the idea of the theorem that we will eventually prove, we can

characterize what payoffs can be achieved under equilibrium. And, we can give an

example of equilibrium strategies that result in those payoffs, without giving

every equilibrium that results in those payoffs. So, we're going to kind of

characterize which strategies are in equilibrium in infinitely repeated

games, sort of via their payoffs. I'm going to be able to tell you precisely

which payoff vectors are achievable in equilibrium. So to do that, I need to give

you a little bit of notation so we can talk about these payoff vectors. So this

is kind of our slide of notation; once we get through this, we're going to have all

the building blocks we need to prove our theorem. So, we're going to start with

some n-player game; this is just our stage game, some normal form game. And

we're also going to start with some payoff vector here, and by this, what I mean is

the utilities that the players get in the game. Now, in this video I'm going to talk

about the average reward case, where each of these numbers actually encodes the

utility that I get on average by following my strategy in the infinitely repeated

game. And so, that's going to say, I care just in the same way about payoffs that I

get now and payoffs that I get a million iterations into the future. We can do something similar in the discounted reward case, but that's going to be the topic of a different video. For this video, we're going to talk only about the

average reward case and the reason is the proof is a little bit easier to think

about in the average reward case, even though the discounted reward case is maybe a bit more of a practical setting. We can get across the key ideas of both proofs with this proof, and this one's a bit easier. So let's do it. To do that, I need to remind you of the concept of the minmax value, which is something that we introduced in the context of zero-sum games, but it turns

out to be very important for repeated games. So I'm going to remind you of what

it is here. So, this is the mathematical definition of the minmax value — in symbols, player i's minmax value is the min over the other players' mixed-strategy profiles s_{-i} of the max over i's strategies s_i of u_i(s_i, s_{-i}) — but let me start by saying in words what it means, because the nested min and max operators are a little bit hard to think about. Essentially, i's minmax value is the amount of utility that player i is able to obtain for

himself. If all of the other players who we call minus i are completely unconcerned

about their own utility, and instead, they're just trying to hurt i as much as

possible. So they all play the strategy from their joint mixed strategy space such that, if i responds to it by doing the best thing that i can for himself, then

i's utility is as low as possible. So they're trying to minimize i's utility,

given that i is trying to maximize in response to their minimization and the

number that comes out at the end is the amount of utility that i can get by doing

as best he can against everybody trying to hurt him as much as possible. So

that's i's minmax value. So, intuitively, if I want to punish you as much as possible in a game, and you know that I'm trying to punish you, then the amount of utility that you get is your minmax value.
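As an aside, here is one way to compute a minmax value. For a two-player stage game where the punishing opponent has two actions, a brute-force grid search over the opponent's mixtures approximates it well; in general you would solve a small linear program instead. The matrix encoding is my assumption:

```python
def minmax_value(U, grid=1001):
    """Approximate player i's minmax value in a two-player game.

    U[a][b] is i's payoff when i plays action a and the opponent plays b;
    the opponent has two actions, so we grid-search mixtures (p, 1 - p).
    """
    best_punishment = float("inf")
    for k in range(grid):
        p = k / (grid - 1)  # probability the opponent plays their first action
        # i best-responds to the mixture, so take the max over i's actions
        best_response = max(p * row[0] + (1 - p) * row[1] for row in U)
        # the opponent picks the mixture that makes that best response smallest
        best_punishment = min(best_punishment, best_response)
    return best_punishment
```

With prisoner's dilemma payoffs 3, 0, 5, 1, the punishing opponent just defects and i best-responds by defecting, so the minmax value is 1; in matching pennies the opponent mixes fifty-fifty and the minmax value is 0.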

So I will say that a payoff profile, remember, a payoff profile is a payoff

amount for everybody, is enforceable, I'll give it the name enforceable if it's the

case that everybody's payoff in that payoff profile is at least their minmax

value. So if anybody is getting less than their minmax value in a given payoff

profile, I will say that that's an unenforceable payoff profile. And I will

say that a payoff profile is feasible, I'll use this technical term feasible, if the following condition is true. Intuitively, what I want to

say about feasibility is that it's possible to actually get that payoff

profile by combining together payoffs in, in the actual game that's being played.
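That "combining together" is literally a weighted average of the stage game's payoff cells. Here is a minimal sketch, where the encoding (dicts keyed by action profile) is my own assumption:

```python
def blended_payoffs(weights, payoffs):
    """Blend stage-game payoffs: weights maps each action profile to its alpha_a,
    and payoffs maps each action profile to the tuple of all players' utilities."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "alphas must sum to one"
    n_players = len(next(iter(payoffs.values())))
    r = [0.0] * n_players
    for profile, alpha in weights.items():
        for i, utility in enumerate(payoffs[profile]):
            r[i] += alpha * utility
    return r  # the blended payoff profile, one entry per player
```

In the example game coming up later in the video, putting weight one-half on each of the two cells where somebody gets 2 yields the payoff profile (1, 1), exactly as computed there.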

Notice that enforceability doesn't actually require that it's possible. You know, I could say the payoff of $1 million for me and a million for you in prisoner's dilemma is an enforceable payoff profile, because it's above both of our minmax values, but

there's no way we can actually each get a million in prisoner's dilemma, because the

numbers don't go that high. So feasibility is going to talk about what it's actually

possible to do. And the way we're going to say this is, restricting myself to rational numbers: there exist rational, non-negative values alpha_a, one for each action profile a, such that for every player i, I can express player i's element of the payoff vector as a sum over all action profiles a in the normal form game of alpha_a times i's utility for that a — that is, r_i = sum over a of alpha_a * u_i(a). And then, of course, there's one last part of the condition: these alphas have to sum to one across all of the action profiles. So let me kind

of explain to you what this means. So I have kind of a normal form game, and what I want to say is I have some weight on each cell in the game. These are the alpha_a's, and they all sum to one. And what I want to do is take my payoff

from this cell weighted by the alpha for that cell, my payoff for this next cell

weighted by my alpha for that cell, and sum all of those weighted utilities for me up, and then I get r_i. So what I want to say is there exist alphas that I

could use that would blend together the payoffs in the actual game in such a way

that I get r_i. And furthermore, that has to simultaneously be true for everybody

else, and in particular, it has to be true with the same alpha a's for everybody

else. So I can come up with some alphas that get me my r_i, but then I have to use those same alphas to get r_j for player j. I have to have the same weights on all of the cells here so that I get the same numbers for

everybody. So let's think about the following game, which will give you an example of how feasibility works. So, let's say I have a game that looks like this: a two-by-two game where one cell gives me 2 and the other player 0, the diagonally opposite cell gives the other player 2 and me 0, and the remaining cells give us each 0. In this game, I claim the payoff profile (1,1) is feasible, and the reason

is I can put a weight of 50% on this cell, a weight of 50% on this cell, and weights of zero on the other cells. And you can see my payoff in this game is 50% times 2, plus 0 times 0, plus 0 times 0, plus 50% times 0, which is 1. The other

players' payoff in this game is 50% times 0 plus 0 times 0 plus 0 times 0 plus 50%

times 2, which is also 1. So that is a feasible payoff in this game. On the other

hand, the payoff (2,2) is not feasible in this game, and the reason is, there's no

way that I can have weights on these cells that sum up to one and that give both of

us two, because for me to get a payoff of two, this would have to be one. That's the

only cell where I get a payoff at all. And for my opponent to get two, this one would

have to be one, and then they would sum up to more than one. This

condition over here would be violated, and so we just can't do it. So that's not a feasible payoff. Okay, now we're ready to prove the folk theorem. And it's kind of funny that it's called the folk theorem. It's kind of like a folk

song for mathematicians. So a folk song is a song that everybody knows, but nobody

really knows who wrote it first. And a folk theorem is a theorem that people all

kind of knew and talked about before they got around to writing it down and they're

not really sure quite where it came from. So, this is the folk theorem of game

theory, and despite having uncertain origins, it's, it's very important. So in

the folk theorem basically tells us what the Nash equilibria are of an infinitely

repeated game in terms of these payoffs and in terms of the definitions that I

just told you. So there are different folk theorems for different settings. This is

the folk theorem for infinitely repeated games with average rewards. So, the folk

theorem has two parts. Note that I've made a restriction here to rational alphas; it turns out we don't actually need that restriction, but the math gets more annoying if we have to deal with real values. So, to keep things simple, I'm doing this just for rational numbers. So, the folk theorem says that in

any n-player game, which we're going to repeat infinitely, and for any payoff