0:00

And welcome back, folks. So now we're going to talk a little bit about learning. In terms of our course, we've done a bunch on background and fundamentals, and we looked at different models of network formation. Now we've moved toward trying to understand how network structure impacts different kinds of behaviors. We talked a bit about diffusion, and now I want to focus in a little bit on learning.

In terms of learning, we're going to look at two different models: we'll look briefly at Bayesian learning, and then we'll look at what's known as the DeGroot model. There's a whole variety of models out there these days and different ways of modeling learning, depending on who observes whom and what the network structure looks like, and there are hybrid models out there. What we're going to do is just look at these two to get some flavor of these things. The DeGroot model turns out to be a very useful one, and the Bayesian one has interesting insights and interesting questions associated with it.

So we're going to start with Bayesian learning, and we'll talk about repeated actions where people get to observe what each other are doing. So I'm deciding over time what I'm doing, and I can see what my neighbors are doing, and there's going to be interaction between us in terms of what I learn from what my neighbors experience. We'll look at that a bit, and then we'll move to the DeGroot model. The DeGroot model is going to be one with repeated communication, where people can keep talking to each other, but with a very naive way of updating. What I'm going to do is essentially just keep taking weighted averages of the information I get from my friends, and I'll form opinions by continuing to average things, even though I might end up over-weighting or under-weighting what I hear. So it'll be a more naive model, where I'm not fully rational in a Bayesian sense.

Â 1:56

So in Bayesian learning, people are probabilistically sophisticated: you take information into account, you update a posterior using Bayes' rule, and then you maximize some payoff based on that. The DeGroot model is going to be much more naive, and actually easier to work with in many ways. There's some experimental work these days, which you can find in some of the references, comparing Bayesian models, DeGroot models, and other models. It finds that humans are somewhat rational in what they're doing, but they have limits to their rationality and don't look like they're necessarily full Bayesians, and some of these naive, alternative models can be better at actually capturing human behavior.

Okay, so let's start with the Bayesian model as a very useful benchmark and an important point to consider. The idea here is, first of all, we can ask: will society converge? Will it be that eventually everybody converges to doing the same thing, or to having the same beliefs? Will people learn and aggregate information properly? So imagine that a new technology comes out, and we're not sure whether it's a good technology or a bad one. Some people start playing with it and using it, and other people can see whether those others are enjoying it. Eventually, if it's a good technology, and a better one than the old one, will it take over, or will it not? Will people not necessarily learn, and under what conditions might that happen? So we can ask questions about whether or not people are going to accurately aggregate information.

I'll start with a model by Bala and Goyal from 1998. It's a very simple setting, a very natural one to analyze. There's a number of people in some network, and we'll take this network to be a single component, so all the people are going to be path-wise connected to each other.

Â 4:57

Whereas B is uncertain: it pays two with probability p and zero with probability one minus p. Okay? And let's suppose, to make things simple, that people don't mind risk, so basically they just care about the expected value. They know they can get one from choosing action A, and action B is either going to pay off two, with probability p, or zero, with probability one minus p. So basically, B is better if p is bigger than a half: then its expected payoff, two p, is bigger than one, and they should choose B. But if p is less than a half, then they should choose A. All right, so a very simple setting. B's better if p's bigger than a half; A's better if p's less than a half.
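That risk-neutral comparison can be sketched in a couple of lines; the function names here are my own, just for illustration:

```python
def expected_payoff(action: str, p: float) -> float:
    """Expected per-period payoff: A pays 1 for sure, B pays 2 w.p. p, else 0."""
    if action == "A":
        return 1.0
    return 2.0 * p  # 2*p + 0*(1-p)

def best_action(p: float) -> str:
    """An agent who knew p would simply pick the higher expected payoff."""
    return "B" if expected_payoff("B", p) > expected_payoff("A", p) else "A"
```

So `best_action(0.6)` returns `"B"` and `best_action(0.4)` returns `"A"`, matching the cutoff at a half.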

But we don't know what p is. It's a new technology, and we're uncertain. Maybe we have some prior information, maybe we have some guess at what p is, but we don't know for sure. So these individuals have to experiment a little bit with B to find out whether p is good or bad. Okay?

So they're going to be choosing actions over time, and the learning model is going to be as follows. Each period, a person makes a choice between A and B, and each period you get a payoff, okay? If I choose A, I get a payoff of one, for sure. If I choose B, I'm going to get a payoff of two with probability p, and zero with probability one minus p. So I try this new technology. Maybe I'm a farmer: I try a new thing, and I either get the higher payoff or the lower payoff, with probability p or one minus p.

Each period I'm going to do that, and what does the network do? The network is such that I also get to see my neighbors' choices and their outcomes. So suppose I'm a given individual with some friends: I chose A and got a payoff of one; this individual chose B and got a payoff of zero; this person chose A and got a payoff of one; and this person over here chose B and got a payoff of two. So what I learn is: I chose A, they chose A, and I get to see that this person chose B and got a payoff of zero, and this person chose B and got a payoff of two. Every day I'm going to get all this information, and I'm going to store it over time. And over time, I'll begin to learn. If I begin to see lots of people choosing B and lots of people getting twos, I'm going to think, well, it's probably a good thing; p is probably pretty high. If I see lots of people choosing B and lots of people getting zeros, then I'll downgrade my belief on p, and I would be more likely to pick A.

What people are going to do in this setting is maximize their overall stream of expected payoffs, right? So I'm going to get a dollar today if I choose A, and some random amount if I choose B, and every day I'm making this choice. I have an expectation of what this thing looks like, conditional on what I've seen up to a point in time. So there's a payoff, pi sub i at time t, which is the payoff I get from following a certain strategy of choosing As and Bs. And there'll be some discount factor delta, less than one and greater than zero, and I'm going to maximize the expected discounted sum of payoffs: the sum over t of delta to the t times pi sub i of t.

And let's suppose that p is unknown, so I don't know it initially, and it takes on some finite set of values. Maybe it could be 0.1, 0.2, 0.3, and so on. So it has some finite set of values, and I'm trying to guess, among those, whether B is a good thing.
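Updating a belief over a finite set of candidate values for p is just Bayes' rule applied to the counts of twos and zeros seen so far. Here's a minimal sketch, assuming a uniform prior over an illustrative grid of values (the grid and function names are mine, not from the lecture):

```python
def posterior_over_p(successes, failures, grid=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Posterior over candidate values of p after seeing B pay off two
    `successes` times and zero `failures` times, from a uniform prior."""
    weights = [p**successes * (1 - p)**failures for p in grid]
    total = sum(weights)
    return {p: w / total for p, w in zip(grid, weights)}

def posterior_mean(successes, failures, grid=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Expected value of p under the posterior: the belief that drives choices."""
    post = posterior_over_p(successes, failures, grid)
    return sum(p * w for p, w in post.items())
```

Seeing ten twos and no zeros pushes `posterior_mean` well above a half; ten zeros and no twos pushes it well below.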

Okay, that's the structure. So now let's talk about some of the difficulties with this. What are the real challenges in Bayesian learning? First of all, let's think of the following. Suppose that I know what the network looks like. I'm person one, here's person two, and person two is connected to some other individuals, say three and four.

Â 9:35

But it also tells me things about what they might be seeing from other individuals, right? Over time, they've been seeing what three and four are doing. I don't get to see what three and four did, but I know that two saw three and four. And I know that three is influenced by five, and four is influenced by six, and so forth. So there's some network out there of a bunch of individuals.

Let's suppose, for instance (this is just an example), that I've been choosing A for a while, and I see that this person has been choosing B for a while. I see person two choose B and get a payoff of two. I see them choose B and get a payoff of two again. B, payoff of two, so I'm thinking, wow, this is really great. So then I switch to B, and I keep seeing them get twos. And then suddenly, I see them switch to A. What would that tell me? Well, now I have to think: why would they have switched to A? It can't be because of their own experience, since their own experience has been pretty good. It must be that they saw some bad experiences somewhere else, right? So now I have to think: what are all the possible experiences they could have had? It could be that they saw three getting bad payoffs, or four getting bad payoffs. Or maybe they saw three switch from B to A, or saw both three and four switch.

So in order to think about this problem, and it's a very complicated problem, I have to think about all the scenarios that could be considered in terms of all the histories of As and Bs, what everybody is seeing, how that impacts each person's decision, and what they should do in response to that, 'kay? So the updating question here is actually fairly complicated: I can make all kinds of indirect inferences just based on what somebody's strategy is. Okay, that's one challenge. What's a second challenge here?

Â 11:38

A second challenge is that there also could be some interaction. Let's suppose I start with a belief that p is less than a half. Even if I were alone in the world, and even if I believed that p was less than a half, I know I'm going to be making choices for a long time. It's still worthwhile for me to try B a few times, just to experiment and see whether, in fact, maybe I'm wrong and p is higher than a half. So even if I start out with a pessimistic prior, it could be that I want to experiment, to try B a few times just to see what happens. Once I've learned, that could be very valuable information, because if it pays off two a bunch of times in a row, I'm going to want to take B, and that's going to give me payoffs for the rest of my life. So trying something out can be very worthwhile. Being fully rational, even if I start believing p is less than a half, as long as I'm making choices over time, there's an option value to trying this thing, and that option value is going to be positive. So I might want to experiment for a while and see what happens. Okay, so now there's experimentation that comes into play as well.
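To make the option value concrete, here's a stylized two-stage example with numbers of my own choosing (not from the lecture): the prior puts probability 0.4 on p = 0.7 and 0.6 on p = 0.3, so the prior mean of p is 0.46 and a purely myopic agent would never touch B. The agent tries B once, observes the outcome, and then commits forever to whichever action looks better under the posterior:

```python
def value_always_A(delta: float) -> float:
    # Payoff of 1 every period, discounted at delta.
    return 1.0 / (1.0 - delta)

def value_try_B_once(delta: float, q: float = 0.4,
                     p_hi: float = 0.7, p_lo: float = 0.3) -> float:
    """Play B in period 0, observe the draw, then commit forever to the
    action with the higher posterior expected payoff.
    Prior: p = p_hi with probability q, p = p_lo otherwise."""
    m = q * p_hi + (1 - q) * p_lo  # prior probability that B pays 2
    # Posterior mean of p after a success / a failure (Bayes' rule).
    mean_succ = (q * p_hi * p_hi + (1 - q) * p_lo * p_lo) / m
    mean_fail = (q * p_hi * (1 - p_hi) + (1 - q) * p_lo * (1 - p_lo)) / (1 - m)
    # Continuation payoff per period: best of A (=1) and B (=2 * posterior mean).
    cont = m * max(1.0, 2.0 * mean_succ) + (1 - m) * max(1.0, 2.0 * mean_fail)
    return 2.0 * m + delta / (1.0 - delta) * cont
```

With these numbers, a patient agent (delta = 0.9) gets about 10.28 from experimenting versus 10.0 from always playing A, so the experiment is worth it; an impatient agent (delta = 0.5) prefers to stick with A.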

Â 12:54

Okay, now let's suppose there are two of us, person one and person two. I'd like the other person to experiment. If I think p's less than a half, why don't I let them try it, while I sit by and just choose A? If they experiment and play with B for a while and it pays off well, then I can switch to B, but I don't have to pay the cost of the experimentation. I want a free ride, 'kay? Now that becomes a game, which is actually going to have a fairly complicated equilibrium, especially when you start putting that game in a network with all kinds of players. We have players connected to other players, and now we look at this simultaneous decision: who's going to choose B in this period? Who chose it in the last period? What are our beliefs? What do I think everybody else's beliefs are, and so forth. So when you look at this game overall, it becomes very complicated, both because of the strategic aspects and because of the Bayesian inference. And now you can begin to see why it might be that, in fact, when we put humans in the laboratory and ask them to play games, or to make these kinds of choices, they might not behave in a fully Bayesian manner. It's just complicated to do; it's hard to even write the model down and solve it.

Okay, so let me just say a little bit about how this is solved in the Bala and Goyal approach. What they did is assume that players are not going to be strategic about this: each person just chooses things that maximize their own payoff and doesn't worry about the gaming aspect of it. And secondly, I'm not going to infer things from the fact that other people are making different kinds of choices. I'm just going to keep track of what I've seen in terms of my histories of As and Bs: whatever I've seen through myself and my neighbors. I'll keep track of the relevant payoffs and, most importantly, how many times I've seen B pay off two and how many times I've seen it pay off zero. Then I can update what I think p is, just based on those observations. I'll ignore everything else; I won't do the complicated updating, and I won't game things. I'm just going to look at the twos and zeros and decide whether or not I want to switch from A to B or B to A. Okay, so let's look at that.
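That non-strategic rule, counting the twos and zeros you and your neighbors have seen, Bayes-updating over a finite set of candidate values for p, and then picking the myopically better action, might be sketched like this (the helper names and the grid of values are illustrative choices of mine):

```python
def tally_B_outcomes(observations):
    """observations: list of (action, payoff) pairs seen this period,
    own play plus each neighbor's. Only B's outcomes are informative."""
    succ = sum(1 for a, pay in observations if a == "B" and pay == 2)
    fail = sum(1 for a, pay in observations if a == "B" and pay == 0)
    return succ, fail

def choose_action(successes, failures, grid=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Pick the myopically best action given cumulative counts of B paying
    two vs. zero, Bayes-updating a uniform prior over `grid`."""
    weights = [p**successes * (1 - p)**failures for p in grid]
    mean_p = sum(p * w for p, w in zip(grid, weights)) / sum(weights)
    return "B" if 2.0 * mean_p > 1.0 else "A"
```

For example, the two-friends observation from earlier, [("A", 1), ("B", 0), ("A", 1), ("B", 2)], tallies to one success and one failure for B; after mostly twos the rule switches to B, and after mostly zeros it switches to A.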

Okay, so what's a proposition you can prove fairly directly? The first thing you can show is: suppose p is not exactly a half, where I'd be indifferent between the actions. Then with probability one, there's a time, a random time, such that all agents in a given component play just one action, and play the same action, from that time onward. So basically what's happening is that as long as p is not exactly a half, where we would be sort of indifferent, we're basically going to eventually all end up choosing the same action. We'll just lock in on some action at some time and play it forever after. So at some time we'll all eventually converge and play the same action forever.

Okay, so that's the nature of the proposition. Let's talk through the intuition and the basic proof behind this: why is it true? I'm just going to sketch out the proof; it's fairly easy to fill in the details. So let's suppose this weren't true. If it wasn't true, then somebody has to be switching back and forth infinitely many times; otherwise we'd eventually converge. So somebody's got to be going back and forth between A and B infinitely often. In particular (let's suppose we just have one component; this works regardless of which component you're looking at), somebody has to be playing B infinitely often. If we don't converge, somebody's got to be playing B infinitely often, okay.

Now we can use the law of large numbers. The law of large numbers tells us that if somebody plays B infinitely many times, then they're going to get an arbitrarily accurate estimate of what p is. So with probability going to one over time, their belief will converge to p. And what does that mean? Well, in order for them to keep playing B, if their belief about p is becoming arbitrarily accurate, then it must be that their belief is converging to something bigger than a half; otherwise they would stop, right? Over time, they're good Bayesians, and they know how accurate their belief is. Their belief would converge to either above a half or below a half, because p is not allowed to be exactly a half under our assumption. If it's above a half, then they'll keep playing B. If it's below a half, then eventually they would stop playing it, because now they'd be arbitrarily accurately convinced that it's not good. If it's not good, they should learn that and stop playing it; if it is good, they'll keep playing it. So if they do play B infinitely often, it's got to be the case that they're converging to the good belief; otherwise they would have stopped. Okay?

So this means that they have to be converging to the true belief, and with probability one the true p has to be bigger than a half. Then everybody who sees this person is going to see this sequence played. They're also going to see B played infinitely often, so they're also going to converge to the belief that p is bigger than a half, and they should all start playing B, right? If this person is learning that B is good, then their neighbors are all going to have to converge: these people are all going to see B infinitely often and converge to p. Their neighbors are going to have to converge, and so forth. So the neighbors of the agent must play B, and then all agents must play B.

Â 19:13

So it just has to spread out, okay? If anybody's going to play B infinitely often, then it's got to be that it's a good thing, and you learn. If not, then we've stopped, and everybody has to have ended up playing A. So either somebody plays B infinitely often, in which case we converge to B, or they don't, in which case everybody converges to A. That gives us a proof that we're going to get convergence, and we're going to converge to either all playing B or all playing A.

Well, does that mean that we always converge to the right action? That if p is really bigger than a half we converge to B, and if p is really smaller than a half we converge to A? Well, let's suppose that p is really bigger than a half. If p is bigger than a half, then B is the right thing to do; we should really be playing B. So we will play the right thing if we actually converge to that. But it's possible that we might not converge to that. And how could that happen? That could happen if we all start pessimistically enough, and we just happen to get some bad draws on B initially. So it's possible that everybody gets some bad draws on B, stops playing B, and then we never learn after that, right? So even when p is bigger than a half, it's possible for us to converge to A. On the contrary, if A is the right action, then we've got to converge to the right action, because that means p is less than a half.
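Here's a small simulation sketch of that lock-in, entirely my own construction: agents on a ring, each following the naive counting rule above. To keep things deterministic, the payoff every B-player draws each period is fed in by hand rather than drawn randomly, and everyone starts by experimenting with B.

```python
def simulate(n_agents, b_payoffs, periods, grid=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Agents on a ring; each period everyone plays the myopically best action
    given the cumulative twos/zeros of B they've seen (own play plus both
    neighbors'), starting from a uniform prior over `grid`.
    b_payoffs[t] is the payoff (2 or 0) every B-player draws in period t;
    the last entry is reused once the list runs out."""
    succ = [0] * n_agents
    fail = [0] * n_agents
    actions = ["B"] * n_agents  # everyone experiments at first
    for t in range(periods):
        pay = b_payoffs[min(t, len(b_payoffs) - 1)]
        for i in range(n_agents):
            if actions[i] == "B":
                # i's outcome is observed by i and both ring neighbors
                for j in (i - 1, i, i + 1):
                    k = j % n_agents
                    if pay == 2:
                        succ[k] += 1
                    else:
                        fail[k] += 1
        for i in range(n_agents):
            w = [p**succ[i] * (1 - p)**fail[i] for p in grid]
            mean_p = sum(p * wi for p, wi in zip(grid, w)) / sum(w)
            actions[i] = "B" if 2.0 * mean_p > 1.0 else "A"
    return actions
```

If the early B draws all come up zero, everyone abandons B after one period, no new information about B ever arrives, and the society stays locked on A forever, exactly the failure mode described above; if the draws come up two, everyone sticks with B.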

Â 21:05

So if B is the right thing to do, we'll eventually converge, but we could all stop playing B too soon, and we might end up converging to A instead. Okay, so we will all converge to doing the same thing in this model, but whether it's the right thing or not depends on whether B is the right thing or A is. If A is the right thing, we'll definitely converge to the right answer. If B is the right thing, we might or might not, depending on what our prior distribution is and whether we get good luck in the initial draws.

You could enrich this model so that you have different priors for different individuals. You can actually start specifying different people's prior beliefs. And then the probability of converging to the correct action (converging to B, for instance, when it's the right thing to do) can be made arbitrarily high. Basically, if we add many actions, as long as each action has somebody who initially has a very high prior that that action is the best one, then we'll get enough experimentation in these different actions, so that we'll learn about them. Society can learn arbitrarily accurately, as long as there's somebody who's really willing to try out every technology. The case where we might fail to learn is a situation where nobody's convinced enough to begin with to give an action a long enough try; then we might end up not learning about it.

Okay, conclusions. Where did we end up in this model? We all end up choosing the same actions, so we reach a consensus. That doesn't necessarily mean that we all end up with the same beliefs, because we're going to have different observations. It might be that our probabilities differ on whether B was good or not; we might end up with different beliefs, but, say, all be pessimistic enough to stop.

You can do speed-of-convergence kinds of results here. You could go through and, depending on whether B is good or bad, do computations. There you have to actually explicitly solve for the optimal actions as a function of what the histories look like. And there's a number of theorems in studies of two-armed bandits and other parts of probability theory where you can get rates of convergence on these things. The law of large numbers, especially in this kind of Bernoulli world, has well-established speeds of convergence, so you can calculate those kinds of things.
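For Bernoulli draws, standard concentration bounds make those rates concrete. For instance, Hoeffding's inequality gives P(|p_hat - p| >= eps) <= 2 * exp(-2 * n * eps^2) after n draws, so you can back out how many observations of B are needed for a given accuracy. A small sketch (these helper names are mine):

```python
import math

def hoeffding_bound(n: int, eps: float) -> float:
    """Upper bound on P(|empirical frequency - p| >= eps) after n draws."""
    return 2.0 * math.exp(-2.0 * n * eps**2)

def draws_needed(eps: float, delta: float) -> int:
    """Smallest n guaranteeing the bound above is at most delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps**2))
```

For example, pinning p down to within 0.1 with 95% confidence needs 185 observations under this bound.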

And these things will happen relatively quickly, in terms of the number of observations giving good information here.

Okay, limitations. There's a number of limitations in this kind of model. One is that everybody was getting the same payoffs from A or B. When you think about new technologies in the real world, new technologies might be right for some people and not right for other people. When you start putting in that heterogeneity, it's much harder for me to necessarily infer: maybe my neighbor is getting a good payoff from this, but will that be the same payoff I get? That complicates things, and that heterogeneity means the learning might take a very different form than it did in this model.

Here we've got repeated actions over time, so everybody keeps taking all these actions and trying all these different things. There are a lot of settings in which we're not trying things repeatedly over time; we're just learning about them slowly over time. With things like global warming, we're going to get one go at it. It's not as if we can just experiment infinitely often with different things, and if we get it wrong, we've got it wrong. So repeated actions over time, with feedback, is a situation where we get lots of incoming information over time; in other settings, information may arrive in slower clumps or different bits.

Here, this is a very stationary environment. It could be that the environment changes, which makes things even more complicated. And finally, and probably most importantly, in this model we were not really able to take the network into account. The network really didn't play any role. All the arguments were just that somebody would eventually learn, and the neighbors have to learn, and the neighbors of the neighbors have to learn, and so forth, so it's a simple induction argument. And we weren't able to say anything about what was going on in one network versus another. Now, you could do simulations and see whether speeds are faster in one versus another, but we can go to other models to try to get a better feeling for exactly how the network matters.

And that's what we'll do next when we move to the DeGroot model. That'll bring in the network very explicitly and allow us to do a lot of calculations quite easily. So that'll be our next subject: we'll start looking at the DeGroot model, where network structure is going to play a much more prominent role in the learning process.
