0:00

Okay, so I'm going to take you through a threshold theorem, for those people who

are interested in seeing a little more detail, exactly how these thresholds work.

And how some of these things are proven, just to give you a feel for

the type of proof that works in this setting.

And so the one that we're looking at is going to be

one of the more interesting theorems, coming out of Erdős and Rényi.

0:24

So what we're looking at is that a threshold function for

the connectedness of one of these

G(n,p) random networks is the point at which

the probability of forming a link is proportional to log(n)/n.

So this tells us that if p(n),

0:48

is much larger than this, so if p(n) is much larger than log(n)/n,

then we're going to have a connected network.

And if it's much smaller than this, then the network is not going to be connected,

and we're going to have different components in it.
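As a sanity check on the theorem, here is a small simulation sketch (not from the lecture; the helper names `gnp`, `is_connected`, and `frac_connected`, and the parameter values, are my own choices) comparing connectivity rates for p below and above log(n)/n:

```python
import math
import random

def gnp(n, p, rng):
    """Sample a G(n,p) random network as an adjacency list."""
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:  # each link forms independently with probability p
                adj[i].append(j)
                adj[j].append(i)
    return adj

def is_connected(adj):
    """Depth-first search from node 0; connected iff every node is reached."""
    seen = {0}
    stack = [0]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == len(adj)

def frac_connected(n, c, trials, rng):
    """Fraction of sampled graphs that are connected at p = c * log(n)/n."""
    p = c * math.log(n) / n
    return sum(is_connected(gnp(n, p, rng)) for _ in range(trials)) / trials

rng = random.Random(0)
below = frac_connected(200, 0.5, 50, rng)  # p well below the threshold
above = frac_connected(200, 2.0, 50, rng)  # p well above the threshold
print(below, above)  # below should be near 0, above near 1
```

Even at a modest n like 200, the two regimes separate sharply, which is the content of the theorem.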

1:08

Okay, so that's the theorem.

And what we're going to do is just sort of work through part of the proof which

basically carries the main intuitions of the proof.

And there's still some details that you have to work out, and

I'm not going to go through all of the details.

But this will give the essential bits of the proof.

And so one thing we can do is show that if p(n) compared to this threshold goes to 0,

so if that p is much smaller than this threshold,

then there are going to be isolated nodes with probability approaching 1.

So if there's isolated nodes, then certainly the network can't be connected.

Whereas if p(n) compared to t(n) goes to infinity,

then there can't be any components with less than half the nodes.

So all components have to have more than half the nodes, and basically,

that means they have to be one giant component.

1:59

So the idea here is going to be to show that if you're much smaller than this,

you're going to get some isolated nodes.

And if you're much bigger than this,

you're not even going to get any small, isolated components.

And what I'll do is really concentrate on showing that there

are, or aren't, isolated nodes at this threshold.

And showing that a small component can exist or

not exist is more or less just a small variation on this proof.

Okay, so first of all, just to remind you of some useful approximations of

the exponential function, in case your calculus,

or your pre-calculus, is a little bit rusty.

2:40

So, definitions of the exponential function: two forms

that'll be very useful in this course in terms of understanding properties.

One is this: if we look at e to the x,

we can also think of that as the limit

of a process where we take (1 + x/n) raised to the nth power.

So think of this as continuous compounding of interest.

It looks like e to the x.

So the limit, as n becomes quite large, of (1 + x/n) to the n equals e to the x.

Another is a Taylor series approximation for the exponential function.

We can write the exponential function as the sum over n of x to the n over n factorial.

So that's another way of getting that, and

you can go through your Taylor series approximation if you like.

But that'll be a useful approximation.
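To make both forms concrete, here's a quick numerical check (just an illustration, with arbitrary choices of x and n) that the compounding limit and the Taylor partial sums both converge to e to the x:

```python
import math

x = 2.0

# Compounding form: (1 + x/n)^n approaches e^x as n grows.
for n in (10, 1000, 100000):
    print(n, (1 + x / n) ** n)

# Taylor series: the partial sums of x^k / k! approach e^x.
partial = sum(x ** k / math.factorial(k) for k in range(30))
print(partial, math.exp(x))  # both close to e^2
```

The compounding form converges slowly (the error shrinks like 1/n), while the Taylor partial sums converge extremely fast.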

And here what we're going to do is approximate probabilities of these events

happening.

And then using these approximations, we'll be able to get things back in

terms of exponentials, and then logarithmic functions.

Okay, so what we're going to show now is that

right at the point where your expected number of connections looks like log(n).

So this is going to be the point where P is proportional to log(n)/n, right?

So the expected degree is just multiplying up by n minus 1, or roughly n.

This is the threshold above which we expect each node to have some links, and

below which we expect nodes to be isolated.

4:14

And in fact, this is the threshold above which we expect nodes to have many links.

And once each node has many links,

the chance of having disconnected components vanishes.

So basically the logic is, we're going to show this part of it,

and the rest is a fairly simple variation, okay?

So first thing we're going to do is we're going

to write the expected degree as some function r+log(n), okay?

So the expected degree is actually some r+log(n).

And what we're going to then do is now is do calculations for

the probability that some node is isolated.

Okay, so how is that going to depend on this expected degree?

So the probability that some link is not present.

So I look at a given node, and I ask what's the probability that it's isolated?

Well, it's the probability that it has no links.

What's the probability that it doesn't have one of its links?

Well, that's 1- p.

What's the probability that it has none of its links?

Well, that's the probability that none of them formed.

So (1-p) times (1-p) times (1-p) and so on, which is (1-p) raised to the n-1 power.

So this is the probability that some node is isolated, okay?

And so what we want to do is to then show that the probability, if p is small

enough, then you are going to have a high probability of these nodes being isolated.

And if p is large enough,

then you're not going to have any probability of this happening.

So the probability that you're isolated is (1-p) to the n-1 power.

6:25

And now we're going to use our approximation that we went through before.

I just told you the approximation for an exponential function.

So recall that, if we have some x and run n off to infinity,

(1 - x/n) to the n approximates e to the -x.

Okay, and that's true if x/n is small, so vanishing.

And in particular,

what we're going to need is r + log(n) to be small relative to n-1.

Now what I'm going to do is just assume that that's going to be true.

And to complete the proof, for people who are really interested:

if r + log(n) is large relative to n, then connectedness is a monotone property.

If it holds when it's small relative to n,

it's even going to hold more easily when it's large, right?

We're just adding more links.

So that part of the proof is quite easy.

So the case that's really going to be tough is when it's small relative to n.

7:25

And so what we're going to do now is look at this.

So now we've got an approximation.

We're approximating this by e to the minus, right?

So we've got e to the minus (r + log(n))

as an approximation for this.

So the probability that some node is isolated begins to look like this, right?

And now we've approximated that using our exponential function.

So it looks like e to the (-r - log(n)); bring the log(n) out,

and we end up with the probability that some node is isolated

looking like e to the -r, over n.

Well, remember, r was how much our p, or

expected degree, differed from log(n).

Okay, so now we've got a simple formula for

the probability that some node is isolated for large n.
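As a quick numerical illustration (my addition, with arbitrary values of n and r), we can check how close the exact isolation probability (1-p)^(n-1) is to the approximation e to the -r over n:

```python
import math

n, r = 100000, 1.0
p = (r + math.log(n)) / (n - 1)   # chosen so the expected degree is r + log(n)
exact = (1 - p) ** (n - 1)        # exact probability a given node is isolated
approx = math.exp(-r) / n         # the large-n approximation from the lecture
print(exact, approx)              # the two agree to within a fraction of a percent
```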

8:24

So if each one is isolated with a probability e to the -r over n,

then the expected number of isolated nodes is just n times this,

since nodes are identical in this world.

And so the n's cancel out.

And we get the expected number of isolated nodes as e to the minus r.
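We can also check e to the minus r against simulation; here is a sketch (the parameter values and the `count_isolated` helper are my own, not from the lecture) that averages the count of isolated nodes over many draws of G(n,p):

```python
import math
import random

def count_isolated(n, p, rng):
    """Count degree-zero nodes in one draw of G(n,p)."""
    degree = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                degree[i] += 1
                degree[j] += 1
    return sum(1 for d in degree if d == 0)

rng = random.Random(1)
n, r, trials = 300, 1.0, 200
p = (r + math.log(n)) / (n - 1)   # expected degree is r + log(n)
avg = sum(count_isolated(n, p, rng) for _ in range(trials)) / trials
print(avg, math.exp(-r))  # simulated average vs. the e^(-r) prediction
```

Even at n = 300 the simulated average is already in the neighborhood of e to the minus r, and the agreement improves as n grows.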

9:02

So if r tends to minus infinity, then that tells us that the expected number of isolated nodes becomes infinite, right?

So basically, what we're getting here is a function of r.

If it's either getting very large and positive,

meaning that the threshold is exceeded by p, or if it's getting very negative,

So that p is much smaller than the threshold, then we're either seeing

the expected number of isolated nodes going to 0, or going to infinity, okay?

And so basically, the expected number of

isolated nodes goes to 0 if r(n) tends to infinity,

and to infinity if r(n) tends to minus infinity.

So the expected number of isolated nodes tends to 0.

If that's tending to 0,

then the probability of having one has to tend to 0, right?

You can't have the expected number tend to 0 and

have the probability of having one stay positive,

so significantly higher than 0.

So the probability of having 1 is going to have to tend to 0.

And so this tells us now that basically,

if we're sitting in a situation where r(n) goes to infinity,

then the expected number of isolated nodes tends to 0,

and the probability of having one tends to 0.

So basically, if p is much bigger than this t(n)

being log(n) over n, then we've got a situation

where the expected number of isolated nodes goes to 0.

And the probability of having an isolated node goes to 0.

Vice versa, if it's going to infinity, then with an extra step,

playing with the variance,

you can show that you can't get the expected number going to infinity without

having the probability of having at least one go to 1.

So it's impossible to have the expected number go to infinity,

and not have the probability of actually having one go to 1.

So that's an extra step that I'm not going to go through, but

it's a fairly easy one to do in terms of Chebyshev's inequality.
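The Chebyshev step alluded to here can be sketched as follows (my reconstruction, not worked out in the lecture): with X_n the number of isolated nodes,

```latex
P(X_n = 0) \;\le\; P\big(\,|X_n - E[X_n]| \ge E[X_n]\,\big)
          \;\le\; \frac{\operatorname{Var}(X_n)}{E[X_n]^2}.
```

So if one verifies that Var(X_n)/E[X_n]^2 tends to 0 while E[X_n] tends to infinity, then the probability of having no isolated node tends to 0, which is exactly the probability of having at least one tending to 1.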

11:17

Okay, so this was just one example.

I don't expect you to digest all of this on a first pass.

You can go through the book in much more detail if you want to see

one of these proofs worked out in full detail.

But what I wanted to communicate to you through this derivation is

just the type of reasoning it goes through.

So what you do is, you want a certain probability; you can often get that

probability in terms of approximations based on exponential functions.

That approximation, we know, becomes fairly accurate for large n,

and that gives us bounds on the probability.

So we can say that the probability of something is either it has to be tending

to 1, or it has to be tending to 0,

based on what these functions have to be converging to as n becomes very large.

And those threshold functions

come from identifying where it is that these expressions either go to 1 or to 0.

And whether we're talking about isolated nodes, or we're talking about the first

link appearing, etc, it's going to be a different threshold.

And so the proofs tend to have the same kind of structure to them.

They're just going to differ in the particular expressions,

depending on whether we're talking about one property and its threshold,

or a different property and a different threshold.

But this type of analysis is what underlies a lot of random graph theory,

and a lot of the proofs behind the scenes in terms of establishing properties

of random graphs in large settings.