[MUSIC].

Last video, we finished up an expression for the multivariate chain rule.

In this video, we're going to start by picking up

one more little detail which you might have already spotted.

And then follow this up by adding another link to our chain.

If we have a function, f(x), where x is a vector,

in which each term in x depends on t, which we can compactly write like this.

We also have a compact form of the multivariate chain rule to go with it.

What I hope you might have noticed last time is that our vector of partial

derivatives, df by dx, is just the same as a Jacobian vector,

which we saw last module.

Except that we wrote it as a column instead of a row vector.

So from our knowledge of linear algebra,

we can say that df by dx must be the transpose of the Jacobian of f.

The last thing to realize is that taking the dot product of two column vectors

is the same operation as multiplying a row vector by a column vector.

So finally, we can see that our old friend, the Jacobian, offers us perhaps

the most convenient representation of the multivariate chain rule.
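
As a rough written version of what was just said (the index n and the component names x_1 ... x_n are assumed notation, not taken from the video), the two forms of the chain rule line up like this:

```latex
\frac{\partial f}{\partial \mathbf{x}}
  = \begin{bmatrix}
      \partial f / \partial x_1 \\ \vdots \\ \partial f / \partial x_n
    \end{bmatrix}
  = (J_f)^{T},
\qquad
\frac{df}{dt}
  = \frac{\partial f}{\partial \mathbf{x}} \cdot \frac{d\mathbf{x}}{dt}
  = J_f \, \frac{d\mathbf{x}}{dt}
```

Here J_f is the row vector of partial derivatives, so the last product is a row vector times a column vector and gives a scalar.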

Next, we're going to see that the chain rule still works for more than two links.

To start us off, we'll work through a really quick univariate example

where we're going to add in another function separating f from t.

So we can say f(x) = 5x.

And we can have x(u) = 1 - u.

And u(t) = t squared.

So we've got three functions, and we're separating f from t now by an extra step.

Of course, we can just sub in each step into each other and

find an expression for f as a function directly of t where we say, well,

it's going to be 5 times whatever x is, and x is 1 - whatever u is.

And u is t squared, so

this thing goes to 5 - 5t squared.

And of course, we can now directly differentiate

this thing, and say df by dt = -10t.

Or we can apply a two-step chain rule and say the following.

So we can have df by dt =

df by dx times dx by du and

finally du by dt, okay?

Now subbing in for each of our terms, we just get,

well, df by dx is just 5.

We're going to multiply that by dx by du, which is just -1.

And finally, du by dt is just going to be 2t.

So once again, we're going to recover the same answer,

which is -10t.
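
To check this by hand, here is a minimal Python sketch of the same three functions. The helper names (x_of_u, u_of_t, df_dt_chain, df_dt_numeric) and the finite-difference check are illustrative assumptions, not something from the video.

```python
# Three-link univariate chain from the example:
# f(x) = 5x, x(u) = 1 - u, u(t) = t^2.

def f(x):
    return 5 * x

def x_of_u(u):
    return 1 - u

def u_of_t(t):
    return t ** 2

def df_dt_chain(t):
    """df/dt as the product df/dx * dx/du * du/dt."""
    df_dx = 5        # derivative of 5x with respect to x
    dx_du = -1       # derivative of 1 - u with respect to u
    du_dt = 2 * t    # derivative of t^2 with respect to t
    return df_dx * dx_du * du_dt

def df_dt_numeric(t, h=1e-6):
    """Central finite-difference estimate of df/dt (hypothetical helper, for checking only)."""
    def composed(s):
        return f(x_of_u(u_of_t(s)))
    return (composed(t + h) - composed(t - h)) / (2 * h)

t = 3.0
print(df_dt_chain(t))    # chain-rule value, equal to -10 * t
print(df_dt_numeric(t))  # numerical estimate, should be close to -10 * t
```

Both numbers should agree with -10t at whatever value of t we pick, up to a small numerical error in the second one.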

So we can see that this approach works for chains of univariate functions.

And we could extend it out to as many intermediary functions between f and

t as we would like.

But what about the multivariate case?

Well, the chain rule does work here, too, but

we do just have to pay attention to a few extra details.

Let's start by considering the function f(x(u(t))),

again, where the function f takes the vector x as an input, but

this time x is a vector valued function, which also takes a vector u as its input.

As in the last video, the bold symbols indicate vectors.

So u is again a vector valued function, and it takes the scalar t as its input.

Ultimately, we are still relating the scalar input t to a scalar output f.

But we're doing this via two intermediary vector valued functions, x and u.

If, once again, we'd like to know the derivative of f with respect to t,

we can essentially write the same expression as the univariate case,

except that now some of our terms are in bold.
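
Written out, that expression would look something like the following, with the vector quantities in bold. The exact on-screen layout is not captured in the transcript, so treat the notation as an assumption:

```latex
\frac{df}{dt}
  = \frac{\partial f}{\partial \mathbf{x}}
    \, \frac{\partial \mathbf{x}}{\partial \mathbf{u}}
    \, \frac{d\mathbf{u}}{dt}
```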

We've already seen that differentiating the scalar valued function f,

with respect to its input vector x, gives us the Jacobian row vector.

We've also seen that differentiating a vector valued function

u with respect to the scalar variable t gives us a column vector of derivatives.

But what about the middle term dx by du?