Welcome back. In the last session,

we saw how to reshape a torch Tensor.

We saw that calling the view method reshapes our torch Tensor.

Today, we're going to discuss a very important topic.

This is the topic of Computational Graphs in PyTorch.

As we know, in other deep learning frameworks,

like Keras or TensorFlow, we also deal with computational graphs.

But in these frameworks,

the computational graph is fixed,

so you create a model,

and in Keras, for example,

if it is a neural network, you define the different layers, the optimizer,

you define the loss function,

number of epochs, the batch size,

and then you call compile.

When you call compile,

the runtime creates a computational graph of

this model, and this computational graph is fixed.

You cannot change it.

If you call model.fit, it will execute the computational graph,

and during runtime you cannot change it anymore, it's fixed.

PyTorch does something completely different.

The creators of PyTorch decided that they

need the flexibility to change the computational graph at runtime.

How did they accomplish this?

They created a component called autograd.Variable.

This is the main building block of the computational graph in PyTorch. How does it work?

Let us see, let us execute the next cell.

So, first of all, we are creating an autograd.Variable,

and we are passing two parameters.

The first parameter is the data of this autograd.Variable.

This is a torch Tensor of size three.

Then we can pass the second argument, which is requires_grad,

meaning "requires gradient", true or false.

It means the following: if we say requires_grad is true,

we are saying that this autograd.Variable should track how it was created.

This is very important.

So, we can print out here the data

which is inside of this autograd.Variable.

This is simply the tensor of size three: one, two, three.

Then we are creating the next autograd.Variable,

which is y. For y it is the same.

We are passing the data,

this is the first parameter,

then we are passing the second parameter,

requires_grad, true or false, and we're passing true.

And then we are adding up x and y,

and here we are printing the data of z,

which is the sum of x and y.

So, this is the sum,

we have just summed up two vectors element-wise.

But what is very interesting here is that we can also see how z was created.

If we write z.grad_fn, the gradient function,

and print it out,

we see that z was created by an add operation.
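
The cell described above might look roughly like the following. This is only a sketch under some assumptions: it uses a recent PyTorch release, where a tensor created with requires_grad=True takes over the role of autograd.Variable, and the values of y are inferred from the sum five, seven, nine that is reported for z in this session.

```python
import torch

# Assumption: recent PyTorch, where a tensor with requires_grad=True
# plays the role of autograd.Variable from the session.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
# Assumption: y is inferred from the reported sum 5, 7, 9.
y = torch.tensor([4.0, 5.0, 6.0], requires_grad=True)

z = x + y

print(z.data)     # the plain values stored in z
print(z.grad_fn)  # an AddBackward node: z was created by an addition
print(x.grad_fn)  # None: x was created by the user, not by an operation
```

Note that only results of operations carry a grad_fn. The inputs x and y have none, because nothing in the graph produced them.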

It's very interesting. Let us see the next example.

Here we are summing up the components of the z vector.

We created the z vector as the sum of x and y,

this is five, seven, nine.

If we add up all those elements, we get 21.

Nothing very interesting or special. But if in the next step

we call s.grad_fn,

the gradient function, we see that it was created as a sum.

So PyTorch knows exactly, for every variable, how it was created.
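
As a small illustration of this step, the sketch below rebuilds z and sums its elements. It assumes a recent PyTorch release and the same example values as before. The y values are again inferred from the sum five, seven, nine.

```python
import torch

# Assumption: same example values as in the session, written as tensors.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = torch.tensor([4.0, 5.0, 6.0], requires_grad=True)
z = x + y

s = z.sum()       # add up all elements of z
print(s.item())   # 21.0
print(s.grad_fn)  # a SumBackward node: s was created by a sum

# The sum node also remembers which node produced its input,
# so the chain s <- z can be followed backwards.
print(s.grad_fn.next_functions)
```

Following next_functions is one way to see that the graph is a chain of recorded operations, which is what backward later walks through.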

Here we are with the next example.

Here we have a function s. This function s is

the sum over two variable vectors of size three.

We are summing up these vectors element-wise:

we are summing the first element of vector x

with the first element of vector y,

the second element with the second, and the third with the third.

So now, if we take a partial derivative,

if we differentiate s

with respect to x_0, for example,

then by the rules of partial differentiation

we treat all variables other than x_0

as constants.

So the partial derivative of s,

of the function s, with respect to the variable x_0 is one.
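
Written out, the argument above could look as follows. The index notation is an assumption made for this note. The session only names x_0 explicitly.

```latex
s = \sum_{i=0}^{2} (x_i + y_i)
  = (x_0 + y_0) + (x_1 + y_1) + (x_2 + y_2)

\frac{\partial s}{\partial x_0}
  = \frac{\partial}{\partial x_0}(x_0 + y_0) + 0 + 0
  = 1
```

The same reasoning applies to every x_i and every y_i, so each partial derivative of s is one.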

Let us see how it works with PyTorch.

So, first of all, let us recap that s was the sum.

This is actually the sum over all the elements.

And here we are calling backward,

and backward means: start backpropagation from this point backwards.

So it's starting backpropagation:

we have the function s, which is the sum,

and then we say backward,

and here we can pass the argument retain_graph=True.

I will not go into details,

but it's actually optional.

So, if we run this cell, what we see here is the following.

First of all, we are printing the vector x.

We see nothing special here, we are just printing out this vector.

But now, if we print out the gradient,

not the gradient function as before,

now we can print out the gradient itself.

The gradient is the value we get

if we take the derivative:

if we differentiate the function at a specific point,

we also get the value of this partial derivative.

And here we differentiate this function

with respect to the variable x.

x consists of three variables: x zero,

x one, x two.

So if we differentiate partially with respect to x zero,

we get one, and then

again with respect to x one

we get one, and with respect to x two we get one.

We saw that this is a very simple function.

If we take the partial derivative with respect to one variable,

it is one, and all the other variables are treated as constants and contribute zero.

That's why over here we have one, one, one.

It is the same thing if we take the

partial derivative with respect to y.

Because y is, like x,

a simple plain variable here, without any coefficients or exponents.

It's just one, one, one.
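
A minimal sketch of this backward step, assuming a recent PyTorch release and the same example values as before. The y values are inferred from the sum five, seven, nine.

```python
import torch

# Assumption: same example values as in the session, written as tensors.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = torch.tensor([4.0, 5.0, 6.0], requires_grad=True)
s = (x + y).sum()

# Start backpropagation at s. retain_graph=True keeps the graph
# around in case backward is called on it again later.
s.backward(retain_graph=True)

print(x)       # the vector itself, unchanged
print(x.grad)  # tensor([1., 1., 1.]): ds/dx_i is one for every i
print(y.grad)  # tensor([1., 1., 1.]): the same holds for y
```

The gradient is stored on the inputs x and y, not on s, because those are the variables we differentiate with respect to.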

But what will happen if we call backward again?

And then we print again x.grad and y.grad.

You see, it was updated.

So now we have here two, two, two,

that was for x, and for y the same, two, two, two.

What has happened? Let us call backward again and print out again.

Now it has changed again.

It's three. The reason is

that every time you call backward,

the gradient is accumulated in the grad property.

This is a feature of PyTorch,

and there are some models where it's very convenient to use.

In this introductory session

we will not use it, but it's important to know that every time you call backward,

the gradient will be accumulated.
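
The accumulation can be reproduced with a short sketch like the one below. It assumes a recent PyTorch release and the same example values as before. The reset with zero_() at the end is an addition that is not shown in the session. It is included only to show one way of starting again from zero.

```python
import torch

# Assumption: same example values as in the session, written as tensors.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = torch.tensor([4.0, 5.0, 6.0], requires_grad=True)
s = (x + y).sum()

history = []
for _ in range(3):
    # Each call adds the new gradient to what is already stored in x.grad.
    s.backward(retain_graph=True)
    history.append(x.grad.tolist())

print(history)  # [[1.0, 1.0, 1.0], [2.0, 2.0, 2.0], [3.0, 3.0, 3.0]]

# Not part of the session: clearing the stored gradients in place.
x.grad.zero_()
y.grad.zero_()
s.backward(retain_graph=True)
print(x.grad)   # tensor([1., 1., 1.]) again
```

Training loops usually clear the gradients once per step for exactly this reason.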

The next important point is how to preserve the computational graph.

As you already know,

autograd variables consist of two components.

One is the data.

The other one is grad_fn, the gradient function.

So with data,

you can get the values

out of the autograd variable.

And with grad_fn,

you can get out the function by which this variable was created.

Let us execute the next cell.

In this cell, we are creating two Tensors of size two.

And then we are summing up these Tensors.

So very simple.

So we're actually getting a Tensor of size two by two, sorry.

So now let us execute the next cell.

Here, we're creating the autograd variable var_x out of x,

and the autograd variable var_y out of y,

and saying both times requires_grad=True.

So now I can sum up these variables,

var_x + var_y, and then we're printing out the gradient function,

how it was created.

Let us do this, and we see z was created with an add operation, AddBackward.

But now, what we do is the following: we are extracting

the data out of this sum. Just the data.

And we pass it to a new variable which is called var_z_data.

And then we are creating a new autograd.Variable,

passing this data into it,

and then we are trying to print out how it was created.

So we're actually printing out new_var_z.grad_fn.

And then you see: None.

It's lost, because here,

when we extracted the data,

we extracted only the data but not the gradient function.

So the computational graph is already broken at this point:

the grad_fn was not passed on, and we

have retained only the data but not the gradient function.

Now, if we try to call backward on this new autograd variable,

new_var_z, which was created from the data of var_z,

we will get an exception because there's nothing there.

Here we have a runtime error saying

that element 0 of variables does not require grad and does not have a grad_fn.

This is the important point.

Does not have a gradient function.
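
The broken graph can be reproduced with a sketch like the following. It assumes a recent PyTorch release, where plain tensors replace autograd.Variable. The two-by-two shape and the values of x and y are made up for illustration. The clone() call stands in for wrapping the extracted data in a new Variable.

```python
import torch

# Assumption: two 2x2 tensors with made-up values.
x = torch.ones(2, 2)
y = torch.ones(2, 2)

var_x = x.clone().requires_grad_(True)
var_y = y.clone().requires_grad_(True)
var_z = var_x + var_y
print(var_z.grad_fn)      # an AddBackward node

# Extract only the values. The link to the graph is not carried along.
var_z_data = var_z.data
new_var_z = var_z_data.clone()
print(new_var_z.grad_fn)  # None: the history is lost

error_message = None
try:
    new_var_z.backward()
except RuntimeError as exc:
    # There is no graph behind new_var_z, so backward has nothing to do.
    error_message = str(exc)
print(error_message)
```

To keep the graph intact, later steps should continue from var_z itself instead of from a copy of its data.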

The gradient function was lost.

And last but not least in this session, I would like to

briefly mention the CUDA functionality of the torch Tensor.

As you probably remember,

Keras and TensorFlow

automatically detect whether you have GPU acceleration on your machine or not.

And they will execute everything on the GPU

if a GPU is available,

or on the CPU if a GPU is not available.

In PyTorch, you can decide very granularly what to execute on the GPU

and which Tensors you want to place where, on the GPU or on the CPU.

Here you have a check.

You can call torch.cuda.is_available().

You can run this check every time you want to decide where to execute the Tensor.

If you want to execute the Tensor on CUDA,

and CUDA is available,

you just add the CUDA function, .cuda().

And then it will run this Tensor on the GPU.

If you don't want to run the Tensor on the GPU,

you don't add this CUDA function at the end, and it will run on the CPU.
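
A minimal sketch of this check. The tensor values are made up for illustration, and the code is written so that it also runs on a machine without a GPU.

```python
import torch

t = torch.tensor([1.0, 2.0, 3.0])  # tensors are created on the CPU by default

# Move the tensor to the GPU only if CUDA is actually available.
if torch.cuda.is_available():
    t = t.cuda()

print(t.device)   # where the tensor lives now
print(t.is_cuda)  # True only if it was moved to the GPU
```

Because the decision is made per tensor, some tensors can stay on the CPU while others are moved to the GPU.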

This is very, very flexible.

So in my opinion,

the main advantage of PyTorch is its huge flexibility.

Okay. I hope you enjoyed the session. Next time we're going

to try something really practical.

We're going to build a linear model with PyTorch.

See you then. Enjoy our sessions. Bye bye.