0:00

In the last video, you saw the building blocks of a single convolutional layer of a ConvNet.

Now let's go through a concrete example of a deep convolutional neural network, which will also give you some practice with the notation introduced toward the end of the last video.

0:19

Let's say you have an image and you want to do image classification, or image recognition: you take as input an image x and decide, is this a cat or not, 0 or 1. So it's a binary classification problem, and let's build an example of a ConvNet you could use for this task.

For the sake of this example, I'm going to use a fairly small image; let's say this image is 39 by 39 by 3. This choice just makes some of the numbers work out a bit better. So nH in layer 0 will be equal to nW: the height and width are both 39, and the number of channels in layer 0 is 3.

Let's say the first layer uses a set of 3 by 3 filters to detect features, so f1 = 3, because we're using 3 by 3 filters. And let's say we're using a stride of 1 and no padding, so this is a valid convolution. And let's say you have 10 filters.

Then the activations in this next layer of the neural network will be 37 by 37 by 10. The 10 comes from the fact that you use 10 filters, and the 37 comes from the formula (n + 2p - f)/s + 1. Here you have (39 + 0 - 3)/1 + 1 = 37, so that's why the output is 37 by 37. It's a valid convolution, and that's the output size.

So in our notation, you would have nH1 = nW1 = 37, and nC1 = 10; nC1 is equal to the number of filters in the first layer. This becomes the dimension of the activations at the first layer.
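As a small sketch of the formula above (the helper name `conv_output_size` is my own, not from the video), the output height and width of a conv layer can be computed like this:

```python
def conv_output_size(n, f, p=0, s=1):
    """Spatial output size of a convolution: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * p - f) // s + 1

# Layer 1: 39x39 input, 3x3 filters, no padding, stride 1
print(conv_output_size(39, f=3))  # 37
```

The floor in the integer division handles the case where the filter doesn't fit an exact number of times across the image.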

2:45

Let's say you now have another convolutional layer, and this time you use 5 by 5 filters. So in our notation, f2, the filter size at the next layer, is equal to 5. And let's say you use a stride of 2 this time, no padding, and 20 filters.

3:11

Then the output will be another volume, this time 17 by 17 by 20. Notice that because you're now using a stride of 2, the dimension has shrunk much faster: 37 by 37 has gone down in size by slightly more than a factor of 2, to 17 by 17. And because you're using 20 filters, the number of channels is now 20. So the activation a2 has that dimension, and nH2 = nW2 = 17, and nC2 = 20.

All right, let's apply one last convolutional layer. Let's say you use a 5 by 5 filter again, and again a stride of 2. If you do that, and I'll skip the math, you end up with 7 by 7; and let's say you use 40 filters. No padding, 40 filters: you end up with 7 by 7 by 40. So what you've done is take your 39 by 39 by 3 image and compute 7 by 7 by 40 features for this image.
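The shape progression through all three layers can be traced with the same output-size formula; this is an illustrative sketch (helper and variable names are my own), using the filter sizes, strides, and filter counts given above:

```python
def conv_output_size(n, f, p=0, s=1):
    # floor((n + 2p - f) / s) + 1
    return (n + 2 * p - f) // s + 1

shape = (39, 39, 3)  # input image: height, width, channels
layers = [           # (filter size f, padding p, stride s, number of filters)
    (3, 0, 1, 10),
    (5, 0, 2, 20),
    (5, 0, 2, 40),
]
for f, p, s, n_filters in layers:
    n = conv_output_size(shape[0], f, p, s)
    shape = (n, n, n_filters)
    print(shape)
# prints (37, 37, 10), then (17, 17, 20), then (7, 7, 40)
```

Each layer's channel count comes from its number of filters, while the spatial size comes only from f, p, and s.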

And then finally, what's commonly done is to take this 7 by 7 by 40 volume, where 7 times 7 times 40 is 1,960, and flatten it, or unroll it, into 1,960 units. Just flatten it out into a vector, and then feed this to a logistic regression unit, or a softmax unit.

5:10

Which one depends on whether you're trying to recognize cat or no cat, or trying to recognize any one of k different objects. This then gives the final predicted output of the neural network. So just to be clear, this last step is taking all of these numbers, all 1,960 of them, and unrolling them into a very long vector. You then have one long vector to feed into softmax, or into logistic regression, in order to make the final prediction.

5:50

A lot of the work in designing a convolutional neural network is selecting hyperparameters like these: deciding the filter size, the stride, the padding, and how many filters to use. Both later this week and next week, we'll give some suggestions and guidelines on how to make these choices.

6:10

But for now, one thing to take away from this is that as you go deeper in a neural network, typically you start out with larger images, such as 39 by 39. Then the height and width stay similar for a while and gradually trend down as you go deeper in the network: here they've gone from 39 to 37 to 17 to 7. Whereas the number of channels generally increases: it's gone from 3 to 10 to 20 to 40. You see this general trend in a lot of other convolutional neural networks as well.

6:55

But you've now seen your first example of a convolutional neural network, or ConvNet for short. So congratulations on that. It turns out that in a typical ConvNet, there are usually three types of layers. One is the convolutional layer, which we'll often denote as a conv layer; that's what we've been using in the previous network. There are two other common types of layers that you haven't seen yet but that we'll talk about in the next couple of videos. One is called a pooling layer, which I'll often call pool, and the last is a fully connected layer, called FC. And although it's possible to design a pretty good neural network using just convolutional layers, most neural network architectures will also have a few pooling layers and a few fully connected layers.

7:56

So we'll cover those quickly in the next two videos, and then you'll have a sense of all of the most common types of layers in a convolutional neural network, and you'll be able to put together even more powerful networks than the one we just saw. So congrats again on seeing your first full convolutional neural network. We'll also talk later this week about how to train these networks. But first, let's talk briefly about pooling and fully connected layers. For training, we'll be using backpropagation, which you're already familiar with. In the next video, I'll quickly go over how to implement a pooling layer for your ConvNet.
