So let's talk about why we use CNNs: what's the advantage, and what do the convolutional filters do? Images nowadays are very high resolution. You easily see images that are a few hundred by a few hundred, or even 1000-pixel resolution. That means if I have a 1000 by 1000 pixel image, I have one million pixels inside my image. That's pretty big.

Think about the MNIST exercise you have done before. MNIST is very small data: each image is 28 by 28 pixels, and we had to stretch it out into one long vector. 28 times 28 is 784, so we get a 784 by 1 vector. So I have these 784 pixels as input, and let's say I have 100 neurons in my first layer of an ANN, a multilayer perceptron. For simplicity, call it roughly 1000 inputs and 100 neurons.

But if I have this big image instead, I suddenly have one million neurons in my input layer. And let's say I still pick 100 neurons in my first layer. That actually goes against the usual design heuristic, where you want layer sizes within a similar order of magnitude, or somewhat smaller, but let's say I picked 100 to begin with because I'm worried about my computation capacity. Because this is a dense layer, each neuron in that first layer has one million connections. So with 100 neurons in my first layer, I suddenly have 10 to the 8, 100 million weights, from my first layer alone. That's a lot.

There are a couple of problems if this happens. The first problem: way too many parameters. What is the immediate problem? Computational resources. Okay, maybe someone will tell me, come on, you have a GPU, calculating 100 million weights is not difficult, so you can do that. Okay, let's say I can do that. But I'm still wasting my resources. And what is going to happen is that the model is going to overfit. So that's one problem, and the other problem is overfitting, and they are actually the same thing: most of these weights are not doing anything useful. They are very redundant, not helpful. These 100 million weights are just replicated copies of things I could do without.

That actually has to do with translational invariance in images. Translational invariance is this: you have an image with a cat in one place, maybe another image with the cat in the middle, maybe one with the cat in a corner. Cats like to move around like this. If you use this flatten-and-stretch model, you have to slice the image up and stretch it out, and the translational invariance is broken. That means your neural network thinks the shifted image is a completely different image: after you stretch them out, the inputs for the two images are totally different, and the network will require different weights for each position. That's a waste.

Instead, if you have filters, maybe your filter number one detects the cat's ear: it outputs a one whenever it finds the ear. And maybe you have another filter that finds the eye, and another filter that finds the nose, something like that.
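To make those numbers concrete, here is a quick back-of-the-envelope calculation in plain Python. The layer sizes are just the illustrative ones from the discussion above, not anything canonical:

```python
# Parameter count of a single dense (fully connected) layer:
# every input pixel connects to every neuron in the layer.

def dense_params(n_inputs, n_neurons, bias=True):
    """Weights (+ optional biases) in one fully connected layer."""
    return n_inputs * n_neurons + (n_neurons if bias else 0)

# MNIST case: 28 x 28 image flattened to a 784-vector, 100 hidden neurons
mnist_inputs = 28 * 28                      # 784
print(dense_params(mnist_inputs, 100))      # 78,500 weights: manageable

# High-resolution case: 1000 x 1000 image, same 100 hidden neurons
big_inputs = 1000 * 1000                    # one million input pixels
print(dense_params(big_inputs, 100))        # 100,000,100 weights: ~10^8
```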
So, with just a few small filters, each trained to find certain primitive features in the image, the filters can be reused anywhere. It doesn't matter whether the cat is in the top left corner, in the middle, or in the bottom right corner: the same filter can be applied multiple times across the image. That's much more efficient.

Let's count. Say each filter is 3 by 3, times 3 for the RGB channels, and I have maybe 100 such filters. 3 by 3 is almost 10, times 3 channels is about 30 weights per filter, times 100 filters is roughly 3,000 weights to find all these different features. With those 3,000 weights I can solve this problem equally well, or actually even better a lot of times. Versus if you have 100 million weights, it is surely going to overfit, even if you have a few thousand to 10,000 such images to train on. The 100-million-weight model will surely overfit, but the filter model doesn't.

So to recap the advantages of a CNN, a convolutional neural network. The filters are collections of weights. They are called learnable filters: their values get updated as we train. And these weights are shared across the image. It's not one weight per pixel-to-pixel connection; each filter is more of a feature finder that can move around the image, slide around the image, and find the features wherever they are. This is called weight sharing. It makes the computational efficiency very good, and it also takes care of the translational invariance in the image. And because there are far fewer weights to optimize, the model is much less likely to overfit.

Okay, so this is some example of what the weights look like, but we'll talk about it soon. I just wanted to mention that the typical CNN structure looks like this: there is a convolution layer, like we talked about, then another stacked convolution layer, and then something called max pooling, or pooling, which is really subsampling. This structure repeats many times, and then we have a classifier at the end. That is the typical CNN architecture. The reason we have this pooling from time to time is that we want to shrink down the feature maps, for two reasons. One is computational efficiency, and the second is that we want to see more global features. We'll see what that means.
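As a concrete sketch of that conv-conv-pool, repeat, then classifier pattern, here is a minimal PyTorch version. The layer sizes, image size, and the 10-class output are made up for illustration; only the overall structure follows the lecture:

```python
import torch
import torch.nn as nn

# A minimal sketch of the typical CNN structure described above:
# [conv -> conv -> maxpool] blocks repeated, then a classifier.
model = nn.Sequential(
    # Block 1: two convolution layers, then pooling (subsampling)
    nn.Conv2d(3, 100, kernel_size=3, padding=1),   # 3x3 filter, 3 RGB channels, 100 filters
    nn.ReLU(),
    nn.Conv2d(100, 100, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),     # halve the spatial size: cheaper, and later filters see more global features
    # Block 2: the structure repeats
    nn.Conv2d(100, 100, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    # Classifier at the end
    nn.Flatten(),
    nn.LazyLinear(10),   # e.g. 10 output classes (illustrative)
)

x = torch.randn(1, 3, 64, 64)       # dummy RGB image
print(model(x).shape)               # torch.Size([1, 10])

# Weight sharing in action: the first conv layer has only
# 100 * 3 * 3 * 3 = 2,700 weights (+ 100 biases), no matter
# how large the input image is, because the same filters
# slide over every position.
print(sum(p.numel() for p in model[0].parameters()))   # 2800
```

The key point the sketch demonstrates is the recap above: the first conv layer's parameter count depends only on the filter size and filter count, not on the image resolution, which is exactly why the 1000 by 1000 image stops being a problem.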