Let's talk about images. An image is nothing but a multidimensional matrix. When you think about an image, you know there is a width and a height, and if the image is grayscale that's all, but if you have a color image you have three color channels: RGB. So your image can have depth as well. We call this the depth, or sometimes the channel dimension, and each slice of the depth corresponds to one color channel. One pixel occupies one position in width and one in height, and a color pixel has three values, one per channel, so the depth is still there. Usually, in the eight-bit color scheme, each matrix element takes a value from 0 to 255, a 256-level scale.

In image processing there is something called a filter. If you are familiar with image processing, maybe you have used Photoshop and done some cool tricks to get interesting effects, such as blurring: you can blur an image using a filter. You can also roughly extract the lines in an image using a filter called the Mexican Hat filter. What are filters? A filter is a small patch, a small matrix: the image is some size, and the filter is typically smaller than that, usually a lot smaller, with some weight values inside. For example, a Gaussian blur filter has weights something like 1 at the center, 0.5 next to that, and 0.3 and 0.1 toward the edges (the filter has to be square, by the way), so that the profile looks Gaussian: the weight intensity is largest in the middle, gets smaller outward, and goes to nearly zero at the ends. If you apply that small filter to the image, slide it across, and at each position compute a weighted sum using these Gaussian-profile numbers, you get a blurred image. The sigma here controls how broad the Gaussian is. You can have a wide Gaussian versus a sharp Gaussian: a sharp Gaussian gives you a blurred image that is still fairly sharp, while a really broad Gaussian smooths everything over a long range, so the image becomes very blurry. You can compare the results for these different sigma values. Mexican Hat filters look like this: a central peak with a dip around it. After filtering, values below a certain threshold can be cut away, which gives you lines extracted out of the image. These are the concepts of filters: small matrices with some weight values inside that scan through the image and produce an output. That output is usually called the feature map. "Feature map" is the term the deep learning community typically uses; it just means the output of some filtering operation on the image.

Now let's talk about the convolution operation on images. For images we are usually interested in 2D, two-dimensional, convolution. What does 2D convolution do? Again we have an image, and we have a filter with some weights; they don't have to be zero or one, but just for convenience of calculation I use ones and zeros here. What is convolution, then? If you have two matrices like this and you take their inner product, what you get is a times 1, plus b times 2, plus c times 3, and so on: you multiply the matching entries and sum everything up. That's the dot product, the inner product of two matrices, and that's one example of a matrix operation. Convolution is something similar but not quite the same. It pairs the entries point-symmetrically: a goes with 9, b with 8, c with 7, and so on, as if the filter were rotated 180 degrees before the multiply-and-sum.
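To make this concrete, here is a minimal sketch in Python/NumPy (my own illustration, not code from the lecture; the Gaussian-like kernel values are just examples) of plain 2D convolution with a small blur kernel:

```python
import numpy as np

def convolve2d(image, kernel):
    """Naive 2D convolution (no padding): flip the kernel point-symmetrically,
    then slide it over the image, multiplying and summing at each position."""
    k = np.flipud(np.fliplr(kernel))          # 180-degree flip = point symmetry
    kh, kw = k.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))  # output shrinks without padding
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * k)
    return out

# A small Gaussian-like blur kernel: largest weight in the center,
# falling toward zero at the edges, normalized to sum to 1.
gauss = np.array([[1., 2., 1.],
                  [2., 4., 2.],
                  [1., 2., 1.]])
gauss /= gauss.sum()

image = np.random.rand(8, 8)       # stand-in for a grayscale image
blurred = convolve2d(image, gauss)
print(blurred.shape)               # (6, 6): each side loses 2 pixels
```

Note that for a symmetric kernel like this Gaussian, the flip changes nothing; without the flip the operation is called cross-correlation, which is what deep-learning libraries actually compute under the name "convolution".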
From this, you will get something like this. That formula is the definition of 2D convolution with a three-by-three filter: at each position, the output value is the sum, over the three-by-three window, of each image pixel times its point-symmetrically matched filter weight. Again, this is the image, and this is the filter, three-by-three. With that in mind, this animation shows how the feature map gets calculated after the convolution. Let me start it again. It starts here: the filter overlaps the corner of the image, the result of the calculation is 4, and that value is filled in. It slides one to the right, calculates the value using the same operation, and the result is 3, so it fills in 3. It slides to the right again, calculates again, gets 4, then slides to the next position, calculates 2, and so on. What this convolution, or any filtering on an image, does is this: wherever the filter overlaps the image, it calculates whatever rule it has as its definition, then slides through the image and builds up the feature map as the output.

Now let's talk about what the feature map dimensions look like after a convolution layer is applied. This is a convolutional layer. The pink block, which is three-dimensional, is our filter: it has a three-by-three width and height, and it also has the depth dimension D. That is the full filter size, but we still normally call this 2D convolution, because it doesn't convolve over the depth dimension. If I call the axes X, Y, Z, then there is no sliding along the Z-axis. So we typically talk about a three-by-three filter, but in reality, when you look at the code or the algorithm, the filter also has to have the depth dimension.

What does it do? It slides to the right over this original light blue collection of cubes. These cubes are the image pixels, one per channel, and after the convolution the filled-in values, the feature map, are shaded here. It takes a lot of clicks. I scan it through the image, and as you can see, because the convolution fills in the value at the center pixel and leaves out the perimeter, it actually loses the outermost pixels, the outermost cubes. You lose them if you don't have padding or some special treatment. I scan through the whole image like this, and now I have one layer, one sheet of feature map, if I had only one filter. This one slice of the output is from one filter. If we had four filters here, this slice would be from Filter 1 and the next slice from Filter 2: the output of the second filter operation is stacked on top of the first one, the third filter's output slice is stacked on top of that, and the fourth on top of this.

Now compare the output dimensions, call them W prime, H prime, D prime, to the original W, H, D. What conclusions can you make? First of all, with a three-by-three filter and no padding (padding means you add extra border pixels around the image, but we didn't do that here), W prime is W minus 2 and H prime is H minus 2. D prime has nothing to do with the original D: D prime is the number of filters in the convolution layer, and that is oftentimes an important design parameter. When we stack many such convolutions, we call the result a convolutional neural network, and a typical convolutional neural network for a typical image classification model looks like this. Actually, only these three convolution operations are the convolution layers.
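As a sanity check on these dimensions, here is a minimal NumPy sketch (again my illustration, not the lecture's code) of one convolution layer with four 3x3 filters on an RGB image; like deep-learning libraries, it skips the kernel flip, since learned weights absorb the orientation anyway:

```python
import numpy as np

def conv_layer(image, filters):
    """Minimal sketch of one convolution layer, assuming no padding and
    stride 1. image: (H, W, D); filters: (N, kh, kw, D). Each filter spans
    the full depth D, slides only over X and Y, and produces one output
    slice; the N slices stack into an (H-kh+1, W-kw+1, N) feature map."""
    H, W, D = image.shape
    N, kh, kw, _ = filters.shape
    out = np.zeros((H - kh + 1, W - kw + 1, N))
    for n in range(N):                         # one output slice per filter
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                patch = image[i:i + kh, j:j + kw, :]
                out[i, j, n] = np.sum(patch * filters[n])
    return out

image = np.random.rand(32, 32, 3)     # a 32x32 RGB image, depth D = 3
filters = np.random.rand(4, 3, 3, 3)  # four 3x3 filters, each with depth 3
fmap = conv_layer(image, filters)
print(fmap.shape)  # (30, 30, 4): W' = W - 2, H' = H - 2, D' = number of filters
```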
It's confusing, because in an ANN the neurons themselves are the layers, but here in a convolutional neural network, this is the convolution layer and these stacks are just outputs: this is the input, then feature map number 1, feature map number 2, feature map number 3. We have three convolution layers, and each convolution layer in the architecture comes with many hyperparameters. We'll talk about them one by one later, but the notable ones are the number of filters, the filter size, and so on. This part, the convolution layers, is sometimes called the feature extractor; we'll see why later. The next part you are already familiar with: it's just a normal multilayer perceptron, a normal artificial neural network with dense layers, also called fully connected or densely connected layers, or dense layers for short; these are all synonyms. Each neuron here is densely connected to the neurons in the next layer, and these links correspond to the weight components. Then at the end of the day you have the output: this is cat, this is dog, things like that. If you have multiple categories, the output will be a vector instead of just one value. That's the typical structure of a convolutional neural network for image classification. By the way, this last part is called the classifier, and interestingly, you don't have to use an ANN for the classifier. If you like, for whatever reason, you can build convolution layers as the feature extractor and use something like a random forest, or maybe XGBoost, as the classifier. You can plug in whatever classifier you want here and build a hybrid model; that is also possible.
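Here is a minimal sketch of that hybrid idea, assuming TensorFlow/Keras and scikit-learn are available; the layer sizes and the dummy data are made up for illustration, not taken from the lecture:

```python
import numpy as np
from tensorflow import keras
from sklearn.ensemble import RandomForestClassifier

# Feature extractor: three convolution layers, as in the diagram above.
extractor = keras.Sequential([
    keras.layers.Input(shape=(32, 32, 3)),
    keras.layers.Conv2D(4, 3, activation="relu"),   # 4 filters of size 3x3
    keras.layers.Conv2D(8, 3, activation="relu"),
    keras.layers.Conv2D(16, 3, activation="relu"),
    keras.layers.Flatten(),                         # flatten feature maps
])

# Dummy data standing in for real images and labels (0 = cat, 1 = dog).
X = np.random.rand(100, 32, 32, 3).astype("float32")
y = np.random.randint(0, 2, size=100)

features = extractor.predict(X)             # extracted features per image
clf = RandomForestClassifier().fit(features, y)  # non-neural classifier
print(clf.predict(features[:5]))
```

In practice the convolution weights would be trained first (for example, jointly with a dense head) before swapping in the forest; here they are random, so this only shows the structure of the hybrid, not a working classifier.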