Now that we know

how neural networks learn

using backpropagation,

we can go deeper into it and actually look at

some real neural network structures that have been used

to obtain great levels of intelligence.

So, we will focus on deep learning with Convolutional Neural Networks, or CNNs.

A CNN is a type of feed-forward neural network.

It is modeled on an animal's visual cortex, where

individual visual neurons progressively focus on overlapping tile-shaped regions.

These tile regions shift sequentially,

which is the convolutional process,

to cover the overall visual field.

A CNN can be viewed as a variation of the MLP, the Multi-Layer Perceptron,

designed to carry out this convolutional process.

Deep learning CNN techniques

became well known after an outstanding,

winning performance in image recognition

at the ImageNet Challenge 2012.

CNNs need a minimal amount of preprocessing.

Rectified Linear Unit activation functions

are often used in convolutional neural networks.

And remember this?

This is the equation for a Rectified Linear Unit: the output is the larger of 0 and the input, f(x) = max(0, x).
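
As a quick reminder in code form, here is a minimal sketch of the Rectified Linear Unit, assuming NumPy is available. The function name relu and the sample inputs are just for illustration.

```python
import numpy as np

def relu(x):
    """Rectified Linear Unit: keep positive values, replace negatives with 0."""
    return np.maximum(0, x)

# Hypothetical pre-activation values, chosen only for illustration.
activations = relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0]))
print(activations)
```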

Now, CNNs are used in image and video recognition, recommender systems,

natural language processing, and chess and Go game systems,

for which we showed the example of AlphaGo.

Now, for the CNN structure,

here is a detailed example.

As you can see here, starting from the original input image,

a convolution process is performed, in which

a filtering field is sequentially shifted around to process the whole image.

That sequential shifting is what we call the convolutional process itself.
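
The shifting of the filtering field can be sketched roughly as follows, assuming NumPy. The function name convolve2d_valid, the 4x4 input, and the 2x2 filter are hypothetical values chosen for illustration. Like most deep learning libraries, this sketch does not flip the filter, so strictly speaking it computes a cross-correlation.

```python
import numpy as np

def convolve2d_valid(image, kernel, stride=1):
    """Shift `kernel` across `image` and record one response per position."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            patch = image[r:r + kh, c:c + kw]      # region currently under the filter
            out[i, j] = np.sum(patch * kernel)     # weighted sum of that region
    return out

image = np.arange(16, dtype=float).reshape(4, 4)   # hypothetical 4x4 input image
kernel = np.array([[1.0, 0.0], [0.0, -1.0]])       # hypothetical 2x2 filter
feature_map = convolve2d_valid(image, kernel)
print(feature_map.shape)
```

With a 2x2 filter and a stride of 1, the 4x4 input yields a 3x3 map, one value for each position the filter visits.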

Now, the convolution is conducted to create feature maps.

And you can see multiple feature maps there.

Next,

sub-sampling is conducted on the feature maps to produce reduced-size feature maps.

The sub-sampling reduces the data size,

focusing it on the more important structure of the data.

Then there's a convolutional process again,

and additional feature maps are created.

And then, there's another sub-sampling process.

After that,

we have a fully connected structure that

leads to the output of the convolutional neural network.
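
To see how the sizes change through the convolution, sub-sampling, convolution, sub-sampling sequence described above, here is a small sketch that only tracks shapes. The input size of 32, the filter counts of 6 and 16, the 5x5 filters, and the 2x2 pooling windows are assumptions made for illustration, not values taken from the example in the lecture.

```python
def conv_output_size(size, kernel, stride=1):
    """Side length after a convolution with no padding."""
    return (size - kernel) // stride + 1

def pool_output_size(size, window):
    """Side length after non-overlapping sub-sampling."""
    return size // window

def trace_shapes(input_size, stages):
    """Return (feature_maps, height, width) for the input and after each stage."""
    maps, size = 1, input_size
    shapes = [(maps, size, size)]
    for kind, *params in stages:
        if kind == "conv":
            num_filters, kernel = params
            maps, size = num_filters, conv_output_size(size, kernel)
        elif kind == "pool":
            (window,) = params
            size = pool_output_size(size, window)
        shapes.append((maps, size, size))
    return shapes

# Hypothetical layer settings: ("conv", number of filters, filter size) or ("pool", window).
stages = [("conv", 6, 5), ("pool", 2), ("conv", 16, 5), ("pool", 2)]
shapes = trace_shapes(32, stages)
for shape in shapes:
    print(shape)

# Number of values handed to the fully connected structure.
flattened = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]
print(flattened)  # 400
```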

Convolution is used to find the same feature in different places of an image.

Convolution is conducted using learnable filters, or kernels,

that are passed over the input data, the input image.

The convolutional layer uses multiple filters

where each filter moves sequentially

across the input data or image to make

a 2-dimensional activation map based on each filter.

Feature maps are made from the activation maps of the filters.

And the number of learnable filters, or kernels,

in the convolution process,

determines how many feature maps are generated after the convolutional process.
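
The relationship between the number of filters and the number of feature maps can be sketched as follows, assuming NumPy 1.20 or later for sliding_window_view. The function name conv_layer is hypothetical, and the random filters are stand-ins: in a real CNN the filter values are learned during training.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv_layer(image, filters):
    """Apply every filter to `image`; returns one 2-D map per filter."""
    kh, kw = filters.shape[1:]
    # windows[i, j] is the kh x kw region of the image at position (i, j).
    windows = sliding_window_view(image, (kh, kw))
    # For each filter f, sum region * filter at every position (i, j).
    return np.einsum("ijkl,fkl->fij", windows, filters)

rng = np.random.default_rng(0)
image = rng.random((8, 8))                  # hypothetical 8x8 input
filters = rng.standard_normal((3, 3, 3))    # 3 stand-in filters, each 3x3
feature_maps = conv_layer(image, filters)
print(feature_maps.shape)  # (3, 6, 6)
```

Three filters give three feature maps; changing the first dimension of filters changes the number of maps accordingly.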

Sub-sampling uses a selecting operation,

which is called pooling,

on the feature maps.

Sub-sampling is a non-linear down-sampling process

that results in smaller feature maps.

Many sub-sampling schemes exist,

and the most popular ones

include median value, average value,

and max pooling, which we'll talk about right here.

Median value.

For each sub-region select the median value,

the middle value, as a representative sample.

For example, looking over there,

in the red box, we have those numbers.

And if we were to list those numbers in order,

we would get right here 1,

2, 5 and 8.

Now, in the middle we have the numbers 2 and 5.

If we add 2 and 5,

we get 7.

And if we divide that by 2 to get their average,

that is 3.5.

Rounding 3.5 up results in 4.

So in this case,

the median value used is 4. Here is another example.

For the red box over there,

if we were to look at the numbers,

and list them in order,

it would be 2, 4, 6 and 9.

Then taking the 4 and the 6,

which are the ones in the middle,

then the average would be 5.

So, in the corresponding position of the sub-sampled map using the median value,

we have 5.

In these examples, we had four numbers in the

block, which is an even number of values.

If we have an odd number of values

in the region where we are looking for a median value,

then the middle value is unambiguous.

For example, let's say we had 1,

4, 7, 8 and 10,

these five values, then the one in the middle is evidently 7.

In this case, the median value becomes 7.
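
The three median examples can be checked with a short sketch using Python's standard library. The function name median_representative and the round_up flag are hypothetical; rounding up to a whole number follows the first example, where 3.5 becomes 4.

```python
import math
import statistics

def median_representative(values, round_up=True):
    """Median of a sub-region; with an even count, the two middle values are averaged."""
    m = statistics.median(values)
    return math.ceil(m) if round_up else m

print(median_representative([1, 2, 5, 8]))                  # 4 (3.5 rounded up)
print(median_representative([2, 4, 6, 9]))                  # 5
print(median_representative([1, 4, 7, 8, 10]))              # 7
print(median_representative([1, 2, 5, 8], round_up=False))  # 3.5
```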

The next example is the average value.

For each sub-region, use the Average Value as the representative.

For example, for the values 5, 1, 8 and 2,

the average is the sum of the numbers, 16,

divided by four, which is 4.

And therefore, you can see that 4 is used as the average value.

Then there's the Max Value.

Each sub-region selects its max value as its representative value.

For the block of 5, 1, 8, 2,

the largest number is 8.

So, 8 is used as the sub-sampling representative value.
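
Applied to a whole feature map, these sub-sampling schemes can be sketched as one function with a swappable reducer, assuming NumPy. The function name pool2d is hypothetical. In the 4x4 map below, the top-left block holds 5, 1, 8, 2 from the example; the placement of 2, 4, 6, 9 in the top-right block and all values in the bottom two rows are made up for illustration.

```python
import numpy as np

def pool2d(feature_map, window=2, reducer=np.max):
    """Reduce each non-overlapping window x window block to a single value."""
    h, w = feature_map.shape
    h, w = h - h % window, w - w % window   # drop rows/columns that do not fill a block
    blocks = feature_map[:h, :w].reshape(h // window, window, w // window, window)
    return reducer(blocks, axis=(1, 3))     # collapse the two within-block axes

fm = np.array([[5, 1, 2, 4],
               [8, 2, 6, 9],
               [3, 3, 1, 0],    # hypothetical values
               [1, 7, 2, 4]])   # hypothetical values

max_pooled = pool2d(fm, 2, np.max)      # top-left entry is 8
avg_pooled = pool2d(fm, 2, np.mean)     # top-left entry is 4.0
med_pooled = pool2d(fm, 2, np.median)   # top-left entry is 3.5 (before any rounding)
print(max_pooled)
print(avg_pooled)
print(med_pooled)
```

Each 4x4 map becomes a 2x2 map, so the data size drops by a factor of four whichever reducer is chosen.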

Now, as you can see here, the convolutional layer increases the number of feature maps.

And that's the process down there.

And you can see that

the block right here,

which is the pooling,

the sub-sampling process, decreases the spatial resolution.

And in the middle, a Local Contrast Normalization block, or LCN, can be used.

This improves the optimization results and the image's invariance.

What do we mean by invariance?

Well, this is the characteristic of not

changing after a transformation or processing.

LCN can be used on the image after convolution,

and before the pooling,

the sub-sampling process, which is shown right there in the middle.
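
One simplified way to picture local contrast normalization is sketched below, assuming NumPy 1.20 or later. This is not necessarily the exact variant shown in the figure: the uniform 3x3 neighborhood, the reflect padding, and the eps floor are all assumptions, and the function name is hypothetical. Each value has its local mean subtracted and is then divided by its local standard deviation.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def local_contrast_normalize(feature_map, window=3, eps=1e-6):
    """Simplified LCN: subtract the local mean, divide by the local std.

    `window` is assumed to be odd so the output keeps the input's shape.
    """
    pad = window // 2
    padded = np.pad(feature_map, pad, mode="reflect")
    patches = sliding_window_view(padded, (window, window))
    local_mean = patches.mean(axis=(2, 3))
    local_std = patches.std(axis=(2, 3))
    # eps avoids division by zero in perfectly flat regions.
    return (feature_map - local_mean) / np.maximum(local_std, eps)

rng = np.random.default_rng(0)
fm = rng.random((6, 6))                              # hypothetical feature map
normalized = local_contrast_normalize(fm)
brighter = local_contrast_normalize(2.0 * fm + 5.0)  # same map, rescaled and shifted
print(np.allclose(normalized, brighter))
```

The final comparison illustrates the invariance idea: rescaling and shifting the input brightness leaves the normalized result essentially unchanged.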

Other techniques used in CNNs include dropout.

On each training iteration, selected neurons are randomly turned off.

A probability model is used to select which neurons will be turned off.

This helps train neurons to work properly

even when other neurons are unavailable due to functional failure.

This provides robustness to the Convolutional Neural Network.
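
A minimal sketch of dropout on one layer's activations follows, assuming NumPy. The function name and parameters are hypothetical. Rescaling the surviving activations by 1 / (1 - drop_prob), often called inverted dropout, is a common convention that is assumed here rather than taken from the lecture.

```python
import numpy as np

def dropout(activations, drop_prob, rng, training=True):
    """Randomly turn off neurons with probability `drop_prob` (must be < 1)."""
    if not training or drop_prob == 0.0:
        return activations                      # nothing is dropped at test time
    keep_mask = rng.random(activations.shape) >= drop_prob
    # Assumed convention: scale survivors so the expected activation stays the same.
    return activations * keep_mask / (1.0 - drop_prob)

rng = np.random.default_rng(0)
x = np.ones(1000)                               # hypothetical activations
y = dropout(x, 0.5, rng)
print(np.mean(y == 0.0))                        # fraction of neurons turned off, near 0.5
```

A new random mask is drawn on every call, so a different set of neurons is turned off on each iteration.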

Another technique is a fully connected MLP output layer.

The output layer uses a fully connected multi-layer perceptron

that is connected to the previous hidden layer.

Outputs are computed with a matrix multiplication and a bias offset.
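
The matrix multiplication and bias offset can be sketched as follows, assuming NumPy. The shapes (two 2x2 feature maps flattened to 8 values, 3 output units) and all weight and bias values are made up for illustration.

```python
import numpy as np

def fully_connected(x, weights, bias):
    """One fully connected layer: matrix multiplication plus a bias offset."""
    return weights @ x + bias

feature_maps = np.arange(8, dtype=float).reshape(2, 2, 2)  # hypothetical pooled maps
x = feature_maps.reshape(-1)                 # flatten to a vector of 8 values
weights = np.full((3, 8), 0.5)               # hypothetical weights: 3 outputs, 8 inputs
bias = np.array([0.0, 1.0, -1.0])            # hypothetical bias per output
scores = fully_connected(x, weights, bias)
print(scores)
```

Every output unit is connected to every input value, which is why the weight matrix has one row per output and one column per input.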

Another Convolutional Neural Network technique is the ensemble.

Ensembles are created by repeated random sampling of the labeled training data.

Ensembles improve the accuracy and reliability

by providing an improved global picture

of the data's actual statistics.

Ensemble models are often used in deep neural networks.

Another very powerful CNN technique is bagging.

Bagging uses multiple iterations of training the CNN

on labeled training data that is randomly sampled with replacement.

After training, the results of the trained models from all iterations are combined.
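
The bagging procedure can be sketched as below, assuming NumPy. To keep the example short and runnable, a simple nearest-centroid classifier stands in for the CNN, and the results are combined by majority vote; both choices, along with the synthetic two-class data and all function names, are assumptions made for illustration.

```python
import numpy as np

def bootstrap_sample(features, labels, rng):
    """Draw len(labels) examples at random, with replacement."""
    n = len(labels)
    idx = rng.integers(0, n, size=n)
    return features[idx], labels[idx]

def train_centroid_model(features, labels):
    """Stand-in for training a CNN: store the mean feature vector of each class."""
    classes = np.unique(labels)
    centroids = np.stack([features[labels == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict(model, features):
    """Assign each example to the class with the nearest centroid."""
    classes, centroids = model
    dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dists, axis=1)]

def bagged_predict(models, features):
    """Combine the trained models by majority vote over their predictions."""
    votes = np.stack([predict(m, features) for m in models])  # (models, examples)
    return np.array([np.bincount(col).argmax() for col in votes.T])

rng = np.random.default_rng(0)
# Hypothetical labeled data: two well-separated 2-D clusters.
features = np.concatenate([rng.normal(0.0, 1.0, (50, 2)),
                           rng.normal(4.0, 1.0, (50, 2))])
labels = np.array([0] * 50 + [1] * 50)

# Each iteration trains on its own resampled copy of the training data.
models = [train_centroid_model(*bootstrap_sample(features, labels, rng))
          for _ in range(5)]
predictions = bagged_predict(models, features)
print(np.mean(predictions == labels))
```

Using an odd number of models avoids tied votes in this two-class setting.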

These are the references that I use,

and I recommend them to you. Thank you.