0:00

In the last video, you saw how to define the content cost function for neural style transfer. Next, let's take a look at the style cost function. So what does the style of an image mean?

Let's say you have an input image like this. You're used to seeing a ConvNet like this compute features at various different hidden layers. And let's say you've chosen some layer l, maybe that layer, to define the measure of the style of an image.

What we're going to do is define the style as the correlation between activations across different channels in this layer l activation. So here's what I mean by that. Let's say you take that layer l activation; this is going to be an n_H by n_W by n_C block of activations, and we're going to ask: how correlated are the activations across different channels? To explain what I mean by this maybe slightly cryptic phrase, let's take this block of activations and shade the different channels with different colors. In this little example, we have, say, five channels, which is why I have five shades of color here. In practice, of course, a neural network uses a lot more channels than five, but using just five makes the drawing easier.

Â 1:22

But to capture the style of an image, what you're going to do is the following. Let's look at the first two channels, the red channel and the yellow channel, and ask: how correlated are the activations in these first two channels? For example, in the lower right-hand corner, you have some activation in the first channel and some activation in the second channel, so that gives you a pair of numbers. What you do is look at different positions across this block of activations, and at each position take that pair of numbers: one in the first channel, the red channel, and one in the yellow channel, the second channel. Looking across all of these n_H by n_W positions, how correlated are these two numbers?
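To make this idea concrete, here is a minimal NumPy sketch (not from the lecture; the shapes and random values are made-up illustrations): it flattens two channels of a toy activation block across all n_H by n_W positions and measures how correlated the resulting pairs of numbers are.

```python
import numpy as np

# Toy activation block for one layer: shape (n_H, n_W, n_C).
# The sizes and random values here are illustrative assumptions.
rng = np.random.default_rng(0)
a = rng.standard_normal((4, 4, 5))  # n_H = 4, n_W = 4, n_C = 5

# Take the "red" and "yellow" channels and flatten them across
# all n_H * n_W spatial positions: one pair of numbers per position.
ch_red = a[:, :, 0].ravel()
ch_yellow = a[:, :, 1].ravel()

# Pearson correlation of those pairs across positions; the style
# matrix defined later uses an unnormalized version of the same idea.
corr = np.corrcoef(ch_red, ch_yellow)[0, 1]
print(corr)
```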

So why does this capture style? Let's look at an example. Here's one of the visualizations from the earlier video; this comes again from the paper by Matthew Zeiler and Rob Fergus that I referenced earlier. Let's say, for the sake of argument, that the red channel corresponds to this neuron, which is trying to figure out whether there's this little vertical texture at a particular position in the image. And let's say that the second channel, this yellow channel, corresponds to this neuron, which is vaguely looking for orange-colored patches. So what does it mean for these two channels to be highly correlated?

Well, if they're highly correlated, what that means is that whenever a part of the image has this type of subtle vertical texture, that part of the image will probably also have this orangish tint. And what does it mean for them to be uncorrelated? It means that whenever there is this vertical texture, it probably won't have that orangish tint. So the correlation tells you which of these high-level texture components tend to occur, or not occur, together in a part of an image. And the degree of correlation gives you one way of measuring how often these different high-level features, such as the vertical texture, or the orange tint, or other things as well, occur together and don't occur together in different parts of an image.

And so if we use the degree of correlation between channels as a measure of style, then what you can do is measure the degree to which, in your generated image, the first channel is correlated or uncorrelated with the second channel. That will tell you how often, in the generated image, this type of vertical texture occurs or doesn't occur together with this orangish tint. And this gives you a measure of how similar the style of the generated image is to the style of the input style image.

Â 4:25

So, let's now formalize this intuition. What you're going to do is, given an image, compute something called a style matrix, which will measure all those correlations we talked about on the last slide. More formally, let a^[l]_{i,j,k} denote the activation at position i, j, k in hidden layer l. So i indexes the height, j indexes the width, and k indexes across the different channels. In the previous slide we had five channels, so k would index across those five channels.

So, what you're going to do for the style matrix is compute a matrix called G^[l]. This is going to be an n_C by n_C dimensional matrix, so it'll be a square matrix. Remember, you have n_C channels, and so you need an n_C by n_C dimensional matrix in order to measure how correlated each pair of them is. In particular, G^[l]_{k,k'} will measure how correlated the activations in channel k are with the activations in channel k'. Here, k and k' will range from 1 to n_C, the number of channels there are in that layer.

Â 5:51

So more formally, here's how you compute G^[l]; I'm just going to write down the formula for computing one element of it, the k, k' element. This is going to be the sum over i, sum over j, of the activation at that layer at position i, j, k, times the activation at i, j, k': G^[l]_{k,k'} = sum_i sum_j a^[l]_{i,j,k} * a^[l]_{i,j,k'}.

Â 6:22

So here, remember, i and j index across the different positions in the block; they index over the height and width. So i sums from 1 to n_H, j sums from 1 to n_W, and k and k' here index over the channels, ranging from 1 to the total number of channels in that layer l. All this is doing is summing over the different positions of the image, over height and width, and multiplying together the activations of channels k and k'; that's the definition of G_{k,k'}. And you do this for every value of k and k' to compute this matrix G, also called the style matrix.
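The double sum above vectorizes neatly. Here is a minimal NumPy sketch of computing the style matrix for one layer, assuming the activations are stored as an n_H by n_W by n_C array (the function name and shapes are my own illustration, not the lecture's code):

```python
import numpy as np

def style_matrix(a):
    """Style (Gram) matrix G of an (n_H, n_W, n_C) activation block:
    G[k, k'] = sum over i, j of a[i, j, k] * a[i, j, k']."""
    n_H, n_W, n_C = a.shape
    # Unroll spatial positions: rows become channels, columns positions.
    a_unrolled = a.reshape(n_H * n_W, n_C).T   # shape (n_C, n_H * n_W)
    return a_unrolled @ a_unrolled.T           # shape (n_C, n_C)

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 4, 5))   # toy block with n_C = 5 channels
G = style_matrix(a)
print(G.shape)  # (5, 5): one entry per pair of channels
```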

Â 8:51

So now you have two matrices that capture the style of the image S and the style of the image G. And by the way, we've been using the capital letter G to denote these matrices. In linear algebra, these are also called Gram matrices, but in this video I'm just going to use the term style matrix. It's this term, Gram matrix, that motivates using G to denote these matrices.

Â 9:23

Finally, the style cost function, if you're doing this on layer l between S and G, you can now define to be just the difference between these two matrices: the squared Frobenius norm, ||G^[l](S) - G^[l](G)||_F^2. This is just the sum of squares of the element-wise differences between these two matrices. Just to write this out, this is going to be the sum over k, sum over k', of (G^[l](S)_{k,k'} - G^[l](G)_{k,k'})^2.

Â 10:26

The authors actually use, for the normalization constant, 1 over (2 n_H n_W n_C)^2, with n_H, n_W, and n_C at that layer; you could also put this term up here. But the normalization constant doesn't matter that much, because this cost is multiplied by some hyperparameter beta anyway. So, just to finish up, this is the style cost function defined using layer l. As you saw on the previous slide, this is basically the Frobenius norm of the difference between the two style matrices, computed on the image S and on the image G, squared, with an additional normalization constant which isn't that important.
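Putting the pieces together, here is a sketch of the per-layer style cost: the sum of squared element-wise differences between the two style matrices, scaled by the 1 / (2 n_H n_W n_C)^2 normalization just mentioned. The function name and shapes are illustrative assumptions, not the lecture's code:

```python
import numpy as np

def layer_style_cost(a_S, a_G):
    """Style cost for one layer l: squared Frobenius norm of the
    difference between the style matrices of S and G, scaled by
    the normalization constant 1 / (2 * n_H * n_W * n_C)**2."""
    n_H, n_W, n_C = a_S.shape

    def gram(a):
        m = a.reshape(n_H * n_W, n_C).T   # channels x positions
        return m @ m.T                    # (n_C, n_C) style matrix

    diff = gram(a_S) - gram(a_G)
    return np.sum(diff ** 2) / (2 * n_H * n_W * n_C) ** 2
```

As a sanity check, identical activation blocks give a cost of exactly zero.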

Â 11:13

And finally, it turns out that you get more visually pleasing results if you use this style cost function on multiple different layers. So the overall style cost function you can define as the sum, over all the different layers, of the style cost function for that layer, weighted by some set of additional hyperparameters, which we'll denote lambda^[l] here. What this does is allow you to use different layers in the neural network: the earlier ones, which measure relatively simple low-level features like edges, as well as some later layers, which measure high-level features. This causes the neural network to take both low-level and high-level correlations into account when computing style. And in the programming exercise, you'll gain more intuition about what might be reasonable choices for these hyperparameters lambda as well.
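The weighted sum over layers can be sketched as follows; the particular per-layer costs and weights below are placeholders for illustration, since in practice the costs would come from a ConvNet's activations at the chosen layers:

```python
def total_style_cost(layer_costs, lambdas):
    """Overall style cost: sum over layers l of lambda^[l] * J_style^[l]."""
    return sum(lam * J for lam, J in zip(lambdas, layer_costs))

# Hypothetical per-layer style costs and lambda weights.
layer_costs = [0.8, 0.3, 0.1]
lambdas = [0.5, 0.3, 0.2]
print(total_style_cost(layer_costs, lambdas))
```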
