[SOUND] The algorithms we study this week have one common property: they treat the decision process, be it a Markov decision process or something else, as a black box, or, well, almost a black box. You do account for the fact that you have to take states and produce probabilities of actions and so on, so there is this iterative structure to the process. But otherwise, the assumptions about the process are not used as heavily as we will use them later in this course. Basically, you can think of these as a black-box family of algorithms. You have this decision process, or any process, and you have all those things: actions, rewards, rewards from the whole trajectory in this process. Now, the way you think of it is as some kind of box to which you feed the parameters of your policy. Maybe the weights of a neural network which constitutes your agent's action probability distribution, or a table of probabilities for every possible state, if there is a finite amount of them. Anything you can think of. Then this box spits out the expected reward, or just the reward from one or several trajectories, averaged. Now, since we don't actually require that much from this process, you can make the next step and assume it's a black box. So you have a black box which takes a vector of weights, one input for each respective weight, and it spits out one number. You want to tune these inputs to get the output number as large as possible in expectation. And the methods basically do this very thing. Maybe not exactly black box, but it is almost so. The method we're going to start with right now, or to be more accurate, a family of methods, is the so-called evolution strategies. Now, counterintuitively, they have only a little bit to do with actual biological evolution.
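To make the black-box view concrete, here is a minimal sketch in Python. Everything here is illustrative and not from the lecture: `run_episode` is a hypothetical stand-in for one environment rollout, and the toy reward is just a noisy quadratic so the example is self-contained.

```python
import numpy as np

def run_episode(weights, rng):
    # Toy stand-in for a real environment rollout: the "reward" peaks
    # when the weight vector is close to some unknown optimum.
    optimum = np.linspace(-1.0, 1.0, len(weights))
    noise = rng.normal(scale=0.1)
    return -np.sum((weights - optimum) ** 2) + noise

def expected_reward(weights, n_trajectories=10, seed=0):
    # The black box: feed in policy parameters, get back one number,
    # the reward averaged over several sampled trajectories.
    rng = np.random.default_rng(seed)
    return np.mean([run_episode(weights, rng) for _ in range(n_trajectories)])

w = np.zeros(5)
print(expected_reward(w))  # one scalar estimate of expected reward
```

The whole optimization problem is then: tune the inputs `w` so that this single output number is as large as possible in expectation.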
Now, the idea behind them is this: the first thing you have to do is define a probability distribution over the inputs to your black box, that is, over the parameters that produce the reward. If you use a tabular policy, you have to feed it one number per particular action in a particular state, so that's the number of states times the number of actions. Minus one per state if you are being purely mathematical, since the probabilities in each state have to sum to one. And in case you are using a neural network, say 100 neurons followed by yet another 100 neurons, then you have to store, in this case, at least 100 squared numbers, which are the weights of this neural network. So what you do is define those weights via some kind of distribution, for example the fully factorized normal distribution. If you have 10,000 weights, then you have 10,000 means for those respective weights, and you have 10,000 weight-wise variances, the sigma squareds. You could of course use any other distribution: for example, not the fully factorized normal distribution but an actual multivariate normal with a full covariance matrix, instead of just the weight-wise variances.
[SOUND]
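The fully factorized distribution described above can be sketched as follows. This is an illustrative snippet, not the lecture's code: the dimensions match the 100-by-100 network example, and the variable names are my own.

```python
import numpy as np

# Fully factorized normal distribution over policy weights:
# one mean and one standard deviation per weight, so a 100x100
# weight matrix needs 10,000 means and 10,000 variances.
n_weights = 100 * 100

mu = np.zeros(n_weights)          # per-weight means
sigma = np.full(n_weights, 0.1)   # per-weight standard deviations

rng = np.random.default_rng(0)

def sample_weights():
    # Independent (factorized) Gaussians: no covariance between weights.
    # A full multivariate normal would instead need a 10,000 x 10,000
    # covariance matrix, which is the alternative mentioned above.
    return mu + sigma * rng.standard_normal(n_weights)

candidate = sample_weights()
print(candidate.shape)  # (10000,)
```

Each call to `sample_weights` draws one candidate weight vector, which you would then feed to the black box to get a reward.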