So, let's now see how those ideas compare to one another in a more strict and formal way. The Monte Carlo method relies on sampling complete trajectories: you have to obtain at least one full trajectory to even begin training. A full trajectory here means that you start in the initial state and keep iterating until you reach the terminal state. Sounds easy enough, but this adds an extra challenge, because many complicated control tasks, as well as complicated video games, require you to go through tens of thousands of time steps before you reach the terminal state. Even Atari games take thousands of iterations before they terminate. The issue is that you won't be able to start improving your policy before you spend, say, five minutes getting one sample of experience. The situation is even worse when the process is actually infinite. Say you have a robot which tries to walk forward, and there's no clear termination condition: it just gets rewarded for the amount of distance it covers.

Temporal difference methods, Q-learning and similar algorithms, don't have this problem at all. Instead, they are able to improve the policy in a meaningful way based on just one time step of this potentially infinite timeline. So you have your s, a, r, and next s, and even before you pick the next action, you can already improve your Q-function. And by, say, iteration 100 of this process, you can probably expect your algorithm to behave statistically significantly better. Now, this is not always the case, but this is the general principle behind temporal difference methods. In a way, you can consider them closer to how humans learn. Humans are capable of learning all kinds of things, like writing, reading, walking upright, and, while taking this course, all kinds of advanced machine learning, without having access to one full session of experience. Before you began learning, no one gave you a recording of your life from birth to death, because that would require you to have already died. And this is awesome in a way: this is what makes you closer to temporal difference methods, or rather what makes temporal difference methods closer to how you learn.

There's also a lot that can be said in favor of Monte Carlo methods. For example, if you have an imperfect table or function, say you've accidentally discretized your state space poorly when applying it to a particular problem, then Monte Carlo and temporal difference methods are going to produce different approximations of your Q-function. They are going to be wrong in different ways, and the way Monte Carlo is wrong is actually less biased than the way temporal difference is going to be. In the ideal case this is not that important, but remember: if you're trying to apply reinforcement learning to a practical problem, you're never in the ideal case. We'll include more information about how those two algorithms compare to one another in the reading section, as usual.
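To make the contrast concrete, here is a minimal sketch of the two update rules side by side, assuming a tabular Q-function. The state names, action set, and the values of the learning rate `ALPHA` and discount `GAMMA` are illustrative assumptions, not values from the lecture; the point is only that the TD update consumes a single (s, a, r, s') transition, while the Monte Carlo update needs a finished episode.

```python
from collections import defaultdict

# Hypothetical tabular Q-function: maps (state, action) -> value estimate.
Q = defaultdict(float)

ALPHA = 0.1    # learning rate (assumed value)
GAMMA = 0.99   # discount factor (assumed value)
ACTIONS = [0, 1]

def td_update(s, a, r, s_next):
    """One-step TD (Q-learning style) update.

    Needs only a single (s, a, r, s') transition, so learning can start
    immediately, even if the episode never terminates.
    """
    best_next = max(Q[(s_next, a_next)] for a_next in ACTIONS)
    td_target = r + GAMMA * best_next
    Q[(s, a)] += ALPHA * (td_target - Q[(s, a)])

def monte_carlo_update(episode):
    """Monte Carlo update from one *complete* trajectory.

    `episode` is a list of (state, action, reward) tuples ending at a
    terminal state; we must wait for the whole episode before updating.
    """
    G = 0.0
    for s, a, r in reversed(episode):
        G = r + GAMMA * G                      # return from this step onward
        Q[(s, a)] += ALPHA * (G - Q[(s, a)])   # move estimate toward sampled return

# Toy usage: TD can learn from a single transition...
td_update(s="start", a=0, r=1.0, s_next="mid")

# ...while Monte Carlo needs the episode to finish first.
monte_carlo_update([("start", 0, 1.0), ("mid", 1, 0.0), ("end", 0, 5.0)])
```

This also hints at the bias discussion above: the Monte Carlo target G is an actual sampled return, while the TD target bootstraps off the current, possibly imperfect, estimate of Q at the next state.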