Decision trees use a deterministic algorithm: give them the same data, and they will produce exactly the same tree each time. They also have a tendency to overfit. They will build the best tree for the data you give them, but it may not generalize well to data they haven't seen. Random forest is the fix for this.

The first part, forest, basically means adding lots of trees. But I just said the algorithm is deterministic. If you add lots and lots of identical trees, you don't achieve anything. It's like having a democracy with ten newspapers and thinking you have diversity. If they're all published by exactly the same person, who controls all the content, then really you haven't got anything more than a single newspaper, and your democracy is going to overfit on the opinions of that one publisher.

So what we do is make each tree slightly different, by training each one on only a subset of our data. We might take 90% of the rows, our training samples, and 90% of the columns, our training fields. And we do this randomly; that's the random of random forest. Each tree will have been given a different subset of the data, and so will produce slightly different results.

Then we take this forest of trees and join their answers together. For classification, that might mean taking the most common answer; for regression, we might take the mean of all their answers. Because we didn't give them all of the training data, each of those individual trees is weaker, but used together they give us a model that is more robust against data it hasn't seen before. Incidentally, we'll revisit this same idea with neural nets, where it's called dropout.

Going back to our democracy example, it's a bit like having ten newspapers published by ten different people, each with a different viewpoint on what the truth is. Some of the time, each of them is going to be wrong or publish an under-researched article.
But taken together, they give you a healthier, more robust democracy.
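The two ingredients described above, random subsampling of rows and columns plus joining the trees' answers, can be sketched in a few lines of Python. This is a minimal illustration, not a real implementation: the function names are my own, the actual tree-training step is left out, and production libraries typically bootstrap rows with replacement rather than taking a fixed 90% slice.

```python
import random
from collections import Counter

def random_subset(X, y, frac=0.9, rng=None):
    """Pick a random ~90% subset of rows and columns to train one tree on.

    X is a list of rows (each row a list of field values), y the labels.
    This per-tree randomness is the 'random' in random forest.
    """
    rng = rng or random.Random()
    n_rows, n_cols = len(X), len(X[0])
    rows = rng.sample(range(n_rows), max(1, int(frac * n_rows)))
    cols = rng.sample(range(n_cols), max(1, int(frac * n_cols)))
    X_sub = [[X[r][c] for c in cols] for r in rows]
    y_sub = [y[r] for r in rows]
    return X_sub, y_sub, cols

def forest_classify(tree_answers):
    """Classification: take the most common answer across the trees."""
    return Counter(tree_answers).most_common(1)[0][0]

def forest_regress(tree_answers):
    """Regression: take the mean of all the trees' answers."""
    return sum(tree_answers) / len(tree_answers)
```

Each tree is trained on its own `random_subset`, and at prediction time you collect one answer per tree and feed the list to `forest_classify` or `forest_regress`. For example, `forest_classify(["spam", "ham", "spam"])` returns `"spam"`, and `forest_regress([1.0, 2.0, 3.0])` returns `2.0`.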