Okay, welcome back to lecture two of our course. So far in this course we have retrieved data from FRED, the database maintained by the Federal Reserve Bank of St. Louis, and we have calculated one-day log returns. We are now going to start looking at the distribution of these returns. We start with the normal distribution, also known as the bell curve. The normal distribution is probably the best-known distribution in statistics. You might ask, why is this the case? One reason is that the normal distribution is completely defined by two parameters: the mean and the standard deviation. These parameters are very easy to estimate. Another reason is that, under appropriate conditions, the central limit theorem says that the sum of a large number of independent random numbers can be approximated by a normal distribution, even if the random numbers themselves did not come from a normal distribution. This is a very useful result, and it is the reason why many types of data can be well approximated by the normal distribution. But let us not get too carried away. The central limit theorem does not apply in every circumstance, so we need to be careful not to assume that all data are normally distributed.
The reason why we're interested in the distribution of returns is this. Remember we have two key concepts of risk: value-at-risk and expected shortfall. Both are related to the shape of the return distribution. We will be more specific in a few slides. The point I want to make here is this. If returns are normally distributed, value-at-risk and expected shortfall are easily calculated. In fact, there are explicit formulas for them, and I will show you that later on. But if the returns are not normally distributed, then VaR and expected shortfall require more effort to calculate. So let's start by reviewing the standard normal distribution, which is the normal distribution with a mean of zero and a standard deviation of one.
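Before moving on, the central limit theorem mentioned above can be illustrated with a small simulation. This is only a sketch, not part of the lecture's own code: the choice of uniform random numbers, the seed, and the names n_terms, n_sums, draws, sums and z are all assumptions made for illustration.

```r
# Illustration of the central limit theorem using uniform random numbers.
# All names and sizes below are assumptions chosen for this sketch.
set.seed(123)        # arbitrary seed, only for reproducibility
n_terms <- 30        # how many random numbers are added up in each sum
n_sums  <- 10000     # how many sums we simulate

# Each column holds n_terms uniform draws; colSums gives one sum per column.
draws <- matrix(runif(n_terms * n_sums), nrow = n_terms)
sums  <- colSums(draws)

# Standardize with the theoretical mean and standard deviation of the sum:
# a single U(0,1) draw has mean 1/2 and variance 1/12.
z <- (sums - n_terms / 2) / sqrt(n_terms / 12)

# Compare the histogram of the standardized sums with the standard normal density.
hist(z, breaks = 50, freq = FALSE, main = "Standardized sums of uniforms")
curve(dnorm(x), add = TRUE, lwd = 2)
```

The individual draws are uniform, which looks nothing like a bell curve, yet the histogram of the standardized sums should sit close to the standard normal density.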
Now the probability density function of the standard normal distribution is given by the function to the right of f of x on this slide. There is no need to memorize this function. It is here just because we will want to refer to it later on in this course. We're going to use a shorthand notation, N(0, 1), to refer to the standard normal distribution later on in these slides. The little wiggle sign in front of the N, the tilde, means "distributed as". N, of course, refers to the normal distribution, zero refers to its mean, and one refers to its standard deviation. So the whole shorthand notation reads like this: distributed as a normal distribution with zero mean and a standard deviation of one.
Now it is easy to obtain a general normal distribution with an arbitrary mean of mu and an arbitrary standard deviation of sigma from the standard normal itself. To do so, let's start with a variable that I will call epsilon. Here I'm using the shorthand notation, which says epsilon is distributed as a normal distribution with mean zero and standard deviation one. Now we create a new variable x by first multiplying epsilon by a positive number sigma, and then adding mu to the product. We know from statistics that x is also normally distributed. But now its mean is mu instead of zero, and its standard deviation is sigma instead of one.
Most of the time we do not know the true values of mu and sigma. We must estimate them from data. So let's see how to do this using the log returns of the Wilshire 5000 index. For now we're going to assume that the log returns are normally distributed, with some mean mu and standard deviation sigma. We just don't know the true values of mu and sigma. However, we can estimate mu using the sample mean, and we can estimate sigma using the sample standard deviation. In fact, in R, it is very easy to do.
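Before we turn to the data, here is a rough check in R of the two ideas above: the standard normal density, and the construction of x from epsilon. It is a sketch under stated assumptions. The values mu_true and sigma_true are hypothetical numbers picked for illustration, and the names f, grid, max_diff, eps and x are not from the lecture.

```r
# Standard normal density written out explicitly:
#   f(x) = exp(-x^2 / 2) / sqrt(2 * pi)
f <- function(x) exp(-x^2 / 2) / sqrt(2 * pi)

# It should agree with R's built-in dnorm() on a grid of points.
grid     <- seq(-4, 4, by = 0.5)
max_diff <- max(abs(f(grid) - dnorm(grid)))

# Build a general normal variable from a standard normal one.
set.seed(1)              # arbitrary seed, only for reproducibility
mu_true    <- 0.001      # hypothetical mean
sigma_true <- 0.02       # hypothetical standard deviation (must be positive)
eps <- rnorm(100000)               # eps ~ N(0, 1)
x   <- mu_true + sigma_true * eps  # x should have mean mu_true, sd sigma_true

mean(x)   # should be close to mu_true
sd(x)     # should be close to sigma_true
```

The sample mean and sample standard deviation of x will not match mu_true and sigma_true exactly, because they are estimates from a finite sample. That is the same situation we face with real return data.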
The mean function calculates the mean of a data series, and the sd function calculates the standard deviation of a data series. So for the one-day log returns of the Wilshire 5000 index, if we do this in R we will find that the mean is 0.0043575 and the standard deviation is 0.01072056. You will want to keep track of these two estimated parameters because we're going to use them later on. So we call the estimated mean mu, spelled M-U, and the estimated standard deviation sig, spelled S-I-G. We can now graph this normal distribution. The graph is the density function of the normal distribution with the estimated mean mu and the estimated standard deviation sig. Now it is your turn to calculate the mean and the standard deviation of the data that you downloaded from FRED in the previous exercises.
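A minimal sketch of that calculation in R might look like the following. The variable name logret is an assumption, and so that the block runs on its own it is filled here with simulated placeholder numbers, not real returns. Replace that line with the log returns you computed from your FRED download. The names mu and sig follow the lecture.

```r
# Placeholder data: simulated numbers standing in for one-day log returns.
# This is NOT real index data; swap in your own log-return series.
set.seed(42)                                     # arbitrary seed
logret <- rnorm(5000, mean = 0.0005, sd = 0.01)  # hypothetical stand-in

# Estimate the two parameters of the normal distribution.
mu  <- mean(logret)   # sample mean, the estimate of mu
sig <- sd(logret)     # sample standard deviation, the estimate of sigma

# Graph the normal density with the estimated mean and standard deviation.
curve(dnorm(x, mean = mu, sd = sig),
      from = mu - 4 * sig, to = mu + 4 * sig,
      xlab = "one-day log return", ylab = "density")
```

If your series contains missing values, mean and sd will return NA unless you pass na.rm = TRUE or remove the missing observations first.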