All right, welcome back. Today we start on the really fun stuff of portfolio construction, and that is the efficient frontier. So let's get started.
Let's start by pulling in a dataset that we haven't actually seen before. Let me quickly show you where that dataset lives. It's here, and the one we want is this one, ind30_m_vw_rets.csv. Let's take a look at it. This is from the Ken French Research Data website, and it's a file that goes from 1926 up to the present day. It is the monthly returns of 30 different industry portfolios. So here is how it works. There are 31 columns: 30 columns corresponding to the industries, and then this column here is the date. So 192607 is July of 1926, and it goes all the way down to 2018.
Now, a couple of things I want you to notice. One is that when the return says 2.59, that's a 2.59 percent return. I would have expected it to be 0.0259, but it's not, it's 2.59. The other is this funky date format: 192608, 192609 looks like an integer, but it really is a date. So those are the things we've got to work around at our end when we read it in.
So let's do that. Let me close this, we don't really need to be seeing it. Import pandas as pd, and let's start reading it with pd.read_csv, and let's call the result ind, I-N-D. So what are we going to do? The name of the file is ind30_m_vw_rets.csv, that's the one we want. We know that the header is in row zero, and we know there's an index column and that's column zero. We want to try and parse dates, so let's try that. We know that those numbers have to be divided by 100, so why don't we just do that right now.
So let's take a look at what we got with ind.head. That looks good, but there are a couple of problems I can see right away. The index got pulled in as integers, not as dates. We can confirm that by looking at ind.index. If you look at it, it's an Int64Index, not at all what we want. Well, let's fix these things one by one. So I can reset the index here.
So ind.index is assigned pd.to_datetime of ind.index. We've already done this kind of thing before, and the format is percent Y, percent m, so "%Y%m". That will give us a timestamp, and as we've already been through with the data we've been working with, we want to convert it to a period of a month with to_period. So "M", the code that stands for month end. All right. So let's try that again, and if you look at ind.head this time, hopefully we've got nicer looking data there. Yes, that looks better, it's definitely a date and we're in good shape.
There's one more really nasty thing about this which I want to point out. It's not at all evident when you look at it here, but I might as well save you some time by pointing it out. Let's look at the columns here. Do you see a problem? Well, it's easy to miss, but look at the name of that column here: that is Food with a trailing space. That is Fin there with trailing spaces. We want to get rid of that, because if I say, for example, ind["Food "].shape, with that embedded space, that looks fine, it's 1,110 rows. But if I did the more obvious thing, ind["Food"].shape, it would give me this KeyError. What it's saying is you don't have a column called Food. Well yeah, the column is called Food-space, which is no good to me.
So I'm going to fix that, which is a very easy thing to do. I'm going to rewrite ind.columns, starting from the good old ind.columns that we always had. What I want to do is apply a string transformation on it, because these are strings, that just strips out all the surrounding spaces. The way you do that on any Series or Index of strings is through an attribute called str, and hanging off that attribute are a whole bunch of string methods. The one that we want is called strip. So you do that, and now if I look at ind.columns, it looks pretty good.
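The clean-up steps so far (rescale the percentages, convert the integer-looking index to monthly periods, strip the column names) can be sketched as below. This is only an illustration under stated assumptions: the three-row CSV is made up to mimic the quirks of ind30_m_vw_rets.csv, and the function name load_industry_returns is hypothetical. The index is cast to strings before parsing, a small safety step that is not in the lecture.

```python
import io
import pandas as pd

# Made-up stand-in for the real file: same quirks (YYYYMM integers as dates,
# returns quoted in percent, column names padded with trailing spaces).
sample_csv = io.StringIO(
    ",Food ,Beer ,Fin  \n"
    "192607,0.56,-5.19,1.29\n"
    "192608,2.59,27.03,4.47\n"
    "192609,1.16,4.02,-1.61\n"
)

def load_industry_returns(source):
    """Hypothetical helper: read the CSV and apply the three fixes."""
    # Header in row 0, index in column 0, percent -> decimal returns
    ind = pd.read_csv(source, header=0, index_col=0) / 100
    # 192607 -> "192607" -> Timestamp 1926-07-01 -> monthly Period 1926-07
    ind.index = pd.to_datetime(ind.index.astype(str), format="%Y%m").to_period("M")
    # Strip leading/trailing whitespace from the column labels
    ind.columns = ind.columns.str.strip()
    return ind

ind = load_industry_returns(sample_csv)
print(ind.index)
print(list(ind.columns))
print(ind["Food"].shape)  # works now that the label is "Food", not "Food "
```

With the real file you would pass its path instead of the in-memory sample.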
Just to be paranoid, I am going to look at ind.shape to make sure that I got all the columns and everything I wanted, and that looks good too.
I am going to have to do this a lot, because we're going to use this dataset a lot. So why don't I take all those commands we just entered and put them in our file, the risk kit. Let's go down here. It should go right next to the hfi returns, the hedge fund returns; let's add one to get the industry returns. So what does the industry returns function look like? It's very similar to what we had before, and it's exactly what I just typed in: you get the CSV file's contents and divide by 100, you fix the index because it needs to get converted from that integer index to a date format, and then you strip the columns so that they have proper names.
So we're done with that. Now let's do our usual stuff, which is all that load stuff that we've done before. We're going to do some plotting again, so you might as well do the %matplotlib inline; even though it's not clear that you need it, there's no harm if you do. Of course, we have to import edhec_risk_kit as erk. Now let's use the code we just wrote and make sure that we get the right thing. So ind is assigned erk.get_ind_returns, that's the one we want. Let's look at the shape to make sure we're good. Let's look at ind.head. Good, all that prep work is done.
Now let's see if we can do something interesting with this data. One thing that we already have code for is computing drawdowns. Remember that? We wrote the code for it. So why don't we try computing the drawdowns for, let's say, Food. erk.drawdown, if you remember, was the name of the function that we wrote. What is the return series we're going to give it as input? We're going to give it the Food return series.
If you remember, that is going to give you back various series packaged in a DataFrame, and the one that I want to look at for now is called Drawdown, and I'm going to plot that. Just to make the point that I can do it this way, why don't we do it this way? Okay, beautiful. Let me make this a little easier to read by setting figsize, which I don't think I've done before. With figsize you give it the size of the figure that you want to plot. So let's do (12, 6), something like that. There you go. Much nicer, much wider, looks great. So all that stuff works. We are very effortlessly able to compute drawdowns and we're in good shape.
So now let's look at other things that we can do with this return series, and make sure it works with all our code. Let's try something like erk.var_gaussian and compute the VaR of these things. Let's not do it for everything, let's just do it for Food, maybe Smoke, let's say Coal, and why not Beer? And we couldn't forget our own industry, Fin, finance. I'm interested in modified=True, so I want the Cornish-Fisher VaR.
By the way, if you're a little puzzled by the double square brackets and wondering why that happened, maybe it's easier to think of it this way. The inner part is a list. So you could assign that list to cols_of_interest and then write ind[cols_of_interest], same thing. It's just that sometimes the double square brackets confuse people. The outer square bracket says I'm indexing into the ind variable, and the inner one is a list.
Great. So there you go, we're looking at the VaR. You can see that for Beer the value at risk is not that bad. Tobacco, pretty bad. In fact, why don't we do it for everything. We want to look at all of those, so let's call sort_values and then tail. The reason I said tail is that sort_values sorts in increasing order. So you can see, well, Mines, as in real mines, they're investing in mines.
Games, these all have very, very high values at risk. That's, by the way, 10 percent per month. If you look at the lowest values at risk, you can see, well, Beer, Coal, boring stuff. Good. So what we've been able to show is that we can use our old code on the new data. Let's perhaps plot that just to make sure that we're able to. So again, instead of just sorting the values and printing them out, let's sort the values and then plot them as a bar chart and see how that works. Wonderful. So again, we see Beer with the lowest VaR, and Mines with lots of value at risk. The good news is that we've been able to read that dataset and run our code on it.
Now let's work on computing some statistics for it. What we want is some basic statistics, I would say returns, volatility, Sharpe ratio. This is so routine and so simple that I'm just going to type it right in here and get it out of the way. Let me quickly show you this code, I don't want to waste too much time on it.
Here's the annualized volatility. That's very simple, and we've already talked a lot about how it's done: given a set of returns, compute the standard deviation and then scale it by the square root of the number of periods in a year.
Annualized returns are also pretty straightforward. Let me clear that up. You have a set of returns which come at a certain number of periods per year, and you want to annualize them. Well, compound them, count the number of periods that you have, and then raise the compounded growth to the power of the number of periods per year divided by the total number of periods, and subtract 1. So if it's monthly, that's to the power of 12 divided by the total number of periods, minus 1. This math should not be complicated to you, it's just simple compounding. But play around with it if you are not comfortable with it.
Then the Sharpe ratio is quite simple. You basically compute the excess return, the annualized excess return.
Then you compute the annualized volatility, and you divide the annualized excess return by the annualized volatility, and you've got your Sharpe ratio. So I'm going to save that.
Let's look at the Sharpe ratios for these things. So let's go back and look at this. Instead of the var_gaussian, I'm going to compute the Sharpe ratios. So what would you do? We want the Sharpe ratio of the industry portfolios, and let's assume the risk-free rate is, call it, three percent. This is monthly data, so there are 12 periods per year. You want to sort the values and plot them as a bar chart, and why don't we title this, just for fun, Industry Sharpe Ratios from 1926-2018. Why don't we change the color to, let's call it goldenrod. That's nice. So you can see Coal has a pretty mediocre Sharpe ratio over the entire period, and Food has been great. Food, Smoke, and health care have been the sectors that, over this very long period of time, have provided outstanding Sharpe ratios. Okay, good. So all of that is done: we've got our data read in, we've been able to plot it, we've been able to analyze it, and all of that kind of stuff.
If we wanted to do this for a shorter period, what would you do? Let's say we wanted to do it from 2000 onwards. Let's change that figsize to (12, 5), that's the width and the height. The color I'm going to keep as goldenrod, lovely color. What do we have to do to get the returns from 2000-2018? Well, let's see if this works, which is to slice from the year 2000 onwards. And there you go. So you see now you have a bunch of negative Sharpe ratios, which are not good: those portfolios returned less than the risk-free rate. Books does not seem to be a good place to be. Smoke, tobacco, always a good place to be.
All right. So I just wanted to give you an excuse to play around with this stuff and follow along. We can now jump right into the real meat of the stuff.
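As a recap of the three statistics described above, here is a minimal sketch. The function names annualize_vol, annualize_rets, and sharpe_ratio are assumed names for the helpers typed into the risk kit, and converting the annual risk-free rate to a per-period rate before subtracting it is one reasonable choice, not something the lecture spells out. The demo data is randomly generated, not the industry file.

```python
import numpy as np
import pandas as pd

def annualize_vol(r, periods_per_year):
    """Standard deviation scaled by the square root of periods per year."""
    return r.std() * np.sqrt(periods_per_year)

def annualize_rets(r, periods_per_year):
    """Compound the returns, then rescale the growth to a one-year horizon."""
    compounded_growth = (1 + r).prod()
    n_periods = r.shape[0]
    return compounded_growth ** (periods_per_year / n_periods) - 1

def sharpe_ratio(r, riskfree_rate, periods_per_year):
    """Annualized excess return divided by annualized volatility."""
    # Assumption: the annual risk-free rate is converted to a per-period rate
    rf_per_period = (1 + riskfree_rate) ** (1 / periods_per_year) - 1
    excess_ret = r - rf_per_period
    return annualize_rets(excess_ret, periods_per_year) / annualize_vol(r, periods_per_year)

# Synthetic monthly returns for two made-up portfolios (illustration only)
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(0.01, 0.04, size=(120, 2)), columns=["A", "B"])

sharpe = sharpe_ratio(demo, riskfree_rate=0.03, periods_per_year=12).sort_values()
print(sharpe)
```

On the real data, the same call on ind followed by .plot.bar(title=..., figsize=(12, 5), color="goldenrod") would give a chart like the one described, and slicing ind from "2000" onwards first restricts it to the shorter period.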
Computing the efficient frontier involves what? We need two sets of things to be able to compute it: a set of expected returns, and a covariance matrix. This is what we did in class. We said that once we have the correlations and volatilities, which are basically embedded in the covariance matrix, and we have the expected returns, we can generate the efficient frontier. So let's do these one after the other.
Now, what are we going to do for the expected returns? The truth of the matter is, I have no idea what the returns are going to be. This is the core of the problem: we are interested in the expected returns over the next period that I'm going to invest in, and it's very, very difficult for me to answer the question of what those expected returns are. For now, we can think of this as an in-sample exercise. When I say in-sample, I mean we can go back and ask, what was the efficient frontier? So I can sidestep, for now, the question of where you are going to get these expected returns from. We're not going to think about this as a forecasting exercise. In other words, all we're going to try and do is see what the efficient frontier was over a given period.
So one way of doing it is to say, look, I'm going to study the period from 1995-2000, and I'm going to compute the actual returns that we did obtain over that period. We already have a way to do that, because we wrote the function that annualizes returns, and I'm going to apply it to the industry portfolios from 1995-2000. If you press Shift-Tab, it'll tell you exactly what the signature is. You need the periods per year, and this is monthly data, so I'm going to pass 12. So now I'm going to call them expected returns, but really these were the real returns that happened during 1995-2000. Just to make sure that we have the values, let's look at them.
It's always good to get into the habit of looking at the values that you compute, so let's plot a bar chart. That's not going to work, is it? There's your bar chart, and you see that some of these had negative returns and some had positive returns. We have some answers, we have some returns now.
The next step is to generate a covariance matrix. Once we've got the expected returns and the covariance matrix, we will be able to start the real work of generating that efficient frontier. So let's go ahead and do that now. I'm going to say cov, that's the covariance matrix, is what? The way I generate the covariance matrix is by taking the set of returns that we already have, which is 1995-2000, and calling the cov method on it. If I look at cov.shape, I get a 30 by 30 covariance matrix. I can look at it if you want. It's a little hard to see sometimes because it's scrolling off the page, since I have 30 columns and 30 rows, and this is the covariance matrix.
So just a quick recap, if you haven't seen what a covariance matrix looks like, because you need to internalize this. It's a square matrix, with as many rows and columns as you have assets. It's symmetric, because the covariance between Food and Beer is the same as the covariance between Beer and Food. So 0.002077 here, 0.002077 there; it is symmetric about its diagonal. Speaking of the diagonal, what are the diagonal entries? They are the covariance between Food and Food, between Beer and Beer, between Smoke and Smoke. What is the covariance of a return series with itself? Well, it's just the variance. That's why they sometimes call it the variance-covariance matrix: each off-diagonal entry is a covariance between two different assets, and the diagonal is nothing more than the variance of each asset itself. Okay.
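The two inputs just described, in-sample "expected" returns and the covariance matrix, together with the properties from the recap, can be checked on a toy example. This is a sketch under assumptions: the three columns of returns are randomly generated rather than taken from the industry file, and the annualization is written inline instead of calling the risk kit.

```python
import numpy as np
import pandas as pd

# Synthetic monthly returns, 1994-01 through 2001-12 (illustration only)
rng = np.random.default_rng(42)
idx = pd.period_range("1994-01", periods=96, freq="M")
rets = pd.DataFrame(rng.normal(0.01, 0.05, size=(96, 3)),
                    index=idx, columns=["Food", "Beer", "Smoke"])

# In-sample window: every month from 1995 through the end of 2000
window = rets.loc["1995":"2000"]

# Annualized realized returns over the window, used as "expected" returns
er = (1 + window).prod() ** (12 / window.shape[0]) - 1

# Sample covariance matrix of the monthly returns
cov = window.cov()

print(er)
print(cov.shape)                                              # one row and column per asset
print(np.allclose(cov.values, cov.values.T))                  # symmetric
print(np.allclose(np.diag(cov.values), window.var().values))  # diagonal = variances
```

Both cov and var use the same sample normalization by default, which is why the diagonal matches the per-asset variances exactly.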
So that's sort of all we need right now, and we will continue next time around to actually use this to see what the efficient frontier looks like. I would encourage you to play around with this data. We're going to be working with this data a lot, so take the time to get to know it a little bit, and it's a real rich data set and it's fun to work with. I will see you at the next class.