In this section, we're going to consider linear regression analysis. Now, regression is a huge topic and we could spend a whole course just examining it. Regression is really one of the most frequently used statistical techniques out there. But nonetheless, we can relate the basics of regression to much of the work we've covered throughout this MOOC. So before we get into the technicalities of regression, let's just backtrack a little bit to high school mathematics. I'm sure you all remember basic algebra where you were given the mathematical equation for a line. For example, y = a + bx, where a represents the y-intercept, i.e., the value of y when x is equal to zero, and b is the slope or gradient of the line. So if b was positive, we would have an upward sloping line. If b was negative, a downward sloping line. So just continuing with this mathematical world for a moment, think about a and b as parameters because if you know numerically the values of a and b, you know which specific line you are dealing with. So let's put some numbers to a and b. Suppose a is equal to two and b is equal to four. So now, y = 2 + 4x. So if we choose any value of x that we like, we can substitute it in and determine the value of y. For example, if x is equal to three, y would be equal to 14. If x was equal to 10, y would be equal to 42. So you can see the deterministic relationship: if we know x and we know a and b, we know exactly what y would be equal to. Now, that's the mathematical world, where things are, indeed, deterministic, and there's no uncertainty about the value of y given the a, b and x values. But in statistics, we may want to try and model linear relationships between variables. Think back to our work on data presentation and data visualization.
We saw a scatter diagram showing the control of corruption on the x-axis and the GDP per capita on the y-axis, and we postulated that a line of some kind could go through those points, although it wasn't perhaps a perfect linear relationship. So I'd now like us to turn our mathematical model into a statistical model. So we begin with y = a + bx. We viewed a and b as parameters. Of course, we've covered a few parameters within this MOOC, and each time, we've tended to use Greek letters to represent parameters. So now, I'm going to change the a and b into alpha and beta, effectively the Greek equivalent letters, giving y = alpha + beta x. But this is still a deterministic relationship, i.e., a perfect linear relationship between y and x. Typically, in the real world, we tend not to find these perfect linear relationships, but we might find approximately linear relationships. So now, let's convert this deterministic model into a stochastic or random model by introducing an error term which we're going to denote by the Greek letter epsilon. So our model now is y = alpha + beta x + epsilon. Now, epsilon, although it's also a Greek letter, does not represent a parameter here but represents a random variable. Think of this as capturing the approximation to linearity: because the relationship between y and x is not exact, we need to introduce this random component. Now, as it's a random variable, we may seek to make some assumptions about epsilon. Indeed, in the standard regression model, we tend to assume that the error term follows a normal distribution. So we've covered the normal distribution together, and here, we now see it playing the role of a distributional assumption within a wider statistical model. So in the generic setting of the linear regression model, we would tend to say epsilon follows a normal distribution with a mean of zero. So these error terms, on average, are equal to zero. Sometimes they're positive. Sometimes they're negative.
But on average, they cancel each other out. There will, though, be some variation in them, i.e., a variance, which we will generically denote by sigma squared. So now, we've introduced our simple linear regression model. I'd like us to consider a very famous financial application of this. Now, I'm betting many of you who are listening to this will own shares, maybe directly, in that you actively invest in the financial markets. But even if you don't, you may still be a shareholder. If, for example, you've started to save for a pension, those monthly pension contributions that you make will be handed over to some fund manager who no doubt will invest this money in various stocks, bonds and maybe some other financial assets as well. So many of us will be stockholders either explicitly or implicitly. So it might be interesting to discover whether the stocks your money is invested in are stocks which suit your personal appetite for risk. So in this simple linear regression model, what might the y and x represent? Well, now I'd like to consider y as being the returns, let's say the daily returns, on a specific stock, and x as the daily returns on the underlying stock market index. So what might this market index be? Well, it will depend on where the particular stock is listed. If it was listed on the London Stock Exchange, the appropriate index might be the FTSE 100. If it was listed in the US, it might be the S&P 500, for example. So why might we want to work with returns rather than stock prices directly? Well, think about how we would calculate a return on an asset: the return would be the current price minus the previous price, divided by the previous price. Now, if these price changes are very small, we could approximate this as being the log of the current price over the previous price.
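To make the return calculation just described concrete, here is a minimal sketch in Python. The function names and the price series are hypothetical, invented purely for illustration. It computes the return as the price change divided by the previous price, alongside the log of the price ratio, so you can see how close the two are when price changes are small.

```python
import math


def simple_return(prev_price, curr_price):
    """Return as described above: (current - previous) / previous."""
    return (curr_price - prev_price) / prev_price


def log_return(prev_price, curr_price):
    """Log approximation: natural log of current price over previous price."""
    return math.log(curr_price / prev_price)


# Hypothetical daily closing prices, made up for illustration only.
prices = [100.0, 100.5, 99.8, 101.2]

# Pair each price with the one that follows it and compare the two measures.
for prev, curr in zip(prices, prices[1:]):
    r = simple_return(prev, curr)
    lr = log_return(prev, curr)
    print(f"{prev:.1f} -> {curr:.1f}: simple = {r:.5f}, log = {lr:.5f}")
```

For small daily moves like these, the two numbers agree to several decimal places. For a large move, say a price going from 100 to 150, they differ noticeably, which is why the approximation is stated only for small price changes.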
So the reason we tend to work with returns in finance, rather than the prices themselves, is that the price of a particular stock will tend to exhibit some memory, i.e., if you have a high stock price today, it's quite likely to still be quite high tomorrow. Or if the stock price was quite low today, it's quite likely to still be quite low tomorrow. And, indeed, when we are dealing with financial or, indeed, economic time series data, there's often this memory in the process. If a country experiences high unemployment today, well, unemployment is still likely to be quite high in the next period. So to get around any so-called autocorrelations, which exist within much time series data, we tend to work with returns rather than the prices themselves. And if we work with returns, this often delivers us what we might consider to be a sequence of uncorrelated random variables, which is a useful attribute to have when dealing with a simple linear regression as we are about to do. So we're looking at returns. So if we focus now on the beta, remember, in that original mathematical equation y = a + bx, the b was the slope, the gradient, the sensitivity of y to changes in x. So now, viewed in this simple asset pricing model from finance, we can think of the beta as the sensitivity of a particular stock to changes in the underlying market index. Now, this is a very famous asset pricing model in finance referred to as the capital asset pricing model, CAPM for short. All stocks out there will have their own beta, and this is a very simple metric to quantify the risk associated with any individual stock. So let's consider a few possible values of beta, and if we know the value of beta, we can then assess what kind of stock we have in terms of its risk profile. So if a stock had a beta of one, that would mean that it moves one for one with any movements in the underlying market index. So let's take the FTSE 100 as our reference market index.
If the FTSE 100 went up by one percent, then the return on our stock would also go up by one percent. And likewise, if the market went down by one percent, the return on our stock would go down by one percent. So this stock would track the market. If the beta of a stock was greater than one, that means any movements in the underlying market index are going to be amplified in the returns of our stock. So if the market index goes up by one percent and beta, let's say, was two, the return on our stock would go up by two percent. Now, this may sound wonderful, but of course, markets can go down as well as up. So when the market index goes down by one percent, the return on our stock goes down by two percent. So you see much greater volatility in the returns of a stock with a high beta. If the beta was less than one, let's say 0.5, then of course, any movements in the market index are dampened when passed through to the individual stock. The market goes up by one percent, the return on our stock only goes up by half a percent. Similarly, if it goes down by one percent, the return on our stock only goes down by half a percent. So stocks with betas greater than one, we would think of as risky stocks, and the larger the value of beta, the riskier the stock itself is. So if you are a risk-loving investor, of course, you will love high beta stocks. True, you may potentially lose a lot of money, but you have to be exposed to that risk to have the opportunity to make lots of money. Of course, risk-averse investors will tend to choose low beta stocks, the defensive stocks, because they dislike running the risk of losing a large sum of money. So if you are a shareholder, directly or indirectly, perhaps a useful exercise is to do some internet research about the betas of the various stocks you hold, and hence you can judge whether your portfolio of stocks adequately reflects your personal attitude to risk. And perhaps, just one final comment.
Some of you may have come across these stock-picking contests, where you have, say, a three or six-month period, and the person globally who picks a set of stocks which performs better than anyone else's wins some prize. Well, I've had students come up to me before asking what they should do in those stock-picking contests, and my answer is very straightforward. To win, you have to make more money than anyone else. So in order to do that, don't pick a portfolio of stocks, because there you're going to be diversifying away the risk. You want to identify the riskiest stock you can, and simply choose to invest in that single stock. Yes, this is a risky strategy. You may lose a lot of money, but taking on that risk is a necessary condition for having any hope of achieving a very high return. Of course, knowing which of those risky stocks is going to be the one which performs very strongly, well, that's decision making under uncertainty. I don't have a crystal ball, but I do know I will need to be exposed to risk to have any chance of getting a very good return. So if you try and play it safe in these stock-picking contests and have a portfolio of various stocks, you're perhaps unlikely to be the worst performing player in this game, but you're also not going to be the best performing player in the game. So mean and variance: two very important statistical concepts, and really, decision making, certainly as far as financial investments are concerned, is going to draw on these concepts very heavily indeed.
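To tie the regression model from earlier in this section to something you can run, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the returns are simulated from y = alpha + beta x + epsilon with normally distributed errors, and the numerical values (market volatility of 0.01, sigma of 0.005, the seed) are hypothetical. The least-squares formula used to recover the slope is a standard technique that this section has not covered, so treat it as a preview and not as the method of this lecture.

```python
import random


def simulate_stock_returns(market, alpha, beta, sigma, rng):
    """Generate stock returns from y = alpha + beta * x + epsilon,
    where epsilon is drawn from a normal distribution with mean 0
    and standard deviation sigma."""
    return [alpha + beta * m + rng.gauss(0.0, sigma) for m in market]


def ols_fit(x, y):
    """Least-squares estimates of the intercept and slope of a line.
    The slope is the sum of cross-deviations of x and y divided by
    the sum of squared deviations of x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta_hat = sxy / sxx
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat


# Hypothetical simulated daily market returns (not real data).
rng = random.Random(42)
market = [rng.gauss(0.0, 0.01) for _ in range(2000)]

# A defensive stock, a market-tracking stock, and a risky stock.
for true_beta in (0.5, 1.0, 2.0):
    stock = simulate_stock_returns(market, alpha=0.0, beta=true_beta,
                                   sigma=0.005, rng=rng)
    alpha_hat, beta_hat = ols_fit(market, stock)
    print(f"true beta = {true_beta:.1f}, estimated beta = {beta_hat:.3f}")
```

The estimated betas should land close to, but not exactly on, the true values, because the error term adds noise around the line. With no error term at all, as in the y = 2 + 4x example, the fit recovers the intercept and slope exactly.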