All right.

So as I'd mentioned at the start, we're going to focus on using regression

as the technique for building up our forecasting models.

And we've seen a little bit of this already with the autoregressive models.

So what we have to make a decision for is what's going to go in the model itself?

Do I need to include a trend component?

What is the nature of that trend component look like?

Is it going to be a linear trend?

Is it going to be a trend where we see increasing

or decreasing returns as we go further out in time?

We may want to include a cyclical component, taking into account macrostate of economy.

We may want to take into account seasonal factors, you know,

for example, if we're dealing with seasonal demand for products,

perhaps that needs to come into our analysis.

If we've got marketing activity,

if we have information as far as the amount of marketing spend

that's been used in each of our past time periods,

that's something that we can bring into the analysis.

When we get into looking at store-level or individual-level predictions,

the regression analysis,

we can even include factors to take into account that heterogeneity,

that exists across stores, across regions, across individuals.

So that's the big advantage with the regression framework,

is we get a lot more control over what's going in that right-hand side of the equation.

Now in terms of assumptions that are being made,

we're typically going to be putting these in as additive terms,

and that's something that you can play around with

as far as changing the form of the regression model.

I'm going to demonstrate it using linear regression today,

but there's nothing stopping us from building a multiplicative model to change the way

in which these components enter into that prediction.

So, let's go back to the data that we had started with,

where we had said that there seems to be a positive trend here, in the data.

And so we might say, well, let's test out a model that includes a linear trend.

And what I want to emphasize here is the idea

of let's start with the simplest model possible,

because when we're interested in forecasting,

our goal is not to come up with the model that best fits the data

that we're going to use during what's called the calibration period.

You know, the calibration period is the data that's used to construct the model.

The holdout or the forecasting period,

that's where we want to make sure that the model is capturing what's going on,

so that it's going to be useful for the future.

So we're gonna take the data that's available to us.

We're gonna divide it into those two components: calibration period, forecasting period.

We're gonna build the model on the calibration data

and then apply it to the forecasting data.

So the forecasting data has never been used in terms of estimating coefficients.

Once we come up with a modeling framework that works,

we can then apply it to the next period of time

that hasn't happened yet and make our predictions.

And so again, in this case, looks like we've got a linear trend.

We've got actually three years of data in here, we're going 2008 through 2010.

So what we're gonna do is we're gonna use the earlier part

of this time period to calibrate the model,

and then we're gonna see how well the model does at predicting the final year in the data.

So this dataset is already up on the course website, that you can work with it;

but the model that we're going to run, simple linear regression,

it's going to be our y variable, in this case, the weekly demand.

There's going to be an intercept that we're going to estimate,

and then there's going to be a slope that we're going to estimate,

and the only predictor that we're going to include is the week number.

So week one, number one; week two, number two; week three, number three; and so forth.

So we're just assuming that there's a linear trend.

Now there are two different ways that we can go about doing this.

One is to use the regression tool

that's built into the Data Analysis ToolPak within Excel.

The other way that you can do this is to use the commands

that you type directly into the cells, the =intercept command and =slope command.

So these two commands have one particular advantage over using the Data Analysis ToolPak.

Because these commands are going to stay active.

Because these formulas you're not going to be typing in fixed values.

Anytime that you update the data set, if you've used the intercept and slope commands,

if you change the y values, the x values, it's automatically going to recalculate

and re-estimate the intercept and slope coefficients for you.

So if you're building a worksheet that's going to be used,

let's say, every quarter to make your forecast,

you want to keep as much active in that spreadsheet as possible.

If you're running the Data Analysis ToolPak's regression tool,

every single time when new data comes in, you've got to run that analysis tool again.

Whereas, if we were to build it using the formulas,

new data comes in, I update the data, I feeded it,

I import the data into the appropriate cells.

If I've used these formulas, I don't have to retype anything.

I don't have to run anything new.

It will just automatically update once the new data's there.

So what does this look like if we were to use the first two years of data as calibration,

and that's the dark black line,

and then try to predict what happens in the future?

And those forecasts are represented by the dashed, black line.

And if we were to look at this, it doesn't look that bad,

looks like we're doing reasonably well.

It doesn't look like we're systematically missing anything in our data,

but there are some places where perhaps we can do better.

So for example, if we look at these couple of regions,

and they seem to recur around the summer.

You know, those two regions might tell us that we're missing something systematically.

And then, if we go out into the future, it also seems like we're missing

and turns out that it's in that summer period again.

So one way that we can try to dig in a little bit more to understand

what needs to be done is to look at the residuals.

Before we get there, is it actually a linear trend?

Well, rather than drawing a straight line over the two year period,

what if we were to draw a line that best capture the trend during the first year

and drew a separate line that best captures the trend during the second year?

Now, if it's a linear trend, these two lines should have the exact same slope.

If we see a difference in the slopes,

that's gonna be the tip-off that it's not a linear trend.

So let's take a look.

So again, we haven't gotten around that issue in both of these years --

we still have that underprediction happening in the summer observations.

But notice the difference in the slope of these lines.

If we were to continue these lines out,

notice how much faster the trend is in the first year,

and that's evident by how quickly the red line is growing compared to the green line.

If these were the same slope, these lines would overlap each other.

So this difference in slope is telling us that we're observing much more growth early on,

in that first year, in the 2008 data,

whereas in 2009, it looks like we're observing slower growth.

Well, what does that mean for us as we try to forecast out 2010?

Well, that's something that we're gonna have to be careful of

because if I forecast using that red line, I'm gonna overpredict a lot more.

So, we want to recognize that this is not a linear trend.

It looks like it's slowing down over time,

so we need to build our trend in such a way that it accounts

for the fact that it's decreasing, still growing but at a decreasing rate, over time.

One of the ways that we can take that into account

is instead of just using the week number as our predictor,

what we're gonna replace that linear trend with is the logarithm or in this case,

more specifically, the natural logarithm of the week number.

The natural log curve, it takes on the form of growing,

and then there's going to be curvature to it so that it's gonna keep on growing

but at a slower and slower rate over time.

And so given what we observed in our data,

fast growth during year one relative to year two,

we might decide that this is a better function,

a better functional form for our trend variable.

Now here's the trick.

As far as Excel is concerned, we're still running linear regression.

We still have an intercept, we'll call that alpha.

We still have a slope, we're going to estimate that's beta.

And what's our x variable?

Well, instead of just being the week number,

now our x variable is the logarithm of the week number.

So that's what's going to go in as our sole predictor, right?

So as far as Excel is concerned, we can put in any functional form that we might want,

but it's linear regression because of the general structure of this equation.

All right.

So let's take a look how that looks when we estimate that model.

And we seem to be doing a little bit better, you know, we're not capturing it perfectly,

and we still have those summer periods we pointed out earlier,

where we're not doing the best job.

But in terms of the forecasting, you know,

we seem to be more in line with this general pattern, you know,

not picking up those high levels that we'd like to pick up.

All right.

So how do we fix that part?

So we've talked about looking at your trend,

perhaps making it more flexible than just a linear function,

and there's nothing magical about using the natural logarithm.

Anything that grows at a decreasing rate would suffice,

so we might use something like the square root of the week number.

In other cases, if the trend were accelerating,

we would need a functional form that increases over time,

so perhaps a square term or a cubic polynomial.

There are many different ways that we can think about trying to capture the trend.

It really is just gonna be a matter of what works best based on your data.