Where we're weighting each of our observations Y by a weight based on how
far into the past it is, and that set of weights is given by the Ws, all right?
And then, in the denominator, we're just adding up what those weights are.
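As a sketch of the formula being described, with y_{t-i} the observation i steps into the past and w_i its weight (this indexing is my reading of the narration, not notation taken from the slide):

$$\hat{y}_t = \frac{\sum_{i=1}^{n} w_i \, y_{t-i}}{\sum_{i=1}^{n} w_i}$$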
Now these weights, think of them as probabilities,
think of them as being between 0 and 1, all right?
So imagine, for example, if we were to say each weight is going to be equal to 1.
All right, well if each of these weights is equal to 1,
the top term is just adding up my Y values.
So it's just taking the sum over those Y values.
The denominator just becomes the number of observations that we have.
So if we plug in weights equal to 1, we're back to using our simple moving average.
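To see that concretely, here's a minimal sketch in Python; the NumPy usage and the sample values are my own illustration, not part of the lecture:

```python
import numpy as np

y = np.array([12.0, 15.0, 11.0, 14.0])  # hypothetical observations
w = np.ones_like(y)                      # every weight set to 1

# Numerator sums the Y values; denominator counts the observations
weighted = np.sum(w * y) / np.sum(w)
simple = np.mean(y)                      # the simple moving average

print(weighted, simple)                  # both print 13.0
```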
If we think of these weights as probabilities,
then the sum of the weights in the denominator is just going to be 1.
So what we're left with is just a weighted sum where we get to determine
what are those weights W.
So if you believe that the most recent observation is more valuable,
perhaps we say the most recent observation gets a weight of 50%.
The next observation gets a weight of 30%.
The next observation gets a weight of 15%.
The final observation gets a weight of whatever is remaining, in this case 5%.
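Plugging those example weights in, a quick sketch (again, the observation values are hypothetical; the 50/30/15/5 split is the one just described):

```python
import numpy as np

# Hypothetical observations, ordered most recent first
y = np.array([14.0, 11.0, 15.0, 12.0])
w = np.array([0.50, 0.30, 0.15, 0.05])   # the 50/30/15/5 weights, summing to 1

# Since the weights sum to 1, the denominator drops out
forecast = np.sum(w * y) / np.sum(w)
print(forecast)                           # 13.15
```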
Now, these weights don't have to be between 0 and 1,
you can put in any nonnegative number that you like.
But if we think about these weights as proportions, that's going to make it
a little bit easier for us to understand how this weighting is happening.
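To illustrate why the proportions view still works with unnormalized weights, one more small sketch (the raw weights here are hypothetical, chosen so they scale down to the 50/30/15/5 split above):

```python
import numpy as np

y = np.array([14.0, 11.0, 15.0, 12.0])    # most recent first, as above
w_raw = np.array([10.0, 6.0, 3.0, 1.0])   # any nonnegative weights

# The denominator rescales the weights into proportions automatically
forecast = np.sum(w_raw * y) / np.sum(w_raw)
print(forecast)                            # 13.15, same as before
print(w_raw / np.sum(w_raw))               # [0.5, 0.3, 0.15, 0.05]
```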
So a simple moving average,
all that it's doing is saying all of our weights are equal.
So the weight is really just 1 over the number of observations that I have here.
A weighted moving average gives us much more flexibility: we get to decide
ultimately what those weights are, and hopefully that is informed by what we've
learned in the past, a set of values that has proven to be particularly predictive.