Suppose you're taking measurements of a physical quantity that you know to be smooth and slowly varying. For instance, the tide level at the sea or the blood pressure of a patient. Because of the measurement process your values, the values that you actually measure are corrupted by some form of additive noise. Which at this point we could just say it's some extraneous quantity that perturbs the measured value. So what you get in the end instead of the smooth curve of the first panel is a curve where you see that there's really fast wiggles that you interpret as being a noise corruption. And the question is can we process this data as a discrete time-signal, so that most of the noise is removed and we get back a measurement this in line with our expectations of smoothness. The first idea is to replace each sample of the average sequence with a local average. Averages are usually good when you want to eliminate random variations that you don't know much about. So for instance, assume a situation where we process the signal and the process signal is at each sample the simple two point average of the current sample and the past sample. Well it could very well be that the two point average is not enough to remove most of the noise. And so in general, we could take longer averages by considering the output samples to be the average of capital M points from the input. If we try to do that experimentally, we can see immediately that the strategy works for increasing values of M. So if we take just the two point average, we see than we don't really get a lot of leverage. You can see in blue the original signal, and in red, the filtered output and with a two point average, we still have a lot of wiggles. But as we increase the number of points that we use to compute the average, we can see that we start to remove a little bit more of the noise. And by a 12 point average we already can see that the strategy already pays off. We have I'll write this mood without the signal requires significantly, and if we're willing to increase the average up to say, 100 point, then we can see that the signal we get the output is actually very smooth. You will also see that with respect to the original signal, the output has been delayed by a certain amount. This is the price we pay because we are actually using up 100 samples to compute the output value. So in a sense we are accumulating a certain delay with respect to the input because we're averaging from the past. We will see this in more detail later on. What we have done is really a filtering operation, and if we want to compute the input's response, we need to do nothing more than just put the delta function inside our averaging machine, and it's very easy to see that if we put the delta function in here, this, remember, is the averaging relationship that we saw before. We have that the impulse response is equal to 1 over M for n that goes from 0 to M minus 1, and 0 otherwise. It's easy to convince yourself that in this summation there is only one non zero term, which is the one where n is equal to k. Since the index goes from 0 to n minus 1. If your value of n is outside of this range, none of these terms will be non 0, and you will get 0. If we plot the impulse response, it looks like this. So you will have a string of samples that are equal to 1 over m from 0 to M minus 1. So what we had learned experimentally, is that the smoothing effect of the filter is proportional to the length of the impulse response. As a consequence of that a number of operations that we have to perform to compute each output sample and also the storage, because we have to keep in the memory of our processing unit the past values in order to be able to compute the average while both of these quantities are proportional to the length of the filter. So as the filter becomes longer and as its smoothing effect increases, the price we pay in terms of computational resources and memory are higher. So we might not like that and we might want to see whether there's a smarter way to achieve the same effect without using up so much in terms of computer power. Let's look at the equation for the moving average, it's just one over capital m, times the sum of input samples from m to n minus capital m plus now we can split this into two parts. We have the first term, which is one over capital M times the current input, plus 1 over capital M, times the sum of past input samples from m minus 1 to m minus capital M plus 1. And this is the sum of capital M minus 1 terms. So the second term here looks suspiciously close to the moving average computed this time over capital M minus 1 point, so 1 point less than before, and delayed by 1. So it's close, but it's not quite the same thing, so let's look at ways to formalize this relationship. To do so, we first write out the standard equation for the moving average filter. Which we have seen before, so nothing new here. Now we try and compute the delayed output, so y of capital M of n minus 1 is 1 over capital M times the sum from k that goes to 0 to capital M minus 1, of x of n minus 1 minus k. We can rearrange the index of the signal like so and now this plus one that affects only the summation index can be propagated to the summation range and we have that if we delay the moving average by one. We basically just effect the range of the summation that now goes from one to M, instead of from 0 to N minus 1. Okay, now let's look at the formula for the moving average over M- 1 point and this is just a moving average so 1 over capital M- 1 times the sum from k that goes to 0 to capital M- 2 of n of x- k. And now we do the same, we try to delay this by 1 and again this delay will propagate to this summation arranged and we will have the sum that goes now from k = 1 to capital M minus 1. So, formula for the moving average over capital M points and here, formula for the moving average over n- 1 points delayed by 1. All right, but now if we take the sum of the input from 0 to M minus 1, we can split this as a current sample plus N minus 2 samples in the past. And so this part here corresponds to this and this part here corresponds to this so we can. Small terms and we obtain the relationship that we were looking for between the moving average over capital M points, and the moving average over capital M minus 1 points delayed by 1. So we rearrange the terms, and we have that, the moving average over capital M points in n Is equal to capital M minus 1 over M, times the moving average over n minus 1 points, delayed by 1, plus 1 over M times x of n. So if we want to write this a little bit better, we define a parameter lambda as M minus 1 over M, and we have this final relationship here.