Data repositories in which cases are related to subcases are called hierarchical. This course covers representation schemes for hierarchies and algorithms that enable analysis of hierarchical data, and provides opportunities to apply several methods of analysis.

Associate Professor at Arizona State University in the School of Computing, Informatics & Decision Systems Engineering and Director of the Center for Accelerating Operational Efficiency

K. Selcuk Candan

Professor of Computer Science and Engineering; Director of ASU’s Center for Assured and Scalable Data Engineering (CASCADE)

Is that all I can say using PACF and ACF?

Well, remember, so, we talked about trends,

we talked about auto-regressive functions,

we talked about moving average time series,

but we also talked about seasonal series.

So, then the question essentially becomes,

what about for seasonal series?

Or rather, can ACF or PACF tell me anything about seasonal series?

If I have seasonality in my data that I couldn't remove using differencing, can ACF or PACF tell me anything about that?

It turns out that ACF can indeed tell you about that.

So in general, if you have a seasonal or cyclic series, the ACF, the auto-correlation function, will have a large and positive value at the seasonal lag.

That is, if your period is 12 months, you will have a large and positive value at the 12-month lag.

If your period is 24 months,

you will have a large and positive value for the ACF at the 24th month lag.

Now, that's great, if it is true. So let's see.

So here, I have a cyclic or seasonal time series.

It's essentially cyclic behavior plus some random variation. So it's not perfect; it's seasonal plus a random term.

On the right hand side,

we have the ACF plot.

I can now tell what's happening.

What's happening is that the ACF also has a large and positive value at the 24/25 lag position.

This is essentially telling me that the time series I have here is seasonal with a 24-lag behavior; its seasonal period is 24 units, whether 24 months, 24 days, whatever, it is 24 units.

This is great, I can actually tell that.

In fact, in this example,

I also have

a strong and negative autocorrelation

at the 12/13-month, or 12/13-unit, position.

Here it is strong and positive,

here it's strong and negative.

What this tells me is that indeed I have a cyclic behavior

with a half period of 12 units.

The full period is indeed 24 units, and this confirms that the half period is 12 units.

At the half period, I expect the time series to switch over, and indeed this is exactly what it is showing me.

The autocorrelation function is showing me a 24-unit period and a 12-unit half period.
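To illustrate (with synthetic numbers of my own, not the lecture's actual figure), here is a small sketch that builds a noisy series with a 24-unit period and checks the ACF at the full and half period:

```python
import math
import random

def acf(x, lag):
    """Sample autocorrelation of series x at the given lag."""
    n = len(x)
    m = sum(x) / n
    c0 = sum((v - m) ** 2 for v in x)
    return sum((x[t] - m) * (x[t + lag] - m) for t in range(n - lag)) / c0

random.seed(0)
# Seasonal series: period 24, plus a random term (so it's not perfect).
x = [math.sin(2 * math.pi * t / 24) + random.gauss(0, 0.3) for t in range(240)]

print(round(acf(x, 24), 2))  # large and positive at the full period
print(round(acf(x, 12), 2))  # large and negative at the half period
```

With the sign flip at half the period, the ACF alone already reveals both the 24-unit cycle and its 12-unit half period.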

Now, it turns out that you can use the PACF to confirm that.

So, in this case, if I take a look at the PACF, I will see that the values get close to zero but remain non-zero, first positive then negative, up to the 10-to-12-unit range.

For statistical reasons, I need to treat anything in that range as non-zero.

In this case, my partial autocorrelation function indicates, or confirms, that there is a cyclic pattern with a half period of around 10 to 12 units.

Note that I am not getting exactly 12, because I have some random components that draw the results away from perfect. The reason I'm not getting perfect zeros, for example, is the random component that I have.

Another reason is that this is a finite series; I don't extend it infinitely. I have a finite series with only one, two, three, four, five, six cycles, which means that I'm learning about the statistical properties of the cycles using only six cycle samples.

If I had 60, or 600 of these, I might be able to get more accurate, more predictive results, and the bounds could have been tighter.

But even now, even with a small sample, a small number of observations, the autocorrelation function and the partial autocorrelation function are telling me a lot of things.

It is telling me, "Hey, you might be looking at cyclic data with a period of 24 units and a half period somewhere close to 12 units," and it's telling me that very strongly.

But, of course, you will tell me that in the real world the data is almost never perfectly cyclic, almost never a perfect trend, almost never perfectly autoregressive, almost never a perfect moving average; you have complex models.

What do we do then?

How do we deal with these complex models?

Well, what we can do is again plot the auto-correlation and partial auto-correlation functions and see what they tell us.

So, in this case, let me take the complex model and plot its autocorrelation function first; let's start from there.

What we are seeing is that, instead of shutting off, it decays slowly.

Remember what we had learned earlier: when I see such slow decay in the auto-correlation function, it indicates a trend.

So this tells me: well, maybe your data actually has a trend.

If you remove that trend, you might be able to study and understand this time series better and more easily, because it is a non-stationary series with a trend.

So let's do that; let's remove the trend.

Well, how do you remove the trend?

We'd also discussed it before,

to remove the trend we need to apply differencing.

Let's do that. Let's apply differencing.

Let's take the original time series and let's basically apply differencing.

Essentially, what I'm doing here is creating a new time series, say Y. To create Yt, I take Xt and subtract Xt-1.

So this was the original time series with Xt and this is the new time series with Yt.
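As a quick sketch of what differencing does (the slope and noise level here are my own choices for illustration):

```python
import random

random.seed(0)
# A non-stationary series: linear trend of slope 0.5 plus noise.
x = [0.5 * t + random.gauss(0, 0.3) for t in range(200)]

# Differencing: Yt = Xt - Xt-1 removes the trend.
y = [x[t] - x[t - 1] for t in range(1, len(x))]

mean_y = sum(y) / len(y)
print(round(mean_y, 2))  # hovers around the trend slope, 0.5
```

The differenced series fluctuates around a constant instead of climbing, which is exactly the "more or less flat" behavior described next.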

Now, note that this time series looks kind of stationary.

The weird behaviors that the original time series was maybe showing, I don't see anymore.

It looks more or less flat. All right?

Good. Maybe I can study this more effectively.

But I still don't know whether it's an autoregressive process, or a moving average process, or what not. I don't know that yet.

Let's basically take a look at the ACF and PACF,

to see whether we can learn that.

So, let's take a look at the auto-correlation function of this differenced series. So what's happening?

Essentially, here we see a series with strong non-zero values at two lags: lag 5 and lag 10.

If you remember, this might indicate that we are looking at a moving average series with two components, one at lag 5 and one at lag 10.

It can tell us, "Hey,

this time series has two moving averages".

So good. Okay, I'm learning something.

So this may be a moving average time series, and it may have two lags.
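Here is a sketch of this diagnostic (the two coefficients of 0.5 are my own illustrative choices, not taken from the lecture's figure): a series built from white noise at lags 5 and 10 shows ACF spikes at exactly those lags and near-zero values elsewhere.

```python
import random

def acf(x, lag):
    """Sample autocorrelation of series x at the given lag."""
    n = len(x)
    m = sum(x) / n
    c0 = sum((v - m) ** 2 for v in x)
    return sum((x[t] - m) * (x[t + lag] - m) for t in range(n - lag)) / c0

random.seed(1)
e = [random.gauss(0, 1) for _ in range(5010)]
# Moving average series with two components, at lag 5 and lag 10.
y = [e[t] + 0.5 * e[t - 5] + 0.5 * e[t - 10] for t in range(10, len(e))]

print(round(acf(y, 5), 2))   # strong non-zero spike (theory: 0.5)
print(round(acf(y, 10), 2))  # strong non-zero spike (theory: about 0.33)
print(round(acf(y, 3), 2))   # near zero at other lags
```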

So how can I confirm that this is a moving average time series?

If you remember another thing we had discussed before: for a moving average series, if I take a look at the PACF, the PACF wouldn't shut off. It would take non-zero values that gradually go towards zero, but it wouldn't shut off. That's what we had learned.
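We can check this "no shut-off" behavior numerically with the Durbin-Levinson recursion, a standard way to compute PACF values from autocorrelations. The MA(1) coefficient of 0.8 below is my own example, not the lecture's:

```python
def pacf_from_acf(rho, maxlag):
    """Durbin-Levinson recursion: PACF values at lags 1..maxlag,
    given autocorrelations rho[0..maxlag] with rho[0] == 1."""
    phi = [[0.0] * (maxlag + 1) for _ in range(maxlag + 1)]
    phi[1][1] = rho[1]
    out = [phi[1][1]]
    for k in range(2, maxlag + 1):
        num = rho[k] - sum(phi[k - 1][j] * rho[k - j] for j in range(1, k))
        den = 1.0 - sum(phi[k - 1][j] * rho[j] for j in range(1, k))
        phi[k][k] = num / den
        for j in range(1, k):
            phi[k][j] = phi[k - 1][j] - phi[k][k] * phi[k - 1][k - j]
        out.append(phi[k][k])
    return out

# Theoretical ACF of an MA(1) with coefficient 0.8:
# only lag 1 is non-zero, rho1 = theta / (1 + theta**2).
theta = 0.8
rho = [1.0, theta / (1 + theta ** 2)] + [0.0] * 8
pacf = pacf_from_acf(rho, 9)
# The PACF alternates in sign and decays, but does not shut off.
print([round(v, 3) for v in pacf[:4]])
```

Even though the ACF is zero beyond lag 1, the PACF keeps taking non-zero values at lags 2, 3, 4, and beyond, which is the signature described above.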

So if this is a moving average time series, we would expect that if I plot the PACF, we will see non-zero values here and there, eventually getting close to zero but remaining non-zero. So let's see if it is.

In this case, I have the PACF values, and I can see that they take non-zero values here and there. The PACF doesn't shut off quickly; it carries these non-zero values into the future, getting close to zero but still non-zero.

So this essentially re-confirms to me that I might be

looking at a moving average time series.

A time series that represents a moving average process.

Great. So what did I learn?

I learned that this Yt, not the original series but Yt, may be a moving average time series. I also learned that this moving average has two components, one at lag 5 and the other at lag 10.

Can I use that to actually discover the model? Well, we can, indeed we can.

Remember, here I wrote Dt, but it's the same Yt that we are talking about. We obtained Yt by subtracting Xt-1 from Xt.

We have learned that the resulting series is a moving average series with two moving average components, at lag 5 and at lag 10.

So what this means is that, from the study I have done, I would expect my time series to have this shape: a closed-form formula determined by this function.

To get that, what I did is simple: I took the Xt-1 and moved it to the other side of the equation, and I obtained this.

So essentially, what I am looking at is a curve; my Xt is a curve which has

one autoregressive component and two moving average components.

That's what I expect.

Now, what I can do is take this closed-form formula and try to find the best-fit parameters.

In this case, I have two parameters, alpha and beta.

The best-fitting alpha and beta parameters.

Well, if I do that,

I would actually see that the value of alpha

here will be 0.5 and the value of beta is also 0.5,

because indeed when I generated the series,

I used this formula to generate it.

So magically, the procedure that we described actually gave me a closed-form formula that

matches the actual formula that I used to generate the original time series.
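Assuming the model has the form Xt = Xt-1 + 0.5·e(t-5) + 0.5·e(t-10) + e(t), which is my reading of the formula described here, the whole procedure can be sketched end to end: generate the series, difference it, and recover the two moving-average coefficients. The grid search below is a crude conditional-least-squares stand-in for the best-fit step, not the lecture's actual estimation method.

```python
import random

random.seed(2)
n = 5000
e = [random.gauss(0, 1) for _ in range(n)]

# Generate the assumed model: Xt = Xt-1 + 0.5*e(t-5) + 0.5*e(t-10) + e(t).
x = [e[0]]
for t in range(1, n):
    x.append(x[-1] + e[t]
             + (0.5 * e[t - 5] if t >= 5 else 0.0)
             + (0.5 * e[t - 10] if t >= 10 else 0.0))

# Step 1: difference to remove the non-stationarity.
y = [x[t] - x[t - 1] for t in range(1, n)]

# Step 2: conditional least squares. For each candidate (alpha, beta),
# invert the MA recursion to get implied innovations and score them
# by their sum of squares; the true coefficients should score best.
def css(a, b, series):
    eh = [0.0] * len(series)
    for t in range(len(series)):
        eh[t] = (series[t]
                 - (a * eh[t - 5] if t >= 5 else 0.0)
                 - (b * eh[t - 10] if t >= 10 else 0.0))
    return sum(v * v for v in eh)

grid = [round(0.1 * i, 1) for i in range(1, 10)]
best_a, best_b = min(((a, b) for a in grid for b in grid),
                     key=lambda ab: css(ab[0], ab[1], y))
print(best_a, best_b)  # the best fit should land near 0.5, 0.5
```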

So this shows us the power of the ACF and of the PACF plots.

So given this arbitrary-looking time series, I can actually learn how the time series was generated.

Once I learn how this time series is generated, I can take this information and, for example, forecast the future, because now I have a formula that tells me how this time series is generated, right? I can use this to actually predict the future.

This is great. This is an amazing tool.

One thing that I want to highlight: I told you that I generated this time series from this formula, but it also shows cyclic behavior. Isn't this cyclic? It is true, it looks like cyclic behavior. I can study that.

I can plot the autocorrelation function of this time series again.

So this is the autocorrelation function of the time series.

Remember what's happening here: the autocorrelation of this series is indeed showing a cyclic behavior; there is a positive non-zero value at some point, with a negative non-zero value before it, so this might be cyclic data. It might indeed be cyclic data.

Essentially, what I'm trying to tell you is that the same time series might be represented by different models.

But why wouldn't I use this?

Why wouldn't I use the cyclic representation?

Well, I wouldn't use the cyclic representation, I wouldn't trust this information, because, as I told you before, anything between these two lines is not statistically significant. So what this means is that, if I discount the values between the lines, the cyclic pattern disappears.
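The two lines here are the usual ±1.96/√N significance bounds for a sample ACF. That formula is the standard large-sample approximation; the lecture doesn't state it explicitly, so treat this as my own illustration of why values inside the band should be discounted. For pure white noise, nearly every ACF value falls inside the band:

```python
import math
import random

def acf(x, lag):
    """Sample autocorrelation of series x at the given lag."""
    n = len(x)
    m = sum(x) / n
    c0 = sum((v - m) ** 2 for v in x)
    return sum((x[t] - m) * (x[t + lag] - m) for t in range(n - lag)) / c0

random.seed(3)
n = 2000
x = [random.gauss(0, 1) for _ in range(n)]

bound = 1.96 / math.sqrt(n)  # approximate 95% band for a white-noise ACF
outside = sum(1 for lag in range(1, 21) if abs(acf(x, lag)) > bound)
print(round(bound, 4), outside)  # only a few lags (about 5%) poke outside
```

So a handful of small excursions past the band is exactly what structureless noise produces, which is why the apparently cyclic values inside the band carry no evidence.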

So essentially, the time series that we are looking at is apparently cyclic, and if you don't look closely at the PACF and ACF, you might think that it's actually cyclic data, but it's really not. It looks cyclic, but it is better described with an ARMA model.

It has an AR term, an autoregressive term, and two moving average terms, and there are no cyclic terms. Cyclic terms are not necessary to describe this time series.

Although the series looks cyclic, I can actually tell that by looking at its ACF and PACF plots. So that's it.

That's it for now. In this class,

we have learned how to characterize the high level properties of time series

and we have learned how to use ACF and PACF plots to actually

discover details about the shape or the underlying function for a given time series.

It is easy to see that once you discover

the shape or the underlying function of a time series,

you can use that to predict the future.

Very powerful. In the next lecture,

we will start looking at the remaining questions.

We will look at the question,

how do we measure similarity between time series?

We will look at the question,

how do we discover patterns or motifs in time series?

Which are once again important tasks when we are using

time series for decision-making or for data analysis.

Thank you so much. I will see you guys next lecture.
