We might want to see whether

variable one has some relationship to the spatial distribution of this other variable.

This may be cancer rates, and this may

be the number of refineries

in a particular region.

Is there a correlation between these types of variables?

What other correlation might we find?

What are the spatial relationships?

And so we've talked a lot about correlation and measurements of

correlation, and doing this for time series, for example, in our time series module,

trying to see how things line up and if we have two lines that look like this,

and I have another data set that looks like this and they follow the same pattern,

these can be considered correlated time series.

Or I might look at a scatter plot.

If I have some data that's distributed like this and I could fit a nice line to it,

that data is considered correlated.

Covariance has a very similar formula.

We're trying to measure the pattern of common variation observed in

a collection of two or more data sets or partitions of data.

So with covariance, we have data set one,

so we have the mean of data set X and we have

all the measurements x_sub_i, and the mean of data set Y and the measurements y_sub_i.

So we can think of this as this is data set X and Y,

we can figure out the mean, and each point is, this is x_sub_1,

x_sub_2, x_sub_3, y_sub_1, y_sub_2,

y_sub_3 and so forth.

And so once I calculate the mean for x and the mean for y,

then I can find the covariance term between x and y for i=1.

So I put y_sub_1 here, x_sub_1 here,

subtract the respective means, multiply them, and I add this up over my entire dataset.

And that value gives us some indication of the covariance between those.
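
To make that calculation concrete, here is a minimal Python sketch of the covariance steps just described. The function name `covariance` and the sample values are assumptions for illustration, and the sketch assumes we divide the sum by n (the population form); some texts divide by n - 1 instead.

```python
def covariance(xs, ys):
    """Covariance of two paired data sets (assumes division by n)."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and the same length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # For each i: subtract the means from x_i and y_i, multiply the two
    # deviations, and add the products up over the entire data set.
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / n


# Hypothetical paired measurements, made up for illustration only.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
print(covariance(x, y))
```

A positive value suggests that x and y tend to vary together, and a negative value that they tend to vary in opposite directions.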

And this is fine for this sort of data where we have time series data or we have

relational patterns but it's not taking into

account space and what we're really interested in is how do we take into account space.

And again, we had other measures like

correlation so we can do the correlation coefficient,

how correlated, how similar are two or more paired datasets.

So again, we take no space into account here.

We have a very similar formula.

Notice this chunk of our formula

looks identical to this chunk here. We should have a bar here.

And what we're looking for is just normalizing this.

So correlation ranges from minus one to one.

And this gives us some measure of similarity between

two or more paired data sets and there's

measures to test how significant that pairing is and so forth.

Oftentimes we calculate this correlation because we want to show people this correlation.
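
As a rough sketch of that normalization, assuming the coefficient meant here is the usual Pearson correlation, we can divide the summed products of deviations by the square root of the product of the summed squared deviations. The name `pearson_r` and the sample data are made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two paired data sets."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and the same length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Same chunk as in the covariance: summed products of deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Normalizing terms: summed squared deviations of each data set.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant data")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired measurements, made up for illustration only.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
print(pearson_r(x, y))
```

Because of the normalization, the result stays between minus one and one regardless of the units of x and y.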

With spatial data, we still need to modify this to handle space.

There are other metrics, such as entropy, which show a measure of

the amount of pattern, disorder, or information in a set of data x.

So we have some probability, which is the proportion of events or values occurring in

the ith class or range, so we can calculate Shannon's entropy using this sort of formula.

And all of these give us some sort of information about the distribution

between two datasets, or in a single data set we can calculate things like diversity,

that is, the entropy standardized by the number of classes in a dataset, for example.
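
Here is a small Python sketch of those two measures. It assumes a base-2 logarithm for Shannon's entropy and assumes that "standardized by the number of classes" means dividing by the log of the number of classes k, so that diversity falls between zero and one. The function names and the class labels are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon's entropy (in bits) of a list of class labels."""
    if not labels:
        raise ValueError("labels must be non-empty")
    n = len(labels)
    counts = Counter(labels)
    # p_i is the proportion of values that fall in the i-th class.
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def diversity(labels, k=None):
    """Entropy divided by log2(k), where k is the number of classes.

    Assumption: if k is not given, use the number of classes observed.
    """
    if k is None:
        k = len(set(labels))
    if k < 2:
        return 0.0  # with a single class there is no diversity to measure
    return shannon_entropy(labels) / math.log2(k)


# Hypothetical class labels, e.g. the class assigned to each map region.
regions = ["low", "low", "high", "high"]
print(shannon_entropy(regions), diversity(regions))
```

Passing k explicitly is useful when a map has classes that happen to be empty in the data at hand.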

And so now that we're hearing the word class, we might

start thinking about our choropleth maps. We have

different classes in the choropleth maps, so we can start thinking about diversity.

But for spatial data,

we want to find related regions.

You know, our eyes are drawn to these clusters,

these clumps of different data in different regions.

Are those statistically correlated?

And can I do an analysis prior to the visualization to find areas that are

statistically correlated and maybe help the visualization pop those out?

And one means of doing this is adapting some of those measures I

showed earlier for spatial autocorrelation.

Now there's a whole lot of issues in spatial statistics ranging

from scales of the data to how do we sample,

to logical fallacies and ecological fallacies.

And in analyzing our spatial data,

we have to be aware of these different issues.

We should try to report those to our users so that we're clear about what's going on in

the data set and try to see if we can come up with ways to overcome those.

Now one critical thing in measurements in space is distance and direction.

So we know the location of, let's say, events, so we

can look at a data set of all the crimes that occurred in your town.

You can go to the local police blotter,

collect all this information, and that knowledge of locations

allows the analyst to determine the distance and direction between different locations.

We can have traffic and trajectories,

we can download the New York taxicab data for example.

And a lot of spatial analysis requires the calculation of

a table expressing the relative proximity of pairs of places.

So if I have a bunch of events,

what's the distance between those events?

And so I might make a table like,

crime number one, crime number two,

crime number three, crime one, crime two,

crime three and so forth,

and I may have a data set where this is the distance between those crimes.

So if I know the latitude and longitude of crime two and crime one,

I can find the distance between one and two.
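
As a sketch of how such a table could be built, the Python below computes every pairwise distance from latitude and longitude. It assumes great-circle (haversine) distance on a sphere with a mean Earth radius of 6371 km, which is one common choice rather than the only one. The coordinates and function names are made up for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical approximation)


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    # min() guards against rounding pushing sqrt(a) slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def distance_matrix(points):
    """Table where entry [i][j] is the distance between place i and place j."""
    return [[haversine_km(p, q) for q in points] for p in points]


# Hypothetical event locations as (latitude, longitude) in degrees.
crimes = [(40.0, -86.0), (40.0, -87.0), (41.0, -86.0)]
for row in distance_matrix(crimes):
    print([round(d, 1) for d in row])
```

The table is symmetric with zeros on the diagonal, so in practice only the upper triangle needs to be computed and stored.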

These don't have to be crime events; these can be counties as well.

So it could be the distance between all the counties.

It could be the distance between police stations,

the distance between bars,

all sorts of different things can be

expressed as this relative proximity of pairs of places.

We can also create what's called a weights matrix.

And so with a weights matrix,

we may have a geographic region and we may have different counties, for example.