We have learned support and confidence.

These two measures are not sufficient to describe association.

So the question becomes: what additional interestingness measures are good enough to describe these relationships?

That's the reason we want to examine a couple more, lift and chi-square, to see whether they are good enough as additional interestingness measures.

Lift has been popularly used in statistics as well.

We look at the same table as before, where B means playing basketball and C means eating cereal.

So we have exactly the same distribution.

Then, for this contingency table, we compute lift.

Lift is defined as follows. B and C are two itemsets.

For the rule B implies C, the confidence divided by the support of C gives the lift.

Equivalently, lift is the support of B and C together divided by the product of the support of B and the support of C.
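Written as a formula, the two equivalent forms look like this. The notation is an assumption made here, since the lecture states the definition only in words: s(.) stands for support and c(.) for confidence.

```latex
\mathrm{lift}(B, C) = \frac{c(B \rightarrow C)}{s(C)} = \frac{s(B \cup C)}{s(B) \times s(C)}
```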

So for this lift,

the general rule is if the lift is one,

then these two items are independent.

If it's greater than one,

they are positively correlated.

If it is less than one,

they are negatively correlated.

For our example data set, we calculate the lift of B and C, and of B and not C. We derive 0.89 and 1.33.

From these values and the rule above, B and C should be negatively correlated because the lift is less than one.

B and not C are positively correlated because the lift is greater than one.

This actually fixes our problem, because we know B and C should be negatively correlated and B and not C should be positively correlated.

So this looks very nice.
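As a rough check of these two numbers, the sketch below recomputes both lifts in Python. The transcript does not list the full table, so the counts are an assumption: 1000 students in total, 600 playing basketball, 750 eating cereal, and 400 doing both, which is consistent with the figures quoted in this lecture. The function name `lift` is purely illustrative.

```python
def lift(n_ab: int, n_a: int, n_b: int, n_total: int) -> float:
    """lift(A, B) = s(A and B) / (s(A) * s(B)), with supports as fractions."""
    s_ab = n_ab / n_total
    s_a = n_a / n_total
    s_b = n_b / n_total
    return s_ab / (s_a * s_b)


# Assumed counts (the transcript does not list the full table).
N = 1000    # all students
N_B = 600   # play basketball
N_C = 750   # eat cereal
N_BC = 400  # do both

lift_b_c = lift(N_BC, N_B, N_C, N)
lift_b_notc = lift(N_B - N_BC, N_B, N - N_C, N)

print(f"lift(B, C)     = {lift_b_c:.2f}")
print(f"lift(B, not C) = {lift_b_notc:.2f}")
```

Under these assumed counts the two values round to the 0.89 and 1.33 mentioned above.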

Let's look at another measure popularly used in statistics, called chi-square.

In the definition of chi-square, we need to calculate the expected value.

How do we calculate the expected value?

This 400 is a real value; it's the observed value.

But the expected value is based just on the distribution.

For example, the distribution of C to not C is 750 to 250.

This is three to one, and splitting all 600 students who play basketball three to one gives 450 versus 150.
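For reference, the standard chi-square statistic for a contingency table can be written as below. The symbols are assumptions introduced here, not the lecture's own notation: O is the observed count in a cell, E is its expected count, and N is the total number of transactions.

```latex
\chi^2 = \sum_{\text{cells}} \frac{(O - E)^2}{E},
\qquad
E = \frac{(\text{row total}) \times (\text{column total})}{N}
```

With a total of 1000 students (an assumption consistent with the figures above), the expected count for B and C is 750 × 600 / 1000 = 450.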

In that case, you can probably see that we can still use the popular rules: if chi-square is zero, they are independent.

If it's greater than zero, they are correlated, either positively or negatively.

So we need an additional test to see whether they are positively or negatively correlated.

Now for our example,

we can easily calculate that chi-square is almost 56.

So B and C should be correlated.

Further, we can say they are negatively correlated because the expected value is 450.

The observed value is only 400. It's less.

So this seems to solve the problem as well.
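The chi-square reasoning can be sketched in a few lines of Python. Only the observed 400 and the expected 450 and 150 are quoted in the lecture; the other three observed cells (350, 200, 50) are assumptions chosen to be consistent with those figures and a total of 1000 students.

```python
# Observed counts: rows are (C, not C), columns are (B, not B).
# Only the 400 is quoted directly; the other cells are assumed.
observed = [[400, 350],
            [200, 50]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: row total * column total / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Chi-square alone has no sign, so compare observed with expected
# in the (B, C) cell to decide the direction of the correlation.
direction = "negatively" if observed[0][0] < expected[0][0] else "positively"

print("expected:", expected)
print(f"chi-square = {chi_square:.2f}")
print(f"B and C look {direction} correlated")
```

With these assumed counts the statistic comes to roughly 55.6, and the observed 400 falls below the expected 450, which matches the negative correlation described above.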

But the question becomes whether lift and chi-square are good in all cases.

Let's examine an interesting case.

In this case, you can probably see that the count for not B, not C is quite big.

There are 100,000.

These are called null transactions, because the transactions contain neither B nor C.

If we just look at the relationship between B and C, we see that B and C should be negatively correlated, because it's not easy to get B and C together.

B and not C is far bigger.

C and not B is also far bigger.

But if we use lift and compute the lift of B and C, we get 8.44, which is far bigger than one.

That suggests B and C are strongly positively correlated.

This does not seem right.
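To see how much the null transactions drive this result, the sketch below recomputes lift while varying only the number of null transactions. Only the 100,000 null transactions and the 8.44 are stated in the lecture; the other three cells (100 with both B and C, 1000 with B only, 1000 with C only) are assumptions, picked because they reproduce the quoted lift. The function name `lift_from_cells` is hypothetical.

```python
def lift_from_cells(n_bc: int, n_b_only: int, n_c_only: int, n_null: int) -> float:
    """Lift of B and C from the four cells of a 2x2 contingency table."""
    n = n_bc + n_b_only + n_c_only + n_null
    s_bc = n_bc / n
    s_b = (n_bc + n_b_only) / n
    s_c = (n_bc + n_c_only) / n
    return s_bc / (s_b * s_c)


# Assumed cells: only the 100000 null transactions appear in the lecture.
N_BC, N_B_ONLY, N_C_ONLY = 100, 1000, 1000

for n_null in (0, 1_000, 10_000, 100_000):
    value = lift_from_cells(N_BC, N_B_ONLY, N_C_ONLY, n_null)
    print(f"null transactions = {n_null:>7}: lift(B, C) = {value:.2f}")
```

Under these assumptions the relationship between B and C never changes, yet lift moves from well below one with no null transactions to about 8.44 with 100,000 of them.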

Similarly, we can try chi-square on this same contingency table.

We add the expected values.

We do the computation.

We will find chi-square is bigger than zero.

In the meantime, the observed value is far bigger than the expected value.

So we would also say B and C are strongly positively correlated.

This seems to be wrong.
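The same check can be sketched for chi-square on the table with many null transactions. As before, only the 100,000 null transactions come from the lecture; the cells 100, 1000, and 1000 are assumed values, and `chi_square_2x2` is a hypothetical helper name.

```python
def chi_square_2x2(observed: list[list[float]]) -> tuple[float, list[list[float]]]:
    """Return (chi-square statistic, expected counts) for a 2x2 table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / n.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    return stat, expected


# Rows are (C, not C), columns are (B, not B).
# Only the 100000 in the (not B, not C) cell is stated in the lecture.
null_heavy = [[100, 1000],
              [1000, 100_000]]

stat, expected = chi_square_2x2(null_heavy)

print(f"chi-square = {stat:.1f}")
print(f"observed(B, C) = {null_heavy[0][0]}, expected(B, C) = {expected[0][0]:.1f}")
```

With these assumed counts the expected value for B and C is only about 12, far below the observed 100, so chi-square also points to a strong positive correlation.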

What's the problem?

Actually, there are too many null transactions.

That may make things distorted.

We need to fix it.