0:05

Just want to give you a little bit of a thought exercise. I've picked a couple of companies here that deal a lot in consumer data: Apple (or really any online business), Netflix, and then Whole Foods, the one brick-and-mortar company in the bunch. But think for a second about the types of decisions consumers make with these businesses, framing them in terms of the choices that consumers make.

So for example, at Whole Foods, it might be: am I going to buy a particular brand on a given shopping trip? Yes or no. From the company's standpoint, it might be helpful to know which of those brands are going to be popular. Which ones are people going to buy on different trips? Is there seasonality associated with their products that matters when they're making their ordering decisions? Am I going to come to Whole Foods when I need groceries? Yes or no? I could go to one of the other grocery stores that's available to me. So what is it about the people who choose to shop at Whole Foods that makes them more likely to go there when they're making their shopping trips?

With Netflix: do I retain service this month? Yes or no? Do I choose to watch the recommended series, yes or no? Do I choose a larger plan this month? Do I add on the DVD service this month, yes or no? You can imagine consumers making similar types of decisions, whether it's Apple, or Amazon, or any other business. So there are a lot of customer choices that are driving these businesses, which again highlights the importance of understanding the right way for us to analyze this choice data.

1:55

And the reason that I wanted to talk up front about distributional assumptions: as I said, we're used to using the normal distribution. Well, what we're really going to be changing is that distributional assumption. When it comes to binary choices, we're not going to be using the normal distribution. We're going to assume that a customer's choice between a yes or no outcome follows a Bernoulli distribution. There are only two values allowed under a Bernoulli decision, 1 or 0: yes or no. And the only parameter associated with the Bernoulli distribution is the probability p. So with probability p, you get a 1; with probability 1 - p, you get a 0. Framed differently: with probability p, there's a yes outcome; with probability 1 - p, there's a no outcome. Now, we can calculate the mean and the variance associated with the Bernoulli distribution, and we've done that here.

2:58

So for the expected value under a Bernoulli distribution, we take the outcomes, 1 and 0, and the probabilities associated with those outcomes. That weighted sum, our best guess, our expectation, is the mean of the Bernoulli distribution: the probability p. We can also calculate the variance under the Bernoulli distribution, which works out to p(1 - p).
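Here's a minimal Python sketch of those two calculations, with a made-up value of p:

```python
p = 0.3  # hypothetical choice probability

# Mean: the weighted sum of the two outcomes, 1 and 0
mean = 1 * p + 0 * (1 - p)   # equals p

# Variance: E[y^2] - (E[y])^2; since y is 0 or 1, y^2 = y, so E[y^2] = p
variance = p - p**2          # equals p * (1 - p)
```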

So when it comes to writing out the likelihood of a single observation from the Bernoulli distribution, this is the form it takes on. Notice, it's the probability p raised to the power y, times (1 - p) raised to the power 1 - y. Now, it looks a little bit foreign, but let's break it down based on the values that y can take on. Suppose we observe a 1, all right, y equals 1. Well, p raised to the power of y means I have a value of p. (1 - p) raised to the power of 1 - y, so raised to the power of 0, that term is going to go away. So the likelihood for a single draw from a Bernoulli distribution, if I observe y equals 1, is p. All right, well, what if I observe y equals 0? If y equals 0, p raised to the power of 0 equals 1, so that term essentially goes away, and then I'm left with a likelihood of (1 - p) raised to the power of 1 - 0, which is just 1 - p. So when I observe a 1, the likelihood is p; when I observe a 0, the likelihood is 1 - p. Again, that's just mapping onto the two values that we had talked about earlier. And then the product says: let's multiply that function over all the data points that we observe.
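That single-observation likelihood, and its product over a dataset, can be sketched in a few lines of Python; the choice data and the value of p here are made up for illustration:

```python
import math

# Made-up choice data: 1 = yes, 0 = no
y = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
p = 0.6  # assumed choice probability

# Likelihood of a single observation: p**yi * (1 - p)**(1 - yi),
# which equals p when yi = 1 and 1 - p when yi = 0
likelihoods = [p**yi * (1 - p)**(1 - yi) for yi in y]

# The joint likelihood multiplies that function over all observations
joint_likelihood = math.prod(likelihoods)

# In practice we maximize the log-likelihood (a sum) for numerical stability
log_likelihood = sum(yi * math.log(p) + (1 - yi) * math.log(1 - p) for yi in y)
```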

All right, now how do we go about bringing covariates, that is, marketing activity, into this? Recall when we looked at linear regression, what we said was that the outcomes y follow a normal distribution with mean mu, and we said mu was a function of marketing activity. Well, what we're going to do here is say my outcome is a function of the parameter p, and my probability p is going to be a function of marketing activity. We're just going to change the form in which that marketing activity affects the probability p. All right, so we talked about this piece already: outcomes follow Bernoulli distributions, and we can write out the likelihood function. When we bring in marketing activity, we're going to change that a little bit and say that the probabilities p are going to be a function of the marketing activity. So we're going to look at an example for customer acquisition, where marketing actions affect the acquisition probability. The acquisition probability may be affected by: did I send you an email? Did I send you a coupon?

6:32

All right, so there are two different models that are commonly used. One is the logit model, and you can see here the functional form that we're going to use. So the probability is e raised to the power of (X transpose beta) divided by 1 + e raised to the power of (X transpose beta). One thing to keep in mind: we're talking about a probability, so p is always going to be a value between 0 and 1. This X transpose beta term, well, that's actually our regression equation. Our regression equation previously looked like an intercept beta 0 + coefficient beta 1 times X1 + coefficient beta 2 times X2, and however many coefficients we had; that's our regression term. So every time you see that X transpose beta, just plug in your regression equation, because that's all we're doing. So think of this as rescaling your regression equation. That regression equation can take on values negative and positive, and we've got to somehow make that into a probability, bounded between 0 and 1. The exponential e raised to that power, divided by 1 + e raised to that power, guarantees that it's going to be between 0 and 1. That's the logit model.
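To make that rescaling concrete, here's a small Python sketch of the logit transformation for a customer-acquisition setting; the coefficients and the email/coupon covariates are hypothetical, not estimated from any data:

```python
import math

def logit_prob(x_beta):
    """Rescale the regression value (any real number) into a probability in (0, 1)."""
    return math.exp(x_beta) / (1 + math.exp(x_beta))

# Hypothetical acquisition model: an intercept plus dummies for whether the
# customer received an email and whether they received a coupon
# (these coefficient values are made up for illustration)
beta0, beta1, beta2 = -1.0, 0.8, 1.2
email, coupon = 1, 0          # sent an email, no coupon

x_beta = beta0 + beta1 * email + beta2 * coupon   # plug in the regression equation
prob = logit_prob(x_beta)                         # acquisition probability
```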

Another model that we could use is referred to as the probit model, where we plug the regression equation we have into the normal CDF.

8:09

And that's going to give us our probability between 0 and 1.

Â For the most part,

Â you're going to get very similar predictions between these two approaches,

Â with the exception of when we get far out to the tails of the distribution.

Â Just to give you a sense, this is going to be consistent with economic theory,

Â random utility theory,

Â where you choose the option that provides you the highest utility.

Â So utility's going to be comprised of two components.

Â X transposed beta, that's our deterministic component.

Â That's the place where the marketing activity comes in.

Â And then the random component.

Â Well depending on what assumptions we make about the distribution that that random

Â component comes from, we're either going to end up with the logit model or

Â the probit model.

Â All right, so we have the logit model on one side,

Â we've got the probit model on the other,

Â just different ways of translating that utility into a probability.

Â
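To see how the two links compare, here's a sketch that evaluates both transformations side by side; note that the curves match most closely near the middle and diverge in the tails (and that, when fitting real data, logit coefficients tend to come out roughly 1.6 times as large as their probit counterparts, since the two curves live on different scales):

```python
import math

def logit_prob(x_beta):
    # Logistic transformation: e^x / (1 + e^x)
    return math.exp(x_beta) / (1 + math.exp(x_beta))

def probit_prob(x_beta):
    # Standard normal CDF, written in terms of the error function
    return 0.5 * (1 + math.erf(x_beta / math.sqrt(2)))

# Both map any real-valued regression term into (0, 1), and both give 0.5
# when X'beta = 0; the differences show up far out in the tails
for x_beta in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(f"{x_beta:5.1f}  logit={logit_prob(x_beta):.4f}  probit={probit_prob(x_beta):.4f}")
```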