[MUSIC]
Let's go to some conceptual preliminaries
that would help us navigate the rest of the session.
So preliminaries on math modeling and
these are some basic questions that would arise.
Question one on math modeling, what is a model?
What is a model?
So basically, the word is used quite often and
we have some weak sense of what it means.
Let me define it for our purposes.
A model is a set of relations between variables of interest.
There are two variable types, broadly speaking,
there are dependent variables and there are independent variables.
If you want to think about an example for a model, think about let's say sales
being a function of the four Ps in marketing or the CAPM model or
industry profitability being a function of the five forces and so on.
So, you have this bunch of variables and we are trying to in some sense estimate or
put together a relationship between them.
Question two, any examples of traditional models from your daily work?
True, we use a lot of methods and they would count as models too.
Think about a regression model, a logit model, factor and cluster analyses.
Think about ANOVA, the analysis of variance and so on.
Question three, why care about modeling in analytics?
And why should we care so much about it?
If you recall the original definition of analytics, there were two paths
for getting from real-world questions to real-world answers.
One was experimentation, the direct path, and the second was through math modeling
and thereby, analytics.
We care about modeling in analytics,
because it provides us two major things, explanation and prediction both.
What are the characteristics of a good model?
Well, yeah, these are ideal characteristics and
there are very few models that have them all.
Seldom are these characteristics met in full.
So for instance, a model should be simple, it should be small,
it should be generalizable, it should be quick to set up and run.
It should have high explanatory power, high predictive power, even with small
samples and then what are the odds of finding a model that does all of these?
But that in some sense is the ideal and it helps to keep the ideal in mind,
which brings me to modeling typology.
Remember, we talked about these two variable types,
the dependent variable Y and the independent variables X.
So for instance, there are two types of, well, let's say,
marketing research models.
You have modeling of dependence relations on the one hand and
modeling of interdependence relations on the other.
Modeling of dependent relations we had seen in the last session.
These are models of the kind Y = f(X), where Y is
the set of dependent variables and the Xs are the independent variables,
and a dependence relation model has these three components.
The dependent variable Y, the functional form f and the independent variables X.
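As a rough illustration of these three components, here is a minimal Python sketch. It is not from the session: the variable names (price, promotion, sales), the simulated numbers and the choice of a linear form for f are all assumptions made up for this example.

```python
import numpy as np

# Hypothetical, simulated data; nothing here comes from a real dataset.
rng = np.random.default_rng(0)
n = 200
price = rng.uniform(5, 15, n)        # independent variable X1
promotion = rng.uniform(0, 10, n)    # independent variable X2
noise = rng.normal(0, 1, n)

# Dependent variable Y, generated from an assumed linear relationship.
sales = 50 - 2.0 * price + 1.5 * promotion + noise

# Functional form f (assumed linear): sales = b0 + b1*price + b2*promotion
X = np.column_stack([np.ones(n), price, promotion])
coef, *_ = np.linalg.lstsq(X, sales, rcond=None)
b0, b1, b2 = coef

print(f"intercept={b0:.2f}, price effect={b1:.2f}, promotion effect={b2:.2f}")
```

The estimated coefficients should land close to the values used to simulate the data, which is the sense in which the model recovers the relationship between Y and the Xs.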
We are not going there today. Where we are going today is
the modeling of interdependence relations.
Interdependence modeling.
Why are we going here?
Sometimes, there may be no clear-cut dependent variable.
I mean, there may be no dependent variable that comes out.
In such a situation, interest would center on exploring interrelationships between
whatever variables we have and all of them would be in some sense X variables,
independent variables.
Sometimes we also say, we are actually looking for
the underlying structure of this variable set.
For example, take a look at that.
That is a bill, basically, a shopping bill.
And what you see there, a set of products that were bought and
these items are part of the same basket.
And because they're part of the same basket,
perhaps there exists an affinity between them.
If I have millions of shopping bills, I can actually compute probabilities for
these affinities.
What are the chances that product A and
product B are going to be part of the same shopping bill?
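One simple way to estimate such a probability is to count how often two products share a bill. The sketch below is only an illustration: the four bills, the product names and the helper pair_probability are hypothetical, and it uses the plain share of bills containing both products as the affinity measure.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping bills: each bill is the set of products in one basket.
bills = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "cereal"},
    {"bread", "milk"},
]

# Count how many bills contain each unordered pair of products.
pair_counts = Counter()
for bill in bills:
    for a, b in combinations(sorted(bill), 2):
        pair_counts[(a, b)] += 1

def pair_probability(a, b):
    """Share of all bills in which products a and b appear together."""
    key = tuple(sorted((a, b)))
    return pair_counts[key] / len(bills)

print(pair_probability("bread", "butter"))
print(pair_probability("butter", "milk"))
```

With millions of bills the same counting idea applies, though in practice you would likely need something more memory-efficient than an in-memory counter over all pairs.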
Even better, if I know in some sense a customer unique ID,
things become even more interesting.
Now the question is in some sense, what are basically the similarities?
So in some sense the question now would be which customers are similar, right,
in their basket composition and so on.
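To give a feel for what "similar in basket composition" could mean, here is a small sketch using Jaccard similarity, the overlap between two baskets divided by their union. This measure is my choice for illustration, not necessarily the one used in the session, and the customer IDs and products are hypothetical.

```python
# Hypothetical customers and the products each one bought.
baskets = {
    "C1": {"bread", "butter", "milk"},
    "C2": {"bread", "butter"},
    "C3": {"cereal", "milk"},
}

def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Compare every pair of customers once.
ids = sorted(baskets)
for i, first in enumerate(ids):
    for second in ids[i + 1:]:
        score = jaccard(baskets[first], baskets[second])
        print(first, second, round(score, 2))
```

A score of 1 means identical baskets and 0 means no products in common, so higher scores point to customers who shop alike.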
So interdependence modeling helps us answer questions such as one,
what are the similarities and differences among our customers, our products and
service lines?
Two, how similar are these guys?
How similar or different are they?
Three, why?
I mean, why?
Why are they different or similar?
So, you might need primary data for this.
So that's one of the possibilities.
Which finally brings me, in some sense, here.
In what follows we are going to focus on three approaches to exploring,
understanding and
working with interdependence modeling on the customer side of business data.
What are these three things?
One, factorizing data.
Two, clusterizing and I don't know if there is such a word, but
you get the point.
Clusterizing data and three, visualizing it.
In addition, we will apply these three approaches to
an unstructured form of customer side data which is text and
subsets of it would be opinions and sentiment.
So, we will actually see all of these in action in the rest of today.
[MUSIC]