0:05
All right, so the first step that we're going to be looking at is
deciding how many factors we need to include in our analysis.
There are a couple of different criteria that can be used.
One criterion is to say, we want to retain at least
a given percentage of the original variation in the survey.
So we might say, okay, I want to retain at least 50% of the variation in the survey.
Another criterion that we could use is to say,
well let's include as many factors as are necessary such that
each factor that we include is doing its fair share of explaining variation.
Well, mathematically, what this maps onto is saying that all of the eigenvalues in
the analysis have to be greater than 1.
Or saying that the amount of variation a given factor explains has to
be greater than 1 over j, where j is the number of survey items that we have.
So if I have 20 survey items, we're going to include as many factors as necessary
until a factor falls below the 5% threshold, or the 1 over 20 threshold.
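To make these criteria concrete, here is a minimal sketch in Python with NumPy. The 4-item correlation matrix and all variable names are made up for illustration; they are not the lecture's survey data or anything XLSTAT produces.

```python
import numpy as np

# Hypothetical correlation matrix for four survey items (illustrative values,
# not the lecture's data): items 1-2 move together, and so do items 3-4.
R = np.array([
    [1.0, 0.6, 0.0, 0.0],
    [0.6, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.6],
    [0.0, 0.0, 0.6, 1.0],
])

n_items = R.shape[0]

# Eigenvalues of the correlation matrix, sorted from largest to smallest.
eigenvalues = np.sort(np.linalg.eigvalsh(R))[::-1]

# The eigenvalues of a correlation matrix sum to the number of items,
# so dividing by n_items gives each factor's share of the variation.
share = eigenvalues / n_items
cumulative = np.cumsum(share)

# Criterion 1: smallest number of factors retaining at least 50% of the variation.
n_for_half = int(np.argmax(cumulative >= 0.5)) + 1

# Criterion 2: eigenvalue greater than 1 ...
n_eigen_gt_1 = int(np.sum(eigenvalues > 1.0))
# ... which is the same as a share greater than 1 / n_items.
n_fair_share = int(np.sum(share > 1.0 / n_items))

print(n_for_half, n_eigen_gt_1, n_fair_share)
```

Because the eigenvalues of a correlation matrix add up to the number of items, "eigenvalue greater than 1" and "share greater than 1 over j" always pick the same factors.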
1:21
Another way that we could do this is to look at what's
referred to as a scree plot.
What a scree plot does is essentially plot out the variation explained by each factor.
We can look at it in terms of the eigenvalues or
as the percentage of variation.
And we look for the point where there's a plateau, or
there's a kink in the curve and it kind of flattens out.
So that's more of a visual way of assessing this.
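As a rough illustration of what a scree plot shows, the sketch below prints a text version using only the Python standard library. The eigenvalues are hypothetical, chosen so that the curve has a visible kink; they are not the values from the lecture's output.

```python
# Hypothetical eigenvalues, sorted from largest to smallest (illustrative only).
eigenvalues = [4.0, 2.5, 1.5, 0.6, 0.5, 0.4, 0.3, 0.2]

total = sum(eigenvalues)
cumulative = 0.0
lines = []
for i, ev in enumerate(eigenvalues, start=1):
    cumulative += ev / total
    # One '#' per 0.1 of eigenvalue, so the bar length tracks the eigenvalue.
    bar = "#" * round(ev * 10)
    lines.append(f"F{i}  eigenvalue {ev:4.1f}  cumulative {cumulative:6.1%}  {bar}")

print("\n".join(lines))
```

With these made-up numbers the bars shrink quickly over the first three factors and then level off, which is the kind of kink we would look for in the plot.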
So these are different criteria that we can use.
It's ultimately the analyst's choice which one we're going to be using.
But one of the more common ones is to say this, 1 over the number
of survey items, or eigenvalue greater than 1, right?
So the software package I'm going to show you is XLSTAT.
It's an add-on for Microsoft Excel that
adds multivariate statistics capabilities to the platform.
It does allow for a 30-day free trial.
I believe there are discounted rates available for
students who want to purchase the license for I believe a one year term.
And so we'll move over to the platform in a second to show you that.
But what you'll see on the screen when we begin to do the factor analysis is it's
going to ask where is the data contained?
So in Excel, we highlight that region.
We're going to leave it as the default principal components method for
identifying those underlying factors.
And then on the next screen,
we can specify under the Option screen, do we want a rotation to be conducted?
And if so, how many factors do you want to include?
All right, so let's move over to the Excel document itself.
And I've zoomed in so that we can see what's going on.
And I'll do the same for us as we're looking at the raw data.
3:24
Now I've relabeled the header row.
So rather than looking at Q1, Q2, Q3 and so forth,
we just look at a summary of what that question contains.
So I've renamed it based on the questions themselves.
So column B corresponds to the responses to the question of am I in good
physical condition, then do I wear fashionable clothing,
are my clothes more stylish than most of my friends'?
Do I like to take gambles, I'm not concerned with the ozone,
the government's too involved and so forth.
All right, so this is our raw data, each row corresponds to a different respondent,
each column corresponds to a different survey item.
Now you'll see, after you install XLSTAT,
you'll see a tab built for that.
And what we're interested in is going to be analyzing data. If we click on that,
you'll see the drop-down menu of what we're going to be using
throughout this course. We're going to be using factor analysis right now.
We'll move on to looking at k-means and agglomerative clustering as methods of
conducting market segmentation.
We'll also look at using multidimensional scaling to construct perceptual maps.
But you can also use XLSTAT for regression and ANOVA analysis.
That's built into Excel, but techniques such as logistic regression,
which we've looked at in this specialization,
are techniques that are not built into the standard
Excel data analysis package.
So this add-on really does expand the capabilities of what you can do
within the Microsoft Excel environment.
5:38
All right, so that's B1 through AE401.
We're going to indicate the structure of our data, it's in observations.
All right, in a table, and we're going to leave principal components as is.
Under Options,
we're going to automatically determine the number of factors that are necessary.
And I'm not going to turn on the rotate right now.
We'll come back to look at what the rotated results look like in a little bit.
And the reason for
that is what I want to do is I want to see how many factors we actually need.
Then once I know how many factors we need,
we can tell it how many factors to include in that rotation, right?
Missing data, we don't have that problem here, but
XLSTAT allows you to determine how you want to handle missing data.
The output, you can decide what comes as part of your output.
The important one that you want to make sure is checked off no matter
what package you're using, is the factor scores.
And that's what's going to allow us to conduct the subsequent analysis of
effectively replacing the raw survey responses
with the summarized results of the factor analysis.
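To show what "replacing the raw survey responses with factor scores" can look like, here is a minimal sketch in Python with NumPy. The data are synthetic (400 respondents, 6 items driven by 2 underlying traits), and the scoring convention used here, standardized principal component scores, is one common choice. It is an assumption for illustration, not necessarily how XLSTAT computes its factor scores.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the survey (not the lecture's data):
# 400 respondents, 6 items, with items 1-3 and items 4-6 each
# driven by one of two hypothetical underlying traits.
n_respondents, n_items, n_factors = 400, 6, 2
latent = rng.normal(size=(n_respondents, n_factors))
true_loadings = np.array(
    [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float
)
noise = 0.5 * rng.normal(size=(n_respondents, n_items))
responses = latent @ true_loadings.T + noise

# Standardize each item (each column), then eigendecompose the correlation matrix.
Z = (responses - responses.mean(axis=0)) / responses.std(axis=0, ddof=1)
R = np.corrcoef(responses, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(R)

# eigh returns ascending order; reorder from largest to smallest eigenvalue.
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Scores on the leading components, scaled to unit variance:
# one row per respondent, one column per retained factor.
scores = Z @ eigenvectors[:, :n_factors] / np.sqrt(eigenvalues[:n_factors])

print(responses.shape, "->", scores.shape)
```

Each respondent's six answers are summarized by two scores, and those scores are uncorrelated with each other, which is what makes them convenient inputs for the later segmentation steps.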
6:48
And so you'll see a summary of what your selection is.
And we'll click on Continue, and we're going to just click through
the first couple of charts, which just have to do with the display of information.
If you're using XLSTAT, you'll get this popup window,
asking you to add it as a trusted source.
And again, XLSTAT is one tool that's out there.
There's a free package called Real Statistics that's a nice package.
The limitation there with factor analysis is that
it doesn't allow you to save those factor scores.
For those of you who are teaching yourselves statistical languages such as R,
factor analysis is built into R, and
it's built into environments such as MATLAB, JMP, and SAS.
So you can conduct this really using whatever
software you're most comfortable with, right.
And that's all there is to conducting the analysis, so
let's just take a look at the output.
We have a summary of the range of
each of the survey items, along with the mean and standard deviation.
Notice, we get this lovely correlation matrix, and
then we can try to eyeball it.
We can try to look for patterns here ourselves, but that's going to get
difficult, especially since it doesn't all fit on one screen, right?
We're going to move down, in terms of looking at the output.
We do see what the eigenvalues are, and
notice that this analysis has been run out to 18 factors.
And you'll see that the eigenvalues continue to decline, that's by design.
The first factor is going to have the largest eigenvalue,
the second factor will have the second largest, and so forth.
And that's directly related to the variation that's going to be explained,
and that continues to decline with smaller eigenvalues.
So that's the variation being explained by each incremental factor.
And then the row below that is going to give us
the cumulative amount of variation that's explained.
And what we're looking at here, notice that when we get up to 9 factors,
we're capturing almost 72% of the variation in the original survey.
So we've gone from about 30 questions down to about a third as many factors.
And we still have more than 70% of the information contained in the survey.
We could keep on adding more and more factors to capture more and
more information.
But notice that we see very little gained in terms of the amount of
information being explained as we add more factors.
That's mimicked in the scree plot that we see.
Notice that early on the red line, giving us the cumulative variation that we're
capturing, does a pretty good job, and then it plateaus.
And so that plateau, or if we were to invert this, it would look like an elbow.
That's what we're looking for as a means of deciding, when do we want to stop?
So it looks like in this case, we're going to stop after the 9 factors.
And that's what's been done automatically for us.