Here, we have data samples in a two-dimensional space that is defined

by the x-axis and the y-axis.

You can see that most of the variation in the data lies along the red diagonal line.

This means that the data samples are best differentiated along this dimension

because they're spread out, not clumped together along this dimension.

This dimension, indicated by the red line, is the first principal component,

labelled as PC1 in the plot.

It captures the largest amount of variance along a single dimension in the data.

PC1, indicated by the red line, does not correspond to either axis.

The next principal component is determined by looking among the directions that are

orthogonal, in other words perpendicular, to the first principal component, for the one that

captures the next largest amount of variance in the data.

This is the second principal component, PC2, and

it's indicated by the green line in the plot.

This process can be repeated to find as many principal components as desired.

Note that the principal components do not align with either the x-axis or

the y-axis,

and that they are orthogonal, in other words perpendicular, to each other.
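To make this concrete, here is a minimal sketch of finding PC1 and PC2 for a two-dimensional dataset. It assumes NumPy and uses synthetic data stretched along a diagonal (not the data from the plot); the helper name `principal_components` is made up for illustration. It takes the eigenvectors of the covariance matrix, which is one common way to compute principal components.

```python
import numpy as np

# Synthetic 2-D data stretched along a diagonal (illustrative only).
rng = np.random.default_rng(0)
t = rng.normal(size=500)
X = np.column_stack([t + 0.2 * rng.normal(size=500),
                     t + 0.2 * rng.normal(size=500)])

def principal_components(X):
    """Return (components, variances), sorted by decreasing variance.

    Each row of `components` is one unit-length principal direction.
    """
    Xc = X - X.mean(axis=0)                 # center each feature
    cov = np.cov(Xc, rowvar=False)          # covariance matrix of the features
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]       # reorder: largest variance first
    return eigvecs[:, order].T, eigvals[order]

components, variances = principal_components(X)
pc1, pc2 = components

print("PC1 direction:", pc1)       # roughly diagonal, not aligned with x or y
print("PC2 direction:", pc2)
print("PC1 . PC2 =", pc1 @ pc2)    # close to zero, so the two are orthogonal
```

Both entries of the PC1 direction are clearly non-zero, so PC1 lies along neither axis, and the dot product with PC2 is zero up to rounding.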

This is what PCA does.

It finds the underlying dimensions, the principal

components, that capture as much of the variation in the data as possible.

These principal components form a new coordinate system to transform

the data to, instead of the conventional dimensions like X, Y, and Z.

So how does PCA help with dimensionality reduction?

Let's look again at this plot with the first principal component.

Since the first principal component captures most of the variation in

the data, the original data samples can be mapped to this dimension, indicated by

the red line, with minimal loss of information.

In this case then, we map a two-dimensional dataset to

a one-dimensional space while keeping as much information as possible.
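As a rough sketch of this mapping, the code below projects synthetic two-dimensional data onto PC1 and then maps the one-dimensional values back to measure how much was lost. NumPy and the synthetic data are assumptions for illustration, not the dataset shown in the plot.

```python
import numpy as np

# Synthetic 2-D data lying close to a diagonal line (illustrative only).
rng = np.random.default_rng(1)
t = rng.normal(size=300)
X = np.column_stack([t, t]) + 0.1 * rng.normal(size=(300, 2))

mean = X.mean(axis=0)
Xc = X - mean                                   # center the data
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
pc1 = eigvecs[:, -1]       # eigh sorts ascending, so the last column is PC1

z = Xc @ pc1                          # 1-D coordinate of each sample along PC1
X_approx = mean + np.outer(z, pc1)    # map the 1-D values back into 2-D

retained = eigvals[-1] / eigvals.sum()                  # share of variance kept
error = np.mean(np.sum((X - X_approx) ** 2, axis=1))    # what was thrown away

print("shape before:", X.shape, "shape after:", z.shape)
print(f"fraction of variance retained by PC1: {retained:.3f}")
print(f"mean squared reconstruction error: {error:.4f}")
```

The reconstruction error is exactly the variance that lay along the discarded second component, which is why dropping the low-variance direction loses so little.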

Here are some main points about principal component analysis.

PCA finds a new coordinate system for your data,

such that the first coordinate, defined by the first principal

component, captures the greatest variance in your data.

The second coordinate, defined by the second principal component, captures

the second greatest variance in the data, and so on.

The first few principal components that capture most of the variance in the data

can be used to define a lower-dimensional space for your data.
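One common way to decide how many components count as "the first few" is to look at the cumulative share of variance they explain. The sketch below assumes NumPy; the 95% threshold, the helper names, and the synthetic ten-feature dataset driven by two hidden factors are all illustrative assumptions.

```python
import numpy as np

def explained_variance_ratio(X):
    """Share of total variance captured by each principal component."""
    Xc = X - X.mean(axis=0)
    eigvals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]  # descending
    return eigvals / eigvals.sum()

def n_components_for(ratios, threshold=0.95):
    """Smallest k whose first k components reach `threshold` of the variance."""
    cumulative = np.cumsum(ratios)
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(ratios))     # guard against rounding at threshold 1.0

# Hypothetical data: 10 features driven by 2 hidden factors plus small noise.
rng = np.random.default_rng(2)
latent = rng.normal(size=(1000, 2))
mixing = np.zeros((2, 10))
mixing[0, :5] = 1.0     # features 0-4 follow the first hidden factor
mixing[1, 5:] = 1.0     # features 5-9 follow the second hidden factor
X = latent @ mixing + 0.05 * rng.normal(size=(1000, 10))

ratios = explained_variance_ratio(X)
k = n_components_for(ratios, threshold=0.95)

print("explained variance ratios:", np.round(ratios, 3))
print("components needed for 95% of the variance:", k)

# Project onto the first k principal components to get the reduced data.
Xc = X - X.mean(axis=0)
_, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
X_reduced = Xc @ eigvecs[:, ::-1][:, :k]
print("reduced shape:", X_reduced.shape)
```

Because the ten features here are driven by only two underlying factors, two components are enough to cross the threshold, and the data shrinks from ten columns to two.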

PCA can be a very useful technique for dimensionality reduction,

especially when working with high-dimensional data.

While PCA is a useful technique for reducing the dimensionality of your

data, which can help with the downstream analysis,

it can also make the resulting analysis models more difficult to interpret.

The original features in your dataset have specific meanings such as income,

age and occupation.

By mapping the data to a new coordinate system defined by principal components,

the dimensions in your transformed data no longer have natural meanings.

This should be kept in mind when using PCA for dimensionality reduction.
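To see why the new dimensions lose their natural meaning, it can help to print the weights that a principal component places on each original feature. The sketch below assumes NumPy and an invented dataset with the numeric columns `income`, `age`, and `years_employed`; standardizing the features first is an added assumption, made so that no feature dominates simply because of its units.

```python
import numpy as np

feature_names = ["income", "age", "years_employed"]   # hypothetical columns

# Invented data in which the three features are related to one another.
rng = np.random.default_rng(3)
age = rng.uniform(20, 65, size=400)
years_employed = np.clip(age - 22 + rng.normal(0, 3, size=400), 0, None)
income = 20_000 + 1_500 * years_employed + rng.normal(0, 8_000, size=400)
X = np.column_stack([income, age, years_employed])

# Standardize each feature (assumption: features are on very different scales).
Z = (X - X.mean(axis=0)) / X.std(axis=0)

eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
pc1 = eigvecs[:, -1]     # eigh sorts ascending, so the last column is PC1

# Each weight says how much one original feature contributes to PC1.
for name, weight in zip(feature_names, pc1):
    print(f"{name:>15}: {weight:+.2f}")
```

PC1 here is a blend of all three features rather than any one of them, so a model built on it can no longer be read directly in terms of income or age alone.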