The most commonly used transformation is the natural log transformation, which is often applied when much of the data cluster near zero relative to the larger values in the dataset and all observations are positive.

For example, we saw earlier that the distribution of income per person was heavily right skewed. But after applying a natural log transformation, the data become much more symmetric. Transformed data like these are often much easier to model, because they are less skewed and outliers tend to be less extreme.
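To make "more symmetric" concrete, here is a minimal Python sketch that measures skewness before and after a natural log transformation. It assumes simulated log-normal values as a stand-in for income per person (the actual income data is not reproduced here), and `sample_skewness` is a hypothetical helper that computes the moment-based skewness coefficient.

```python
import math
import random


def sample_skewness(values):
    """Moment-based skewness: third central moment over the 1.5 power of the second."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical stand-in for income per person: strictly positive,
# right-skewed values drawn from a log-normal distribution.
random.seed(42)
incomes = [random.lognormvariate(9, 1) for _ in range(5_000)]

# Natural log transformation (defined only for positive values).
log_incomes = [math.log(x) for x in incomes]

print(f"skewness before: {sample_skewness(incomes):.2f}")
print(f"skewness after:  {sample_skewness(log_incomes):.2f}")
```

Note that `math.log` raises a `ValueError` for zero or negative inputs, which is one practical reason the log transformation is reserved for strictly positive data.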

Transformations can also be applied to one or both variables in a scatter plot to make the relationship between the variables more linear, and hence easier to model with simple methods.

For example, here we have a scatter plot of income per person versus life expectancy. The relationship is positive and curved. If we apply a log transformation to the response variable and then plot the relationship again, the relationship stays positive but becomes more linear, which makes it easier to model than the untransformed data.
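One rough way to check whether a transformation straightens a curved relationship is to compare Pearson correlations, since correlation measures linear association. The sketch below assumes simulated data in which the response grows exponentially with the explanatory variable; it is not the income and life expectancy data, and `pearson_r` is a hypothetical helper.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient: how closely points follow a straight line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: the response grows exponentially with the explanatory
# variable, with multiplicative noise, so the raw scatter is positive and curved.
random.seed(1)
xs = [random.uniform(40, 85) for _ in range(500)]
ys = [math.exp(0.1 * x) * random.lognormvariate(0, 0.3) for x in xs]

# Log-transform the response only; the explanatory variable is unchanged.
log_ys = [math.log(y) for y in ys]

print(f"r(x, y)     = {pearson_r(xs, ys):.3f}")
print(f"r(x, log y) = {pearson_r(xs, log_ys):.3f}")
```

With data generated this way, log y is a linear function of x plus noise, so the correlation should move closer to 1 after the transformation. A higher correlation is only a hint, though; looking at the transformed scatter plot is still the first step.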

Transformations other than the logarithm can be useful too. Let's take a look at a new dataset.

Here, we have a scatter plot of weight versus city mileage for a random sample of cars. We can see that the two variables are inversely related, which is expected: heavier cars get fewer miles to the gallon. But the relationship is not linear.

In addition to the log, we can also try a square root transformation, where we plot the square root of the weight versus miles per gallon, or an inverse transformation, where we plot one divided by the weight of the car.
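The square root and inverse transformations are just different functions applied to the weight before plotting. The sketch below assumes simulated car data (not the sample shown in the lecture), generated so that miles per gallon is roughly proportional to one divided by weight, and compares the Pearson correlation between miles per gallon and each transformed weight. The data-generating constants and the `pearson_r` helper are hypothetical.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient: how closely points follow a straight line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical car sample: weights in pounds, city mpg roughly 60000 / weight
# plus a little noise (these constants are made up for illustration).
random.seed(7)
weights = [random.uniform(1_800, 5_000) for _ in range(300)]
mpg = [60_000 / w + random.gauss(0, 1.5) for w in weights]

transforms = {
    "weight": lambda w: w,
    "log(weight)": math.log,
    "sqrt(weight)": math.sqrt,
    "1/weight": lambda w: 1 / w,  # inverse: divide one by the weight
}

results = {}
for name, f in transforms.items():
    results[name] = pearson_r([f(w) for w in weights], mpg)
    print(f"{name:>12}: r = {results[name]:+.3f}")
```

Because these data were built with mpg close to a constant divided by weight, the inverse transformation should give the strongest linear association here; real data need not behave this way. Also note that the inverse transformation reverses the direction of the relationship, so its correlation with miles per gallon is positive while the raw weight's is negative.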

It's difficult to tell just by looking at these plots which transformation works better, or whether either transformation actually yields something better than the original data. Later in the course, we'll get into more detail about how to make such a call. But for now, it's important to realize that transformations can be useful, even though they complicate interpretation a bit. After all, the log of income or the square root of weight is not easy to interpret.
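A small sketch of why transformed scales are harder to read, using hypothetical values: on the natural log scale, equal differences correspond to equal ratios on the original scale, while on the square root scale, equal differences correspond to different amounts of weight depending on where you start.

```python
import math

# Hypothetical incomes: the second is ten times the first.
low, high = 1_000, 10_000
log_diff = math.log(high) - math.log(low)

# On the natural log scale, a difference corresponds to a ratio on the
# original scale: log(high) - log(low) equals log(high / low).
print(f"log-scale difference: {log_diff:.3f}")
print(f"original-scale ratio: {math.exp(log_diff):.3f}")

# Hypothetical car weights (pounds): each step is +10 on the square root
# scale, but the corresponding change in weight grows with the weight.
weights = [2_500, 3_600, 4_900]
for a, b in zip(weights, weights[1:]):
    sqrt_step = math.sqrt(b) - math.sqrt(a)
    print(f"sqrt step {sqrt_step:.1f} <-> weight step {b - a} lb")
```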

While we could come up with some guidelines for when each transformation is useful, rather than focusing on a list of rules, it's more important to understand why we might want to apply a transformation in the first place. So let's review once again the common goals in transforming data.

We might want to see the data structure a little differently.