Welcome to module eight. One fallacy that has become increasingly prevalent is that more data will automatically produce better results.

One thing the previous lesson should have taught you, however, is that the only thing more data will definitely do is enable more spurious correlations to be identified.

Another problem that can arise when analyzing different amounts of data is how best to visualize data distributions, whether in one or two dimensions.

In this module you will learn an important skill known as density estimation.

The basic concept of density estimation is that the data you are analyzing is simply a sample of an underlying population.

Density estimation attempts to reconstruct the distribution of the underlying population from the sample by first using kernels, or predefined functions, to represent the data points, and then summing these kernels to generate a smooth, functional representation of the data.

In fact, this last step is one of the most common uses of density estimation, known as smoothing, since a smooth representation of the data distribution is generated.

The smooth version can often be easier to interpret, or to compare and contrast, than a potentially jagged histogram.
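As a rough sketch of this kernel-summing idea, here using a Gaussian kernel (the function names, simulated sample, and bandwidth value are illustrative choices, not a specific library's API):

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal (Gaussian) kernel."""
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def kde(x_grid, data, bandwidth):
    """Place one kernel on each data point, then average the
    kernels over the evaluation grid."""
    # Shape (len(x_grid), len(data)): one kernel column per data point
    u = (x_grid[:, None] - data[None, :]) / bandwidth
    return gaussian_kernel(u).mean(axis=1) / bandwidth

rng = np.random.default_rng(0)
sample = rng.normal(loc=0.0, scale=1.0, size=200)
grid = np.linspace(-4, 4, 101)
density = kde(grid, sample, bandwidth=0.5)  # smooth curve over the grid
```

Averaging the normalized kernels yields a non-negative curve that integrates to approximately one, which can be plotted directly in place of a histogram.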

The density estimate can be combined with a box plot to display the underlying data distribution on the sides of the box.

This is known as a violin plot.
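A minimal sketch of such a plot, assuming matplotlib is available (the two simulated groups and their labels are purely illustrative):

```python
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
# Two illustrative groups of observations
groups = [rng.normal(0.0, 1.0, 300), rng.normal(2.0, 0.5, 300)]

fig, ax = plt.subplots()
# Each "violin" mirrors the kernel density estimate on both
# sides of a box-plot-style center line
parts = ax.violinplot(groups, showmedians=True)
ax.set_xticks([1, 2])
ax.set_xticklabels(["group A", "group B"])
fig.savefig("violins.png")
```

The wider a violin is at a given height, the more data falls near that value, which is exactly the density information a plain box plot hides.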

This technique can be extended to two dimensions,

where the scatter plot can be converted to a contour plot.

This technique is very effective when dealing with large quantities of data in two dimensions, since the smoothed version is visually easier to interpret than the continuous smudge produced by a large number of overlapping points.
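As a sketch of how such a contour view can be computed, here using SciPy's `gaussian_kde` on simulated two-dimensional data (the variable names and grid bounds are illustrative):

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(2)
# Simulate a dense two-dimensional scatter
x = rng.normal(0.0, 1.0, 2000)
y = 0.5 * x + rng.normal(0.0, 1.0, 2000)

# Fit a two-dimensional kernel density estimate
kde = gaussian_kde(np.vstack([x, y]))

# Evaluate the smoothed density on a regular grid; drawing contour
# lines of this surface (e.g., with plt.contour(xi, yi, zi))
# replaces the overlapping scatter points
xi, yi = np.mgrid[-3:3:60j, -3:3:60j]
zi = kde(np.vstack([xi.ravel(), yi.ravel()])).reshape(xi.shape)
```

The contour lines of `zi` play the same role as elevation lines on a map: regions where many points pile up become "peaks" that are easy to read at a glance.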

If you have ever looked at a topographic map, such as when hiking,

you already have an idea of how useful this approach can be since the changes in

elevation are easy to comprehend without seeing the heights of every hill or valley.

Another good example is the track behind me.

If you simply plotted the positions of the runners during a race, it might be hard to understand where they start and where they are going. But with a contour plot, you can see where they spend most of their time on the track.

However, smoothing alone undersells the true power of density estimation.

Imagine a situation where it can be challenging to acquire new data.

The process of performing

density estimation actually constructs a model representation of

the underlying data distribution which can then itself be sampled to generate new data.

I know this might seem a bit confusing, or even circular, as we use the data to make a model that can then make more data.

But this technique of creating a generative model from data is powerful.
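As a small sketch of this sampling step, again assuming SciPy's `gaussian_kde` (the distribution parameters of the simulated sample are illustrative):

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(3)
# Pretend this is the hard-to-acquire data we managed to collect
original = rng.normal(loc=5.0, scale=2.0, size=500)

# Fit a density estimate to the sample ...
model = gaussian_kde(original)

# ... then draw brand-new points from the fitted model;
# resample returns an array of shape (dimensions, size)
new_data = model.resample(size=1000, seed=4)
```

The generated points follow roughly the same distribution as the original sample, which is the generative behavior described here.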

You will see this demonstrated in the current module, where we first build a density estimator from images of the handwritten numerals zero through nine, before generating new images from that estimator.

I always find non-traditional applications of data analytics extremely interesting.

And creating a generative model from image data certainly is non-traditional.

But the principle is applicable in many areas.

With the lessons in this module,

I expect that you too will learn to appreciate the power of this technique. Good luck.