We are now at the point in our process improvement projects where we have data in hand,

and now we want to do analysis.

We want a deeper understanding of the process through the numbers that are available.

So, the next few modules are going to be around some of the basic concepts.

We're not going to cover everything on analysis, but some of the high points

that we typically see in quality and process improvement.

Now, one of the first questions you have to ask is,

what is the appropriate tool?

But that depends on what type of data you have.

Whether it is attribute (discrete or categorical) data

versus variables (continuous) data

determines which tools are available to you,

because not all tools can be applied to all of those data types.

So, in a future conversation,

we're going to get into more detail on what those are.

The tool you use also depends on the purpose of the analysis.

Do you just want to characterize the data?

Characterize your process?

Or do you then want to extend that into modeling, predicting,

and trending what the behaviors,

interactions, and relationships of your factors are?

So, descriptive is just characterizing,

but inferential is what you can glean from that data

in order to take it forward into the future for trends and relationships.

So, let's start with the types of data.

As I said before, attributes/discrete/categorical versus continuous.

Now, attribute data is just something that is counted.

A higher-technology or higher-accuracy tool is not going to give you

a better number, because you're just counting what the things are.

Counting on your hands and fingers,

it's just counting things.

There isn't a finer description of those counts.

A better measurement system is not going to give you a better number.

Now, variables or continuous data,

those are things that are measured, with

a stopwatch or a tape measure or something like that,

so if you wanted more decimal places or finer fractions,

you could have them, if you had a better measurement system.

So, instead of having just discrete individual values,

you have a continuous line, and any point on that line is possible.

So, better technology would give you more accuracy.

So, attribute data is things like how many people,

how many observations or late cases, anything like that; or categories,

such as what state your patients came from.

Those are unique, one-time characteristics.

But variables, that's things like height and weight and time.

Those are continuous, because if you had a better or more accurate system,

you could add more decimal places to your number.
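As a small sketch of the distinction, using hypothetical values and only Python's standard library: attribute data can only be tallied into counts per category, while continuous data can be measured to whatever precision your instrument allows.

```python
from collections import Counter

# Attribute/categorical data (hypothetical): counted, not measured
patient_states = ["OH", "PA", "OH", "NY", "OH", "PA"]
late_cases = [True, False, True, True, False, False, False, True]

state_counts = Counter(patient_states)  # tallies per category
late_count = sum(late_cases)            # a simple count of late cases

# Variables/continuous data (hypothetical): measured, so a better
# stopwatch could add more decimal places to each value
wait_times_min = [14.2, 9.7, 22.5, 11.03, 18.46]
avg_wait = sum(wait_times_min) / len(wait_times_min)

print(state_counts, late_count, round(avg_wait, 2))
```

The patient states, late flags, and wait times here are invented for illustration; the point is only that counting and measuring support different operations.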

So, those are the two data types, and you have to understand which one

you have going into the analysis phase, because you're going to have

limitations one way or another on what tools are available.

In fact, you're going to see that if you have attribute data,

you are probably going to have fewer opportunities.

There aren't as many tools, and you are also going to need much more data

for attributes, because all you have in your system is pass or fail,

heads or tails on a coin.

You're going to need to have a lot more of those events or

situations in order to see what the trend is going forward.

With variables or continuous data, by contrast,

you can have a sample size of, say,

15 to 30 and be able to understand a lot more of what's going

on in the behaviors and interactions of your systems.

So, if at all possible,

try to find variables or

continuous data rather than attribute data, because you're going to have

not only more tools but also

a requirement for fewer data items, so you can understand more quickly.

If you look at the slide here,

you have some attribute tools, say,

Pareto charts and bar charts, that we will discuss later,

but there are some tools, like scatter plots or

normal distributions, that are going to be

much more powerful if you have variables data.

So, going into the details of describing your data:

means and medians are the best way to describe

or characterize your data with descriptive statistics.

You can take the whole population and, in just one calculation,

get the mean,

the average of your data.

And the median: of all the data items in your set,

which one is in the middle?

What is the value that's dead center in your information?

So, that is just characterizing or describing your data.

There's also the range, highest to lowest.

Or the interquartile range:

what is the 25th percentile to the 75th percentile,

the size of that middle range?

Variances and standard deviations are also ways to describe the spread of your data.

How broadly do the data items spread from highest to lowest?

Of those four,

standard deviation is quite often the best, just because it

gives you more power in tool usage going forward.
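All four of these descriptive measures can be computed in one pass with Python's standard library. This is a minimal sketch using made-up cycle-time values, not data from the course:

```python
import statistics

# Hypothetical continuous data: process cycle times in minutes
times = [12.1, 11.8, 13.4, 12.9, 11.5, 14.2, 12.7, 13.1, 12.3, 11.9]

mean = statistics.mean(times)                  # central tendency: the average
median = statistics.median(times)              # the value dead center in the sorted data
data_range = max(times) - min(times)           # spread: highest minus lowest
q1, q2, q3 = statistics.quantiles(times, n=4)  # quartile cut points
iqr = q3 - q1                                  # interquartile range: the middle 50%
stdev = statistics.stdev(times)                # sample standard deviation

print(f"mean={mean:.2f} median={median:.2f} range={data_range:.2f} "
      f"IQR={iqr:.2f} stdev={stdev:.2f}")
```

Note that mean, median, range, and IQR each summarize the data in one number, while the standard deviation is the one that feeds directly into many of the later tools.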

Now, we've just gone through the descriptive statistics,

things like mean, median, mode.

That would just be the snapshot of the data and the process that you currently have.

But also, there is that category of inferential statistics.

Now we take that data and trend it,

look at relationships such as the scatter plot that we saw on the last slide,

or trends in control charts over time. And there is also

a group of hypothesis tests, where you're

testing the behavior of some of those relationships.

In the next two modules,

we're going to go through descriptive statistics and inferential.

So, join us for those.