Welcome to Data Science Methodology 101: From Problem to Approach, Analytic Approach!

Selecting the right analytic approach depends on the question being asked.

The approach involves seeking clarification from the person who is asking the question,

in order to pick the most appropriate path or approach.

In this video we'll see how the second stage of the data science methodology is applied.

Once the problem to be addressed is defined, the appropriate analytic approach for the

problem is selected in the context of the business requirements.

This is the second stage of the data science methodology.

Once a strong understanding of the question is established, the analytic approach can

be selected.

This means identifying what type of patterns will be needed to address the question most

effectively.

If the question is to determine probabilities of an action, then a predictive model might

be used.

If the question is to show relationships, a descriptive approach may be required.

This would be one that would look at clusters of similar activities based on events and

preferences.

Statistical analysis applies to problems that require counts.

For example, if the question requires a yes/no answer, then a classification approach

to predicting a response would be suitable.

Machine Learning is a field of study that gives computers the ability to learn without

being explicitly programmed.

Machine Learning can be used to identify relationships and trends in data that might otherwise not

be accessible or identified.

When the question is to learn about human behaviour, an appropriate

response would be to use clustering and association approaches.

So now, let's look at the case study related to applying the analytic approach.

For the case study, a decision tree classification model was used to identify the combination

of conditions leading to each patient's outcome.

In this approach, examining the variables in each of the nodes along each path to a

leaf leads to a respective threshold value for each variable.

This means the decision tree classifier provides both the predicted outcome and the

likelihood of that outcome, based on the proportion of the dominant outcome, yes or no, in each

group.
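
As a rough illustration of the point above, here is a minimal sketch of how a single leaf (group) could be summarized. The function name `summarize_leaf` and the sample outcomes are assumptions made up for this example; they are not part of the case study.

```python
from collections import Counter


def summarize_leaf(outcomes):
    """Return (dominant outcome, proportion of that outcome) for one leaf.

    `outcomes` is the list of "yes"/"no" labels of the training
    patients that ended up in this leaf.
    """
    counts = Counter(outcomes)
    dominant, n_dominant = counts.most_common(1)[0]
    return dominant, n_dominant / len(outcomes)


# Hypothetical leaf: 8 training patients, 6 of them readmitted ("yes").
leaf = ["yes"] * 6 + ["no"] * 2
prediction, likelihood = summarize_leaf(leaf)
print(prediction, likelihood)  # yes 0.75
```

The dominant outcome is the prediction for any patient who lands in this leaf, and its proportion is the likelihood reported alongside it.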

From this information, the analysts can obtain the readmission risk, or the likelihood of

a yes for each patient. If the dominant outcome is yes, then the risk

is simply the proportion of yes patients in the leaf.

If it is no, then the risk is 1 minus the proportion of no patients in the leaf.
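
The rule just described can be sketched in a few lines. The function name `readmission_risk` is hypothetical, and the inputs are assumed to be the leaf's dominant outcome and the proportion of patients with that outcome.

```python
def readmission_risk(dominant, proportion):
    """Convert a leaf's dominant outcome and its proportion into the
    likelihood of a "yes" (readmission)."""
    if dominant == "yes":
        # The risk is simply the proportion of yes patients in the leaf.
        return proportion
    if dominant == "no":
        # The risk is 1 minus the proportion of no patients in the leaf.
        return 1.0 - proportion
    raise ValueError("dominant outcome must be 'yes' or 'no'")


print(readmission_risk("yes", 0.75))  # 0.75
print(readmission_risk("no", 0.75))   # 0.25
```

In both cases the result is the share of yes patients in the leaf, so every patient gets a risk on the same scale regardless of which outcome dominates.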

A decision tree classification model is easy for non-data scientists to understand and

apply when scoring new patients for their risk of readmission.
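
To give a feel for how simple the scoring step can be, here is a small sketch of walking a hand-built tree. Everything in it is hypothetical: the variable names (`prior_admissions`, `age`), the thresholds, and the risk values are invented for illustration and do not come from the case study model.

```python
# Hypothetical tree: variables, thresholds, and risks are made up.
TREE = {
    "variable": "prior_admissions",
    "threshold": 2,
    "left": {"risk": 0.10},        # prior_admissions <= 2
    "right": {                     # prior_admissions > 2
        "variable": "age",
        "threshold": 70,
        "left": {"risk": 0.35},    # age <= 70
        "right": {"risk": 0.80},   # age > 70
    },
}


def score(patient, node=TREE):
    """Follow the tree from the root to a leaf.

    Returns the leaf's risk and the list of conditions met on the way.
    """
    path = []
    while "risk" not in node:
        var, thr = node["variable"], node["threshold"]
        if patient[var] <= thr:
            path.append(f"{var} <= {thr}")
            node = node["left"]
        else:
            path.append(f"{var} > {thr}")
            node = node["right"]
    return node["risk"], path


risk, path = score({"prior_admissions": 3, "age": 75})
print(risk, path)  # 0.8 ['prior_admissions > 2', 'age > 70']
```

The returned path lists the conditions that produced the score, which is what makes this kind of model readable for people who are not data scientists.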

Clinicians can readily see what conditions are causing a patient to be scored as high-risk,

and multiple models can be built and applied at various points during the hospital stay.

This gives a moving picture of the patient's risk and how it is evolving with the various

treatments being applied. For these reasons, the decision tree classification

approach was chosen for building the Congestive Heart Failure readmission model.

This ends the Analytic Approach section for this course.

Thanks for watching!