In other words, we're interested in whether the rate of nicotine dependence
differs according to which explanatory group the observations belong to, that is,
which smoking frequency group.
Notice that we are not interested in the column percentages for
those observations without nicotine dependence,
indicated with a dummy code of 0.
Instead, we're interested in describing the presence of nicotine dependence within
the smoking frequency groups; that is, these column percentages circled in blue.
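To make the idea of column percentages concrete, here is a minimal sketch in pandas. The data are made up for illustration, and the column names smoke_freq and nic_dep are hypothetical stand-ins, not the variables from the actual data set.

```python
import pandas as pd

# Made-up toy data for illustration only; not the course data set.
# "smoke_freq" (days smoked per month) and "nic_dep" (1 = dependence,
# 0 = no dependence) are hypothetical column names.
levels = [1, 3, 6, 14, 22, 30]      # six smoking frequency groups
n_dependent = [0, 1, 1, 2, 3, 3]    # dependent cases out of 4 per group
rows = []
for level, n_dep in zip(levels, n_dependent):
    rows += [(level, 1)] * n_dep + [(level, 0)] * (4 - n_dep)
df = pd.DataFrame(rows, columns=["smoke_freq", "nic_dep"])

# Counts: response in the rows, explanatory groups in the columns.
counts = pd.crosstab(df["nic_dep"], df["smoke_freq"])

# Column percentages: divide each column by its own total.
col_pct = counts / counts.sum(axis=0)

# The row coded 1 holds the rate of nicotine dependence in each group.
dependence_rate = col_pct.loc[1]
print(dependence_rate)
```

The row coded 0 is simply one minus the row coded 1, which is why only the row coded 1 needs to be described.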
If I want to graph the percent of young adult smokers with nicotine dependence
within each smoking frequency category, I would first import the seaborn and
matplotlib.pyplot libraries and then add the following code.
First, I set our explanatory variable to categorical and
our response variable to numeric.
Then I request a bivariate bar chart
with smoking frequency categories on the x-axis and the mean for
nicotine dependence, which is the proportion of ones, on the y-axis.
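The steps just described might look roughly like the sketch below. The data are made up, the column names smoke_freq and nic_dep are hypothetical, and seaborn.catplot is my choice of plotting call here; the code shown on the slide may use a different function.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd
import seaborn

# Made-up toy data for illustration only; not the course data set.
# "smoke_freq" and "nic_dep" are hypothetical column names.
levels = [1, 3, 6, 14, 22, 30]      # six smoking frequency groups
n_dependent = [0, 1, 1, 2, 3, 3]    # dependent cases out of 4 per group
rows = []
for level, n_dep in zip(levels, n_dependent):
    rows += [(level, 1)] * n_dep + [(level, 0)] * (4 - n_dep)
df = pd.DataFrame(rows, columns=["smoke_freq", "nic_dep"])

# Explanatory variable to categorical, response variable to numeric.
df["smoke_freq"] = df["smoke_freq"].astype("category")
df["nic_dep"] = pd.to_numeric(df["nic_dep"], errors="coerce")

# Bivariate bar chart: bar height is the mean of the 0/1 response,
# which is the proportion of ones in each category.
grid = seaborn.catplot(x="smoke_freq", y="nic_dep", data=df, kind="bar")
grid.set_axis_labels("Smoking frequency (days per month)",
                     "Proportion with nicotine dependence")

# The same proportions as numbers, for checking against the bars.
rates = df.groupby("smoke_freq", observed=True)["nic_dep"].mean()
print(rates)
plt.close("all")
```

Because the response is coded 0 and 1, taking its mean within each category is the same as taking the proportion with nicotine dependence, so the bar heights match the column percentages for the row coded 1.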
Now I can visualize the association and see even more clearly that there seems
to be a positive linear relationship; that is, the more days per month a young adult
smokes, the more likely they are to have nicotine dependence.
I know from looking at the significant p value
that I will reject the null hypothesis and accept the alternate hypothesis
that not all nicotine dependence rates are equal across smoking frequency categories.
If my explanatory variable had only two levels,
I could interpret the two corresponding column percentages and be able to say
which group had a significantly higher rate of nicotine dependence.
But my explanatory variable has six categories.
So I know that not all of the rates are equal,
but I don't know which are different and which are not.
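To see why the overall test cannot answer that question, here is a minimal sketch of a chi-square test of independence on a two by six table, using scipy.stats.chi2_contingency. The counts are made up for illustration and are not the results from the actual data.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Made-up counts for illustration only; not the course results.
# Columns are six smoking frequency groups (hypothetical values).
observed = np.array([
    [90, 80, 70, 50, 40, 30],   # no nicotine dependence (coded 0)
    [10, 20, 30, 50, 60, 70],   # nicotine dependence (coded 1)
])

# One test for the whole table: the null hypothesis is that the rate of
# nicotine dependence is the same in every smoking frequency group.
chi2, p_value, dof, expected = chi2_contingency(observed)

print("chi-square:", chi2)
print("p value:", p_value)
print("degrees of freedom:", dof)
```

The test returns a single chi-square statistic and a single p value for all six groups together, so a significant result says only that the rates are not all equal, not which pairs of groups differ.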