Now let's test for moderation within the context of our final inferential test, the correlation coefficient. You might remember this scatter plot and correlation, based on the Gapminder data, between the rate of urban dwellers in each country and the Internet use rate. We found a significant association, with a correlation of 0.61. But might this correlation between urban rate and Internet use rate differ across countries with different income levels? To explore this question, we create a third variable that is categorical. For this new variable, the income per person variable, which is quantitative, will be categorized into high-income, middle-income, and low-income countries.
The adjustments we make to our program are very similar to the adjustments we made to our ANOVA syntax and to our chi-square syntax when testing moderation. We'll start our program by calling in our libraries and loading the Gapminder data. Next, we set our three variables of interest to numeric and set blank values on our third variable to NaN. Then I create a new data frame, which I am calling data_clean, that drops all missing (that is, NaN) values for each of the variables in the dataset. Now I create my income group variable, which splits the sample of countries into low-, middle-, and high-income groups using the dummy codes 1, 2, and 3. Next, I create three different data frames, each including only one income group: sub1 for low-income countries, sub2 for middle-income countries, and sub3 for high-income countries.
Then, for each of our new data frames, we request a Pearson correlation measuring the association between urban rate and Internet use rate, along with its associated p-value. We use the pearsonr function from the SciPy stats library and pass in our two variables, urban rate and Internet use rate.
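The steps just described could be sketched roughly as follows. This is a minimal illustration, not the program shown in the lecture: the column names (urbanrate, internetuserate, incomeperperson, incomegrp), the helper names prepare and correlation_by_group, and the income cutoffs are all assumptions, since the narration does not state them. The sketch also loops over the groups instead of building sub1, sub2, and sub3 by hand, and it drops rows with missing values only on the three variables of interest.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Assumed column names; the lecture does not spell them out.
COLS = ["urbanrate", "internetuserate", "incomeperperson"]


def prepare(data, low_cut, high_cut):
    """Convert the three variables to numeric, drop missing rows,
    and add an income group code (1 = low, 2 = middle, 3 = high).

    low_cut and high_cut are hypothetical cutoffs supplied by the caller.
    """
    clean = data.copy()
    for col in COLS:
        # Blank or non-numeric entries become NaN.
        clean[col] = pd.to_numeric(clean[col], errors="coerce")
    clean = clean.dropna(subset=COLS)
    income = clean["incomeperperson"]
    clean["incomegrp"] = np.where(
        income <= low_cut, 1, np.where(income <= high_cut, 2, 3)
    )
    return clean


def correlation_by_group(clean):
    """Return {group code: (Pearson r, p-value)} for urban rate vs. Internet use rate."""
    results = {}
    for code, sub in clean.groupby("incomegrp"):
        r, p = stats.pearsonr(sub["urbanrate"], sub["internetuserate"])
        results[int(code)] = (float(r), float(p))
    return results


# Hypothetical usage (file name and cutoff names are placeholders):
# data = pd.read_csv("gapminder.csv", low_memory=False)
# clean = prepare(data, low_cut=LOW_CUT, high_cut=HIGH_CUT)
# for code, (r, p) in correlation_by_group(clean).items():
#     print(code, r, p)
```

Each entry in the returned dictionary plays the role of one of the three per-group pearsonr calls described above, so the three correlations can be compared side by side.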
When we examine the correlation coefficients between urban rate and Internet use rate for each of the income groups, we find the following. For the low-income group, the correlation between urban rate and Internet use rate is 0.11, and the p-value is not significant. For the middle-income countries, the correlation between urban rate and Internet use rate is 0.32, with a significant p-value of 0.001. Finally, among high-income countries, the correlation coefficient is 0.089, with a large p-value suggesting that the association between urban rate and Internet use rate is not significant for high-income countries. When we map these findings onto the associated scatter plots for each income group, we are better able to visualize the significant and non-significant relationships. Estimating a line of best fit within each scatter plot shows the positive association between urban rate and Internet use rate among the middle-income countries and almost no relationship between these variables in both the low-income and high-income countries.
Asking questions about statistical interactions can be an interesting way to explore your data and your associations of interest. This is not difficult to do using the skills you've acquired thus far. There are more advanced topics that we can't cover here, such as multivariate techniques, that can be very powerful. But even without these techniques, we can still use the bivariate inferential tools of ANOVA, chi-square, and correlation to describe our sample, make inferences about the larger population, and begin to understand under what conditions, that is, at what levels of our third variable, these associations hold. Now that we've found associations, can we assume that association implies causation? We'll answer that question soon.
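The per-group scatter plots with a line of best fit mentioned above could be approximated with a sketch like this one. It is an assumption-laden illustration and not the plotting code used in the lecture: it fits an ordinary least-squares line with NumPy's polyfit and draws with matplotlib, and the column names and the helpers best_fit_by_group and plot_groups are hypothetical.

```python
import numpy as np
import pandas as pd


def best_fit_by_group(clean: pd.DataFrame, x="urbanrate", y="internetuserate",
                      group="incomegrp"):
    """Return {group code: (slope, intercept)} from a least-squares line per group."""
    lines = {}
    for code, sub in clean.groupby(group):
        slope, intercept = np.polyfit(sub[x], sub[y], deg=1)
        lines[int(code)] = (float(slope), float(intercept))
    return lines


def plot_groups(clean: pd.DataFrame, lines, x="urbanrate", y="internetuserate",
                group="incomegrp"):
    """Draw one scatter plot per income group with its fitted line."""
    # Imported here so the fitting function above works without matplotlib.
    import matplotlib.pyplot as plt

    codes = sorted(lines)
    fig, axes = plt.subplots(1, len(codes), figsize=(4 * len(codes), 4),
                             sharey=True, squeeze=False)
    for ax, code in zip(axes[0], codes):
        sub = clean[clean[group] == code]
        slope, intercept = lines[code]
        xs = np.linspace(sub[x].min(), sub[x].max(), 50)
        ax.scatter(sub[x], sub[y])
        ax.plot(xs, slope * xs + intercept)
        ax.set_title(f"Income group {code}")
        ax.set_xlabel(x)
    axes[0][0].set_ylabel(y)
    return fig
```

A slope near zero in a group's panel corresponds to the "almost no relationship" pattern described for the low-income and high-income countries, while a clearly positive slope matches the middle-income result.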