A practical and example-filled tour of simple and multiple regression techniques (linear, logistic, and Cox PH) for estimation, adjustment, and prediction.


From the course by Johns Hopkins University

Statistical Reasoning for Public Health 2: Regression Methods

43 ratings


From the lesson

Module 4: Additional Topics in Regression

- John McGready, PhD, MS, Associate Scientist, Biostatistics

Bloomberg School of Public Health

So in this section we'll give two more examples of propensity score methods and propensity score adjustment, focusing on two articles from the scientific literature.

Hopefully these examples will reinforce the concept of propensity scores and propensity score adjustment.

So the first example we're going to look at is an article from the Journal of the American Medical Association in 2010: pneumococcal vaccination and risk of acute myocardial infarction in men. From the abstract, they say multiple studies have shown that preventing influenza by vaccine reduces the risk of vascular events. However, the effect of the pneumococcal polysaccharide vaccine on vascular events remains controversial. So their objective is to examine the association between this vaccination and the risk of acute MI, and stroke as well, among men.

They used a prospective cohort study of the Kaiser Permanente Northern and Southern California health plans, with over 84,000 participants between 45 and 69 years of age from the California Men's Health Study, who were recruited between January 2002 and December 2003 and followed up until December 31st, 2007. The cohort was similar to the population of health plan members, and to men who responded to a general health survey in California, on important demographic and clinical characteristics. Demographic and detailed lifestyle characteristics were collected from surveys, and the vaccination records were obtained from the Kaiser Immunization Tracking System. Their main outcome measure was the incidence of acute myocardial infarction and stroke during the follow-up period in men who had no history of such conditions.

They go on to detail the results: the unadjusted incidence rates among those who received the vaccine and among those who didn't. And then they say: with propensity score adjustment, we found no evidence for an association between pneumococcal vaccination and reduced risk of acute MI. Their adjusted hazard ratio was 1.09, so a slightly higher estimated risk, but the 95% confidence interval went from 0.98 to 1.21, so the result was not statistically significant. And they do the same thing for the outcome of stroke.
The adjusted hazard ratio of stroke comparing those who got the vaccine to those who did not was 1.14, with a confidence interval of 1 to 1.31. So in both situations they found an increased estimated risk among those who got the vaccine, but it was not statistically significant. They go on to say the inverse association was also not found among men of different age and risk groups. In other words, there was no apparent effect modification. And so they conclude that, among a cohort of men aged 45 years or older, receipt of the pneumococcal vaccine was not associated with a subsequent reduced risk of acute myocardial infarction or stroke.

So let's look at how they did this. Let's first look at their methods section. I'll read this to you, because we'll hear some things we've covered over the entire course before we get into the propensity score part. They say the association between vaccination status and patient characteristics was assessed using either the chi-squared test for categorical factors or the Kruskal-Wallis test for continuous factors. The Kruskal-Wallis test is a slightly different version of the t-test or ANOVA, but it's testing the same idea: that the centers of the distributions being compared are the same.

The association between vaccination and myocardial infarction or stroke was assessed using the Cox proportional hazards regression model, in both bivariate and multivariate models, the latter to adjust for the propensity score of vaccination with a pneumococcal vaccine, estimated from a logistic regression model. The number of pneumococcal vaccines received was included in the models as a time-varying covariate, which means they allowed this measure to change, since they had multiple measures for each person over the follow-up period. The number of vaccines received before study entry was included as the initial number of vaccines, and each vaccination during the follow-up period resulted in an increment in the number of vaccines administered. Follow-up started at the time of the baseline survey.
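To make the time-varying covariate idea concrete, here is a minimal sketch of how one person's follow-up could be split into intervals, each carrying the number of vaccines received so far. The function name, the tuple layout, and the example times are assumptions made for illustration; the article does not describe its data layout.

```python
def vaccine_count_intervals(baseline_count, vaccination_times, end_time):
    """Split one person's follow-up into (start, stop, n_vaccines) rows.

    Times are measured from the baseline survey (time 0). Each vaccination
    during follow-up closes the current interval and opens a new one with
    the count increased by one. Hypothetical helper, not from the article.
    """
    rows = []
    start, count = 0.0, baseline_count
    for t in sorted(vaccination_times):
        if not (0 < t < end_time):
            continue  # outside the follow-up window, so ignored here
        if t > start:
            rows.append((start, t, count))
            start = t
        count += 1
    rows.append((start, end_time, count))
    return rows


# One vaccine before entry, two more during five years of follow-up.
rows = vaccine_count_intervals(1, [2.0, 3.5], 5.0)
```

Rows in this start/stop form are what a Cox model with time-varying covariates typically takes as input, with the event indicator attached to the last row for each person.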

Follow-up ended at the time of acute MI or stroke diagnosis (so, if they had the event); termination of health plan membership (so they were censored before having the event); death; or December 31, 2007, the end of the study, whichever came first. Men who reported having a previous acute MI or stroke at baseline were excluded from the analysis. So let's read how they did the propensity scores, and then we'll get into the results. They say the propensity score was created using a logistic regression model predicting the probability of receiving at least one dose of pneumococcal vaccine during the study period. Quintiles of the propensity score were used to adjust for the likelihood of vaccination in the models of MI and stroke outcomes. So what they did was estimate the probability of being vaccinated given a bunch of patient characteristics, and then adjust for it.

They adjusted for this probability, split into quintiles (five categories), in the Cox model relating myocardial infarction to vaccination status. They go on to give a whole list of the variables they used to estimate the propensity scores, and these include race, ethnicity, region, household income, etc.

So they had a lot of potential confounders here that they rolled into a single propensity score by doing that logistic regression. And then they in turn used the propensity score to adjust for the distributions of these confounders, which may differ between the vaccinated and non-vaccinated groups, in the Cox regression looking at the log hazard of myocardial infarction or stroke.
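The first step described here, estimating a propensity score by logistic regression and cutting it into quintiles, can be sketched as follows. This is only an illustration under stated assumptions: the data are simulated, the two covariates (`age_z`, `smoker`) stand in for the much longer list in the article, and the logistic regression is fit with a plain gradient ascent loop rather than whatever software the authors used.

```python
import math
import random


def fit_logistic(X, y, steps=500, lr=0.5):
    """Fit a logistic regression by gradient ascent; beta[0] is the intercept."""
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(steps):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            resid = yi - predict(beta, xi)
            grad[0] += resid
            for j, v in enumerate(xi):
                grad[j + 1] += resid * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def predict(beta, xi):
    """Predicted probability of treatment: this is the propensity score."""
    eta = beta[0] + sum(b * v for b, v in zip(beta[1:], xi))
    return 1.0 / (1.0 + math.exp(-eta))


def quantile_groups(scores, k=5):
    """Assign each score to one of k equal-sized groups (1 = lowest) by rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    groups = [0] * len(scores)
    for rank, i in enumerate(order):
        groups[i] = rank * k // len(scores) + 1
    return groups


# Simulated cohort (made-up data): older men and smokers are more likely vaccinated.
rng = random.Random(0)
X, vaccinated = [], []
for _ in range(300):
    age_z = rng.gauss(0, 1)
    smoker = 1.0 if rng.random() < 0.3 else 0.0
    p_true = 1.0 / (1.0 + math.exp(-(-0.5 + 1.0 * age_z + 0.5 * smoker)))
    X.append([age_z, smoker])
    vaccinated.append(1 if rng.random() < p_true else 0)

beta = fit_logistic(X, vaccinated)
scores = [predict(beta, xi) for xi in X]
quintile = quantile_groups(scores, k=5)
```

The `quintile` variable would then enter the outcome model as a five-level categorical covariate alongside vaccination; the Cox fit itself is not shown here.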

So then they go on, and I just want to note this because they're actually getting into the idea of effect modification here: to examine whether the association between pneumococcal vaccination and acute MI and stroke would vary among participants of different ages, risk groups, and numbers of influenza vaccination records, adjusted hazard ratios and 95% confidence intervals estimated by Cox proportional hazards regressions were presented separately for various subgroups. This means they had to replicate the analyses they had done on everyone, slightly differently, when they were exploring effect modification. So when they looked to see whether there was effect modification by age, for example, they had to fit a Cox model that included as predictors vaccination status, call it x1; age, call it x2; and an interaction between vaccination status and age.

They ultimately dichotomized age for this analysis, so age, or x2, was binary: above or below a threshold. And then they had to adjust for propensity score quintiles. But for this analysis they would have had to re-run the propensity scores without using age as one of the factors used to estimate the propensity score.
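One way to write the effect modification model just described is sketched below. The coefficient symbols and the quintile indicators Q_k are notation introduced here for illustration, not taken from the article; x1 is vaccination status, x2 is the binary age indicator, and Q_k equals 1 for men in the k-th quintile of the propensity score re-estimated without age.

```latex
\log h(t \mid x_1, x_2, Q)
  = \log h_0(t) + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2
  + \sum_{k=2}^{5} \gamma_k Q_k
```

Under this sketch, the adjusted hazard ratio for vaccinated versus unvaccinated men is exp(β1) in the age group coded x2 = 0 and exp(β1 + β3) in the group coded x2 = 1, so β3 = 0 corresponds to no effect modification by age.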

They did something similar for the three presumably high-risk groups (separately for current smokers, patients with a history of diabetes, and patients with a history of hypertension) and one low-risk group. So they had four different groups, and they had to exclude things like smoking and history of diabetes from the propensity score computation to do this analysis, since the risk groups needed to be included as predictors on their own so that the interaction could be included in the model.

So here are the results they got. I am only showing you the myocardial infarction outcomes, but they had another table for stroke. They give the hazard ratio and 95% confidence interval unadjusted, and then adjusted with the propensity scores. What they found is that, unadjusted, those men who actually received the vaccine had a higher risk of acute myocardial infarction, and it was statistically significant. After adjustment, there was a slightly elevated risk among vaccinated men in the sample, but it was not statistically significant, as evidenced by the confidence interval containing one and the p-value for the association of 0.13. They went on to show separate estimates of this association by age. So here they dichotomized age, as I was referring to, into men less than 65 and men greater than 65, and they found an elevated risk of myocardial infarction for those men who were vaccinated in the younger group, but no association in the older group. So there's some effect modification, perhaps. But where the effect actually presented itself, it was harmful in terms of the myocardial infarction outcome.
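The link between a confidence interval containing one and a non-significant p-value can be checked with a short sketch. It assumes the reported interval is a Wald interval, symmetric on the log hazard ratio scale; the function names are hypothetical. Because the published estimate and limits are rounded, a p-value back-calculated this way is only approximate and need not match the one reported in the article.

```python
import math


def wald_from_ci(hr, lower, upper, z_crit=1.96):
    """Back-calculate the Wald z statistic and two-sided p-value from a
    hazard ratio and its 95% CI, assuming symmetry on the log scale."""
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p


def excludes_null(lower, upper, null=1.0):
    """True when the interval does not contain the null hazard ratio."""
    return lower > null or upper < null


# Adjusted hazard ratio for acute MI quoted in the lecture.
z, p = wald_from_ci(1.09, 0.98, 1.21)
significant = excludes_null(0.98, 1.21)
```

Here `significant` is false because the interval spans one, and the approximate p-value lands above 0.05, which is the same conclusion reached from the table.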

And they go on to do this again for the risk groups that we talked about, and then for the number of doses of influenza vaccine, to see if there was any evidence of effect modification. Across these different investigations, they didn't find any subgroups where the vaccine was protective against the outcome of myocardial infarction.

So, just to reiterate their results, what they said in their conclusions was: with adjustment for propensity scores, we found no evidence for an association between pneumococcal vaccination and reduced risk of myocardial infarction. And they go on to say that, in addition, there was no evidence of effect modification such that the vaccine was protective for some subgroups.

Here's another example where propensity scores are used for adjustment. This is from the British Medical Journal: a study on mortality and antipsychotic drug use, or prescription, in the elderly. The abstract says the objective is to assess the risks of mortality associated with use of individual antipsychotic drugs in elderly residents in nursing homes. And despite the fact that it was published in the British Medical Journal, this was actually done in the U.S. It was a population-based cohort study with linked data from Medicaid, Medicare, something called the Minimum Data Set, the National Death Index, and a national assessment of nursing home quality. The setting was nursing homes in the United States, and they had 75,000-plus participants who were new users of antipsychotic drugs, so they had not previously used them, and they looked at six different drugs: haloperidol, et cetera. All participants were aged 65 or older, were eligible for Medicaid, and lived in a nursing home. For their main outcome measure, they used Cox proportional hazards models to compare the 180-day risks of all-cause and cause-specific mortality by individual drug after the start of drug use, with propensity score adjustment to control for potential confounders.

And they highlighted the drugs where they found differences. In the results they say that, compared with risperidone, users of haloperidol had an increased risk of mortality (hazard ratio 2.07, 95% confidence interval 1.89 to 2.26), and users of quetiapine had a decreased risk compared to risperidone, a hazard ratio of 0.81 with a significant confidence interval. They go on to say the effects were strongest after the start of treatment, and that implies to me that they either observed, or somehow estimated, or allowed for non-proportional hazards. But there was no mention of that in the article, so maybe I'm misinterpreting that. The associations remained after adjustment for dose and were seen for all causes of death. And they go on in their conclusions to say: though these findings cannot prove causality, and we cannot rule out the possibility of residual confounding (confounders that they didn't control for with their propensity score approach), they provide more evidence of the risk of using these drugs in older patients, reinforcing the concept that they should not be used in the absence of clear need.

The data suggests that the risk of mortality with these drugs is generally increased with higher doses and seems to be highest for haloperidol, and least for quetiapine.

So what they had to do here: they had six different antipsychotic drugs, and they were comparing each of the other five to risperidone. So they had to estimate separate propensity scores for the probability of using each of these drugs versus risperidone, the reference drug. They talk about the setup here. They say: we compared distributions of sociodemographic, clinical, and use characteristics among participants who started taking different antipsychotics and calculated mortality rates during follow-up. We censored (this is important) follow-up time at the time of discontinuation of treatment, augmentation, or switch to a different drug, and at admission to hospital for ten days or more, as treatment status is unknown during inpatient days.
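The censoring rule above amounts to taking the earliest of several possible end points for each person. A minimal sketch follows; the function name, the day-based units, and the choice to count a death on the same day as a censoring event as a death are all assumptions made here, not details from the article.

```python
def follow_up(death_day=None, censor_days=(), horizon=180):
    """Return (time, died) for one person.

    censor_days holds the days of any censoring events (discontinuation,
    augmentation, switch, long hospital admission); None entries are
    skipped. Follow-up stops at `horizon` days at the latest.
    """
    censor_at = min([horizon, *[d for d in censor_days if d is not None]])
    if death_day is not None and death_day <= censor_at:
        return death_day, True
    return censor_at, False


# Died on day 40 while still on the starting drug.
a = follow_up(death_day=40, censor_days=[90])
# Switched drugs on day 90; a death on day 120 is not counted.
b = follow_up(death_day=120, censor_days=[90])
# No death or censoring event within 180 days.
c = follow_up()
```

The resulting pairs are the time and event indicator that a survival model such as Cox regression expects for each person.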

So they go on to say: we fitted proportional hazards models for pairwise comparisons against risperidone, unadjusted, then adjusted for age, sex, and calendar year, and then adjusted for multiple variables. So they did separate Cox regressions where their exposure, or predictor variable, was an indicator of whether the person was on the drug they were looking at versus the common reference of risperidone.

And when they did the propensity scores for each of these, they had to estimate them separately, for the probability of being on each of these drugs compared to risperidone. They say: in multivariate analyses we used propensity score adjustment to balance potential confounders. Propensity scores were derived from predicted probabilities of the treatment started, estimated in logistic models that contained all covariates above. So there's a previous section that listed the entire set of these. Cox models were stratified across tenths of the propensity score. This means they created ten categories for the propensity scores (the first decile, the second decile, and so on) and adjusted for that using the categorical indicators.
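The pairwise setup, with one comparison against risperidone per drug, can be sketched as below. The record layout and the four example rows are made up for illustration; only the drug names come from the lecture.

```python
REFERENCE = "risperidone"


def pairwise_sets(records, reference=REFERENCE):
    """Build one comparison data set per non-reference drug.

    Each set keeps only users of that drug or of the reference, and adds
    treated = 1 for the drug and treated = 0 for the reference.
    """
    drugs = sorted({r["drug"] for r in records} - {reference})
    sets = {}
    for drug in drugs:
        sets[drug] = [
            {**r, "treated": 1 if r["drug"] == drug else 0}
            for r in records
            if r["drug"] in (drug, reference)
        ]
    return sets


# Made-up rows, one per new user.
records = [
    {"id": 1, "drug": "risperidone"},
    {"id": 2, "drug": "haloperidol"},
    {"id": 3, "drug": "quetiapine"},
    {"id": 4, "drug": "risperidone"},
]
sets = pairwise_sets(records)
```

Each set would then get its own logistic regression for `treated`, its own tenths of the resulting propensity score, and its own Cox model.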

In addition, we plotted multivariable-adjusted Kaplan-Meier curves for survival as a function of the duration of use of the index antipsychotic drug, using what's called inverse probability of treatment weights. That's another way of estimating adjusted survival curves, weighting each person by the inverse of the estimated probability of the treatment they actually received. They went on to say: in confirmatory analyses we fit high-dimensional propensity scores. So they did one more layer of adjustment with another set of propensity scores that included more confounders, just to see if, by adjusting further, the results were robust and similar to what they saw with their original adjustment.
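As a rough sketch of what inverse probability of treatment weighting does to a Kaplan-Meier curve, the code below computes a weight for each person from a propensity score and then a weighted Kaplan-Meier estimate. The function names and the tiny data set are assumptions for illustration, and the article's exact weighting scheme is not described in the lecture.

```python
def iptw(treated, ps):
    """Inverse probability of treatment weight for one person."""
    return 1.0 / ps if treated else 1.0 / (1.0 - ps)


def weighted_km(times, events, weights):
    """Weighted Kaplan-Meier: list of (time, survival) at each event time."""
    data = sorted(zip(times, events, weights))
    at_risk = sum(weights)
    surv, curve, i = 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        deaths = 0.0   # weighted number of deaths at time t
        leaving = 0.0  # weighted number leaving the risk set at time t
        while i < len(data) and data[i][0] == t:
            _, e, w = data[i]
            if e:
                deaths += w
            leaving += w
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving
    return curve


# With all weights equal to 1 this reduces to the ordinary Kaplan-Meier estimate.
curve = weighted_km([1, 2, 3, 4], [1, 0, 1, 1], [1.0, 1.0, 1.0, 1.0])
```

In practice one curve would be computed per drug group, using the `iptw` weights of the people in that group, so that the groups being compared have similar weighted covariate distributions.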

So here's what they got: the hazard ratios for death in elderly people in nursing homes within 180 days of the start of treatment with various antipsychotic drugs. What they show is the hazard ratios comparing the relative hazard of mortality across the follow-up period for each of these five treatments, each compared to risperidone. So for haloperidol, the unadjusted association was a hazard ratio of 2.42, and it was statistically significant. After they adjusted for age, sex, and calendar year, the association was about the same. When they adjusted for the propensity scores, there was still over a twofold increase in the hazard, and it was statistically significant, but the estimate and confidence interval shifted down slightly. And when they did this with high-dimensional propensity scores, where they added even more covariates in estimating the propensity score, they got very similar results, with a slightly attenuated estimate and confidence interval. But even in this best-case scenario, there was an estimated 81% increase in the hazard of mortality, with a confidence interval of 1.65 to 1.98, for haloperidol compared to risperidone. And then they showed the results for the other drugs as well.

After adjustment, the associations that were significant with the original propensity scores were the difference between haloperidol and risperidone and the difference between quetiapine and risperidone. The other results were not statistically significant.

And as a sensitivity analysis, adding even more potential confounders, they showed that this significance held, with a similar magnitude and direction of association under the additional adjustment.

So, anyway, hopefully this is giving you some sense of how propensity scores are used to adjust for a large number of covariates when there's one predictor-outcome association of interest. These authors were not concerned with the relative impact of these other things on mortality, and they didn't want to water down the precision of the estimates comparing antipsychotic drugs over the follow-up period by including multiple x's for each of the potential confounders. So they rolled them all into one propensity score and adjusted for that.
