A conceptual and interpretive public health approach to some of the most commonly used methods from basic statistics.

From the course by Johns Hopkins University

Statistical Reasoning for Public Health 1: Estimation, Inference, & Interpretation

136 ratings

From the lesson

Module 2C: Summarization and Measurement

This module consists of a single lecture set on time-to-event outcomes. Time-to-event data comes primarily from prospective cohort studies with subjects who have not had the outcome of interest at their time of enrollment. These subjects are followed for a pre-established period of time until they either have the outcome, drop out during the active study period, or make it to the end of the study without having the outcome. The challenge with these data is that the time to the outcome is fully observed on some subjects, but not on those who do not have the outcome during their tenure in the study. Please see the posted learning objectives for each lecture set in this module for more details.

- John McGready, PhD, MS, Associate Scientist, Biostatistics

Bloomberg School of Public Health

So in this lecture set, we're going to see how to quantify the difference in a time-to-event outcome between two samples. The quantification will involve creating what's called an incidence rate ratio, a ratio of the incidence rates of the event between the two groups. The interpretation is very similar to the relative risk we had for binary outcomes before. Unlike with binary outcomes, however, we won't go on to create a difference in the incidence rates between the two groups. We'll see in the next couple of sections that there's a richer way to get an understanding of what's going on at any given time across the entire follow-up period, using what are called Kaplan-Meier curves.

Okay, now let's get to comparing time-to-event data between two or more samples numerically. It probably won't come as any surprise to you, but what we're going to do is take the incidence rates computed for some time-to-event outcome on different samples and compare them, and we'll do it in a ratio format.

So upon completion of this section, you will be able to estimate a numerical comparison of time-to-event outcomes between two populations using sample rate estimates.

You will also be able to interpret the resulting estimate, the incidence rate ratio, in words and in a public health or scientific context. And you'll remind yourselves of something that will recur throughout the course, and that we talked about in the last set of lectures: sometimes ratios are presented on the log scale.

So let's go back to our Mayo Clinic data and actually get into the real data, as opposed to the hypothetical patient profiles that I used before just to illustrate the time-to-event context. As I noted before, this was a randomized trial among patients with primary biliary cirrhosis, PBC. They were randomized to receive either D-penicillamine (DPCA) or placebo, and a research question of interest was: how does mortality, and therefore survival (mortality is the opposite, or complement, of survival, right?), for PBC patients randomized to receive the drug DPCA compare to that for the patients who got the placebo?

Now I'm going to give you some data that I computed, since I had the dataset available to me, and I'll break it down into the DPCA and placebo groups.

Over the ten-year study period, those who were randomized to the DPCA group collectively contributed 872.5 years of follow-up. This includes people who died during the study follow-up, and persons who either dropped out or made it to the end of the study without dying and hence were censored. So that's the total cumulative amount of time contributed to the study by everyone randomized to the drug group. And amongst those, there were 65 deaths.

Let's take a look at the incidence rate. The incidence rate of death in the DPCA group is 65 deaths per 872.5 years of follow-up time. If we compute this, it can be expressed as 0.075 deaths per year of follow-up time.

Let's compare and contrast this with the placebo group. The persons randomized to the placebo group collectively contributed 842.5 years of follow-up, and there were 60 deaths. So the incidence rate for this group is 60 deaths per 842.5 years of follow-up time, which can be expressed as 0.071 deaths per year. So immediately, you see that these estimated incidence rates are similar, but the estimate is slightly larger for the DPCA group.
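The two rate calculations above can be sketched in a few lines of Python. This is only an illustration using the counts and follow-up totals quoted in the lecture; the function name `incidence_rate` and the dictionary layout are assumptions made for the example, not anything from the course materials.

```python
# Deaths and total follow-up time (person-years) as quoted in the lecture.
deaths = {"DPCA": 65, "placebo": 60}
person_years = {"DPCA": 872.5, "placebo": 842.5}


def incidence_rate(events, follow_up_time):
    """Number of events divided by the total follow-up time contributed."""
    return events / follow_up_time


# One estimated rate per group, in deaths per person-year of follow-up.
rates = {
    group: incidence_rate(deaths[group], person_years[group])
    for group in deaths
}

for group, rate in rates.items():
    print(f"{group}: {rate:.4f} deaths per person-year")
```

Carried to four decimal places, the rates come out near 0.0745 and 0.0712, which are in line with the roughly 0.075 and 0.071 deaths per year quoted above.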

Just to give you a little more perspective on this, and I should have noted this before, there were a total of 312 persons enrolled in this trial, of which 158 were randomized to get DPCA and 154 were randomized to get placebo. So you can see that amongst the people who originally enrolled, a high proportion ultimately died.

So, one very common way to take these two incidence rates from the two groups and mash them into a single-number summary is to take the ratio of the two. Unlike with proportions, we don't take a difference as often when it comes to incidence rates; we jump straight to the ratio. This is the commonly used summary method.

It's still important to get information on the underlying rates that go into this ratio, so that you can get some context for how risky the outcome is.

So, we have an incidence rate ratio of 0.075 deaths per year in the DPCA group divided by 0.071 deaths per year in the placebo group. This ratio is 1.06. This estimate highlights the fact that the rate was slightly higher in the drug group, interestingly enough. So how could we interpret this in words? We could say the risk of death in the DPCA group within the study follow-up period is 1.06 times the risk in the placebo group. Another way to express this is to say subjects in the drug group, the DPCA group, had 6% higher risk of death in the follow-up period when compared to subjects in the placebo group. So this quantifies the degree of increase we saw amongst those who were randomized to the treatment group.
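As a rough sketch of the ratio calculation, here it is in Python, done two ways: once from the rounded rates quoted in the lecture and once from the raw counts. The function name `incidence_rate_ratio` is made up for the example.

```python
def incidence_rate_ratio(rate_group, rate_reference):
    """Ratio of one group's incidence rate to the reference group's rate."""
    return rate_group / rate_reference


# Using the rounded rates quoted in the lecture (deaths per person-year).
irr_from_rounded = incidence_rate_ratio(0.075, 0.071)

# Using the raw counts and follow-up totals quoted in the lecture.
irr_from_counts = incidence_rate_ratio(65 / 872.5, 60 / 842.5)

print(f"IRR from rounded rates: {irr_from_rounded:.2f}")
print(f"IRR from raw counts:    {irr_from_counts:.3f}")
```

The rounded rates reproduce the 1.06 quoted above. Working from the unrounded counts gives a slightly smaller ratio, about 1.05, which is just an effect of rounding the rates before dividing; the interpretation of a slightly higher estimated rate in the drug group is the same either way.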

Let's look at another example. This is the antiretroviral therapy and partner to partner HIV transmission among discordant sexual couples that we talked about before. Let's just go on and read some of this, and then discuss it. They say there were 28 partner to partner linked transmissions.

Amongst the [UNKNOWN] couples in the study, only one occurred in the early therapy group, which means that the other 27 occurred in the standard or delayed therapy group, the comparison group. And they report something that will become synonymous, and we'll flesh this out in more detail in the second term, something relatively synonymous with an incidence rate ratio: a hazard ratio. So we'll just say this is a synonym for incidence rate ratio, and we'll call it an incidence rate ratio.

This ratio is 0.04. How did they get that? I don't have access to the data, but I can talk through what they did. Again, hazard ratio and incidence rate ratio are synonymous for our purposes now. So how did they compute this? There were 28 linked transmissions, and only one occurred in the early therapy group. That's what they said. So, essentially, they took the incidence rate of linked transmissions in the early therapy group, where there was one linked transmission divided by the total follow-up time amongst the couples in the early therapy group, and divided it by the incidence rate estimate in the standard group, where there were 27 linked transmissions divided by the total follow-up time in the standard therapy group. I'm calling it standard therapy; it was called delayed in the abstract, but essentially it was synonymous with what the standard of care was. And this ratio turned out to be 0.04, as they reported. So how do we interpret this? Well, we could say that couples who were HIV discordant at baseline and in which the HIV-positive partner was given early antiretroviral therapy had 0.04 times the risk of within-couple transmission when compared to couples in which the HIV-positive partner was given standard therapy.

Another way to say this is that couples who were HIV discordant at baseline and in which the HIV-positive partner was given early antiretroviral therapy had 96% lower risk of within-couple transmission, as compared to couples in which the HIV-positive partner was given standard therapy. So how do we get that? Again, our incidence rate ratio is 0.04. You can think of that implicitly as being 0.04 to 1: for every 1 part of risk in the standard therapy group, the early treatment group had 0.04 of that. So if we wanted to compute what kind of change or decrease this was, we could take 0.04 minus 1 and divide it by the comparison value of 1. This is negative 0.96, or a 96% reduction. We could certainly report this either way, but expressing it as the percent reduction drives home the point that there was substantially lower risk of linked transmission in the early treatment group.
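The conversion from a ratio to a percent increase or reduction can be sketched as a small helper. This is a minimal illustration; the function name `describe_ratio` and its wording are hypothetical, and it simply applies the "ratio minus 1, divided by the comparison value of 1" step described above.

```python
def describe_ratio(irr):
    """Express a rate ratio as a percent change relative to the reference group."""
    change = (irr - 1.0) / 1.0  # relative change versus the reference value of 1
    percent = round(abs(change) * 100)
    if percent == 0:
        return "about the same risk as"
    direction = "higher" if change > 0 else "lower"
    return f"{percent}% {direction} risk than"


# Ratios quoted in the lecture: 0.04 (HIV transmission) and 1.06 (PBC trial).
print("Early therapy couples had", describe_ratio(0.04), "standard therapy couples.")
print("DPCA subjects had", describe_ratio(1.06), "placebo subjects.")
```

The same arithmetic handles ratios on either side of 1: a ratio below 1 reads as a percent reduction and a ratio above 1 as a percent increase.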

Okay, let's go back to our maternal vitamin supplementation and infant mortality example, and here's the abstract we looked at before.

But again, what we were interested in looking at, and we were given a two-thirds random sample of these data, was the live births group, so 10,295 live births with six-month follow-up. Here are the incidence rates of infant mortality in the six-month follow-up. I'll just give you summaries based on the data that I was given. Amongst the infants who were born to mothers treated with vitamin A during pregnancy, there were 578,590 days of follow-up and a total of 236 deaths. So this turns out to be an incidence rate of 0.00041 deaths per day. We could play around with that to make it more user friendly, but let's keep it as it is for now, because we're ultimately going to compare it to the other groups. If we did the same thing for beta carotene, here are the summary statistics in terms of follow-up time and number of deaths, and the incidence rate in this group was 0.00039 deaths per day. And interestingly enough, if you do the math for the placebo group, the estimated incidence rate is essentially identical to that in the beta carotene group.

So suppose we want to compute these incidence rate ratios. Well, we have three groups.

And what we can do, as we've talked about in other situations when comparing means or comparing proportions, is make one of these the reference or comparison group, which we compare the other two groups to. So I suggest, although you don't have to, making the placebo group the reference group, since we're really interested in the potential efficacy of vitamin supplementation on infant mortality. So if we compare the incidence rate for the vitamin A group to the placebo group, if you take those two numbers from before and take the ratio of them, it's 1.05. In other words, we see 5% higher risk of death among those whose mothers got vitamin A. Remember, this is just a sample-based estimate, but it's interesting. If we compare the beta carotene group to the placebo group, well, as we said before, the incidence rates were numerically identical, so this incidence rate ratio is one, indicating equality in the estimated risk of mortality.

So, we could say about this 1.05 that the estimated child mortality rate in the vitamin A group is 5% greater than the estimated child mortality rate in the placebo group. And for the comparison of beta carotene to placebo, we could say the estimated child mortality rate in the beta carotene group is the same as the estimated child mortality rate in the placebo group.
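With three groups, the reference-group comparison can be sketched as below. The vitamin A and beta carotene rates are the rounded values quoted in the lecture; the placebo rate is an assumption, set equal to the beta carotene rate because the lecture only says the two were essentially identical. The variable names are made up for the example.

```python
# Estimated incidence rates in deaths per day of follow-up.
rates_per_day = {
    "vitamin A": 0.00041,      # quoted in the lecture
    "beta carotene": 0.00039,  # quoted in the lecture
    "placebo": 0.00039,        # assumption: "essentially identical" to beta carotene
}

reference = "placebo"

# Divide each non-reference group's rate by the reference group's rate.
irr_vs_reference = {
    group: rate / rates_per_day[reference]
    for group, rate in rates_per_day.items()
    if group != reference
}

for group, irr in irr_vs_reference.items():
    print(f"{group} vs {reference}: IRR = {irr:.2f}")
```

Choosing a different reference group would change the numbers but not the information; for example, placebo compared to vitamin A would simply be the reciprocal of the vitamin A to placebo ratio.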

Let's look at one more example to drive home this idea of incidence rate ratios as a summary measure for the association of grouping with a time-to-event outcome. This is an article published in the Journal of the American Medical Association called "Association of Race and Age With Survival Among Patients Undergoing Dialysis."

And so the context for the study, and I'll just read it in case it's hard to read here, is that many studies have reported that black individuals undergoing dialysis survive longer than those who are white. This observation is paradoxical, given racial disparities in access to and quality of care, and is inconsistent with the observed lower survival among black patients with chronic kidney disease. So they go on to say, we hypothesized that age and the competing risk of transplantation modify survival differences by race. Their main outcome measure is death in black versus white patients who receive dialysis, and their comparison of interest is mortality, in the follow-up periods that they have in these data, for black patients versus white patients. And then they go on to tell where they got these data. This was an observational cohort study.

The implied word there that they didn't put is prospective. There's a lot of data here, but this was captured in the United States Renal Data System between January 1st, 1995 and September 28th, 2009. The median potential follow-up time was 6.7 years, which gives us a sense, and that can range anywhere from one day to almost 15 years. So there's variation in how much follow-up time each of the subjects in the cohort they created from this database contributed to the study.

So what they've done here in this graphic is they've shown these incidence rate ratios. They were interested not only in comparing the mortality amongst black and white patients, but they didn't necessarily want to do that on the whole. They wanted to stratify that by age and see if the association between mortality and race was different by age. So they were taking on a phenomenon which we'll get to in Statistical Reasoning 2 called interaction, or effect modification. That is, instead of taking the entire sample and comparing all black patients to all white patients and potentially adjusting for sweeping differences between those two racial groups, they first broke them into different age categories and then compared black to white patients within the narrow age categories. And they did adjust these, and again we'll get into adjustment in the second term. But ultimately what they have here in this graphic are the estimates, and the bands around them on the graphic are called confidence intervals, which we'll define shortly. The dots in the middle are the actual estimates we're looking at. So amongst 18 to 30 year olds, for example, the estimated incidence rate ratio for black patients on dialysis to white patients is nearly two. So higher risk for black patients in this group. When we get to the 31 to 40 year old group, the incidence rate ratio, which compares the observed rate of death in the follow-up period for black patients 31 to 40 years old to the incidence rate in the follow-up period for white patients 31 to 40 years old, is close to 1.5. So it's still showing that black patients have higher risk. But you see what happens with age is that the older the group we're looking at, the lower the risk of mortality for black patients relative to white patients, until it changes direction. Amongst older persons, black patients have lower risk.
So the incidence rate ratio has dipped below one once we get to the 51 to 60 year old age category. So it's a very interesting finding that the association between mortality and race among dialysis patients depends on how old the patients are. One size doesn't fit all for the race comparison. You actually have to look and ask, what age group are we looking at? But they use these incidence rate ratios to quantify this. They call them hazard ratios. For our purposes now, those are synonymous. And these have been adjusted for other characteristics that may be different among the black and white patients and that contribute to mortality. Again, we'll get into adjustment in more detail in the second term, but the interpretation of these things is as incidence rate ratios. Notice they present them on a log scale. It's hard to see here, because the numbers are relatively comparable and not too variable among the positive associations, if you will, the ones above 1. But if you look carefully, you can see the scaling of the y-axis is not the traditional arithmetic scale. If you look at the distance between 0.75 and 1, for example, that's actually equivalent to the distance between 1 and 1.33. That's because the log of 1.33 is just the opposite of the log of 0.75. If we had even larger positive effects, this would be easier to see. They've relabelled the axis with the actual ratio values, but the scaling here is not arithmetic or traditional. You can see that the distance between certain things is not as it would be on a traditional linear scale. So this is again just to plant this seed in your head about the issue of log scaling and ratios.
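To see the log-scale point numerically, here is a short Python sketch. It treats the 1.33 read off the axis as the rounded value of 4/3 (the reciprocal of 0.75), which is an assumption made for the example, and uses the natural log, although any log base gives the same symmetry.

```python
import math

# Ratios spaced symmetrically around 1 in the multiplicative sense.
ratios = [0.5, 0.75, 1.0, 4 / 3, 2.0]

for ratio in ratios:
    print(f"ratio {ratio:.2f} -> log ratio {math.log(ratio):+.3f}")

# Distances from 1 on the log scale.
distance_below = math.log(1.0) - math.log(0.75)
distance_above = math.log(4 / 3) - math.log(1.0)

# Distances from 1 on the ordinary arithmetic scale.
arithmetic_below = 1.0 - 0.75
arithmetic_above = 4 / 3 - 1.0

print(f"log scale:        {distance_below:.3f} below vs {distance_above:.3f} above")
print(f"arithmetic scale: {arithmetic_below:.3f} below vs {arithmetic_above:.3f} above")
```

On the log scale a ratio and its reciprocal sit the same distance from 1, in opposite directions, whereas on the arithmetic scale the ratio above 1 sits farther away.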

So, just so you think about it, and I'll come back to this in the end-of-unit review questions: what could potentially happen here if follow-up time was ignored? We talked about this for any single summary measure, but how about with group comparisons? So instead of comparing the groups via the incidence rate ratios, which take follow-up time into account, what if they just took the proportions having the outcome and compared them? I want you to think about the implications of that.
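As one concrete way to start thinking about this, the sketch below computes both comparisons for the PBC trial, using the enrollment, death, and follow-up figures quoted earlier in the lecture. The dictionary layout and function names are made up for the example, and this is only one dataset, not a general answer to the question.

```python
# Figures quoted in the lecture for the PBC trial.
groups = {
    "DPCA": {"deaths": 65, "enrolled": 158, "person_years": 872.5},
    "placebo": {"deaths": 60, "enrolled": 154, "person_years": 842.5},
}


def proportion_with_event(group):
    """Share of enrolled subjects who died; ignores how long each was followed."""
    return group["deaths"] / group["enrolled"]


def incidence_rate(group):
    """Deaths per person-year; accounts for total follow-up time."""
    return group["deaths"] / group["person_years"]


ratio_of_proportions = (
    proportion_with_event(groups["DPCA"]) / proportion_with_event(groups["placebo"])
)
rate_ratio = incidence_rate(groups["DPCA"]) / incidence_rate(groups["placebo"])

print(f"ratio of proportions (follow-up ignored): {ratio_of_proportions:.3f}")
print(f"incidence rate ratio (follow-up used):    {rate_ratio:.3f}")
```

In this trial the two comparisons land close together because the two arms contributed similar follow-up time per enrolled person. That need not hold when follow-up differs between the groups being compared.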

The IRR, which we can estimate from any sample data and call IRR with a hat on it, can be used to quantify the difference in the time-to-event information between two samples. You can really think of this as a relative risk measure like you saw before, but one that incorporates and recognizes the two dimensions of our data, both the binary outcome of interest and the differences in subject follow-up time, in the comparison. The ratios have a very similar interpretation to the relative risk comparisons we were looking at before.
