Learn fundamental concepts in data analysis and statistical inference, focusing on one and two independent samples.


From the course by Johns Hopkins University

Mathematical Biostatistics Boot Camp 2



From the lesson

Techniques

This module is a bit of a hodgepodge of important techniques. It includes methods for discrete matched-pairs data as well as some classical nonparametric methods.

- Brian Caffo, PhD, Professor, Biostatistics

Bloomberg School of Public Health

So it's not enough just to have a testing procedure; we'd also like to have a confidence interval. Let π̂ᵢⱼ be the sample proportions, and suppose we want to estimate d, the difference in the marginal proportions. In this case, that's the difference between the marginal probabilities of an approve vote on the two occasions. The estimate is d̂ = π̂₁₊ − π̂₊₁ = (n₁₂ − n₂₁)/n, so it estimates the difference in the marginal proportions.
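As a minimal sketch, the estimate can be computed from a matched 2 by 2 table of counts. The function name `diff_marginal_proportions`, the nested-list layout `[[n11, n12], [n21, n22]]` (rows for the first occasion, columns for the second, index 0 meaning "approve"), and the counts are all hypothetical, chosen for illustration. The sketch shows that the difference of the two margins reduces to (n₁₂ − n₂₁)/n, because n₁₁ cancels.

```python
def diff_marginal_proportions(table):
    """Estimate d = pi_1+ - pi_+1 from a matched 2x2 table of counts.

    table[i][j] counts people with response i on the first occasion and
    response j on the second (index 0 = approve, 1 = disapprove).
    """
    (n11, n12), (n21, n22) = table
    n = n11 + n12 + n21 + n22
    p1_plus = (n11 + n12) / n   # proportion approving on the first occasion
    p_plus1 = (n11 + n21) / n   # proportion approving on the second occasion
    return p1_plus - p_plus1

# Hypothetical counts, not data from the lecture.
table = [[45, 10], [5, 40]]
print(diff_marginal_proportions(table))  # difference of margins, about 0.05
print((10 - 5) / 100)                    # same quantity via (n12 - n21) / n
```

Only the discordant cells, n₁₂ and n₂₁, carry information about d; the concordant cells drop out of the point estimate (though not out of its variance).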

In the previous slide, we talked about the variance of this estimator under the null hypothesis. Now let's talk about its variance in general. It works out to be [π₁₊(1 − π₁₊) + π₊₁(1 − π₊₁) − 2(π₁₁π₂₂ − π₁₂π₂₁)]/n. The first two terms, divided by n, are the kind of difference-in-binomials variance you would expect to see. But because the two samples are correlated, there's an extra term, minus twice (π₁₁π₂₂ − π₁₂π₂₁), and that's subtracting out the covariance between the two estimated marginal proportions.
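Written out in display form, with π₁₊ = π₁₁ + π₁₂ and π₊₁ = π₁₁ + π₂₁ the marginal probabilities of approving on the first and second occasions, the variance and the covariance it subtracts are:

```latex
\sigma_d^2 = \operatorname{Var}(\hat d)
  = \frac{\pi_{1+}(1-\pi_{1+}) + \pi_{+1}(1-\pi_{+1}) - 2(\pi_{11}\pi_{22} - \pi_{12}\pi_{21})}{n},
\qquad
n\,\operatorname{Cov}(\hat\pi_{1+}, \hat\pi_{+1}) = \pi_{11} - \pi_{1+}\pi_{+1} = \pi_{11}\pi_{22} - \pi_{12}\pi_{21}.
```

The last equality follows by expanding (π₁₁ + π₁₂)(π₁₁ + π₂₁) and using π₁₁ + π₁₂ + π₂₁ + π₂₂ = 1.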

What would happen if there are a lot of counts in the off-diagonal cells, π₁₂ and π₂₁? Then the product π₁₂π₂₁ would be a big number. Because it enters the correlation term with a minus sign, subtracting twice that term adds 2π₁₂π₂₁, which results in a larger variance.

On the other hand, if most of the counts are in the diagonal cells, π₁₁ and π₂₂, then π₁₁π₂₂ would be very large, and we'd subtract twice that number. We'd wind up with a much smaller variance than the standard difference-in-binomials variance. Okay?
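To see both cases numerically, here is a minimal sketch that plugs sample proportions into the variance formula given earlier. The function name `matched_variance` and both tables are hypothetical; the tables are chosen so that they share the same margins (0.55 and 0.50, with n = 100) and differ only in how the counts split between the diagonal and the off-diagonal cells.

```python
def matched_variance(table):
    """Plug-in estimate of Var(d_hat) for a matched 2x2 table of counts."""
    (n11, n12), (n21, n22) = table
    n = n11 + n12 + n21 + n22
    p11, p12, p21, p22 = n11 / n, n12 / n, n21 / n, n22 / n
    p1_plus = p11 + p12   # approve on the first occasion
    p_plus1 = p11 + p21   # approve on the second occasion
    binomial_part = p1_plus * (1 - p1_plus) + p_plus1 * (1 - p_plus1)
    correlation_part = 2 * (p11 * p22 - p12 * p21)
    return (binomial_part - correlation_part) / n

# Hypothetical tables with identical margins (0.55 and 0.50), n = 100.
diagonal_heavy = [[45, 10], [5, 40]]     # most people answer the same way twice
off_diagonal_heavy = [[5, 50], [45, 0]]  # most people switch their answer

print(matched_variance(diagonal_heavy))      # about 0.001475
print(matched_variance(off_diagonal_heavy))  # about 0.009475
```

The difference-in-binomials part alone is 0.4975/100 = 0.004975 for both tables, so the diagonal-heavy table falls below it and the off-diagonal-heavy table falls above it.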

So we could take the estimate, d̂, minus the true difference in proportions, divided by the estimated standard error. That follows an asymptotically standard normal distribution.

And we can use that, again, to create confidence intervals of the form d̂ ± z × SE, where z is the appropriate standard normal quantile (about 1.96 for a 95% interval). I hope everyone at this point in the class could do something like that.
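A minimal sketch of that Wald interval, assuming the same hypothetical table layout `[[n11, n12], [n21, n22]]` as before; the function name `matched_wald_ci` is illustrative. The normal quantile comes from the standard library's `statistics.NormalDist`.

```python
from statistics import NormalDist

def matched_wald_ci(table, level=0.95):
    """Wald interval d_hat +/- z * SE for the difference in marginal proportions."""
    (n11, n12), (n21, n22) = table
    n = n11 + n12 + n21 + n22
    p11, p12, p21, p22 = (c / n for c in (n11, n12, n21, n22))
    p1_plus, p_plus1 = p11 + p12, p11 + p21
    d_hat = (n12 - n21) / n
    var = (p1_plus * (1 - p1_plus) + p_plus1 * (1 - p_plus1)
           - 2 * (p11 * p22 - p12 * p21)) / n
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    se = var ** 0.5
    return d_hat - z * se, d_hat + z * se

print(matched_wald_ci([[45, 10], [5, 40]]))  # hypothetical counts
```

Like any Wald interval, this relies on the large-sample normal approximation and can behave poorly when n is small or the discordant counts are sparse.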

This last bullet point says to compare σ_d to what we would use if the proportions were independent. That is, compare the result to what we'd get if, instead of asking the same people on two occasions whether or not they approve, we asked a different set of people each time. Then the minus-twice term would go away.
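Assuming, for the comparison, two separate samples of size n with the same marginal approval probabilities π₁₊ and π₊₁, the variance keeps only its binomial terms, and the gap between the two variances is exactly the term that went away:

```latex
\sigma_{\text{indep}}^2 = \frac{\pi_{1+}(1-\pi_{1+}) + \pi_{+1}(1-\pi_{+1})}{n},
\qquad
\sigma_{\text{indep}}^2 - \sigma_d^2 = \frac{2(\pi_{11}\pi_{22} - \pi_{12}\pi_{21})}{n}.
```

So the matched-pairs variance is smaller than the independent-samples variance exactly when π₁₁π₂₂ > π₁₂π₂₁, and larger when the inequality goes the other way.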

Here we'd expect a positive correlation: people who approve on the first occasion would be more likely to approve on the second occasion. If you're in the U.S. and you're a Democrat, you might approve of, say, President Obama on the first question, and you'd be more likely to approve at the second time point too. The same goes for people who disapprove: if you're a Republican and you disapproved at the first time point, you'd be more likely to disapprove at the second. That's a very common form of correlation.

In other words, the counts will tend to lie on the main diagonal of the matched 2 by 2 table, because people will tend to give the same answer both times. If that's the case, the covariance term will be positive, so we subtract twice a positive number and get a dramatic reduction in the variance. So failing to account for the fact that the same people were asked twice would be a really dumb thing to do here: you'd lose precision and get a much wider confidence interval than necessary. That's interesting in general. But even if accounting for the dependence resulted in a wider interval, you'd still want to do it, because it gives you the correct interval rather than one based on completely incorrect assumptions.
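A minimal sketch of that loss of precision, assuming the same hypothetical table layout and a hypothetical agreement-heavy table: it computes the 95% Wald half-width once with the covariance term (the correct matched analysis) and once without it (the naive analysis that treats the two occasions as independent samples of size n). The function name `ci_half_widths` is illustrative.

```python
from statistics import NormalDist

def ci_half_widths(table, level=0.95):
    """Return (matched, naive) Wald half-widths for d_hat from a matched 2x2 table.

    'naive' drops the covariance term, i.e. it wrongly treats the two
    occasions as independent samples of size n.
    """
    (n11, n12), (n21, n22) = table
    n = n11 + n12 + n21 + n22
    p11, p12, p21, p22 = (c / n for c in (n11, n12, n21, n22))
    p1_plus, p_plus1 = p11 + p12, p11 + p21
    binomial_part = p1_plus * (1 - p1_plus) + p_plus1 * (1 - p_plus1)
    covariance_part = 2 * (p11 * p22 - p12 * p21)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    matched = z * ((binomial_part - covariance_part) / n) ** 0.5
    naive = z * (binomial_part / n) ** 0.5
    return matched, naive

# Hypothetical table where most people give the same answer twice.
matched, naive = ci_half_widths([[45, 10], [5, 40]])
print(matched, naive)  # the matched half-width is the smaller of the two
```

With these counts the naive interval is nearly twice as wide, purely because it ignores the agreement between the two occasions.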
