A conceptual and interpretive public health approach to some of the most commonly used methods from basic statistics.

From the course by Johns Hopkins University

Statistical Reasoning for Public Health 1: Estimation, Inference, & Interpretation

238 ratings

From the lesson

Introduction and Module 1

This module, consisting of one lecture set, is intended to whet your appetite for the course, and examine the role of biostatistics in public health and medical research. Topics covered include study design types, data types, and data summarization.

- John McGready, PhD, MS, Associate Scientist, Biostatistics

Bloomberg School of Public Health

Greetings, and welcome to Statistical Reasoning One.

It's an honor and privilege to be with you, and

I'm very excited to be talking about some very important and

exciting concepts that will guide your understanding of research in

the world of public health and medicine, over the next term.

So, in this lecture set one, I'd like to do a couple things.

First, in this section, section A, I want to

welcome you to the class and talk about

some possible answers to the question, why

do I need biostatistics in my academic life?

In lecture one B, we'll talk about a very important concept that will

permeate the rest of what we do, the idea of samples versus populations.

In lecture set one C, I'll give an overview of study designs and some of the

issues of translating the statistical to the scientific, depending on the type

of study design.

And then finally, in lecture set one D, we'll enumerate some

of the data types we'll be looking at in this term.

So, you might be asking yourself, why do I need biostatistics in my life?

Well, one answer, some may say, is you need it

because the world is in the midst of a data craze.

You may have heard reference to 2013 and a little

earlier being the era of big data.

Data coming from genome sequencing and gene expression data sets.

Medical informatics with the implementation of electronic medical record

systems, medical imaging data coming from MRIs and CAT scans.

And of course the avalanche of data that's being collected every

minute on us as a population collectively, based on our internet usage.

Some may say data has never been more relevant than it is today.

I would argue that data has always been relevant, but it's

certainly getting a lot more attention in the popular media these days.

So for example, in 2009, Hal Varian, the

Google Chief Economist, was quoted as saying, I keep

saying that the sexy job in the next

10 years will be statisticians and I'm not kidding.

More recently

in 2012, a headline from the

Harvard Business Review proclaims, Data Scientist,

which is a pseudonym for statistician, is the sexiest job of the 21st century.

So here we've got two of the many instances

of the juxtaposition of the words sexy and statistics.

In 2009, the New York Times proclaimed in

the headline, For Today's Graduate, Just One Word: Statistics.

Well, certainly, research results and data are

utilized and summarized in the popular media.

And I'll just give you a couple of examples

from the popular media to get us started.

Last year, in the Baltimore Sun, there was a

headline that said Elmo makes apples more appealing to kids.

And Elmo is a famous character in the children's program Sesame Street.

And according to the study results, kids took nearly twice as many apples

(and I've highlighted that in red because that's a statistic based on data)

when they had Elmo stickers on them as

when they didn't, researchers from Cornell University

said in a letter in the August issue

of the Archives of Pediatrics and Adolescent Medicine.

Here's something that was in the New York Times in August of 2012.

The Widespread Problem of Doctor Burnout.

Analysing questionnaires sent to more than 7,000 doctors, so there's information

about the study they did there in terms of the size.

Researchers found that almost half complained

of being emotionally exhausted, feeling detached

from their patients and work, or

suffering from a low sense of accomplishment.

The researchers then compared the doctors' responses with

those of nearly 3,500 people working in other fields.

And found that even after accounting for differences between

doctors and the other workers, in terms of things like

gender, age, number of hours worked, and amount of education,

the doctors were still more likely to suffer from burnout.

From The Washington Post, this was a very telling result, a very shocking result.

On August 5, 2009, the headline

proclaimed that D.C. was to offer STD tests in every high school,

which was important from a public health perspective,

because a program conducted last year in eight

high schools found that 13% of about 3,000

students tested positive for an STD,

mostly gonorrhea or chlamydia, according to the D.C. Department of Health.

So that really, that statistic there lays out the burden of STDs in the high school

population in Washington, DC and gives an estimate

of the prevalence, which is very high at 13%.

And so that result motivated D.C. to expand its STD testing to every high school

in the city.
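
To make that 13% figure concrete, here is a minimal sketch of how such a prevalence estimate could be computed, along with an approximate 95% confidence interval of the kind we'll cover later in the term. The counts are assumptions: the article only says "13% of about 3,000", so 390 positives out of 3,000 tested are round numbers chosen to match it. The interval uses the simple normal approximation and treats the tested students as if they were a random sample, which may not be true of the actual program.

```python
import math

def prevalence_with_ci(positives, tested, z=1.96):
    """Return (estimate, lower, upper) for a proportion.

    Uses the normal-approximation (Wald) interval: p +/- z * sqrt(p(1-p)/n).
    """
    p_hat = positives / tested
    standard_error = math.sqrt(p_hat * (1 - p_hat) / tested)
    return p_hat, p_hat - z * standard_error, p_hat + z * standard_error

# Assumed round numbers consistent with "13% of about 3,000 students";
# the exact counts are not given in the lecture.
tested = 3000
positives = 390

estimate, lower, upper = prevalence_with_ci(positives, tested)
print(f"Estimated prevalence: {estimate:.1%}")
print(f"Approximate 95% CI: {lower:.1%} to {upper:.1%}")
```

Under these assumed counts the interval comes out to roughly 12% to 14%, which is one way of expressing how much uncertainty sits around the headline number.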

So data is very important when it comes to research.

Data provides information.

Good data can be analyzed and summarized to provide useful information, but results

from research are only as good as the data they are based on.

Bad data can be analyzed and summarized

to provide incorrect, harmful, or uninformative results.

But let's think about possible steps for doing our research projects.

There's the study design phase, then data needs to be

collected, data needs to be analyzed, the results need to

be presented, and then they will be interpreted, by both

the researchers and the people who read the resulting presentation.

So biostatistics, as a science, can play a role

in each of these steps.

But unfortunately, sometimes it's only called upon for the data analysis part,

at which point there's no opportunity to correct for bad data coming in.

So let's just talk about some of the biostatistical issues we'll encounter in

this class with regard to these parts of the study design process.

So with regards to the planning and design of studies,

we can ask what the primary question or

questions of interest in a research project are.

Is the researcher interested in

quantifying the information about a single group?

Or comparing multiple groups? What about the issue of sample size?

How many subjects will be needed total to do the study?

And if there's a group comparison going on, how many

in each of the groups to be compared will be needed?

How is the research group going to select

their study participants?

Are they going to be able to randomly choose them from a master list?

Will they select them from a pool of interested persons?

Will they take whoever shows up to be part of the study?

And if group comparisons are of interest, how

are researchers going to assign people to the groups?

Do they have the ability to do it on their end, for example by randomization?

Or do people self-select, based on their characteristics,

to be in the groups, as when comparing smokers to non-smokers?

That would be an example of self-selection.
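
The difference between randomly selecting participants from a master list and randomly assigning them to groups can be sketched in a few lines. Everything here is hypothetical: the master list of 100 made-up IDs, the study size of 20, the two group names, and the fixed seed, which is only there so the sketch gives the same result each time it is run.

```python
import random

rng = random.Random(2013)  # fixed seed so the sketch is reproducible

# Hypothetical master list of 100 eligible people (made-up IDs).
master_list = [f"person_{i:03d}" for i in range(1, 101)]

# Random selection: every person on the list has the same chance
# of being chosen for the study.
participants = rng.sample(master_list, k=20)

# Random assignment: shuffle the chosen participants and split them,
# so group membership does not depend on their characteristics.
shuffled = participants[:]
rng.shuffle(shuffled)
treatment_group = shuffled[:10]
control_group = shuffled[10:]

print("Treatment group:", sorted(treatment_group))
print("Control group:  ", sorted(control_group))
```

With self-selection, by contrast, there is no step like the shuffle: people arrive already sorted into groups, such as smokers and non-smokers, by their own characteristics.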

Then there's the data collection process, and we won't spend much time on

that in this course, but we'll jump to the idea of data analysis.

Once the data is collected, how best to

summarize the information coming from the raw data.

We're also going to have to deal with variability,

both natural variability, some may call it biological or sociological variability.

And then also variability related to the fact that we

will be sampling a subset from some larger group of interest.

And important patterns in data can be obscured by variability, so one

of the goals with statistics is

to distinguish real patterns from chance variation.
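
As a rough illustration of the variability that comes from sampling, the sketch below simulates many studies drawn from the same hypothetical population and shows how much the estimates differ from one study to the next. The true prevalence of 13%, the sample size of 100, the number of simulated studies, and the seed are all assumptions made for the illustration, and each person is simulated as an independent draw.

```python
import random
import statistics

rng = random.Random(1)    # fixed seed so the sketch is reproducible

true_prevalence = 0.13    # assumed "truth" in the hypothetical population
sample_size = 100         # people observed in each simulated study
n_studies = 1000          # number of simulated studies

def one_study_estimate():
    """Simulate one study and return its estimated prevalence."""
    positives = sum(rng.random() < true_prevalence for _ in range(sample_size))
    return positives / sample_size

estimates = [one_study_estimate() for _ in range(n_studies)]

print(f"Smallest estimate:   {min(estimates):.2f}")
print(f"Largest estimate:    {max(estimates):.2f}")
print(f"Average of estimates: {statistics.mean(estimates):.3f}")
print(f"Spread (std. dev.):   {statistics.stdev(estimates):.3f}")
```

Every simulated study samples the same population, yet the estimates scatter around the assumed truth purely by chance. That scatter is the chance variation a real study has to be judged against.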

And we'll drive towards this idea of

inference, which is using the information from

a single study, coupled with information about variability,

to make a statement about the larger population

or process, from which the data was collected.

When

it comes to presentation, we'll think about

what summary measures will best convey the

main messages in the data about the

primary and secondary research questions of interest.

And how do we convey and deal

with the uncertainty in estimates based on the data?

And finally, we'll have to contend with interpretation.

What do the results mean, the statistics

we computed, for practical purposes,

in terms of the practice, the program, the population, et cetera?

So our goals for statistical reasoning one will be as follows.

We're first going to spend a fair amount of time on how to

summarize data, whether it be from

a single sample or comparing multiple samples.

What measures can take a fair amount of information,

and whittle it down to key pieces of the story.

So we'll do this for single populations, and

then when it comes to measuring associations between populations.

Then we will get into the idea that, whenever we do a study we

are only able to observe some imperfect

subset of the processes or populations under study.

And we're going to have to contend with

the uncertainty in the estimates we get, that

come from the fact that we can't observe

everyone in the populations we wish to study.

So we'll talk

about the idea of confidence interval estimation and

doing statistical inference through a process called hypothesis testing.

And then we'll conclude on some sample

size considerations when designing a study in advance.

In Statistical Reasoning 2, if you come around

for that part of the course, we'll expand

our toolbox to include the idea of adjustment,

which we'll talk about in this first lecture set.

How to assess something called effect modification, or statistical interaction.

Prediction of an outcome using potentially multiple inputs, through

methods such as linear, logistic, and time-to-event regression.

Throughout the course, throughout all of what we are going to do, the

focus will be on interpreting the results of statistical procedures correctly.

Summarizing the results from published studies in an

understandable fashion, and assessing the strengths and the

weaknesses of published research results, including the study

design, clarity of the research question or questions,

the appropriateness of the statistical

methods, the clarity of the reported results, and

the appropriateness of the overall scientific/substantive conclusions.

So, onward and upward as we go through the first term.

Let's get started by talking about samples and populations in the next section.
