0:00

So far we talked about a variety of different,

more or less qualitative techniques to test questions.

And in this segment, we'll shift the

focus a little bit to more quantitative techniques.

And, you know, this shouldn't turn into a statistics class, even though we'll talk about

statistical models for question testing, latent class

analysis and a hint at structural equation modeling.

0:33

So, you know, we're now going to talk about these techniques in general

just to give you a sense of what is out there and, if you were to dig deeper, where to

look and in what kind of arena you'll find the research and the practical tools.

0:48

We'll also talk a little bit about field tests,

and then in the final segment we'll talk about

other question testing methods that have come up more recently.

So latent class analysis is not specific to question

design, it is used in all kinds of fields,

1:05

but it is a technique that sort of allows for, you

know, mitigating the effect of measurement error

or estimating measurement error with particular items.

Brief background, you basically have a set of

questions that you observed, that you measure, multiple indicators.

And observables in latent variable modeling notation

are displayed in boxes here.

And they all are a function of a latent variable.

So let's say someone is pregnant or not pregnant, right, that

would be in the early weeks the unobservable construct that we try to measure.

And then you do tests of various sorts, you know,

you can ask the person, she may or may not know.

You can take a urine test. These days those are usually pretty good, but maybe in

earlier days they weren't quite as good, so maybe you want to take a few of those.

Let's say you have a flawed urine test, and three of them.

Any errors associated with those, you know, should be independent of each other.

So each sample, if they weren't bought in the same place and produced in

the same batch, should have a different likelihood to show up erroneous or not.

But, given that the person is pregnant or not pregnant, they in

principle should all lead to the same result. So that's the spirit here.

Meaning that you have, you know, in latent class analysis a setting

where you have indicators that don't necessarily have to be error free.

Any error associated with the indicators is assumed to be independent,

conditional on the latent variable that you actually try to measure.

2:41

And so what you estimate are unconditional and

conditional probabilities, that's sort of happening in the process.
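As a rough sketch of the model behind this description, the probability of a response pattern can be written as a mixture over the latent classes. The symbols here are assumptions made for illustration: $u_j$ is the observed answer to indicator $j$, $c$ is the latent class, and $J$ is the number of indicators.

```latex
P(u_1, \dots, u_J) \;=\; \sum_{c} P(c) \prod_{j=1}^{J} P(u_j \mid c)
```

Here $P(c)$ are the unconditional probabilities, $P(u_j \mid c)$ are the conditional probabilities, and the product reflects the assumption that errors are independent given the latent class.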

For question testing two conditional probabilities are important, one is

false positive rates and the other one false negative rates.

Indicators with high error rates, they are usually assumed to be bad questions.

So if you think of this in a 2 x 2 table you can have an indicator u1 and u2

and your latent outcome c1 and c2.

Every indicator can be correct.

So, your indicator u1, the question, for example, for drug use in your survey,

says "yes". And in your latent class, the actual construct, this is a drug user,

it is also "yes". This would be a correct measurement and

likewise this one, "no" in both, would be a correct measurement.

What you don't want is a lot of values in

these off diagonals where your measurement device has a false

negative. So it would be "no" in the measurement, but

"yes" in the true underlying construct and likewise here, false positive.

The unconditional probabilities are what we refer to as the

actual probabilities of being in one or the other latent class.

3:56

So, inside the cell it's the

conditional probabilities, down here, the unconditional probabilities.
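The 2 x 2 table described above can be sketched in a few lines of Python. The counts are made up for illustration, and the function and argument names are hypothetical. Each argument is named as answer first, true class second.

```python
def table_probabilities(yes_yes, no_yes, yes_no, no_no):
    """Summarize a 2 x 2 table of measured answer by true class.

    Arguments are counts named <answer>_<truth>, e.g. no_yes is the number
    of true "yes" cases that answered "no".
    """
    truly_yes = yes_yes + no_yes
    truly_no = yes_no + no_no
    return {
        # Conditional probabilities: error rates given the true class.
        "false_negative_rate": no_yes / truly_yes,
        "false_positive_rate": yes_no / truly_no,
        # Unconditional probability of being in the "yes" class.
        "prevalence": truly_yes / (truly_yes + truly_no),
    }


# Hypothetical counts: 100 true "yes" cases and 200 true "no" cases.
rates = table_probabilities(yes_yes=80, no_yes=20, yes_no=10, no_no=190)
print(rates)
```

The two error rates are computed within a column of the table (given the true class), while the prevalence is computed over all cases.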

In a study that we did with University of Maryland alumni, Ting Yan, Roger Tourangeau, and I

fielded three different questions that all got at grades, whether students ever had

a failing grade in their time at the University of Maryland.

We had administrative records to compare these answers to.

We didn't know who these respondents were,

this is all, you know, stripped of identifiers,

but we were able to link the true score record to the respondent answers.

So, we can examine the error rate relative to the actual values.

And so, what you see here are the error rates for these three different questions,

relative to the truth, in this particular study.

So, you know, Q12 has a very low error rate and Q18b

very low error rate, and Q18a had a very high error rate.

So these are all false negative rates that you see going up here,

and these are all false positive rates. You see a dashed and a solid line.

Well, the solid line is the comparison to the actual records, just comparing the

two. The dashed line is the result of the LCA, the latent class analysis.

And you see that the technique

gave us the same pattern although it did not quite

give us the same point estimate for the false negative rates.
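To make the idea concrete, here is a minimal sketch of a two-class latent class model fitted with the EM algorithm, using only the Python standard library. It is not the software or the exact model used in the study. The data are simulated, and the prevalence and error rates are invented so that the second item is deliberately the "bad" one. All function names are hypothetical.

```python
import itertools
import random


def simulate_patterns(n, prevalence, fn_rates, fp_rates, seed=0):
    """Count response patterns for binary items whose errors are independent given the truth."""
    rng = random.Random(seed)
    counts = {p: 0 for p in itertools.product((0, 1), repeat=len(fn_rates))}
    for _ in range(n):
        truth = rng.random() < prevalence
        pattern = tuple(
            int(rng.random() >= fn) if truth else int(rng.random() < fp)
            for fn, fp in zip(fn_rates, fp_rates)
        )
        counts[pattern] += 1
    return counts


def fit_two_class_lca(counts, n_items, iterations=500):
    """EM for two latent classes; class 1 is started as the 'yes' class."""
    total = sum(counts.values())
    prev = 0.5
    # p_yes[c][j] = P(item j answered "yes" | latent class c)
    p_yes = [[0.2] * n_items, [0.8] * n_items]
    for _ in range(iterations):
        class_weight = [0.0, 0.0]
        yes_weight = [[0.0] * n_items for _ in range(2)]
        for pattern, n in counts.items():
            # E-step: joint probability of the pattern with each class.
            joint = [1 - prev, prev]
            for c in (0, 1):
                for j, u in enumerate(pattern):
                    joint[c] *= p_yes[c][j] if u else 1 - p_yes[c][j]
            denom = joint[0] + joint[1]
            for c in (0, 1):
                posterior = joint[c] / denom
                class_weight[c] += n * posterior
                for j, u in enumerate(pattern):
                    yes_weight[c][j] += n * posterior * u
        # M-step: update the unconditional and conditional probabilities.
        prev = class_weight[1] / total
        p_yes = [
            [yes_weight[c][j] / class_weight[c] for j in range(n_items)]
            for c in (0, 1)
        ]
    false_negative = [1 - p for p in p_yes[1]]
    false_positive = list(p_yes[0])
    return prev, false_negative, false_positive


# Invented truth: the second item has a high false negative rate.
counts = simulate_patterns(20000, 0.30, (0.05, 0.40, 0.10), (0.02, 0.05, 0.03), seed=1)
prev_hat, fn_hat, fp_hat = fit_two_class_lca(counts, n_items=3)
print("estimated prevalence:", round(prev_hat, 3))
print("estimated false negative rates:", [round(x, 3) for x in fn_hat])
print("estimated false positive rates:", [round(x, 3) for x in fp_hat])
```

Because the simulated errors really are conditionally independent, the estimates should land near the invented rates. In real data that assumption may fail, which is one reason model-based rates can differ from a comparison against records.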

You can read more about this in the paper that is provided to you.

5:24

There are some limitations here, you know, you need to have

a separate data collection, the sample size can't be too small.

For latent classes, two classes, you need at least

three items in order for the model to be identified.
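The three-item requirement can be checked with a simple parameter count, sketched below under the assumption that all items are binary and no extra restrictions are imposed. The function name is hypothetical. A non-negative count is a necessary condition for identification, not a guarantee.

```python
def lca_degrees_of_freedom(n_items, n_classes=2):
    """Free cells in the cross-table of binary items minus free model parameters."""
    free_cells = 2 ** n_items - 1
    # (n_classes - 1) class sizes plus one response probability per item and class.
    n_params = (n_classes - 1) + n_classes * n_items
    return free_cells - n_params


for items in (2, 3, 4):
    print(items, "items:", lca_degrees_of_freedom(items), "degrees of freedom")
```

With two items there are fewer data cells than parameters, with three items the two-class model is just identified, and with four items there is room left to test the model.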

If you don't have that,

assumptions can be made to achieve identifiability.

A lot of the work that Paul Biemer and colleagues have done

goes through these assumptions. He's using early work from Hui and Walter,

to have grouping variables that help with the identifiability.

5:55

However, you can get biased estimates of these

error rates when the assumptions are not met, and

that also is discussed in the paper that I

just mentioned and that you have in your course pack.

You can identify bad survey items this way, if

the model assumptions hold or you have enough indicators.

One problem, though, is that it doesn't help you

to know why a particular problem exists in the question.

So it doesn't suggest a fix, unless the fix

is to take out that question that you didn't like.

6:27

Another sort of latent variable modeling/

structural equation modeling technique that's out there

is at the core of the SQP software.

So this is a piece of software written by Daniel Oberski

and Willem Saris and developed in Willem Saris's group.

The link is provided online for this unit's course pack.

6:51

What this SQP software does,

it has a collection of results from a

series of multitrait-multimethod experiments that were conducted on

multiple items across multiple countries, in order to

assess, generally, the validity and reliability of survey questions.

So the idea being, if you tried to measure the same thing,

the same underlying trait,

7:13

with different methods, then all these methods, all these results,

should correlate highly, if they're related to this trait. And

the answers to this question, you know, should be correlated

much less with a different trait.

But there is, of course, some method effect going on, the kind

of scale you use, or, you know, where you ask the question,

they might contribute to measurement error. So you would see

correlations across items of different traits measured with the same methods.

So you can separate out method effects from the others, you know, in general when

trying to estimate validity and reliability.
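The logic of separating trait and method effects can be illustrated with one common way of writing a multitrait-multimethod model, the so-called true-score formulation. This is a sketch under assumptions and not necessarily what SQP computes internally: all variables are standardized, method factors are uncorrelated with the traits and with each other, and the method effect is sqrt(1 - validity^2). The function name and all coefficients are hypothetical.

```python
import math


def expected_correlation(r_a, v_a, r_b, v_b, trait_corr, same_method):
    """Expected correlation between two observed measures a and b.

    r_* are reliability coefficients, v_* are validity coefficients,
    trait_corr is the correlation between the underlying traits
    (1.0 if both measures target the same trait).
    """
    shared_trait = r_a * v_a * trait_corr * v_b * r_b
    if not same_method:
        return shared_trait
    # Method effect coefficients under the standardization assumption above.
    m_a = math.sqrt(1 - v_a ** 2)
    m_b = math.sqrt(1 - v_b ** 2)
    return shared_trait + r_a * m_a * m_b * r_b


# Invented coefficients: reliability 0.90 and validity 0.95 for every measure.
same_trait_diff_method = expected_correlation(0.9, 0.95, 0.9, 0.95, 1.0, same_method=False)
diff_trait_same_method = expected_correlation(0.9, 0.95, 0.9, 0.95, 0.3, same_method=True)
diff_trait_diff_method = expected_correlation(0.9, 0.95, 0.9, 0.95, 0.3, same_method=False)
print(same_trait_diff_method, diff_trait_same_method, diff_trait_diff_method)
```

In this sketch, two different traits measured with the same method correlate more strongly than the same two traits measured with different methods. That extra correlation is the method effect.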

So this is a huge amount of work,

way too much to cover here; we could have a whole lecture on how this was done,

8:00

but it is a good tool to look up, and we have the references in the pack.

When you use this piece of software, you code your question characteristics in

it, and it uses, you know, underlying regression models to

fit overall reliability and validity scores for you.

8:21

When you use the program, as I say, you enter your survey

question, you code your question, and then

you get these predicted validity and reliability scores.

I strongly encourage you to try that out.

Look at the website for the link that gives you more information here.

8:48

Now after these two more quantitative techniques

I'll move to the last one that we had here, field tests.

So field tests are sometimes referred to as conventional

pretests, sometimes referred to as dress rehearsals or pilot studies.

All of those share the notion that you implement the questionnaire with a smaller

sample. It could be, you know, 15 to

35 respondents similar to your actual respondents

or, if it's, you know, a real dress rehearsal,

maybe even thousands, if you're rehearsing the next US Census.

The goal is to adopt a data collection protocol similar to

the one that you will actually use in fielding the survey,

in order to find practical problems. Is

there an issue with interviewers? Is there an issue with the respondent?

Does the length work? You want to time

at the question level, at the section level, and for the full instrument to

see if you are within scope, if you meet your production targets.

And definitely you want to look at the

answer distributions on your key variables, tabulations, missing data.
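A first pass over field test data along these lines might look like the sketch below. The records, variable names, and timings are invented for illustration, and a missing answer is assumed to be stored as None.

```python
from collections import Counter


def tabulate(records, variable):
    """Frequency table of one variable, with missing answers counted under None."""
    return Counter(rec.get(variable) for rec in records)


def item_missing_rate(records, variable):
    """Share of respondents with no answer to the variable."""
    missing = sum(1 for rec in records if rec.get(variable) is None)
    return missing / len(records)


def mean_duration(records, timing_variable):
    """Average time in seconds spent on a question, ignoring missing timings."""
    times = [rec[timing_variable] for rec in records if rec.get(timing_variable) is not None]
    return sum(times) / len(times)


# Hypothetical pilot data: one key variable plus its question-level timing.
pilot = [
    {"employed": "yes", "employed_secs": 6.0},
    {"employed": "no", "employed_secs": 8.0},
    {"employed": None, "employed_secs": 15.0},
    {"employed": "yes", "employed_secs": 7.0},
]

print(tabulate(pilot, "employed"))
print(item_missing_rate(pilot, "employed"))
print(mean_duration(pilot, "employed_secs"))
```

The same three checks can be repeated per section and for the full instrument to compare the timings against the production targets.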

This field test data is also a great

resource to actually use the results and start,

you know, coding up your analysis in accordance

with your analysis plan that we talked about in unit one.

Because now you can double check, triple check for that matter,

"Do I really have all the variables in my data set?",

"Are there any issues?", "Should I have measured them on a different scale?", "Do

I need to make any changes in order to implement this in the right way?"

10:22

With the field test, you can also do, in addition, behavior coding, which

we talked about before. You can build in

cognitive probes, we learned about those. You

can do some respondent debriefing, ask the

respondent afterwards, you know, "Any experiences you

want to share with us with respect to

this questionnaire?" You could debrief the interviewer.

And then you can start your statistical analysis, validity,

reliability, the latent class models, the structural equation models.

All that is possible, if your field test is large enough and

you actually have enough cases to do any of the more quantitative techniques.

The more cases you have, the more costly of course this

will be, especially if you have these add-ons among your goals here.