0:00

In this video,

we explore comparing independent means from a Bayesian perspective.

We'll take an estimation perspective, using credible intervals to quantify how

large the difference is.

We'll illustrate this using the distracted eater study from the Statistics

course to compare the average snack consumption of distracted and

non-distracted eaters post-lunch.

0:23

As a reminder, the study was called "Playing a computer game during

lunch affects fullness, memory for lunch, and later snack intake."

The researchers set out to evaluate the relationship between distraction,

recall of food consumed, and snacking.

They had a sample of 44 volunteer participants that they randomized

into two equally sized groups.

One group played solitaire on the computer while eating and

was instructed to win as many games as possible and focus on the game.

0:54

The other group was asked to eat their lunch without any distractions,

focusing on what they were eating.

Both groups were provided the same amount of lunch and offered the same

amount of biscuits (or, for the Americans, cookies) to snack on afterwards.

The researchers measured the snack consumption of subjects in each group.

The study reports average snack consumption levels for both groups,

as well as the standard deviations and sample sizes.

Suppose we want to estimate how much more or

less distracted eaters snack compared to non-distracted eaters.

We would use an estimate of the difference with a credible interval.

1:37

We need to specify models for the snack consumption in the two groups.

I'll use A and B to denote the two groups respectively.

As with all inferential methods, there are some conditions that we need to meet.

First is independence both within and between the groups.

We will assume independent normal distributions where each group has their

own mean and their own variance.

Inference in the case where the variances are not assumed to be the same is known as

the Behrens-Fisher problem.

2:08

Next, we need to specify a prior distribution on all four unknown

parameters.

We will use the reference priors for the parameters in each group.

This is known as the independent Jeffreys prior and

is a limit of conjugate prior distributions.

Under the independent Jeffreys prior, the marginal posterior distribution for

the mean of Group A is a Student t distribution centered at the sample mean,

with scale given by the standard error of the estimate of the mean.

Likewise, the marginal posterior distribution for

the mean of Group B is a Student t distribution with parameters, again,

taken from the frequentist summaries. Because the data and

the parameters of the two groups are independent, the means are independent a posteriori.
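
In symbols, the two marginal posteriors just described can be sketched as follows (standard notation, where each group's sample mean, sample variance, and sample size are denoted by a bar, s squared, and n):

```latex
\mu_A \mid \text{data} \sim t_{n_A - 1}\!\left(\bar{x}_A,\; s_A^2 / n_A\right),
\qquad
\mu_B \mid \text{data} \sim t_{n_B - 1}\!\left(\bar{x}_B,\; s_B^2 / n_B\right)
```

Here each t distribution is a shifted and scaled Student t with the group's sample mean as center and its standard error as scale.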

The point estimate is the posterior mean of the difference, which is also

the difference of the posterior means. With the independent Jeffreys prior,

this is the difference between the two sample means, about 25 grams.

This amounts to about two cookies.

To provide some measure of uncertainty, we would report a credible interval for

the difference of the group means.

This requires the posterior distribution of the difference.

Unfortunately, there's no closed form expression for

the distribution of the difference of two student t distributions.

However, we can use simulation to draw samples from the posterior

distribution using what is called Monte Carlo sampling.

With Monte Carlo sampling, we simulate possible values of the parameters from

their posterior distributions.

In this case, first we generate a large number of values from

the Student t distribution for the mean of Group A.

Second, we generate an equivalent number of values from the Student t distribution for

the mean of Group B.

4:03

For each sample m, we form the difference of the generated means.

And with our samples from the posterior distribution, we can now make inferences

by calculating what are called Monte Carlo averages and using the frequentist

definition of probability to calculate posterior probabilities.
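
The sampling steps just described can be sketched in a few lines. Note the summary statistics below are illustrative placeholders chosen to give a difference of about 25 grams, not the study's exact published values:

```python
import numpy as np

# Monte Carlo sketch of the posterior of mu_A - mu_B under the
# independent Jeffreys prior.  The summary statistics are assumed
# placeholders (group A = distracted, group B = non-distracted).
xbar_A, s_A, n_A = 52.1, 45.1, 22   # illustrative values
xbar_B, s_B, n_B = 27.1, 26.4, 22   # illustrative values

rng = np.random.default_rng(42)
m = 25_000  # number of Monte Carlo samples, as in the video

# Marginal posterior: mu | data ~ xbar + (s / sqrt(n)) * t_{n-1}
mu_A = xbar_A + s_A / np.sqrt(n_A) * rng.standard_t(df=n_A - 1, size=m)
mu_B = xbar_B + s_B / np.sqrt(n_B) * rng.standard_t(df=n_B - 1, size=m)

diff = mu_A - mu_B  # m draws from the posterior of the difference
print(round(float(diff.mean()), 1))  # Monte Carlo posterior mean, near 25
```

The Monte Carlo average of `diff` is the point estimate, and the fraction of draws satisfying any condition (for example `diff > 0`) estimates the corresponding posterior probability.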

4:25

The figure shows our Monte Carlo estimate of the posterior distribution using

a smooth version of a histogram of the sample differences of the means.

The blue area represents the highest posterior density, or HPD, region,

where the probability that the difference lies in that region is equal to 0.95.

These are the most plausible values for the difference.

The 95% HPD interval is a 95% credible interval. Using our

Monte Carlo samples, this is 1.85 to 48.37 grams,

suggesting that being distracted does increase snack consumption later.

This estimate is based on 25,000 Monte Carlo samples, but

if you try this on your own, you may get a slightly different answer

if the number of Monte Carlo samples is different.
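
One simple way to compute an empirical HPD interval from Monte Carlo draws is to take the shortest interval containing 95% of the sorted samples. A minimal sketch, again using assumed placeholder statistics rather than the study's exact values:

```python
import numpy as np

def hpd_interval(samples, prob=0.95):
    """Empirical HPD: the shortest interval holding `prob` of the samples."""
    x = np.sort(np.asarray(samples))
    n = len(x)
    k = int(np.floor(prob * n))        # samples spanned by each candidate
    widths = x[k:] - x[: n - k]        # width of every candidate interval
    i = int(np.argmin(widths))         # shortest candidate wins
    return float(x[i]), float(x[i + k])

# Posterior-style draws of the difference in means (placeholder inputs).
rng = np.random.default_rng(42)
m = 25_000
mu_A = 52.1 + 45.1 / np.sqrt(22) * rng.standard_t(df=21, size=m)
mu_B = 27.1 + 26.4 / np.sqrt(22) * rng.standard_t(df=21, size=m)
diff = mu_A - mu_B

lo, hi = hpd_interval(diff)
print(round(lo, 2), round(hi, 2))  # ballpark of the video's 1.85 to 48.37
```

Because the posterior here is roughly symmetric, the HPD interval is close to the equal-tailed interval from the 2.5% and 97.5% sample quantiles; for skewed posteriors the two can differ noticeably.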

5:13

Let's recap everything we've done so far.

We started with a study where the researchers randomly assigned respondents

into distracted and non-distracted eating groups and

compared their snack intake post meal.

The sample statistics suggested that the distracted eaters consumed more snacks on

average.

However, just because we observe a difference in the sample means doesn't

necessarily mean that there is something going on that is statistically significant

or practically significant in the actual population.

So, we use statistical inference tools to evaluate if this apparent

relationship between distracted eating and

snacking provides evidence of a real difference at the population level.

The credible interval for the average difference was 1.85 to

48.37 grams using the Monte Carlo estimate,

which corresponds to anywhere from a few crumbs to just more than four cookies.

Note that we have a randomized controlled trial here, so

if we do indeed find a significant result, we could then talk about a causal

relationship between these two variables.

6:19

We used the independent Jeffreys prior as a reference analysis.

This problem, where we have two groups with unequal variances, is known

as the Behrens–Fisher problem.

There are other prior distributions that you may come across for

this famous problem, such as matching priors.

Those require more advanced simulation methods than what we will cover here.

6:39

The posterior distribution was computed assuming that the means were different,

using credible intervals to quantify the magnitude of the difference.

However, if we are interested in testing that the means are the same,

and that playing solitaire has no effect on consumption,

then we need to assign positive probability to the means being equal.

In the next video, we'll explore testing this hypothesis using Bayes factors and

posterior probabilities.