Now let's take a look at what summary gives us.
There are a few things to note. First, it echoes the call to the function here.
The number of imputations by default is 5, but you can control it.
You could do more than 5, if you wanted to.
The number of missing cells or values for
each column in the data set is reported here, and
then it gives you in this row here the imputation methods that are used.
So age is not missing, so I don't need to impute for that.
BMI is continuous so the default is predictive mean matching.
Hypertension is a binary categorical variable, so the default is logistic regression.
Serum cholesterol is also continuous, so
I get predictive mean matching as the default there, but you can control that.
If you've got a better idea of how to do it,
you can use one of the other methods that are available.
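
As a rough sketch of what controlling those two settings might look like: this assumes the function being summarized is mice() from the R package mice, and it uses the nhanes2 data set that ships with that package as a stand-in for the data on screen (age, bmi, hyp, chl). The choice of m = 10 and of the "norm" method for bmi is purely illustrative.

```r
library(mice)  # assumes the mice package is installed

# nhanes2 is a stand-in for the lecture's data (an assumption)
imp <- mice(nhanes2, m = 10, seed = 1, printFlag = FALSE)

imp$m       # number of imputed data sets requested
imp$method  # per-column method; "" means the column has nothing to impute

# Override the default for one column: Bayesian linear regression
# ("norm") for bmi instead of predictive mean matching ("pmm")
meth <- imp$method
meth["bmi"] <- "norm"
imp2 <- mice(nhanes2, m = 10, method = meth, seed = 1, printFlag = FALSE)
imp2$method
```

Starting from the method vector of a first run and changing only the entries you care about keeps the defaults for every other column.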
The VisitSequence, as shown here, tells us that
I impute BMI first, hypertension second,
serum cholesterol third in the sequence of imputing.
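
If you wanted to look at or change that order yourself, a minimal sketch might be the following. It again assumes mice() from the mice package and the nhanes2 stand-in data. The "monotone" keyword asks for columns to be visited in order of increasing missingness; how the stored sequence is printed may differ between versions of the package.

```r
library(mice)  # assumes the mice package is installed

# Default order: left to right through the columns
imp <- mice(nhanes2, seed = 1, printFlag = FALSE)
imp$visitSequence

# Alternative order: least missing data first
imp_mono <- mice(nhanes2, visitSequence = "monotone",
                 seed = 1, printFlag = FALSE)
imp_mono$visitSequence
```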
Now the other piece of information here is a matrix
that tells us what covariates were used to impute each variable.
So what you see here is, this row of zeros means that age did not need to be imputed,
so nothing was used, no covariates there.
For BMI on the other hand, age and hypertension and
serum cholesterol were all used to form a model to impute for BMI.
So everything except itself was used to impute BMI.
For hypertension, we see a similar thing: age and BMI and
serum cholesterol were used, and then for
serum cholesterol, age, BMI and hypertension.
Now you can control that if you want.
If you know a better model involves just a subset of the variables,
you can specify that to the function.
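
Here is a small sketch of how that might be done, assuming once more that the function is mice() from the mice package and using nhanes2 as stand-in data. Dropping hypertension from the BMI model is a hypothetical choice made only to show the mechanics, not a recommendation.

```r
library(mice)  # assumes the mice package is installed

imp <- mice(nhanes2, seed = 1, printFlag = FALSE)

# Rows: the variable being imputed; columns: candidate predictors
# A 1 means the column variable is used to impute the row variable
pred <- imp$predictorMatrix
pred

# Hypothetical choice: impute bmi from age and chl only
pred["bmi", "hyp"] <- 0
imp3 <- mice(nhanes2, predictorMatrix = pred, seed = 1, printFlag = FALSE)
imp3$predictorMatrix["bmi", ]
```

Editing a copy of the matrix from a first run means you only have to touch the entries you want to change.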