Multiple Imputation by SPSS

Remark: This article does not cover the detailed procedure for running multiple imputation. For more details, please read this article.

Recently, while reading some articles about missing value analysis, I noticed that most of them say multiple imputation is the better way to deal with missing values. So I decided to change my mind and take a look at what it is.

Two terms are very important in multiple imputation: MCAR and MAR. Missing completely at random (MCAR) means that whether a value is missing does not depend on any values in the data, observed or missing. Missing at random (MAR) means that the pattern of missing data is related to the observed data only. When is deletion better than multiple imputation? My answer is that if there are few missing values, usually less than 5%, and the missingness does not depend on other values (MCAR), then deletion is relatively “safe”.
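To make the difference between MCAR and MAR concrete, here is a minimal simulation sketch in Python (not SPSS). The variables `x` and `y`, the missingness probabilities, and the sample size are all made up for illustration. It compares the true mean of `y` with the mean left after deleting the cases where `y` is missing.

```python
import random
import statistics

def simulate(mechanism, n=20000, seed=0):
    """Return (mean of y with no missing data, mean of y after deletion)."""
    rng = random.Random(seed)
    full, observed = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)          # x is always observed
        y = x + rng.gauss(0, 1)      # y may go missing
        if mechanism == "MCAR":
            p_missing = 0.3          # same chance for every case
        else:                        # "MAR"
            # chance of missing y depends only on the observed x
            p_missing = 0.6 if x > 0 else 0.0
        full.append(y)
        if rng.random() >= p_missing:
            observed.append(y)       # y survives, case is kept
    return statistics.mean(full), statistics.mean(observed)

for mechanism in ("MCAR", "MAR"):
    true_mean, deleted_mean = simulate(mechanism)
    print(mechanism, round(true_mean, 3), round(deleted_mean, 3))
```

Under MCAR the two means should be close, while under MAR the mean after deletion is pulled downward because cases with large `x` (and therefore large `y`) are lost more often.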

There is a test called Little’s MCAR test for checking whether the missing data are MCAR. The null hypothesis is that the missing data are MCAR, so a significant result suggests that MCAR is violated; the test by itself cannot confirm MAR. The Multiple Imputation procedure then provides multiple versions of the dataset (5 by default), each containing its own set of imputed values. When doing statistical analysis in SPSS, the results from all of the imputed datasets are pooled, which is generally more accurate than deletion or a single imputation.
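Pooling is commonly done with Rubin's rules, and the sketch below assumes that is what is meant here. It is written in Python rather than SPSS, the function name `pool` is hypothetical, and the five estimates and standard errors are made-up numbers standing in for the results from five imputed datasets.

```python
import statistics

def pool(estimates, variances):
    """Combine per-dataset estimates and their squared standard errors
    using Rubin's rules. Returns (pooled estimate, pooled standard error)."""
    m = len(estimates)
    pooled = statistics.mean(estimates)          # average of the estimates
    within = statistics.mean(variances)          # average sampling variance
    between = statistics.variance(estimates)     # variance across datasets (divisor m - 1)
    total = within + (1 + 1 / m) * between       # total variance
    return pooled, total ** 0.5

# Hypothetical results from 5 imputed datasets
estimates = [10.0, 11.0, 9.0, 10.5, 9.5]
std_errors = [1.0, 1.0, 1.0, 1.0, 1.0]

pooled_estimate, pooled_se = pool(estimates, [s ** 2 for s in std_errors])
print(pooled_estimate, round(pooled_se, 3))
```

The pooled standard error is larger than any single dataset's standard error because the between-dataset variance reflects the uncertainty about the imputed values, which a single imputation ignores.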

There are several methods for handling missing values: listwise, pairwise, regression, and EM. The first three require the missing data to be MCAR; under this condition they give consistent and unbiased estimates of the correlations and covariances. The EM method only requires the missing data to be MAR, so it can be used when MCAR is violated. When the missing data are neither MCAR nor MAR (that is, missing not at random), none of these methods is appropriate.
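As a small illustration of how the listwise and pairwise methods differ, the Python sketch below uses a made-up table of five cases and three variables, with `None` marking a missing value. The helper names `listwise` and `pairwise` are hypothetical and are not SPSS functions.

```python
# Each row is one case; None marks a missing value (made-up data).
rows = [
    (1.0, 2.0, 3.0),
    (2.0, None, 1.0),
    (3.0, 4.0, None),
    (4.0, 5.0, 6.0),
    (None, 1.0, 2.0),
]

def listwise(rows):
    """Keep only the cases that are complete on every variable."""
    return [r for r in rows if all(v is not None for v in r)]

def pairwise(rows, i, j):
    """Keep the (i, j) value pairs from cases complete on both variables."""
    return [(r[i], r[j]) for r in rows
            if r[i] is not None and r[j] is not None]

print(len(listwise(rows)))        # cases usable for every statistic
print(len(pairwise(rows, 0, 1)))  # cases usable for variables 0 and 1 only
```

Listwise deletion drops a case as soon as any variable is missing, while pairwise deletion keeps every case that is complete for the two variables being compared, so it uses more of the data but a different subset for each pair.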

Once several “complete” datasets have been created, they can be analyzed: “Analytic procedures that work with multiple imputation datasets produce output for each complete dataset, plus pooled output that estimates what the results would have been if the original dataset had no missing values.” Fortunately, most of the frequently used techniques support pooling, such as descriptive statistics, t-tests, ANOVA, linear models and so on.

In summary, the steps for doing Multiple Imputation by SPSS are:

  1. Run descriptive statistics to describe the pattern of missing data.
  2. Run Little’s MCAR test to check the conclusion drawn from the descriptive statistics.
  3. When the missing data are MCAR, deletion is relatively safe if fewer than 5% of the values are missing, or use the listwise, pairwise or regression methods. When the missing data are MAR, use the EM method for estimation.
  4. After multiple imputation, run the desired statistical techniques that work with the created “complete” datasets, and look at the results for each dataset as well as the pooled output.
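Steps 1 and 3 can be sketched roughly as follows in Python (not SPSS). The function names, the example data, and the choice to measure the 5% threshold over all cells of the table are assumptions made for illustration. The `is_mcar` flag is assumed to come from Little’s MCAR test in step 2, which is not implemented here.

```python
def missing_summary(rows, names):
    """Share of missing (None) entries per variable and over all cells."""
    n = len(rows)
    per_variable = {
        name: sum(row[k] is None for row in rows) / n
        for k, name in enumerate(names)
    }
    overall = sum(v is None for row in rows for v in row) / (n * len(names))
    return per_variable, overall

def suggest_method(overall_missing, is_mcar, threshold=0.05):
    """Apply the rule of thumb from step 3 (is_mcar comes from step 2)."""
    if not is_mcar:
        return "EM"
    if overall_missing < threshold:
        return "deletion"
    return "listwise, pairwise or regression"

# Made-up data: two variables, four cases
names = ("age", "income")
rows = [(31, 2000), (None, 3500), (45, None), (52, 4100)]

per_variable, overall = missing_summary(rows, names)
print(per_variable, overall)
print(suggest_method(overall, is_mcar=True))
```

This only encodes the rule of thumb given in the list; it is not a substitute for inspecting the missing data patterns.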

Reference:

  1. PASW Missing Values 18

About Lincoln

Welcome to Haolai(Lincoln)'s Website! I am now a doctoral student in Statistics at Western Michigan University, USA. Have fun here!

5 Responses to Multiple Imputation by SPSS

  1. Emma Sterrett says:

    Hi,

    I really appreciate your succinct and clear description of the steps for using multiple imputation in SPSS. Quick question: have you ever had
    difficulties obtaining the pooled results for multivariate regressions (using the GLM command)?

  2. Lincoln says:

    Hi Emma, thank you for your comments. I have never run into that kind of problem before. But as far as I know, many methods in SPSS support pooled results. Good luck!

  3. John says:

    Hi Lincoln. Thanks for the description, it is really useful! This may seem like a stupid question, but if Little’s test indicates that your data are MCAR, is it still possible to use multiple imputation if you know the variables in the dataset are sufficiently related that imputation could produce good estimates? We are trying to preserve our number of subjects despite missing data, and thought MI would be better than deletion or the more primitive missing data methods. Thanks for any ideas.

  4. George says:

    Hi Lincoln,

    Thank you for your assessment of the MI approach. I am actually using it right now for my research and so your comments were encouraging. I have two questions, if you can help. First, according to Little’s MCAR test that I conducted on my missing data, the chi-square was NOT significant at .05. Am I correct in concluding that the pattern in my data is MCAR? Second, SPSS (18) automatically selected the Fully Conditional Specification MI method as appropriate for my data. Would you agree that for MCAR data, the Fully Conditional Specification method is appropriate? Thank you so much in advance!

  5. Robby Ratan says:

    Thanks for this description! I’m trying to figure this stuff out so I can get my dissertation done!

    I’m confused about 1 thing. My data are MCAR, so I see that I can use any of the methods. But in SPSS (I’m using 19.0), it appears that the Missing Value Analysis (which allows for specification of listwise, pairwise, EM, etc.) is separate from Multiple Imputation, which uses MCMC. Does that mean that listwise, pairwise, etc. are single imputation methods?

    If so, then I should just use Multiple Imputation, right? The SPSS manual implies that this is more accurate. But I can’t tell if/why I am supposed to do BOTH the listwise/pairwise/EM and the multiple imputation methods.

    Any advice?

    Thanks,
    Robby
