novice_SAS_usr
Calcite | Level 5

I have a question about using Multiple Imputation in SAS. To understand the question some context is needed so I thank those who are willing to keep reading. 

 

The overarching goal of the study is to evaluate whether scores on a personality scale relate to participants' experience of positive emotions in daily life. Participants completed a background survey, which included personality measures, and then began the Experience Sampling Methodology (ESM) component of the study, in which they were prompted to respond to a short survey 5 times/day for 12 days. The surveys assessed their positive emotions, among other items. To minimize the burden on participants, we used a planned missingness design during the ESM portion (e.g., Silvia, Kwapil, Walsh, & Myin-Germeys, 2014). At each ESM assessment, participants were presented with roughly two-thirds of the full emotion item pool (141 of 205 items), with the subset chosen at random and thus varying from assessment to assessment (i.e., the data are missing completely at random).

 

For our analyses, we plan to create seven discrete positive emotion composites (e.g., happiness), each of which will include 2-3 emotion items (e.g., happy, joyful, glad). These emotion composites are the outcome variables in our analyses (e.g., personality scores predicting discrete positive emotion). Given the hierarchical nature of our data (surveys nested within days, days nested within participants), we will run a series of multilevel models using PROC MIXED in SAS, and we have the syntax prepared for these analyses.

 

Given our planned missingness design, however, not every item included in an emotion composite was presented at every ESM assessment. As such, we are considering using multiple imputation to handle this missing data, and we are familiar with the steps in SAS, which we outline here. Phase 1, imputation: PROC MI creates n imputed copies of the data (n is set by the NIMPUTE= option), stacked in one output data set and indexed by the variable _Imputation_. Phase 2, analysis: we run the main analyses in each of the n imputed data sets from Phase 1 using a BY _Imputation_ statement. Phase 3, pooling: PROC MIANALYZE pools the results across all n data sets.
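
For concreteness, the three phases might look roughly like the sketch below. The data set and variable names (esm, esm_mi, id, happy, joyful, glad, trait_score) and the NIMPUTE= and SEED= values are placeholders rather than our actual syntax, and the default PROC MI settings used here do not account for assessments being nested within participants.

```sas
/* Sketch only: all data set and variable names are hypothetical. */

/* Phase 1: imputation. The output stacks NIMPUTE= copies of the data, */
/* identified by the automatic variable _Imputation_.                  */
proc mi data=esm out=esm_mi nimpute=20 seed=12345;
   var happy joyful glad trait_score;
run;

/* Phase 2: fit the same model within each imputed copy. */
proc mixed data=esm_mi;
   by _Imputation_;
   class id;
   model happy = trait_score / solution;
   random intercept / subject=id;
   ods output SolutionF=mixed_parms;   /* fixed-effect estimates per imputation */
run;

/* Phase 3: pool the fixed-effect estimates across imputations. */
proc mianalyze parms=mixed_parms;
   modeleffects Intercept trait_score;
run;
```

The outcome in Phase 2 is a single item (happy) only to keep the sketch short; in our real models it would be one of the composites.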

 

The problem, as we see it, is that there needs to be an additional step between Phases 1 and 2. That is, after imputing the missing data in each of the n datasets, we then need to create the emotion composites that are our outcome variables (e.g., creating the happiness composite from happy, joyful, and glad once missing data has been imputed for all three of these variables). Only then can we move to Phase 2, analysis. So the main question here is whether we can create new variables in the n datasets created with PROC MI and, if so, how we would do that. If not, is there a good alternative method of handling missing data in SAS that allows us to continue with our planned PROC MIXED analyses? (We know FIML is another option if we used PROC CALIS/SEM in SAS.)

 

In working through this approach a few additional questions cropped up which we outline below…

  1. Is there a way to distinguish between planned missing and randomly missing values?
  2. As we understand it, all of the variables included in the final models need to be in the VAR statement when we run PROC MI. Is there a way to make clear that we only want to impute values for some variables (e.g., the emotion terms) rather than the full set of variables (e.g., we don't want to impute responses in our personality measures)?
  3. Is multiple imputation appropriate for our hierarchical data (e.g., each participant has 50+ assessments, each a separate row in our dataset)?
  4. Some of our models include interaction terms and we’ve read that multiple imputation might not be appropriate in these circumstances. Thoughts?
ballardw
Super User

Is there a way to distinguish between planned missing and randomly missing values?

It depends to an extent on how you are creating your data. SAS actually has 28 "missing values" for numeric variables: the default . plus 27 special missing values, .A through .Z and ._ . So when a missing value is created, you have the option of assigning one of the "special" missing values so that you know the cause later. You can assign formats to the missing values for the variables to display information like "skipped", "not asked", or what have you.
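
A small sketch of the idea, with made-up names (demo, happy, missfmt) and labels: .S marks an item that was not presented, the ordinary . marks an item that was shown but not answered, and a format makes the difference visible in output. Both are still treated as missing in calculations.

```sas
/* Sketch only: the format name, labels and data are hypothetical. */
proc format;
   value missfmt
      .S = 'Not presented'
      .  = 'No response';
run;

data demo;
   id = 1; happy = 4;  output;
   id = 2; happy = .S; output;   /* item not shown at this assessment */
   id = 3; happy = .;  output;   /* item shown but left unanswered    */
   format happy missfmt.;
run;

/* The MISSING option lists each kind of missing value as its own row. */
proc freq data=demo;
   tables happy / missing;
run;
```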

Without seeing example data sets I can't specify an exact approach for doing this. If you have some sort of "order" variable and you know at which "order" value specific variables were skipped, then it would be possible to write something like the following as part of a data step.

if order=<some value> then do;
   var1  = .S;
   var12 = .S;
   var27 = .S;
end;

If the skipped variables were not the same for each respondent at the same "order" time, then a more complex approach might be appropriate: make a transaction data set with the desired values and use it to update the main data based on respondent ID and "order".
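
A rough sketch of that transaction approach, using invented data set and variable names (esm, skips, esm_flagged, id, order, var1, var12). It relies on how the UPDATE statement handles missing values by default: an ordinary . in the transaction data set leaves the master value alone, while a special missing value such as .S replaces it. Check the result on a few known records before trusting it.

```sas
/* Sketch only: names and values are hypothetical. */

/* Master data: one row per respondent and assessment. */
data esm;
   id = 1; order = 1; var1 = 4; var12 = .; output;
   id = 1; order = 2; var1 = .; var12 = 3; output;
run;

/* Transaction data: .S where an item was not presented, . elsewhere. */
data skips;
   id = 1; order = 1; var1 = .;  var12 = .S; output;
   id = 1; order = 2; var1 = .S; var12 = .;  output;
run;

/* Both data sets must be sorted by the BY variables, and each */
/* id/order combination should occur only once in the master.  */
data esm_flagged;
   update esm skips;
   by id order;
run;

proc print data=esm_flagged;
run;
```

Note that .S in the transaction data would also overwrite a real response in the master data, so the skip list needs to be accurate.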

novice_SAS_usr
Calcite | Level 5

This information is very helpful, thank you for taking the time to respond!
