David_M
Obsidian | Level 7

I'm a new SAS user and I finally managed to impute a dataset of 110 mixed-type variables with PROC MI, using the code below:

 

proc mi data=F0_Data NIMPUTE=50 out=F0_Imputed_Data seed=54321;

     class Q4_diseaseF0 Q5_MSD_SymptomsF0 Q6a1F0 Q6b1F0 Q6c1F0 Q6d1F0 Q6e1F0 Q6f1F0 Q7a1F0 Q7b1F0 Q7c1F0 Q7d1F0 Q7e1F0 Q7f1F0 Q7g1F0 ... ;

     var Q3_BMI_CategoryF0 Q4_diseaseF0 Q5_MSD_SymptomsF0 Q6a1F0 Q6b1F0 Q6c1F0 Q6d1F0 Q6e1F0 Q6f1F0 Q7a1F0 Q7b1F0 Q7c1F0 Q7d1F0 Q7e1F0 Q7f1F0 Q7g1F0 ... ; /* up to 110 variables */

     /* Use FCS for mixed types */
     fcs
          /* Use linear regression for continuous variables */
          reg(Q3_BMI_CategoryF0 Q10_raw_sleep_scoreF0 Q11_13_sleep_deficitF0 Q16bF0 Q16cF0 Q19_CESDF0 Q20_WRFQF0 ...)
          /* Use logistic regression for binary variables */
          logistic(Q4_diseaseF0 Q7a1F0 Q7b1F0 Q7c1F0 Q7d1F0 Q7e1F0 Q7f1F0 Q7g1F0 ... /details LIKELIHOOD=AUGMENT link=logit descending)
          /* Use logistic regression for ordinal variables */
          logistic(Q5_MSD_SymptomsF0 Q6a1F0 Q6b1F0 Q6c1F0 Q6d1F0 Q6e1F0 ... /details LIKELIHOOD=AUGMENT link=logit descending)
          /* Use discriminant analysis for nominal variables */
          discrim(Q16aF0 Q16dF0 Q24_Job_title_codeF0 Q26_SHIFTCAT_F0 ... /CLASSEFFECTS=INCLUDE);
run;

1) I need to pick the best (most optimal) imputed dataset from the 50 generated, using PROC MIANALYZE (and other procedures?), but I have no clue how to use it properly in my mixed-variable case. BTW, as explained earlier, I have 110 mixed variables from over 400 survey respondents, with no missing values at this point. The SAS examples for PROC MIANALYZE are not helpful.

 

2) Secondly, many of the imputed values for non-continuous variables are floating point, which doesn't make sense. How do I round them up or down while leaving the continuous values as is? It seems that the "round=0.1" option, for example, applies to all variables, which is not desirable.

 

Thanks,

David

 

Ksharp
Super User
1)
If you need to pick the best imputed dataset, that has nothing to do with PROC MIANALYZE.
PROC MIANALYZE just combines the results from these 50 imputed datasets into ONE model.
If you want to pick the best imputed dataset, why not run your model (e.g. PROC MIXED) on each of the 50 datasets and find the one with the smallest AIC or BIC?
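
As a rough sketch of that combining step (not necessarily the model you will end up using): fit the same analysis model to each imputation with a BY statement, then pass the per-imputation estimates to PROC MIANALYZE. The outcome and predictors below are picked from the continuous variables in the posted code purely for illustration, and the dataset name F0_Reg_Est is made up.

```sas
/* Assumed analysis model, for illustration only: a linear regression of
   Q19_CESDF0 on two continuous predictors from the imputed data. */
proc reg data=F0_Imputed_Data outest=F0_Reg_Est covout noprint;
    model Q19_CESDF0 = Q10_raw_sleep_scoreF0 Q20_WRFQF0;
    by _Imputation_;   /* one fit per imputed dataset */
run;

/* Pool the 50 sets of estimates and covariance matrices into one result. */
proc mianalyze data=F0_Reg_Est;
    modeleffects Intercept Q10_raw_sleep_scoreF0 Q20_WRFQF0;
run;
```

The pooled output is a single set of estimates, standard errors, and confidence limits, so no individual dataset has to be chosen.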

2) For this scenario, you need to put the variable in the CLASS statement so that PROC MI knows it is a categorical variable, not a continuous one.
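
If a variable has to stay in the REG list (in the posted code, Q3_BMI_CategoryF0 is imputed with REG and is not in CLASS, so it will presumably get fractional values), one option is to round only those variables in a DATA step afterwards. A minimal sketch; the output dataset name and the list of variables to round are assumptions:

```sas
data F0_Imputed_Rounded;   /* hypothetical output dataset name */
    set F0_Imputed_Data;

    /* Assumed list: variables imputed with REG that should be whole numbers.
       Add further variable names to this array as needed. */
    array to_round {*} Q3_BMI_CategoryF0;

    do i = 1 to dim(to_round);
        /* Round to the nearest integer; missing values stay missing. */
        to_round{i} = round(to_round{i}, 1);
    end;

    drop i;
run;
```

Variables not listed in the array, including the truly continuous ones, are left untouched.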
David_M
Obsidian | Level 7

Thank you for this important clarification, @Ksharp! Now that I think about it, picking the dataset that gives the smallest AIC, BIC, etc. is a form of selection bias or p-hacking: it cherry-picks the results that best support my hypothesis. Doesn't choosing one best dataset ignore the uncertainty in the imputation process that is captured in the other 49 datasets? My concern is that this would lead to standard errors that are too small and confidence intervals that are too narrow, which could lead me to incorrectly find a statistically significant result (Type I error).
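
For reference, the standard combining rules (Rubin's rules) show where that uncertainty enters. The notation is introduced here and is not from the thread: m = 50 is the number of imputations, \hat{Q}_i is the estimate from imputed dataset i, and U_i is its squared standard error.

```latex
\bar{Q} = \frac{1}{m}\sum_{i=1}^{m} \hat{Q}_i, \qquad
\bar{U} = \frac{1}{m}\sum_{i=1}^{m} U_i, \qquad
B = \frac{1}{m-1}\sum_{i=1}^{m} \left(\hat{Q}_i - \bar{Q}\right)^2, \qquad
T = \bar{U} + \left(1 + \frac{1}{m}\right) B
```

An analysis of a single dataset reports only its own U_i and leaves out the between-imputation term B, which is why its standard errors come out too small.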

 

Would it be possible to perform an RMSE-type operation on all 50 imputed values of a particular cell and use the result as the optimal value for the missing cell? For example, get all 50 imputed values for empty cell 5 of variable BMI_F0, compute their root mean square, and replace the empty cell with that value. This would be done for all cells with missing values.
