cywong
Calcite | Level 5

Hello everyone,

 

I am new to multiple imputation and PROC MI. I found that if the order of the variables changes, the imputed values change. In this case, how do you determine the order of the variables in the VAR statement? Do I put the most important one first? Please see the following example.

 

(SAS 9.4; Windows 7)

 

data missingdata;
input id sex $ age ind1 ind2 ind3 $ ind4 score;
datalines;
1 F 35 17 . 1 3 98
17 M 50 14 5 . 2 80
33 F 45 6 7 0 . 75
49 M 24 . 5 0 8 75
65 F 44 11 9 . 5 88
81 M 34 9 5 1 7 .
2 F 40 . 3 1 9 46
18 F 47 3 . 0 1 76
34 F 58 16 8 0 2 .
50 M 63 18 1 . 3 83
;

 

proc mi data = missingdata seed = 1 out = mi_data1 nimpute = 1 noprint;
class sex;
var sex ind4 score;
fcs reg(ind4);
fcs reg(score);
run;

proc mi data = missingdata seed = 1 out = mi_data2 nimpute = 1 noprint;
class sex;
var ind4 sex score;
fcs reg(ind4);
fcs reg(score);
run;

 

proc mi data = missingdata seed = 1 out = mi_data3 nimpute = 1 noprint;
class sex;
var ind4 sex score;
fcs reg(ind4 = sex);
fcs reg(score);
run;

 

Thanks,

CY

 

StatsGeek
SAS Employee

The key feature that distinguishes FCS imputation from MCMC imputation is that FCS imputes variables one at a time. In PROC MI with the FCS statement, the variables are imputed sequentially in the order specified in the VAR statement. Because the underlying algorithm relies on random sampling, changing the order of the variables changes which part of the random number stream is used to draw each new parameter and imputed value. So, changing the order of the variables on the VAR statement should always result in some change to the imputed data.
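To see this effect directly, one option is to compare two of the output data sets from the question. This is only a sketch: it assumes mi_data1 and mi_data2 were created by the PROC MI steps in the original post (same seed, different VAR order) and that id uniquely identifies each row. The sorted data set names are made up for illustration.

```sas
/* Sort copies of both imputed data sets by id so that PROC COMPARE
   can match rows (mi_sorted1 and mi_sorted2 are illustrative names) */
proc sort data=mi_data1 out=mi_sorted1;
  by id;
run;

proc sort data=mi_data2 out=mi_sorted2;
  by id;
run;

/* Compare only the variables that were imputed */
proc compare base=mi_sorted1 compare=mi_sorted2;
  id id;
  var ind4 score;
run;
```

Any differences reported should be limited to the rows where ind4 or score was missing in the original data, since observed values are not changed by imputation.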

 

As for the ordering of the variables, the order should not have much impact on your pooled parameter estimates if your number of iterations is large (van Buuren & Groothuis-Oudshoorn, 2011). Increasing the number of imputed data sets will further reduce the impact of the ordering of the variables on the results. However, the ordering of the variables will still affect the efficiency of the imputation process. It is generally recommended that you list the variables according to the number of missing observations for each (from most complete to least complete).
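As a rough sketch of how this could look with the example data from the question: first count the missing values, then list the variables from most complete to least complete. In that data, sex has no missing values, ind4 has one, and score has two. The NBITER= and NIMPUTE= values below are illustrative choices, not recommendations, and mi_ordered is a made-up output name.

```sas
/* Count nonmissing (N) and missing (NMISS) values for the numeric variables */
proc means data=missingdata n nmiss;
  var ind4 score;
run;

/* VAR order runs from most complete to least complete: sex, ind4, score.
   NBITER= sets the number of burn-in iterations for each imputation;
   NIMPUTE= sets the number of imputed data sets. */
proc mi data=missingdata seed=1 out=mi_ordered nimpute=20 noprint;
  class sex;
  var sex ind4 score;
  fcs nbiter=50 reg(ind4) reg(score);
run;
```

Both regression methods are given in a single FCS statement here, which is one way to write it. With only 10 observations the results will be unstable whatever the order, so treat this as a template for the syntax rather than a meaningful analysis.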

mahmood
Calcite | Level 5

Hello,

 

Thanks for this useful discussion. I have a related question: do I have to put the dependent (outcome) variable first and then all independent variables? I mean, my (mianalyze) model is y = b1*x1 + b2*x2....

How does PROC MI know which one is the dependent variable? Or how is FCS able to identify which one is the dependent/outcome variable?

 

Would anyone help please?

 

Rahid

hbd
Calcite | Level 5

Is there a good reference for the following answer you provided regarding the order of covariates, other than your answer?

 

[Image: hbd_0-1586538785255.png]

 
