Hello everyone,
I am new to multiple imputation and PROC MI. I found that if the order of the variables changes, the imputed values change. In this case, how do you determine the order of the variables in the VAR statement? Should I put the most important one first? Please see the following example.
(SAS 9.4; Windows 7)
data missingdata;
input id sex $ age ind1 ind2 ind3 $ ind4 score;
datalines;
1 F 35 17 . 1 3 98
17 M 50 14 5 . 2 80
33 F 45 6 7 0 . 75
49 M 24 . 5 0 8 75
65 F 44 11 9 . 5 88
81 M 34 9 5 1 7 .
2 F 40 . 3 1 9 46
18 F 47 3 . 0 1 76
34 F 58 16 8 0 2 .
50 M 63 18 1 . 3 83
;
proc mi data = missingdata seed = 1 out = mi_data1 nimpute = 1 noprint;
class sex;
var sex ind4 score;
fcs reg(ind4);
fcs reg(score);
run;
proc mi data = missingdata seed = 1 out = mi_data2 nimpute = 1 noprint;
class sex;
var ind4 sex score;
fcs reg(ind4);
fcs reg(score);
run;
proc mi data = missingdata seed = 1 out = mi_data3 nimpute = 1 noprint;
class sex;
var ind4 sex score;
fcs reg(ind4 = sex);
fcs reg(score);
run;
Thanks,
CY
The key feature that distinguishes FCS imputation from MCMC imputation is that FCS imputes variables one at a time. In PROC MI with the FCS statement, the variables are imputed sequentially in the order specified in the VAR statement. Because the underlying algorithm relies on random sampling, changing the order of the variables changes which part of the random number stream is used to draw each new parameter and imputed value. So, changing the order of the variables on the VAR statement should always result in some change to the imputed data.
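To see the effect of the ordering on the example from the question, one option is to line up the two output data sets and compare the imputed values directly. The following is only a sketch: it assumes that mi_data1 and mi_data2 were created by the first two PROC MI steps above, and the data set names mi_sorted1 and mi_sorted2 are made up for illustration.

```sas
/* Sort both imputed data sets by id so that PROC COMPARE can match rows. */
/* mi_sorted1 and mi_sorted2 are hypothetical names chosen for this sketch. */
proc sort data=mi_data1 out=mi_sorted1;
   by id;
run;

proc sort data=mi_data2 out=mi_sorted2;
   by id;
run;

/* Compare only the two variables that had missing values to impute. */
proc compare base=mi_sorted1 compare=mi_sorted2;
   id id;
   var ind4 score;
run;
```

Any differences reported should be confined to the rows where ind4 or score was missing in the original data, since the observed values are copied to the output unchanged.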
As for the ordering of the variables, the order should not have much impact on your pooled parameter estimates if your number of iterations is large (van Buuren & Groothuis-Oudshoorn, 2011). Increasing the number of imputed data sets will further reduce the impact of the ordering of the variables on the results. However, the ordering of the variables will still affect the efficiency of the imputation process. It is generally recommended that you list the variables by their number of missing observations, from most complete to least complete.
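As a rough illustration of these recommendations on the example data, you could first count the missing values per variable and then list the variables from most complete to least complete. This is a sketch, not a recommended setting: the values NBITER=50 and NIMPUTE=20 and the output name mi_ordered are arbitrary choices for illustration, and the two imputation methods are written in a single FCS statement, as in the examples in the PROC MI documentation.

```sas
/* Count nonmissing (N) and missing (NMISS) values for the numeric   */
/* variables. In the example data, ind4 has 1 missing value and      */
/* score has 2; sex has none.                                        */
proc means data=missingdata n nmiss;
   var ind4 score;
run;

/* List the variables from most complete to least complete.          */
/* NBITER= sets the number of burn-in iterations before each         */
/* imputation; NIMPUTE= sets the number of imputed data sets.        */
/* Both values here are illustrative assumptions, not tuned choices. */
proc mi data=missingdata seed=1 out=mi_ordered nimpute=20 noprint;
   class sex;
   var sex ind4 score;
   fcs nbiter=50 reg(ind4) reg(score);
run;
```

With more burn-in iterations and more imputed data sets, the pooled estimates from different variable orderings should end up closer to one another, even though the individual imputed values still differ.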
Hello,
Thanks for this useful discussion. I have a related question: do I have to put the dependent (outcome) variable first and then all the independent variables? I mean, my (MIANALYZE) model is y = b1*x1 + b2*x2....
How does PROC MI know which one is the dependent variable? Or how is FCS able to identify which one is the dependent/outcome variable?
Could anyone help, please?
Rahid
Is there a good reference, other than your answer itself, for the answer you provided regarding the order of covariates?