PROC MI: the listing order of the variables in var statement

cywong · Posted 09-24-2015 06:03 PM

Hello everyone,

I am new to multiple imputation and to PROC MI. I found that if the order of the variables changes, the imputed values change too. In that case, how do you determine the order of the variables in the VAR statement? Should I put the most important one first? Please see the following example.

(SAS 9.4; Windows 7)

data missingdata;
input id sex $ age ind1 ind2 ind3 $ ind4 score;
datalines;
1 F 35 17 . 1 3 98
17 M 50 14 5 . 2 80
33 F 45 6 7 0 . 75
49 M 24 . 5 0 8 75
65 F 44 11 9 . 5 88
81 M 34 9 5 1 7 .
2 F 40 . 3 1 9 46
18 F 47 3 . 0 1 76
34 F 58 16 8 0 2 .
50 M 63 18 1 . 3 83
;

proc mi data = missingdata seed = 1 out = mi_data1 nimpute = 1 noprint;
class sex;
var sex ind4 score;
fcs reg(ind4);
fcs reg(score);
run;

proc mi data = missingdata seed = 1 out = mi_data2 nimpute = 1 noprint;
class sex;
var ind4 sex score;
fcs reg(ind4);
fcs reg(score);
run;

proc mi data = missingdata seed = 1 out = mi_data3 nimpute = 1 noprint;
class sex;
var ind4 sex score;
fcs reg(ind4 = sex);
fcs reg(score);
run;

Thanks,

CY

StatsGeek · Posted 10-06-2015 10:27 AM

The key feature that distinguishes FCS imputation from MCMC imputation is that FCS imputes variables one at a time. In PROC MI with the FCS statement, the variables are imputed sequentially in the order specified in the VAR statement. Because the underlying algorithm relies on random sampling, changing the order of the variables changes which part of the random number stream is used to draw each new parameter and imputed value. So, changing the order of the variables on the VAR statement should always result in some change to the imputed data.
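To see this with the data sets from the original post, one option is to compare the outputs of the first two PROC MI calls directly. This is only a minimal sketch: it assumes mi_data1 and mi_data2 already exist in the WORK library and that both keep the observations in the same order as the input data, so PROC COMPARE can match them row by row.

```sas
/* Compare the imputed values from the two VAR orderings.          */
/* Assumes mi_data1 and mi_data2 were created by the PROC MI steps */
/* in the original post (same seed, different VAR order).          */
proc compare base=mi_data1 compare=mi_data2;
   var ind4 score;   /* the two variables that were imputed */
run;
```

Any differences reported for ind4 or score should fall on the rows that were missing in the input data, since the observed values are carried over unchanged.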

As for the ordering of the variables, the order should not have much impact on your pooled parameter estimates if your number of iterations is large (van Buuren & Groothuis-Oudshoorn, 2011). Increasing the number of imputed data sets will further reduce the impact of the ordering on the results. However, the ordering of the variables will still affect the efficiency of the imputation process. It is generally recommended that you list the variables by the number of missing observations in each, from most complete to least complete.
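As a rough illustration of that recommendation, the sketch below first counts the missing values and then lists the variables from most complete to least complete. The output data set name mi_data4 and the values NIMPUTE=20 and NBITER=20 are placeholders chosen for the example, not recommendations.

```sas
/* Step 1: count missing values for the numeric variables to impute. */
proc means data=missingdata n nmiss;
   var ind4 score;
run;

/* Step 2: list variables from most complete to least complete.      */
/* In the posted data, sex has no missing values, ind4 has 1, and    */
/* score has 2, so the order is sex, ind4, score.                    */
/* NIMPUTE=20 and NBITER=20 are illustrative values only.            */
proc mi data=missingdata seed=1 out=mi_data4 nimpute=20 noprint;
   class sex;
   var sex ind4 score;
   fcs nbiter=20 reg(ind4) reg(score);
run;
```

For the posted data this is the same order as in the first PROC MI step of the original question, so that call already follows the guideline.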

mahmood · Posted 07-08-2017 01:32 PM

Hello,

Thanks for this useful discussion. I have a related question: do I have to put the dependent (outcome) variable first and then all independent variables? I mean, my (mianalyze) model is y = b1*x1 + b2*x2....

How does PROC MI know which one is the dependent variable? In other words, how is FCS able to identify which variable is the dependent/outcome variable?

Would anyone help please?

Rahid

hbd · Posted 04-10-2020 01:13 PM

Is there a good reference for the following answer you provided regarding the order of covariates, other than your answer?

PROC MI: the listing order of the variables in var statement
