Multiple imputation

seltonsy · Posted 12-05-2016 02:46 PM

Hi everyone,

I have a question about MCMC imputation. As I understand it, the first step is the imputation phase with proc mi data= ........

then the second is the analysis phase, e.g. proc glm.......

and the third is the pooling phase with proc mianalyze ......

My question: I used MCMC in the first step and would like to model more than one variable in the second step, since I have more than one variable to impute.

The problem is that only the last MODEL statement is used (WARNING: Only the last MODEL statement is used.)

If I can only model one variable at a time, how can I combine the results at the end?

Here is the code:

proc mi data=cohort1 nimpute=20 out=cohort_MCMC_1
seed=2017
round= 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
min= 14 1 0 0 0 0 0 0 0 0 0 0 30 1 0 0 0 0 0 0 0 0 0 0 0 1200 1200 1200 1200
max= 45 20 10 20 1 1 1 1 1 1 1 1 42 9 1 1 1 1 1 10 10 10 10 10 10 5500 5000 5000 5000 ;
mcmc impute=monotone ;
var x1 x2 x3 x4 -------- x20 ;
run ;

/**************************************************************************************/
proc glm /* or mixed */ data=cohort_MCMC_1 ;
class x1 x2 x3 ------x10;
model x1= x1------x5;
model x2= x2-----x7;
model x3= x6-------x15 ;
model x4= x4------x20;
by _imputation_;
ods output ParameterEstimates=gm_mcmc;
run;

SteveDenham · Posted 12-09-2016 10:23 AM

GLM can handle multiple dependent variables, but it really doesn't like having a variable on both sides of the equals sign.

So, it appears that your dependent variables are to be modeled with different independent variables. To me, that means setting up separate streams for each dependent variable, up to k dependent variables. The process would look like:

PROC MI (across all data)
PROC GLM, dependent variable 1
PROC GLM, dependent variable 2
...
PROC GLM, dependent variable k
PROC MIANALYZE, dependent variable 1
PROC MIANALYZE, dependent variable 2
...
PROC MIANALYZE, dependent variable k

Now, if the independent variables are identical across all k dependent variables, I think this could be collapsed using multiple dependent variable syntax.

If I am missing something on this, then let me know.

Steve Denham

seltonsy · Posted 12-13-2016 02:44 PM

Thanks Steve,

My main concern (and apologies if I don't understand the missing data handling very well in SAS) is that I have 5 different variables with different levels of missingness that I want to impute together. I guess you want me to proceed with several GLM models which will yield different sets. I'm worried that I won't be able to combine those at the end with proc mianalyze.

Please correct me if I'm wrong.

Thanks

SteveDenham · Posted 12-22-2016 12:47 PM

Not quite. I visualize three or four rounds of imputation for all missing independent variables, simultaneously. Then, for each dependent variable/independent variable combination, you run PROC GLM with a BY _imputation_ statement. You can then combine these analyses using MIANALYZE, one run for each dependent/independent variable combination.
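As background on why the separate analyses pool without trouble: the standard combining rules (Rubin's rules), which PROC MIANALYZE applies, work on one parameter of one model at a time. The notation below is introduced here for illustration and is not from the posts above: m is the number of imputations, \hat{Q}_i is a parameter estimate from the i-th imputed data set, and \hat{U}_i is its estimated variance.

```latex
\bar{Q} = \frac{1}{m}\sum_{i=1}^{m}\hat{Q}_i, \qquad
\bar{U} = \frac{1}{m}\sum_{i=1}^{m}\hat{U}_i, \qquad
B = \frac{1}{m-1}\sum_{i=1}^{m}\bigl(\hat{Q}_i-\bar{Q}\bigr)^2, \qquad
T = \bar{U} + \Bigl(1+\frac{1}{m}\Bigr)B
```

Here \bar{Q} is the pooled estimate and T its total variance (within-imputation plus inflated between-imputation variance). Nothing in these formulas links one model to another, so each GLM stream gets its own MIANALYZE run on the same set of imputed data sets.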

Now, if (and only if) you have the same independent variables for ALL of the dependent variables, you would run one GLM in multivariate mode, with a BY _imputation_ statement, followed by one MIANALYZE to combine these analyses.

For instance, suppose you had y1, y2 and y3 as dependent variables, and x1, x2, x3, and x4 as independent variables. You would need to run PROC MI to impute any missing values in x1, x2, x3 and x4--say three to five imputations. Then suppose you wanted to fit y1 from x1 and x2, y2 from x3 and x4, and y3 from x1, x3 and x4. Your GLM statements would look something like:

proc glm data=imputeddata;
by _imputation_;
class x1 x2;
model y1=x1 x2/inverse;
ods output ParameterEstimates=glmparms_y1 InvXPX=glmxpxi_y1;
quit;

proc mianalyze parms=glmparms_y1 xpxi=glmxpxi_y1 edf=(you insert the complete-data error degrees of freedom for this model);
modeleffects intercept x1 x2;
run;

You would then repeat this for y2, using x3 and x4, and for y3 using x1, x3, and x4.
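Repeating the two steps by hand for each dependent variable gets tedious, so one option is to wrap them in a macro. The following is only a minimal sketch and has not been run against your data. The macro name fit_and_pool, its parameters, and the edf values are hypothetical, and it assumes the predictors are continuous (no CLASS statement), so that MODELEFFECTS can name them directly.

```sas
/* Hypothetical helper: fit one dependent variable on every imputed  */
/* data set, then pool the estimates. All names here are illustrative. */
%macro fit_and_pool(data=, dep=, indep=, edf=);

    /* Analysis phase: one regression per value of _imputation_ */
    proc glm data=&data;
        by _imputation_;
        model &dep = &indep / inverse;   /* INVERSE is needed for InvXPX */
        ods output ParameterEstimates=glmparms_&dep
                   InvXPX=glmxpxi_&dep;
    quit;

    /* Pooling phase: combine the per-imputation estimates */
    proc mianalyze parms=glmparms_&dep xpxi=glmxpxi_&dep edf=&edf;
        modeleffects intercept &indep;
    run;

%mend fit_and_pool;

/* edf values below are placeholders: they assume 100 observations   */
/* minus the number of estimated parameters. Substitute your own.    */
%fit_and_pool(data=imputeddata, dep=y1, indep=x1 x2,    edf=97);
%fit_and_pool(data=imputeddata, dep=y2, indep=x3 x4,    edf=97);
%fit_and_pool(data=imputeddata, dep=y3, indep=x1 x3 x4, edf=96);
```

Each call writes its own pair of data sets (for example glmparms_y2 and glmxpxi_y2), so the three analyses never overwrite one another and each is pooled separately.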

Now suppose you use x1, x2, x3, and x4 for y1, y2, and y3. You might try:

proc glm data=imputeddata;
by _imputation_;
class x1 x2 x3 x4;
model y1 y2 y3=x1 x2 x3 x4/inverse;
ods output ParameterEstimates=glmparms_y1 InvXPX=glmxpxi_y1;
quit;

proc mianalyze parms=glmparms_y1 xpxi=glmxpxi_y1 edf=(you insert the complete-data error degrees of freedom for this model);
modeleffects intercept x1 x2 x3 x4;
run;

This, I assume, would give univariate analyses for y1, y2 and y3.

Accommodating any multivariate relationship among y1, y2 and y3 using the MANOVA statement in GLM may be possible, but for something like that I would say "Get in touch with Tech Support."

Steve Denham
