Hello,
I am performing a multiple imputation from a uniform distribution. I create 5 datasets, each with a different seed for the uniform distribution so that the datasets differ. Then I concatenate the 5 datasets into a single dataset, "Mimpute".
I fit a mixed model on this dataset.
Then I use PROC MIANALYZE to pool the results. But when I count how many p-values are < 0.05, I obtain a very small number, and it does not match the counts from the individual imputations.
I do this for 1000 simulations; the variable simul indicates the simulation number.
Here is my code:
proc mixed data=Mimpute method=reml ;
class prod (ref="Placebo") TV_VISIT_TMP(ref="V1") id_bis;
model changeVn_V0 = prod TV_VISIT_TMP prod*TV_VISIT_TMP / s ddfm=kenwardroger;
repeated TV_VISIT_TMP / subject=id_bis type=CS;
lsmeans prod*TV_VISIT_TMP / slice=TV_VISIT_TMP cl;
slice prod*TV_VISIT_TMP / sliceby(TV_VISIT_TMP="V1") cl pdiff=control("Placebo" "V1");
slice prod*TV_VISIT_TMP / sliceby(TV_VISIT_TMP="V2") cl pdiff=control("Placebo" "V2");
by simul _Imputation_;
ods output SolutionF=mixparms ;
ods output SliceDiffs=slicev1v2;
run;
proc mianalyze parms(classvar=full)=mixparms ;
class prod TV_VISIT_TMP;
modeleffects Intercept prod TV_VISIT_TMP prod*TV_VISIT_TMP ;
by simul;
ods output ParameterEstimates=param_simul;
run;
To count the number of p-values for the interaction that are < 0.05, I do:
data param_simul;
set param_simul;
where Parm="prod*TV_VISIT_TMP" AND prod="Actif" AND TV_VISIT_TMP="V2" ;
run;
proc sql;
select count(simul)
from param_simul
where Probt < 0.05;
quit;
And I obtain 45/1000. But within each imputation, I obtain 127/1000, 129/1000, 133/1000, 127/1000 and 144/1000. I would expect to get around 130/1000 after PROC MIANALYZE, because it combines the different imputations.
Could you help me, please? I think the BY statement of PROC MIANALYZE is the problem.
Thanks,
Clemence
First, I would say that you should probably be using more than 5 imputed data sets, especially if your fraction of missing information is high (see the table of relative efficiency in the SAS documentation).
This may bring the results closer to what you are expecting, but quite frankly, your expectations are not correct. They would be correct if the combined estimates factored in only the within-imputation variances. But the combined variance also includes the between-imputation variance, which naturally raises the variance of the estimates. This is the price you pay for the uncertainty associated with imputation. Therefore, you should expect the number of significant estimates to be lower (often substantially lower) than in any of the individual imputations.
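To make the variance argument concrete, here is a sketch of the standard combining rules (Rubin's rules) for a single parameter. The symbols are my own notation, not names from the posted code: $\hat{Q}_i$ and $\hat{U}_i$ are the estimate and its squared standard error from imputation $i$, and $m$ is the number of imputations.

```latex
\begin{aligned}
\bar{Q} &= \frac{1}{m}\sum_{i=1}^{m}\hat{Q}_i
  && \text{(combined estimate)} \\
\bar{U} &= \frac{1}{m}\sum_{i=1}^{m}\hat{U}_i
  && \text{(within-imputation variance)} \\
B &= \frac{1}{m-1}\sum_{i=1}^{m}\bigl(\hat{Q}_i-\bar{Q}\bigr)^2
  && \text{(between-imputation variance)} \\
T &= \bar{U} + \Bigl(1+\frac{1}{m}\Bigr)B
  && \text{(total variance)} \\
t &= \frac{\bar{Q}}{\sqrt{T}}, \qquad
\nu = (m-1)\left[1+\frac{\bar{U}}{(1+m^{-1})\,B}\right]^{2}
  && \text{(test statistic and degrees of freedom)}
\end{aligned}
```

With $m = 5$ this gives $T = \bar{U} + 1.2\,B$, so the combined variance exceeds the average within-imputation variance whenever the five estimates differ at all, and $\nu$ can be small when $B$ is large relative to $\bar{U}$. Both effects push the combined p-value upward. On the number of imputations, the relative efficiency usually quoted is $\mathrm{RE} = (1+\lambda/m)^{-1}$, where $\lambda$ is the fraction of missing information: for an assumed $\lambda = 0.5$ it is about 0.91 with $m = 5$ and about 0.98 with $m = 20$.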
Thank you for your answer. I will also try with more imputations.
Clémence