I want to use a PMM (pattern-mixture model) to impute missing data, so I used the MODELOBS= option of the MNAR statement to specify the observations used to build the imputation model. However, I found that if the observations that are in the dataset but NOT specified in MODELOBS= change, the imputation model changes too. This confuses me. As I understand it, the imputation model is derived only from the observations I specify with 'modelobs=x', so it should not be affected by the other observations, but apparently that is not the case.
Here are some examples:
The only difference between the two programs below is the input dataset: in the second program, more observations are included in the input dataset. Both programs use the same MODELOBS= specification ('M4') to select the observations used to derive the imputation model.
In the SAS output, the parameters estimated from the same model-building observations are different. Why?
PS: I tried to produce the pictures below in an English version of SAS but failed because of an encoding issue. I hope the Chinese characters don't affect your reading.
The reason for the differences has to do with the fact that all of the variables are standardized using all of the observations prior to fitting the imputation model. Adding observations changes the mean and variance and thus the standardized values. These values are then used in the imputation model, which is built only on one of the groups, which leads to slightly different estimates.
To see this more explicitly, look at the example below. Notice how the "obs-data" estimates change slightly because all the observations are standardized. This can be readily verified using Proc STANDARD and Proc REG.
data Mono1;
do Trt=0 to 1;
do j=1 to 5;
y0=10 + rannor(999);
y1= y0 + Trt + rannor(999);
if (ranuni(999) < 0.3) then y1=.;
output;
end; end;
do Trt=0 to 1;
do j=1 to 45;
y0=10 + rannor(999);
y1= y0 + Trt + rannor(999);
if (ranuni(999) < 0.3) then y1=.;
output;
end; end;
drop j;
run;
/* Imputation model for y1 is fitted only on Trt=0 (MODELOBS=); DETAILS prints its coefficients */
proc mi data=Mono1 seed=14823 nimpute=1 out=outex15;
class Trt;
monotone reg (/details);
mnar model( y1 / modelobs= (Trt='0'));
var y0 y1;
ods select MonoReg;
run;
/* Standardize y0 and y1 over ALL observations (both Trt groups) */
proc standard data=mono1 mean=0 std=1 out=out1;
var y0 y1;
run;
proc reg data=out1;
where trt=0;   /* fit on the MODELOBS= group only; compare with the obs-data estimates */
model y1=y0;
ods select ParameterEstimates;
run;
/* 45 extra observations, all Trt=1; the Trt=0 rows stay the same */
data add;
trt=1;
do j=1 to 45;
y0=10 + rannor(1);
y1= y0 + Trt + rannor(999);
if (ranuni(999) < 0.3) then y1=.;
output;
end;
drop j;
run;
data mono2;
set mono1 add;
run;
/* Same MODELOBS= group as before; only Trt=1 rows were added to the input */
proc mi data=Mono2 seed=14823 nimpute=15 out=outex15;
class Trt;
monotone reg (/details);
mnar model( y1 / modelobs= (Trt='0'));
var y0 y1;
ods select MonoReg;
run;
/* Re-standardize: the means and SDs now include the added Trt=1 rows */
proc standard data=mono2 mean=0 std=1 out=out2;
var y0 y1;
run;
proc reg data=out2;
where trt=0;
model y1=y0;
ods select ParameterEstimates;
run;
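A short derivation makes the mechanism concrete. It assumes, as described above, that each variable is standardized with its mean and standard deviation over all observations in the input data set. The symbols m_0, s_0 (for y0) and m_1, s_1 (for y1) are introduced here for illustration. It also assumes that the imputation model is an ordinary least-squares fit of y1 on y0 over the MODELOBS= group. Least squares is equivariant under affine rescaling of the response and the regressor, so the standardized-scale estimates are a fixed function of the raw-scale ones:

```latex
% Raw-scale OLS fit on the MODELOBS= group (Trt=0 in the example):
%   y_1 = \hat a + \hat b\, y_0 + \hat e
% Standardization with statistics computed over ALL observations:
\tilde y_0 = \frac{y_0 - m_0}{s_0}, \qquad \tilde y_1 = \frac{y_1 - m_1}{s_1}
% Substituting y_0 = m_0 + s_0 \tilde y_0 into the raw-scale fit:
\tilde y_1 = \frac{\hat a + \hat b\, m_0 - m_1}{s_1} + \frac{\hat b\, s_0}{s_1}\,\tilde y_0 + \frac{\hat e}{s_1}
% so the standardized-scale ("obs-data") estimates are
\tilde a = \frac{\hat a + \hat b\, m_0 - m_1}{s_1}, \qquad
\tilde b = \hat b\,\frac{s_0}{s_1}, \qquad
\tilde\sigma = \frac{\hat\sigma}{s_1}
```

In the example, the Trt=0 rows are identical in Mono1 and Mono2, so the raw-scale fit (a-hat, b-hat, sigma-hat) is the same for both. The added Trt=1 rows, however, change m_0, s_0, m_1 and s_1, and therefore change the standardized estimates. Inverting these relations recovers the same raw-scale fit from either data set.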
But you are filtering the input data differently. The WHERE clause selects different observations, so each procedure works on different data.
It's as if you ran:
proc reg data=sashelp.class(where=(sex in ('F', 'M') ));
model height = weight age;
run;
proc reg data=sashelp.class(where=(sex in ('F') ));
model height = weight age;
run;
Bart
Yes, the input data are different, but the observations used to derive the imputation model are the same (both use modelobs='M4'). So the regression parameters (the numbers in the red boxes in the lower pictures), which are derived from the same observations, should be the same too, right?
You're right; running this simple example on the Cars dataset shows it:
data cars;
set sashelp.cars;
if _N_ in (6 8 47 99 101) then invoice=.;   /* make a few Invoice values missing */
run;
/* 1) All three origins in the input; imputation model built on Europe only */
proc mi data=cars(where=(origin in ('Asia', 'Europe', 'USA') )) out=imp1 seed=123 nimpute=1;
class origin;
var Weight invoice;
monotone reg (invoice = Weight / details);
mnar model(invoice / modelobs=(origin='Europe'));
run;
/* 2) USA and Europe in the input; same MODELOBS= group */
proc mi data=cars(where=(origin in ('USA', 'Europe') )) out=imp2 seed=123 nimpute=1;
class origin;
var Weight invoice;
monotone reg (invoice = Weight / details);
mnar model(invoice / modelobs=(origin='Europe'));
run;
/* 3) Europe only: the input and the MODELOBS= group coincide */
proc mi data=cars(where=(origin in ('Europe') )) out=imp3 seed=123 nimpute=1;
class origin;
var Weight invoice;
monotone reg (invoice = Weight / details);
mnar model(invoice / modelobs=(origin='Europe'));
run;
But the bottom of this page, https://documentation.sas.com/doc/en/statug/15.2/statug_mi_details61.htm, says:
Under the MNAR assumption, the following steps are used to impute missing values for each imputed variable in each imputation (when you specify a MONOTONE statement) or in each iteration (when you specify an FCS statement):
1. For each imputed variable, a conditional model, such as a regression model for continuous variables, is fitted using either all applicable observations or a specified subset of observations.
2. A new model is simulated from the posterior predictive distribution of the fitted model.
3. Missing values of the variable are imputed based on the new model, and the imputed values for a specified subset of observations can be adjusted using specified shift and scale parameters.
It looks like, after the model is fitted on the selected observations, another one is fitted.
That's my best guess.
Bart
Hi, could you please explain the meaning of the numbers in the "Imputation" column?
It is produced by your first PROC MI.
Those are the parameters that are used to generate the first imputed data set. They are the result of Step 1 and are used in Step 2, as detailed in the documentation for the Monotone Regression Method.
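For reference, here is a sketch of those steps, based on my reading of the regression method in the PROC MI documentation. Check the notation against the "Monotone and FCS Regression Methods" section before relying on it. The notation is as follows:
- beta-hat and sigma-hat^2 are the fitted (obs-data) estimates from the n_j observations with observed y_j. Under MNAR with MODELOBS=, these are restricted to that group.
- k is the number of covariates.
- V_j = (X_j' X_j)^{-1}.
- L_j is any factor with L_j L_j' = V_j, such as a Cholesky factor. This is my notation, not necessarily the documentation's.

```latex
% Step 1: draw new parameters from the posterior predictive distribution
\sigma_j^{*2} = \frac{\hat\sigma_j^{2}\,(n_j - k - 1)}{g}, \qquad g \sim \chi^2_{\,n_j - k - 1}
\beta^{*} = \hat\beta + \sigma_j^{*}\, L_j Z, \qquad Z \sim N(0, I_{k+1}), \quad L_j L_j^{\top} = V_j
% Step 2: impute each missing value using an independent standard normal draw z_i
y_{ij}^{*} = \beta_0^{*} + \beta_1^{*} x_{i1} + \dots + \beta_k^{*} x_{ik} + z_i\,\sigma_j^{*}
```

On this reading, the "Imputation 1" column shows the drawn coefficients beta* for the first imputation. That is why it differs from the obs-data column even though both come from the same observations.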