Hello All:
I am trying to simulate a GLM model with two classification variables and an interaction term between them. That is,
y=b1*s+b2*t+b3*t*s+e; s and t each have 2 levels. I saw an example in Wicklin's book Simulating Data with SAS, but for some reason, when I fit the model to the simulated data, the parameter estimates are way off. The number of observations in each level of s is equal.
I appreciate your thoughts in advance.
J
The ANOVA and GLM sections present main-effects models. Look at the section "Linear Models with Interaction and Polynomial Effects," which has an example of a 3x2 GLM model with interaction terms.
It can be tricky (impossible?) to get the parameter estimates for some models to agree with the simulation parameters. The GLM parameterization can be non-intuitive because it "moves around" the coefficient weights. The last level of each main effect and several levels of the interaction effect are set to zero by the GLM parameterization. Consequently, the intercept term and other parameter estimates can differ from the specified values, even though the simulation is correct. This is why the book says (p. 215) "because the design matrix is singular, the parameter estimates found by PROC GLM might not be the same as the parameter values that were used to construct the data."
When you use the SOLUTION option on the MODEL statement, GLM reports the note
NOTE: The X'X matrix has been found to be singular, and a generalized inverse was used to solve the normal equations. Terms whose estimates are followed by the letter 'B' are not uniquely estimable.
That phrase "not uniquely estimable" means "the estimates reported by GLM might not agree with the parameters specified in your simulation."
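To see why the design matrix is singular, it can help to look at the columns that the GLM parameterization generates. The following is only a sketch, not code from the book: the data set names demo, GLMParm, and GLMDesign are made up for illustration, and the 2x2 layout is an assumption that matches the question. PROC GLMMOD writes out the GLM-coded design matrix without fitting a model.

```sas
/* Illustrative 2x2 layout; the data set name "demo" is hypothetical */
data demo;
   y = 0;                      /* placeholder response; only the design is of interest */
   do drug = 1 to 2;
      do disease = 1 to 2;
         output;
      end;
   end;
run;

/* GLM (less-than-full-rank) coding: one dummy column per level of every effect */
proc glmmod data=demo outparm=GLMParm outdesign=GLMDesign;
   class drug disease;
   model y = drug|disease;
run;

proc print data=GLMParm;   run;   /* maps Col1, Col2, ... to effects and levels */
proc print data=GLMDesign; run;   /* 9 design columns, but only 4 distinct cells */
```

The GLM coding produces 1 + 2 + 2 + 4 = 9 design columns for a layout that has only four cells, so the columns are linearly dependent and the individual coefficients cannot all be estimated uniquely.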
Model is:
y=alpha+ beta1*drug+beta2*disease+beta3*drug*disease+error.
alpha=0; beta1=0; beta2=3; beta3=1.5.
data int;
y=0;
do drug=1 to 2;
do disease=1 to 2;
do subject=1 to 5;
output;
end;
end;
end;
run;
proc print data=int;
run;
/*Design Matrix*/
proc logistic data=int
outdesignonly outdesign=designref(drop=y);
class drug disease/param=reference;
model y=drug|disease;
run;
proc print data=designref;
run;
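proc iml;
call randseed(1);
use designref;
read all var _NUM_ into X;    /* numeric design columns, including the intercept */
close designref;              /* close the data set that USE opened */
beta={0, 0, 3, 1.5};          /* alpha, beta1, beta2, beta3 */
eps=j(nrow(X),1);
call randgen(eps,"Normal");   /* standard normal errors */
y=X*beta+eps;
create Y var{y};append;close Y;
quit;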
data d;
merge y int(drop=y);
run;
proc print data=d;
run;
proc glm data=d;
class drug disease;
model y=drug|disease/solution p;
run;
Thanks
Thanks for your help. Above is the SAS code that I am using. I am okay with any parameterization that makes the parameter estimates close to the values used for the simulation.
You are using a reference parameterization to generate the data, so you need to use the same reference parameterization if you want the parameter estimates to be close to the parameters. You can use PROC GENMOD as follows:
proc genmod data=d;
class drug disease /param=ref;
model y=drug|disease;
run;
Of course, with only 20 data points, you should not expect a four-parameter model to give estimates that are very close to the parameters, but you will see that the 95% Wald CIs include the parameter values for the data simulated from this random number seed.
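If you want to see the estimates move closer to the parameters, one option is to simulate more subjects per cell. The following is a minimal sketch rather than a definitive recipe: the data set name big, the helper variables d1, s1, and eta, and the choice of 500 subjects per cell are all assumptions made for illustration. It builds the reference coding (last level as the reference) directly in a DATA step instead of going through PROC LOGISTIC and PROC IML.

```sas
/* Same 2x2 design with larger cells; 500 per cell is an arbitrary choice */
data big;
   call streaminit(1);
   do drug = 1 to 2;
      do disease = 1 to 2;
         d1  = (drug = 1);        /* reference coding: last level is the reference */
         s1  = (disease = 1);
         eta = 0 + 0*d1 + 3*s1 + 1.5*d1*s1;   /* alpha, beta1, beta2, beta3 */
         do subject = 1 to 500;
            y = eta + rand("Normal");         /* standard normal error */
            output;
         end;
      end;
   end;
   drop d1 s1 eta;
run;

/* Fit with the same reference parameterization that generated the data */
proc genmod data=big;
   class drug disease / param=ref;
   model y = drug|disease;
run;
```

With more observations, the standard errors shrink, so the estimates should typically fall closer to 0, 0, 3, and 1.5 than they do with 20 observations.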
Is there anything else that you need? If not, please mark the problem as answered so that future readers know that the issue has been resolved.