Hi all, in the code below I would like to simulate data from a logistic regression model. My primary goal is to create the simulated data and recover the coefficients on the log-odds scale, while making sure the predictor means differ between the two levels of the binary outcome by, say, about 0.5 standard deviations. When I run PROC LOGISTIC on the simulated data, the coefficients are off. In addition, after trying many samples, I found that the predictor means by the binary outcome are very close to each other regardless of the magnitude of the beta coefficients. Something is wrong, and I can't seem to pinpoint the problem. Please help.
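%let N=200;
proc iml;
/* allocate the response vector (N x 1) and the predictor matrix (N x 2) */
t = J(&N, 1);
X = J(&N, 2);
call randseed(4321);
/* fill X with independent standard normal draws */
call RANDGEN(X, "NORMAL", 0, 1);
/* population parameters: intercept, slope for x1, slope for x2 */
beta = {1.40, -0.60, -0.40};
/* design matrix with an intercept column, then the linear predictor */
Xb = J(&N,1,1)||X;
eta = Xb*beta;
/* mu = P(t=1 | x) under the logistic model */
mu = LOGISTIC(eta);
call RANDGEN(t, "BERNOULLI", mu);
/* write the response and predictors to the data set logdata */
tempdata = t||x;
create logdata from tempdata[colname={'t' 'x1' 'x2'}];
append from tempdata;
close logdata;
quit;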
proc logistic data=logdata;
model t = x1 x2;
run;
proc means data = logdata;
class t;
var x1 x2;
run;
It's always helpful to provide a reference. This code appears to have come from the article "Simulate data for a logistic model" except that your code uses different values for the beta coefficients and you changed a few variable names.
If you read to the end of the article, you will see that the MODEL statement uses (Event='1'), whereas you are using the default event='0'. Change your MODEL statement to
model t(event='1') = x1 x2 / clparm=wald;
and you will see that your parameter estimates are close to the population parameters from the simulation.
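The sign change between the two event choices follows from the symmetry of the logistic function. As a short worked derivation (the symbol \eta for the linear predictor is introduced here for illustration, matching eta in the posted code):

```latex
\begin{aligned}
P(t=1 \mid x) &= \frac{1}{1+e^{-\eta}}, \qquad \eta = \beta_0 + \beta_1 x_1 + \beta_2 x_2,\\
P(t=0 \mid x) &= 1 - \frac{1}{1+e^{-\eta}} = \frac{1}{1+e^{\eta}},\\
\log\frac{P(t=0 \mid x)}{P(t=1 \mid x)} &= -\eta = -\beta_0 - \beta_1 x_1 - \beta_2 x_2 .
\end{aligned}
```

So when the model is fit with the default event='0', the estimates should land near (-1.40, 0.60, 0.40) for the simulated beta: the same magnitudes with opposite signs.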
Thanks Rick! Modeling the data with (event='0') or (event='1') shouldn't make a difference, as it only changes the sign. I also could not induce a mean difference in x1 or x2 between the t=1 and t=0 groups. But the code surprisingly works in SAS University Edition on my laptop, whereas on my office PC it consistently underestimated the coefficients by almost half, with no mean difference induced. That's weird. SAS on the PC may need an update; I believe it was v9.2!
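For reference, the size of the group mean difference to expect can be worked out from the simulation design. This is a sketch that assumes the predictors are independent standard normal, as in the posted code; here \mu = P(t=1 \mid x) and p = P(t=1), and the last step uses Stein's lemma for normal variables:

```latex
\begin{aligned}
E[x_j \mid t=1] - E[x_j \mid t=0]
  &= \frac{E[x_j t]}{p} + \frac{E[x_j t]}{1-p}
   = \frac{E[x_j \mu]}{p(1-p)} \\
  &= \beta_j \, \frac{E[\mu(1-\mu)]}{p(1-p)}, \qquad p = E[\mu].
\end{aligned}
```

The difference therefore has the same sign as \beta_j. Because E[\mu(1-\mu)] \le p(1-p), its magnitude in standard deviation units is at most |\beta_j|, so a slope of -0.60 should give a gap somewhat smaller than 0.6 in absolute value.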
I can't think of any reason why the results would be different in 9.2 versus 9.4, but I'm glad you were able to resolve the issue.
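One way to check whether the simulation induces a mean difference is to rerun it with a much larger sample and compare the predictor means by group directly in SAS/IML, where sampling noise is small. The following is a minimal sketch, assuming the same beta values and seed as the original post; the macro variable bigN and the names idx1, idx0, and diff are illustrative choices, not part of the original code.

```sas
/* Sketch: large-sample check of the group mean differences.        */
/* bigN, idx1, idx0, and diff are hypothetical names for this demo. */
%let bigN = 100000;
proc iml;
call randseed(4321);

/* independent standard normal predictors, as in the original post */
X = j(&bigN, 2);
call randgen(X, "Normal", 0, 1);

/* same population parameters: intercept, slope x1, slope x2 */
beta = {1.40, -0.60, -0.40};
eta  = (j(&bigN, 1, 1) || X) * beta;
mu   = logistic(eta);              /* P(t=1 | x) */

t = j(&bigN, 1);
call randgen(t, "Bernoulli", mu);

/* column means of X within each outcome group */
idx1 = loc(t = 1);
idx0 = loc(t = 0);
diff = mean(X[idx1, ]) - mean(X[idx0, ]);
print diff[colname={'x1' 'x2'} label="Mean(t=1) - Mean(t=0)"];
quit;
```

Since both slopes are negative, both differences should come out negative. If they are close to zero at this sample size, the problem is more likely in the environment than in the simulation logic.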