Solved
Contributor
Posts: 43

Logistic regression simulation - inducing mean difference

Hi all, in the code below I would like to simulate data from a logistic regression model. My primary goal is to create the simulated data and retrieve the coefficients on the logit (log-odds) scale, while making sure the predictor means differ between the two groups of the binary outcome by about 0.5 standard deviations. When I run proc logistic on the simulated data, the coefficients are off. In addition, after trying many samples, I found that the predictor means in the two outcome groups are very close to each other regardless of the magnitude of the beta coefficients. Something is wrong, and I can't seem to pinpoint the problem. Please help.

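```
%let N=200;
proc iml;
t = J(&N, 1);                      /* allocate the response vector */
X = J(&N, 2);                      /* allocate two predictor columns */
call randseed(4321);
call RANDGEN(X, "NORMAL", 0, 1);   /* x1, x2 ~ N(0,1) */
beta = {1.40, -0.60, -0.40};       /* intercept, x1, x2 */
Xb = J(&N,1,1)||X;                 /* design matrix with intercept column */
eta = Xb*beta;                     /* linear predictor */
mu = LOGISTIC(eta);                /* P(t=1) for each observation */
call RANDGEN(t, "BERNOULLI", mu);  /* draw the binary outcome */
tempdata = t||x;
create logdata from tempdata[colname={'t' 'x1' 'x2'}];
append from tempdata;
close logdata;
quit;

proc logistic data=logdata;
model t = x1 x2;
run;

proc means data = logdata;
class t;
var x1 x2;
run;
```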

Accepted Solutions
Solution
‎02-29-2016 04:48 PM
SAS Super FREQ
Posts: 3,831

Re: Logistic regression simulation - inducing mean difference

It's always helpful to provide a reference. This code appears to have come from the article "Simulate data for a logistic model" except that your code uses different values for the beta coefficients and you changed a few variable names.

If you read to the end of the article, you will see that the MODEL statement uses (Event='1'), whereas you are using the default event='0'.  Change your MODEL statement to

model t(event='1') = x1 x2 / clparm=wald;

and you will see that your parameter estimates are close to the population parameters from the simulation.
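To see why the modeled event matters, here is a short derivation in the notation of the simulation code (eta = Xb*beta, mu = LOGISTIC(eta)); the symbols $\eta$ and $\beta_j$ simply mirror those names. Modeling the probability of t = 0 instead of t = 1 negates every coefficient, so with beta = {1.40, -0.60, -0.40} a fit of the event='0' model should give estimates near (-1.40, 0.60, 0.40).

$$
\begin{aligned}
\eta &= \beta_0 + \beta_1 x_1 + \beta_2 x_2, \\
P(t = 1 \mid x) &= \frac{1}{1 + e^{-\eta}}, \\
P(t = 0 \mid x) &= 1 - \frac{1}{1 + e^{-\eta}} = \frac{1}{1 + e^{\eta}}, \\
\log\frac{P(t = 0 \mid x)}{P(t = 1 \mid x)} &= -\eta = -\beta_0 - \beta_1 x_1 - \beta_2 x_2 .
\end{aligned}
$$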

All Replies

Contributor
Posts: 43

Re: Logistic regression simulation - inducing mean difference

Thanks Rick! Modeling the data with (event='0') or (event='1') shouldn't make a difference, as it only changes the sign of the coefficients. I also could not induce a mean difference in x1 or x2 between the t=1 and t=0 groups. Surprisingly, though, the code works in SAS University Edition on my laptop, whereas on my office PC it consistently underestimated the coefficients by almost half and induced no mean difference. That's weird; SAS on the PC may need an update. I believe it is v9.2!
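For the mean-difference goal, a rough derivation may help as a sanity check. It is only a sketch and assumes the setup of the simulation code: x1 and x2 are independent standard normal, and t is Bernoulli with probability mu = LOGISTIC(eta). The symbols $p$, $\pi$, and $\Delta_1$ are notation introduced here, not taken from the thread. The key step is Stein's lemma for a standard normal variable, $E[x_1\, g(x)] = E[\partial g / \partial x_1]$, together with $\partial p / \partial x_1 = \beta_1\, p(1-p)$:

$$
\begin{aligned}
p(x) &= \frac{1}{1+e^{-(\beta_0+\beta_1 x_1+\beta_2 x_2)}}, \qquad \pi = E[p(x)] = P(t=1), \\
E[x_1 \mid t=1] &= \frac{E[x_1\,p(x)]}{\pi} = \frac{\beta_1\,E[p(1-p)]}{\pi}, \\
E[x_1 \mid t=0] &= \frac{E[x_1\,(1-p(x))]}{1-\pi} = -\frac{\beta_1\,E[p(1-p)]}{1-\pi}, \\
\Delta_1 &= E[x_1 \mid t=1] - E[x_1 \mid t=0] = \beta_1\,\frac{E[p(1-p)]}{\pi(1-\pi)} .
\end{aligned}
$$

Because $E[p(1-p)] = \pi(1-\pi) - \mathrm{Var}(p) \le \pi(1-\pi)$, the gap satisfies $|\Delta_1| \le |\beta_1|$. Under these assumptions the expected difference in x1 has the sign of $\beta_1$ and is nonzero whenever $\beta_1 \ne 0$, but with $\beta_1 = -0.60$ it cannot exceed 0.6 marginal standard deviations in magnitude. Group means that are nearly identical would therefore be unexpected for coefficients of this size.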
