- Logistic regression simulation - inducing mean dif...


02-29-2016 04:00 PM

Hi all, in the code below I would like to simulate data from a logistic regression model. My primary goal is to create the simulated data and retrieve the coefficients on the log-odds scale, making sure the predictor means differ between the two levels of the binary outcome variable by, say, ~0.5 standard deviation. When I run proc logistic on the simulated data, the coefficients are off on the log-odds scale. In addition, after trying many samples, I found that the predictor means by binary outcome are very close to each other regardless of the magnitude of the beta coefficients. Something is wrong, and I can't seem to pinpoint the problem. Please help.

```
%let N=200;
proc iml;
t = J(&N, 1);
X = J(&N, 2);
call randseed(4321);
call RANDGEN(X, "NORMAL", 0, 1);
beta = {1.40, -0.60, -0.40};
Xb = J(&N,1,1)||X;
eta = Xb*beta;
mu = LOGISTIC(eta);
call RANDGEN(t, "BERNOULLI", mu);
tempdata = t||x;
create logdata from tempdata[colname={'t' 'x1' 'x2'}];
append from tempdata;
close logdata;
quit;
proc logistic data=logdata;
model t = x1 x2;
run;
proc means data = logdata;
class t;
var x1 x2;
run;
```

Accepted Solutions

Solution

02-29-2016 04:48 PM


02-29-2016 04:18 PM

It's always helpful to provide a reference. This code appears to have come from the article "Simulate data for a logistic model" except that your code uses different values for the beta coefficients and you changed a few variable names.

If you read to the end of the article, you will see that the MODEL statement uses (Event='1'), whereas you are using the default event='0'. Change your MODEL statement to

model t(event='1') = x1 x2 / clparm=wald;

and you will see that your parameter estimates are close to the population parameters from the simulation.
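As a minimal sketch of the change described above, the corrected step could look like the following. It assumes the `logdata` data set created by the PROC IML code in the question. The second step is an alternative not mentioned in the reply: the DESCENDING option on the PROC LOGISTIC statement reverses the response ordering, so `t=1` becomes the modeled event.

```
/* Model the probability that t=1, matching how the data were simulated */
proc logistic data=logdata;
   model t(event='1') = x1 x2 / clparm=wald;
run;

/* Alternative (an assumption, not from the reply): reverse the response */
/* ordering so that t=1 is the modeled event                             */
proc logistic data=logdata descending;
   model t = x1 x2;
run;
```

With either form, the estimates should be compared directly against `beta = {1.40, -0.60, -0.40}` from the simulation.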

All Replies


02-29-2016 05:05 PM

Thanks, Rick! Modeling the data with (event='0') or (event='1') shouldn't make a difference, as it only changes the sign of the coefficients. I also could not induce a mean difference in x1 or x2 between the t=1 and t=0 groups. Surprisingly, though, the code works in SAS University Edition on my laptop, whereas on my office PC it consistently underestimated the coefficients by almost half, with no mean difference induced. That's weird; SAS on the PC may need an update. I believe it was v9.2!
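The sign relationship mentioned here can be written out explicitly. This is a short derivation, assuming the simulation model from the question, where $\eta = \beta_0 + \beta_1 x_1 + \beta_2 x_2$ and $\mu = P(t=1 \mid x)$ is the logistic function of $\eta$:

```latex
\begin{aligned}
P(t=1 \mid x) &= \frac{1}{1+e^{-\eta}}, \qquad
P(t=0 \mid x) = 1 - \frac{1}{1+e^{-\eta}} = \frac{1}{1+e^{\eta}}, \\
\log\frac{P(t=1 \mid x)}{P(t=0 \mid x)} &= \eta = \beta_0 + \beta_1 x_1 + \beta_2 x_2, \\
\log\frac{P(t=0 \mid x)}{P(t=1 \mid x)} &= -\eta = -\beta_0 - \beta_1 x_1 - \beta_2 x_2 .
\end{aligned}
```

So a fit that models `t=0` as the event should recover the same magnitudes with every sign flipped, including the intercept: roughly (-1.40, 0.60, 0.40) for the `beta` used in the question.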


02-29-2016 08:10 PM

I can't think of any reason why the results would be different in 9.2 versus 9.4, but I'm glad you were able to resolve the issue.