MetinBulus
Quartz | Level 8

Hi all, in the code below I would like to simulate data from a logistic regression model. My primary goal is to create the simulated data and recover the coefficients on the log-odds scale, while making sure the predictor means differ between the two levels of the binary outcome by, say, about 0.5 standard deviations. When I run PROC LOGISTIC on the simulated data, the estimated coefficients are off. In addition, after trying many samples, I found that the predictor means for the two outcome groups are very close to each other regardless of the magnitude of the beta coefficients. Something is wrong, and I can't seem to pinpoint the problem. Please help.

 

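%let N=200;                               /* sample size */
proc iml;
	t = J(&N, 1);                         /* allocate N x 1 response vector */
	X = J(&N, 2);                         /* allocate N x 2 predictor matrix */
	call randseed(4321);
	call RANDGEN(X, "NORMAL", 0, 1);      /* fill X with N(0,1) draws */
	beta = {1.40, -0.60, -0.40};          /* intercept, slope for x1, slope for x2 */
	Xb = J(&N,1,1)||X;                    /* design matrix with intercept column */
	eta = Xb*beta;                        /* linear predictor */
	mu = LOGISTIC(eta);                   /* P(t=1 | x) */
	call RANDGEN(t, "BERNOULLI", mu);     /* draw t ~ Bernoulli(mu) */
	tempdata = t||x;
	create logdata from tempdata[colname={'t' 'x1' 'x2'}];
	append from tempdata;
	close logdata;
quit;

/* fit the model; no EVENT= option is specified here */
proc logistic data=logdata;
	model t = x1 x2;
run;

/* compare predictor means between the t=0 and t=1 groups */
proc means data=logdata;
	class t;
	var x1 x2;
run;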
Accepted Solutions
Rick_SAS
SAS Super FREQ

It's always helpful to provide a reference. This code appears to have come from the article "Simulate data for a logistic model" except that your code uses different values for the beta coefficients and you changed a few variable names.

 

If you read to the end of the article, you will see that the MODEL statement uses (Event='1'), whereas your code relies on the default, which models the probability of the lowest ordered response value (here, t=0). Change your MODEL statement to

 

model t(event='1') = x1 x2 / clparm=wald;

 

and you will see that your parameter estimates are close to the population parameters from the simulation.
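
To see why the EVENT= option matters, here is a short derivation in the notation of the simulation code (eta is the linear predictor), assuming the default fit models P(t=0) as described above. Reversing the event category negates every coefficient, so a default fit should be compared against -beta rather than beta:

```latex
\begin{align*}
P(t=1 \mid x) &= \frac{1}{1+e^{-\eta}}, \qquad \eta = \beta_0 + \beta_1 x_1 + \beta_2 x_2 \\
P(t=0 \mid x) &= 1 - P(t=1 \mid x) = \frac{1}{1+e^{\eta}} \\
\log\frac{P(t=0 \mid x)}{P(t=1 \mid x)} &= -\eta = -\beta_0 - \beta_1 x_1 - \beta_2 x_2
\end{align*}
```

With beta = {1.40, -0.60, -0.40}, a fit that models t=0 would therefore be expected to return estimates near {-1.40, 0.60, 0.40}, up to sampling error.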

Replies

MetinBulus
Quartz | Level 8

Thanks, Rick! Modeling the data with (event='0') or (event='1') shouldn't make a difference, as it only changes the sign of the coefficients. I also could not induce a mean difference in x1 or x2 between the t=1 and t=0 groups. Surprisingly, though, the code works in SAS University Edition on my laptop, whereas on my office PC it consistently underestimated the coefficients by almost half and induced no mean difference. That's weird. SAS on the PC may need an update; I believe it is v9.2!
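
On the mean-difference question, a rough derivation may help. It is only a sketch: it assumes the predictors are independent standard normal, as in the simulation code, and relies on Stein's lemma. The symbols p(x) = P(t=1 | x), pi = E[p(x)], and beta (the vector of the two slopes) are notation introduced here for illustration.

```latex
\begin{align*}
E[x\,p(x)] &= E[\nabla p(x)] = \beta\,E\big[p(x)\,(1-p(x))\big] \\
E[x \mid t=1] &= \frac{\beta\,E[p(1-p)]}{\pi}, \qquad
E[x \mid t=0] = -\frac{\beta\,E[p(1-p)]}{1-\pi} \\
E[x \mid t=1] - E[x \mid t=0] &= \beta\,\frac{E[p(1-p)]}{\pi(1-\pi)},
\qquad E[p(1-p)] = \pi(1-\pi) - \operatorname{Var}(p) \le \pi(1-\pi)
\end{align*}
```

Under these assumptions, the expected gap in each predictor has the same sign as its slope and is no larger in magnitude than the slope. Slopes of -0.60 and -0.40 should give gaps somewhat smaller than 0.6 and 0.4 in absolute value, with the t=1 group having the lower means. To target a gap near 0.5 (in units of the marginal standard deviation, which is 1 here), the slope would need to be somewhat larger than 0.5 in absolute value.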

Rick_SAS
SAS Super FREQ

I can't think of any reason why the results would be different in 9.2 versus 9.4, but I'm glad you were able to resolve the issue.
