PurpleNinja
Obsidian | Level 7

I read this blog post by @Rick_SAS on simulating data for linear regression.

 

https://blogs.sas.com/content/iml/2017/01/25/simulate-regression-model-sas.html

 

I modified the code to simulate data for logistic regression, and it works very well. 

 

 

mu = beta[0]; /* intercept term */

do j = 1 to &ncovar;
     mu = mu + beta[j] * x[j]; /* + sum(beta[j]*x[j]) */
end;

prob = 1 / (1 + exp(-mu)); /* specify the probability of success using the logistic function and "mu" */

y = RAND('BERNOULLI', prob); /* simulate binary variables based on "prob" */

Thanks, Rick!

 

However, I now want to

a) add a binary covariate that is way more significant than the other continuous covariates

b) specify the response rate (i.e., the proportion of successes for "Y"). For my particular example, I need a very low response rate.

Based on Rick's example, I know how to specify the regression coefficients, but I don't know how to also specify the response rate.  (It may be impossible to do both.)  

 

For my example, I don't need to specify the regression coefficients; I just need the binary covariate to be much more significant than the continuous covariates, and I need a very low response rate in my raw data.

 

How can I accomplish both of these goals?

Accepted Solutions
Rick_SAS
SAS Super FREQ

I also wrote several articles on simulating logistic data directly:

https://blogs.sas.com/content/iml/2014/06/25/simulate-logistic-data.html

https://blogs.sas.com/content/iml/2014/06/27/simulate-many-samples-from-a-logistic-regression-model....

 

 

For the binary covariate, just use (or simulate) a 0/1 variable and give it a large relative coefficient such as

G = rand("Bern", 0.5);

eta = intercept + 50*G + 0.3*x1 - 0.4*x2;
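To make the idea concrete, here is a minimal sketch of a complete DATA step built around that linear predictor. The seed, sample size, intercept, and coefficient values are assumptions chosen for illustration, not values from the original posts. Note that a coefficient as large as 50 pushes the probability to essentially 1 whenever G = 1 unless the intercept is comparably negative, so the sketch uses a smaller hypothetical coefficient of 3, which still dominates the continuous effects.

```sas
/* Sketch only: seed, N, intercept, and coefficients are illustrative assumptions */
%let N = 10000;

data SimLogistic(keep=y G x1 x2);
   call streaminit(12345);
   intercept = -6;                     /* strongly negative => low response rate */
   do i = 1 to &N;
      G  = rand("Bernoulli", 0.5);     /* binary covariate */
      x1 = rand("Normal");             /* continuous covariates, mean 0 */
      x2 = rand("Normal");
      eta  = intercept + 3*G + 0.3*x1 - 0.4*x2;   /* linear predictor */
      prob = 1 / (1 + exp(-eta));      /* logistic function */
      y    = rand("Bernoulli", prob);  /* binary response */
      output;
   end;
run;

/* Fit the model to see how the estimates compare with the chosen coefficients */
proc logistic data=SimLogistic;
   model y(event='1') = G x1 x2;
run;
```

With a low response rate, a fairly large sample is probably needed so that enough events occur for the fit to be stable.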

 

By "response rate," I assume you mean the relative proportion of 0/1 responses. That depends on the linear predictor (eta). You can use the Intercept term to raise or lower the rate, but the rate depends not only on mean(eta) but also on the values of the explanatory variables. Adjusting the response rate is easiest when the explanatory variables are normally distributed with mean 0.
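As a rough worked illustration of how the intercept drives the response rate, the relationship can be written out as follows. The symbols (beta_0 for the intercept, gamma for the coefficient of G, pi for P(G = 1)) and the numeric values are assumptions introduced here for the example. The approximation ignores the continuous covariates, which is only reasonable when their effects are small and they are centered at 0.

```latex
\begin{align*}
\eta &= \beta_0 + \gamma G + \beta_1 x_1 + \beta_2 x_2,
  \qquad \sigma(t) = \frac{1}{1 + e^{-t}} \\
P(Y = 1) &= E[\sigma(\eta)] \\
  &= (1 - \pi)\, E[\sigma(\beta_0 + \beta_1 x_1 + \beta_2 x_2)]
   + \pi\, E[\sigma(\beta_0 + \gamma + \beta_1 x_1 + \beta_2 x_2)] \\
  &\approx (1 - \pi)\, \sigma(\beta_0) + \pi\, \sigma(\beta_0 + \gamma) \\[4pt]
\text{Example: } & \beta_0 = -6,\ \gamma = 3,\ \pi = 0.5: \\
P(Y = 1) &\approx 0.5\,\sigma(-6) + 0.5\,\sigma(-3)
  \approx 0.5(0.0025) + 0.5(0.0474) \approx 0.025
\end{align*}
```

The continuous terms shift this value somewhat, so it is worth checking the observed proportion in the simulated data rather than relying on the approximation.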

PurpleNinja
Obsidian | Level 7

Hi Rick,

 

Thanks for your quick reply.

 

1) Yes, by response rate, I mean the proportion of success.

 

2) Your first example of simulating data for linear regression uses the normal distribution to simulate the values of the predictors, so they already have mean 0. Recall your code:

do j = 1 to dim(x);
     x[j] = rand("Normal"); /* 2. Simulate explanatory variables   */
end;

3) That's a very good idea to use the intercept to adjust the response rate.  Thank you!
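One way to choose the intercept in practice is to simulate a large sample for several candidate values and compare the observed proportion of successes. The sketch below does this. The candidate range, seed, sample size, and coefficients (including the coefficient of 3 on the binary covariate G) are assumptions for illustration only.

```sas
/* Sketch only: candidate intercepts, seed, and coefficients are assumptions */
data Rates(keep=intercept y);
   call streaminit(2024);
   do intercept = -8 to -3 by 1;          /* candidate intercepts */
      do i = 1 to 100000;                 /* large sample per candidate */
         G  = rand("Bernoulli", 0.5);
         x1 = rand("Normal");
         x2 = rand("Normal");
         eta = intercept + 3*G + 0.3*x1 - 0.4*x2;
         y   = rand("Bernoulli", 1 / (1 + exp(-eta)));
         output;
      end;
   end;
run;

/* The mean of a 0/1 variable is the observed response rate */
proc means data=Rates mean maxdec=4;
   class intercept;
   var y;
run;
```

The intercept whose mean of y is closest to the target rate can then be used in the final simulation, or the grid can be refined around it.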
