07-20-2015 09:08 AM
I wish to simulate data from a logistic regression model, with the following elements:
1. A binary response (1 = Success, 0 = Failure)
2. A binary independent variable (1 = Treatment, 0 = Control)
3. For each subject in the sample, there are 2 observations (for example, two ears if the treatment is ear drops), which are assumed to be correlated, or at least cannot be assumed to be independent.
The purpose is to do a power analysis for a given sample size. Before attempting the power analysis with looping, I need to simulate a single data set of this kind, and I am not sure how.
I have a copy of Rick Wicklin's book on simulation and saw the code for logistic regression. The independent variables there are continuous (I think), and there is no correlation (no clusters); the clusters come later, with a normal dependent variable. I am not sure how to merge the two examples.
One more comment: I would prefer to do it using the data step and not IML, if possible.
Any assistance will be greatly appreciated!
(When helping, you can make up any correlation and proportion you like; I can always change them later as part of the simulation.)
Thank you in advance!
P.S. I posted this on the data step forum and was advised to move it here. The previous post should be ignored, and removed if an admin sees this comment. Thank you.
07-20-2015 09:45 AM
The easiest way to do this is to think of it as two steps.
In Step 1, create the explanatory variables by using the techniques described throughout the book. Since you have a repeated-measures term, pay particular attention to section 12.3.
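A minimal data step sketch of Step 1 might look like the following. It is only an illustration, not the book's code: the number of subjects, the random-effect standard deviation, the seed, and the rule that assigns treatment by subject (both ears get the same treatment) are all made-up assumptions that you can change.

```sas
/* Step 1 sketch: one row per ear, two ears per subject.           */
/* N, sigma, the seed, and the treatment assignment are assumed.   */
%let N = 100;                          /* number of subjects (assumed) */

data Explanatory;
   call streaminit(12345);             /* arbitrary seed for reproducibility */
   sigma = 1;                          /* SD of the subject-level random intercept (assumed) */
   do subject = 1 to &N;
      treatment = (subject > &N/2);    /* first half control (0), second half treatment (1) */
      randomEffect = rand("Normal", 0, sigma);  /* one draw per subject, shared by both ears */
      do ear = 1 to 2;
         output;                       /* both rows carry the same randomEffect */
      end;
   end;
   drop sigma;
run;
```

Because both ears of a subject share the same randomEffect value, their responses will be correlated once the response is generated; a larger sigma gives a stronger within-subject correlation.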
After you have the explanatory variables modeled to your satisfaction, Step 2 is to add the response variable as in section 12.2:
data LogisticData;
set Explanatory; /* all BY groups and explanatory variables (including random effect) in here */
b0 = -1; b1 = 0.8; /* example coefficients only; substitute your own */
eta = b0 + b1*treatment + randomEffect; /* specify linear model of treatment + random effect */
p = logistic(eta); /* convert to probability via logistic transformation */
y = rand("Bernoulli", p); /* simulate the binary response */
run;
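For the power analysis, one possible approach is to simulate many samples in a single data step and analyze them with a BY statement instead of a macro loop. The sketch below makes several assumptions: the coefficients, sigma, sample sizes, and seed are made up; treatment is assigned per subject; and a GEE fit in PROC GENMOD is just one reasonable choice of analysis. The ODS table name GEEEmpPEst and its columns Parm and ProbZ are given as I recall them, so please check them against the PROC GENMOD documentation for your release.

```sas
%let N = 100;                /* subjects per sample (assumed) */
%let NumSamples = 1000;      /* number of simulated samples (assumed) */

data Sim;
   call streaminit(54321);                 /* arbitrary seed */
   b0 = -1; b1 = 0.8; sigma = 1;           /* made-up parameter values */
   do SampleID = 1 to &NumSamples;
      do subject = 1 to &N;
         treatment = (subject > &N/2);     /* per-subject treatment (assumed) */
         randomEffect = rand("Normal", 0, sigma);  /* shared by both ears */
         do ear = 1 to 2;
            eta = b0 + b1*treatment + randomEffect;
            p = logistic(eta);
            y = rand("Bernoulli", p);
            output;
         end;
      end;
   end;
   keep SampleID subject ear treatment y;
run;

ods exclude all;                           /* suppress printed output for the BY groups */
proc genmod data=Sim descending;           /* DESCENDING models Pr(y=1) */
   by SampleID;
   class subject;
   model y = treatment / dist=binomial link=logit;
   repeated subject=subject / type=exch;   /* exchangeable correlation within subject */
   ods output GEEEmpPEst=PE;               /* table name as recalled; verify in the docs */
run;
ods exclude none;

data Reject;
   set PE;
   where upcase(Parm) = "TREATMENT";
   reject = (ProbZ < 0.05);                /* 1 if the treatment effect is significant */
run;

proc means data=Reject mean;               /* the mean of reject estimates the power */
   var reject;
run;
```

Note that the GEE estimate is a population-averaged effect, so it will generally be somewhat smaller in magnitude than the subject-specific b1 used in the simulation. Some samples may also fail to converge, and it is worth checking how many rows end up in the Reject data set before trusting the estimate.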