I read a blog post about how to "generate 1,000 random observations from a multivariate normal distribution with a specified mean and covariance structure" using PROC IML: Y = RandNormal(&n, j(1,ncol(C),0), C). See https://blogs.sas.com/content/iml/2011/01/12/sampling-from-the-multivariate-normal-distribution.html

Could anyone please show me how to create variables such as X1, X2, X3, and Y based on this random sample? Y is a binary dependent variable (1/0). The Xs include both continuous and character (categorical) predictors, with positive or negative values, for example age (0, 100), time duration (-infinity, +infinity), and gender (1/0). I need to build a logistic regression using the simulated variables.
Also, can I sample the dependent and independent variables together? I mean, from one correlation matrix.
Thanks so much!
Best,
Heather
In general, the blog by Rick Wicklin talks about continuous variables. So you could use it to compute random multivariate normal values of age and time, given a specific correlation or covariance between age and time. As far as the categorical variable gender is concerned, you can generate it at random, or you can generate it based on some function of the randomly generated age and time. There really isn't a concept here of a correlation matrix between the continuous and categorical variables. So you will have to define how to compute gender from age and time, or how to compute gender ignoring age and time.
And of course the binary Y-variable can also be generated as a random function of the three x-variables, or generated independently of the x-variables. Again, that's something you will have to define, exactly how you think this should be done.
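To make the two options for a categorical variable concrete, here is a minimal DATA step sketch. All means, standard deviations, and coefficients are made-up values for illustration only, and age and time are drawn independently just to keep the sketch short. The same pattern (linear predictor, inverse logit, Bernoulli draw) could be used for the binary Y.

```sas
data SimGender;
   call streaminit(1);                 /* fix the seed so the run is reproducible */
   do i = 1 to 1000;
      age  = rand("Normal", 45, 3);    /* assumed mean and SD */
      time = rand("Normal", 0, 30);    /* assumed mean and SD */

      /* Option 1: gender is generated ignoring age and time */
      gender_indep = rand("Bernoulli", 0.5);

      /* Option 2: P(gender=1) depends on age and time (made-up coefficients) */
      eta = 0.2*(age - 45) - 0.01*time;
      p   = 1 / (1 + exp(-eta));       /* inverse logit */
      gender_dep = rand("Bernoulli", p);
      output;
   end;
   drop i eta p;
run;
```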
Have you seen this post:
https://blogs.sas.com/content/iml/2014/06/25/simulate-logistic-data.html
Does it answer your question?
If not, do you have the covariance matrix or do you need to just generate random variables to simulate a logistic regression?
Hi, I have not read this link yet. Let me read it first. Thanks a lot!
Hi,
I read the blog post. It samples random variables using ranfun and Bernoulli. For my question, I have a correlation and covariance matrix calculated from a real dataset (PROC CORR), and I plan to generate random variables using this matrix, as Dr. Wicklin explains in his blog: https://blogs.sas.com/content/iml/2011/01/12/sampling-from-the-multivariate-normal-distribution.html...
After I generate the standard normal variables, they are not on the scale of the real data. I am not sure whether I can use these variables to fit the logistic or OLS model directly. For example, a binary variable such as gender needs to be "transferred back" to 1/0, and the variable age can't take negative values. How can I do that?
Thanks.
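One possible way to map simulated normal columns back to realistic scales is sketched below in SAS/IML. The mean, standard deviation, and target proportion are assumed values, not taken from any real data. Note that clipping and thresholding both change the distribution: clipping shifts the mean and variance a little, and a thresholded binary variable is generally less strongly correlated with the other variables than the latent normal it came from.

```sas
proc iml;
call randseed(1);
N = 1000;
age    = j(N, 1);   call randgen(age, "Normal", 45, 20);   /* assumed mean and SD */
latent = j(N, 1);   call randgen(latent, "Normal");        /* standard normal */

/* Clip age to [0, 100]: <> is elementwise max, >< is elementwise min */
age = (age <> 0) >< 100;

/* Threshold the latent normal so that about 30% of rows are coded 1 */
pTarget = 0.3;                                  /* assumed target proportion */
cutoff  = quantile("Normal", 1 - pTarget);
gender  = (latent > cutoff);

propOnes = mean(gender);                        /* should be near pTarget */
rangeAge = min(age) || max(age);
print propOnes, rangeAge;
quit;
```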
Yes, that's what I want to know, but you are right, it doesn't make sense. Thanks.
You could try ROOT() in IML. Search for it on Rick's blog and you will find it.
1) First, simulate some random data with RAND().
2) Compute root = ROOT(COV).
3) The data you need = (result of step 1) * (result of step 2).
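The three steps above can be sketched in SAS/IML as follows. The mean vector and covariance matrix are made-up values for illustration. ROOT returns an upper triangular matrix U with U`*U = Sigma, so multiplying independent standard normal rows by U gives rows with covariance Sigma.

```sas
proc iml;
call randseed(1);
N = 1000;
mu    = {45 0 0};                    /* assumed population mean */
Sigma = {10     0.1   0,
         0.1   1000   0.05,
         0      0.05  1};            /* assumed covariance matrix */

/* 1) simulate independent standard normal data */
Z = j(N, ncol(Sigma));
call randgen(Z, "Normal");

/* 2) Cholesky root: U is upper triangular with U`*U = Sigma */
U = root(Sigma);

/* 3) transform: each row of X is MVN(mu, Sigma) */
X = repeat(mu, N, 1) + Z*U;

SampleCov = cov(X);                  /* should be close to Sigma */
print SampleCov;
quit;
```

The RANDNORMAL function produces the same kind of sample in one call, so the manual version is mostly useful for seeing what happens under the hood.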
Hi Paige ,
Thank you for your explanation. I understand what you mean, but since this is my first time doing this, could you please show me some examples of how to "think of a function"? You don't need to give a detailed description; a case study or any links you happen to know would be enough.
Thank you!
Best,
Heather
Is Y determined as a function of X1 X2 X3 plus some randomness? If so, what function? Or is it completely random, making it independent of X1 X2 X3?
Is the categorical variable independent of age and time? Or is there some dependence based on covariance/correlation? If so, what? Or is it completely random, making it independent of time and age?
Some of your questions are somewhat advanced. All are answered in Chapters 8-12 of Simulating Data with SAS (Wicklin, 2013).
For simulation, it is important to model either a set of data or a process with known properties. You haven't specified any relationships between the variables, so I will just make some up. I assume you know how to construct a covariance matrix for the MVN variables in your data. The following program simulates data using the basic ideas in the blog post "Simulating data for a logistic regression model."
%let N = 150;                                     /* N = sample size */
proc iml;
call randseed(1);     
mu = {45  0  0};   /* population mean */
Sigma = {10     0.1   0,
         0.1   1000   0.05,
         0      0.05  1};
/* 1. X = (x1, x2, x3) ~ MVN(mu, Sigma). Then
      x3 is transformed to binary by x3 = (x3 > 0)  */
X = randnormal(&N, mu, Sigma); 
X[,3] = (X[,3] > 0);
/* Logistic model with parameters */
Intercept = -90;
beta = {2, -0.4, 3};
eta = Intercept + X*beta;           /* 2. linear model */
mu = logistic(eta);                 /* 3. transform by inverse logit */
/* 4. Simulate binary response. Notice that the 
      "probability of success" is a vector (SAS/IML 12.1)            */
y = j(&N,1);                             /* allocate response vector */
call randgen(y, "Bernoulli", mu);        /* simulate binary response */
 
/* 5. Write y and x1-x3 to data set */
varNames = {"y"  "Age" "Time" "Gender"};
Out = y || X;                              /* simulated response in 1st column */
create LogisticData from Out[c=varNames];  /* no data is written yet */
append from Out;                           /* output this sample */
close LogisticData;
quit;
proc logistic data=LogisticData plots(only)=fitplot;
   class Gender;
   model y = Age Time Gender;
run;
Thank you all so much for your input and time. I think I now have a better idea of how to handle my questions.