HappySASUE
Quartz | Level 8

 

I read a blog post about how to "generate 1,000 random observations from a multivariate normal distribution with a specified mean and covariance structure" using PROC IML: Y = RandNormal(&n, j(1,ncol(C),0), C). See https://blogs.sas.com/content/iml/2011/01/12/sampling-from-the-multivariate-normal-distribution.html

 

Sample is

 

 

Could anyone please show me how I could create variables such as X1, X2, X3, and Y based on this random sample? Y is a binary dependent variable (1/0). The Xs include both continuous and categorical predictors, with positive or negative values, for example age (0, 100), time duration (-infinity, +infinity), gender (1/0)... I need to build a logistic regression using the simulated variables.

 

Also, can I sample the dependent and independent variables together, that is, from one correlation matrix?

 

Thanks so much! 

 

Best, 

 

Heather

 

 

1 ACCEPTED SOLUTION
PaigeMiller
Diamond | Level 26

In general, the blog by Rick Wicklin talks about continuous variables. So you could use this to compute random multivariate normal values of age and time, given a specific correlation or covariance between age and time. As far as the categorical variable gender is concerned, you can generate that at random, or you can generate it based upon some function of the randomly generated age and time. There really isn't a concept here of a correlation matrix between the continuous and categorical variables. So you will have to define how to compute gender from age and time, or how to compute gender ignoring age and time.

 

And of course the binary Y-variable can also be generated as a random function of the three x-variables, or generated independently of the x-variables. Again, that's something you will have to define, exactly how you think this should be done.
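
As a rough illustration of those choices, here is a minimal SAS/IML sketch. The means, covariance matrix, and all coefficients are made-up assumptions for demonstration, not values from this thread; replace them with whatever relationships you decide to define.

```sas
proc iml;
call randseed(2024);
N = 1000;

/* Assumed means and covariance for (Age, Time); replace with your own */
mu    = {45 0};
Sigma = {100  5,
           5 25};
X = randnormal(N, mu, Sigma);          /* columns: Age, Time */
Age  = X[,1];
Time = X[,2];

/* Option A: Gender generated at random, ignoring Age and Time */
GenderA = j(N, 1);
call randgen(GenderA, "Bernoulli", 0.5);

/* Option B: Gender as a random function of Age (assumed coefficients) */
pGender = logistic(-2.25 + 0.05*Age);  /* P(Gender=1) rises with Age */
GenderB = j(N, 1);
call randgen(GenderB, "Bernoulli", pGender);

/* Y as a random function of the three x-variables (assumed coefficients) */
eta = -1 + 0.02*(Age - 45) + 0.1*Time + 0.5*GenderB;
pY  = logistic(eta);
Y   = j(N, 1);
call randgen(Y, "Bernoulli", pY);

print (mean(GenderA))[label="Mean GenderA"]
      (mean(GenderB))[label="Mean GenderB"]
      (mean(Y))[label="Mean Y"];
quit;
```

Option A makes gender independent of the continuous predictors; Option B builds in dependence through a function you choose. Either way, the relationship is something you specify rather than something read off a correlation matrix.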

--
Paige Miller

11 REPLIES
Reeza
Super User

Have you seen this post:

https://blogs.sas.com/content/iml/2014/06/25/simulate-logistic-data.html

 

Does it answer your question?

 

If not, do you have the covariance matrix or do you need to just generate random variables to simulate a logistic regression? 

 




 

HappySASUE
Quartz | Level 8

Hi, I have not read this link yet. Let me read it first. Thanks a lot!

HappySASUE
Quartz | Level 8

Hi, 

I read the blog. It samples random variables using ranfun and Bernoulli. For my question, I have correlation and covariance matrices calculated from a real dataset (PROC CORR), and I plan to generate random variables using these matrices, as Dr. Wicklin explains in his blog: https://blogs.sas.com/content/iml/2011/01/12/sampling-from-the-multivariate-normal-distribution.html...

 

After generating the standard normal variables, the values are not on the scale of the real data. I am not sure whether I can use these variables to fit the logistic or OLS model directly. For example, a binary variable such as gender needs to be "transformed back" to 1/0, and a variable such as age can't take negative values. How can I do that?

 

Thanks.

 

 

Reeza
Super User
You're generating standard normal variables but you want to create a categorical variable? Seeing any issue with that statement there?
HappySASUE
Quartz | Level 8

Yes, that's what I want to know, but you are right, it doesn't make sense... Thanks.

Ksharp
Super User

You could try the ROOT() function in IML. Search for it on Rick's blog and you will find it.

1) First, simulate some random data with RAND().

2) root = ROOT(COV).

3) The data you need = 1) * 2).
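
The three steps above can be sketched in SAS/IML roughly as follows. This is only an illustration: the mean vector and covariance matrix are made-up assumptions, and the sketch uses CALL RANDGEN (the IML routine for filling a matrix with random values) in place of RAND() for step 1.

```sas
proc iml;
call randseed(4321);
N   = 1000;
mu  = {45 0};                        /* assumed means */
COV = {10   0.1,
       0.1  1000};                   /* assumed covariance matrix */

/* 1) independent standard normal draws, one column per variable */
Z = j(N, ncol(COV));
call randgen(Z, "Normal");

/* 2) Cholesky root: U is upper triangular with U`*U = COV */
U = root(COV);

/* 3) rows of Z*U have covariance COV; then shift each row by mu */
X = Z*U + repeat(mu, N, 1);

/* sample estimates should be close to mu and COV */
SampleMean = mean(X);
SampleCov  = cov(X);
print SampleMean, SampleCov;
quit;
```

Because ROOT returns an upper triangular matrix, the random matrix is multiplied on the right (Z*U). The result has the same distribution that RANDNORMAL produces from the mean and covariance directly.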

HappySASUE
Quartz | Level 8

Hi Paige ,

Thank you for your explanation. I understand what you mean, but since this is my first time doing this, could you please show me some examples of how to "think of a function"? You don't need to give a detailed description; a case study or any links you happen to know would be enough.

Thank you! 

 

Best, 

Heather

PaigeMiller
Diamond | Level 26

Is Y determined as a function of X1 X2 X3 plus some randomness? If so, what function? Or is it completely random, making it independent of X1 X2 X3?

 

Is the categorical variable independent of age and time? Or is there some dependence based on covariance/correlation? If so, what? Or is it completely random, making it independent of time and age?

--
Paige Miller
Rick_SAS
SAS Super FREQ

Some of your questions are somewhat advanced. All are answered in Chapters 8-12 of Simulating Data with SAS (Wicklin, 2013).

  • Chapter 8 discusses simulating multivariate data from a specified distribution.
  • Chapter 9 discusses simulating data when each component is from a different marginal distribution, such as positive values, binary values, etc.
  • Chapter 10 discusses linear regression models.
  • Chapter 11 discusses generalized linear models such as logistic models.

For simulation, it is important to model either a set of data or a process with known properties. You haven't specified any relationships between the variables, so I will just make some up. I assume you know how to construct a covariance matrix for the multivariate normal (MVN) variables in your data. The following program simulates data using the basic ideas in the blog post "Simulating data for a logistic regression model."

 

 

%let N = 150;                                     /* N = sample size */
proc iml;
call randseed(1);
mu = {45  0  0};                    /* population means */
Sigma = {10     0.1   0,
         0.1   1000   0.05,
         0      0.05  1};           /* population covariance matrix */
/* 1. Simulate X = (x1, x2, x3) ~ MVN(mu,Sigma). Then
      x3 is transformed to binary by x3 = (x3 > 0)  */
X = randnormal(&N, mu, Sigma);
X[,3] = (X[,3] > 0);

/* Logistic model with parameters */
Intercept = -90;
beta = {2, -0.4, 3};
eta = Intercept + X*beta;           /* 2. linear predictor */
p = logistic(eta);                  /* 3. transform by inverse logit */

/* 4. Simulate binary response. Notice that the
      "probability of success" is a vector (SAS/IML 12.1)            */
y = j(&N,1);                             /* allocate response vector */
call randgen(y, "Bernoulli", p);         /* simulate binary response */

/* 5. Write y and x1-x3 to data set */
varNames = {"y"  "Age" "Time" "Gender"};
Out = y || X;                              /* simulated response in 1st column */
create LogisticData from Out[c=varNames];  /* no data is written yet */
append from Out;                           /* output this sample */
close LogisticData;
quit;

proc logistic data=LogisticData plots(only)=fitplot;
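   class Gender(ref='0') / param=ref;    /* reference coding, comparable to beta[3] */
   model y(event='1') = Age Time Gender; /* model P(y=1), as in the simulation */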
run;
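
Before interpreting the fitted model, it may help to check that the simulated variables look the way you intended (for example, that Age stays in a plausible range and that Gender is roughly balanced). The following is one possible set of checks, assuming the LogisticData data set created by the program above:

```sas
/* Ranges and moments of the continuous predictors */
proc means data=LogisticData n mean std min max;
   var Age Time;
run;

/* Cross-tabulate the binary predictor with the binary response */
proc freq data=LogisticData;
   tables Gender*y / nocol nopercent;
run;

/* Sample correlations, for comparison with the Sigma used in the simulation */
proc corr data=LogisticData noprob;
   var Age Time Gender;
run;
```

Note that the correlations involving Gender will not match the third row of Sigma, because Gender is the thresholded version of the latent normal variable rather than the normal variable itself.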
HappySASUE
Quartz | Level 8

Thank you all so much for your input and time. I think I now have a better idea of how to handle my questions.
