Quartz | Level 8

## How to create variables using random samples generated by given correlated matrix

I read a post about how to "generate 1,000 random observations from a multivariate normal distribution with a specified mean and covariance structure" using PROC IML: `Y = RandNormal(&n, j(1,ncol(C),0), C)`. See https://blogs.sas.com/content/iml/2011/01/12/sampling-from-the-multivariate-normal-distribution.html

Sample is

Could anyone please show me how to create variables such as X1, X2, X3, and Y based on this random sample? Y is a binary dependent variable (1/0). The Xs include both continuous and categorical predictors, with positive or negative values, for example age (0, 100), time duration (-infinity, +infinity), gender (1/0), and so on. I need to build a logistic regression using the simulated variables.

Also, can I sample the dependent and independent variables together, that is, from a single correlation matrix?

Thanks so much!

Best,

Heather

1 ACCEPTED SOLUTION
Diamond | Level 26

## Re: How to create variables using random samples generated by given correlated matrix

In general, the blog by Rick Wicklin talks about continuous variables. So you could use it to compute random multivariate normal values of age and time, given a specific correlation or covariance between age and time. As far as the categorical variable gender is concerned, you can generate it at random, or you can generate it based upon some function of the randomly generated age and time. There really isn't a concept here of a correlation matrix between the continuous and categorical variables. So you will have to define how to compute gender from age and time, or how to compute gender ignoring age and time.

And of course the binary Y-variable can also be generated as a random function of the three x-variables, or generated independently of the x-variables. Again, that's something you will have to define, exactly how you think this should be done.
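One possible way to write down such a definition, purely as an illustration: the threshold $c$ and the coefficients $\beta_0, \dots, \beta_3$ below are placeholders that you would have to choose yourself, not values taken from this thread. Draw a third normal variable $z$ alongside age ($x_1$) and time ($x_2$), cut it at a threshold to obtain gender ($x_3$), and let the probability that $Y = 1$ follow a logistic function of the three predictors:

$$
x_3 = \mathbf{1}[z > c], \qquad
p = P(Y = 1 \mid x_1, x_2, x_3) = \frac{1}{1 + \exp\{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3)\}}, \qquad
Y \sim \text{Bernoulli}(p)
$$

Setting $\beta_1 = \beta_2 = \beta_3 = 0$ gives the case where $Y$ is generated independently of the x-variables.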

--
Paige Miller
11 REPLIES
Super User

## Re: How to create variables using random samples generated by given correlated matrix

Have you seen this post:

https://blogs.sas.com/content/iml/2014/06/25/simulate-logistic-data.html

If not, do you have the covariance matrix or do you need to just generate random variables to simulate a logistic regression?

Quartz | Level 8

## Re: How to create variables using random samples generated by given correlated matrix

Hi,

I read the blog. It samples random variables using RANDFUN and the Bernoulli distribution. For my question, I have a correlation matrix and a covariance matrix calculated from a real data set (PROC CORR), and I plan to generate random variables using this matrix, as Dr. Wicklin explains in his blog: https://blogs.sas.com/content/iml/2011/01/12/sampling-from-the-multivariate-normal-distribution.html...

After I generate the standard normal variables, they are not on the scale of the real data. I am not sure whether I can use these variables to fit the logistic or OLS model directly. For example, a binary variable such as gender needs to be "transferred back" to 1/0, and the variable age can't take negative values. How can I do that?

Thanks.

Super User

## Re: How to create variables using random samples generated by given correlated matrix

You're generating standard normal variables but you want to create a categorical variable? Seeing any issue with that statement there?
Quartz | Level 8

## Re: How to create variables using random samples generated by given correlated matrix

Yes, that's what I wanted to know, but you're right, it doesn't make sense. Thanks.

Super User

## Re: How to create variables using random samples generated by given correlated matrix

You could try ROOT() in IML. Search for it on Rick's blog and you will find it.

1) First, simulate some uncorrelated random data with RAND().

2) root = ROOT(COV).

3) The data you need = 1) * 2).
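The three steps above can be sketched in SAS/IML roughly as follows. This is a minimal sketch, not a definitive implementation: the sample size, means, and covariance matrix are made-up values, and it fills the matrix with RANDGEN (the matrix-oriented counterpart of RAND in IML). ROOT returns an upper triangular matrix U with U`*U = Cov, so multiplying uncorrelated standard normal columns by U gives columns whose covariance is approximately Cov.

```sas
proc iml;
call randseed(12345);            /* arbitrary seed, for reproducibility      */
N   = 1000;                      /* hypothetical sample size                 */
mu  = {45 0};                    /* hypothetical means for (Age, Time)       */
Cov = {10   0.1,
       0.1  1000};               /* hypothetical covariance matrix           */

/* 1) uncorrelated standard normal draws, one column per variable */
Z = j(N, ncol(Cov));
call randgen(Z, "Normal");

/* 2) Cholesky root: U is upper triangular and U`*U = Cov */
U = root(Cov);

/* 3) correlate the columns, then shift each column to its mean */
X = mu + Z*U;

/* the sample moments should be close to mu and Cov */
sampleMean = mean(X);
sampleCov  = cov(X);
print sampleMean, sampleCov;
quit;
```

This is essentially what RANDNORMAL does in one call, so in practice `X = randnormal(N, mu, Cov);` is the shorter route.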

Quartz | Level 8

## Re: How to create variables using random samples generated by given correlated matrix

Hi Paige ,

Thank you for your explanation. I understand what you mean, but since this is my first time doing this, could you please show me some examples of how to come up with such a function? You don't need to give a detailed description; a case study or any links you happen to know would be enough.

Thank you!

Best,

Heather

Diamond | Level 26

## Re: How to create variables using random samples generated by given correlated matrix

Is Y determined as a function of X1 X2 X3 plus some randomness? If so, what function? Or is it completely random, making it independent of X1 X2 X3?

Is the categorical variable independent of age and time? Or is there some dependence based on covariance/correlation? If so, what? Or is it completely random, making it independent of time and age?
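To make the two cases concrete, here is a rough SAS/IML sketch of both choices for the categorical variable. All numbers are assumptions chosen for illustration only: the means and covariance of age and time, the 0.5 probability in option A, and the coefficients -4.5 and 0.1 in option B (picked so that the probability is 0.5 at age 45). Passing a vector of probabilities to RANDGEN requires SAS/IML 12.1 or later.

```sas
proc iml;
call randseed(2024);                 /* arbitrary seed                        */
N     = 1000;                        /* hypothetical sample size              */
mu    = {45 0};                      /* hypothetical means for (Age, Time)    */
Sigma = {10   0.1,
         0.1  1000};                 /* hypothetical covariance matrix        */
X    = randnormal(N, mu, Sigma);
Age  = X[,1];
Time = X[,2];

/* Option A: Gender is completely random, independent of Age and Time */
GenderA = j(N, 1);
call randgen(GenderA, "Bernoulli", 0.5);

/* Option B: P(Gender=1) depends on Age through a made-up logistic function */
pB = logistic(-4.5 + 0.1*Age);
GenderB = j(N, 1);
call randgen(GenderB, "Bernoulli", pB);

/* compare how strongly each version of Gender is related to Age */
corrA = corr(Age || GenderA);
corrB = corr(Age || GenderB);
print corrA, corrB;
quit;
```

The same pattern applies to Y: either draw it from a Bernoulli distribution with a fixed probability, or let the probability be a function of X1, X2, and X3.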

--
Paige Miller
SAS Super FREQ

## Re: How to create variables using random samples generated by given correlated matrix

Some of your questions are somewhat advanced. All are answered in Chapters 8-12 of Simulating Data with SAS (Wicklin, 2013).

• Chapter 8 discusses simulating multivariate data from a specified distribution.
• Chapter 9 discusses simulating data when each component is from a different marginal distribution, such as positive values, binary values, etc.
• Chapter 10 discusses linear regression models.
• Chapter 11 discusses generalized linear models such as logistic models.

For simulation, it is important to model either a set of data or a process with known properties. You haven't specified any relationships between the variables, so I will just make some up. I assume you know how to construct a covariance matrix for the MVN variables in your data. The following program simulates data using the basic ideas in the blog post "Simulating data for a logistic regression model."

```sas
%let N = 150;                              /* N = sample size */
proc iml;
call randseed(1);
mu = {45  0  0};                           /* population mean */
Sigma = {10    0.1    0,
         0.1   1000   0.05,
         0     0.05   1};                  /* population covariance */

/* 1. Simulate X = (x1, x2, x3) ~ MVN(mu, Sigma). Then
      x3 is transformed to binary by x3 = (x3 > 0) */
X = randnormal(&N, mu, Sigma);
X[,3] = (X[,3] > 0);

/* Logistic model with parameters */
Intercept = -90;
beta = {2, -0.4, 3};
eta = Intercept + X*beta;                  /* 2. linear model */
prob = logistic(eta);                      /* 3. transform by inverse logit */

/* 4. Simulate binary response. Notice that the
      "probability of success" is a vector (SAS/IML 12.1) */
y = j(&N,1);                               /* allocate response vector */
call randgen(y, "Bernoulli", prob);        /* simulate binary response */

/* 5. Write y and x1-x3 to data set */
varNames = {"y"  "Age" "Time" "Gender"};
Out = y || X;                              /* simulated response in 1st column */
create LogisticData from Out[c=varNames];  /* no data is written yet */
append from Out;                           /* output this sample */
close LogisticData;
quit;

proc logistic data=LogisticData plots(only)=fitplot;
   class Gender;
   /* event='1' models P(y=1); by default PROC LOGISTIC models the
      lower ordered response level (y=0) */
   model y(event='1') = Age Time Gender;
run;
```
Quartz | Level 8

## Re: How to create variables using random samples generated by given correlated matrix

Thank you all so much for your input and time. I think I now have a better idea of how to handle my questions.
