🔒 This topic is **solved** and **locked**.

Posted 11-03-2019 08:38 AM

I read a post about how to "generate 1,000 random observations from a multivariate normal distribution with a specified mean and covariance structure" in PROC IML using Y = RandNormal(&n, j(1,ncol(C),0), C). See https://blogs.sas.com/content/iml/2011/01/12/sampling-from-the-multivariate-normal-distribution.html

Sample is

Could anyone please show me how to create variables such as X1, X2, X3, and Y based on this random sample? Y is a binary dependent variable (1/0). The Xs include both continuous and categorical predictors, with positive or negative values, for example age (0, 100), a time duration (-infinity, +infinity), and gender (1/0). I need to build a logistic regression using the simulated variables.

Also, can I sample the dependent and independent variables together, that is, from a single correlation matrix?

Thanks so much!

Best,

Heather

1 ACCEPTED SOLUTION

In general, the blog by Rick Wicklin talks about continuous variables. So you could use this to compute random multivariate normal values of age and time, given a specific correlation or covariance between age and time. As far as the categorical variable gender is concerned, you can generate that at random, or you can generate it based upon some function of the randomly generated age and time. There really isn't a concept here of a correlation matrix between the continuous and categorical variables. So you will have to define how to compute gender from age and time, or how to compute gender ignoring age and time.

And of course the binary Y-variable can also be generated as a random function of the three x-variables, or generated independently of the x-variables. Again, that's something you will have to define: exactly how you think this should be done.

--

Paige Miller

11 REPLIES

Have you seen this post:

https://blogs.sas.com/content/iml/2014/06/25/simulate-logistic-data.html

Does it answer your question?

If not, do you have the covariance matrix or do you need to just generate random variables to simulate a logistic regression?

Hi, I have not read this link yet. Let me read it first. Thanks a lot!

Hi,

I read the blog. It samples random variables using ranfun and Bernoulli. For my question, I have a correlation and covariance matrix calculated from a real data set (PROC CORR), and I plan to generate random variables using this matrix, as Dr. Wicklin explains in his blog: https://blogs.sas.com/content/iml/2011/01/12/sampling-from-the-multivariate-normal-distribution.html...

After I generate the standard normal variables, they are not on the scale of the real data. I am not sure whether I can use these variables to fit the logistic or OLS model directly. For example, a binary variable such as gender needs to be "transformed back" to 1/0, and the variable age can't take negative values. How can I do that?

Thanks.
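
One possible way to "transform back" is sketched below in SAS/IML. Everything numeric here is a made-up placeholder rather than a value from this thread: the correlation matrix `R`, the mean 45 and standard deviation 10 for age, the scale 30 for time, and the target proportion 0.4 for gender. The idea is to rescale each standardized column to the mean and standard deviation of the real variable, clip age to its valid range, and cut the third column at a normal quantile so that gender has roughly the desired share of 1s. Clipping and dichotomizing change the correlations somewhat, so the result only approximates the original matrix.

```sas
proc iml;
call randseed(4321);

/* Hypothetical correlation matrix (assumption, for illustration only) */
R = {1   0.3 0.2,
     0.3 1   0.1,
     0.2 0.1 1  };
Z = randnormal(1000, {0 0 0}, R);     /* correlated standard normal columns */

/* Rescale to the original units: x = mean + sd * z (45, 10, 30 are made up) */
Age  = 45 + 10*Z[,1];
Age  = (Age <> 0) >< 100;             /* elementwise max/min: clip age to [0, 100] */
Time = 30*Z[,2];                      /* any real value is allowed, so no clipping */

/* Dichotomize: cut at the normal quantile that leaves pGender above it */
pGender = 0.4;                        /* hypothetical target P(Gender = 1) */
cut     = quantile("Normal", 1 - pGender);
Gender  = (Z[,3] > cut);              /* 1/0 indicator */

propGender = Gender[:];               /* sample proportion of 1s, near pGender */
minAge     = min(Age);                /* never negative after clipping */
print propGender minAge;
quit;
```

The resulting columns can then be written to a data set and used as predictors, for example with `create ... from` and `append from` in the same PROC IML session.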

You're generating standard normal variables but you want to create a categorical variable? Seeing any issue with that statement there?

Yes, that's what I wanted to know, but you are right, it doesn't make sense. Thanks.

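You could try ROOT() in IML. Search for it on Rick's blog and you will find it.

1) First, simulate some uncorrelated standard normal data with RAND().

2) root = ROOT(COV), the Cholesky root of the covariance matrix.

3) The data you need = 1) * 2), that is, the random data matrix times the root.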
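
The three steps above can be sketched in SAS/IML as follows. The 2x2 covariance matrix `Cov` and the sample size are hypothetical values chosen only for illustration. `ROOT` returns an upper triangular matrix `U` with `` U`*U = Cov ``, so multiplying uncorrelated standard normal rows by `U` on the right gives rows whose covariance is approximately `Cov`.

```sas
proc iml;
call randseed(123);

/* Hypothetical covariance matrix (assumption, for illustration only) */
Cov = {4   1.2,
       1.2 1  };
N = 1000;

Z = j(N, ncol(Cov));          /* allocate an N x 2 matrix */
call randgen(Z, "Normal");    /* 1) uncorrelated N(0,1) draws */
U = root(Cov);                /* 2) Cholesky root: U`*U = Cov */
X = Z * U;                    /* 3) rows of X ~ MVN(0, Cov) */

SampleCov = cov(X);           /* should be close to Cov for large N */
print Cov, SampleCov;
quit;
```

This is essentially what `RANDNORMAL` does for you in one call, apart from adding the mean vector.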

Hi Paige ,

Thank you for your explanation. I understand what you mean, but since this is my first time doing this, could you please show me some examples of how to define such a function? You don't need to give a detailed description; a case study or links you happen to know would be enough.

Thank you!

Best,

Heather

Is Y determined as a function of X1 X2 X3 plus some randomness? If so, what function? Or is it completely random, making it independent of X1 X2 X3?

Is the categorical variable independent of age and time? Or is there some dependence based on covariance/correlation? If so, what? Or is it completely random, making it independent of time and age?

--

Paige Miller
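
To make the two choices for the categorical variable concrete, here is a minimal DATA step sketch. All numbers are invented for illustration (the means and standard deviations of `Age` and `Time`, and the coefficients -2, 0.04, and 0.01), and the variable names `GenderIndep` and `GenderDep` are hypothetical. Option A draws gender independently of the other variables; option B makes the probability of gender = 1 a function of age and time through an inverse logit.

```sas
data SimGender;
   call streaminit(2019);
   do i = 1 to 150;
      /* Continuous predictors (made-up means and standard deviations) */
      Age  = rand("Normal", 45, 10);
      Time = rand("Normal", 0, 30);

      /* Option A: gender is completely random, independent of Age and Time */
      GenderIndep = rand("Bernoulli", 0.5);

      /* Option B: P(Gender = 1) depends on Age and Time (made-up coefficients) */
      eta = -2 + 0.04*Age + 0.01*Time;
      p   = 1 / (1 + exp(-eta));          /* inverse logit */
      GenderDep = rand("Bernoulli", p);
      output;
   end;
   drop i eta p;
run;

proc freq data=SimGender;
   tables GenderIndep GenderDep;
run;
```

The same pattern applies to the binary response Y: pick a linear predictor in the x-variables, map it to a probability, and draw a Bernoulli value.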

Some of your questions are somewhat advanced. All are answered in Chapters 8-12 of *Simulating Data with SAS* (Wicklin, 2013).

- Chapter 8 discusses simulating multivariate data from a specified distribution.
- Chapter 9 discusses simulating data when each component is from a different marginal distribution, such as positive values, binary values, etc.
- Chapter 10 discusses linear regression models.
- Chapter 11 discusses generalized linear models such as logistic models.

For simulation, it is important to model either a set of data or a process with known properties. You haven't specified any relationships between the variables, so I will just make some up. I assume you know how to construct a covariance matrix for the MVN variables from your data. The following program simulates data using the basic ideas in the blog post "Simulating data for a logistic regression model."

```
%let N = 150;                        /* N = sample size */
proc iml;
call randseed(1);
mu = {45 0 0};                       /* population means */
Sigma = {10   0.1   0,
         0.1  1000  0.05,
         0    0.05  1};              /* population covariance */
/* 1. Simulate X = (x1, x2, x3) ~ MVN(mu, Sigma).
      Then x3 is transformed to binary by x3 = (x3 > 0) */
X = randnormal(&N, mu, Sigma);
X[,3] = (X[,3] > 0);
/* Logistic model with parameters */
Intercept = -90;
beta = {2, -0.4, 3};
eta = Intercept + X*beta;            /* 2. linear model */
prob = logistic(eta);                /* 3. transform by inverse logit */
/* 4. Simulate binary response. Notice that the
      "probability of success" is a vector (SAS/IML 12.1) */
y = j(&N,1);                         /* allocate response vector */
call randgen(y, "Bernoulli", prob);  /* simulate binary response */
/* 5. Write y and x1-x3 to data set */
varNames = {"y" "Age" "Time" "Gender"};
Out = y || X;                        /* simulated response in 1st column */
create LogisticData from Out[c=varNames];  /* no data is written yet */
append from Out;                     /* output this sample */
close LogisticData;
quit;

/* Fit the model; EVENT='1' models the probability that y = 1 */
proc logistic data=LogisticData plots(only)=effect;
   class Gender;
   model y(event='1') = Age Time Gender;
run;
```

Thank you all so much for your input and time. I think I now have a better idea of how to handle my questions.
