Posted 05-12-2020 07:28 AM

Hello,

I am trying to simulate bivariate normal data for two groups with different covariance matrices. I need to generate the data for several sample sizes (e.g., 20, 40, etc.) with 2000 replicates each. I want the ratio of the sample sizes in group 1 to group 2 to be 1:2.

Below is the code I have for one group. For some reason, the CREATE statement creates a separate MVN data set on each pass through the loop, and the last data set overwrites the earlier one.

Any help will be appreciated.

```
proc iml;
Numsamples=10;
/*specify population mean and covariance:grp1*/
mean1={6.0 6.0};
Cov1={0.5280563 0.502445,
0.502445 0.5280563
};
/*specify population mean and covariance:grp2*/
mean2={6.2499 5.7399};
Cov2={0.6280563 0.200978,
0.200978 0.401956
};
call randseed(132);
do N=5 to 10 by 5;
X=RandNormal(N*Numsamples,Mean2,Cov2);
ID=colvec(repeat(T(1:Numsamples),1,N));
Z=ID||X;
create MVN from Z[c={"ID" "y0" "y1" }];
append from Z;
*end;
close MVN;
end;
quit;
```

Accepted Solutions

It looks like you've already read the article "How to generate multiple samples from the multivariate normal distribution in SAS."

For simulation studies, it can be convenient to write each sample to a data set from within a SAS/IML loop.

So put the CREATE statement before the loop and the CLOSE statement after the loop. Inside the loop, you create both samples and then use the APPEND statement.

It's not clear to me how you want a 1:2 ratio when the sample size is not divisible by 3. For example, when N = 20, do you want 6 and 14 as the sample sizes, or do you want 20 and 40? In the following program, I've used the second option.

I suspect you will also need a second ID variable to identify which observations come from the first distribution and which from the second. I called that the GROUP variable:

```
proc iml;
Numsamples=10;
/*specify population mean and covariance:grp1*/
mean1={6.0 6.0};
Cov1={0.5280563 0.502445,
0.502445 0.5280563
};
/*specify population mean and covariance:grp2*/
mean2={6.2499 5.7399};
Cov2={0.6280563 0.200978,
0.200978 0.401956
};
call randseed(132);
Z = {. . . .}; /* tell IML Z is numeric */
create MVN from Z[c={"Group" "ID" "y0" "y1" }];
do N=5 to 10 by 5;
N1 = N; /* or N1 = floor(N/3); ? */
X=RandNormal(N1*Numsamples,mean1,Cov1); /* group 1 uses mean1 and Cov1 */
ID=colvec(repeat(T(1:Numsamples),1,N1));
Group = j(nrow(ID), 1, 1);
Z=Group||ID||X;
append from Z;
N2 = 2*N; /* or N2 = N - N1; ? */
X=RandNormal(N2*Numsamples,Mean2,Cov2);
ID=colvec(repeat(T(1:Numsamples),1,N2));
Group = j(nrow(ID), 1, 2);
Z=Group||ID||X;
append from Z;
end;
close MVN;
quit;
```
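
If N is instead meant to be the total sample size, the alternative noted in the program's comments (N1 = floor(N/3), N2 = N - N1) can be sketched as below. The list of totals is a hypothetical example and not part of the original question; for N = 20 this split gives the 6 and 14 mentioned above.

```sas
proc iml;
/* hypothetical total sample sizes, chosen only for illustration */
N  = {20, 40, 60};
N1 = floor(N/3);   /* group 1 gets roughly one third of the total */
N2 = N - N1;       /* group 2 gets the remainder, so N1 + N2 = N */
print N N1 N2;
quit;
```

With this choice the ratio is only approximately 1:2 whenever N is not divisible by 3.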

6 REPLIES

Yes, each time it creates a data set named MVN (the exact same name each time, so it overwrites the previous version of MVN). That's how the program is written.

This article from @Rick_SAS explains how you can overcome this

https://blogs.sas.com/content/iml/2015/02/09/array-of-matrices.html

--

Paige Miller

As Paige says, you are overwriting the same data set on each pass through the loop. An alternative is to build one data set with successive appends, as follows:

```
create MVN var {"N" "ID" "y0" "y1" };
do N=5 to 10 by 5;
X=RandNormal(N*Numsamples,Mean2,Cov2);
ID=colvec(repeat(T(1:Numsamples),1,N));
Z = j(nrow(X),1,N)||ID||X;
append from Z;
end;
close MVN;
```

I have added the loop variable N to the output data set which you can use in WHERE or BY statements in other SAS PROCs.
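
As a rough sketch of what that downstream use might look like, assuming the MVN data set has the columns N, ID, y0, and y1 described above (the output data set name CorrOut is a placeholder I made up):

```sas
/* BY-group processing requires the data to be sorted by the BY variables */
proc sort data=MVN;
   by N ID;
run;

/* One correlation analysis per sample size and replicate */
proc corr data=MVN noprint outp=CorrOut;
   by N ID;
   var y0 y1;
run;

/* Restrict an analysis to a single sample size with a WHERE statement */
proc means data=MVN mean std;
   where N = 5;
   class ID;
   var y0 y1;
run;
```

The NOPRINT option keeps PROC CORR from printing a table for every BY group, which matters once the number of replicates is large.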

Great! Many thanks!

Excellent!! Many thanks! Yes, I did read the referenced resource.

Is it possible to add a column for the sample size (N) to allow analysis by N and ID?
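
One possible way to do this, offered as a sketch and not a tested answer from the thread, is to prepend a constant column holding N to each block before appending it, in the same way the Group column is built. The sketch assumes that group 1 should be drawn from mean1 and Cov1 and group 2 from mean2 and Cov2, and it keeps the N and 2*N sample sizes from the accepted solution.

```sas
proc iml;
Numsamples = 10;
/* population parameters copied from the original post */
mean1 = {6.0 6.0};
Cov1  = {0.5280563 0.502445,
         0.502445  0.5280563};
mean2 = {6.2499 5.7399};
Cov2  = {0.6280563 0.200978,
         0.200978  0.401956};
call randseed(132);

Z = {. . . . .};   /* numeric template so CREATE knows the column types */
create MVN from Z[c={"N" "Group" "ID" "y0" "y1"}];
do N = 5 to 10 by 5;
   /* group 1: N observations per replicate (assumed to use mean1, Cov1) */
   X  = RandNormal(N*Numsamples, mean1, Cov1);
   ID = colvec(repeat(T(1:Numsamples), 1, N));
   Z  = j(nrow(ID),1,N) || j(nrow(ID),1,1) || ID || X;
   append from Z;
   /* group 2: 2*N observations per replicate; the N column still holds N */
   N2 = 2*N;
   X  = RandNormal(N2*Numsamples, mean2, Cov2);
   ID = colvec(repeat(T(1:Numsamples), 1, N2));
   Z  = j(nrow(ID),1,N) || j(nrow(ID),1,2) || ID || X;
   append from Z;
end;
close MVN;
quit;

/* sort before any BY N ID processing, since the groups are appended in blocks */
proc sort data=MVN;
   by N ID Group;
run;
```

After the sort, a BY N ID statement in a later procedure would process each replicate at each sample size separately, with Group distinguishing the two populations.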
