Obsidian | Level 7

## Simulate bivariate normal data for multiple Sample size for two groups

Hello,

I am trying to simulate bivariate normal data for two groups with different covariance matrices. I need to generate the data for several sample sizes (e.g., 20, 40, etc.) with 2000 replicates. I want the ratio of sample sizes in group 1 to group 2 to be 1:2.

Below is the code I have for one group. For some reason, the CREATE statement creates two separate MVN data sets, and the last one overwrites the first.

Any help will be appreciated.

proc iml;
Numsamples=10;
/*specify population mean and covariance:grp1*/
mean1={6.0 6.0};
Cov1={0.5280563  0.502445,
0.502445   0.5280563
};
/*specify population mean and covariance:grp2*/
mean2={6.2499 5.7399};
Cov2={0.6280563  0.200978,
0.200978   0.401956
};
call randseed(132);
do N=5 to 10 by 5;
X=RandNormal(N*Numsamples,Mean2,Cov2);
ID=colvec(repeat(T(1:Numsamples),1,N));
Z=ID||X;

create MVN from Z[c={"ID" "y0" "y1" }];
append from Z;
*end;
close MVN;
end;
quit;
Accepted Solutions
SAS Super FREQ

## Re: Simulate bivariate normal data for multiple Sample size for two groups

It looks like you've already read the article "How to generate multiple samples from the multivariate normal distribution in SAS."

For simulation studies, it can be convenient to write each sample to a data set from within a SAS/IML loop.

So put the CREATE statement before the loop and the CLOSE statement after the loop. Inside the loop, you create both samples and then use the APPEND statement.

It's not clear to me how you want a 1:2 ratio when the sample size is not divisible by 3. For example, when N = 20, do you want 6 and 14 as the sample sizes, or do you want 20 and 40? In the following program, I've used the second option.

I suspect you will also need a second ID variable to identify which observations come from the first distribution and which from the second. I called that the GROUP variable:

```
proc iml;
Numsamples=10;
/* specify population mean and covariance: grp1 */
mean1={6.0 6.0};
Cov1={0.5280563  0.502445,
      0.502445   0.5280563};
/* specify population mean and covariance: grp2 */
mean2={6.2499 5.7399};
Cov2={0.6280563  0.200978,
      0.200978   0.401956};
call randseed(132);

Z = {. . . .};    /* tell IML Z is numeric */
create MVN from Z[c={"Group" "ID" "y0" "y1" }];

do N=5 to 10 by 5;
   /* group 1: drawn from (Mean1, Cov1) */
   N1 = N;      /* or N1 = floor(N/3); ? */
   X=RandNormal(N1*Numsamples,Mean1,Cov1);
   ID=colvec(repeat(T(1:Numsamples),1,N1));
   Group = j(nrow(ID), 1, 1);
   Z=Group||ID||X;
   append from Z;

   /* group 2: drawn from (Mean2, Cov2), twice as many observations */
   N2 = 2*N;    /* or N2 = N - N1; ? */
   X=RandNormal(N2*Numsamples,Mean2,Cov2);
   ID=colvec(repeat(T(1:Numsamples),1,N2));
   Group = j(nrow(ID), 1, 2);
   Z=Group||ID||X;
   append from Z;
end;

close MVN;
quit;
```
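For the other reading of the 1:2 ratio mentioned above, where N is the total sample size to be split between the groups, the arithmetic can be checked with a small sketch. The totals in the vector `N` are hypothetical values chosen only for illustration; the formulas are the ones suggested in the program's comments.

```sas
proc iml;
/* hypothetical total sample sizes, for illustration only */
N  = {20, 40, 60};
N1 = floor(N/3);     /* group 1 gets roughly one third of the total */
N2 = N - N1;         /* group 2 gets the remainder */
print N N1 N2;
quit;
```

For N = 20 this gives the 6 and 14 split mentioned above. The ratio is exactly 1:2 only when N is divisible by 3.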
Diamond | Level 26

## Re: Simulate bivariate normal data for multiple Sample size for two groups

Yes, each time it creates a data set named MVN (the exact same name each time, so it overwrites the previous version of MVN). That's how the program is written.

https://blogs.sas.com/content/iml/2015/02/09/array-of-matrices.html

--
Paige Miller
Barite | Level 11

## Re: Simulate bivariate normal data for multiple Sample size for two groups

As Paige says, you are overwriting the same data set in the loop. An alternative is to build one data set with successive appends, as follows:

```
Z = {. . . .};    /* numeric template so CREATE knows the column types */
create MVN from Z[c={"N" "ID" "y0" "y1" }];

do N=5 to 10 by 5;
   X=RandNormal(N*Numsamples,Mean2,Cov2);
   ID=colvec(repeat(T(1:Numsamples),1,N));
   Z = j(nrow(X),1,N)||ID||X;   /* first column records the sample size */
   append from Z;
end;

close MVN;
```

I have added the loop variable N to the output data set which you can use in WHERE or BY statements in other SAS PROCs.
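As a rough illustration of that last point, the sketch below regenerates the one-group data with the N column and then summarizes each sample with a BY statement. The output data set name `SampleMeans` and the variable names `mean_y0` and `mean_y1` are hypothetical, and PROC MEANS is just one example of a downstream procedure. No sorting step is needed here because the rows are appended in increasing order of N and of ID within N.

```sas
proc iml;
Numsamples = 10;
mean2 = {6.2499 5.7399};
Cov2  = {0.6280563 0.200978,
         0.200978  0.401956};
call randseed(132);

Z = {. . . .};                      /* numeric template for CREATE */
create MVN from Z[c={"N" "ID" "y0" "y1"}];
do N = 5 to 10 by 5;
   X  = RandNormal(N*Numsamples, mean2, Cov2);
   ID = colvec(repeat(T(1:Numsamples), 1, N));
   Z  = j(nrow(X), 1, N) || ID || X;
   append from Z;
end;
close MVN;
quit;

/* one row of sample means per (N, ID) combination */
proc means data=MVN noprint;
   by N ID;
   var y0 y1;
   output out=SampleMeans mean=mean_y0 mean_y1;
run;
```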

Obsidian | Level 7

## Re: Simulate bivariate normal data for multiple Sample size for two groups

Great! Many thanks!

Obsidian | Level 7

## Re: Simulate bivariate normal data for multiple Sample size for two groups

Excellent!! Many thanks! Yes, I did read the referenced resource.

Is it possible to add a column for the sample size (N) to allow analysis by ID and N?

SAS Super FREQ

## Re: Simulate bivariate normal data for multiple Sample size for two groups

Yes, of course. Just add an additional column to Z. You'll want to modify the Z= assignments and the CREATE statement.
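A minimal sketch of that change, assuming the two-group program from the accepted solution: the template for Z gets a fifth column, the CREATE statement gets an extra name, and each Z= assignment prepends a column holding the loop variable N. Two assumptions are baked in: group 1 is drawn from Mean1 and Cov1, and the N column records the loop value, so group 2 actually contains 2*N observations per sample.

```sas
proc iml;
Numsamples = 10;
mean1 = {6.0 6.0};
Cov1  = {0.5280563 0.502445,
         0.502445  0.5280563};
mean2 = {6.2499 5.7399};
Cov2  = {0.6280563 0.200978,
         0.200978  0.401956};
call randseed(132);

Z = {. . . . .};                    /* five numeric columns now */
create MVN from Z[c={"N" "Group" "ID" "y0" "y1"}];

do N = 5 to 10 by 5;
   /* group 1: N observations per sample */
   N1 = N;
   X  = RandNormal(N1*Numsamples, mean1, Cov1);
   ID = colvec(repeat(T(1:Numsamples), 1, N1));
   Z  = j(nrow(ID), 1, N) || j(nrow(ID), 1, 1) || ID || X;
   append from Z;

   /* group 2: 2*N observations per sample */
   N2 = 2*N;
   X  = RandNormal(N2*Numsamples, mean2, Cov2);
   ID = colvec(repeat(T(1:Numsamples), 1, N2));
   Z  = j(nrow(ID), 1, N) || j(nrow(ID), 1, 2) || ID || X;
   append from Z;
end;

close MVN;
quit;
```

Because the two groups are appended one after the other within each value of N, the data set would need a PROC SORT by N and ID before a BY N ID analysis.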
