SWEETSAS
Obsidian | Level 7

Hello,

I am trying to simulate bivariate normal data for two groups that have different covariance matrices. I need to generate these data for several sample sizes (e.g., N = 20, 40, etc.), with 2000 replicates for each. I want the ratio of the group 1 sample size to the group 2 sample size to be 1:2.

Below is the code I have for one group. For some reason, the CREATE statement creates two separate MVN data sets, and the last data set overwrites the first one.

Any help will be appreciated.

proc iml;
Numsamples=10;
/*specify population mean and covariance:grp1*/
mean1={6.0 6.0};
Cov1={0.5280563  0.502445,
      0.502445   0.5280563
     };
/*specify population mean and covariance:grp2*/
mean2={6.2499 5.7399};
Cov2={0.6280563  0.200978,
      0.200978   0.401956
     };
call randseed(132);
do N=5 to 10 by 5;
   /* N observations for each of the Numsamples samples (group 2 only) */
   X=RandNormal(N*Numsamples,Mean2,Cov2);
   /* sample ID: 1 repeated N times, then 2 repeated N times, ... */
   ID=colvec(repeat(T(1:Numsamples),1,N));
   Z=ID||X;

   create MVN from Z[c={"ID" "y0" "y1" }];
   append from Z;
   close MVN;
end;
quit;
Accepted Solution
Rick_SAS
SAS Super FREQ

It looks like you've already read the article "How to generate multiple samples from the multivariate normal distribution in SAS."

 

For simulation studies, it can be convenient to write each sample to a data set from within a SAS/IML loop.

So put the CREATE statement before the loop and the CLOSE statement after the loop. Inside the loop, generate both samples and call the APPEND statement after each one.

 

It's not clear to me how you want a 1:2 ratio when the sample size is not divisible by 3. For example, when N = 20, do you want sample sizes of 6 and 14, or do you want 20 and 40? In the following program, I've used the second option.
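The two readings of the 1:2 ratio can be compared directly in SAS/IML. This is a small sketch, not part of the original answer; the values in the vector N are illustrative. For N = 20 the first option gives 6 and 14, the split described in the question above.

```sas
proc iml;
N = {20, 40, 50};        /* illustrative total sample sizes */

/* Option 1: N is the total; group 1 gets roughly one third */
N1a = floor(N/3);
N2a = N - N1a;

/* Option 2: N is the group 1 size; group 2 is twice as large */
N1b = N;
N2b = 2*N;

print N N1a N2a N1b N2b;
quit;
```

Option 1 keeps the total fixed but only approximates 1:2 when N is not a multiple of 3; option 2 keeps the ratio exact but changes the total.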

 

I suspect you will also need a second ID variable to identify which observations come from the first distribution and which from the second. I called that the GROUP variable:

 

proc iml;
Numsamples=10;
/*specify population mean and covariance:grp1*/
mean1={6.0 6.0};
Cov1={0.5280563  0.502445,
      0.502445   0.5280563
  };
/*specify population mean and covariance:grp2*/
mean2={6.2499 5.7399};
Cov2={0.6280563  0.200978,
      0.200978   0.401956
  };
call randseed(132);

Z = {. . . .};    /* tell IML Z is numeric */
create MVN from Z[c={"Group" "ID" "y0" "y1" }];   /* open the data set once, before the loop */

do N=5 to 10 by 5;
   /* group 1: N1 obs per sample from MVN(mean1, Cov1) */
   N1 = N;      /* or N1 = floor(N/3); ? */
   X=RandNormal(N1*Numsamples,mean1,Cov1);
   ID=colvec(repeat(T(1:Numsamples),1,N1));
   Group = j(nrow(ID), 1, 1);
   Z=Group||ID||X;
   append from Z;

   /* group 2: N2 obs per sample from MVN(mean2, Cov2) */
   N2 = 2*N;    /* or N2 = N - N1; ? */
   X=RandNormal(N2*Numsamples,mean2,Cov2);
   ID=colvec(repeat(T(1:Numsamples),1,N2));
   Group = j(nrow(ID), 1, 2);
   Z=Group||ID||X;
   append from Z;
end;

close MVN;    /* close the data set once, after the loop */
quit;
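To confirm that the two groups were drawn from different covariance matrices, one option (a sketch, not part of the original answer) is to pool all replicates and ask PROC CORR for the covariance matrix of each group. Each pass through the loop appends group 1 and then group 2, so the rows are not in Group order; the sketch sorts a copy first. The name MVNsorted is a hypothetical choice.

```sas
/* Pool all replicates and sample sizes, then compare each group's
   sample covariance matrix with Cov1 (Group=1) and Cov2 (Group=2).
   MVNsorted is a hypothetical name for the sorted copy. */
proc sort data=MVN out=MVNsorted;
   by Group;
run;

proc corr data=MVNsorted cov noprob;
   by Group;
   var y0 y1;
run;
```

With only 10 replicates per sample size the estimates will be noisy, but they should be in the neighborhood of the population matrices.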


6 Replies
PaigeMiller
Diamond | Level 26

Yes. Each time through the loop, the CREATE statement creates a data set named MVN (the same name every time), so each iteration overwrites the previous version of MVN. That's how the program is written.

This article from @Rick_SAS explains how you can overcome this:

https://blogs.sas.com/content/iml/2015/02/09/array-of-matrices.html

 

 

--
Paige Miller
IanWakeling
Barite | Level 11

As Paige says, you are overwriting the same data set in the loop. An alternative is to build one data set with successive appends, as follows:

create MVN var {"N" "ID" "y0" "y1" };

do N=5 to 10 by 5;
 X=RandNormal(N*Numsamples,Mean2,Cov2);
 ID=colvec(repeat(T(1:Numsamples),1,N));
 Z = j(nrow(X),1,N)||ID||X;
 append from Z;
end;
  
close MVN;

I have added the loop variable N to the output data set. You can use N in WHERE or BY statements in other SAS procedures.
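For example, the following sketch computes the mean of each variable in every replicate. It assumes the MVN data set written by the loop above (variables N, ID, y0, y1); the output name SampleMeans is hypothetical. Because the loop writes N = 5 before N = 10, and the IDs within each N are in ascending order, the data are already sorted by N and ID, so no PROC SORT step is needed.

```sas
/* One row of means per (N, ID) sample.
   MVN is assumed to be the data set written by the loop above. */
proc means data=MVN noprint;
   by N ID;
   var y0 y1;
   output out=SampleMeans mean=Mean_y0 Mean_y1;
run;

/* A WHERE statement restricts a later step to a single sample size */
proc print data=SampleMeans;
   where N = 10;
   var N ID Mean_y0 Mean_y1;
run;
```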

 


SWEETSAS
Obsidian | Level 7

Excellent!! Many thanks! Yes, I did read the referenced resource. 

 

Is it possible to add a column for the sample size (N) so that I can run analyses BY N ID?

Rick_SAS
SAS Super FREQ

Yes, of course. Just add an additional column to Z. You'll want to modify the Z= assignments and the CREATE statement.
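A sketch of that modification, combining the Group column from the accepted answer with an N column like the one in Ian's reply. The 2000 replicates and the sample sizes 20 and 40 are taken from the original question; group 1 is drawn from mean1/Cov1 and group 2 from mean2/Cov2, and the "second option" for the 1:2 ratio (N and 2N) is assumed. The sort order N ID Group at the end is an assumption about how the data will be analyzed.

```sas
proc iml;
NumSamples = 2000;                  /* replicates, from the question */
mean1 = {6.0 6.0};
Cov1  = {0.5280563 0.502445,
         0.502445  0.5280563};
mean2 = {6.2499 5.7399};
Cov2  = {0.6280563 0.200978,
         0.200978  0.401956};
call randseed(132);

Z = {. . . . .};                    /* five numeric columns: N Group ID y0 y1 */
create MVN from Z[c={"N" "Group" "ID" "y0" "y1"}];

do N = 20 to 40 by 20;              /* sample sizes from the question */
   N1 = N;                          /* group 1 size (or floor(N/3)) */
   N2 = 2*N;                        /* group 2 size (or N - N1)     */

   /* group 1: N1 observations per replicate from MVN(mean1, Cov1) */
   X  = RandNormal(N1*NumSamples, mean1, Cov1);
   ID = colvec(repeat(T(1:NumSamples), 1, N1));
   Z  = j(nrow(X), 1, N) || j(nrow(X), 1, 1) || ID || X;
   append from Z;

   /* group 2: N2 observations per replicate from MVN(mean2, Cov2) */
   X  = RandNormal(N2*NumSamples, mean2, Cov2);
   ID = colvec(repeat(T(1:NumSamples), 1, N2));
   Z  = j(nrow(X), 1, N) || j(nrow(X), 1, 2) || ID || X;
   append from Z;
end;
close MVN;
quit;

/* Within each N, all Group=1 rows precede all Group=2 rows,
   so sort before using BY N ID in an analysis procedure. */
proc sort data=MVN;
   by N ID Group;
run;
```

After the sort, each replicate (one value of N and ID) is a contiguous block that contains its group 1 rows followed by its group 2 rows.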
