
Data Driven Simulation of 2-Group Multivariate Normal Data With Multiple Study Conditions

New Contributor

 

I am using SAS 9.4 and attempting to adapt Rick Wicklin's data-driven simulation procedure, as described on The DO Loop blog (September 27, 2017).

 

The two adaptations I intend to make are (1) generating multivariate data (here, 2 DVs for simplicity) for 2 groups under a single study condition and (2) doing the same for each of several study conditions that involve, for example, different sample sizes or different correlations.  The advantage of generating data for all of these conditions with one program is that I do not have to run many separate simulations whenever the sample size or another parameter changes.

 

The first adaptation is below and is not causing any problems.  For the second adaptation, however, I am having difficulty reading each of several lines from the data set that contains the study conditions.  As you can see under the Two Conditions header, I am using a Scan Loop macro that works well for creating the data set containing the groups but, as you can see under the Two Conditions Multivariate header, does not create the multivariate outcomes (the problem occurs in PROC IML).  I assume that I am not reading in the vectors properly in PROC IML, and I do not know how to proceed. I am not sure whether Scan Loop can work with PROC IML and I am just not implementing it correctly, or whether a different method is needed.

 

I have 3 sections of code below.  The first generates 2 group multivariate data using values from one condition that are contained in a data set.  Again, there are no problems here, but I am showing this syntax to give you a better idea of the nature of the simulation.

 

The second section includes the Scan Loop procedure but applies it only to the part of the simulation where groups are assigned (i.e., no multivariate data).  I do this to show that the Scan Loop procedure does exactly what I want it to do: it creates data for each of several conditions (just 2 conditions here) and produces a single data set containing raw data simulated many times under each of the study conditions.

 

The third section attempts to apply the Scan Loop procedure to obtain the multivariate outcome data but fails.

 

Any help or pointers you could give would be greatly appreciated!

 

Thanks, Keenan

 

Well, here goes . . .    


/*************************************************

For one Condition;  

*************************************************/

 

/* Data file that contains the parameters */


data cormat;
input var1 covar1 covar2 var2 Meanres1 Meanres2 N NumSamples Cond ESY1 ESY2;
datalines;
1 .5 .5  1 0 0 50 10 1 .5 .5
;

/* Assigns macro variable names and obtains values of parameters from Data set Cormat: */

data _null_;
set cormat;
CALL SYMPUT('N',N);
CALL SYMPUT('Numsamples',Numsamples);
CALL SYMPUT ('Cond', Cond);
CALL SYMPUT ('Meanres1', Meanres1);  /* cormat has Meanres1/Meanres2, not Mean1/Mean2 */
CALL SYMPUT ('Meanres2', Meanres2);
CALL SYMPUT ('ESY1', ESY1);
CALL SYMPUT ('ESY2', ESY2);
run;


/* Creates data set with 2 groups with Group N = N/2 and reps = Numsamples, retains study parameters;  */

Data Group;
Cond=&Cond;
do Reps = 1 to &Numsamples;
  do ID = 1 to &N;
   If ID le &N/2  then T = 0;
   else T=1;
   output;
  end;
 end;
run;


Data Group; Set Group;
   If T = 1 then do;  /* Assigns treatment means */
     PredY1 = &ESY1;
     PredY2 = &ESY2;
    end;
   Else if T = 0 then do;
      PredY1 = 0;  /*Assigns control means */
      PredY2 = 0;
     end;
   Output;
   run;


/* Obtaining residuals from multivariate normal distribution using values from cormat */

proc iml;
use cormat (Keep = var1 covar1 covar2 var2);
read all var _NUM_ into vector;
Cov = shape (vector, 2, 2);
close cormat;
use cormat (Keep = Meanres1 Meanres2);
read all var _NUM_ into Mean;
close cormat;
use cormat (keep = N Numsamples);
read all var {N} into N;
read all var {Numsamples} into Numsamples;
close cormat;
R = RandNormal(N*Numsamples, Mean, Cov);
Reps = colvec(repeat(T(1:Numsamples), 1, N)); /* 1,1,1,...,2,2,2,...,3,3,3,... */
Z = Reps || R;
create MVN from Z[c={"Reps" "r1" "r2"}];
append from Z;
close MVN;
quit;


/* Merging data set having predicted values with data set having residuals  */
/*  Creating Observed Y scores as predicted plus residual */

Data all;
 merge group mvn;
 Y1 = PredY1 + r1;
 Y2 = PredY2 + r2;
run;

 

 


/*************************************************************************************

For Two Conditions;  Scan Loop works for Group data set

*************************************************************************************/

 

/* Data file that contains the parameters, with each line having different study conditions */


data cormat;
input var1 covar1 covar2 var2 Meanres1 Meanres2 N NumSamples Cond ESY1 ESY2;
datalines  /* 2 conditions now included with N differing across conditions */;
1 .5 .5  1 0 0 50 10 1 .5 .5
1 .5 .5  1 0 0 30 10 2 .5 .5
;


/* Macro to SCAN through cormat data file */
%MACRO SCANLOOP(cormat,Field1,Field2,Field3,Field4,Field5,Field6,Field7,Field8,Field9,Field10,Field11);

/* First provide the number of records in cormat */
DATA _NULL_;
IF 0 THEN SET &cormat NOBS=X;
CALL SYMPUT('RECCOUNT',X);
STOP;
RUN;

/* loop from one to number of records */
%DO I=1 %TO &RECCOUNT;

/* Advance to the Ith record */
DATA _NULL_;
SET &cormat (FIRSTOBS=&I);

/* store the variables of interest in macro variables */

CALL SYMPUT('Var1',&Field1);
CALL SYMPUT('Covar1',&Field2);
CALL SYMPUT('Covar2',&Field3);
CALL SYMPUT('Var2',&Field4);
CALL SYMPUT('Meanres1',&Field5);
CALL SYMPUT('Meanres2',&Field6);
CALL SYMPUT('N',&Field7);
CALL SYMPUT('NumSamples',&Field8);
CALL SYMPUT('Cond',&Field9);
CALL SYMPUT('ESY1',&Field10);
CALL SYMPUT('ESY2',&Field11);
STOP;
RUN;


/* Creates data set with 2 groups with Group N = N/2 and reps = Numsamples, retains study parameters;  */
/*  Does this for each study condition  */

Data Group;
Cond=&Cond;
do Reps = 1 to &Numsamples;
do ID = 1 to &N;
  If ID le &N/2  then T = 0;
    else T=1;
  output;
 end;
 end;
run;


Data Group; Set Group;
   If T = 1 then do;
     PredY1 = &ESY1;
     PredY2 = &ESY2;
    end;
   Else if T = 0 then do;
      PredY1 = 0;
      PredY2 = 0;
     end;
   Output;
   run;


/* Proc datasets appends data sets from each study condition */
PROC DATASETS; APPEND BASE=ALLDATA DATA=Group; RUN;  QUIT;


%END;
%MEND SCANLOOP;


/* Call SCANLOOP macro */
%SCANLOOP(cormat,var1,covar1,covar2,var2,Meanres1,Meanres2,N,NumSamples,Cond,ESY1,ESY2);
RUN;

 

 



/*************************************************************************************

For Two Conditions Multivariate Data: Problem occurs in Proc IML part

*************************************************************************************/

 

/* Data file that contains the parameters, with each line having different study conditions */


data cormat;
input var1 covar1 covar2 var2 Meanres1 Meanres2 N NumSamples Cond ESY1 ESY2;
datalines  /* 2 conditions now included with N differing across conditions */;
1 .5 .5  1 0 0 50 10 1 .5 .5
1 .5 .5  1 0 0 30 10 2 .5 .5
;


/* Macro to SCAN through cormat data file */
%MACRO SCANLOOP(cormat,Field1,Field2,Field3,Field4,Field5,Field6,Field7,Field8,Field9,Field10,Field11);

/* First provide the number of records in cormat */
DATA _NULL_;
IF 0 THEN SET &cormat NOBS=X;
CALL SYMPUT('RECCOUNT',X);
STOP;
RUN;

/* loop from one to number of records */
%DO I=1 %TO &RECCOUNT;

/* Advance to the Ith record */
DATA _NULL_;
SET &cormat (FIRSTOBS=&I);

/* store the variables of interest in macro variables */

CALL SYMPUT('Var1',&Field1);
CALL SYMPUT('Covar1',&Field2);
CALL SYMPUT('Covar2',&Field3);
CALL SYMPUT('Var2',&Field4);
CALL SYMPUT('Meanres1',&Field5);
CALL SYMPUT('Meanres2',&Field6);
CALL SYMPUT('N',&Field7);
CALL SYMPUT('NumSamples',&Field8);
CALL SYMPUT('Cond',&Field9);
CALL SYMPUT('ESY1',&Field10);
CALL SYMPUT('ESY2',&Field11);
STOP;
RUN;


/* Creates data set with 2 groups with Group N = N/2 and reps = Numsamples, retains study parameters;  */
/*  Does this for each study condition  */

Data Group;
Cond=&Cond;
do Reps = 1 to &Numsamples;
do ID = 1 to &N;
  If ID le &N/2  then T = 0;
    else T=1;
  output;
 end;
 end;
run;


Data Group; Set Group;
   If T = 1 then do;
     PredY1 = &ESY1;
     PredY2 = &ESY2;
    end;
   Else if T = 0 then do;
      PredY1 = 0;
      PredY2 = 0;
     end;
   Output;
   run;


/* Proc data sets appends data sets from each study condition */
PROC DATASETS; APPEND BASE=ALLDATA DATA=Group; RUN;  QUIT;



/* The procedure fails from this point on */

/* Obtaining residuals from multivariate normal distribution using values from cormat */

proc iml;
use cormat (Keep = var1 covar1 covar2 var2};  /* I  am not reading these variables, as well as others below, in properly; I also tried placing an & before each variable but that also failed */
read all var _NUM_ into vector;
Cov = shape (vector, 2, 2);
close cormat;
use cormat (Keep = Meanres1 Meanres2);
read all var _NUM_ into Mean;
close cormat;
use cormat (keep = N Numsamples);
read all var {N} into N;
read all var {Numsamples} into Numsamples;
close cormat;
R = RandNormal(N*Numsamples, Mean, Cov);
Reps = Fieldvec(repeat(T(1:Numsamples), 1, N)); /* 1,1,1,...,2,2,2,...,3,3,3,... */
Z = Reps || R;
create MVN from Z[c={"Reps" "r1" "r2"}];
append from Z;
close MVN;
quit;


%END;
%MEND SCANLOOP;


/* Call SCANLOOP macro */
%SCANLOOP(cormat,var1,covar1,covar2,var2,Meanres1,Meanres2,N,NumSamples,Cond,ESY1,ESY2);
RUN;


Below are some of the error messages I receive when I run the syntax above.

 

ERROR 23-7: Invalid value for the KEEP option.
ERROR: Invalid value for the KEEP option.
ERROR: Some options for file WORK.CORMAT were not processed because of
       errors or warnings noted above.

 statement : USE at line 290 column 1
ERROR: No data set is currently open for input.

 statement : READ at line 290 column 1
ERROR: (execution) Matrix has not been set to a value.

 operation : SHAPE at line 290 column 1
 operands  : vector, *LIT1001, *LIT1002

vector      0 row       0 col     (type ?, size 0)


*LIT1001      1 row       1 col     (numeric)

         2

*LIT1002      1 row       1 col     (numeric)

         2

SAS Super FREQ

Here are a few tips:

1. When I am having a problem with a macro loop, I eliminate the loop for debugging. Get rid of the %MACRO definition and instead use the following:

%let cormat = cormat;
%let Field1 = var1;
%let Field2 = covar1;
%let Field3 = covar2;
%let Field4 = var2;
%let Field5 = Meanres1;
%let Field6 = Meanres2;
%let Field7 = N;
%let Field8 = NumSamples;
%let Field9 = Cond;
%let Field10=ESY1;
%let Field11=ESY2;
%let I = 1;

Debug the body of the program. Then use

%LET I = 2;

and do the same thing.

 

2. Maybe I am misunderstanding, but it looks like you are trying to read the I_th row of the data set into macro variables. If so, you are using the FIRSTOBS= option incorrectly. Use

DATA _NULL_;
SET &cormat (FIRSTOBS=&I OBS=&I);
...

3. There is a typo in the first KEEP= clause: it ends with a curly brace. The KEEP= option on the USE statement must be enclosed in parentheses, not braces.

use cormat (keep=var1 covar1 covar2 var2);

Optionally, you can use a VAR clause instead of KEEP=. The VAR clause does use curly braces:

use cormat;
read all var {var1 covar1 covar2 var2} into vector;

4. I am not sure, but I assume you also want to read the I_th row in the IML program?  If so, don't use READ ALL VAR _NUM_. Use

read POINT &I var {x1 x2 x3} into X;

 

5. I assume the unknown call to FIELDVEC is supposed to be a call to COLVEC.

 

There might be other issues, but by "unwrapping the macro" and debugging the straight program, you should be able to solve the rest.  After all is working, you can reconstruct the macro loop.
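
As a rough sketch of tips 2, 4, and 5 combined, the unwrapped body for a single row might look like the following. It assumes the cormat layout from the question. The seed value, the %PUT check, and the use of CALL SYMPUTX in place of CALL SYMPUT are illustrative choices, not part of the original program.

```sas
/* Sketch only: the loop body "unwrapped" for one condition (row &I). */
data cormat;
   input var1 covar1 covar2 var2 Meanres1 Meanres2 N NumSamples Cond ESY1 ESY2;
   datalines;
1 .5 .5 1 0 0 50 10 1 .5 .5
1 .5 .5 1 0 0 30 10 2 .5 .5
;

%let I = 2;                             /* row to debug; later the %DO index */

/* Tip 2: FIRSTOBS= and OBS= together select exactly row &I */
data _null_;
   set cormat (firstobs=&I obs=&I);
   call symputx('N', N);                /* SYMPUTX trims blanks from the value */
   call symputx('NumSamples', NumSamples);
   call symputx('Cond', Cond);
run;
%put NOTE: Row &I gives Cond=&Cond N=&N NumSamples=&NumSamples;

proc iml;
call randseed(12345);                   /* arbitrary seed, for reproducibility */
use cormat;
/* Tip 4: READ POINT reads only the requested observation */
read point &I var {var1 covar1 covar2 var2} into vector;
read point &I var {Meanres1 Meanres2} into Mean;
read point &I var {N} into N;
read point &I var {NumSamples} into NumSamples;
close cormat;

Cov  = shape(vector, 2, 2);             /* 1x4 row -> 2x2 covariance matrix */
R    = RandNormal(N*NumSamples, Mean, Cov);
/* Tip 5: COLVEC flattens row by row, giving 1,1,...,2,2,... */
Reps = colvec(repeat(T(1:NumSamples), 1, N));
Z    = Reps || R;
create MVN from Z[colname={"Reps" "r1" "r2"}];
append from Z;
close MVN;
quit;
```

Once this runs cleanly for I = 1 and I = 2, the same statements can go back inside the %DO loop with the %LET removed.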

New Contributor

Thanks, Rick.  Your careful reading of the post is greatly appreciated, and your remarks were right on target.

 

Below is the syntax that does exactly what I had intended.  I added some additional comments and tried to pretty up the syntax a bit.

 

Happy holidays!

 

Keenan

 



/*****************************************************

Simulating Two-Group Multivariate Data For Two
Conditions Using Parameter Values Residing in Data Set

******************************************************/

/* Inputting study conditions in "parent" data set Cormat;
   Each line contains values used for simulation study*/

DATA Cormat;
INPUT Var1 Covar1 Covar2 Var2 Meanres1 Meanres2 N NumSamples Cond Esy1 Esy2;
DATALINES  /* 2 conditions now included with N differing across conditions */;
1 .5 .5  1 0 0 50 10 1 .5 .5
1 .5 .5  1 0 0 30 10 2 .5 .5
;


/* Macro to SCAN through cormat data file
   Will read each line of data sequentially from cormat data file*/
%MACRO SCANLOOP(Cormat,Field1,Field2,Field3,Field4,Field5,Field6,Field7,Field8,Field9,Field10,Field11);


/* First provide the number of records in cormat */
DATA _NULL_;
IF 0 THEN SET &cormat NOBS=X;
CALL SYMPUT('RECCOUNT',X);
STOP;
RUN;

/* Loop from one to number of records */
%DO I=1 %TO &RECCOUNT;

/* Advance to the Ith record */
DATA _NULL_;
SET &Cormat (FIRSTOBS=&I OBS=&I);


/* Store the variables of interest in macro variables */

CALL SYMPUT('Var1',&Field1);
CALL SYMPUT('Covar1',&Field2);
CALL SYMPUT('Covar2',&Field3);
CALL SYMPUT('Var2',&Field4);
CALL SYMPUT('Meanres1',&Field5);
CALL SYMPUT('Meanres2',&Field6);
CALL SYMPUT('N',&Field7);
CALL SYMPUT('NumSamples',&Field8);
CALL SYMPUT('Cond',&Field9);
CALL SYMPUT('ESY1',&Field10);
CALL SYMPUT('ESY2',&Field11);
STOP;
RUN;


/* Creates data set with 2 groups with Group size = N/2 and reps = Numsamples, retains study parameters;  */
/*  Does this for each study condition  */

DATA Group;
Cond=&Cond;
DO Reps = 1 to &Numsamples;
 DO ID = 1 to &N;
    IF ID le &N/2  then T = 0;
    ELSE T=1;
    OUTPUT;
  END;
END;
RUN;


DATA Group; SET Group;
   IF T = 1 THEN DO;
     PredY1 = &Esy1;
     PredY2 = &Esy2;
   END;
    ELSE IF T = 0 THEN DO;
      PredY1 = 0;
      PredY2 = 0;
    END;
   OUTPUT;
   RUN;


/* Proc datasets appends Group data sets from each study condition */
PROC DATASETS; APPEND BASE=AllGroup DATA=Group; RUN;  QUIT;



/* Obtaining residuals from multivariate normal distribution using values from cormat */

PROC IML;
USE Cormat;   /* open once; each READ POINT &I reads only row &I */
READ POINT &I VAR {Var1 Covar1 Covar2 Var2} INTO Vector;
READ POINT &I VAR {Meanres1 Meanres2} INTO Mean;
READ POINT &I VAR {N} INTO N;
READ POINT &I VAR {Numsamples} INTO Numsamples;
READ POINT &I VAR {Cond} INTO Cond;
CLOSE Cormat;
Cov = SHAPE (Vector, 2, 2);
Newcond = REPEAT ({&Cond},N*Numsamples);  
R = RANDNORMAL(N*Numsamples, Mean, Cov);
Reps = COLVEC(repeat(T(1:Numsamples), 1, N)); /* 1,1,1,...,2,2,2,...,3,3,3,... */
Z = Reps || R || Newcond;
CREATE MVN FROM Z[c={"Reps" "R1" "R2" "Cond"}];
APPEND FROM Z;
CLOSE MVN;
QUIT;

/* Proc datasets appends MVN data sets from each study condition */
PROC DATASETS; APPEND BASE=Allmvn DATA=mvn; RUN;  QUIT;

%END;
%MEND SCANLOOP;

/* Call SCANLOOP macro */
%SCANLOOP(Cormat,Var1,Covar1,Covar2,Var2,Meanres1,Meanres2,N,NumSamples,Cond,Esy1,Esy2);
RUN;


/* Merging fixed data set with residuals data set  */
/*  Creating Observed Y scores as predicted plus residual */
/* ALLDATA contains simulated data for all study conditions */

Data ALLDATA;
 merge Allgroup Allmvn;
 Y1 = PredY1 + R1;
 Y2 = PredY2 + R2;
RUN;
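
For comparison, the same simulation can probably be written without the macro loop by reading every row of Cormat into matrices and looping over the conditions inside a single PROC IML step. The sketch below is an alternative under that assumption, not the approach used above. The output data set name AllSim, the seed, and the matrix names (CovRows, MeanRows, ES, Design, Trt) are hypothetical. It computes Y1 and Y2 directly, so no merge step is needed.

```sas
/* Sketch only: all conditions in one PROC IML step, no macro loop. */
data cormat;
   input var1 covar1 covar2 var2 Meanres1 Meanres2 N NumSamples Cond ESY1 ESY2;
   datalines;
1 .5 .5 1 0 0 50 10 1 .5 .5
1 .5 .5 1 0 0 30 10 2 .5 .5
;

proc iml;
call randseed(12345);                        /* arbitrary seed */
use cormat;
read all var {var1 covar1 covar2 var2} into CovRows;   /* one row per condition */
read all var {Meanres1 Meanres2}       into MeanRows;
read all var {ESY1 ESY2}               into ES;
read all var {N NumSamples Cond}       into Design;
close cormat;

names = {"Cond" "Reps" "ID" "T" "Y1" "Y2"};
Z = j(1, ncol(names), .);                    /* template row: 6 numeric columns */
create AllSim from Z[colname=names];         /* CREATE defines columns, writes no rows */

do i = 1 to nrow(Design);
   N = Design[i,1];
   NumSamples = Design[i,2];
   Cov  = shape(CovRows[i,], 2, 2);
   R    = RandNormal(N*NumSamples, MeanRows[i,], Cov);  /* residuals */
   Reps = colvec(repeat(T(1:NumSamples), 1, N));        /* 1,1,...,2,2,... */
   ID   = colvec(repeat(1:N, NumSamples, 1));           /* 1,...,N,1,...,N,... */
   Trt  = (ID > N/2);                        /* 0 = control, 1 = treatment */
   Y    = Trt * ES[i,] + R;                  /* group mean plus residual */
   Z    = j(N*NumSamples, 1, Design[i,3]) || Reps || ID || Trt || Y;
   append from Z;
end;
close AllSim;
quit;
```

Because nothing is appended across separate runs, resubmitting this step replaces AllSim rather than adding to it, whereas the PROC DATASETS APPEND steps above keep adding rows to AllGroup and Allmvn unless those data sets are deleted first.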
