BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
feyzi
Calcite | Level 5

Hello,

I was wondering if anyone knows a way of taking a dynamic array in PROC FCMP as such:


PROC FCMP OUTLIB=WORK.FUNCS.MATRIX;

       SUBROUTINE Y(SIM);

            ARRAY Z[&S, 2] / NOSYMBOLS;    

            RC2 = READ_ARRAY('COPULA', Z, 'SIM', 'Z');;    

            PUT Z=;    

       ENDSUB;

QUIT;

...and then taking a defined subsection of this to create a new dynamic array.

Thanks a lot

Emre

1 ACCEPTED SOLUTION

Accepted Solutions
Vince28_Statcan
Quartz | Level 8

Ok, let's make sure I understand properly here - you are looking to generate a multivariate normal variable N(mu, sigma) with sigma = (1 .5 , .5 1) so you generate two independant normal random vars and use the covar matrix to generate the resulting bivariate normal random variable? So in the end you wind up with a rescaled value X+.5Y in the same row as your X simulation and .5X+Y in the row of your Y simulation for each simulation pair.

As Snoopy pointed out, this can probably be done in moreorless a single pass of data which would save you tremendous amounts of time for processing. The bigger question would be do you need to build a generic process that will be able to further handle N-variate random variables or are you looking for a solution tailored to your problem?

A simple approach to save time if you don't need an expandable solution is to do everything in one line and hard the multiplication since its easy to achieve when N = 2 dimensions.

E.g.

data want;

     do sim = 1 to 100;

          Z_A = rand('norm', 0, 1);

          Z_B = rand('norm', 0, 1);

          W_A=Z_A+.5*Z_B;

          W_B=.5*Z_A+Z_B;

          output;

     end;

run; /* you could use transpose if you really needed the bivariate data over 2 rows */

Assuming you want this to be somewhat generic, here's how I would approach the problem:

proc fcmp OUTLIB=WORK.FUNCS.MATRIX;
     subroutine mmult(z {*,*}, corr {*,*}, y {*,*});
     outargs y;
     call mult(z, corr, y);
     endsub;

run;

  

options cmplib=work.funcs;


%macro multivarnormsim(numsim=, nvar=, corrmat= );
data want;
     array z {&nvar.} z1-z&nvar.;
  array zz {1, &nvar.} _TEMPORARY_;
     array corr {&nvar., &nvar.} _temporary_ (&corrmat);
     array w {&nvar.} w1-w&nvar.;
  array ww {1, &nvar.} _TEMPORARY_;
     do i=1 to &numsim;
          do j=1 to dim(z);
               z{j} = rand('norm', 0, 1);
      zz{1, j}=z{j};
          end;
          call mmult(zz, corr, ww);
    do j=1 to dim(w);
   w{j}=ww{1, j};
    end;
    output;
     end;
     drop i j;
run;
%mend;

%multivarnormsim(numsim=100, nvar=5, corrmat=     %quote( 1 0 0 0 .5,
                                                                                      0 1 .2 .2 0,
                                                                                       0 .2 1 0  0,
                                                                                       0 .2 0 1 0,
                                                                                       .5 0 0 0 1)
                              ); /*the nvar*nvar values could've been written in a single line it doesn't matter */

View solution in original post

19 REPLIES 19
Vince28_Statcan
Quartz | Level 8

Could you be more specific about what you are trying to achieve and why? Complex matrix processing is a feature of proc IML much, much more than proc fcmp but depending on what all you wish to achieve, you might be able to program sub-arrays.

I don't think the call dynamic_array routine was designed to support selection of subarrays for scale-down. So I think you'd have to program a subroutine specific for your task or maybe look online for others that might have needed a similar feature.

feyzi
Calcite | Level 5

Hi Vince,

Sorry sure thing.

So I have 100,000 simulations of z values (from a standard normal distribution) for two categories (A, B), which look like this:

LOBSIMZ
A1-0.32659
B10.450515
A21.542437
B20.396981
A30.259029
B30.045685
A4-0.55583
B40.728347
A50.77872
B5-1.27385

I want to carry out a matrix multiplication of each of these simulations and the following correlation matrix table so I get correlated variables:

AB
10.5
0.51

Now I can do this with the code below but my problem is because there's a lot of reading and writing going on for 100,000 simulations my run-time is way too long.So I thought a better way would be to read in the Z variable in COPULA table into an array, take each simulation at a time (this is the bit where I want to take the subsection of the Z array), multiply with the correlation matrix, then append each output array on top of previous ones. Do all of this in temporary arrays and then once complete, write the final array containing the correlated values once.

So I would have to take a subsection of an array and be able append two arrays as well.

Any other alternative suggestions also most welcome! Unfortunately I don't have PROC IML so have to do with other means. I hope

Thanks a lot for the time!

Emre

DATA COPULA (KEEP=LOB_LT_UW SIM Z)

  CORR (DROP=LOB_LT_UW SIM Z);

  SET CORR;

  OUTPUT CORR;

  DO SIM = 1 TO 100;

  Z = NORMAL(123);

  OUTPUT COPULA;

  END;

RUN;

PROC SORT DATA=COPULA;

  BY SIM LOB_LT_UW;

RUN;

DATA _NULL_;

  SET CORR END=LASTOBS;

  IF LASTOBS THEN CALL SYMPUT('R',_N_) ;

RUN;

PROC FCMP OUTLIB=WORK.FUNCS.MATRIX;

  SUBROUTINE MMULT(SIM);

  ARRAY Y[&R]; OUTARGS Y;

  ARRAY CORR[&R,&R] / NOSYMBOLS;

  ARRAY Z[&R] / NOSYMBOLS;

  RC1 = READ_ARRAY('CORR', CORR);

  RC2 = READ_ARRAY('Z', Z, 'Z');

  CALL MULT(Z, CORR, Y);

  RC3 = WRITE_ARRAY('Y', Y, 'Y');

  ENDSUB;

QUIT;

OPTIONS CMPLIB=WORK.FUNCS;

%MACRO MMULT;

   %DO N = 1 %TO 100;

  DATA Z;

  SET COPULA (WHERE=(SIM=&N));

  RUN;

  DATA _NULL_;

  SET Z;

  CALL MMULT(SIM);

  RUN;

  DATA TMP;

  MERGE Z Y;

  RUN;

  PROC SORT DATA=COPULA;

  BY SIM LOB_LT_UW;

  RUN;

  DATA COPULA;

  MERGE COPULA TMP;

  BY SIM LOB_LT_UW;

  RUN;

   %END;

%MEND;

%MMULT;

snoopy369
Barite | Level 11

Seems like you could do that in one pass of the data.  I don't know CALL MMULT so perhaps you might have to abandon that and just do matrix multiplication directly, but does it really require dynamic arrays?  Just use BY group and perform the same operation once per by group on a fixed array.

feyzi
Calcite | Level 5

Hi Snoopy,

MMULT is the Subroutine that I define in the PROC FCMP statement. I've tried the code below, which uses a BY group in the DATA step but I can't get it to work. Would you be able point me in the right direction any chance?

Hi Vince,

Thanks for the heads-up on the RAND function. Look forward to hearing from you.

Thanks guys!

DATA COPULA (KEEP=LOB_LT_UW SIM Z Y)

       CORR (DROP=LOB_LT_UW SIM Z Y);

       SET CORR END=LASTOBS;

       IF LASTOBS THEN CALL SYMPUT('R',_N_) ;

       OUTPUT CORR;

       DO SIM = 1 TO 100;

            Z = RAND('NORMAL', 0, 1);

            OUTPUT COPULA;

      END;

RUN;

PROC SORT DATA=COPULA;

       BY SIM LOB_LT_UW;

RUN;

DATA _NULL_;

       SET COPULA END=LASTOBS;

       IF LASTOBS THEN CALL SYMPUT('S',_N_) ;

RUN;

PROC FCMP OUTLIB=WORK.FUNCS.MATRIX;

       SUBROUTINE MMULT(SIM);

            ARRAY Z[&S] / NOSYMBOLS;

            ARRAY CORR[&R,&R] / NOSYMBOLS;

            ARRAY Y[&R]; OUTARGS Y;

            RC1 = READ_ARRAY('CORR', CORR);

            RC2 = READ_ARRAY('COPULA', Z, 'SIM', 'Z');;

            CALL MULT(Z, CORR, Y);

            PUT Z=;

       ENDSUB;

QUIT;

OPTIONS CMPLIB=WORK.FUNCS;

DATA _NULL_;

       SET COPULA; BY SIM;

            CALL MMULT(SIM);    

RUN;

snoopy369
Barite | Level 11

I'm mostly out of my depth here since I don't really know matrices very well (other than 'use IML' of course), but what I'm suggesting is writing MMULT to take as arguments all of the elements you want multiplied for one sim - it's a 2x2 array, right, so take as arguments eight values or whatever (in arrays or just as individual values), get all of those values on 1 row of a dataset, and call the function directly (once per by value, either from a dataset that is 1 record per sim, or at LAST.SIM) rather than trying to read from separate datasets. 

Most of the time in SAS, any answer that can be done in a normal data step will be superior to any answer that goes off and reads things on its own.  I don't know matrices well enough to know what the potential issues are here, but it seems to me the straightforward answer should not be difficult.

Vince28_Statcan
Quartz | Level 8


Before I forget since this is partly off topic,

I suggest you use RAND function instead of RANNORM (or equivalently NORMAL) function. Read

C:\Users\martvin\Desktop\Six reasons you should stop using the RANUNI function to generate random numbers - The DO Loop.htm

for implementation details and the less-random aspect of older RAN<distribution name> functions.

I'm looking at the code real quick and will try to help out

Vince28_Statcan
Quartz | Level 8

Rofl thanks, I had had to save a copy to move it to closed network for my supervisor to read -_-. My bad on the local copy of Rick's blog post.

For the sake of it, I ran a 100K simulation pass on a windows vista / SAS9.2 slowpoke tower. Here's the log. Just opening and closing 100K without any processing would take substantially longer than that!

2971  %multivarnormsim(numsim=100000, nvar=2, corrmat=%quote(1 .5, .5 1));

NOTE: The data set WORK.WANT has 100000 observations and 4 variables.

NOTE: DATA statement used (Total process time):

      real time           2.31 seconds

      cpu time            0.18 seconds

Vince28_Statcan
Quartz | Level 8

Ok, let's make sure I understand properly here - you are looking to generate a multivariate normal variable N(mu, sigma) with sigma = (1 .5 , .5 1) so you generate two independant normal random vars and use the covar matrix to generate the resulting bivariate normal random variable? So in the end you wind up with a rescaled value X+.5Y in the same row as your X simulation and .5X+Y in the row of your Y simulation for each simulation pair.

As Snoopy pointed out, this can probably be done in moreorless a single pass of data which would save you tremendous amounts of time for processing. The bigger question would be do you need to build a generic process that will be able to further handle N-variate random variables or are you looking for a solution tailored to your problem?

A simple approach to save time if you don't need an expandable solution is to do everything in one line and hard the multiplication since its easy to achieve when N = 2 dimensions.

E.g.

data want;

     do sim = 1 to 100;

          Z_A = rand('norm', 0, 1);

          Z_B = rand('norm', 0, 1);

          W_A=Z_A+.5*Z_B;

          W_B=.5*Z_A+Z_B;

          output;

     end;

run; /* you could use transpose if you really needed the bivariate data over 2 rows */

Assuming you want this to be somewhat generic, here's how I would approach the problem:

proc fcmp OUTLIB=WORK.FUNCS.MATRIX;
     subroutine mmult(z {*,*}, corr {*,*}, y {*,*});
     outargs y;
     call mult(z, corr, y);
     endsub;

run;

  

options cmplib=work.funcs;


%macro multivarnormsim(numsim=, nvar=, corrmat= );
data want;
     array z {&nvar.} z1-z&nvar.;
  array zz {1, &nvar.} _TEMPORARY_;
     array corr {&nvar., &nvar.} _temporary_ (&corrmat);
     array w {&nvar.} w1-w&nvar.;
  array ww {1, &nvar.} _TEMPORARY_;
     do i=1 to &numsim;
          do j=1 to dim(z);
               z{j} = rand('norm', 0, 1);
      zz{1, j}=z{j};
          end;
          call mmult(zz, corr, ww);
    do j=1 to dim(w);
   w{j}=ww{1, j};
    end;
    output;
     end;
     drop i j;
run;
%mend;

%multivarnormsim(numsim=100, nvar=5, corrmat=     %quote( 1 0 0 0 .5,
                                                                                      0 1 .2 .2 0,
                                                                                       0 .2 1 0  0,
                                                                                       0 .2 0 1 0,
                                                                                       .5 0 0 0 1)
                              ); /*the nvar*nvar values could've been written in a single line it doesn't matter */

feyzi
Calcite | Level 5

Thanks a lot Vince. It works beautifully!

FYI:

I was looking for a generic solution. At the moment I actually have 114 variables (ran in 2 mins 10 secs) but is open to change.

Thanks a lot again!

Emre

Vince28_Statcan
Quartz | Level 8

Glad it helped.

I'm guessing since you were already a FCMP user that it shouldn't be too difficult for you to figure out what the code does but if you needed explanation on some of the implementation details, feel free to ask. I usually put more comments in my codes on the forums but it has not always proven beneficial to others -_-

Vince

feyzi
Calcite | Level 5

Thanks Vince,

It's pretty straightforward to follow!

Here's the final code. One thing I don't get is why the NVAR variable defined in the CALL SYMPUT won't work in the DATA step but if I add in the %LET NVAR = &NVAR, it works. Any thoughts?

Many thanks

Emre

DATA _NULL_;

       SET CORREL NOBS=N;

       CALL SYMPUT('NVAR',N);

RUN;

%LET NVAR = &NVAR;

PROC FCMP OUTLIB=WORK.FUNCS.MATRIX;

     SUBROUTINE MMULT(Z [*,*], Y [*,*]);

     OUTARGS Y;


      ARRAY CORREL[&NVAR,&NVAR] / NOSYMBOLS;

      RC1 = READ_ARRAY('CORREL', CORREL);

      CALL MULT(Z, CORREL, Y);

      ENDSUB;

RUN;

OPTIONS CMPLIB=WORK.FUNCS;

DATA COPULA;

    ARRAY Z [&NVAR] Z1-Z&NVAR;

    ARRAY ZZ [1, &NVAR] _TEMPORARY_;

    ARRAY Y [&NVAR] Y1-Y&NVAR;

    ARRAY YY [1, &NVAR] _TEMPORARY_;

    ARRAY X [&NVAR] X1-X&NVAR;

    ARRAY XX [1, &NVAR] _TEMPORARY_;

    ARRAY U [&NVAR] U1-U&NVAR;

    ARRAY UU [1, &NVAR] _TEMPORARY_;

    KEEP SIM U1-U&NVAR;

    DO SIM=1 TO &SIMS;

         DO J=1 TO DIM(Z);

             Z = RAND('NORM', 0, 1);

             ZZ[1, J] = Z;

         END;

         CALL MMULT(ZZ, YY);

         DO J=1 TO DIM(Y);

              Y = YY[1, J];

        END;

        DO J=1 TO DIM(X);

            CHI = RAND('CHISQ', &DOF);

            X = Y *  SQRT(&DOF) / SQRT(CHI);

            XX[1, J] = X;

        END;

        DO J=1 TO DIM(U);

             U = CDF('T', X, &DOF);

             UU[1, J] = U;

        END;

        OUTPUT;

   END;

RUN;

Vince28_Statcan
Quartz | Level 8

Hrm if you wanted to access &NVAR within the same data step in which the call symput was made, this is an issue with compilation vs execution. call symput statements create "piles" of macro statements that are only compiled after the execution of a given data step. Hence, &NVAR defined by the call symput only after the entire data step is run. This is similar, conceptually to how call execute macros are only ran after the data step making it impossible with this approach to use the results of the macro you executed with call execute within the same data step.

With SAS 9.2+ it is possible to go around this issue with proc FCMP and run_macro but it is rarely necessary and requires a lot of wrapping. Most often it simply means that you either do not fully grasp compilation vs execution and the timing complexity between macro code and non-macro code.

I don't think the %let NVAR=&NVAR; statement actually achieved anything in your case. It is normally only used if there were leading/trailing spaces in the macro variable and that you need to "strip" them because you deref (&) the macro variable in double quotes as a string within a data step/proc.

feyzi
Calcite | Level 5

Perfect again Vince.

DATA _NULL_;

  SET CORREL NOBS=N;

  CALL SYMPUT('NVAR', STRIP(N));

RUN;

Thanks a lot.

Emre

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 19 replies
  • 4804 views
  • 0 likes
  • 4 in conversation