Solved: PROC FCMP Subsection of Dynamic Array

feyzi · Posted 11-15-2013 05:24 AM

Hello,

I was wondering if anyone knows a way of taking a dynamic array in PROC FCMP as such:

PROC FCMP OUTLIB=WORK.FUNCS.MATRIX;

SUBROUTINE Y(SIM);

ARRAY Z[&S, 2] / NOSYMBOLS;

RC2 = READ_ARRAY('COPULA', Z, 'SIM', 'Z');;

PUT Z=;

ENDSUB;

QUIT;

...and then taking a defined subsection of this to create a new dynamic array.

Thanks a lot

Emre

Vince28_Statcan · Posted 11-15-2013 12:48 PM

Ok, let's make sure I understand properly here - you are looking to generate a multivariate normal variable N(mu, sigma) with sigma = (1 .5 , .5 1) so you generate two independant normal random vars and use the covar matrix to generate the resulting bivariate normal random variable? So in the end you wind up with a rescaled value X+.5Y in the same row as your X simulation and .5X+Y in the row of your Y simulation for each simulation pair.

As Snoopy pointed out, this can probably be done in moreorless a single pass of data which would save you tremendous amounts of time for processing. The bigger question would be do you need to build a generic process that will be able to further handle N-variate random variables or are you looking for a solution tailored to your problem?

A simple approach to save time if you don't need an expandable solution is to do everything in one line and hard the multiplication since its easy to achieve when N = 2 dimensions.

E.g.

data want;

do sim = 1 to 100;

Z_A = rand('norm', 0, 1);

Z_B = rand('norm', 0, 1);

W_A=Z_A+.5*Z_B;

W_B=.5*Z_A+Z_B;

output;

end;

run; /* you could use transpose if you really needed the bivariate data over 2 rows */

Assuming you want this to be somewhat generic, here's how I would approach the problem:

proc fcmp OUTLIB=WORK.FUNCS.MATRIX;
     subroutine mmult(z {*,*}, corr {*,*}, y {*,*});
     outargs y;
     call mult(z, corr, y);
     endsub;

run;

options cmplib=work.funcs;

%macro multivarnormsim(numsim=, nvar=, corrmat= );
data want;
     array z {&nvar.} z1-z&nvar.;
array zz {1, &nvar.} _TEMPORARY_;
     array corr {&nvar., &nvar.} _temporary_ (&corrmat);
     array w {&nvar.} w1-w&nvar.;
array ww {1, &nvar.} _TEMPORARY_;
     do i=1 to &numsim;
          do j=1 to dim(z);
               z{j} = rand('norm', 0, 1);
      zz{1, j}=z{j};
          end;
          call mmult(zz, corr, ww);
    do j=1 to dim(w);
   w{j}=ww{1, j};
    end;
    output;
     end;
     drop i j;
run;
%mend;

%multivarnormsim(numsim=100, nvar=5, corrmat=     %quote( 1 0 0 0 .5,
                                                                                      0 1 .2 .2 0,
                                                                                       0 .2 1 0 0,
                                                                                       0 .2 0 1 0,
                                                                                       .5 0 0 0 1)
                              ); /*the nvar*nvar values could've been written in a single line it doesn't matter */

View solution in original post

Vince28_Statcan · Posted 11-15-2013 08:18 AM

Could you be more specific about what you are trying to achieve and why? Complex matrix processing is a feature of proc IML much, much more than proc fcmp but depending on what all you wish to achieve, you might be able to program sub-arrays.

I don't think the call dynamic_array routine was designed to support selection of subarrays for scale-down. So I think you'd have to program a subroutine specific for your task or maybe look online for others that might have needed a similar feature.

feyzi · Posted 11-15-2013 09:20 AM

Hi Vince,

Sorry sure thing.

So I have 100,000 simulations of z values (from a standard normal distribution) for two categories (A, B), which look like this:

LOB	SIM	Z
A	1	-0.32659
B	1	0.450515
A	2	1.542437
B	2	0.396981
A	3	0.259029
B	3	0.045685
A	4	-0.55583
B	4	0.728347
A	5	0.77872
B	5	-1.27385

I want to carry out a matrix multiplication of each of these simulations and the following correlation matrix table so I get correlated variables:

A	B
1	0.5
0.5	1

Now I can do this with the code below but my problem is because there's a lot of reading and writing going on for 100,000 simulations my run-time is way too long.So I thought a better way would be to read in the Z variable in COPULA table into an array, take each simulation at a time (this is the bit where I want to take the subsection of the Z array), multiply with the correlation matrix, then append each output array on top of previous ones. Do all of this in temporary arrays and then once complete, write the final array containing the correlated values once.

So I would have to take a subsection of an array and be able append two arrays as well.

Any other alternative suggestions also most welcome! Unfortunately I don't have PROC IML so have to do with other means. I hope

Thanks a lot for the time!

Emre

DATA COPULA (KEEP=LOB_LT_UW SIM Z)

CORR (DROP=LOB_LT_UW SIM Z);

SET CORR;

OUTPUT CORR;

DO SIM = 1 TO 100;

Z = NORMAL(123);

OUTPUT COPULA;

END;

RUN;

PROC SORT DATA=COPULA;

BY SIM LOB_LT_UW;

RUN;

DATA _NULL_;

SET CORR END=LASTOBS;

IF LASTOBS THEN CALL SYMPUT('R',_N_) ;

RUN;

PROC FCMP OUTLIB=WORK.FUNCS.MATRIX;

SUBROUTINE MMULT(SIM);

ARRAY Y[&R]; OUTARGS Y;

ARRAY CORR[&R,&R] / NOSYMBOLS;

ARRAY Z[&R] / NOSYMBOLS;

RC1 = READ_ARRAY('CORR', CORR);

RC2 = READ_ARRAY('Z', Z, 'Z');

CALL MULT(Z, CORR, Y);

RC3 = WRITE_ARRAY('Y', Y, 'Y');

ENDSUB;

QUIT;

OPTIONS CMPLIB=WORK.FUNCS;

%MACRO MMULT;

%DO N = 1 %TO 100;

DATA Z;

SET COPULA (WHERE=(SIM=&N));

RUN;

DATA _NULL_;

SET Z;

CALL MMULT(SIM);

RUN;

DATA TMP;

MERGE Z Y;

RUN;

PROC SORT DATA=COPULA;

BY SIM LOB_LT_UW;

RUN;

DATA COPULA;

MERGE COPULA TMP;

BY SIM LOB_LT_UW;

RUN;

%END;

%MEND;

%MMULT;

snoopy369 · Posted 11-15-2013 10:07 AM

Seems like you could do that in one pass of the data. I don't know CALL MMULT so perhaps you might have to abandon that and just do matrix multiplication directly, but does it really require dynamic arrays? Just use BY group and perform the same operation once per by group on a fixed array.

feyzi · Posted 11-15-2013 12:11 PM

Hi Snoopy,

MMULT is the Subroutine that I define in the PROC FCMP statement. I've tried the code below, which uses a BY group in the DATA step but I can't get it to work. Would you be able point me in the right direction any chance?

Hi Vince,

Thanks for the heads-up on the RAND function. Look forward to hearing from you.

Thanks guys!

DATA COPULA (KEEP=LOB_LT_UW SIM Z Y)

CORR (DROP=LOB_LT_UW SIM Z Y);

SET CORR END=LASTOBS;

IF LASTOBS THEN CALL SYMPUT('R',_N_) ;

OUTPUT CORR;

DO SIM = 1 TO 100;

Z = RAND('NORMAL', 0, 1);

OUTPUT COPULA;

END;

RUN;

PROC SORT DATA=COPULA;

BY SIM LOB_LT_UW;

RUN;

DATA _NULL_;

SET COPULA END=LASTOBS;

IF LASTOBS THEN CALL SYMPUT('S',_N_) ;

RUN;

PROC FCMP OUTLIB=WORK.FUNCS.MATRIX;

SUBROUTINE MMULT(SIM);

ARRAY Z[&S] / NOSYMBOLS;

ARRAY CORR[&R,&R] / NOSYMBOLS;

ARRAY Y[&R]; OUTARGS Y;

RC1 = READ_ARRAY('CORR', CORR);

RC2 = READ_ARRAY('COPULA', Z, 'SIM', 'Z');;

CALL MULT(Z, CORR, Y);

PUT Z=;

ENDSUB;

QUIT;

OPTIONS CMPLIB=WORK.FUNCS;

DATA _NULL_;

SET COPULA; BY SIM;

CALL MMULT(SIM);

RUN;

snoopy369 · Posted 11-15-2013 12:28 PM

I'm mostly out of my depth here since I don't really know matrices very well (other than 'use IML' of course), but what I'm suggesting is writing MMULT to take as arguments all of the elements you want multiplied for one sim - it's a 2x2 array, right, so take as arguments eight values or whatever (in arrays or just as individual values), get all of those values on 1 row of a dataset, and call the function directly (once per by value, either from a dataset that is 1 record per sim, or at LAST.SIM) rather than trying to read from separate datasets.

Most of the time in SAS, any answer that can be done in a normal data step will be superior to any answer that goes off and reads things on its own. I don't know matrices well enough to know what the potential issues are here, but it seems to me the straightforward answer should not be difficult.

Vince28_Statcan · Posted 11-15-2013 11:08 AM

Before I forget since this is partly off topic,

I suggest you use RAND function instead of RANNORM (or equivalently NORMAL) function. Read

C:\Users\martvin\Desktop\Six reasons you should stop using the RANUNI function to generate random numbers - The DO Loop.htm

for implementation details and the less-random aspect of older RAN<distribution name> functions.

I'm looking at the code real quick and will try to help out

snoopy369 · Posted 11-15-2013 11:38 AM

FYI - the non-c: link to that post is here:

Six reasons you should stop using the RANUNI function to generate random numbers - The DO Loop

Vince28_Statcan · Posted 11-15-2013 12:49 PM

Rofl thanks, I had had to save a copy to move it to closed network for my supervisor to read -_-. My bad on the local copy of Rick's blog post.

For the sake of it, I ran a 100K simulation pass on a windows vista / SAS9.2 slowpoke tower. Here's the log. Just opening and closing 100K without any processing would take substantially longer than that!

2971 %multivarnormsim(numsim=100000, nvar=2, corrmat=%quote(1 .5, .5 1));

NOTE: The data set WORK.WANT has 100000 observations and 4 variables.

NOTE: DATA statement used (Total process time):

real time 2.31 seconds

cpu time 0.18 seconds

Vince28_Statcan · Posted 11-15-2013 12:48 PM

Ok, let's make sure I understand properly here - you are looking to generate a multivariate normal variable N(mu, sigma) with sigma = (1 .5 , .5 1) so you generate two independant normal random vars and use the covar matrix to generate the resulting bivariate normal random variable? So in the end you wind up with a rescaled value X+.5Y in the same row as your X simulation and .5X+Y in the row of your Y simulation for each simulation pair.

As Snoopy pointed out, this can probably be done in moreorless a single pass of data which would save you tremendous amounts of time for processing. The bigger question would be do you need to build a generic process that will be able to further handle N-variate random variables or are you looking for a solution tailored to your problem?

A simple approach to save time if you don't need an expandable solution is to do everything in one line and hard the multiplication since its easy to achieve when N = 2 dimensions.

E.g.

data want;

do sim = 1 to 100;

Z_A = rand('norm', 0, 1);

Z_B = rand('norm', 0, 1);

W_A=Z_A+.5*Z_B;

W_B=.5*Z_A+Z_B;

output;

end;

run; /* you could use transpose if you really needed the bivariate data over 2 rows */

Assuming you want this to be somewhat generic, here's how I would approach the problem:

proc fcmp OUTLIB=WORK.FUNCS.MATRIX;
     subroutine mmult(z {*,*}, corr {*,*}, y {*,*});
     outargs y;
     call mult(z, corr, y);
     endsub;

run;

options cmplib=work.funcs;

%macro multivarnormsim(numsim=, nvar=, corrmat= );
data want;
     array z {&nvar.} z1-z&nvar.;
array zz {1, &nvar.} _TEMPORARY_;
     array corr {&nvar., &nvar.} _temporary_ (&corrmat);
     array w {&nvar.} w1-w&nvar.;
array ww {1, &nvar.} _TEMPORARY_;
     do i=1 to &numsim;
          do j=1 to dim(z);
               z{j} = rand('norm', 0, 1);
      zz{1, j}=z{j};
          end;
          call mmult(zz, corr, ww);
    do j=1 to dim(w);
   w{j}=ww{1, j};
    end;
    output;
     end;
     drop i j;
run;
%mend;

%multivarnormsim(numsim=100, nvar=5, corrmat=     %quote( 1 0 0 0 .5,
                                                                                      0 1 .2 .2 0,
                                                                                       0 .2 1 0 0,
                                                                                       0 .2 0 1 0,
                                                                                       .5 0 0 0 1)
                              ); /*the nvar*nvar values could've been written in a single line it doesn't matter */

feyzi · Posted 11-16-2013 01:19 PM

Thanks a lot Vince. It works beautifully!

FYI:

I was looking for a generic solution. At the moment I actually have 114 variables (ran in 2 mins 10 secs) but is open to change.

Thanks a lot again!

Emre

Vince28_Statcan · Posted 11-18-2013 07:29 AM

Glad it helped.

I'm guessing since you were already a FCMP user that it shouldn't be too difficult for you to figure out what the code does but if you needed explanation on some of the implementation details, feel free to ask. I usually put more comments in my codes on the forums but it has not always proven beneficial to others -_-

Vince

feyzi · Posted 11-18-2013 08:00 AM

Thanks Vince,

It's pretty straightforward to follow!

Here's the final code. One thing I don't get is why the NVAR variable defined in the CALL SYMPUT won't work in the DATA step but if I add in the %LET NVAR = &NVAR, it works. Any thoughts?

Many thanks

Emre

DATA _NULL_;

SET CORREL NOBS=N;

CALL SYMPUT('NVAR',N);

RUN;

%LET NVAR = &NVAR;

PROC FCMP OUTLIB=WORK.FUNCS.MATRIX;

SUBROUTINE MMULT(Z [*,*], Y [*,*]);

OUTARGS Y;

ARRAY CORREL[&NVAR,&NVAR] / NOSYMBOLS;

RC1 = READ_ARRAY('CORREL', CORREL);

CALL MULT(Z, CORREL, Y);

ENDSUB;

RUN;

OPTIONS CMPLIB=WORK.FUNCS;

DATA COPULA;

ARRAY Z [&NVAR] Z1-Z&NVAR;

ARRAY ZZ [1, &NVAR] _TEMPORARY_;

ARRAY Y [&NVAR] Y1-Y&NVAR;

ARRAY YY [1, &NVAR] _TEMPORARY_;

ARRAY X [&NVAR] X1-X&NVAR;

ARRAY XX [1, &NVAR] _TEMPORARY_;

ARRAY U [&NVAR] U1-U&NVAR;

ARRAY UU [1, &NVAR] _TEMPORARY_;

KEEP SIM U1-U&NVAR;

DO SIM=1 TO &SIMS;

DO J=1 TO DIM(Z);

Z = RAND('NORM', 0, 1);

ZZ[1, J] = Z;

END;

CALL MMULT(ZZ, YY);

DO J=1 TO DIM(Y);

Y = YY[1, J];

END;

DO J=1 TO DIM(X);

CHI = RAND('CHISQ', &DOF);

X = Y * SQRT(&DOF) / SQRT(CHI);

XX[1, J] = X;

END;

DO J=1 TO DIM(U);

U = CDF('T', X, &DOF);

UU[1, J] = U;

END;

OUTPUT;

END;

RUN;

Vince28_Statcan · Posted 11-18-2013 08:23 AM

Hrm if you wanted to access &NVAR within the same data step in which the call symput was made, this is an issue with compilation vs execution. call symput statements create "piles" of macro statements that are only compiled after the execution of a given data step. Hence, &NVAR defined by the call symput only after the entire data step is run. This is similar, conceptually to how call execute macros are only ran after the data step making it impossible with this approach to use the results of the macro you executed with call execute within the same data step.

With SAS 9.2+ it is possible to go around this issue with proc FCMP and run_macro but it is rarely necessary and requires a lot of wrapping. Most often it simply means that you either do not fully grasp compilation vs execution and the timing complexity between macro code and non-macro code.

I don't think the %let NVAR=&NVAR; statement actually achieved anything in your case. It is normally only used if there were leading/trailing spaces in the macro variable and that you need to "strip" them because you deref (&) the macro variable in double quotes as a string within a data step/proc.

feyzi · Posted 11-18-2013 10:04 AM

Perfect again Vince.

DATA _NULL_;

SET CORREL NOBS=N;

CALL SYMPUT('NVAR', STRIP(N));

RUN;

Thanks a lot.

Emre

PROC FCMP Subsection of Dynamic Array

Re: PROC FCMP Subsection of Dynamic Array

Re: PROC FCMP Subsection of Dynamic Array

Re: PROC FCMP Subsection of Dynamic Array

Re: PROC FCMP Subsection of Dynamic Array

Re: PROC FCMP Subsection of Dynamic Array

Re: PROC FCMP Subsection of Dynamic Array

Re: PROC FCMP Subsection of Dynamic Array

Re: PROC FCMP Subsection of Dynamic Array

Re: PROC FCMP Subsection of Dynamic Array

Re: PROC FCMP Subsection of Dynamic Array

Re: PROC FCMP Subsection of Dynamic Array

Re: PROC FCMP Subsection of Dynamic Array

Re: PROC FCMP Subsection of Dynamic Array

Re: PROC FCMP Subsection of Dynamic Array

Re: PROC FCMP Subsection of Dynamic Array

Registration is open

SAS Training: Just a Click Away