BookmarkSubscribeRSS Feed
kthenaj
Fluorite | Level 6

Hello all,

I am working on a macro in which I perform calculations on subsets of variables from a dataset. Below is the code and as you can see I've manually duplicated the subset calculations, I've done pairs, groups of 3, 4, 5. What I want to do is make my code a little easier so it handles any number of subsets. It works just fine the way it is, I just think there must be an easier/shorter/cleaner way. I am limited in that the output needs to have the same notation y_##, because as  you can see in the final lines a dataset is created just containing variables with the prefix "y_" because they are the only ones of interest in the end. Let me know if further explanation is needed (a sample data set is also supplied below). Thanks!

data pfisher;

set fisher;

/* pairs of two */

%do i=1 %to &k;

%do j=&i+1 %to &k;

  py&i&j = b&i + b&j;

  pymin&i&j = min&i + min&j;

  pymax&i&j = max&i + max&j;

  y_&i&j = (py&i&j-pymin&i&j)/(pymax&i&j-pymin&i&j);

%end; %end;

/* groups of 3 */

%do i=1 %to &k;

%do j=&i+1 %to &k;

%do l=&j+1 %to &k;

  py&i&j&l = b&i + b&j + b&l;

  pymin&i&j&l = min&i + min&j + min&l;

  pymax&i&j&l = max&i + max&j + max&l;

  y_&i&j&l = (py&i&j&l-pymin&i&j&l)/(pymax&i&j&l-pymin&i&j&l);

%end; %end; %end;

/* groups of 4 */

%do i=1 %to &k;

%do j=&i+1 %to &k;

%do l=&j+1 %to &k;

%do m=&l+1 %to &k;

  py&i&j&l&m = b&i + b&j + b&l + b&m;

  pymin&i&j&l&m = min&i + min&j + min&l + min&m;

  pymax&i&j&l&m = max&i + max&j + max&l + max&m;

  y_&i&j&l&m = (py&i&j&l&m-pymin&i&j&l&m)/(pymax&i&j&l&m-pymin&i&j&l&m);

%end; %end; %end; %end;

/* groups of 5 */

%do i=1 %to &k;

%do j=&i+1 %to &k;

%do l=&j+1 %to &k;

%do m=&l+1 %to &k;

%do n=&m+1 %to &k;

  py&i&j&l&m&n = b&i + b&j + b&l + b&m +b&n;

  pymin&i&j&l&m&n = min&i + min&j + min&l + min&m + min&n;

  pymax&i&j&l&m&n = max&i + max&j + max&l + max&m + max&n;

  y_&i&j&l&m&n = (py&i&j&l&m&n-pymin&i&j&l&m&n)/(pymax&i&j&l&m&n-pymin&i&j&l&m&n);

%end; %end; %end; %end; %end;

/**** need to extend to any possible subset size *****/

data pydefs;

set pfisher;

keep y_:;

run;

Here is an example of a data set that the macro would be used on where k = 4 (and thus the group of 5 wouldn't be applicable):

data fisher;

  do i = 1 to 10;

  call streaminit(123);

  u1 = rand("Uniform");

  b1 = ceil(5*u1);

  min1 = ceil(3*u1);

  max1 = ceil(2*u1);

  call streaminit(456);

  u2 = rand("Uniform");

  b2 = ceil(5*u2);

  min2 = ceil(3*u2);

  max2 = ceil(2*u2);

  call streaminit(789);

  u3 = rand("Uniform");

  b3 = ceil(5*u3);

  min3 = ceil(3*u3);

  max3 = ceil(2*u3);

  call streaminit(235);

  u4 = rand("Uniform");

  b4 = ceil(5*u4);

  min4 = ceil(3*u4);

  max4 = ceil(2*u4);

  output;

  end;

drop u1 u2 u3 u4 i;

run;

6 REPLIES 6
art297
Opal | Level 21

Given your example data you end up with some cases where py..max equals py..min equals 0 and then is used as a divisor, resulting in a division by 0 warning and a missing value being assigned.

Is that the correct result or am I missing something?

kthenaj
Fluorite | Level 6

Sorry, I didn't run that particular example data set before I posted. This should fix that problem:

data fisher;

  do i = 1 to 10;

  call streaminit(123);

  u1 = rand("Uniform");

  b1 = ceil(5*u1);

  min1 = ceil(3*u1);

  max1 = ceil(2*u1);

  call streaminit(456);

  u2 = rand("Uniform");

  b2 = ceil(5*u2);

  min2 = ceil(3*u2);

  max2 = ceil(2*u2);

  call streaminit(789);

  u3 = rand("Uniform");

  b3 = ceil(5*u3);

  min3 = ceil(3*u3);

  max3 = ceil(2*u3);

  call streaminit(235);

  u4 = rand("Uniform");

  b4 = ceil(5*u4);

  min4 = ceil(3*u4);

  max4 = ceil(2*u4);

  output;

  end;

drop u1 u2 u3 u4 i;

run;

%macro test(k);

data pfisher;

set fisher;

/* pairs of two */

%do i=1 %to &k;

%do j=&i+1 %to &k;

  py&i&j = b&i + b&j;

  pymin&i&j = min&i + min&j;

  pymax&i&j = max&i + max&j;

  y_&i&j = (py&i&j-pymin&i&j)/(pymax&i&j-pymin&i&j+0.0001);

%end; %end;

/* groups of 3 */

%do i=1 %to &k;

%do j=&i+1 %to &k;

%do l=&j+1 %to &k;

  py&i&j&l = b&i + b&j + b&l;

  pymin&i&j&l = min&i + min&j + min&l;

  pymax&i&j&l = max&i + max&j + max&l;

  y_&i&j&l = (py&i&j&l-pymin&i&j&l)/(pymax&i&j&l-pymin&i&j&l+0.0001);

%end; %end; %end;

/* groups of 4 */

%do i=1 %to &k;

%do j=&i+1 %to &k;

%do l=&j+1 %to &k;

%do m=&l+1 %to &k;

  py&i&j&l&m = b&i + b&j + b&l + b&m;

  pymin&i&j&l&m = min&i + min&j + min&l + min&m;

  pymax&i&j&l&m = max&i + max&j + max&l + max&m;

  y_&i&j&l&m = (py&i&j&l&m-pymin&i&j&l&m)/(pymax&i&j&l&m-pymin&i&j&l&m+0.0001);

%end; %end; %end; %end;

/* groups of 5 */

%do i=1 %to &k;

%do j=&i+1 %to &k;

%do l=&j+1 %to &k;

%do m=&l+1 %to &k;

%do n=&m+1 %to &k;

  py&i&j&l&m&n = b&i + b&j + b&l + b&m +b&n;

  pymin&i&j&l&m&n = min&i + min&j + min&l + min&m + min&n;

  pymax&i&j&l&m&n = max&i + max&j + max&l + max&m + max&n;

  y_&i&j&l&m&n = (py&i&j&l&m&n-pymin&i&j&l&m&n)/(pymax&i&j&l&m&n-pymin&i&j&l&m&n+0.0001);

%end; %end; %end; %end; %end;

/**** need to extend to any possible subset size *****/

run;

data pydefs;

set pfisher;

keep y_:;

run;

%mend;

%test(k=4);

art297
Opal | Level 21

: You said that you would like it to work for any number of variables.  Is there a max number of variables that it should be able to work for?  I ask because, if there isn't, the code would be a little more complicated to develop.  If there is, what would that number be?

kthenaj
Fluorite | Level 6

I think k = 10 would be a reasonable maximum.

art297
Opal | Level 21

I only took it out to six iterations, but you could easily expand it to 10.

The only things I really did was reduce the number of loops involved, used a proc sql call to identify the maximum number of variables, kept the %do variables to a range beginning with i, and limited whether code would run (and variables created) based on the maximum number of variables.

If you wanted to generalize the code, you could write a fairly simple data _null_ step that wrote the code as an include file.

/*determine the value of k*/

proc sql noprint;

  create table temp as

    select name

      from dictionary.columns

        where libname="WORK" and

              memname="FISHER" and

              upcase(name) like "MAX%"

  ;

quit;

%let maxvars=&sqlobs.;

%macro newtest;

  data pfisher;

    set fisher;

    %if &maxvars. ge 2 %then %do;

      %do i=1 %to &maxvars.;

        %do j=&i+1 %to &maxvars.;

          py&i&j = b&i + b&j;

          pymin&i&j = min&i + min&j;

          pymax&i&j = max&i + max&j;

          y_&i&j = (py&i&j-pymin&i&j)/(pymax&i&j-pymin&i&j+0.0001);

          %if &maxvars. ge 3 %then %do k=&j+1 %to &maxvars.;

            py&i&j&k = b&i + b&j + b&k;

            pymin&i&j&k = min&i + min&j + min&k;

            pymax&i&j&k = max&i + max&j + max&k;

            y_&i&j&k = (py&i&j&k-pymin&i&j&k)/(pymax&i&j&k-pymin&i&j&k+0.0001);

            %if &maxvars. ge 4 %then %do l=&k+1 %to &maxvars.;

              py&i&j&k&l = b&i + b&j + b&k + b&l;

              pymin&i&j&k&l = min&i + min&j + min&k + min&l;

              pymax&i&j&k&l = max&i + max&j + max&k + max&l;

              y_&i&j&k&l = (py&i&j&k&l-pymin&i&j&k&l)/(pymax&i&j&k&l-pymin&i&j&k&l+0.0001);

              %if &maxvars. ge 5 %then %do m=&l+1 %to &maxvars.;

                py&i&j&k&l&m = b&i + b&j + b&k + b&l +b&m;

                pymin&i&j&k&l&m = min&i + min&j + min&k + min&l + min&m;

                pymax&i&j&k&l&m = max&i + max&j + max&k + max&l + max&m;

                y_&i&j&k&l&m = (py&i&j&k&l&m-pymin&i&j&k&l&m)/(pymax&i&j&k&l&m-pymin&i&j&k&l&m+0.0001);

                %if &maxvars. ge 6 %then %do n=&m+1 %to &maxvars.;

                  py&i&j&k&l&m&n = b&i + b&j + b&k + b&l +b&m;

                  pymin&i&j&k&l&m&n = min&i + min&j + min&k + min&l + min&m + min&n;

                  pymax&i&j&k&l&m&n = max&i + max&j + max&k + max&l + max&m + max&n;

                  y_&i&j&k&l&m&n = (py&i&j&k&l&m&n-pymin&i&j&k&l&m&n)/(pymax&i&j&k&l&m&n-pymin&i&j&k&l&m&n+0.0001);

                %end;

              %end;

            %end;

          %end;

        %end;

      %end;

    %end;

  run;

  data pydefs;

    set pfisher;

    keep y_:;

  run;

%mend newtest;

%newtest

art297
Opal | Level 21

: Here is some code that will cover all possibilities.  You, of course, are responsible for insuring that it runs correctly.  It ran for me and appears to produce the same results as your original code:

/*specify input filename*/

%let fname=fisher;

/*create some sample data*/

data &fname.;

  do i = 1 to 10;

  call streaminit(123);

  u1 = rand("Uniform");

  b1 = ceil(5*u1);

  min1 = ceil(3*u1);

  max1 = ceil(2*u1);

  call streaminit(456);

  u2 = rand("Uniform");

  b2 = ceil(5*u2);

  min2 = ceil(3*u2);

  max2 = ceil(2*u2);

 

  call streaminit(789);

  u3 = rand("Uniform");

  b3 = ceil(5*u3);

  min3 = ceil(3*u3);

  max3 = ceil(2*u3);

  call streaminit(235);

  u4 = rand("Uniform");

  b4 = ceil(5*u4);

  min4 = ceil(3*u4);

  max4 = ceil(2*u4);

 

  output;

  end;

drop u1 u2 u3 u4 i;

run;

/*determine the value of maxvars*/

proc sql noprint;

  create table temp as

    select name

      from dictionary.columns

        where libname="WORK" and

              memname="%sysfunc(upcase(&fname.))" and

              upcase(name) like "MAX%"

  ;

quit;

%let maxvars=&sqlobs.;

/*create the code*/

filename sascode temp;

data _null_;

  length code stem part2 part3 part4 part5 end $255;

  file sascode;

  do i=105 to %eval(104+&maxvars.);

    part1=byte(i);

    if i eq 105 then do;

      part2='=1 %to &maxvars.;';

      stem=catt('&',byte(i));

      part3=catt('=b&',byte(i));

      part4=catt('=min&',byte(i));

      part5=catt('=max&',byte(i));

      end='%end;';

      code='%macro create_code;';

      put code;

      code='data p&fname.;';

      put code;

      code='set &fname.;';

      put code;

    end;

    else do;

      part2=catt('=&',byte(i-1),'+1 %to &maxvars.;');

      stem=catt(stem,'&',byte(i));

      part3=catt(part3,'+b&',byte(i));

      part4=catt(part4,'+min&',byte(i));

      part5=catt(part5,'+max&',byte(i));

      end=catt(end,'%end;');

    end;

    code='%do '||part1||part2;

    put code;

    if i gt 105 then do;

      code=catt('py',stem,part3,';');

      put code;

      code=catt('pymin',stem,part4,';');

      put code;

      code=catt('pymax',stem,part5,';');

      put code;

      code=catt('y_',stem,'=(py',stem,'-pymin',stem,

       ')/(pymax',stem,'-pymin',stem,'+0.0001);');

      put code;

    end;

    if i eq %eval(104+&maxvars.) then do;

      put end;

      code='run;';

      put code;

      code='%mend create_code;';

      put code;

    end;

  end;

run;

/*include and run the created code*/

%include sascode /source2;

%create_code

data pydefs;

  set p&fname.;;

  keep y_:;

run;

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 965 views
  • 0 likes
  • 2 in conversation