DATA Step, Macro, Functions and more

Is there an easier way to do my do-loops? (it's within a macro)

Reply
Occasional Contributor
Posts: 13

Is there an easier way to do my do-loops? (it's within a macro)

Hello all,

I am working on a macro in which I perform calculations on subsets of variables from a dataset. Below is the code and as you can see I've manually duplicated the subset calculations, I've done pairs, groups of 3, 4, 5. What I want to do is make my code a little easier so it handles any number of subsets. It works just fine the way it is, I just think there must be an easier/shorter/cleaner way. I am limited in that the output needs to have the same notation y_##, because as  you can see in the final lines a dataset is created just containing variables with the prefix "y_" because they are the only ones of interest in the end. Let me know if further explanation is needed (a sample data set is also supplied below). Thanks!

data pfisher;

set fisher;

/* pairs of two */

%do i=1 %to &k;

%do j=&i+1 %to &k;

  py&i&j = b&i + b&j;

  pymin&i&j = min&i + min&j;

  pymax&i&j = max&i + max&j;

  y_&i&j = (py&i&j-pymin&i&j)/(pymax&i&j-pymin&i&j);

%end; %end;

/* groups of 3 */

%do i=1 %to &k;

%do j=&i+1 %to &k;

%do l=&j+1 %to &k;

  py&i&j&l = b&i + b&j + b&l;

  pymin&i&j&l = min&i + min&j + min&l;

  pymax&i&j&l = max&i + max&j + max&l;

  y_&i&j&l = (py&i&j&l-pymin&i&j&l)/(pymax&i&j&l-pymin&i&j&l);

%end; %end; %end;

/* groups of 4 */

%do i=1 %to &k;

%do j=&i+1 %to &k;

%do l=&j+1 %to &k;

%do m=&l+1 %to &k;

  py&i&j&l&m = b&i + b&j + b&l + b&m;

  pymin&i&j&l&m = min&i + min&j + min&l + min&m;

  pymax&i&j&l&m = max&i + max&j + max&l + max&m;

  y_&i&j&l&m = (py&i&j&l&m-pymin&i&j&l&m)/(pymax&i&j&l&m-pymin&i&j&l&m);

%end; %end; %end; %end;

/* groups of 5 */

%do i=1 %to &k;

%do j=&i+1 %to &k;

%do l=&j+1 %to &k;

%do m=&l+1 %to &k;

%do n=&m+1 %to &k;

  py&i&j&l&m&n = b&i + b&j + b&l + b&m +b&n;

  pymin&i&j&l&m&n = min&i + min&j + min&l + min&m + min&n;

  pymax&i&j&l&m&n = max&i + max&j + max&l + max&m + max&n;

  y_&i&j&l&m&n = (py&i&j&l&m&n-pymin&i&j&l&m&n)/(pymax&i&j&l&m&n-pymin&i&j&l&m&n);

%end; %end; %end; %end; %end;

/**** need to extend to any possible subset size *****/

data pydefs;

set pfisher;

keep y_:;

run;

Here is an example of a data set that the macro would be used on where k = 4 (and thus the group of 5 wouldn't be applicable):

data fisher;

  do i = 1 to 10;

  call streaminit(123);

  u1 = rand("Uniform");

  b1 = ceil(5*u1);

  min1 = ceil(3*u1);

  max1 = ceil(2*u1);

  call streaminit(456);

  u2 = rand("Uniform");

  b2 = ceil(5*u2);

  min2 = ceil(3*u2);

  max2 = ceil(2*u2);

  call streaminit(789);

  u3 = rand("Uniform");

  b3 = ceil(5*u3);

  min3 = ceil(3*u3);

  max3 = ceil(2*u3);

  call streaminit(235);

  u4 = rand("Uniform");

  b4 = ceil(5*u4);

  min4 = ceil(3*u4);

  max4 = ceil(2*u4);

  output;

  end;

drop u1 u2 u3 u4 i;

run;

PROC Star
Posts: 7,358

Re: Is there an easier way to do my do-loops? (it's within a macro)

Given your example data you end up with some cases where py..max equals py..min equals 0 and then is used as a divisor, resulting in a division by 0 warning and a missing value being assigned.

Is that the correct result or am I missing something?

Occasional Contributor
Posts: 13

Re: Is there an easier way to do my do-loops? (it's within a macro)

Sorry, I didn't run that particular example data set before I posted. This should fix that problem:

data fisher;

  do i = 1 to 10;

  call streaminit(123);

  u1 = rand("Uniform");

  b1 = ceil(5*u1);

  min1 = ceil(3*u1);

  max1 = ceil(2*u1);

  call streaminit(456);

  u2 = rand("Uniform");

  b2 = ceil(5*u2);

  min2 = ceil(3*u2);

  max2 = ceil(2*u2);

  call streaminit(789);

  u3 = rand("Uniform");

  b3 = ceil(5*u3);

  min3 = ceil(3*u3);

  max3 = ceil(2*u3);

  call streaminit(235);

  u4 = rand("Uniform");

  b4 = ceil(5*u4);

  min4 = ceil(3*u4);

  max4 = ceil(2*u4);

  output;

  end;

drop u1 u2 u3 u4 i;

run;

%macro test(k);

data pfisher;

set fisher;

/* pairs of two */

%do i=1 %to &k;

%do j=&i+1 %to &k;

  py&i&j = b&i + b&j;

  pymin&i&j = min&i + min&j;

  pymax&i&j = max&i + max&j;

  y_&i&j = (py&i&j-pymin&i&j)/(pymax&i&j-pymin&i&j+0.0001);

%end; %end;

/* groups of 3 */

%do i=1 %to &k;

%do j=&i+1 %to &k;

%do l=&j+1 %to &k;

  py&i&j&l = b&i + b&j + b&l;

  pymin&i&j&l = min&i + min&j + min&l;

  pymax&i&j&l = max&i + max&j + max&l;

  y_&i&j&l = (py&i&j&l-pymin&i&j&l)/(pymax&i&j&l-pymin&i&j&l+0.0001);

%end; %end; %end;

/* groups of 4 */

%do i=1 %to &k;

%do j=&i+1 %to &k;

%do l=&j+1 %to &k;

%do m=&l+1 %to &k;

  py&i&j&l&m = b&i + b&j + b&l + b&m;

  pymin&i&j&l&m = min&i + min&j + min&l + min&m;

  pymax&i&j&l&m = max&i + max&j + max&l + max&m;

  y_&i&j&l&m = (py&i&j&l&m-pymin&i&j&l&m)/(pymax&i&j&l&m-pymin&i&j&l&m+0.0001);

%end; %end; %end; %end;

/* groups of 5 */

%do i=1 %to &k;

%do j=&i+1 %to &k;

%do l=&j+1 %to &k;

%do m=&l+1 %to &k;

%do n=&m+1 %to &k;

  py&i&j&l&m&n = b&i + b&j + b&l + b&m +b&n;

  pymin&i&j&l&m&n = min&i + min&j + min&l + min&m + min&n;

  pymax&i&j&l&m&n = max&i + max&j + max&l + max&m + max&n;

  y_&i&j&l&m&n = (py&i&j&l&m&n-pymin&i&j&l&m&n)/(pymax&i&j&l&m&n-pymin&i&j&l&m&n+0.0001);

%end; %end; %end; %end; %end;

/**** need to extend to any possible subset size *****/

run;

data pydefs;

set pfisher;

keep y_:;

run;

%mend;

%test(k=4);

PROC Star
Posts: 7,358

Re: Is there an easier way to do my do-loops? (it's within a macro)

: You said that you would like it to work for any number of variables.  Is there a max number of variables that it should be able to work for?  I ask because, if there isn't, the code would be a little more complicated to develop.  If there is, what would that number be?

Occasional Contributor
Posts: 13

Re: Is there an easier way to do my do-loops? (it's within a macro)

I think k = 10 would be a reasonable maximum.

PROC Star
Posts: 7,358

Re: Is there an easier way to do my do-loops? (it's within a macro)

I only took it out to six iterations, but you could easily expand it to 10.

The only things I really did was reduce the number of loops involved, used a proc sql call to identify the maximum number of variables, kept the %do variables to a range beginning with i, and limited whether code would run (and variables created) based on the maximum number of variables.

If you wanted to generalize the code, you could write a fairly simple data _null_ step that wrote the code as an include file.

/*determine the value of k*/

proc sql noprint;

  create table temp as

    select name

      from dictionary.columns

        where libname="WORK" and

              memname="FISHER" and

              upcase(name) like "MAX%"

  ;

quit;

%let maxvars=&sqlobs.;

%macro newtest;

  data pfisher;

    set fisher;

    %if &maxvars. ge 2 %then %do;

      %do i=1 %to &maxvars.;

        %do j=&i+1 %to &maxvars.;

          py&i&j = b&i + b&j;

          pymin&i&j = min&i + min&j;

          pymax&i&j = max&i + max&j;

          y_&i&j = (py&i&j-pymin&i&j)/(pymax&i&j-pymin&i&j+0.0001);

          %if &maxvars. ge 3 %then %do k=&j+1 %to &maxvars.;

            py&i&j&k = b&i + b&j + b&k;

            pymin&i&j&k = min&i + min&j + min&k;

            pymax&i&j&k = max&i + max&j + max&k;

            y_&i&j&k = (py&i&j&k-pymin&i&j&k)/(pymax&i&j&k-pymin&i&j&k+0.0001);

            %if &maxvars. ge 4 %then %do l=&k+1 %to &maxvars.;

              py&i&j&k&l = b&i + b&j + b&k + b&l;

              pymin&i&j&k&l = min&i + min&j + min&k + min&l;

              pymax&i&j&k&l = max&i + max&j + max&k + max&l;

              y_&i&j&k&l = (py&i&j&k&l-pymin&i&j&k&l)/(pymax&i&j&k&l-pymin&i&j&k&l+0.0001);

              %if &maxvars. ge 5 %then %do m=&l+1 %to &maxvars.;

                py&i&j&k&l&m = b&i + b&j + b&k + b&l +b&m;

                pymin&i&j&k&l&m = min&i + min&j + min&k + min&l + min&m;

                pymax&i&j&k&l&m = max&i + max&j + max&k + max&l + max&m;

                y_&i&j&k&l&m = (py&i&j&k&l&m-pymin&i&j&k&l&m)/(pymax&i&j&k&l&m-pymin&i&j&k&l&m+0.0001);

                %if &maxvars. ge 6 %then %do n=&m+1 %to &maxvars.;

                  py&i&j&k&l&m&n = b&i + b&j + b&k + b&l +b&m;

                  pymin&i&j&k&l&m&n = min&i + min&j + min&k + min&l + min&m + min&n;

                  pymax&i&j&k&l&m&n = max&i + max&j + max&k + max&l + max&m + max&n;

                  y_&i&j&k&l&m&n = (py&i&j&k&l&m&n-pymin&i&j&k&l&m&n)/(pymax&i&j&k&l&m&n-pymin&i&j&k&l&m&n+0.0001);

                %end;

              %end;

            %end;

          %end;

        %end;

      %end;

    %end;

  run;

  data pydefs;

    set pfisher;

    keep y_:;

  run;

%mend newtest;

%newtest

PROC Star
Posts: 7,358

Re: Is there an easier way to do my do-loops? (it's within a macro)

: Here is some code that will cover all possibilities.  You, of course, are responsible for insuring that it runs correctly.  It ran for me and appears to produce the same results as your original code:

/*specify input filename*/

%let fname=fisher;

/*create some sample data*/

data &fname.;

  do i = 1 to 10;

  call streaminit(123);

  u1 = rand("Uniform");

  b1 = ceil(5*u1);

  min1 = ceil(3*u1);

  max1 = ceil(2*u1);

  call streaminit(456);

  u2 = rand("Uniform");

  b2 = ceil(5*u2);

  min2 = ceil(3*u2);

  max2 = ceil(2*u2);

 

  call streaminit(789);

  u3 = rand("Uniform");

  b3 = ceil(5*u3);

  min3 = ceil(3*u3);

  max3 = ceil(2*u3);

  call streaminit(235);

  u4 = rand("Uniform");

  b4 = ceil(5*u4);

  min4 = ceil(3*u4);

  max4 = ceil(2*u4);

 

  output;

  end;

drop u1 u2 u3 u4 i;

run;

/*determine the value of maxvars*/

proc sql noprint;

  create table temp as

    select name

      from dictionary.columns

        where libname="WORK" and

              memname="%sysfunc(upcase(&fname.))" and

              upcase(name) like "MAX%"

  ;

quit;

%let maxvars=&sqlobs.;

/*create the code*/

filename sascode temp;

data _null_;

  length code stem part2 part3 part4 part5 end $255;

  file sascode;

  do i=105 to %eval(104+&maxvars.);

    part1=byte(i);

    if i eq 105 then do;

      part2='=1 %to &maxvars.;';

      stem=catt('&',byte(i));

      part3=catt('=b&',byte(i));

      part4=catt('=min&',byte(i));

      part5=catt('=max&',byte(i));

      end='%end;';

      code='%macro create_code;';

      put code;

      code='data p&fname.;';

      put code;

      code='set &fname.;';

      put code;

    end;

    else do;

      part2=catt('=&',byte(i-1),'+1 %to &maxvars.;');

      stem=catt(stem,'&',byte(i));

      part3=catt(part3,'+b&',byte(i));

      part4=catt(part4,'+min&',byte(i));

      part5=catt(part5,'+max&',byte(i));

      end=catt(end,'%end;');

    end;

    code='%do '||part1||part2;

    put code;

    if i gt 105 then do;

      code=catt('py',stem,part3,';');

      put code;

      code=catt('pymin',stem,part4,';');

      put code;

      code=catt('pymax',stem,part5,';');

      put code;

      code=catt('y_',stem,'=(py',stem,'-pymin',stem,

       ')/(pymax',stem,'-pymin',stem,'+0.0001);');

      put code;

    end;

    if i eq %eval(104+&maxvars.) then do;

      put end;

      code='run;';

      put code;

      code='%mend create_code;';

      put code;

    end;

  end;

run;

/*include and run the created code*/

%include sascode /source2;

%create_code

data pydefs;

  set p&fname.;;

  keep y_:;

run;

Ask a Question
Discussion stats
  • 6 replies
  • 285 views
  • 0 likes
  • 2 in conversation