BookmarkSubscribeRSS Feed
SvenCrona
Calcite | Level 5

Hi. I have a program that takes all cases from a data set and assigns them to 5 equal sized groups. Now, i want the program to do this a large amount of times so that i can assess the variance of a value in a specific group e.t.c.

The code is like this:

data SASUSER.HEJ;

  set SASUSER.HEJ;

  srt=ranuni(999890);

run;

proc sort data=SASUSER.HEJ out=SASUSER.OUT(drop=srt); by srt; run;

data group1 group2 group3 group4 group5;

   retain cnt ;

   set SASUSER.OUT  nobs=nobs;

   if _n_=1 then cnt= int(nobs/5);

   if _n_<= cnt  then output group1;

   else if _n_<= 2*cnt  then output group2;

   else if _n_<= 3*cnt  then output group3;

   else if _n_<= 4*cnt  then output group4;

   else                     output group5;

   drop cnt;

run;

 

In order to run this, say 10k times, i think i should make the names of the datasets (groups) partially determined by a macro variable and run a loop for 10k times which changes the dataset names by 1 for every iteration, so that the names changes from group1_1 to group1_2 to ... group 1_1000 e.t.c.

The problem is that i can't figure out how to program the loop correctly. Everything i try gives me different errors. And in the case when i don't get errors, the loop seems to just stop at the first iteration.

Any input on this would be very much appreciated.

/Sven

13 REPLIES 13
joehinson
Calcite | Level 5

Hello Sven,

About the input dataset.

You have initially:

set SASUSER.HEJ;

What do you have in SASUSER.HEJ to begin with?

Do you want to generate a single random number in sasuser.hej or a set of random numbers in that dataset?

ballardw
Super User

%macro doit(n=);

%do loop= 1 %to &n;

data SASUSER.HEJ;

  set SASUSER.HEJ;

  srt=ranuni(&loop); /* change the seed or the output is going to be the same*/

run;

proc sort data=SASUSER.HEJ out=SASUSER.OUT(drop=srt); by srt; run;

data group1_&loop group2_&loop group3_&loop group4_&loop group5_&loop;

   retain cnt ;

   set SASUSER.OUT  nobs=nobs;

   if _n_=1 then cnt= int(nobs/5);

   if _n_<= cnt  then output group1_&loop;

   else if _n_<= 2*cnt  then output group2_&loop;

   else if _n_<= 3*cnt  then output group3_&loop;

   else if _n_<= 4*cnt  then output group4_&loop;

   else                     output group5_&loop;

   drop cnt;

run;

%end;

%mend;

%doit(n=3); /* to make 3 sets of loops*/

Should create the output data sets. How you want to to the analysis you haven't provided enough information.


Linlin
Lapis Lazuli | Level 10

A sample code:

%let n=10;

%macro test;
%do i=1 %to &n;
data HEJ;
  set sashelp.class;
  srt=ranuni(2*&i);
run;
proc sort data=HEJ out=OUT(drop=srt); by srt; run;

data group1_&i group2_&i group3_&i group4_&i group5_&i;
   retain cnt ;
   set OUT  nobs=nobs;
   if _n_=1 then cnt= int(nobs/5);
   if _n_<= cnt  then output group1_&i;
   else if _n_<= 2*cnt  then output group2_&i;
   else if _n_<= 3*cnt  then output group3_&i;
   else if _n_<= 4*cnt  then output group4_&i;
   else                     output group5_&i;
   drop cnt;
run;
%end;
%mend;
%test

DF
Fluorite | Level 6 DF
Fluorite | Level 6

I could be misunderstanding your requirements, but you can use the hash object to output to datasets arbitrarily, without having to use macros.  The main downside is that the dataset must be sorted by the group number, but you should be able to to this while maintaining the relative order within each group if that's important...

The below example generates some random data at the start, and appends a group number (filenum) field as it goes.  This field is included in the ultimate output, but if required we should be able to remove it.

data Test;

format key 8. x1-x4 y1-y4 21.3 filenum 8.;

array n

  • x1-x4 y1-y4;
  • filenum = 0;

    do key = 1 to 100000;

        do i = 1 to dim(n);

            n = round(ranuni(-1)*50000,0.001);

        end;

        filenum + 1;

        if filenum > 5 then filenum = 1;

        output;

    end;

    drop i;

    run;

    proc sort data=Test;

    by filenum;

    run;

    proc sql;

    select

    name

    into :fieldlist separated by "','"

    from

    sashelp.vcolumn

    where

    libname='WORK' /*change to your library*/

    and memname='TEST' /*change to dataset*/

    ;

    quit;

    data _null_;

    call symput("fieldlist","'&fieldlist.'");

    run;

    %put &fieldlist.;

    data _null_;

    if 0 then set Test; /*set up PDV with formats, but load no data*/

    declare hash h (ordered:'a');

    h.defineKey(&fieldlist.);

    h.defineData(&fieldlist.);

    h.defineDone();

    do until (last.filenum);

        set Test;

        by filenum;

        h.add();

    end;

    h.output(dataset: 'group' || strip(put(filenum,8.)));

    run;

    If you want more groups split out, then just increase the value that filenum resets at.  As long as the "output" line in the final data step constructs a valid sas dataset name, then you can use this method to output any number of datasets you like, without specifying them in code.

    Of course your analysis will then require a macro to run your proc means (etc.) against all of those datasets, but hopefully this will be tidier for creating the datasets in the first place.

    art297
    Opal | Level 21

    Sven,

    I, too, could be misunderstanding what you want, but wouldn't it be easier to analyze if you created one file that contained all of the group assignments over your 10,000 trials/simulations?  Does the following result in what you want to achieve?

    %let n=1000;

    %macro combsort (arr1 =, arr2=,  order= <);

      drop __:;

      do __g = hbound (&arr1) - 1 by 0 while (__s or __g > 1);

        __g = int (__g / 1.3);

        if      __g in (0    ) then __g =  1;

        else if __g in (9, 10) then __g = 11;

        __s = 0;

        do __j = lbound (&arr1) to hbound (&arr1) - __g;

           __k = __j + __g;

           if &arr1[__j] &order &arr1[__k] then continue;

           __t      = &arr1[__j];

           &arr1[__j] = &arr1[__k];

           &arr1[__k] = __t;

           __t      = &arr2[__j];

           &arr2[__j] = &arr2[__k];

           &arr2[__k] = __t;

           __s = 1;

        end;

      end;

    %mend;

    data want (keep=trial group:);

      array values(&n.);

      do until (eof);

        set hej end=eof;

        i+1;

        values(i)=y;

      end;

      array ordering(&n.);

      array group1_(%eval(&n./5));

      array group2_(%eval(&n./5));

      array group3_(%eval(&n./5));

      array group4_(%eval(&n./5));

      array group5_(%eval(&n./5));

      do trial=1 to 10000;

        do i=1 to &n.;

          ordering(i)=ranuni(0);

        end;

        %combsort(arr1=ordering, arr2=values);

        k=0;

        do i=1 to 5;

          do j=1 to %eval(&n./5);

            k+1;

            if i eq 1 then group1_(j)=values(k);

            else if i eq 2 then group2_(j)=values(k);

            else if i eq 3 then group3_(j)=values(k);

            else if i eq 4 then group4_(j)=values(k);

            else if i eq 5 then group5_(j)=values(k);

          end;

        end;

        output;

      end;

    run;

    DF
    Fluorite | Level 6 DF
    Fluorite | Level 6

    A very good point Art.  Depending on the analysis being done, it could be as simple as adding a grouping column (randomly or whatever suits) and then just including that as a classification variabale.  That way SAS will just split the results out by the groups anyway, and not have to bother with complex coding.

    PGStats
    Opal | Level 21

    You would gain in efficiency and simplicity by taking advantege of the BY statement available for almost every procedure in SAS. You could prepare your simulated group assignments like this :

    data simHEJ;
    set SASUSER.HEJ;
    do rep = 1 to 10000;
          group = rand("UNIFORM");
          output;
          end;
    run;

    proc sort data=simHEJ; by rep; run;

    proc rank data=simHEJ out=simHEJgrp groups=5;
    by rep;
    var group;
    run;

    and perform your analysis on simHEJgrp, BY rep. Another advantage is that you could get all results in a single dataset.

    PG

    PG
    Linlin
    Lapis Lazuli | Level 10

    Hi PG,

    How are you doing??

    I ran your code with sashelp.class. Why only less than 16% with group=0?  I thought each group should have around 20%.  Thank you!

    data simHEJ;

    set sashelp.class;

    do rep = 1 to 10000;

          group = rand("UNIFORM");

          output;

          end;

    run;

    proc sort data=simHEJ; by rep; run;

    proc rank data=simHEJ out=simHEJgrp groups=5;

    by rep;

    var group;

    run;

    proc freq data=simHEJgrp ;

      tables group/missing;

    run;

    PGStats
    Opal | Level 21

    Hi Linlin,

    That's because there are only 19 observations in sashelp.class. They get divided into 5 groups of sizes 3,4,4,4,4. 3/19=0.158. With proc rank, the small group is always the same (group 0, in this case) but observations get assigned to that group randomly.

    PG

    PG
    Tom
    Super User Tom
    Super User

    Why not just use PROC SURVEYSELECT?

    PGStats
    Opal | Level 21

    The idea crossed my mind too but I wouldn't know how. I don't think SURVEYSELECT is meant for random assignment problems.

    PG

    PG
    SvenCrona
    Calcite | Level 5

    Thank you everyone, your examples have been tremendously helpful. I stuck with your solution PGStats as the code was the simplest.

    /Sven

    SAS Innovate 2025: Save the Date

     SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

    Save the date!

    How to Concatenate Values

    Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

    Find more tutorials on the SAS Users YouTube channel.

    SAS Training: Just a Click Away

     Ready to level-up your skills? Choose your own adventure.

    Browse our catalog!

    Discussion stats
    • 13 replies
    • 2344 views
    • 4 likes
    • 8 in conversation