DATA Step, Macro, Functions and more

Randomly assigning cases from a data set to different groups repeated times

Reply
Occasional Contributor
Posts: 5

Randomly assigning cases from a data set to different groups repeated times

Hi. I have a program that takes all cases from a data set and assigns them to 5 equal sized groups. Now, i want the program to do this a large amount of times so that i can assess the variance of a value in a specific group e.t.c.

The code is like this:

data SASUSER.HEJ;

  set SASUSER.HEJ;

  srt=ranuni(999890);

run;

proc sort data=SASUSER.HEJ out=SASUSER.OUT(drop=srt); by srt; run;

data group1 group2 group3 group4 group5;

   retain cnt ;

   set SASUSER.OUT  nobs=nobs;

   if _n_=1 then cnt= int(nobs/5);

   if _n_<= cnt  then output group1;

   else if _n_<= 2*cnt  then output group2;

   else if _n_<= 3*cnt  then output group3;

   else if _n_<= 4*cnt  then output group4;

   else                     output group5;

   drop cnt;

run;

 

In order to run this, say 10k times, i think i should make the names of the datasets (groups) partially determined by a macro variable and run a loop for 10k times which changes the dataset names by 1 for every iteration, so that the names changes from group1_1 to group1_2 to ... group 1_1000 e.t.c.

The problem is that i can't figure out how to program the loop correctly. Everything i try gives me different errors. And in the case when i don't get errors, the loop seems to just stop at the first iteration.

Any input on this would be very much appreciated.

/Sven

Contributor
Posts: 45

Re: Randomly assigning cases from a data set to different groups repeated times

Hello Sven,

About the input dataset.

You have initially:

set SASUSER.HEJ;

What do you have in SASUSER.HEJ to begin with?

Do you want to generate a single random number in sasuser.hej or a set of random numbers in that dataset?

Super User
Posts: 10,500

Re: Randomly assigning cases from a data set to different groups repeated times

%macro doit(n=);

%do loop= 1 %to &n;

data SASUSER.HEJ;

  set SASUSER.HEJ;

  srt=ranuni(&loop); /* change the seed or the output is going to be the same*/

run;

proc sort data=SASUSER.HEJ out=SASUSER.OUT(drop=srt); by srt; run;

data group1_&loop group2_&loop group3_&loop group4_&loop group5_&loop;

   retain cnt ;

   set SASUSER.OUT  nobs=nobs;

   if _n_=1 then cnt= int(nobs/5);

   if _n_<= cnt  then output group1_&loop;

   else if _n_<= 2*cnt  then output group2_&loop;

   else if _n_<= 3*cnt  then output group3_&loop;

   else if _n_<= 4*cnt  then output group4_&loop;

   else                     output group5_&loop;

   drop cnt;

run;

%end;

%mend;

%doit(n=3); /* to make 3 sets of loops*/

Should create the output data sets. How you want to to the analysis you haven't provided enough information.


Super Contributor
Posts: 1,636

Re: Randomly assigning cases from a data set to different groups repeated times

A sample code:

%let n=10;

%macro test;
%do i=1 %to &n;
data HEJ;
  set sashelp.class;
  srt=ranuni(2*&i);
run;
proc sort data=HEJ out=OUT(drop=srt); by srt; run;

data group1_&i group2_&i group3_&i group4_&i group5_&i;
   retain cnt ;
   set OUT  nobs=nobs;
   if _n_=1 then cnt= int(nobs/5);
   if _n_<= cnt  then output group1_&i;
   else if _n_<= 2*cnt  then output group2_&i;
   else if _n_<= 3*cnt  then output group3_&i;
   else if _n_<= 4*cnt  then output group4_&i;
   else                     output group5_&i;
   drop cnt;
run;
%end;
%mend;
%test

Frequent Contributor
Frequent Contributor
Posts: 94

Re: Randomly assigning cases from a data set to different groups repeated times

I could be misunderstanding your requirements, but you can use the hash object to output to datasets arbitrarily, without having to use macros.  The main downside is that the dataset must be sorted by the group number, but you should be able to to this while maintaining the relative order within each group if that's important...

The below example generates some random data at the start, and appends a group number (filenum) field as it goes.  This field is included in the ultimate output, but if required we should be able to remove it.

data Test;

format key 8. x1-x4 y1-y4 21.3 filenum 8.;

array n

  • x1-x4 y1-y4;
  • filenum = 0;

    do key = 1 to 100000;

        do i = 1 to dim(n);

            n = round(ranuni(-1)*50000,0.001);

        end;

        filenum + 1;

        if filenum > 5 then filenum = 1;

        output;

    end;

    drop i;

    run;

    proc sort data=Test;

    by filenum;

    run;

    proc sql;

    select

    name

    into :fieldlist separated by "','"

    from

    sashelp.vcolumn

    where

    libname='WORK' /*change to your library*/

    and memname='TEST' /*change to dataset*/

    ;

    quit;

    data _null_;

    call symput("fieldlist","'&fieldlist.'");

    run;

    %put &fieldlist.;

    data _null_;

    if 0 then set Test; /*set up PDV with formats, but load no data*/

    declare hash h (ordered:'a');

    h.defineKey(&fieldlist.);

    h.defineData(&fieldlist.);

    h.defineDone();

    do until (last.filenum);

        set Test;

        by filenum;

        h.add();

    end;

    h.output(dataset: 'group' || strip(put(filenum,8.)));

    run;

    If you want more groups split out, then just increase the value that filenum resets at.  As long as the "output" line in the final data step constructs a valid sas dataset name, then you can use this method to output any number of datasets you like, without specifying them in code.

    Of course your analysis will then require a macro to run your proc means (etc.) against all of those datasets, but hopefully this will be tidier for creating the datasets in the first place.

    PROC Star
    Posts: 7,363

    Re: Randomly assigning cases from a data set to different groups repeated times

    Sven,

    I, too, could be misunderstanding what you want, but wouldn't it be easier to analyze if you created one file that contained all of the group assignments over your 10,000 trials/simulations?  Does the following result in what you want to achieve?

    %let n=1000;

    %macro combsort (arr1 =, arr2=,  order= <);

      drop __:;

      do __g = hbound (&arr1) - 1 by 0 while (__s or __g > 1);

        __g = int (__g / 1.3);

        if      __g in (0    ) then __g =  1;

        else if __g in (9, 10) then __g = 11;

        __s = 0;

        do __j = lbound (&arr1) to hbound (&arr1) - __g;

           __k = __j + __g;

           if &arr1[__j] &order &arr1[__k] then continue;

           __t      = &arr1[__j];

           &arr1[__j] = &arr1[__k];

           &arr1[__k] = __t;

           __t      = &arr2[__j];

           &arr2[__j] = &arr2[__k];

           &arr2[__k] = __t;

           __s = 1;

        end;

      end;

    %mend;

    data want (keep=trial groupSmiley Happy;

      array values(&n.);

      do until (eof);

        set hej end=eof;

        i+1;

        values(i)=y;

      end;

      array ordering(&n.);

      array group1_(%eval(&n./5));

      array group2_(%eval(&n./5));

      array group3_(%eval(&n./5));

      array group4_(%eval(&n./5));

      array group5_(%eval(&n./5));

      do trial=1 to 10000;

        do i=1 to &n.;

          ordering(i)=ranuni(0);

        end;

        %combsort(arr1=ordering, arr2=values);

        k=0;

        do i=1 to 5;

          do j=1 to %eval(&n./5);

            k+1;

            if i eq 1 then group1_(j)=values(k);

            else if i eq 2 then group2_(j)=values(k);

            else if i eq 3 then group3_(j)=values(k);

            else if i eq 4 then group4_(j)=values(k);

            else if i eq 5 then group5_(j)=values(k);

          end;

        end;

        output;

      end;

    run;

    Frequent Contributor
    Frequent Contributor
    Posts: 94

    Re: Randomly assigning cases from a data set to different groups repeated times

    A very good point Art.  Depending on the analysis being done, it could be as simple as adding a grouping column (randomly or whatever suits) and then just including that as a classification variabale.  That way SAS will just split the results out by the groups anyway, and not have to bother with complex coding.

    Respected Advisor
    Posts: 4,649

    Re: Randomly assigning cases from a data set to different groups repeated times

    You would gain in efficiency and simplicity by taking advantege of the BY statement available for almost every procedure in SAS. You could prepare your simulated group assignments like this :

    data simHEJ;
    set SASUSER.HEJ;
    do rep = 1 to 10000;
          group = rand("UNIFORM");
          output;
          end;
    run;

    proc sort data=simHEJ; by rep; run;

    proc rank data=simHEJ out=simHEJgrp groups=5;
    by rep;
    var group;
    run;

    and perform your analysis on simHEJgrp, BY rep. Another advantage is that you could get all results in a single dataset.

    PG

    PG
    Super Contributor
    Posts: 1,636

    Re: Randomly assigning cases from a data set to different groups repeated times

    Hi PG,

    How are you doing??

    I ran your code with sashelp.class. Why only less than 16% with group=0?  I thought each group should have around 20%.  Thank you!

    data simHEJ;

    set sashelp.class;

    do rep = 1 to 10000;

          group = rand("UNIFORM");

          output;

          end;

    run;

    proc sort data=simHEJ; by rep; run;

    proc rank data=simHEJ out=simHEJgrp groups=5;

    by rep;

    var group;

    run;

    proc freq data=simHEJgrp ;

      tables group/missing;

    run;

    Respected Advisor
    Posts: 4,649

    Re: Randomly assigning cases from a data set to different groups repeated times

    Hi Linlin,

    That's because there are only 19 observations in sashelp.class. They get divided into 5 groups of sizes 3,4,4,4,4. 3/19=0.158. With proc rank, the small group is always the same (group 0, in this case) but observations get assigned to that group randomly.

    PG

    PG
    Super Contributor
    Posts: 1,636

    Re: Randomly assigning cases from a data set to different groups repeated times

    Thank you!

    Super User
    Super User
    Posts: 6,500

    Re: Randomly assigning cases from a data set to different groups repeated times

    Why not just use PROC SURVEYSELECT?

    Respected Advisor
    Posts: 4,649

    Re: Randomly assigning cases from a data set to different groups repeated times

    The idea crossed my mind too but I wouldn't know how. I don't think SURVEYSELECT is meant for random assignment problems.

    PG

    PG
    Occasional Contributor
    Posts: 5

    Re: Randomly assigning cases from a data set to different groups repeated times

    Thank you everyone, your examples have been tremendously helpful. I stuck with your solution PGStats as the code was the simplest.

    /Sven

    Ask a Question
    Discussion stats
    • 13 replies
    • 476 views
    • 4 likes
    • 8 in conversation