## Randomly assigning cases from a data set to different groups repeated times

Occasional Contributor
Posts: 5

# Randomly assigning cases from a data set to different groups repeated times

Hi. I have a program that takes all cases from a data set and assigns them to 5 equal sized groups. Now, i want the program to do this a large amount of times so that i can assess the variance of a value in a specific group e.t.c.

The code is like this:

data SASUSER.HEJ;

set SASUSER.HEJ;

srt=ranuni(999890);

run;

proc sort data=SASUSER.HEJ out=SASUSER.OUT(drop=srt); by srt; run;

data group1 group2 group3 group4 group5;

retain cnt ;

set SASUSER.OUT  nobs=nobs;

if _n_=1 then cnt= int(nobs/5);

if _n_<= cnt  then output group1;

else if _n_<= 2*cnt  then output group2;

else if _n_<= 3*cnt  then output group3;

else if _n_<= 4*cnt  then output group4;

else                     output group5;

drop cnt;

run;

In order to run this, say 10k times, i think i should make the names of the datasets (groups) partially determined by a macro variable and run a loop for 10k times which changes the dataset names by 1 for every iteration, so that the names changes from group1_1 to group1_2 to ... group 1_1000 e.t.c.

The problem is that i can't figure out how to program the loop correctly. Everything i try gives me different errors. And in the case when i don't get errors, the loop seems to just stop at the first iteration.

Any input on this would be very much appreciated.

/Sven

Contributor
Posts: 45

## Re: Randomly assigning cases from a data set to different groups repeated times

Hello Sven,

You have initially:

set SASUSER.HEJ;

What do you have in SASUSER.HEJ to begin with?

Do you want to generate a single random number in sasuser.hej or a set of random numbers in that dataset?

Super User
Posts: 13,583

## Re: Randomly assigning cases from a data set to different groups repeated times

%macro doit(n=);

%do loop= 1 %to &n;

data SASUSER.HEJ;

set SASUSER.HEJ;

srt=ranuni(&loop); /* change the seed or the output is going to be the same*/

run;

proc sort data=SASUSER.HEJ out=SASUSER.OUT(drop=srt); by srt; run;

data group1_&loop group2_&loop group3_&loop group4_&loop group5_&loop;

retain cnt ;

set SASUSER.OUT  nobs=nobs;

if _n_=1 then cnt= int(nobs/5);

if _n_<= cnt  then output group1_&loop;

else if _n_<= 2*cnt  then output group2_&loop;

else if _n_<= 3*cnt  then output group3_&loop;

else if _n_<= 4*cnt  then output group4_&loop;

else                     output group5_&loop;

drop cnt;

run;

%end;

%mend;

%doit(n=3); /* to make 3 sets of loops*/

Should create the output data sets. How you want to to the analysis you haven't provided enough information.

Super Contributor
Posts: 1,636

## Re: Randomly assigning cases from a data set to different groups repeated times

A sample code:

%let n=10;

%macro test;
%do i=1 %to &n;
data HEJ;
set sashelp.class;
srt=ranuni(2*&i);
run;
proc sort data=HEJ out=OUT(drop=srt); by srt; run;

data group1_&i group2_&i group3_&i group4_&i group5_&i;
retain cnt ;
set OUT  nobs=nobs;
if _n_=1 then cnt= int(nobs/5);
if _n_<= cnt  then output group1_&i;
else if _n_<= 2*cnt  then output group2_&i;
else if _n_<= 3*cnt  then output group3_&i;
else if _n_<= 4*cnt  then output group4_&i;
else                     output group5_&i;
drop cnt;
run;
%end;
%mend;
%test

Frequent Contributor
Posts: 94

## Re: Randomly assigning cases from a data set to different groups repeated times

I could be misunderstanding your requirements, but you can use the hash object to output to datasets arbitrarily, without having to use macros.  The main downside is that the dataset must be sorted by the group number, but you should be able to to this while maintaining the relative order within each group if that's important...

The below example generates some random data at the start, and appends a group number (filenum) field as it goes.  This field is included in the ultimate output, but if required we should be able to remove it.

```data Test;
format key 8. x1-x4 y1-y4 21.3 filenum 8.;
array n x1-x4 y1-y4;
filenum = 0;
do key = 1 to 100000;
do i = 1 to dim(n);
n = round(ranuni(-1)*50000,0.001);
end;
filenum + 1;
if filenum > 5 then filenum = 1;
output;
end;
drop i;
run;

proc sort data=Test;
by filenum;
run;

proc sql;
select
name
into :fieldlist separated by "','"
from
sashelp.vcolumn
where
and memname='TEST' /*change to dataset*/
;
quit;

data _null_;
call symput("fieldlist","'&fieldlist.'");
run;

%put &fieldlist.;

data _null_;
if 0 then set Test; /*set up PDV with formats, but load no data*/
declare hash h (ordered:'a');
h.defineKey(&fieldlist.);
h.defineData(&fieldlist.);
h.defineDone();
do until (last.filenum);
set Test;
by filenum;
end;
h.output(dataset: 'group' || strip(put(filenum,8.)));
run;

```

If you want more groups split out, then just increase the value that filenum resets at.  As long as the "output" line in the final data step constructs a valid sas dataset name, then you can use this method to output any number of datasets you like, without specifying them in code.

Of course your analysis will then require a macro to run your proc means (etc.) against all of those datasets, but hopefully this will be tidier for creating the datasets in the first place.

PROC Star
Posts: 8,169

## Re: Randomly assigning cases from a data set to different groups repeated times

Sven,

I, too, could be misunderstanding what you want, but wouldn't it be easier to analyze if you created one file that contained all of the group assignments over your 10,000 trials/simulations?  Does the following result in what you want to achieve?

%let n=1000;

%macro combsort (arr1 =, arr2=,  order= <);

drop __:;

do __g = hbound (&arr1) - 1 by 0 while (__s or __g > 1);

__g = int (__g / 1.3);

if      __g in (0    ) then __g =  1;

else if __g in (9, 10) then __g = 11;

__s = 0;

do __j = lbound (&arr1) to hbound (&arr1) - __g;

__k = __j + __g;

if &arr1[__j] &order &arr1[__k] then continue;

__t      = &arr1[__j];

&arr1[__j] = &arr1[__k];

&arr1[__k] = __t;

__t      = &arr2[__j];

&arr2[__j] = &arr2[__k];

&arr2[__k] = __t;

__s = 1;

end;

end;

%mend;

data want (keep=trial group;

array values(&n.);

do until (eof);

set hej end=eof;

i+1;

values(i)=y;

end;

array ordering(&n.);

array group1_(%eval(&n./5));

array group2_(%eval(&n./5));

array group3_(%eval(&n./5));

array group4_(%eval(&n./5));

array group5_(%eval(&n./5));

do trial=1 to 10000;

do i=1 to &n.;

ordering(i)=ranuni(0);

end;

%combsort(arr1=ordering, arr2=values);

k=0;

do i=1 to 5;

do j=1 to %eval(&n./5);

k+1;

if i eq 1 then group1_(j)=values(k);

else if i eq 2 then group2_(j)=values(k);

else if i eq 3 then group3_(j)=values(k);

else if i eq 4 then group4_(j)=values(k);

else if i eq 5 then group5_(j)=values(k);

end;

end;

output;

end;

run;

Frequent Contributor
Posts: 94

## Re: Randomly assigning cases from a data set to different groups repeated times

A very good point Art.  Depending on the analysis being done, it could be as simple as adding a grouping column (randomly or whatever suits) and then just including that as a classification variabale.  That way SAS will just split the results out by the groups anyway, and not have to bother with complex coding.

Posts: 5,543

## Re: Randomly assigning cases from a data set to different groups repeated times

You would gain in efficiency and simplicity by taking advantege of the BY statement available for almost every procedure in SAS. You could prepare your simulated group assignments like this :

data simHEJ;
set SASUSER.HEJ;
do rep = 1 to 10000;
group = rand("UNIFORM");
output;
end;
run;

proc sort data=simHEJ; by rep; run;

proc rank data=simHEJ out=simHEJgrp groups=5;
by rep;
var group;
run;

and perform your analysis on simHEJgrp, BY rep. Another advantage is that you could get all results in a single dataset.

PG

PG
Super Contributor
Posts: 1,636

## Re: Randomly assigning cases from a data set to different groups repeated times

Hi PG,

How are you doing??

I ran your code with sashelp.class. Why only less than 16% with group=0?  I thought each group should have around 20%.  Thank you!

data simHEJ;

set sashelp.class;

do rep = 1 to 10000;

group = rand("UNIFORM");

output;

end;

run;

proc sort data=simHEJ; by rep; run;

proc rank data=simHEJ out=simHEJgrp groups=5;

by rep;

var group;

run;

proc freq data=simHEJgrp ;

tables group/missing;

run;

Posts: 5,543

## Re: Randomly assigning cases from a data set to different groups repeated times

Hi Linlin,

That's because there are only 19 observations in sashelp.class. They get divided into 5 groups of sizes 3,4,4,4,4. 3/19=0.158. With proc rank, the small group is always the same (group 0, in this case) but observations get assigned to that group randomly.

PG

PG
Super Contributor
Posts: 1,636

Thank you!

Super User
Posts: 8,127

## Re: Randomly assigning cases from a data set to different groups repeated times

Why not just use PROC SURVEYSELECT?

Posts: 5,543

## Re: Randomly assigning cases from a data set to different groups repeated times

The idea crossed my mind too but I wouldn't know how. I don't think SURVEYSELECT is meant for random assignment problems.

PG

PG
Occasional Contributor
Posts: 5