Help using Base SAS procedures

do loop

Reply
Occasional Contributor
Posts: 13

do loop

Hello -

I have a dataset that includes a number of groups.  Each group contains a number of sets, and each set contains a varying number of units.  Like this:

Group A

Set                         # of units  

1                                   5                          

2                                   6

3                                   4


Group B

Set                         # of units

1                                   4

2                                   8

3                                  12

5                                   5    

Group C

Set                         # of units

1                                    3

2                                    4

3                                   16

4                                    8

I need a do loop that will select sets in consecutive order from each group -- e.g., select set #1 from each group, then select set #2 from each group. That's not too difficult. However, I can only select a total of 300 units.  So the code needs to add the number of units selected in each round and stop at the end of the round that hits 300.

Can anyone help me with this?

Thank you!

Super User
Super User
Posts: 7,977

Re: do loop

So you could try:

proc sort data=have out=tmp;

     by set;

run;

data have;

     set have;

     if _n_ <= 300 then output;

run;

This will give the first 300 observations e.g all the 1's, then all the 2's, ...

Occasional Contributor
Posts: 13

Re: do loop

Thank you for this.  I'm not sure this quite does it though... I need to stop at 300 units, not 300 sets (each set contains multiple units).

So I want it to choose one set from each group in each round (GroupA, Set1;  GroupB, Set1; GroupC, Set 1, etc), but to tally the UNITS being selected until it reaches 300 units.

Does that make sense?

Super User
Posts: 19,855

Re: do loop

Sample output?

Super Contributor
Posts: 1,636

Re: do loop

data GroupA;

input Set                       :units;

group='a';

cards;

1                                   5                          

2                                   6

3                                   4

;

data Groupb;

input Set                       :units;

group='b';

cards;

1                                   4

2                                   8

3                                  12

5                                   5

;

data Groupc;

input Set                       :units;

group='c';

cards;

1                                    3

2                                    4

3                                   16

4                                    8

;

data have;

set group:;

proc sort; by set group;run;

data want;

     set have;

  do _n_=1 to 1;

  if t>300 then leave;

   t+units;

  output;

  end;

  proc print;run;

Super Contributor
Posts: 275

Re: do loop

data Group_A;
input set units;
cards;
1  5                         
2  6
3  258
;

data Group_B;
input set units;
cards;
1  4
2  8
3  12
5  5   
;
data Group_C;
input set units;
cards;
1  3
2  4
3  16
4  8
;
run;

proc sql;
  create table sets as
  select coalesce(d.set,c.set) as set from (select coalesce(a.set,b.set) as set from group_A a inner join group_B b on a.set=b.set) d  inner join group_C c
  on d.set=c.set;
  quit;

 
data want;
  if 0 then set group_A group_B group_C;
  if _n_=1 then do;
    declare hash h_a(dataset:'group_A');
    h_a.definekey('set');
    h_a.definedata('set','units');
    h_a.definedone();
    declare hash h_b(dataset:'group_B');
    h_b.definekey('set');
    h_b.definedata('set','units');
    h_b.definedone();
    declare hash h_c(dataset:'group_C');
    h_c.definekey('set');
    h_c.definedata('set','units');
    h_c.definedone();
   end;
   count=0;
    do until (last);
     set sets end=last;
     retain count;
     h_a.find();;
        count+units;
        if count>300 then leave;
       else output;
      h_b.find();
     count+units;
     if count>300 then leave;
      else output;
     h_c.find();
     count+units;
     if count>300 then leave;
     else output;
    end;  
   run;

Regular Contributor
Posts: 217

Re: do loop

Hi Grayab,

I was not certain how to make the decision on the last record to write.  What happens if the next to last record read yields less than the desired # of units and the last record read yields more than the desired # of units?
data have;
input group $1 set $2. units 8.;
datalines;
A 1  5                          
A 2  6
A 3  4
B 1  4
B 2  8
B 3 12
B 5  5    
C 1  3
C 2  4
C 3 16
C 4  8
;
run;

  
proc sort data=have;
  by set group;
run;

data want;
   set have;
retain unitsum;
if _n_ = 1 then unitsum = 0;
if unitsum < 25 then do;
     unitsum = unitsum + units;
     if unitsum <= 25 then do;
     output;
     end;
end;          
run;

Record 4 has 18 total units counted.

Obs group set units unitsum

1A155
2B149
3C1312
4A2618

Record 5 has 26 total units counted.  If the limit is 25 units, is record 5 output or not?

Obs group set units unitsum

1A155
2B149
3C1312
4A2618
5B2826
Occasional Contributor
Posts: 13

Re: do loop

These are all great suggestions everyone.  Thank you very much.  I will try a few of these and I'm sure one will do the trick.

Super User
Posts: 5,516

Re: do loop

It's possible you'll get the results you want with one of those solutions, but I'm just not sure.  In case it helps, this solution assumes that all 3 data sets are already sorted BY SET:

data stop_at_300;

   set group_a group_b group_c;

   by set;

   total_units + units;

   output;

   if total_units >= 300 then stop;

run;

The way it's written, the program could stop in the middle of a SET.  For example, it could take SET=5 from Group_A and then stop without taking SET=5 from Group_B or Group_C.  If you want to complete the final SET value, you would have to change the final statement:

if total_units >= 300 and last.set then stop;

Good luck.

Occasional Contributor
Posts: 13

Re: do loop

Posted in reply to Astounding

Astounding:

This worked extremely well. Thank you very much.

To add a more complicated twist: I only want to include a certain number of sets from some of the groups.  For instance, Group B has 4 sets.  I only want to include 2. So, for the third round I want it to skip Group B and only select sets from Groups A, C, and D.  Any ideas?

Super User
Posts: 19,855

Re: do loop

You need to explain the rules. Which sets, why?

Occasional Contributor
Posts: 13

Re: do loop

For randomly assigning the sets to conditions, by block (group).  The sets have been assigned random numbers, so if I sort by set and go down the list sequentially, choosing one set per group in each round, I'll end up with a randomly assigned assortment of sets.  However, some groups have more sets than others and I want the sets in those groups to end up split between the two conditions (.5 probability of assignment within the blocks/groups).  If I just proceeded sequentially through the groups every round, the smaller groups would end up in the condition being assigned here only.  Therefore I want the do loop to skip particular groups once half their sets have been assigned to this condition.  The skipped groups will be assigned to the other condition. 

Probably a poor explanation, but there it is.

Super User
Posts: 19,855

Re: do loop

Have you looked at Proc SurveySelect?

Super User
Posts: 5,516

Re: do loop

The general solution probably involves a separate WHERE condition for each incoming data set.  To illustrate the syntax (not necessarily the conditions you want to use to subset):

set group_a (where=(set < 4))

      group_b (where=(set < 3))

      group_c;

Depending on which subsets you want to take, you might base the WHERE clauses on some other variable that you have already added to the data sets.  If you use a single WHERE statement, it would apply to all three data sets (and it might be all you need here):

set group_a group_b group_c;

where condition=1;

Good luck.

Ask a Question
Discussion stats
  • 13 replies
  • 404 views
  • 3 likes
  • 7 in conversation