BookmarkSubscribeRSS Feed
grayab
Calcite | Level 5

Hello -

I have a dataset that includes a number of groups.  Each group contains a number of sets, and each set contains a varying number of units.  Like this:

Group A

Set                         # of units  

1                                   5                          

2                                   6

3                                   4


Group B

Set                         # of units

1                                   4

2                                   8

3                                  12

5                                   5    

Group C

Set                         # of units

1                                    3

2                                    4

3                                   16

4                                    8

I need a do loop that will select sets in consecutive order from each group -- e.g., select set #1 from each group, then select set #2 from each group. That's not too difficult. However, I can only select a total of 300 units.  So the code needs to add the number of units selected in each round and stop at the end of the round that hits 300.

Can anyone help me with this?

Thank you!

13 REPLIES 13
RW9
Diamond | Level 26 RW9
Diamond | Level 26

So you could try:

proc sort data=have out=tmp;

     by set;

run;

data have;

     set have;

     if _n_ <= 300 then output;

run;

This will give the first 300 observations e.g all the 1's, then all the 2's, ...

grayab
Calcite | Level 5

Thank you for this.  I'm not sure this quite does it though... I need to stop at 300 units, not 300 sets (each set contains multiple units).

So I want it to choose one set from each group in each round (GroupA, Set1;  GroupB, Set1; GroupC, Set 1, etc), but to tally the UNITS being selected until it reaches 300 units.

Does that make sense?

Reeza
Super User

Sample output?

Linlin
Lapis Lazuli | Level 10

data GroupA;

input Set                       :units;

group='a';

cards;

1                                   5                          

2                                   6

3                                   4

;

data Groupb;

input Set                       :units;

group='b';

cards;

1                                   4

2                                   8

3                                  12

5                                   5

;

data Groupc;

input Set                       :units;

group='c';

cards;

1                                    3

2                                    4

3                                   16

4                                    8

;

data have;

set group:;

proc sort; by set group;run;

data want;

     set have;

  do _n_=1 to 1;

  if t>300 then leave;

   t+units;

  output;

  end;

  proc print;run;

slchen
Lapis Lazuli | Level 10

data Group_A;
input set units;
cards;
1  5                         
2  6
3  258
;

data Group_B;
input set units;
cards;
1  4
2  8
3  12
5  5   
;
data Group_C;
input set units;
cards;
1  3
2  4
3  16
4  8
;
run;

proc sql;
  create table sets as
  select coalesce(d.set,c.set) as set from (select coalesce(a.set,b.set) as set from group_A a inner join group_B b on a.set=b.set) d  inner join group_C c
  on d.set=c.set;
  quit;

 
data want;
  if 0 then set group_A group_B group_C;
  if _n_=1 then do;
    declare hash h_a(dataset:'group_A');
    h_a.definekey('set');
    h_a.definedata('set','units');
    h_a.definedone();
    declare hash h_b(dataset:'group_B');
    h_b.definekey('set');
    h_b.definedata('set','units');
    h_b.definedone();
    declare hash h_c(dataset:'group_C');
    h_c.definekey('set');
    h_c.definedata('set','units');
    h_c.definedone();
   end;
   count=0;
    do until (last);
     set sets end=last;
     retain count;
     h_a.find();;
        count+units;
        if count>300 then leave;
       else output;
      h_b.find();
     count+units;
     if count>300 then leave;
      else output;
     h_c.find();
     count+units;
     if count>300 then leave;
     else output;
    end;  
   run;

jwillis
Quartz | Level 8

Hi Grayab,

I was not certain how to make the decision on the last record to write.  What happens if the next to last record read yields less than the desired # of units and the last record read yields more than the desired # of units?
data have;
input group $1 set $2. units 8.;
datalines;
A 1  5                          
A 2  6
A 3  4
B 1  4
B 2  8
B 3 12
B 5  5    
C 1  3
C 2  4
C 3 16
C 4  8
;
run;

  
proc sort data=have;
  by set group;
run;

data want;
   set have;
retain unitsum;
if _n_ = 1 then unitsum = 0;
if unitsum < 25 then do;
     unitsum = unitsum + units;
     if unitsum <= 25 then do;
     output;
     end;
end;          
run;

Record 4 has 18 total units counted.

Obs group set units unitsum

1A155
2B149
3C1312
4A2618

Record 5 has 26 total units counted.  If the limit is 25 units, is record 5 output or not?

Obs group set units unitsum

1A155
2B149
3C1312
4A2618
5B2826
grayab
Calcite | Level 5

These are all great suggestions everyone.  Thank you very much.  I will try a few of these and I'm sure one will do the trick.

Astounding
PROC Star

It's possible you'll get the results you want with one of those solutions, but I'm just not sure.  In case it helps, this solution assumes that all 3 data sets are already sorted BY SET:

data stop_at_300;

   set group_a group_b group_c;

   by set;

   total_units + units;

   output;

   if total_units >= 300 then stop;

run;

The way it's written, the program could stop in the middle of a SET.  For example, it could take SET=5 from Group_A and then stop without taking SET=5 from Group_B or Group_C.  If you want to complete the final SET value, you would have to change the final statement:

if total_units >= 300 and last.set then stop;

Good luck.

grayab
Calcite | Level 5

Astounding:

This worked extremely well. Thank you very much.

To add a more complicated twist: I only want to include a certain number of sets from some of the groups.  For instance, Group B has 4 sets.  I only want to include 2. So, for the third round I want it to skip Group B and only select sets from Groups A, C, and D.  Any ideas?

Reeza
Super User

You need to explain the rules. Which sets, why?

grayab
Calcite | Level 5

For randomly assigning the sets to conditions, by block (group).  The sets have been assigned random numbers, so if I sort by set and go down the list sequentially, choosing one set per group in each round, I'll end up with a randomly assigned assortment of sets.  However, some groups have more sets than others and I want the sets in those groups to end up split between the two conditions (.5 probability of assignment within the blocks/groups).  If I just proceeded sequentially through the groups every round, the smaller groups would end up in the condition being assigned here only.  Therefore I want the do loop to skip particular groups once half their sets have been assigned to this condition.  The skipped groups will be assigned to the other condition. 

Probably a poor explanation, but there it is.

Reeza
Super User

Have you looked at Proc SurveySelect?

Astounding
PROC Star

The general solution probably involves a separate WHERE condition for each incoming data set.  To illustrate the syntax (not necessarily the conditions you want to use to subset):

set group_a (where=(set < 4))

      group_b (where=(set < 3))

      group_c;

Depending on which subsets you want to take, you might base the WHERE clauses on some other variable that you have already added to the data sets.  If you use a single WHERE statement, it would apply to all three data sets (and it might be all you need here):

set group_a group_b group_c;

where condition=1;

Good luck.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 1457 views
  • 3 likes
  • 7 in conversation