Re: Read balances (in decreasing order) and assign them to groups alte...

sunny86 · Posted 06-17-2023 01:59 PM

data have;

input Balance;

datalines;

5000

6000

5500

4500

5200

4800

5100

5600

5200

5300

;

run;

proc sort data=have;

by descending Balance;

run;

data result;

do i= 1 to 5, 5 to 1 until(done);

set have end=done;

group = i;

output;

end;

drop i;

run;

My output

1

6000

1

2

5600

2

3

5500

3

4

5300

4

5

5200

5

6

5200

1

7

5100

2

8

5000

3

9

4800

4

10

4500

5

Expected output

1

6000

1

2

5600

2

3

5500

3

4

5300

4

5

5200

5

6

5200

5

7

5100

4

8

5000

3

9

4800

2

10

4500

1

mkeintz · Posted 06-17-2023 02:10 PM

This is a good case for taking advantage of the automatic variable _N_, and the modulo function:

proc sort data=have out=need;
  by descending balance;
run;
data want;
  set need;
  group=mod(_n_-1,5)+1;
run;

In this case (in most cases) _N_ is equivalent to the observation sequence in dataset NEED.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sunny86 · Posted 06-17-2023 02:40 PM

@mkeintz the output I ma expecting is to reverse the sequence alternatively

Form your suggestion i got the below output but what i want is to reverse it

1

6000

1

2

5600

2

3

5500

3

4

5300

4

5

5200

5

6

5200

1

7

5100

2

8

5000

3

9

4800

4

10

4500

5

Expected

1

6000

1

2

5600

2

3

5500

3

4

5300

4

5

5200

5

6

5200

5

7

5100

4

8

5000

3

9

4800

2

10

4500

1

mkeintz · Posted 06-17-2023 10:09 PM

So, for every ten observations, you want to have groups in sequence 1,2,3,4,5,5,4,3,2,1, correct? Then you can just use a temporary array of 10 group values:

proc sort data=have out=need;
  by descending balance;
run;
data want_array;
  array grps {0:9} _temporary_ (1,1,2,3,4,5,5,4,3,2);
  set need;
  group=grps{mod(_n_,10)};
run;

The array grps has a lower bound of zero, not 1. So the first array element corresponds to _n_=10,20,30, etc, all of which have mod(_n_,10)=0. Second element corresponds to _n_=1,11,21, etc.

If the array approach seems a little too abstruse, then you can have a group increment of +1 for five observations, followed by -1 for five observations:

proc sort data=have out=need;
  by descending balance;
run;
data want (drop=_:);
  set need;
  retain _increment 1;
  group + _increment;
  if group in (0,6) then do;
    _increment=-1*_increment;
    group+_increment;
  end;  
run;

Question: what if you have, say, 36 observations. What group values will the last 6 obs have?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

ballardw · Posted 06-17-2023 02:43 PM

@sunny86 wrote:

data have;

input Balance;

datalines;

5000

6000

5500

4500

5200

4800

5100

5600

5200

5300

;

run;

proc sort data=have;

by descending Balance;

run;

data result;

do i= 1 to 5, 5 to 1 until(done);

set have end=done;

group = i;

output;

end;

drop i;

run;

My output

1 6000 1

2 5600 2

3 5500 3

4 5300 4

5 5200 5

6 5200 1

7 5100 2

8 5000 3

9 4800 4

10 4500 5

Expected output

1 6000 1

2 5600 2

3 5500 3

4 5300 4

5 5200 5

6 5200 5

7 5100 4

8 5000 3

9 4800 2

10 4500 1

Your example code explicitly states 5 as a group. Would it better to say that you want HALF of the values in descending order numbered in increasing order until half are numbered and then start decreasing the group variable? Or is your data always exactly 10 records in the data set? If the number of observations may ever be odd you need describe how the group gets assigned for the observation that can't have a group partner.

This seems to work for EVEN numbers of observations.

data want;
  set have nobs=numobs;
  retain group;
  if _n_ le numobs/2 then do;
     group=mod(_n_-1,numobs/2)+1;
     output;
     if _n_ = numobs/2 then group+1;
  end;
  else do ;
     group=group-1;
     output;
  end;
run;

sunny86 · Posted 06-17-2023 02:48 PM

@ballardw I have a datset of 649 observations i want to alternate the sequence in descending order.
I am sorting the dataset in descending order and i want to assing the balances to the 5 groups such that the average balance across the groups will be close to similar
I am achieving it by assigning the balances to groups
So in the example dataset I want to alternate the Groups
In the first iteration the sequence went from 1 to 5 and then next iteration it should assign from 5 to 1

ballardw · Posted 06-17-2023 03:34 PM

@sunny86 wrote:

@ballardw I have a datset of 649 observations i want to alternate the sequence in descending order.
I am sorting the dataset in descending order and i want to assing the balances to the 5 groups such that the average balance across the groups will be close to similar
I am achieving it by assigning the balances to groups
So in the example dataset I want to alternate the Groups
In the first iteration the sequence went from 1 to 5 and then next iteration it should assign from 5 to 1

Still not a particularly clear objective. You seem to spend as much time concerned with an approach as describing the result. Is the result to have N, where N is some number you pick, such as 5 groups each having a similar (that definition may need some clarification) mean value of Balance?

BTW, I still see no mention of what your approach would do with the odd number of observations.

An example of 10 observations may be obscuring other options that may be more robust with a larger data set. Group sizes of 2 from 10 observations would require lots of control. Group sizes of over 100, such as 5 groups from 649 observations are likely amenable to other approaches such as random selection.

Consider running the following code against your larger data set and see what the means are for the groups of balance values.

proc surveyselect data=have groups=5 out=test
;
run;

proc means data=test mean;
   class groupid;
   var balance;
run;

sunny86 · Posted 06-17-2023 04:43 PM

@ballardw

What i am trying to achieve is

I have 5 departments and i have to assign the 649 observations to the groups in such a way that the variance between the groups is minimal for both balance and score.

I tried with balance of reverse sequence

Data want;

do i = 1 to &Dep;

set Sample (keep=Department Score Balance Uniq_no) end=done;

group = i;

output;

end;

do i = &Dep to 1 by -1 until(done);

set Sample (keep=Department Score Balance Uniq_no) end=done;

group = i;

output;

end;

drop i;

ballardw · Posted 06-17-2023 06:54 PM

@sunny86 wrote:

@ballardw

What i am trying to achieve is

I have 5 departments and i have to assign the 649 observations to the groups in such a way that the variance between the groups is minimal for both balance and score.

I tried with balance of reverse sequence

Data want;

do i = 1 to &Dep;

set Sample (keep=Department Score Balance Uniq_no) end=done;

group = i;

output;

end;

do i = &Dep to 1 by -1 until(done);

set Sample (keep=Department Score Balance Uniq_no) end=done;

group = i;

output;

end;

drop i;

There is no guarantee that what you are doing will do what you want. In fact by having the group=1 , at least with a sample size of two by having the largest and smallest value of Balance in the same group you increase the variance while the group=5 is more likely to have a much smaller variance than the other groups.

I ran proc means on the data set that you want to create. Note that the variance is larger for group 1 than for group 5.

Analysis Variable : Balance
group	N Obs	Mean	Variance	Std Dev
1	2	5250.00	1125000.00	1060.66
2	2	5200.00	320000.00	565.6854249
3	2	5250.00	125000.00	353.5533906
4	2	5200.00	20000.00	141.4213562
5	2	5200.00	0	0

You also have described how this algorithm is to applied to a data set of 649 observations. Or how many actual groups you want.

Did you even test the example code on your whole data set? You get see the variance or standard deviation between the groups by adding VAR and STD to the Means statement. Then you can see what the means and variance are. If you want to you could create hundreds of output datasets from Surveyselect, test each one for the "variances" , which I suspect that you might just be using incorrectly, and pick a set that has the lowest mean "variance".

Astounding · Posted 06-17-2023 07:59 PM

Logically, I think this is the right way to get what you are describing. After sorting:

data want;
   do until (5=4);
      do i=1 to 5, 5 to 1 by -1;
         set have;
         output;
      end;
   end;
run;

It might seem strange to use 5=4 as the UNTIL condition. That just means the DATA step keeps on going until the SET statement ends it by running out of observations to read.

There are other legitimate questions to ask. Is this the best statistical way to divide up your data? (Probably not.) Why didn't your earlier attempt with two SET statements work? (Each SET statement operates independently of the other. Both start reading the first observation from the incoming data, so I expect you have 649 + 645 observations from that attempt.) Those questions can be addressed in more detail. But for now, the above program should get you what you asked for.

Tom · Posted 06-17-2023 11:36 PM

Exactly.

But you don't need that extra DO loop. The data step itself will do that for you.

Astounding · Posted 06-18-2023 07:58 AM

@Tom , you're absolutely right. Good point.

After sorting, the "decluttered" version of my solution would be:

data want;
   do i=1 to 5, 5 to 1 by -1;
      set have;
      output;
   end;
run;

If HAVE runs out of observations before a loop completes, that would be a normal end to the DATA step.

Ksharp · Posted 06-19-2023 07:52 AM

data want;
 set have;
 id=mod(_n_,5);
 if id=0 then id=5;
 if id=1 then group+1;
 if mod(group,2)=0 then id=6-id;
run;

Read balances (in decreasing order) and assign them to groups alternatively in sequence and reverse

Re: Read balances (in decreasing order) and assign them to groups alternatively in sequence and reve

Re: Read balances (in decreasing order) and assign them to groups alternatively in sequence and reve

Re: Read balances (in decreasing order) and assign them to groups alternatively in sequence and reve

Re: Read balances (in decreasing order) and assign them to groups alternatively in sequence and reve

Re: Read balances (in decreasing order) and assign them to groups alternatively in sequence and reve

Re: Read balances (in decreasing order) and assign them to groups alternatively in sequence and reve

Re: Read balances (in decreasing order) and assign them to groups alternatively in sequence and reve

Re: Read balances (in decreasing order) and assign them to groups alternatively in sequence and reve

Re: Read balances (in decreasing order) and assign them to groups alternatively in sequence and reve

Re: Read balances (in decreasing order) and assign them to groups alternatively in sequence and reve

Re: Read balances (in decreasing order) and assign them to groups alternatively in sequence and reve

Re: Read balances (in decreasing order) and assign them to groups alternatively in sequence and reve

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!