Solved: conditional sum

dustychair · Posted 07-23-2019 03:00 PM

Hi,

I have a data set look like below:

district number

d1 10000

d2 1000

d3 500

d4 1002

d5 700

d6 10070

d7 5

d8 9800

I want to sum possible districts where the total number is close to 11077. For example, the sum of d1 and d2 is 11000 and it's close to 11077. Same thing for d4, d6, and d7 (their total is 11077). It doesn't need to be only two districts, it can be more than two. The only things I need are all districts should have a flag number and the total should be close to 11070. I have 70 districts in my data set.

district number flag

d1 10000 1

d2 1000 1

d3 500 2

d4 1002 3

d5 700 2

d6 10070 3

d7 5 3

d8 9800 2

Thanks for any comment and help!

Ksharp · Posted 07-24-2019 08:05 AM

Do you have SAS/QC ?

Check it out.

https://blogs.sas.com/content/iml/2017/05/01/split-data-groups-mean-variance.html

Or you can use my GA code under Rick blog's comment .If you need I can post my GA code .

data have;
input district    $   number;     
cards;
d1                10000     
d2                  1000       
d3                    500       
d4                   1002       
d5                  700         
d6                 10070      
d7                   5            
d8                 9800
;

 
%let NumGroups =3;           /* number of treatment groups */
data Treatments;
do Trt = 1 to &NumGroups;    /* Trt is variable that assigns patients to groups */
   output;
end;
run;
%let Var = number;      /* name of covariate */
proc optex data=Treatments seed=97531 coding=orthcan;
   class Trt;
   model Trt;              /* specify treatment model */
   blocks design=have;    /* specify units */
   model &Var ;   /* fixed covariates: &Var--> mean, &Var*&Var--> variance */
   output out=Groups;      /* merged data: units assigned to groups */
run;
 
proc means data=Groups mean std sum;
  class Trt;
  var &Var;
run;

                  N
                          Trt    Obs            Mean         Std Dev             Sum
                 -------------------------------------------------------------------
                            1      3         3857.33         5386.18        11572.00

                            2      3         3900.00         5284.88        11700.00

                            3      2         4902.50         6926.11         9805.00
                 -------------------------------------------------------------------

View solution in original post

Reeza · Posted 07-23-2019 03:35 PM

How do you define close? Does the order matter, does this need to be an 'optimal solution' or any groups that get you close are good enough?

dustychair · Posted 07-23-2019 03:38 PM

It can exceed 11070 but not much. The order doesn't matter. I don't know what 'optimal solution' means but being close this number is enough for me.

Thanks

Reeza · Posted 07-23-2019 03:42 PM

Optimal solution would mean picking the groups such that the difference to the 11070 is minimized and the usage of data is optimized such that if you can choose a group that will add to 11070 and one that will add to 11060 it's smart enough to do the first option. Just use a running sum then and as you get over to 11070 you can reset the counter.

PaigeMiller · Posted 07-23-2019 03:44 PM

@dustychair wrote:

As long as it doesn't exceed 11070, it is fine. The order doesn't matter. I don't know what 'optimal solution' means but being close this number is enough for me.

Not exceeding 11070 does not define "close".

--
Paige Miller

Tom · Posted 07-23-2019 03:48 PM

If you don't need to optimize the solution then you could just process them in the order they appear.

%let upperlimit=11077;

data want ;
  group + 1;
  do until(&upperlimit < sum(cum,next_val));
    set have ;
    set have(keep=number rename=(number=next_val) firstobs=2) have(obs=1 drop=_all_);
    cum=sum(cum,number);
    output;
  end;
run;

Obs    group     cum     next_val    district    number

 1       1      10000       1000        d1        10000
 2       1      11000        500        d2         1000
 3       2        500       1002        d3          500
 4       2       1502        700        d4         1002
 5       2       2202      10070        d5          700
 6       3      10070          5        d6        10070
 7       3      10075       9800        d7            5
 8       4       9800          .        d8         9800

dustychair · Posted 07-23-2019 11:20 PM

Hmm..Your code didn't give any error and it works but now I understand what you and @Reeza mean by optimal solution. I think I need optimal solution. Can we modify this code or we need to start from the beginning?

Thanks

Shmuel · Posted 07-23-2019 04:26 PM

I would try next steps:

1) sort the data set by ascending number giving temp1

2) sort the data set by descending number giving temp2

3) merge temp1, temp2 and sum each pair (the maximum with the minimum)

4) split the result data set into: those couples that satisfy the condition, and the REST that dissatisfy

5) repeat steps 1-4 on the REST data set until all district are included

Ksharp · Posted 07-24-2019 08:05 AM

Do you have SAS/QC ?

Check it out.

https://blogs.sas.com/content/iml/2017/05/01/split-data-groups-mean-variance.html

Or you can use my GA code under Rick blog's comment .If you need I can post my GA code .

data have;
input district    $   number;     
cards;
d1                10000     
d2                  1000       
d3                    500       
d4                   1002       
d5                  700         
d6                 10070      
d7                   5            
d8                 9800
;

 
%let NumGroups =3;           /* number of treatment groups */
data Treatments;
do Trt = 1 to &NumGroups;    /* Trt is variable that assigns patients to groups */
   output;
end;
run;
%let Var = number;      /* name of covariate */
proc optex data=Treatments seed=97531 coding=orthcan;
   class Trt;
   model Trt;              /* specify treatment model */
   blocks design=have;    /* specify units */
   model &Var ;   /* fixed covariates: &Var--> mean, &Var*&Var--> variance */
   output out=Groups;      /* merged data: units assigned to groups */
run;
 
proc means data=Groups mean std sum;
  class Trt;
  var &Var;
run;

                  N
                          Trt    Obs            Mean         Std Dev             Sum
                 -------------------------------------------------------------------
                            1      3         3857.33         5386.18        11572.00

                            2      3         3900.00         5284.88        11700.00

                            3      2         4902.50         6926.11         9805.00
                 -------------------------------------------------------------------

dustychair · Posted 07-24-2019 02:27 PM

Awesome! Thank you so much!

conditional sum

Re: conditional sum

Re: conditional sum

Re: conditional sum

Re: conditional sum

Re: conditional sum

Re: conditional sum

Re: conditional sum

Re: conditional sum

Re: conditional sum

Re: conditional sum

Registration is open

SAS Training: Just a Click Away