BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
dustychair
Pyrite | Level 9

Hi,

I have a data set look like below:

 

district       number     

d1                10000     

d2                  1000       

d3                    500       

d4                   1002       

d5                  700         

d6                 10070      

d7                   5            

d8                 9800

 

I want to sum possible districts where the total number is close to 11077. For example, the sum of d1 and d2 is 11000 and it's close to 11077. Same thing for d4, d6, and d7 (their total is 11077). It doesn't need to be only two districts, it can be more than two. The only things I need are all districts should have a flag number and the total should be close to 11070. I have 70 districts in my data set.

 

district       number     flag

d1                10000      1

d2                  1000       1

d3                    500       2

d4                   1002       3

d5                  700         2

d6                 10070      3

d7                   5            3

d8                  9800      2

 

 

Thanks for any comment and help!

 

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

Do you have SAS/QC ?

Check it out.

 

https://blogs.sas.com/content/iml/2017/05/01/split-data-groups-mean-variance.html

 

Or you can use my GA code under Rick blog's comment .If you need I can post my GA code .

 

data have;
input district    $   number;     
cards;
d1                10000     
d2                  1000       
d3                    500       
d4                   1002       
d5                  700         
d6                 10070      
d7                   5            
d8                 9800
;

 
%let NumGroups =3;           /* number of treatment groups */
data Treatments;
do Trt = 1 to &NumGroups;    /* Trt is variable that assigns patients to groups */
   output;
end;
run;
%let Var = number;      /* name of covariate */
proc optex data=Treatments seed=97531 coding=orthcan;
   class Trt;
   model Trt;              /* specify treatment model */
   blocks design=have;    /* specify units */
   model &Var ;   /* fixed covariates: &Var--> mean, &Var*&Var--> variance */
   output out=Groups;      /* merged data: units assigned to groups */
run;
 
proc means data=Groups mean std sum;
  class Trt;
  var &Var;
run;
                  N
                          Trt    Obs            Mean         Std Dev             Sum
                 -------------------------------------------------------------------
                            1      3         3857.33         5386.18        11572.00

                            2      3         3900.00         5284.88        11700.00

                            3      2         4902.50         6926.11         9805.00
                 -------------------------------------------------------------------

View solution in original post

9 REPLIES 9
Reeza
Super User
How do you define close? Does the order matter, does this need to be an 'optimal solution' or any groups that get you close are good enough?
dustychair
Pyrite | Level 9

It can exceed 11070 but not much. The order doesn't matter. I don't know what 'optimal solution' means but being close this number is enough for me.

 

Thanks

Reeza
Super User
Optimal solution would mean picking the groups such that the difference to the 11070 is minimized and the usage of data is optimized such that if you can choose a group that will add to 11070 and one that will add to 11060 it's smart enough to do the first option. Just use a running sum then and as you get over to 11070 you can reset the counter.

PaigeMiller
Diamond | Level 26

@dustychair wrote:

As long as it doesn't exceed 11070, it is fine. The order doesn't matter. I don't know what 'optimal solution' means but being close this number is enough for me.


Not exceeding 11070 does not define "close". 

--
Paige Miller
Tom
Super User Tom
Super User

If you don't need to optimize the solution then you could just process them in the order they appear.

%let upperlimit=11077;

data want ;
  group + 1;
  do until(&upperlimit < sum(cum,next_val));
    set have ;
    set have(keep=number rename=(number=next_val) firstobs=2) have(obs=1 drop=_all_);
    cum=sum(cum,number);
    output;
  end;
run;
Obs    group     cum     next_val    district    number

 1       1      10000       1000        d1        10000
 2       1      11000        500        d2         1000
 3       2        500       1002        d3          500
 4       2       1502        700        d4         1002
 5       2       2202      10070        d5          700
 6       3      10070          5        d6        10070
 7       3      10075       9800        d7            5
 8       4       9800          .        d8         9800
dustychair
Pyrite | Level 9

Hmm..Your code didn't give any error and it works but now I understand what you and @Reeza  mean by optimal solution. I think I need optimal solution. Can we modify this code or we need to start from the beginning?

 

Thanks

 

Shmuel
Garnet | Level 18

I would try next steps:

 

1) sort the data set by ascending number giving temp1

2) sort the data set by descending number giving temp2

3) merge temp1, temp2 and sum each pair (the maximum with the minimum)

4) split the result data set into: those couples that satisfy the condition, and the REST that dissatisfy

5) repeat steps 1-4 on the REST data set until all district are included

Ksharp
Super User

Do you have SAS/QC ?

Check it out.

 

https://blogs.sas.com/content/iml/2017/05/01/split-data-groups-mean-variance.html

 

Or you can use my GA code under Rick blog's comment .If you need I can post my GA code .

 

data have;
input district    $   number;     
cards;
d1                10000     
d2                  1000       
d3                    500       
d4                   1002       
d5                  700         
d6                 10070      
d7                   5            
d8                 9800
;

 
%let NumGroups =3;           /* number of treatment groups */
data Treatments;
do Trt = 1 to &NumGroups;    /* Trt is variable that assigns patients to groups */
   output;
end;
run;
%let Var = number;      /* name of covariate */
proc optex data=Treatments seed=97531 coding=orthcan;
   class Trt;
   model Trt;              /* specify treatment model */
   blocks design=have;    /* specify units */
   model &Var ;   /* fixed covariates: &Var--> mean, &Var*&Var--> variance */
   output out=Groups;      /* merged data: units assigned to groups */
run;
 
proc means data=Groups mean std sum;
  class Trt;
  var &Var;
run;
                  N
                          Trt    Obs            Mean         Std Dev             Sum
                 -------------------------------------------------------------------
                            1      3         3857.33         5386.18        11572.00

                            2      3         3900.00         5284.88        11700.00

                            3      2         4902.50         6926.11         9805.00
                 -------------------------------------------------------------------
dustychair
Pyrite | Level 9

Awesome! Thank you so much!

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 2030 views
  • 3 likes
  • 6 in conversation