Hi,
I have a data set look like below:
district number
d1 10000
d2 1000
d3 500
d4 1002
d5 700
d6 10070
d7 5
d8 9800
I want to sum possible districts where the total number is close to 11077. For example, the sum of d1 and d2 is 11000 and it's close to 11077. Same thing for d4, d6, and d7 (their total is 11077). It doesn't need to be only two districts, it can be more than two. The only things I need are all districts should have a flag number and the total should be close to 11070. I have 70 districts in my data set.
district number flag
d1 10000 1
d2 1000 1
d3 500 2
d4 1002 3
d5 700 2
d6 10070 3
d7 5 3
d8 9800 2
Thanks for any comment and help!
Do you have SAS/QC ?
Check it out.
https://blogs.sas.com/content/iml/2017/05/01/split-data-groups-mean-variance.html
Or you can use my GA code under Rick blog's comment .If you need I can post my GA code .
data have;
input district $ number;
cards;
d1 10000
d2 1000
d3 500
d4 1002
d5 700
d6 10070
d7 5
d8 9800
;
%let NumGroups =3; /* number of treatment groups */
data Treatments;
do Trt = 1 to &NumGroups; /* Trt is variable that assigns patients to groups */
output;
end;
run;
%let Var = number; /* name of covariate */
proc optex data=Treatments seed=97531 coding=orthcan;
class Trt;
model Trt; /* specify treatment model */
blocks design=have; /* specify units */
model &Var ; /* fixed covariates: &Var--> mean, &Var*&Var--> variance */
output out=Groups; /* merged data: units assigned to groups */
run;
proc means data=Groups mean std sum;
class Trt;
var &Var;
run;
N
Trt Obs Mean Std Dev Sum
-------------------------------------------------------------------
1 3 3857.33 5386.18 11572.00
2 3 3900.00 5284.88 11700.00
3 2 4902.50 6926.11 9805.00
-------------------------------------------------------------------
It can exceed 11070 but not much. The order doesn't matter. I don't know what 'optimal solution' means but being close this number is enough for me.
Thanks
@dustychair wrote:
As long as it doesn't exceed 11070, it is fine. The order doesn't matter. I don't know what 'optimal solution' means but being close this number is enough for me.
Not exceeding 11070 does not define "close".
If you don't need to optimize the solution then you could just process them in the order they appear.
%let upperlimit=11077;
data want ;
group + 1;
do until(&upperlimit < sum(cum,next_val));
set have ;
set have(keep=number rename=(number=next_val) firstobs=2) have(obs=1 drop=_all_);
cum=sum(cum,number);
output;
end;
run;
Obs group cum next_val district number 1 1 10000 1000 d1 10000 2 1 11000 500 d2 1000 3 2 500 1002 d3 500 4 2 1502 700 d4 1002 5 2 2202 10070 d5 700 6 3 10070 5 d6 10070 7 3 10075 9800 d7 5 8 4 9800 . d8 9800
Hmm..Your code didn't give any error and it works but now I understand what you and @Reeza mean by optimal solution. I think I need optimal solution. Can we modify this code or we need to start from the beginning?
Thanks
I would try next steps:
1) sort the data set by ascending number giving temp1
2) sort the data set by descending number giving temp2
3) merge temp1, temp2 and sum each pair (the maximum with the minimum)
4) split the result data set into: those couples that satisfy the condition, and the REST that dissatisfy
5) repeat steps 1-4 on the REST data set until all district are included
Do you have SAS/QC ?
Check it out.
https://blogs.sas.com/content/iml/2017/05/01/split-data-groups-mean-variance.html
Or you can use my GA code under Rick blog's comment .If you need I can post my GA code .
data have;
input district $ number;
cards;
d1 10000
d2 1000
d3 500
d4 1002
d5 700
d6 10070
d7 5
d8 9800
;
%let NumGroups =3; /* number of treatment groups */
data Treatments;
do Trt = 1 to &NumGroups; /* Trt is variable that assigns patients to groups */
output;
end;
run;
%let Var = number; /* name of covariate */
proc optex data=Treatments seed=97531 coding=orthcan;
class Trt;
model Trt; /* specify treatment model */
blocks design=have; /* specify units */
model &Var ; /* fixed covariates: &Var--> mean, &Var*&Var--> variance */
output out=Groups; /* merged data: units assigned to groups */
run;
proc means data=Groups mean std sum;
class Trt;
var &Var;
run;
N
Trt Obs Mean Std Dev Sum
-------------------------------------------------------------------
1 3 3857.33 5386.18 11572.00
2 3 3900.00 5284.88 11700.00
3 2 4902.50 6926.11 9805.00
-------------------------------------------------------------------
Awesome! Thank you so much!
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.