You could try something like this, here with example data:
data have;
do id=1 to 253;
payment=mod(id,19)+1;
output;
end;
run;
proc sort data=have;
by payment;
run;
%let wanted=43; /* the sum that you are looking for */
data want;
array payments (1000) 8 _temporary_;
do _N_=1 to nobs;
set have nobs=nobs;
payments(_N_)=payment;
end;
array idx 8 idx1-idx6;
%macro check;
%local comb;
%do comb=2 %to 6;
call missing(of idx1-idx&comb);
ncomb=comb(nobs,&comb);
do i=1 to ncomb;
rc=lexcombi(nobs,&comb,of idx1-idx&comb);
sum=0;
do _N_=1 to &comb;
sum+payments(idx(_N_));
end;
if round(sum,0.01)>&wanted then leave;
if round(sum,0.01)=&wanted then output;
end;
%end;
%mend;
%check;
keep idx1-idx6;
run;
Sorting is necessary to break out of the loop as soon as values get too large, otherwise it will take forever. (I just changed the sort to be by payment, not ID, as @FreelanceReinh reminded me, the other was a slip of fingers on the keyboard).
The variables idx1-idx6 should contain the observation numbers (in the sorted data) of observations that add up to &wanted.
The ROUND function calls are not necessary in this example, as the test data has no decimals. But with decimals, you may have to round in order to catch all the possible fits.
The reason for the macro construct is only that if you have too many indices in the LEXCOMBI call, they will all be set to missing, so it cannot be done in a data step loop.
You could try something like this, here with example data:
data have;
do id=1 to 253;
payment=mod(id,19)+1;
output;
end;
run;
proc sort data=have;
by payment;
run;
%let wanted=43; /* the sum that you are looking for */
data want;
array payments (1000) 8 _temporary_;
do _N_=1 to nobs;
set have nobs=nobs;
payments(_N_)=payment;
end;
array idx 8 idx1-idx6;
%macro check;
%local comb;
%do comb=2 %to 6;
call missing(of idx1-idx&comb);
ncomb=comb(nobs,&comb);
do i=1 to ncomb;
rc=lexcombi(nobs,&comb,of idx1-idx&comb);
sum=0;
do _N_=1 to &comb;
sum+payments(idx(_N_));
end;
if round(sum,0.01)>&wanted then leave;
if round(sum,0.01)=&wanted then output;
end;
%end;
%mend;
%check;
keep idx1-idx6;
run;
Sorting is necessary to break out of the loop as soon as values get too large, otherwise it will take forever. (I just changed the sort to be by payment, not ID, as @FreelanceReinh reminded me, the other was a slip of fingers on the keyboard).
The variables idx1-idx6 should contain the observation numbers (in the sorted data) of observations that add up to &wanted.
The ROUND function calls are not necessary in this example, as the test data has no decimals. But with decimals, you may have to round in order to catch all the possible fits.
The reason for the macro construct is only that if you have too many indices in the LEXCOMBI call, they will all be set to missing, so it cannot be done in a data step loop.
@s_lassen: Nice approach! Wise to use the ROUND function.
I assume the intention was to sort by payment, not by id.
I'd suggest modifying the stopping rule a bit, e.g.
if round(sum,0.01)>&wanted & i>&maxiter then leave;
where macro variable maxiter would be defined as a "large" integer, e.g. 1e7. (There might be more efficient stopping rules than this.)
Maybe my test dataset, shown below, containing random numbers (or the fact that I chose the uniform distribution to generate them) was unfortunate, but with the original stopping rule no matching sum was found.
data have;
call streaminit(27182818);
do _n_=1 to 253;
payment=round(rand('uniform',1,115.4),.01);
output;
end;
run;
proc print data=have(obs=4);
sum payment;
run; /* --> wanted=252.22 */
proc sort data=have;
by payment;
run;
Without a stopping rule 9511 combinations (of 2 [did not occur], 3 or 4 payments -- I omitted the checks of 5 and 6) summing to 252.22 were found, with the modified stopping rule (using maxiter=1e7) at least 372 of them (not including the a priori "known" combination, though). Run times were <20 and <2 seconds, respectively.
@robulon: The above example indicates that you might get too many solutions: thousands or even millions (see below!) of different combinations summing to 182.87. This is because the number of combinations exceeds the number of possible sums by several orders of magnitude (assuming that all payment values are integer multiples of 0.01). A coarse lower bound for the average number of combinations adding up to the same given sum can be obtained as follows:
data _null_;
do k=2 to 6;
s+comb(253,k)/(k*11540);
end;
put s=;
run; /* Result: s=5103162.1672 */
So, it would be surprising if none of the sums of 2, 3 or 4 payments were equal to 182.87. (And these cases are manageable with s_lassen's solution without the stopping rule.) Even if that happened, you could proceed by selecting one combination (of 4 payments) per distinct sum and combine these (<50,000) sums, restricted to sums <182.87, in each case with the remaining 249 payment values. Thus (most likely), you'll find a suitable combination of 5 payments and then many more when you go back to the pertinent sum of 4 payments and select all combinations leading to that particular sum.
Yes, of course, the data should be sorted by payment. I shall edit my post to show the right variable.
Thank you @s_lassen and @FreelanceReinhard, and apologies for not expressing my thanks earlier, I got a bit caught up in other matters. I've tried to resolve my issue using this solution but it appears to be falling down somewhere. Although I now know the figures that make up the total (obtained through other means but would still be great for future reference to get this sorted) are 75.9 and 106.97, the code does not identify this particular combination. In fact, it does not identify any combinations at all. I have tried amending the code using round(,0.1) and this does return some combinations but not the exact one.
I've tried some more testing putting the figures in as datalines to reduce the number of figures but it still isn't working. Would you be able to suggest where I'm going wrong? I'm wondering if it is something to do with the decimal places but have tried multiplying all the figures by 100 to turn them all into integers but am still not arriving at the solution.
This is what I'm using
data amounts;
input amount;
datalines;
75.9
106.97
100
255
643.59
251.25
;
run;
%let outstanding = 187.87;
data want;
array payments (1000) 8 _temporary_;
do _N_= 1 to nobs;
set amounts nobs = nobs;
payments(_N_) = amount;
end;
array idx 8 idx1 - idx6;
%macro check;
%local comb;
%do comb = 2 %to 6;
call missing(of idx1 - idx&comb.);
ncomb = comb(nobs, &comb.);
do i = 1 to ncomb;
rc = lexcombi(nobs, &comb., of idx1 - idx&comb.);
sum = 0;
do _N_ = 1 to &comb.;
sum + payments(idx(_N_));
end;
if round(sum,0.01) > &outstanding. then leave;
if round(sum,0.01) = &outstanding. then output;
end;
%end;
%mend check;
%check;
keep idx1 - idx6;
run;
As always, any advice will be gratefully received.
Rob
In your new example note that 75.9+106.97=182.87, not 187.87.
For the general case: Have you tried loosening (or even omitting) the stopping rule as I suggested in my previous post?
Oh dear, how embarrassing Can't even try the excuse that it was a slip of the fingers as 2 and 7 are nowhere near each other!
The loosening of the stopping rule seems to have worked. I've created a macro variable of the target amount earlier in the code using sql to calculate the difference between the expected amount and the actual amount and have use the code
if sum > &outstanding. + 9 then leave;
Interestingly enough (for me anyway), if I added a value that was any less than 9 to the macro variable, no results were returned whereas if I use 9 or higher, it correctly identifies the amounts.
Huge thanks for the help people, always impressed by the number of members who are happy to share their expertise and help us less knowledgeable types develop their skills. next step for me is to work my way through the code so that I understand how it's doing what it's doing.
Thanks again,
Rob
Please post test-data in usable form. And have a look at https://communities.sas.com/t5/SAS-Programming/How-do-I-Find-the-Combinations-that-Equal-a-Given-Sum....
It looks like a OR problem. Try post it at OR forum and @RobPratt is there .
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.