Has anyone written a SAS program to solve the Subset Sum Problem? Given a reasonably small set of numbers, is there some combination of these numbers that sum up to a particular number?
Here's a very simple example:
We have the numbers 1, 4, 6, 6, 7, 19, and 23.
What combinations of these numbers will give us the number 12?
Solutions:
1 + 4 + 7 = 12
6 + 6 = 12
Thanks much,
David Oesper
P.S. I don't currently have access to SAS/IML at work, but I do at home. I would prefer a Base SAS solution, if possible, but any and all approaches using any SAS product will be appreciated!
If you sort the data, you can do something smarter than brute force, but I'm guessing this exercise if more for puzzles than work, so here is a brute force solution:
proc iml;
items = {1, 4, 6, 6, 7, 19, 23};
target = 12;
/* preliminary step: get rid of any
elements greater than the target */
items = items[ loc(items<=target) ];
N = nrow(items);
do k = 1 to N;
a = allcomb(N, k); /* all combs taken k at a time */
m = shape(items[a], 0, k); /* make matrix with k rows */
idx = loc(m[,+] = target); /* rows whose sum = target */
if ncol(idx)>0 then
print (m[idx,]);
end;
Probably @Ksharp or @FriedEgg can whip up something fast in Base SAS.
If you sort the data, you can do something smarter than brute force, but I'm guessing this exercise if more for puzzles than work, so here is a brute force solution:
proc iml;
items = {1, 4, 6, 6, 7, 19, 23};
target = 12;
/* preliminary step: get rid of any
elements greater than the target */
items = items[ loc(items<=target) ];
N = nrow(items);
do k = 1 to N;
a = allcomb(N, k); /* all combs taken k at a time */
m = shape(items[a], 0, k); /* make matrix with k rows */
idx = loc(m[,+] = target); /* rows whose sum = target */
if ncol(idx)>0 then
print (m[idx,]);
end;
Probably @Ksharp or @FriedEgg can whip up something fast in Base SAS.
Thanks, Rick! I'll give this a try at home and perhaps someone else will post a Base SAS solution as you suggest. By the way, I have really been enjoying your blog postings that feature various mathematical topics. Kudos!
Thanks again,
Dave
Here's the DATA step version. First, note that there is a somewhat natural inclination to find all combinations of 2, then all combinations of 3, etc. That can be done, but it's harder and not necessary. Assuming that your data set contains var1-var6, and you want all combinations that add to 12:
data want;
set have;
do include_v1=0, 1;
do include_v2=0, 1;
do include_v3=0, 1;
do include_v4=0, 1;
do include_v5=0, 1;
do include_v6=0, 1;
total = include_v1 * var1 + include_v2 * var2 + include_v3 * var3 + include_v4 * var4 + include_v5 * var5 + include_v6 * var6;
if total=12 then output;
end; end; end; end; end; end;
run;
You can always add arrays. You can always change the spacing and indentation. You can always add macro language. But whether you do or not, the trick is to treat this as a set of binary conditions rather than a variety of combinations.
Good luck.
From a mathematical perspective, @Astounding's solution has an interesting interpretation in terms of 0/1 matrices. when the set of numbers contains k elements, the matrix formulation of his technique is to create a 0/1 matrix with k columns in which the rows cover all possible combinations. Each of the first k rows have exactly one 1. Each of the next "k choose 2" rows have exactly two 1s, and so forth until the last row contains all 1s. You can implement Astounding's idea in a matrix language by using the ideas in the article "Creating a matrix with all combinations of zeros and ones".
The same article has a formulation that is appropriate for a DATA step implementation that does not require k nested loops. This formulation (without the nested loops) can support a set with an arbitrary number of elements (within reason, say k <= 18). This is probably similar to what Astounding was thinking with the statement "you can always add arrays."
RIck,
Just to illustrate some of the techniques I had in mind ... what if you had 25 variables instead of 6? Macro language might generate the loops:
%macro loop_begins (n_loops=);
%local i;
%do i=1 %to &n_loops;
do include_v&i=0, 1;
%end;
%mend loop_begins;
%loop_begins (n_loops=25)
Arrays might come in handy for calculating TOTAL.
array flags {25} include_v1 - include_v25;
array vars {25} var1-var25;
total=0;
do i=1 to 25;
total = total + flags{i} * vars{i};
end;
if total=12 then output;
The logic doesn't really change, just the way you generate the needed statements changes.
Here's a way to do it by using the CLP solver in PROC OPTMODEL in SAS/OR:
proc optmodel;
num n = 7;
num a {1..n} = [1, 4, 6, 6, 7, 19, 23];
num target = 12;
var X {1..n} binary;
con sum {j in 1..n} a[j]*X[j] = target;
solve with CLP / findallsolns;
for {s in 1.._NSOL_} print {j in 1..n: X[j].sol[s] > 0} a[j];
quit;
Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.