Dear all,
I have a dataset like this:
data have;
input id target var1 var2 var3 var4 var5 var6;
cards;
1 10000 -2000 10000 3000 7000 12000 30
2 7000 5000 1500 500 6500 750 .
;
run;
for each id I want to find all possible combinations of variables (var1-var6) whose sum among them is equal to target. Missing values do not have to enter in any combination.
The result I am looking for is something like:
data want;
input id var $ value sol1 sol2 sol3 ;
cards;
1 var1 -2000 0 0 1
1 var2 10000 1 0 0
1 var3 3000 0 1 0
1 var4 7000 0 1 0
1 var5 12000 0 0 1
1 var6 30 0 0 0
2 var1 5000 1 0 .
2 var2 1500 1 0 .
2 var3 500 1 1 .
2 var4 6500 0 1 .
2 var5 750 0 0 .
2 var6 . 0 0 .
;
run;
each variable sol (sol1-sol3) is equal to 1 if the corresponding var has to be considered in the sum.
any suggestion is very welcome.
Moreover it is very appreciated any advice on alternative structures of have and/or want data
Thank you in advance.
Hello @ciro,
The suggestion below is built around the GRAYCODE function, which generates all subsets of {VAR1, ..., VAR6}. It creates a WANT dataset in a different structure: one observation per solution with binary variables W1, ..., W6 indicating which of the VARi are included in the sum.
data _null_;
if 0 then set have;
array _[*] var:;
call symputx('n',dim(_));
stop;
run;
data want(keep=id target var: w:);
set have;
array v[*] var:;
array w[&n];
k=-1;
do j=1 to 2**&n;
r=graycode(k, of w[*]);
e=1; /* eligibility flag to exclude missing values */
do i=1 to &n while(e);
if w[i]=1 & v[i]=. then e=0;
end;
if e then do;
s=0;
do i=1 to &n;
if w[i]=1 then s+v[i];
end;
if s=target then output;
end;
end;
run;
Edit: If your real data contain non-integer VARi values, make sure to round the sum S appropriately when comparing it to TARGET in order to avoid surprises (due to numeric precision issues) as shown in the log below:
800 data _null_;
801 retain var1-var3 target (33.3 33.4 33.3 100);
802 s=sum(of var:);
803 if s ne target then put 'Surprised?';
804 if round(s, 1e-9)=target then put 'OK!';
805 run;
Surprised?
OK!
That GRAYCODE function is interesting.
Why not use it to generate a score dataset that you could use with PROC SCORE?
%let n=6;
data have;
input id target var1-var&n ;
cards;
1 10000 -2000 10000 3000 7000 12000 30
2 7000 5000 1500 500 6500 750 .
;
data score;
_type_='SCORE';
length _name_ $32 ;
array var[&n] ;
nterms=-1;
do j=1 to 2**&n;
r=graycode(nterms,of var[*]);
_name_=cats('X',of var[*]);
if nterms then output;
end;
keep _type_ _name_ var: ;
run;
proc score data=have score=score out=wide;
var var:;
id id target;
run;
Then you could use PROC TRANSPOSE to get to something similar to the requested output.
proc transpose data=wide name=inputs out=solutions(where=(col1=target));
by id target;
var x: ;
run;
data solutions;
sol+1;
set solutions;
by id target ;
output;
if last.target then sol=0;
drop col1;
run;
Results
@Tom wrote:
That GRAYCODE function is interesting.
Why not use it to generate a score dataset that you could use with PROC SCORE?
That's a great idea which makes the code more efficient by avoiding unnecessary calls of the GRAYCODE function.
@ciro: To exclude the trivial solution W1=W2=...=W6=0 in case of TARGET=0 with my DATA step approach (as Tom's code does already), insert a conditional CONTINUE statement as shown below:
... do j=1 to 2**&n; r=graycode(k, of w[*]); if r=0 then continue; ...
@FreelanceReinh has already given you the GRAYCODE() solution.
But I would suggest to use a more sophisticated solution SAS/OR and calling @RobPratt
The following code is for your first record 's solution . and I found 3 solutions:
And I believe @RobPratt could give you better code , the condition is you have SAS/OR software.
data have;
input v $ Column1;
cards;
var1 -2000
var2 10000
var3 3000
var4 7000
var5 12000
var6 30
;
proc optmodel;
set idx ;
str var{idx};
num x{idx};
var v{idx} binary;
read data have into idx=[_n_] var=v x=Column1 ;
con con1:sum{i in idx} v[i]*x[i]=10000 ;
solve with clp/findallsolns;
create data want from [solution index]={s in 1.._NSOL_,i in idx} vname=var[i] value=x[i] flag=v[i].sol[s];
quit;
proc print;run;
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.