Dear all,
I have a dataset like this:
data have;
input id target var1 var2 var3 var4 var5 var6;
cards;
1 10000 -2000 10000 3000 7000 12000 30
2 7000 5000 1500 500 6500 750 .
;
run;
for each id I want to find all possible combinations of variables (var1-var6) whose sum among them is equal to target. Missing values do not have to enter in any combination.
The result I am looking for is something like:
data want;
input id var $ value sol1 sol2 sol3 ;
cards;
1 var1 -2000 0 0 1
1 var2 10000 1 0 0
1 var3 3000 0 1 0
1 var4 7000 0 1 0
1 var5 12000 0 0 1
1 var6 30 0 0 0
2 var1 5000 1 0 .
2 var2 1500 1 0 .
2 var3 500 1 1 .
2 var4 6500 0 1 .
2 var5 750 0 0 .
2 var6 . 0 0 .
;
run;
each variable sol (sol1-sol3) is equal to 1 if the corresponding var has to be considered in the sum.
any suggestion is very welcome.
Moreover it is very appreciated any advice on alternative structures of have and/or want data
Thank you in advance.
Hello @ciro,
The suggestion below is built around the GRAYCODE function, which generates all subsets of {VAR1, ..., VAR6}. It creates a WANT dataset in a different structure: one observation per solution with binary variables W1, ..., W6 indicating which of the VARi are included in the sum.
data _null_;
if 0 then set have;
array _[*] var:;
call symputx('n',dim(_));
stop;
run;
data want(keep=id target var: w:);
set have;
array v[*] var:;
array w[&n];
k=-1;
do j=1 to 2**&n;
r=graycode(k, of w[*]);
e=1; /* eligibility flag to exclude missing values */
do i=1 to &n while(e);
if w[i]=1 & v[i]=. then e=0;
end;
if e then do;
s=0;
do i=1 to &n;
if w[i]=1 then s+v[i];
end;
if s=target then output;
end;
end;
run;
Edit: If your real data contain non-integer VARi values, make sure to round the sum S appropriately when comparing it to TARGET in order to avoid surprises (due to numeric precision issues) as shown in the log below:
800 data _null_;
801 retain var1-var3 target (33.3 33.4 33.3 100);
802 s=sum(of var:);
803 if s ne target then put 'Surprised?';
804 if round(s, 1e-9)=target then put 'OK!';
805 run;
Surprised?
OK!
That GRAYCODE function is interesting.
Why not use it to generate a score dataset that you could use with PROC SCORE?
%let n=6;
data have;
input id target var1-var&n ;
cards;
1 10000 -2000 10000 3000 7000 12000 30
2 7000 5000 1500 500 6500 750 .
;
data score;
_type_='SCORE';
length _name_ $32 ;
array var[&n] ;
nterms=-1;
do j=1 to 2**&n;
r=graycode(nterms,of var[*]);
_name_=cats('X',of var[*]);
if nterms then output;
end;
keep _type_ _name_ var: ;
run;
proc score data=have score=score out=wide;
var var:;
id id target;
run;
Then you could use PROC TRANSPOSE to get to something similar to the requested output.
proc transpose data=wide name=inputs out=solutions(where=(col1=target));
by id target;
var x: ;
run;
data solutions;
sol+1;
set solutions;
by id target ;
output;
if last.target then sol=0;
drop col1;
run;
Results
@Tom wrote:
That GRAYCODE function is interesting.
Why not use it to generate a score dataset that you could use with PROC SCORE?
That's a great idea which makes the code more efficient by avoiding unnecessary calls of the GRAYCODE function.
@ciro: To exclude the trivial solution W1=W2=...=W6=0 in case of TARGET=0 with my DATA step approach (as Tom's code does already), insert a conditional CONTINUE statement as shown below:
... do j=1 to 2**&n; r=graycode(k, of w[*]); if r=0 then continue; ...
@FreelanceReinh has already given you the GRAYCODE() solution.
But I would suggest to use a more sophisticated solution SAS/OR and calling @RobPratt
The following code is for your first record 's solution . and I found 3 solutions:
And I believe @RobPratt could give you better code , the condition is you have SAS/OR software.
data have;
input v $ Column1;
cards;
var1 -2000
var2 10000
var3 3000
var4 7000
var5 12000
var6 30
;
proc optmodel;
set idx ;
str var{idx};
num x{idx};
var v{idx} binary;
read data have into idx=[_n_] var=v x=Column1 ;
con con1:sum{i in idx} v[i]*x[i]=10000 ;
solve with clp/findallsolns;
create data want from [solution index]={s in 1.._NSOL_,i in idx} vname=var[i] value=x[i] flag=v[i].sol[s];
quit;
proc print;run;
Extending the CLP approach by @Ksharp, here's a way to do it with a groupBy parameter in the runOptmodel action in SAS Optimization:
%let numVars = 6;
data mycas.have;
input id target var1-var&numVars;
cards;
1 10000 -2000 10000 3000 7000 12000 30
2 7000 5000 1500 500 6500 750 .
;
proc cas;
source pgm;
num target;
set VARS init 1..&numVars;
num a {VARS};
read data have into target {j in VARS} <a[j]=col('var'||j)>;
VARS = {j in VARS: a[j] ne .};
var Select {VARS} binary;
con SumToTarget:
sum {j in VARS} a[j] * Select[j] = target;
solve with clp / findallsolns;
create data want(drop=j) from [solution j]={s in 1.._NSOL_, j in VARS} var=('var'||j) value=a[j] sol=Select[j].sol[s];
endsource;
action optimization.runOptmodel / code=pgm groupBy='id';
quit;
proc print data=mycas.want;
run;
Thank you very much everyone for all this ideas and code. Once again, this list proves to be an invaluable tool for support and for learning new things about SAS.
@RobPratt as I don't have Sas Viya can you suggest the code only with SAS/OR?
Thank you very much in advance
As requested, here's an alternative approach that uses a COFOR loop in PROC OPTMODEL and works in both SAS/OR (SAS 9) and SAS Optimization (SAS Viya):
proc optmodel printlevel=0;
set IDS;
num target {IDS};
set VARS_ALL = 1..&numVars;
set VARS {IDS} init VARS_ALL;
num a {id in IDS, VARS[id]};
read data have into IDS=[id] target {j in VARS_ALL} <a[id,j]=col('var'||j)>;
for {id in IDS} VARS[id] = {j in VARS_ALL: a[id,j] ne .};
num id_this;
var Select {VARS[id_this]} binary;
con SumToTarget:
sum {j in VARS[id_this]} a[id_this,j] * Select[j] = target[id_this];
set SOLS {IDS};
num SelectSol {id in IDS, SOLS[id], VARS[id]};
cofor {id in IDS} do;
put id=;
id_this = id;
solve with clp / findallsolns;
SOLS[id_this] = 1.._NSOL_;
for {s in SOLS[id_this], j in VARS[id_this]} SelectSol[id_this,s,j] = Select[j].sol[s];
end;
create data want(drop=j) from [id solution j]={id in IDS, s in SOLS[id], j in VARS[id]} var=('var'||j) value=a[id,j] sol=SelectSol;
quit;
When I am running your code, and get these error info .
1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK; 68 69 %let numVars = 6; 70 71 data have; 72 input id target var1-var&numVars; 73 cards; NOTE: 数据集 WORK.HAVE 有 2 个观测和 8 个变量。 NOTE: “DATA 语句”所用时间(总处理时间): 实际时间 0.00 秒 用户 CPU 时间 0.00 秒 系统 CPU 时间 0.00 秒 内存 672.21k OS 内存 19876.00k 时间戳 2025-09-10 上午07:23:33 Step Count 24 Switch Count 2 页错误数 0 页回收数 139 页交换数 0 主动上下文切换数 10 被动上下文切换数 0 块输入操作数 0 块输出操作数 264 76 ; 77 78 proc optmodel printlevel=0; 79 set IDS; 80 num target {IDS}; 81 set VARS_ALL = 1..&numVars; 82 set VARS {IDS}; 83 num a {id in IDS, VARS[id]}; 84 read data have into IDS=[id] target {j in VARS_ALL} <a[id,j]=col('var'||j)>; ERROR: 数组元素“VARS[1]”在第 83 行第 22 列没有值。 NOTE: 从数据集 WORK.HAVE. 读取了 1 个观测 85 for {id in IDS} VARS[id] = {j in VARS_ALL: a[id,j] ne .}; ERROR: 数组元素“VARS[1]”在第 83 行第 22 列没有值。 86 87 num id_this; 88 var Select {VARS[id_this]} binary; 89 con SumToTarget: 90 sum {j in VARS[id_this]} a[id_this,j] * Select[j] = target[id_this]; 91 92 set SOLS {IDS}; 93 num SelectSol {id in IDS, SOLS[id], VARS[id]}; 94 cofor {id in IDS} do; 95 put id=; 96 id_this = id; 97 solve with clp / findallsolns; 98 SOLS[id_this] = 1.._NSOL_; 99 for {s in SOLS[id_this], j in VARS[id_this]} SelectSol[id_this,s,j] = Select[j].sol[s]; 100 end; NOTE: The COFOR statement is executing in single-machine mode. id=1 NOTE: 问题生成将使用 2 个线程。 NOTE: 先前的错误可能会导致问题无法正确解决。 ERROR: 数组元素“VARS[1]”在第 88 行第 16 列没有值。 NOTE: 由于先前的错误,无法创建问题实例。 101 102 create data want(drop=j) from [id solution j]={id in IDS, s in SOLS[id], j in VARS[id]} var=('var'||j) value=a[id,j] 102 ! sol=SelectSol; ERROR: 数组元素“SOLS[1]”在第 102 行第 67 列没有值。 103 quit; NOTE: 由于出错,SAS 系统停止处理该步。 NOTE: “PROCEDURE OPTMODEL”所用时间(总处理时间): 实际时间 0.01 秒 用户 CPU 时间 0.01 秒 系统 CPU 时间 0.00 秒 内存 1077.60k OS 内存 20648.00k 时间戳 2025-09-10 上午07:23:33 Step Count 25 Switch Count 6 页错误数 4 页回收数 501 页交换数 0 主动上下文切换数 48 被动上下文切换数 0 块输入操作数 928 块输出操作数 168 104 105 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK; 115
@Ksharp Sorry, I had oversimplfiied the code, which I have now corrected.
The corrected line is:
set VARS {IDS} init VARS_ALL;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.