Dear all,
I have a dataset like this:
data have;
input id target var1 var2 var3 var4 var5 var6;
cards;
1 10000 -2000 10000 3000 7000 12000 30
2 7000 5000 1500 500 6500 750 .
;
run;
for each id I want to find all possible combinations of variables (var1-var6) whose sum among them is equal to target. Missing values do not have to enter in any combination.
The result I am looking for is something like:
data want;
input id var $ value sol1 sol2 sol3 ;
cards;
1 var1 -2000 0 0 1
1 var2 10000 1 0 0
1 var3 3000 0 1 0
1 var4 7000 0 1 0
1 var5 12000 0 0 1
1 var6 30 0 0 0
2 var1 5000 1 0 .
2 var2 1500 1 0 .
2 var3 500 1 1 .
2 var4 6500 0 1 .
2 var5 750 0 0 .
2 var6 . 0 0 .
;
run;
each variable sol (sol1-sol3) is equal to 1 if the corresponding var has to be considered in the sum.
any suggestion is very welcome.
Moreover it is very appreciated any advice on alternative structures of have and/or want data
Thank you in advance.
Hello @ciro,
The suggestion below is built around the GRAYCODE function, which generates all subsets of {VAR1, ..., VAR6}. It creates a WANT dataset in a different structure: one observation per solution with binary variables W1, ..., W6 indicating which of the VARi are included in the sum.
data _null_;
if 0 then set have;
array _[*] var:;
call symputx('n',dim(_));
stop;
run;
data want(keep=id target var: w:);
set have;
array v[*] var:;
array w[&n];
k=-1;
do j=1 to 2**&n;
r=graycode(k, of w[*]);
e=1; /* eligibility flag to exclude missing values */
do i=1 to &n while(e);
if w[i]=1 & v[i]=. then e=0;
end;
if e then do;
s=0;
do i=1 to &n;
if w[i]=1 then s+v[i];
end;
if s=target then output;
end;
end;
run;
Edit: If your real data contain non-integer VARi values, make sure to round the sum S appropriately when comparing it to TARGET in order to avoid surprises (due to numeric precision issues) as shown in the log below:
800 data _null_;
801 retain var1-var3 target (33.3 33.4 33.3 100);
802 s=sum(of var:);
803 if s ne target then put 'Surprised?';
804 if round(s, 1e-9)=target then put 'OK!';
805 run;
Surprised?
OK!
That GRAYCODE function is interesting.
Why not use it to generate a score dataset that you could use with PROC SCORE?
%let n=6;
data have;
input id target var1-var&n ;
cards;
1 10000 -2000 10000 3000 7000 12000 30
2 7000 5000 1500 500 6500 750 .
;
data score;
_type_='SCORE';
length _name_ $32 ;
array var[&n] ;
nterms=-1;
do j=1 to 2**&n;
r=graycode(nterms,of var[*]);
_name_=cats('X',of var[*]);
if nterms then output;
end;
keep _type_ _name_ var: ;
run;
proc score data=have score=score out=wide;
var var:;
id id target;
run;
Then you could use PROC TRANSPOSE to get to something similar to the requested output.
proc transpose data=wide name=inputs out=solutions(where=(col1=target));
by id target;
var x: ;
run;
data solutions;
sol+1;
set solutions;
by id target ;
output;
if last.target then sol=0;
drop col1;
run;
Results
@Tom wrote:
That GRAYCODE function is interesting.
Why not use it to generate a score dataset that you could use with PROC SCORE?
That's a great idea which makes the code more efficient by avoiding unnecessary calls of the GRAYCODE function.
@ciro: To exclude the trivial solution W1=W2=...=W6=0 in case of TARGET=0 with my DATA step approach (as Tom's code does already), insert a conditional CONTINUE statement as shown below:
... do j=1 to 2**&n; r=graycode(k, of w[*]); if r=0 then continue; ...
@FreelanceReinh has already given you the GRAYCODE() solution.
But I would suggest to use a more sophisticated solution SAS/OR and calling @RobPratt
The following code is for your first record 's solution . and I found 3 solutions:
And I believe @RobPratt could give you better code , the condition is you have SAS/OR software.
data have;
input v $ Column1;
cards;
var1 -2000
var2 10000
var3 3000
var4 7000
var5 12000
var6 30
;
proc optmodel;
set idx ;
str var{idx};
num x{idx};
var v{idx} binary;
read data have into idx=[_n_] var=v x=Column1 ;
con con1:sum{i in idx} v[i]*x[i]=10000 ;
solve with clp/findallsolns;
create data want from [solution index]={s in 1.._NSOL_,i in idx} vname=var[i] value=x[i] flag=v[i].sol[s];
quit;
proc print;run;
Extending the CLP approach by @Ksharp, here's a way to do it with a groupBy parameter in the runOptmodel action in SAS Optimization:
%let numVars = 6;
data mycas.have;
input id target var1-var&numVars;
cards;
1 10000 -2000 10000 3000 7000 12000 30
2 7000 5000 1500 500 6500 750 .
;
proc cas;
source pgm;
num target;
set VARS init 1..&numVars;
num a {VARS};
read data have into target {j in VARS} <a[j]=col('var'||j)>;
VARS = {j in VARS: a[j] ne .};
var Select {VARS} binary;
con SumToTarget:
sum {j in VARS} a[j] * Select[j] = target;
solve with clp / findallsolns;
create data want(drop=j) from [solution j]={s in 1.._NSOL_, j in VARS} var=('var'||j) value=a[j] sol=Select[j].sol[s];
endsource;
action optimization.runOptmodel / code=pgm groupBy='id';
quit;
proc print data=mycas.want;
run;
Thank you very much everyone for all this ideas and code. Once again, this list proves to be an invaluable tool for support and for learning new things about SAS.
@RobPratt as I don't have Sas Viya can you suggest the code only with SAS/OR?
Thank you very much in advance
As requested, here's an alternative approach that uses a COFOR loop in PROC OPTMODEL and works in both SAS/OR (SAS 9) and SAS Optimization (SAS Viya):
proc optmodel printlevel=0;
set IDS;
num target {IDS};
set VARS_ALL = 1..&numVars;
set VARS {IDS};
num a {id in IDS, VARS[id]};
read data have into IDS=[id] target {j in VARS_ALL} <a[id,j]=col('var'||j)>;
for {id in IDS} VARS[id] = {j in VARS_ALL: a[id,j] ne .};
num id_this;
var Select {VARS[id_this]} binary;
con SumToTarget:
sum {j in VARS[id_this]} a[id_this,j] * Select[j] = target[id_this];
set SOLS {IDS};
num SelectSol {id in IDS, SOLS[id], VARS[id]};
cofor {id in IDS} do;
put id=;
id_this = id;
solve with clp / findallsolns;
SOLS[id_this] = 1.._NSOL_;
for {s in SOLS[id_this], j in VARS[id_this]} SelectSol[id_this,s,j] = Select[j].sol[s];
end;
create data want(drop=j) from [id solution j]={id in IDS, s in SOLS[id], j in VARS[id]} var=('var'||j) value=a[id,j] sol=SelectSol;
quit;
When I am running your code, and get these error info .
1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK; 68 69 %let numVars = 6; 70 71 data have; 72 input id target var1-var&numVars; 73 cards; NOTE: 数据集 WORK.HAVE 有 2 个观测和 8 个变量。 NOTE: “DATA 语句”所用时间(总处理时间): 实际时间 0.00 秒 用户 CPU 时间 0.00 秒 系统 CPU 时间 0.00 秒 内存 672.21k OS 内存 19876.00k 时间戳 2025-09-10 上午07:23:33 Step Count 24 Switch Count 2 页错误数 0 页回收数 139 页交换数 0 主动上下文切换数 10 被动上下文切换数 0 块输入操作数 0 块输出操作数 264 76 ; 77 78 proc optmodel printlevel=0; 79 set IDS; 80 num target {IDS}; 81 set VARS_ALL = 1..&numVars; 82 set VARS {IDS}; 83 num a {id in IDS, VARS[id]}; 84 read data have into IDS=[id] target {j in VARS_ALL} <a[id,j]=col('var'||j)>; ERROR: 数组元素“VARS[1]”在第 83 行第 22 列没有值。 NOTE: 从数据集 WORK.HAVE. 读取了 1 个观测 85 for {id in IDS} VARS[id] = {j in VARS_ALL: a[id,j] ne .}; ERROR: 数组元素“VARS[1]”在第 83 行第 22 列没有值。 86 87 num id_this; 88 var Select {VARS[id_this]} binary; 89 con SumToTarget: 90 sum {j in VARS[id_this]} a[id_this,j] * Select[j] = target[id_this]; 91 92 set SOLS {IDS}; 93 num SelectSol {id in IDS, SOLS[id], VARS[id]}; 94 cofor {id in IDS} do; 95 put id=; 96 id_this = id; 97 solve with clp / findallsolns; 98 SOLS[id_this] = 1.._NSOL_; 99 for {s in SOLS[id_this], j in VARS[id_this]} SelectSol[id_this,s,j] = Select[j].sol[s]; 100 end; NOTE: The COFOR statement is executing in single-machine mode. id=1 NOTE: 问题生成将使用 2 个线程。 NOTE: 先前的错误可能会导致问题无法正确解决。 ERROR: 数组元素“VARS[1]”在第 88 行第 16 列没有值。 NOTE: 由于先前的错误,无法创建问题实例。 101 102 create data want(drop=j) from [id solution j]={id in IDS, s in SOLS[id], j in VARS[id]} var=('var'||j) value=a[id,j] 102 ! sol=SelectSol; ERROR: 数组元素“SOLS[1]”在第 102 行第 67 列没有值。 103 quit; NOTE: 由于出错,SAS 系统停止处理该步。 NOTE: “PROCEDURE OPTMODEL”所用时间(总处理时间): 实际时间 0.01 秒 用户 CPU 时间 0.01 秒 系统 CPU 时间 0.00 秒 内存 1077.60k OS 内存 20648.00k 时间戳 2025-09-10 上午07:23:33 Step Count 25 Switch Count 6 页错误数 4 页回收数 501 页交换数 0 主动上下文切换数 48 被动上下文切换数 0 块输入操作数 928 块输出操作数 168 104 105 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK; 115
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.