- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi There,
I have a list of data in the attached file under column A, and would like to find the combinations that equal the given sum in cell C2.
Here is a smaller example for explanation:
Data:
Data |
1 |
2 |
3 |
4 |
5 |
Sum Amount: 10
Desired Output:
Data | Combination 1 | Combination 2 | Combination 3 |
1 | x | x | |
2 | x | x | |
3 | x | x | |
4 | x | x | |
5 | x | x |
Is this possible to program on SAS? If only one combination can be found, that will be good too.
Help will be much appreciated.
Regards,
GEZK
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If you have big table, I suggese to use SAS/OR, @RobPratt is there .
Data have;
input x;
cards;
1
2
3
4
5
;
run;
proc sql noprint;
select count(*) into : n separated by ' ' from have;
select x into : values separated by ' ' from have;
quit;
data temp;
array x{&n};
array v{&n} _temporary_ (&values);
k=-1;
do i=1 to 2**&n;
rc=graycode(k,of x{*});
sum=0;
do j=1 to &n;
sum+x{j}*v{j};
end;
if sum=10 then output;
end;
drop k i j sum rc;
run;
proc transpose data=have out=have1(drop=_:);
var x;
run;
data w;
set have1 temp(rename=(x1-x&n =col1-col&n ));
run;
proc transpose data=w out=want;
run;
proc print;run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi Ksharp,
Thank you for your response. It is very much appreciated.
I have tested your code, and it is certainly heading towards the right direction, but when i tried it with the smallest data set I have of 70 variables, I am getting the below error with the Graycode function.
ERROR 72-185: The GRAYCODE function call has too many arguments.
Is there a way to overcome this error?
Thank you in advance for your assistance. I look forward to hearing from you.
Kind regards,
GEZK
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Yeah. GRAYCODE() has its limit. so try SAS/OR , RobPratt's code .
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Here is an IML approach.
data have;
input a @@;
datalines;
1 2 3 4 5
;
proc iml;
use have;
read all var {a};
close have;
comb=t(a);
do i=1 to nrow(a);
idx = allcomb(nrow(a), i);
do j=1 to nrow(idx);
sum=sum(a[idx[j, ]]);
if sum=10 then do;
idx10=j(1, nrow(a), 0);
idx10[idx[j, ]]=1;
comb=comb // idx10;
end;
end;
end;
varNames = "a1":"a5";
create want from comb[colname=varNames];
append from comb;
close want;
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi draycut,
Thank you for your response. It is greatly appreciated.
I have tested your code, and it is doing what we want it to do; however, when I tested it with our smallest data set of 70 variables, I am getting the below error message.
ERROR: (execution) Unable to allocate sufficient memory. At least 6293567280 more bytes required.
This may not be possible for larger data sets, which is unfortunate, as it does achieve what we are after in concept.
Kind regards,
GEZK
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Ok. Do you want to do this for each variable in the data set or?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You can do something like this and do the calculations one variable at the time with CALL EXECUTE logic. This gives you as many datasets with combinations as there are variables in your initial dataset. This will reduce memory consumption considerably. However, I do agree with KSharp that there is probably a much more efficient approach to be found in the OR community
data have;
input a b c;
datalines;
1 6 2
2 7 3
3 8 4
4 9 5
5 1 6
;
data _null_;
length string $1000;
set have(obs=1);
array vars{*} _NUMERIC_;
do i=1 to dim(vars);
varname=vname(vars[i]);
string=compbl(cats(
"
proc iml;
use have;
read all var {", varname, "};
close have;
comb=t(", varname, ");
do i=1 to nrow(", varname, ");
idx = allcomb(nrow(", varname, "), i);
do j=1 to nrow(idx);
sum=sum(", varname, "[idx[j, ]]);
if sum=10 then do;
idx10=j(1, nrow(", varname, "), 0);
idx10[idx[j, ]]=1;
comb=comb // idx10;
end;
end;
end;
varNames = '", varname, "1':'", varname, "5';
create want from comb[colname=varNames];
append from comb;
close want;
quit;
proc datasets library=work nolist;
change want=want_", varname, ";
run;quit;
"
));
output;
call execute(string);
end;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Here is a solution that may work in real life, even with 70 observations:
%let sum_wanted=15;
data have; /* just som test data */
do i=1 to 20;
output;
end;
run;
data _null_; /* get number of observations */
call symputx('nobs',nobs);
stop;
set have nobs=nobs;
run;
proc sort data=have;
by i;
run;
/* a recursive macro to search combinations of length up to &depth */
%macro search(iter,depth);
%local i prev_i prev_sum;
%if &iter=1 %then %do;
%let prev_i=0;
%let prev_sum=0;
%end;
%else %do;
%let prev_i=_I%eval(&iter-1);
%let prev_sum=_sum%eval(&iter-1);
%end;
do _I&iter=&prev_i+1 to &nobs;
_sum&iter=&prev_sum+temp(_I&iter);
if _sum&iter>&sum_wanted then
leave;
if _sum&iter=&sum_wanted then do;
comb_no+1;
%do i=1 %to &iter;
set have point=_i&i;
output;
%end;
end;
%if &iter<&depth %then %do;
else do;
%search(%eval(&iter+1),&depth);
end;
%end;
_sum&iter=&prev_sum-temp(_I&iter);
end;
%mend;
options mprint;
data want;
array temp(&nobs) 8 _temporary_;
do _N_=1 to &nobs;
set have;
temp(_N_)=i;
end;
%search(1,8);
keep i comb_no; /* you may want to add some key variables here */
run;
The first idea is to sort the data in ascending order, so that you can leave the loop as soon as the values get to large for a specific combination.
The second idea is to place the values in a temporary array, so that you can go through them fast.
The third is to write a recursive macro, which starts out by looking at "combinations" of length 1, then, for each of those, looking at combinations of length 2, etc. - up to the value &depth.
Output is "long", you get a row for each combination and value, you may have to use sort and transpose to get the output you want.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Your .xlsx file contains repeated values, and that would yield a lot of essentially duplicate solutions. So you should first count the frequency of each value.
Also, the number of solutions will be huge, so better to have one row per solution rather than one column per solution. The following code generates the first one million solutions. If you instead use the commented out SOLVE statement, you can get them all.
proc freq data=have noprint;
tables a / out=freqout;
run;
%let sum = 199080736;
proc optmodel;
set ELEMENTS;
var X {ELEMENTS} >= 0 integer;
read data freqout into ELEMENTS=[a] X.ub=count;
con sum {j in ELEMENTS} j * X[j] = ∑
solve with clp / maxsolns=1000000;
/* solve with clp / findallsolns;*/
create data want(drop=s) from [s]=(1.._NSOL_) {j in ELEMENTS} <col('X'||j)=X[j].sol[s]>;
quit;
The interpretation of column Xj is that value j appears that many times in the solution for that row.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I can show you a DATA step approach that requires transposed data:
data have;
v1=1;
v2=2;
v3=3;
v4=4;
v5=5;
run;
However, if you have 70 data values, that means there are 2**70 combinations to search. I'm not sure you would live long enough for a computer to process that many combinations. But if you are interested, I can program it for 5 variables.